BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
Paper • 2508.10975 • Published Aug 14