Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers Paper • 2506.14702 • Published Jun 17 • 3
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 267
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7 • 39
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers Paper • 2507.20527 • Published Jul 28 • 5
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28 • 81
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published Aug 26 • 36