Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization Paper • 2510.13554 • Published 17 days ago • 55
Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony Paper • 2510.11345 • Published 19 days ago • 15
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library Paper • 2506.06122 • Published Jun 6 • 7
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published Feb 26 • 28
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 26
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models Paper • 2411.07140 • Published Nov 11, 2024 • 35
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20, 2024 • 41
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20, 2024 • 41