GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving • arXiv:2510.11769 • Published Oct 2025 • 25 upvotes
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning • arXiv:2510.12693 • Published Oct 2025 • 26 upvotes
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning • arXiv:2509.25760 • Published Sep 30, 2025 • 53 upvotes
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training • arXiv:2509.03403 • Published Sep 3, 2025 • 21 upvotes
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities • arXiv:2507.13158 • Published Jul 17, 2025 • 23 upvotes
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models • arXiv:2506.18945 • Published Jun 23, 2025 • 40 upvotes
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models • arXiv:2505.22662 • Published May 28, 2025 • 6 upvotes
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning • arXiv:2505.24846 • Published May 30, 2025 • 15 upvotes
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • arXiv:2505.24864 • Published May 30, 2025 • 138 upvotes
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time • arXiv:2505.24863 • Published May 30, 2025 • 97 upvotes
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL • arXiv:2505.02391 • Published May 5, 2025 • 25 upvotes
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training • arXiv:2504.13161 • Published Apr 17, 2025 • 93 upvotes
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning • arXiv:2503.18769 • Published Mar 24, 2025 • 11 upvotes
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • arXiv:2503.00808 • Published Mar 2, 2025 • 56 upvotes
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts • arXiv:2502.20395 • Published Feb 27, 2025 • 46 upvotes
Rethinking Diverse Human Preference Learning through Principal Component Analysis • arXiv:2502.13131 • Published Feb 18, 2025 • 37 upvotes