Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning • Paper • 2510.03259 • Published Sep 26 • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense • Paper • 2510.07242 • Published 20 days ago • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models • Paper • 2510.08308 • Published 19 days ago • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published 25 days ago • 45
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States • Paper • 2510.11052 • Published 15 days ago • 51
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment • Paper • 2510.10201 • Published 17 days ago • 35
Demystifying Reinforcement Learning in Agentic Reasoning • Paper • 2510.11701 • Published 15 days ago • 31
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • Paper • 2510.11696 • Published 15 days ago • 168