ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning • Paper • 2509.21070 • Published Sep 25 • 9
Residual Off-Policy RL for Finetuning Behavior Cloning Policies • Paper • 2509.19301 • Published Sep 23 • 18
CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling • Paper • 2509.21114 • Published Sep 25 • 15
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models • Paper • 2509.19803 • Published Sep 24 • 117