GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving • arXiv:2510.11769 • Published Oct 2025 • 25 upvotes
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning • arXiv:2510.12693 • Published Oct 2025 • 26 upvotes
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning • arXiv:2509.25760 • Published Sep 30, 2025 • 53 upvotes
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training • arXiv:2509.03403 • Published Sep 3, 2025 • 21 upvotes
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities • arXiv:2507.13158 • Published Jul 17, 2025 • 23 upvotes
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models • arXiv:2506.18945 • Published Jun 23, 2025 • 40 upvotes
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models • arXiv:2505.22662 • Published May 28, 2025 • 6 upvotes
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning • arXiv:2505.24846 • Published May 30, 2025 • 15 upvotes
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • arXiv:2505.24864 • Published May 30, 2025 • 138 upvotes
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time • arXiv:2505.24863 • Published May 30, 2025 • 97 upvotes
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL • arXiv:2505.02391 • Published May 5, 2025 • 25 upvotes
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training • arXiv:2504.13161 • Published Apr 17, 2025 • 93 upvotes
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning • arXiv:2503.18769 • Published Mar 24, 2025 • 11 upvotes
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • arXiv:2503.00808 • Published Mar 2, 2025 • 56 upvotes
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts • arXiv:2502.20395 • Published Feb 27, 2025 • 46 upvotes
Rethinking Diverse Human Preference Learning through Principal Component Analysis • arXiv:2502.13131 • Published Feb 18, 2025 • 37 upvotes