
Longxu Dou

dreamerdeo


https://longxudou.github.io/

AI & ML interests

Natural Language Processing

Recent Activity


upvoted a paper 28 days ago

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1 • 87

upvoted 2 papers about 1 month ago

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26 • 68

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26 • 67

upvoted a collection about 1 month ago

cwm

Collection for Code World Model, an agentic coding model from FAIR. • 3 items • Updated Sep 24 • 17

upvoted a paper 4 months ago

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

Paper • 2506.20512 • Published Jun 25 • 47

upvoted 3 papers 5 months ago

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 29

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 23

upvoted 2 papers 6 months ago

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 36

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19 • 45

upvoted an article 6 months ago

Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick

Article • Oct 24, 2024 • 13

upvoted 2 papers 6 months ago

Could Thinking Multilingually Empower LLM Reasoning?

Paper • 2504.11833 • Published Apr 16 • 29

FlowReasoner: Reinforcing Query-Level Meta-Agents

Paper • 2504.15257 • Published Apr 21 • 47

upvoted a collection 7 months ago

NoisyRollout

8 items • Updated May 20 • 6

upvoted a paper 7 months ago

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Paper • 2504.13055 • Published Apr 17 • 19

upvoted a collection 7 months ago

🚀 Active PRM

Efficient Process Reward Model Training via Active Learning. • 4 items • Updated Apr 16 • 3

upvoted a paper 7 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 56

upvoted a collection 7 months ago

🌾Oat-Zero: Understanding R1-Zero-Like Training

5 items • Updated Apr 10 • 7

upvoted a paper 7 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14 • 13

upvoted an article 8 months ago

Dual-Stream Parallelism (DualPipe): Better Without Dual Streams

Article • Feb 28 • 7