Linfeng Song

freesunshine0316

https://freesunshine0316.github.io/

AI & ML interests

Researcher @Tencent AI Lab working on reasoning and RLAIF with LLMs, especially search + RL. Working on NLP since 2010.

Recent Activity

upvoted a paper 10 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

upvoted a paper about 1 month ago

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

upvoted a paper about 1 month ago

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

upvoted a paper 10 days ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published 11 days ago • 18

upvoted 4 papers about 1 month ago

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Paper • 2510.01444 • Published Oct 1 • 19

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2 • 26

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published Sep 23 • 67

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Paper • 2509.15194 • Published Sep 18 • 33

upvoted 2 papers about 2 months ago

EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

Paper • 2509.12603 • Published Sep 16 • 9

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Paper • 2509.09675 • Published Sep 11 • 28

upvoted 2 papers 3 months ago

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1 • 91

UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26 • 14

upvoted a paper 4 months ago

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

Paper • 2507.06804 • Published Jul 7 • 15

upvoted a paper 6 months ago

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

Paper • 2505.10962 • Published May 16 • 8

upvoted a paper 7 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 23

upvoted a paper 10 months ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 41

upvoted a paper about 1 year ago

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Paper • 2410.06508 • Published Oct 9, 2024 • 11

upvoted 4 papers over 1 year ago

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Paper • 2407.00617 • Published Jun 30, 2024 • 7