
Kaiyan Zhang

iseesaw

https://iseesaw.github.io/

AI & ML interests

Large Reasoning Models, Reinforcement Learning, Agent

Recent Activity

authored a paper about 1 month ago

FlowRL: Matching Reward Distributions for LLM Reasoning



upvoted a paper about 1 month ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 111

upvoted a collection about 1 month ago

DeepSeek-V3.2

Collection

2 items • Updated Sep 29 • 441

upvoted a paper about 1 month ago

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Paper • 2509.16117 • Published Sep 19 • 21

upvoted 4 papers about 2 months ago

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2 • 123

upvoted a paper 3 months ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 94

upvoted 2 papers 5 months ago

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 138

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 130

upvoted a paper 6 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

upvoted a paper 7 months ago

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 62

upvoted a collection 7 months ago

Gemma 3 Release

Collection

28 items • Updated Aug 11 • 522

upvoted a paper 7 months ago

Video-T1: Test-Time Scaling for Video Generation

Paper • 2503.18942 • Published Mar 24 • 90

upvoted a paper 8 months ago

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14 • 28

upvoted an article 8 months ago

Open R1: Update #3

Article • and 9 others • Mar 11 • 295

upvoted a collection 8 months ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 40 items • Updated Jul 21 • 347

upvoted a paper 9 months ago

Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published Feb 14 • 18

upvoted 2 articles 9 months ago

Our Transformers Code Agent beats the GAIA benchmark!

Article • Jul 1, 2024 • 98

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.31k
