SII-Yibin Wang

CodeGoat24

https://codegoat24.github.io/

AI & ML interests

I'm part of the Shanghai Innovation Institute, focusing on multimodal RL and generation.

Recent Activity

updated a model 2 days ago

CodeGoat24/UnifiedReward-Edit-qwen-72b

updated a model 2 days ago

CodeGoat24/UnifiedReward-7b-v1.5

updated a model 2 days ago

CodeGoat24/UnifiedReward-7b

upvoted 2 papers 10 days ago

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Paper • 2510.17722 • Published 12 days ago • 18

UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

Paper • 2510.18701 • Published 11 days ago • 66

upvoted a paper 16 days ago

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Paper • 2510.10201 • Published 21 days ago • 35

upvoted 2 papers 23 days ago

G^2RPO: Granular GRPO for Precise Reward in Flow Models

Paper • 2510.01982 • Published about 1 month ago • 5

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published 25 days ago • 52

upvoted 2 papers about 1 month ago

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published Aug 22 • 4

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

Paper • 2509.22647 • Published Sep 26 • 31

upvoted a collection about 2 months ago

UnifiedReward 2.0 Models

14 items • Updated 2 days ago • 1

upvoted a collection 2 months ago

Pref-GRPO & UniGenBench

6 items • Updated 8 days ago • 1

upvoted a paper 2 months ago

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89

upvoted 2 papers 3 months ago

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6 • 52

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1 • 62

upvoted a collection 5 months ago

UnifiedReward 1.0 Qwen Models GGUF

9 items • Updated Sep 3 • 2

upvoted a paper 5 months ago

GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization

Paper • 2506.07160 • Published Jun 8 • 3

upvoted 3 papers 6 months ago

Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models

Paper • 2505.02686 • Published May 5 • 16

MagicFace: Training-free Universal-Style Human Image Customized Synthesis

Paper • 2408.07433 • Published Aug 14, 2024 • 1

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published May 6 • 93

upvoted 2 collections 7 months ago

UnifiedReward 1.0 Qwen Models

6 items • Updated Sep 3 • 10

UnifiedReward Training Data

14 items • Updated 4 days ago • 6

upvoted a paper 7 months ago

DreamText: High Fidelity Scene Text Synthesis

Paper • 2405.14701 • Published May 23, 2024 • 1