UFT: Unifying Supervised and Reinforcement Fine-Tuning
Mingyang Liu
liumy2010
Models (75)
liumy2010/Qwen2.5-3B-math-UFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-math-SFT-RFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-math-SFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-math-RFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-math-R3 • Text Generation • 3B
liumy2010/Qwen2.5-3B-kk_logic-UFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-kk_logic-SFT-RFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-kk_logic-SFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-kk_logic-RFT • Text Generation • 3B
liumy2010/Qwen2.5-3B-kk_logic-R3 • Text Generation • 3B