Kazuki1450/Qwen3-1.7B-Base_rgym_chain_sum_1p0_0p0_1p0_grpo_42_MixupStrategies.UNIFORM • Text Generation • 2B • Updated 2 days ago • 60
Kazuki1450/Qwen2.5-3B_lightr1_stage1_0p75_0p25_1p0_grpo • Text Generation • 3B • Updated 6 days ago • 59
Kazuki1450/Qwen2.5-3B_lightr1_stage1_1p0_0p25_1p0_grpo • Text Generation • 3B • Updated 6 days ago • 62
Kazuki1450/Qwen2.5-3B_lightr1_stage1_0p75_0p0_1p0_grpo • Text Generation • 3B • Updated 6 days ago • 54
Kazuki1450/Qwen2.5-1.5B-Instruct_rgym-chain-sum_1p0_0p0_1p0_grpo • Text Generation • 2B • Updated 6 days ago • 66