A dataset and RL-zero pipeline for advanced mathematical reasoning in informal theorem proving.
Jiahao Xu
Jiahao004
AI & ML interests
Sentence Embeddings; Neural Machine Translation
Models (13)
Jiahao004/agentllm_SFT-template-3_1_qwen-train-Qwen3-8B-1e-5LR_best • Text Generation • 308k • 14
Jiahao004/SFT-agentllm-template2-train4-Qwen3-0.6B-1e-6LR-3Epochs-32768Tokens-1BS-think-step-by-step • 0.6B
Jiahao004/SFT-agentllm-template2-train3-1example-Qwen3-0.6B-1e-5LR-50Epochs-checkpoint-50 • 0.6B
Jiahao004/SFT-agentllm-template1-train2-Qwen3-0.6B-1e-5LR-50Epochs-32768Tokens-1BS-think-step-by-step • Text Generation • 0.6B
Jiahao004/SFT-agentllm-template1-Qwen3-0.6B-5e-5LR-3Epochs-32768Tokens-1BS-think-step-by-step • 0.6B • 4
Jiahao004/agentllm-SFT-baseline-Qwen3-8B-5e-5LR-3Epochs • 308k
Jiahao004/SFT-agentllm-template1-Qwen3-8B-5e-5LR-3Epochs-32768Tokens-1BS-think-step-by-step • 8B
Jiahao004/SFT-agentllm-template1-Qwen3-8B-5e-5LR-3Epochs-32768Tokens • 8B • 1
Jiahao004/test
Jiahao004/SFT-agentllm-template1-Qwen3-0.6B-5e-5LR-3Epochs-32768Tokens-1BS-1GA-flash-attn2-8GPUs-1Nodes • 0.6B