weizechen/RL-Compositionality-Stage1-RFT-Data
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones. https://huggingface.co/papers/2509.25123
Note: Stage 1 RFT data from Llama 3.1 8B Instruct. We use this data to fine-tune Llama 3.1 8B Instruct.
Note: The test data for our string task, with problems ranging from Level 1 to Level 8. Not intended for training.
Note: The model after Stage 1 training. It also serves as the base model for Stage 2 in Sections 4.1 and 4.2.
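To make the f(g(x)) notation in the title concrete, here is a minimal sketch of composing two string functions. The functions `reverse` and `duplicate_first` and the helper `compose` are hypothetical stand-ins chosen for illustration; they are not taken from the dataset or the paper, and the actual string tasks may be defined differently.

```python
from functools import reduce
from typing import Callable

StrFn = Callable[[str], str]


def reverse(s: str) -> str:
    """Hypothetical f: reverse the string."""
    return s[::-1]


def duplicate_first(s: str) -> str:
    """Hypothetical g: prepend a copy of the first character."""
    return s[:1] + s


def compose(*fns: StrFn) -> StrFn:
    """Right-to-left composition, so compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, fn: fn(acc), reversed(fns), x)


# f(g(x)): apply duplicate_first first, then reverse.
f_of_g = compose(reverse, duplicate_first)
print(f_of_g("abc"))  # prints "cbaa"

# g(f(x)): the order of composition changes the result.
g_of_f = compose(duplicate_first, reverse)
print(g_of_f("abc"))  # prints "ccba"
```

Passing more functions to `compose` gives deeper nesting, for example `compose(f, g, f)` for f(g(f(x))).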