RL Compositionality - a weizechen Collection

RL Compositionality

updated 15 days ago

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones. https://huggingface.co/papers/2509.25123

weizechen/RL-Compositionality-Stage1-RFT-Data

Viewer • Updated 15 days ago • 118k • 49

Note: Stage-1 RFT data from Llama 3.1 8B Instruct. We use this data to fine-tune Llama 3.1 8B Instruct.
weizechen/RL-Compositionality-Stage2-RL-Level1-TrainData

Viewer • Updated 15 days ago • 500k • 26
weizechen/RL-Compositionality-Stage2-RL-Level2-TrainData

Viewer • Updated 15 days ago • 500k • 28
weizechen/RL-Compositionality-Stage2-RL-Level8-TestData

Viewer • Updated 15 days ago • 2.05k • 25

Note: Test data containing our string-task problems, ranging from Level 1 to Level 8. Not intended for training.
weizechen/RL-Compositionality-Stage-1-Model

8B • Updated 15 days ago • 23

Note: The model after Stage 1 training. It is also the base model for Stage 2 in Section 4.1 and Section 4.2.
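
The items listed above can be pulled with the Hugging Face `datasets` and `transformers` libraries. The following is a minimal sketch, not the authors' own loading code: the repository IDs are copied from this listing, while the dictionary keys and the helper name `load_collection_dataset` are hypothetical names chosen for illustration. Split and column names are not stated on this page, so inspect the returned object before relying on them.

```python
# Repository IDs copied verbatim from the collection listing.
# The short keys on the left are illustrative names, not official ones.
DATASETS = {
    "stage1_rft": "weizechen/RL-Compositionality-Stage1-RFT-Data",
    "stage2_level1_train": "weizechen/RL-Compositionality-Stage2-RL-Level1-TrainData",
    "stage2_level2_train": "weizechen/RL-Compositionality-Stage2-RL-Level2-TrainData",
    "level8_test": "weizechen/RL-Compositionality-Stage2-RL-Level8-TestData",
}

# The Stage 1 model checkpoint from the same collection.
MODEL_ID = "weizechen/RL-Compositionality-Stage-1-Model"


def load_collection_dataset(key, **kwargs):
    """Download one dataset from the collection by its short key.

    The `datasets` import is done lazily so that this module can be
    imported (and the IDs inspected) without the library installed.
    Extra keyword arguments are passed through to `load_dataset`.
    """
    from datasets import load_dataset

    return load_dataset(DATASETS[key], **kwargs)
```

For example, `load_collection_dataset("level8_test")` would fetch the test data (which, per the note above, is not intended for training), and `transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID)` would be the usual way to load the Stage 1 checkpoint, assuming it is stored in a standard causal-LM format.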