Training long-context LLMs is getting easier!
TRL now supports Context Parallelism (CP), letting you seamlessly scale sequence lengths across multiple GPUs, and even across multi-node setups.
Combine TRL and accelerate, and you can run it effortlessly!
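Here's a minimal sketch of what the training script can look like, assuming a recent TRL release (SFTConfig exposing max_length, SFTTrainer accepting a model id string); the model and dataset names are placeholders, and context parallelism itself is configured on the Accelerate side as described in the guide linked below:

```python
# Minimal TRL SFT sketch. The script stays unchanged when using Context
# Parallelism: CP is enabled via the Accelerate configuration, which shards
# each sequence across GPUs at training time.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Example dataset; swap in your own long-context data.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="long-context-sft",
    max_length=131072,  # long sequences become feasible once CP shards them across GPUs
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",  # placeholder model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

You would then start it with `accelerate launch train.py`, pointing at an Accelerate config that enables context parallelism (see the guide below for the exact settings).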
With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
Works for both full fine-tuning and LoRA, unlocking contexts that used to hit OOM.
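For the LoRA case, here is a sketch of the same setup with a peft LoraConfig passed to SFTTrainer; the hyperparameters are illustrative only, and training_args / dataset are the ones from the previous snippet:

```python
# LoRA variant sketch: reuse the SFTConfig and dataset from above and add a
# peft LoraConfig so only adapter weights are trained.
from peft import LoraConfig
from trl import SFTTrainer

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",  # placeholder model id
    args=training_args,        # SFTConfig from the previous snippet
    train_dataset=dataset,     # dataset from the previous snippet
    peft_config=peft_config,
)
trainer.train()
```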
Check out the full guide here: https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism
If you want to learn more about Context Parallelism, check out the Ultrascale Playbook: nanotron/ultrascale-playbook