sergiopaniego posted an update Sep 16
Training long-context LLMs is getting easier!

TRL now supports Context Parallelism (CP), letting you shard long sequences across multiple GPUs, even in multi-node setups, seamlessly 💆
Combine TRL with accelerate and you can run it effortlessly, as sketched below!
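
Here's a minimal sketch of what that looks like in practice, assuming a recent TRL where SFTConfig exposes max_length. The model id, dataset, and config file name are placeholders, and the CP degree itself is set in the accelerate config described in the linked guide, not in the training script:

```python
# sft_cp_sketch.py -- minimal long-context SFT sketch (placeholder names).
# The CP degree lives in the accelerate config, so launch with something like:
#   accelerate launch --config_file cp_config.yaml sft_cp_sketch.py
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

training_args = SFTConfig(
    output_dir="Qwen2.5-0.5B-long-context",
    max_length=65536,  # long target sequence length; CP shards it across GPUs
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```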

With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
It works for both full fine-tuning and LoRA (see the LoRA sketch below), unlocking contexts that used to hit OOM 📈
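
For the LoRA case, the same setup just needs a peft_config passed to SFTTrainer; the hyperparameters here are illustrative, not tuned:

```python
# LoRA variant of the sketch above: train adapters instead of full weights.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder model id
    args=SFTConfig(output_dir="Qwen2.5-0.5B-lora-long-context", max_length=65536),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # illustrative values
)
trainer.train()
```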

Check out the full guide here 👉 https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism

If you want to learn more about Context Parallelism, check out the Ultrascale Playbook 👉 nanotron/ultrascale-playbook