Training long-context LLMs is getting easier!
TRL now supports Context Parallelism (CP), letting you seamlessly scale sequence lengths across multiple GPUs, and even across multi-node setups.
Combine TRL and accelerate, and you can run it effortlessly!
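Here's a minimal sketch of what the training script can look like, assuming a recent TRL release (SFTConfig exposing max_length, SFTTrainer accepting a model id string); the model and dataset names are placeholders, and context parallelism itself is configured on the Accelerate side as described in the guide linked below:

```python
# Minimal TRL SFT sketch. The script stays unchanged when using Context
# Parallelism: CP is enabled via the Accelerate configuration, which shards
# each sequence across GPUs at training time.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Example dataset; swap in your own long-context data.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="long-context-sft",
    max_length=131072,  # long sequences become feasible once CP shards them across GPUs
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",  # placeholder model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

You would then start it with `accelerate launch train.py`, pointing at an Accelerate config that enables context parallelism (see the guide below for the exact settings).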
With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
Works for both full fine-tuning and LoRA, unlocking contexts that used to hit OOM.
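For the LoRA case, here is a sketch of the same setup with a peft LoraConfig passed to SFTTrainer; the hyperparameters are illustrative only, and training_args / dataset are the ones from the previous snippet:

```python
# LoRA variant sketch: reuse the SFTConfig and dataset from above and add a
# peft LoraConfig so only adapter weights are trained.
from peft import LoraConfig
from trl import SFTTrainer

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",  # placeholder model id
    args=training_args,        # SFTConfig from the previous snippet
    train_dataset=dataset,     # dataset from the previous snippet
    peft_config=peft_config,
)
trainer.train()
```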
Check out the full guide here: https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism
If you want to learn more about Context Parallelism, check out the Ultrascale Playbook: nanotron/ultrascale-playbook