# Pipeline Parallelism Emulation

This project provides tools for emulating and visualizing pipeline parallelism strategies used in large language model training.

## Overview

Pipeline parallelism is a technique used to train large models by partitioning the model across multiple devices and processing data in a pipelined fashion. This project allows you to:

- Simulate different pipeline parallelism strategies (1F1B, Interleaved)
- Visualize the execution schedule on multiple devices
- Compare different strategies for efficiency
## Features

- Supported pipeline strategies:
  - 1F1B
  - Interleaved 1F1B
  - ZB-1P
  - 1F1B with batch overlap
  - 1F1B with interleave overlap
- Visualization:
  - Interactive visualization dashboard using Plotly/Dash
- Configuration:
  - Configurable simulation parameters through Hydra
  - Stage-specific operation latencies for performance projection
## Installation

This project uses [uv](https://github.com/astral-sh/uv) for dependency management.

Set up `uv` if it is not already installed on your machine:

```bash
# On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
```
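With `uv` available, the project dependencies declared in `pyproject.toml` can typically be installed with `uv sync` (standard `uv` behavior; adjust if this repository documents a different setup step):

```bash
# Create/refresh the local virtual environment and install dependencies from pyproject.toml
uv sync
```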
## Usage

Run the 1F1B strategy:

```bash
uv run python main.py strategy=1f1b num_devices=4 num_stages=4 num_batches=8
```

Run the interleaved 1F1B strategy:

```bash
uv run python main.py strategy=interleave num_devices=4 num_stages=8 num_batches=8
```

Run the ZB-1P strategy:

```bash
uv run python main.py strategy=zb1p num_devices=4 num_stages=4 num_batches=8
```

Run the 1F1B batch-overlap strategy:

```bash
uv run python main.py strategy=1f1b_overlap num_devices=4 num_stages=4 num_batches=8
```

Run the 1F1B interleave-overlap strategy:

```bash
uv run python main.py strategy=1f1b_interleave_overlap num_devices=4 num_stages=8 num_batches=8
```
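To compare strategies for efficiency, Hydra's standard multi-run mode can sweep several values in one invocation. This relies on generic Hydra behavior (`--multirun` with comma-separated values), not a project-specific flag, so treat it as a sketch:

```bash
# Run the same simulation once per strategy listed after --multirun
uv run python main.py --multirun strategy=1f1b,zb1p num_devices=4 num_stages=4 num_batches=8
```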
## Configuration

The default configuration is in `conf/config.yaml`. You can override any parameter on the command line or create configuration groups for different scenarios.
### Using Different Configuration Files

You can use different configuration files with Hydra in several ways:

#### Recommended Approach

1. Create multiple configuration files in the `conf` directory for different use cases:

```
conf/
├── config.yaml      # Default configuration
└── model_A.yaml     # Your own config with stage-specific latencies for performance projection (sketched below)
```

2. Run with your desired configuration using the `--config-name` flag:

```bash
uv run python main.py --config-name=model_A
```
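As a hedged illustration, a custom config such as `model_A.yaml` might look like the sketch below. It only reuses parameter names that appear in the command-line examples in this README (`strategy`, `num_devices`, `num_stages`, `num_batches`, `op_times.forward`, `op_times.backward`); the authoritative schema is `conf/config.yaml`.

```yaml
# conf/model_A.yaml -- illustrative sketch; mirror the actual schema in conf/config.yaml
strategy: 1f1b
num_devices: 4
num_stages: 4
num_batches: 8
op_times:
  forward: 0.5    # forward op latency override
  backward: 1.0   # backward op latency override
# Stage-specific latencies for performance projection would follow whatever
# per-stage structure conf/config.yaml defines.
```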
#### Override Specific Parameters

You can also override specific parameters at runtime:

```bash
uv run python main.py op_times.forward=0.5 op_times.backward=1.0 num_batches=6
```
## Project Structure

```
PP-Emulation/
├── conf/                  # Hydra configuration files
│   └── config.yaml        # Default configuration
├── src/                   # Source code
│   ├── __init__.py        # Package initialization
│   ├── execution_model.py # Schedule execution models
│   ├── strategies.py      # Pipeline parallelism strategies
│   └── visualizer.py      # Visualization utilities
├── main.py                # Main entry point
├── pyproject.toml         # Project metadata and dependencies
└── README.md              # This file
```
## References

1. _PipeDream: Fast and Efficient Pipeline Parallel DNN Training_. [arXiv](https://arxiv.org/abs/1806.03377)
2. _Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM_. [arXiv](https://arxiv.org/abs/2104.04473)
3. _Zero Bubble Pipeline Parallelism_. [arXiv](https://arxiv.org/abs/2401.10241)
4. _MoE A2A Communication-Computation Overlap Based on 1F1B_ (in Chinese). [blog](https://zhuanlan.zhihu.com/p/28463368206)
## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.