Commit 8c53520
Parent(s): 4cb792b
Update README.md
README.md CHANGED

@@ -217,25 +217,25 @@ The performance may vary depending on the prompt. For BLOOMZ models, we recommen
 
 ## Model
 
-- Architecture
-- Finetuning steps
-- Finetuning tokens
-- Finetuning layout
-- Precision
+- **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom), also refer to the `config.json` file
+- **Finetuning steps:** 498
+- **Finetuning tokens:** 2.09 billion
+- **Finetuning layout:** 72x pipeline parallel, 1x tensor parallel, 4x data parallel
+- **Precision:** bfloat16
 
 ## Hardware
 
-- 
-- 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
-- NCCL-communications network
-
+- **CPUs:** AMD CPUs with 512GB memory per node
+- **GPUs:** 288 A100 80GB GPUs (36 nodes) with 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
+- **Communication:** NCCL-communications network with a fully dedicated subnet
+
 
 ## Software
 
-- [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
-- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
-- [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
-- [apex](https://github.com/NVIDIA/apex)
+- **Orchestration:** [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
+- **Optimizer & parallelism:** [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
+- **FP16 if applicable:** [apex](https://github.com/NVIDIA/apex)
 
 # Evaluation
 
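As a quick sanity check, the finetuning layout and hardware bullets added in this commit are mutually consistent: 72 pipeline-parallel stages × 1 tensor-parallel shard × 4 data-parallel replicas accounts for 288 GPUs, which at 8 GPUs per node is exactly the 36 nodes listed. The snippet below only re-derives those numbers from the diff; it is not code from the model repository.

```python
# Consistency check of the finetuning layout added in this commit.
# Every constant below is copied from the updated README bullets.

pipeline_parallel = 72  # "72x pipeline parallel"
tensor_parallel = 1     # "1x tensor parallel"
data_parallel = 4       # "4x data parallel"
gpus_per_node = 8       # "8 GPUs per node"

total_gpus = pipeline_parallel * tensor_parallel * data_parallel
nodes = total_gpus // gpus_per_node

assert total_gpus == 288  # "288 A100 80GB GPUs"
assert nodes == 36        # "(36 nodes)"
print(f"{total_gpus} GPUs across {nodes} nodes")
```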


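The **Architecture** bullet defers to the repository's `config.json` rather than restating the hyperparameters. For readers who want to inspect it, a minimal sketch using `transformers.AutoConfig` follows; the repo id `bigscience/bloomz` is an assumption, since this commit does not name the exact repository, and the attribute names are those of the standard Bloom config.

```python
# Minimal sketch: inspect the architecture via config.json.
# Assumption: the model lives at "bigscience/bloomz"; the diff only says the
# architecture matches bloom and points readers to config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloomz")
print(config.model_type)                                   # "bloom"
print(config.n_layer, config.n_head, config.hidden_size)   # Bloom-style hyperparameters
```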