Commit 8c53520
Parent(s): 4cb792b
Update README.md
README.md CHANGED

@@ -217,25 +217,25 @@ The performance may vary depending on the prompt. For BLOOMZ models, we recommen
 
 ## Model
 
-- Architecture
-- Finetuning steps
-- Finetuning tokens
-- Finetuning layout
-- Precision
+- **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom), also refer to the `config.json` file
+- **Finetuning steps:** 498
+- **Finetuning tokens:** 2.09 billion
+- **Finetuning layout:** 72x pipeline parallel, 1x tensor parallel, 4x data parallel
+- **Precision:** bfloat16
 
 ## Hardware
 
-- 
-- 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
-- NCCL-communications network
-
+- **CPUs:** AMD CPUs with 512GB memory per node
+- **GPUs:** 288 A100 80GB GPUs (36 nodes) with 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
+- **Communication:** NCCL-communications network with a fully dedicated subnet
+
 
 ## Software
 
-- [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
-- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
-- [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
-- [apex](https://github.com/NVIDIA/apex)
+- **Orchestration:** [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
+- **Optimizer & parallelism:** [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
+- **FP16 if applicable:** [apex](https://github.com/NVIDIA/apex)
 
 # Evaluation
 
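As a quick sanity check, the finetuning layout and hardware bullets added in this commit are mutually consistent: 72 pipeline-parallel stages × 1 tensor-parallel shard × 4 data-parallel replicas accounts for 288 GPUs, which at 8 GPUs per node is exactly the 36 nodes listed. The snippet below only re-derives those numbers from the diff; it is not code from the model repository.

```python
# Consistency check of the finetuning layout added in this commit.
# Every constant below is copied from the updated README bullets.

pipeline_parallel = 72  # "72x pipeline parallel"
tensor_parallel = 1     # "1x tensor parallel"
data_parallel = 4       # "4x data parallel"
gpus_per_node = 8       # "8 GPUs per node"

total_gpus = pipeline_parallel * tensor_parallel * data_parallel
nodes = total_gpus // gpus_per_node

assert total_gpus == 288  # "288 A100 80GB GPUs"
assert nodes == 36        # "(36 nodes)"
print(f"{total_gpus} GPUs across {nodes} nodes")
```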


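The **Architecture** bullet defers to the repository's `config.json` rather than restating the hyperparameters. For readers who want to inspect it, a minimal sketch using `transformers.AutoConfig` follows; the repo id `bigscience/bloomz` is an assumption, since this commit does not name the exact repository, and the attribute names are those of the standard Bloom config.

```python
# Minimal sketch: inspect the architecture via config.json.
# Assumption: the model lives at "bigscience/bloomz"; the diff only says the
# architecture matches bloom and points readers to config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/bloomz")
print(config.model_type)                                   # "bloom"
print(config.n_layer, config.n_head, config.hidden_size)   # Bloom-style hyperparameters
```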