Batch Size

#4
by winglian - opened

The model card lists: Batch Size (effective): 32 (8B), 128 (70B), 256 (405B). Are the 8B and 70B values reversed?

The effective batch size for the 70B config seems like it should be 16 (see the arithmetic sketch below); https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_70b.yaml lists:

per_device_train_batch_size: 1
gradient_accumulation_steps: 2 # designed for 8 GPUs, so batch size 128

whereas the 8B config works out to 128; https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_8b.yaml lists:

per_device_train_batch_size: 1
gradient_accumulation_steps: 16 # designed for 8 GPUs, so batch size 128
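
For reference, here is the effective-batch-size arithmetic as a minimal Python sketch (effective_batch_size is a hypothetical helper, not something in open-instruct), assuming a single 8-GPU node as the yaml comments suggest:

# Hypothetical helper: effective batch size = per-device batch size
# x gradient accumulation steps x GPUs per node x number of nodes.
def effective_batch_size(per_device, grad_accum, gpus_per_node, num_nodes=1):
    return per_device * grad_accum * gpus_per_node * num_nodes

print(effective_batch_size(1, 2, 8))   # 70B yaml, one node -> 16
print(effective_batch_size(1, 16, 8))  # 8B yaml, one node  -> 128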
Ai2 org

Ah, good catch! Looking at the original runs for the 8B and 70B models, they should both be 128. The 70B yaml should say "designed for 8 nodes" (so an effective batch size of 8 * 8 * 2 = 128).
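
Plugging the 8-node setup into the same hypothetical helper from above reproduces the number in the model card:

print(effective_batch_size(1, 2, 8, num_nodes=8))  # 70B yaml, 8 nodes -> 128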

hamishivi changed discussion status to closed
