Batch Size

#4
by winglian - opened

The model card lists: Batch Size (effective): 32 (8B), 128 (70B), 256 (405B). Are the 8B and 70B values reversed?

The effective batch size for the 70B config seems like it should be 16 (see the arithmetic sketch below); https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_70b.yaml lists:

per_device_train_batch_size: 1
gradient_accumulation_steps: 2 # designed for 8 GPUs, so batch size 128

whereas the 8B config works out to 128; https://github.com/allenai/open-instruct/blob/main/configs/train_configs/tulu3/tulu3_dpo_8b.yaml lists:

per_device_train_batch_size: 1
gradient_accumulation_steps: 16 # designed for 8 GPUs, so batch size 128
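
For reference, here is the effective-batch-size arithmetic as a minimal Python sketch (effective_batch_size is a hypothetical helper, not something in open-instruct), assuming a single 8-GPU node as the yaml comments suggest:

# Hypothetical helper: effective batch size = per-device batch size
# x gradient accumulation steps x GPUs per node x number of nodes.
def effective_batch_size(per_device, grad_accum, gpus_per_node, num_nodes=1):
    return per_device * grad_accum * gpus_per_node * num_nodes

print(effective_batch_size(1, 2, 8))   # 70B yaml, one node -> 16
print(effective_batch_size(1, 16, 8))  # 8B yaml, one node  -> 128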
Ai2 org

Ah, good catch! Looking at the original runs for the 8B and 70B models, they should both be 128. The 70B yaml should say "designed for 8 nodes" (so an effective batch size of 8 * 8 * 2 = 128).
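
Plugging the 8-node setup into the same hypothetical helper from above reproduces the number in the model card:

print(effective_batch_size(1, 2, 8, num_nodes=8))  # 70B yaml, 8 nodes -> 128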

hamishivi changed discussion status to closed
