Not enough parameters in 4b config.json
#14
opened by krammnic
Hello there!
We are adding Gemma3 to torchtune: https://github.com/pytorch/torchtune/pull/2485
Unfortunately, it seems to me that there are not enough parameters in the config.json of the 4B model.
For instance, let's compare the 4B "text_config" with the 12B "text_config":
4B:
```json
"text_config": {
  "hidden_size": 2560,
  "intermediate_size": 10240,
  "model_type": "gemma3_text",
  "num_hidden_layers": 34,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_window": 1024
},
```
12B:
```json
"text_config": {
  "hidden_size": 3840,
  "intermediate_size": 15360,
  "model_type": "gemma3_text",
  "num_attention_heads": 16,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_window": 1024
},
```
If this is the desired config for some reason, please let me know. Unfortunately, it makes the integration less clean, since this information is required at the conversion stage (see the sketch below).
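For concreteness, this is roughly the fallback we would otherwise have to hard-code on our side. It is a minimal sketch, not the actual torchtune converter, and the default values are assumptions taken from the Gemma 3 report rather than from this config.json, so please correct me if they are wrong:

```python
import json

# Hypothetical fallback on the torchtune side, NOT an official converter.
# The hard-coded values are assumptions based on the Gemma 3 report for
# the 4B text model; they do not appear anywhere in this config.json.
GEMMA3_4B_TEXT_DEFAULTS = {
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 256,
}

def resolve_text_config(path: str) -> dict:
    """Load config.json and fill in keys that the 4B text_config omits."""
    with open(path) as f:
        text_config = json.load(f)["text_config"]
    # Anything explicitly present in config.json takes precedence.
    return {**GEMMA3_4B_TEXT_DEFAULTS, **text_config}
```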
Thanks!
Same question about this.
Hi,
Apologies for the late reply. These Gemma models come in different variants and parameter sizes (1B, 4B, 12B, etc.). Based on each model's requirements and architectural design, the default parameters are configured in the most suitable manner.
Thanks.
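For reference, my understanding is that keys omitted from config.json simply fall back to the Gemma3TextConfig class defaults when the checkpoint is loaded through transformers, so the effective values can be inspected directly. A minimal sketch, assuming a transformers release with Gemma 3 support and access to the gated 4B repo:

```python
# Minimal sketch: print the values that transformers actually resolves.
# Assumes a transformers version with Gemma 3 support and access to the
# (gated) google/gemma-3-4b-it repo; substitute the relevant 4B checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-4b-it")
text_config = config.text_config  # keys missing from config.json fall back to class defaults

for name in ("num_attention_heads", "num_key_value_heads", "head_dim"):
    print(name, getattr(text_config, name))
```

If a conversion pipeline has to work from the raw JSON alone, mirroring those class defaults (as in the earlier sketch) appears to be the only option.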