Not enough parameters in 4b config.json

#14
by krammnic - opened

Hello there!

We are adding Gemma 3 to torchtune: https://github.com/pytorch/torchtune/pull/2485

Unfortunately, it seems to me that the config.json for the 4B model does not contain enough parameters.

For instance, let's compare the 4B "text_config" with the 12B "text_config":

4B:

```json
"text_config": {
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "model_type": "gemma3_text",
    "num_hidden_layers": 34,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "sliding_window": 1024
  },
```

12B:

```json
"text_config": {
    "hidden_size": 3840,
    "intermediate_size": 15360,
    "model_type": "gemma3_text",
    "num_attention_heads": 16,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "sliding_window": 1024
  },
```

If this configuration is intended for some reason, please let me know. Unfortunately, it makes the integration less clean, since this information is required at the conversion stage.
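In case it helps, here is a minimal sketch of one possible workaround at the conversion stage: fill the keys that are absent from the 4B config with fallback values before building the model. The helper and the assumed values (num_attention_heads=8, num_key_value_heads=4, head_dim=256) are illustrative guesses at the intended 4B settings, not values taken from the published config.json.

```python
import json

# Assumed 4B text-model defaults; these are NOT present in the published
# config.json and are only my reading of the intended 4B architecture.
GEMMA3_4B_TEXT_FALLBACKS = {
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 256,
}

def load_text_config(path: str) -> dict:
    """Read config.json and fill in keys missing from the 4B text_config."""
    with open(path) as f:
        text_cfg = json.load(f)["text_config"]
    for key, value in GEMMA3_4B_TEXT_FALLBACKS.items():
        text_cfg.setdefault(key, value)
    return text_cfg
```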

Thanks!

Same question about this.

Google org

Hi,

Apologies for the late reply. These Gemma models come in different variants and parameter sizes (1B, 4B, 12B, etc.). Based on the model requirements and architectural design, the default parameters are configured in the most suitable manner for each variant, so keys that match the defaults may be omitted from config.json.
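For example, any key omitted from config.json falls back to the Gemma3TextConfig class default when the file is loaded through transformers, so the values can still be read from the loaded config (the repo id below is only an example):

```python
from transformers import AutoConfig

# Keys absent from config.json are filled from the Gemma3TextConfig class
# defaults when the config is loaded, so the "missing" parameters still
# exist on the loaded object.
cfg = AutoConfig.from_pretrained("google/gemma-3-4b-it")
text_cfg = cfg.text_config
print(text_cfg.num_attention_heads, text_cfg.num_key_value_heads, text_cfg.head_dim)
```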

Thanks.
