Not enough parameters in 4b config.json
#14
opened by krammnic
Hello there!
We are adding Gemma3 to torchtune: https://github.com/pytorch/torchtune/pull/2485
Unfortunately, it seems to me that there are not enough parameters in the config.json of the 4B model.
For instance, let's compare the 4B "text_config" with the 12B "text_config":
4B:
```json
"text_config": {
  "hidden_size": 2560,
  "intermediate_size": 10240,
  "model_type": "gemma3_text",
  "num_hidden_layers": 34,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_window": 1024
},
```
12B:
```json
"text_config": {
  "hidden_size": 3840,
  "intermediate_size": 15360,
  "model_type": "gemma3_text",
  "num_attention_heads": 16,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_window": 1024
},
```
If this is the desired config for some reason, please let me know. Unfortunately, it makes the integration less clean, since this information is required at the conversion stage (see the sketch below).
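For concreteness, this is roughly the fallback we would otherwise have to hard-code on our side. It is a minimal sketch, not the actual torchtune converter, and the default values are assumptions taken from the Gemma 3 report rather than from this config.json, so please correct me if they are wrong:

```python
import json

# Hypothetical fallback on the torchtune side, NOT an official converter.
# The hard-coded values are assumptions based on the Gemma 3 report for
# the 4B text model; they do not appear anywhere in this config.json.
GEMMA3_4B_TEXT_DEFAULTS = {
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 256,
}

def resolve_text_config(path: str) -> dict:
    """Load config.json and fill in keys that the 4B text_config omits."""
    with open(path) as f:
        text_config = json.load(f)["text_config"]
    # Anything explicitly present in config.json takes precedence.
    return {**GEMMA3_4B_TEXT_DEFAULTS, **text_config}
```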
Thanks!
Same question about this.
Hi,
Apologies for the late reply. These Gemma models come in different variants and parameter sizes (1B, 4B, 12B, etc.). Based on each model's requirements and architectural design, the default parameters are configured in the most suitable manner.
Thanks.
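For reference, my understanding is that keys omitted from config.json simply fall back to the Gemma3TextConfig class defaults when the checkpoint is loaded through transformers, so the effective values can be inspected directly. A minimal sketch, assuming a transformers release with Gemma 3 support and access to the gated 4B repo:

```python
# Minimal sketch: print the values that transformers actually resolves.
# Assumes a transformers version with Gemma 3 support and access to the
# (gated) google/gemma-3-4b-it repo; substitute the relevant 4B checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-4b-it")
text_config = config.text_config  # keys missing from config.json fall back to class defaults

for name in ("num_attention_heads", "num_key_value_heads", "head_dim"):
    print(name, getattr(text_config, name))
```

If a conversion pipeline has to work from the raw JSON alone, mirroring those class defaults (as in the earlier sketch) appears to be the only option.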