Does not work with bitsandbytes 4-bit and 8-bit

#27
by zokica - opened

While the 1B model works fine in 4-bit, the 4B model does not. Why is that?

Full-precision answer:
#####################################
outputs ["user\nYou are a helpful assistant.\n\nWrite a poem on Hugging Face, the company\nmodel\nOkay, here's a poem about Hugging Face, aiming to capture its spirit and impact:\n\nThe Open Embrace\n\nIn realms of code, a vibrant hue,\nHugging Face emerges, fresh and new.\nNot just a name, a welcoming plea,\nFor AI’s future, wild"]
###################

8-bit answer:
#############################
outputs ['user\nYou are a helpful assistant.\n\nWrite a poem on Hugging Face, the company\nmodel\nThisrvice gaក៏ forতি지만 Senhor noname συ᱕ Brain freeze not기는 объявitic fregataamataء rou अॅ⿻ffassoääntsmannetworks 둘이ई क्रिप्टोकर策划 deメリット\u200cی� भाgal gydant recovering the গঙ্গన్న आएगी भी Olméis सबके/ヶ月𝔦 मलया ഒ საერთ出しिष्ट সকলে齊,']
##################

from transformers import AutoTokenizer, BitsAndBytesConfig, Gemma3ForConditionalGeneration
import torch

model_id = "google/gemma-3-4b-it"

# Load the model in 8-bit via bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quantization_config
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"}],
        },
    ],
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # .to(torch.bfloat16)

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=64)

outputs = tokenizer.batch_decode(outputs)
print("outputs", outputs)

I am playing with the 4B model today and found the same thing. At 4-bit it gives the same garbage you got, but full precision is no problem. Have you figured anything out?

8-bit started working for me after I updated bitsandbytes.
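If anyone else hits this, it's worth checking your installed versions before re-running the 8-bit repro. A minimal check (this thread doesn't state the exact minimum bitsandbytes version required, so treat any version threshold as an assumption):

# Verify installed versions; upgrade with: pip install -U bitsandbytes
import bitsandbytes
import transformers

print("bitsandbytes:", bitsandbytes.__version__)
print("transformers:", transformers.__version__)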

Hi @zokica,

Apologies for the late reply, and thanks for reaching out. I was able to run 4-bit quantization with bitsandbytes for the google/gemma-3-4b-it model. I made a few changes to the quantization parameters, and it now works without issues and produces output for the given prompt. Please see the linked gist file for reference, and let us know if you need any further assistance.
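In outline, the working 4-bit setup looks like the sketch below. The exact parameters from the gist are not reproduced in this thread, so the nf4 quant type and bfloat16 compute dtype here are assumptions rather than the confirmed fix:

from transformers import BitsAndBytesConfig, Gemma3ForConditionalGeneration
import torch

# Assumed 4-bit parameters (nf4 + bfloat16 compute); the gist may differ
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-4b-it", quantization_config=quantization_config
).eval()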

Thanks.
