Great model, but an issue with multiple images of different sizes in one chat

#32
by digitalesingulary - opened

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

When I use this with multiple user and assistant messages, I get the following error:

RuntimeError: Sizes of tensors must match except in dimension 0.
Expected size 756 but got size 1288 for tensor number 1 in the list.

Additional error:

ValueError: Image features and image tokens do not match:
expected 5184 tokens but got 11648 tokens
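
For reference, the kind of multi-turn conversation that triggers these errors can be sketched as below. This is only an illustration of the message structure: `build_conversation` is a hypothetical helper, the URLs and the assistant reply are placeholders, and passing the assistant content as a plain string is an assumption about what the chat template accepts.

```python
def build_conversation(system_prompt, turns):
    """Build a chat history where every user turn carries one image.

    `turns` is a list of (question, image_url, assistant_reply) tuples;
    assistant_reply may be None for the final, unanswered user turn.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for question, image_url, reply in turns:
        messages.append(
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        )
        if reply is not None:
            # Assumption: assistant turns are plain text.
            messages.append({"role": "assistant", "content": reply})
    return messages


if __name__ == "__main__":
    # Placeholder URLs; in my case the two images have different sizes.
    conversation = build_conversation(
        "You are a helpful assistant.",
        [
            ("What should I do here?", "https://example.com/first.png", "Try option A."),
            ("And in this situation?", "https://example.com/second.png", None),
        ],
    )
    for message in conversation:
        print(message["role"])
```

With two user turns like this, the history contains two images, which is the case where the size mismatch shows up.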

How can I solve this issue correctly so the model produces the best possible output?

I am using Hugging Face transformers, installed following the instructions at: https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506#transformers

I am also using mistral-common, but for some reason the preprocessing does not work.

Using one image works fine, and two images also work as long as they have the same size. However, I would like to use multiple images of different sizes without running into errors. How can this be done with the transformers lib?
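
One workaround I could try, since same-size images work, is to bring every image to a common size before handing them to the processor. The sketch below uses Pillow's `ImageOps.pad`; `pad_to_common_size` is a hypothetical helper of mine, not part of transformers or mistral-common, and I am only assuming this avoids the mismatch. It also rescales and pads the images, so it may not give the best output quality.

```python
from PIL import Image, ImageOps


def pad_to_common_size(images, fill=(0, 0, 0)):
    """Return copies of `images` that all share one (width, height).

    The target is the largest width and largest height in the list.
    ImageOps.pad scales each image to fit inside the target while keeping
    its aspect ratio, then fills the remaining area with `fill`.
    """
    target = (
        max(im.width for im in images),
        max(im.height for im in images),
    )
    return [ImageOps.pad(im.convert("RGB"), target, color=fill) for im in images]


if __name__ == "__main__":
    # Dummy images standing in for two screenshots of different sizes.
    originals = [Image.new("RGB", (1288, 756)), Image.new("RGB", (640, 480))]
    for im in pad_to_common_size(originals):
        print(im.size)
```

Ideally I would not need this at all, which is why I am asking whether the library supports mixed sizes directly.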

Am I missing something?
