Great model, but multiple images of different sizes in one chat cause errors
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]
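For context, the multi-turn version of this looks roughly like the sketch below. The helper names `user_turn` and `build_messages`, the placeholder system prompt, the example.com URLs, and the plain-string assistant content are assumptions for illustration, not my exact code.

```python
from typing import Any, Optional

SYSTEM_PROMPT = "You are a helpful assistant."  # placeholder, not the real prompt


def user_turn(text: str, image_url: str) -> dict[str, Any]:
    """One user message holding a text part and a single image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def build_messages(
    turns: list[tuple[str, str, Optional[str]]],
) -> list[dict[str, Any]]:
    """Build a chat from (question, image_url, assistant_reply or None) tuples."""
    messages: list[dict[str, Any]] = [{"role": "system", "content": SYSTEM_PROMPT}]
    for question, image_url, reply in turns:
        messages.append(user_turn(question, image_url))
        if reply is not None:
            # Earlier assistant answers are replayed as plain text (assumed format).
            messages.append({"role": "assistant", "content": reply})
    return messages


# Two user turns, each with its own image; the images differ in size.
messages = build_messages([
    ("What action should I take here?", "https://example.com/a.png", "First answer."),
    ("And in this situation?", "https://example.com/b.png", None),
])
```

Each user turn carries one image, so a conversation with several turns ends up with several images in a single request.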
When I use this with multiple user and assistant messages, I get the following error:
RuntimeError: Sizes of tensors must match except in dimension 0.
Expected size 756 but got size 1288 for tensor number 1 in the list.
Additional error:
ValueError: Image features and image tokens do not match:
expected 5184 tokens but got 11648 tokens
How can I solve this issue correctly so the model produces the best possible output?
I am using the Hugging Face transformers library, installed following: https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506#transformers
I am also using mistral-common, but for some reason the preprocessing does not work.
Using one image works fine, and two images also work as long as they have the same size. However, I would like to use multiple images of different sizes without running into errors. How can this be done with the transformers lib?
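Since same-size images work, one stopgap would be to resize every image to a common size before passing them to the processor. Below is a minimal sketch of that idea, not a proper fix: `common_size` and `resize_to_common` are hypothetical helper names, and the functions only assume PIL-style objects that expose `.size` and `.resize(size)`. It changes the aspect ratio and discards detail, which is why I would prefer a real solution.

```python
def common_size(sizes):
    """Return the smallest width and smallest height across (width, height) pairs."""
    widths, heights = zip(*sizes)
    return min(widths), min(heights)


def resize_to_common(images):
    """Resize PIL-style images so that all of them share one (width, height).

    Only downscales: the target is the smallest width and height found,
    so aspect ratios are NOT preserved.
    """
    target = common_size([img.size for img in images])
    # Leave images that already match untouched; resize the rest.
    return [img if img.size == target else img.resize(target) for img in images]
```

With Pillow this would be used as `images = resize_to_common([Image.open(p).convert("RGB") for p in paths])`, where `paths` is a hypothetical list of local image files.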
Am I missing something?