Visual Document Retrieval
PEFT
Safetensors

All OmniEmbed examples appear to be broken

#2
by nurban96 - opened

Using transformers version 4.56.2 and torch version 2.8.0 the examples for OmniEmbed from the model card seem to be broken. I am getting ´ValueError: Videos features and imagve tokens do not match: tokens: 0, features 16744´ for video retrieval and a similar error for the image document retrieval example.

Which version of transformers was originally used to demonstrate functionality? This may be related to known issues concerning the Qwen base-model, see https://github.com/QwenLM/Qwen3-VL/issues/556, but this is already known for a year so I am not sure.

Nevermind, found the issue. texts = processor.apply_chat_template(message, tokenize=False, add_generation_prompt=True) now returns a string (maybe it returned a list in earlier versions?), so slicing it with [0] always makes texts equal to <<|endoftext|>" as only the first '<' from the original string is appended to the EOS string.

Sign up or log in to comment