Typo in IFM-TTE-7B results for ViDoRe-V2 under Visual Doc

#73

by Hrant - opened 9 days ago


The leaderboard reports 71.5 for IFM-TTE-7B on ViDoRe-V2 under Visual Doc, while the paper (Table 2, page 7) reports 63.6 under VDRv2. The 63.6 figure also matches the average of the individual scores reported in Table 13 in the Appendix (page 19).

ziyjiang

TIGER-Lab org 8 days ago

@Hrant
Thanks for pointing it out! I’ve pinged their author who added this model here: https://huggingface.co/spaces/TIGER-Lab/MMEB-Leaderboard/discussions/69

haoyubu

7 days ago

Hi @Hrant, we'd like to clarify that the scores reported on the leaderboard are for IFM-TTE-7B, which is different from the TTE model in the paper. We incorporated additional techniques as described here to further boost the performance over the TTE model.

Hrant

7 days ago

Hi @haoyubu, thanks for the clarification. Makes sense.

Do you plan to update the paper to include the new model as well? If not, would you mind summarizing which techniques drove the main boost over vanilla TTE? The difference is huge, especially on VDRv2 and Video.

kekekeke

1 day ago

hi @haoyubu @ziyjiang
IFM-TTE-7B demonstrates outstanding overall performance on the visdoc task. I noticed that its scores on several datasets are significantly better than other models:

ViDoRe_esg_reports_human_labeled_v2: +21%
ViDoRe_esg_reports_v2_multilingual: +22%
VisRAG_PlotQA: +19%
ViDoSeek-page: +27%
MMLongBench-page: +12%

I found that these datasets all contain many additional corpus-ids that do not appear in the qrels. According to the official evaluation script, these additional corpus-ids should also be added to the candidate set as negative samples. Could you confirm whether IFM-TTE-7B used only the corpus-ids that appear in the qrels as its candidate set, leaving out the additional corpus-ids from the corpus?
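To make the concern concrete, here is a toy sketch of how the choice of candidate set can change a retrieval score. It is not the official evaluation script and says nothing about how IFM-TTE-7B was actually evaluated; the function `hit_rate_at_k`, the document ids, and the similarity scores are all made up for illustration.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    scores: Dict[str, Dict[str, float]],
    qrels: Dict[str, Set[str]],
    candidates: List[str],
    k: int = 1,
) -> float:
    """Fraction of queries with at least one relevant doc in the top-k candidates."""
    hits = 0
    for qid, relevant in qrels.items():
        # Rank only the documents in the chosen candidate set.
        ranked = sorted(candidates, key=lambda doc: scores[qid][doc], reverse=True)
        if any(doc in relevant for doc in ranked[:k]):
            hits += 1
    return hits / len(qrels)


# Hypothetical corpus: d3 and d4 never appear in the qrels, so they can only
# act as negatives (distractors).
corpus = ["d1", "d2", "d3", "d4"]
qrels = {"q1": {"d1"}, "q2": {"d2"}}

# Hypothetical query-document similarity scores.
scores = {
    "q1": {"d1": 0.90, "d2": 0.20, "d3": 0.95, "d4": 0.10},
    "q2": {"d1": 0.30, "d2": 0.80, "d3": 0.40, "d4": 0.50},
}

# Candidate set restricted to documents that appear in the qrels.
qrels_only = sorted({doc for rel in qrels.values() for doc in rel})

full_score = hit_rate_at_k(scores, qrels, corpus)
restricted_score = hit_rate_at_k(scores, qrels, qrels_only)

print(f"full corpus: {full_score:.2f}, qrels-only: {restricted_score:.2f}")
```

In this toy setup the distractor d3 outranks the relevant document for q1 when the full corpus is used, but it disappears from the ranking when only qrels documents are candidates, so the restricted score comes out higher.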

ziyjiang

TIGER-Lab org about 23 hours ago

Hi @kekekeke, thanks for raising this! From the VLM2Vec/MMEB side, I can confirm that these additional corpus_ids are included in the candidate set during evaluation (https://github.com/TIGER-AI-Lab/VLM2Vec/blob/main/src/data/eval_dataset/vidore_dataset.py#L59). I think IFM-TTE-7B follows the same approach, but I’ll leave the final confirmation to the authors of that paper.
