Typo in IFM-TTE-7B results for ViDoRe-V2 under Visual Doc

#73
by Hrant - opened

The leaderboard lists the IFM-TTE-7B result for ViDoRe-V2 under Visual Doc as 71.5, while the paper (Table 2, page 7) reports 63.6 under VDRv2. The 63.6 figure also matches the average of the individual scores reported in Table 13 in the Appendix, page 19.


TIGER-Lab org

@Hrant
Thanks for pointing it out! I’ve pinged their author who added this model here: https://huggingface.co/spaces/TIGER-Lab/MMEB-Leaderboard/discussions/69

Hi @Hrant, we'd like to clarify that the scores reported on the leaderboard are for IFM-TTE-7B, which is different from the TTE model in the paper. We incorporated additional techniques, as described here, to further boost performance over the TTE model.

Hi @haoyubu, thanks for the clarification. Makes sense.

Do you plan to update the paper to include the new model as well? If not, would you mind summarizing which techniques drove the main boost over vanilla TTE? The difference is huge, especially on VDRv2 and Video.

hi @haoyubu @ziyjiang
IFM-TTE-7B demonstrates outstanding overall performance on the visdoc task. I noticed that its scores on several datasets are significantly higher than those of other models:

  • ViDoRe_esg_reports_human_labeled_v2: +21%
  • ViDoRe_esg_reports_v2_multilingual: +22%
  • VisRAG_PlotQA: +19%
  • ViDoSeek-page: +27%
  • MMLongBench-page: +12%

I found that these datasets all contain many additional corpus-ids that do not appear in the qrels. According to the official evaluation script, however, these additional corpus-ids should also be added to the candidate set as negative samples. Could you confirm whether IFM-TTE-7B used only the corpus-ids from the qrels as the candidate set, leaving out the additional corpus-ids from the corpus?
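To illustrate why the choice of candidate set matters, here is a minimal sketch on toy data. Everything in it is hypothetical: the ids, the similarity scores, and the helper names `ndcg_at_k` and `rank_candidates` are made up for illustration, and binary-relevance NDCG@5 is only assumed as the metric. It is not the official evaluation script.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=5):
    """Binary-relevance NDCG@k for a single query (illustrative helper)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rank_candidates(scores, candidate_ids):
    """Sort candidate ids by descending similarity score (illustrative helper)."""
    return sorted(candidate_ids, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical query-to-page similarity scores for one query.
scores = {"p1": 0.60, "p2": 0.40, "p3": 0.90, "p4": 0.80}

# Toy qrels: only p1 and p2 are listed; p3 and p4 exist in the corpus
# but never appear in the qrels.
qrels = {"q1": {"p1": 1, "p2": 0}}
corpus_ids = ["p1", "p2", "p3", "p4"]

relevant = {doc_id for doc_id, rel in qrels["q1"].items() if rel > 0}

# Candidate set restricted to ids that appear in the qrels.
qrels_only_ranking = rank_candidates(scores, list(qrels["q1"]))
# Candidate set covering the full corpus, so unlisted ids act as negatives.
full_ranking = rank_candidates(scores, corpus_ids)

ndcg_qrels_only = ndcg_at_k(qrels_only_ranking, relevant)
ndcg_full = ndcg_at_k(full_ranking, relevant)

print(f"qrels-only candidates: NDCG@5 = {ndcg_qrels_only:.2f}")
print(f"full-corpus candidates: NDCG@5 = {ndcg_full:.2f}")
```

In this toy case the qrels-only candidate set gives a higher score than the full-corpus one, because the unlisted pages that outrank the relevant page are never considered. This only shows the mechanism; it says nothing about how IFM-TTE-7B was actually evaluated.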

Hi @kekekeke, thanks for raising this! From the VLM2Vec/MMEB side, I can confirm that these additional corpus_ids are included in the candidate set during evaluation (https://github.com/TIGER-AI-Lab/VLM2Vec/blob/main/src/data/eval_dataset/vidore_dataset.py#L59). I think IFM-TTE-7B follows the same approach, but I’ll leave the final confirmation to the authors of that paper.
