VLM
We follow InternVL2 to evaluate performance on MME, MMBench, MMMU, MMVet, MathVista, and MMVP.
Data preparation
Please follow InternVL2 to prepare the corresponding data, then link it under vlm (a linking sketch follows the directory tree below).
The final directory structure is:
```
data
├── MathVista
├── mmbench
├── mme
├── MMMU
├── mm-vet
└── MMVP
```
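A minimal linking sketch, assuming the InternVL2-prepared data lives at /path/to/internvl2_data and that the tree above should appear as eval/vlm/data (the source path is a placeholder, and the exact target location under vlm is an assumption):

```bash
# Sketch: symlink the prepared data into the repo.
# /path/to/internvl2_data is a placeholder; eval/vlm/data is an assumed target.
ln -s /path/to/internvl2_data eval/vlm/data
```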
Evaluation
Directly run scripts/eval/run_eval_vlm.sh to evaluate the different benchmarks. The output will be saved in $output_path. An example invocation is sketched after this list.
- Set $model_path and $output_path to the checkpoint and log paths.
- Increase GPUS if you want to run faster.
- For MMBench, please use the official evaluation server.
- For MMVet, please use the official evaluation server.
- For MathVista, please set $openai_api_key in scripts/eval/run_eval_vlm.sh and your_api_url in eval/vlm/eval/mathvista/utilities.py. The default GPT version is gpt-4o-2024-11-20.
- For MMMU, we use CoT in the report, which improves accuracy by about 2%. For evaluation of open-ended answers, we use GPT-4o as the judge.
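A hypothetical invocation of the script above. The variable names (model_path, output_path, GPUS) come from the notes in this list; the paths are placeholders, and whether the script reads them from the environment or requires in-file edits is an assumption:

```bash
# Sketch only: adjust placeholders to your checkpoint and log locations.
export model_path=/path/to/checkpoint   # checkpoint to evaluate (placeholder)
export output_path=./results/vlm        # where logs and scores are written (placeholder)
export GPUS=8                           # more GPUs speeds up evaluation
bash scripts/eval/run_eval_vlm.sh
```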
GenEval
We modify the code in GenEval for faster evaluation.
Setup
Install the following dependencies:
```bash
pip install open-clip-torch
pip install clip-benchmark
pip install --upgrade setuptools
sudo pip install -U openmim
sudo mim install mmengine mmcv-full==1.7.2
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection; git checkout 2.x
pip install -v -e .
```
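As a quick sanity check (a sketch, not part of the official setup), the pinned packages should import cleanly after the steps above:

```bash
# Expect mmcv 1.7.2 and a 2.x mmdet release to be printed.
python -c "import mmcv, mmdet; print(mmcv.__version__, mmdet.__version__)"
```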
Download Detector:
```bash
cd ./eval/gen/geneval
mkdir model
bash ./evaluation/download_models.sh ./model
```
Evaluation
Directly run scripts/eval/run_geneval.sh to evaluate GenEval. The output will be saved in $output_path. An example invocation is sketched after this list.
- Set $model_path and $output_path to the checkpoint and log paths.
- Set metadata_file to ./eval/gen/geneval/prompts/evaluation_metadata.jsonl for the original GenEval prompts.
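A hypothetical invocation, mirroring the VLM example above; the same caveats apply (placeholder paths, and environment-variable handling is an assumption):

```bash
# Sketch only: whether run_geneval.sh reads these from the environment is an assumption.
export model_path=/path/to/checkpoint
export output_path=./results/geneval
export metadata_file=./eval/gen/geneval/prompts/evaluation_metadata.jsonl  # original GenEval prompts
bash scripts/eval/run_geneval.sh
```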
WISE
We modify the code in WISE for faster evaluation.
Evaluation
Directly run scripts/eval/run_wise.sh to evaluate WISE. The output will be saved in $output_path. An example invocation is sketched after this list.
- Set $model_path and $output_path to the checkpoint and log paths.
- Set $openai_api_key in scripts/eval/run_wise.sh and your_api_url in eval/gen/wise/gpt_eval_mp.py. The default GPT version is gpt-4o-2024-11-20.
- Use think for thinking mode.
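A hypothetical invocation; the think=true form of the thinking-mode toggle is an assumption, as is environment-variable handling:

```bash
# Sketch only: placeholder paths and key; gpt-4o-2024-11-20 is the default judge.
export model_path=/path/to/checkpoint
export output_path=./results/wise
export openai_api_key=YOUR_KEY   # placeholder; used for GPT-based scoring
export think=true                # assumed form of the thinking-mode toggle
bash scripts/eval/run_wise.sh
```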
GEdit-Bench
Please follow GEdit-Bench for evaluation.
IntelligentBench
TBD
