vllm (pretrained=/root/autodl-tmp/Impish_Magic_24B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.908	±	0.0183
		strict-match	5	exact_match	↑	0.904	±	0.0187
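
The Stderr column above can be sanity-checked from the accuracy and the item count. A minimal sketch, assuming each GSM8K item is scored 0 or 1 and that the reported stderr is the sample standard deviation (n-1 denominator) divided by sqrt(n); the function name `binary_accuracy_stderr` is illustrative and not part of the harness:

```python
import math


def binary_accuracy_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores (assumed n-1 sample variance)."""
    # Sample variance of 0/1 scores with mean p is p * (1 - p) * n / (n - 1).
    sample_variance = accuracy * (1.0 - accuracy) * n / (n - 1)
    return math.sqrt(sample_variance / n)


n_items = 250  # taken from "limit: 250.0" in the run header

flexible_stderr = binary_accuracy_stderr(0.908, n_items)
strict_stderr = binary_accuracy_stderr(0.904, n_items)

print(f"flexible-extract: {flexible_stderr:.4f}")
print(f"strict-match:     {strict_stderr:.4f}")
```

Under these assumptions the two values round to the 0.0183 and 0.0187 shown in the table.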

vllm (pretrained=/root/autodl-tmp/Impish_Magic_24B,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7545	±	0.0034
- humanities	2	none	acc	↑	0.6782	±	0.0064
- other	2	none	acc	↑	0.8169	±	0.0067
- social sciences	2	none	acc	↑	0.8469	±	0.0064
- stem	2	none	acc	↑	0.7168	±	0.0077

vllm (pretrained=/root/autodl-tmp/90-512-2048-9999999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.908	±	0.0183
		strict-match	5	exact_match	↑	0.908	±	0.0183

vllm (pretrained=/root/autodl-tmp/90-512-2048-9999999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7527	±	0.0034
- humanities	2	none	acc	↑	0.6810	±	0.0064
- other	2	none	acc	↑	0.8072	±	0.0068
- social sciences	2	none	acc	↑	0.8463	±	0.0064
- stem	2	none	acc	↑	0.7146	±	0.0077
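
To judge whether the two checkpoints actually differ, the score gaps can be compared against their standard errors. A rough sketch, assuming the two runs can be treated as independent samples (they share the same test items, so a paired test would be more appropriate and this is only a coarse check); `z_score` is an illustrative helper, not a harness function:

```python
import math


def z_score(score_a: float, stderr_a: float, score_b: float, stderr_b: float) -> float:
    """Difference of two scores in units of their combined standard error."""
    # Assumes independent errors, so the variances add.
    return (score_a - score_b) / math.sqrt(stderr_a**2 + stderr_b**2)


# Impish_Magic_24B vs. 90-512-2048-9999999, values copied from the tables.
mmlu_z = z_score(0.7545, 0.0034, 0.7527, 0.0034)
gsm8k_strict_z = z_score(0.904, 0.0187, 0.908, 0.0183)

print(f"mmlu z:               {mmlu_z:+.2f}")
print(f"gsm8k strict-match z: {gsm8k_strict_z:+.2f}")
```

Both magnitudes fall well below 1.96, so under this assumption neither gap is distinguishable from noise at the usual 95% level.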