bluebench


jbnayahu committed on Jul 2

Commit 0650525 (unverified)

1 Parent(s): bc7425f

Updated execution guide

Signed-off-by: Jonathan Bnayahu <bnayahu@il.ibm.com>

Files changed (1)

src/about.py CHANGED

@@ -89,7 +89,8 @@ To reproduce our results, here is the commands you can run:
 ```
 pip install unitxt[bluebench]
-unitxt-evaluate --tasks "benchmarks.bluebench" --model cross_provider --model_args "model_name=$MODEL_TO_EVALUATE_IN_LITELLM_FORMAT,max_tokens=256" --output_path ./results/bluebench --log_samples --trust_remote_code --batch_size 8
+unitxt-evaluate --tasks "benchmarks.bluebench" --model cross_provider --model_args "model_name=MODEL_TO_EVALUATE_IN_LITELLM_FORMAT,max_tokens=1024" --output_path ./results/bluebench --log_samples --trust_remote_code --batch_size 8
+unitxt-summarize ./results/bluebench
 ```
 """
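After this commit, the execution guide runs two commands in sequence: unitxt-evaluate writes its results to ./results/bluebench with max_tokens raised to 1024, and unitxt-summarize is then pointed at that same directory. The Python snippet below is a minimal sketch, not part of unitxt, that builds both invocations from exactly the flags shown in the diff. The helper names (evaluate_command, summarize_command, run_bluebench) and the model identifier "openai/gpt-4o-mini" are hypothetical placeholders. Substitute whatever model you want to evaluate, written in LiteLLM "provider/model" format.

```python
import shlex
import subprocess

# Output directory taken from the updated guide in src/about.py.
OUTPUT_PATH = "./results/bluebench"


def evaluate_command(model_name: str, max_tokens: int = 1024) -> list[str]:
    """Build the guide's unitxt-evaluate invocation as an argv list (hypothetical helper)."""
    return [
        "unitxt-evaluate",
        "--tasks", "benchmarks.bluebench",
        "--model", "cross_provider",
        "--model_args", f"model_name={model_name},max_tokens={max_tokens}",
        "--output_path", OUTPUT_PATH,
        "--log_samples",
        "--trust_remote_code",
        "--batch_size", "8",
    ]


def summarize_command() -> list[str]:
    """Build the guide's unitxt-summarize invocation on the evaluation output directory."""
    return ["unitxt-summarize", OUTPUT_PATH]


def run_bluebench(model_name: str) -> None:
    # Evaluate first, then summarize; check=True raises CalledProcessError
    # if either command exits with a non-zero status.
    subprocess.run(evaluate_command(model_name), check=True)
    subprocess.run(summarize_command(), check=True)


if __name__ == "__main__":
    # Dry run: print the shell-quoted commands instead of executing them.
    # "openai/gpt-4o-mini" is only an illustrative LiteLLM-format model name.
    for cmd in (evaluate_command("openai/gpt-4o-mini"), summarize_command()):
        print(shlex.join(cmd))
```

Passing argv lists to subprocess.run avoids shell quoting problems in the --model_args value. It also makes clear that the placeholder is substituted directly into the argument, which matches the new version of the guide dropping the `$` before MODEL_TO_EVALUATE_IN_LITELLM_FORMAT.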