open_pl_llm_leaderboard


djstrong committed on Jun 13, 2024

Commit

26d544d

1 Parent(s): c7cf816

description update
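
The description text touched by this commit says that average columns are normalized against the "Baseline (majority class)" scores, and it builds the per-average task lists with `', '.join(...)` inside an f-string. A minimal sketch of both ideas follows. The normalization formula (baseline maps to 0, a perfect score of 100 maps to 100) is an assumption, not confirmed by this diff, and the task names and numbers are hypothetical.

```python
from statistics import mean


def normalize(score: float, baseline: float, max_score: float = 100.0) -> float:
    """Rescale a score so that the baseline maps to 0 and max_score to 100.

    Assumed formula; the leaderboard's actual normalization may differ.
    """
    return (score - baseline) / (max_score - baseline) * 100.0


def normalized_average(scores: dict[str, float], baselines: dict[str, float]) -> float:
    """Average the baseline-normalized scores over all tasks in `scores`."""
    return mean(normalize(scores[task], baselines[task]) for task in scores)


# Hypothetical task names and values, for illustration only.
g_tasks = ["task_a_g", "task_b_g"]
baselines = {"task_a_g": 50.0, "task_b_g": 20.0}
scores = {"task_a_g": 75.0, "task_b_g": 60.0}

avg_g = normalized_average(scores, baselines)

# Same rendering pattern as the lines moved by this commit.
avg_g_line = f"* Avg g: {', '.join(g_tasks)}"
print(avg_g)
print(avg_g_line)
```

Under this assumed formula, a model that only matches the majority-class baseline on every task would get a normalized average of 0, which is what makes averages comparable across tasks with different class balance.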

Files changed (1)

src/about.py CHANGED

@@ -73,11 +73,7 @@ Almost every task has two versions: regex and multiple choice.
 * _g suffix means that a model needs to generate an answer (only suitable for instructions-based models)
 * _mc suffix means that a model is scored against every possible class (suitable also for base models)
-Average columns are normalized against scores by "Baseline (majority class)". Tasks taken into account while calculating averages:
-* Average: {', '.join(all_tasks)}
-* Avg g: {', '.join(g_tasks)}
-* Avg mc: {', '.join(mc_tasks)}
-* Acg RAG: {', '.join(rag_tasks)}
+Average columns are normalized against scores by "Baseline (majority class)".
 * `,chat` suffix means that a model is tested using chat templates
 * `,chat,multiturn` suffix means that a model is tested using chat templates and fewshot examples are treated as a multi-turn conversation
@@ -102,6 +98,12 @@ or join our [Discord SpeakLeash](https://discord.gg/FfYp4V6y3R)
 ## Tasks
+Tasks taken into account while calculating averages:
+* Average: {', '.join(all_tasks)}
+* Avg g: {', '.join(g_tasks)}
+* Avg mc: {', '.join(mc_tasks)}
+* Avg RAG: {', '.join(rag_tasks)}
 | Task                            | Dataset                               | Metric    | Type            |
 |---------------------------------|---------------------------------------|-----------|-----------------|
 | polemo2_in                      | allegro/klej-polemo2-in               | accuracy  | generate_until  |