description update
- src/about.py +7 -5
src/about.py
CHANGED
@@ -73,11 +73,7 @@ Almost every task has two versions: regex and multiple choice.
 * _g suffix means that a model needs to generate an answer (only suitable for instructions-based models)
 * _mc suffix means that a model is scored against every possible class (suitable also for base models)
 
-Average columns are normalized against scores by "Baseline (majority class)".
-* Average: {', '.join(all_tasks)}
-* Avg g: {', '.join(g_tasks)}
-* Avg mc: {', '.join(mc_tasks)}
-* Acg RAG: {', '.join(rag_tasks)}
+Average columns are normalized against scores by "Baseline (majority class)".
 
 * `,chat` suffix means that a model is tested using chat templates
 * `,chat,multiturn` suffix means that a model is tested using chat templates and fewshot examples are treated as a multi-turn conversation
@@ -102,6 +98,12 @@ or join our [Discord SpeakLeash](https://discord.gg/FfYp4V6y3R)
 
 ## Tasks
 
+Tasks taken into account while calculating averages:
+* Average: {', '.join(all_tasks)}
+* Avg g: {', '.join(g_tasks)}
+* Avg mc: {', '.join(mc_tasks)}
+* Avg RAG: {', '.join(rag_tasks)}
+
 | Task | Dataset | Metric | Type |
 |---------------------------------|---------------------------------------|-----------|-----------------|
 | polemo2_in | allegro/klej-polemo2-in | accuracy | generate_until |
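The description says the average columns are normalized against the "Baseline (majority class)" scores, but the diff does not show the formula. The sketch below illustrates one common way such a normalization could work, under stated assumptions: scores are on a 0-100 scale, the baseline maps to 0 and a perfect score maps to 100, and the average is an unweighted mean over the listed tasks. The function names, the `max_score` parameter, and the task names are hypothetical and are not taken from `src/about.py`.

```python
from typing import Dict, List


def normalize_against_baseline(score: float, baseline: float, max_score: float = 100.0) -> float:
    """Rescale a raw score so the baseline maps to 0 and max_score maps to 100.

    Assumption: linear rescaling; the leaderboard's actual formula may differ.
    """
    if max_score <= baseline:
        raise ValueError("max_score must be greater than the baseline score")
    return (score - baseline) / (max_score - baseline) * 100.0


def average_normalized(
    scores: Dict[str, float],
    baselines: Dict[str, float],
    tasks: List[str],
) -> float:
    """Unweighted mean of baseline-normalized scores over the given tasks."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    normalized = [normalize_against_baseline(scores[t], baselines[t]) for t in tasks]
    return sum(normalized) / len(normalized)


# Hypothetical task names and scores, for illustration only.
all_tasks = ["task_a_g", "task_b_mc"]
model_scores = {"task_a_g": 75.0, "task_b_mc": 60.0}
baseline_scores = {"task_a_g": 50.0, "task_b_mc": 20.0}

average = average_normalized(model_scores, baseline_scores, all_tasks)

# The same join expression used in the description text of the diff.
description_line = f"* Average: {', '.join(all_tasks)}"
```

With this scheme a model that only matches the majority-class baseline gets 0 on a task, so tasks with a high baseline do not inflate the average.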