Commit 054ed2d · up
1 Parent(s): f89f357
app.py CHANGED

```diff
@@ -42,7 +42,7 @@ def avg_over_rewardbench(dataframe_core, dataframe_prefs):
 2. Chat Hard: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. Safety: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
 4. Code: Includes the code subsets (hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
-
+5. Test Sets: Includes the test sets (anthropic_helpful, mtbench_gpt4, shp, summarize)
 """
 new_df = dataframe_core.copy()
 dataframe_prefs = dataframe_prefs.copy()
```
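The docstring above belongs to `avg_over_rewardbench`, which folds per-subset scores into per-section averages. As a rough illustration only, here is a minimal pandas sketch of that kind of section averaging; the `SECTION_SUBSETS` mapping and column names are assumptions reconstructed from the docstring, and unlike the real function (which, per the docstring in src/md.py, weights per prompt) this uses a plain unweighted mean.

```python
# Hypothetical sketch of per-section averaging over a leaderboard dataframe,
# assuming one accuracy column per subset (column names are illustrative).
import pandas as pd

SECTION_SUBSETS = {  # assumed mapping, mirroring the docstring above
    "Chat Hard": ["mt-bench-hard", "llmbar-natural", "llmbar-adver-neighbor",
                  "llmbar-adver-GPTInst", "llmbar-adver-GPTOut", "llmbar-adver-manual"],
    "Safety": ["refusals-dangerous", "refusals-offensive",
               "xstest-should-refuse", "xstest-should-respond", "do not answer"],
    "Code": ["hep-cpp", "hep-go", "hep-java", "hep-js", "hep-python", "hep-rust"],
}

def avg_sections(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-subset accuracy columns into one mean column per section."""
    out = df.copy()
    for section, subsets in SECTION_SUBSETS.items():
        cols = [c for c in subsets if c in out.columns]
        # Unweighted mean for brevity; the real code weights by prompt counts.
        out[section] = out[cols].mean(axis=1)
    return out
```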
src/md.py CHANGED

```diff
@@ -9,6 +9,7 @@ We average over 4 core sections (per prompt weighting):
 2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
 4. **Code**: Includes the code subsets (hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
+5. **Test Sets**: Includes the test sets (anthropic_helpful, mtbench_gpt4, shp, summarize)
 
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
```
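The sequence-classifier description in the hunk above maps directly onto the standard `transformers` pattern. A minimal sketch of scoring one (prompt, response) pair, assuming a hypothetical checkpoint name; real reward models typically require their own chat template before tokenization.

```python
# Minimal sketch of scoring a (prompt, response) pair with a sequence
# classifier. The checkpoint name is a placeholder, not a real model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "your-org/your-reward-model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

prompt = "How do I boil an egg?"
response = "Place the egg in boiling water for 8-10 minutes, then cool it in ice water."

# Encode the pair; the single-label classification head emits one scalar score.
inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits[0].item()

print(f"reward score: {score:.3f}")
```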