	Update src/md.py

src/md.py CHANGED
@@ -9,7 +9,12 @@ We average over 4 core sections (per prompt weighting):
 2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
 3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
 4. **Reasoning**: Includes the code and math subsets (math-prm, hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
-
+
+For Reasoning, we increase the weight of the PRM-Math subset so code and math abilities are weighted equally in the final number, rather than increasing the relevance of code.
+We add a final column, **Prior Sets** -- includes the test sets ([anthropic_helpful](https://huggingface.co/datasets/Anthropic/hh-rlhf), [anthropic_hhh](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment), [shp](https://huggingface.co/datasets/stanfordnlp/SHP), [summarize](https://huggingface.co/datasets/openai/summarize_from_feedback))
+
+Once all subsets' weighted averages are computed, the final RewardBench score is the average across the 5 subset scores.
+
 
 We include multiple types of reward models in this evaluation:
 1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
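The added lines describe how the headline number is aggregated: each core section is a per-prompt weighted average of its subsets, math-prm is upweighted within Reasoning so math counts as much as the code subsets combined, and the final RewardBench score is the plain average of the five section scores (the four core sections plus Prior Sets). Below is a minimal sketch of that aggregation, assuming per-subset accuracies and prompt counts are already available; the subset keys are taken from the prose above, the Chat section score is passed in directly since its subsets are not shown in this hunk, and the exact weights used in src/md.py may differ.

```python
# Sketch of the section/score aggregation described above (assumptions noted in comments).

CORE_SECTIONS = {
    "Chat Hard": ["mt-bench-hard", "llmbar-natural", "llmbar-adver-neighbor",
                  "llmbar-adver-GPTInst", "llmbar-adver-GPTOut", "llmbar-adver-manual"],
    "Safety": ["refusals-dangerous", "refusals-offensive", "xstest-should-refuse",
               "xstest-should-respond", "do not answer"],
    "Reasoning": ["math-prm", "hep-cpp", "hep-go", "hep-java",
                  "hep-js", "hep-python", "hep-rust"],
}


def section_score(subsets, subset_acc, subset_counts, extra_weights=None):
    """Per-prompt weighted average over the subsets of one section."""
    extra_weights = extra_weights or {}
    num, den = 0.0, 0.0
    for name in subsets:
        w = extra_weights.get(name, 1.0) * subset_counts[name]
        num += w * subset_acc[name]
        den += w
    return num / den


def rewardbench_score(subset_acc, subset_counts, chat_score, prior_sets_score):
    # Upweight math-prm so the math subset counts as much as all code (hep-*) subsets combined.
    hep_total = sum(subset_counts[s] for s in CORE_SECTIONS["Reasoning"] if s.startswith("hep-"))
    reasoning_weights = {"math-prm": hep_total / subset_counts["math-prm"]}

    sections = {"Chat": chat_score}  # Chat subsets are defined earlier in md.py, not in this hunk
    for name, subsets in CORE_SECTIONS.items():
        extra = reasoning_weights if name == "Reasoning" else None
        sections[name] = section_score(subsets, subset_acc, subset_counts, extra)
    sections["Prior Sets"] = prior_sets_score

    # Final score: plain average across the 5 section scores.
    return sum(sections.values()) / len(sections)
```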


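The "Sequence Classifiers" entry in the surrounding context refers to models loaded with HuggingFace AutoModelForSequenceClassification that map a prompt/response pair to a scalar score. A hedged sketch of that usage follows; the checkpoint name is a placeholder, a single-logit reward head is assumed, and real reward models typically require their own chat template rather than the simple pair encoding shown here.

```python
# Illustrative scoring with a sequence-classifier reward model (placeholder checkpoint).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "some-org/some-reward-model"  # placeholder, not a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

prompt = "How do I bake bread?"
response = "Mix flour, water, salt, and yeast; knead, proof, and bake at 230 C."

# Many reward models expect the prompt and response joined via the model's chat
# template; encoding them as a plain text pair here is only for illustration.
inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # assumes a single-logit head
print(f"reward score: {score:.3f}")
```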