Commit 65e180d · Parent(s): b7aaef4

src/md.py CHANGED
@@ -2,13 +2,23 @@ ABOUT_TEXT = """
 We compute the win percentage for a reward model on hand curated chosen-rejected pairs for each prompt.
 A win is when the score for the chosen response is higher than the score for the rejected response.
 
+## Overview
+
 We average over 4 core sections (per prompt weighting):
-1. Chat
-2. Chat Hard
-3. Safety
-4. Code
+1. **Chat**: Includes the easy chat subsets (alpacaeval-easy, alpacaeval-length, alpacaeval-hard, mt-bench-easy, mt-bench-medium)
+2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
+3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
+4. **Code**: Includes the code subsets (hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
 
-
+We include multiple types of reward models in this evaluation:
+1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
+2. **Custom Classifiers**: Research models with different architectures and training objectives to either take in two inputs at once or generate scores differently (e.g. PairRM and Stanford SteamSHP).
+3. **DPO**: Models trained with Direct Preference Optimization (DPO), with modifiers such as `-ref-free` or `-norm` changing how scores are computed.
+4. **Random**: Random choice baseline.
+
+Others, such as **Generative Judge** are coming soon.
 
+### Subset Details
+
 Total number of the prompts is: 2538, filtered from 4676.
 
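The text added in this commit describes the scoring scheme: a win is when the chosen response scores above the rejected one, and the headline number averages over the four core sections. A minimal sketch of that aggregation — function names and the exact reading of "per prompt weighting" are assumptions here, not the Space's actual code:

```python
# Sketch of the evaluation described in ABOUT_TEXT: a "win" is when the
# reward model scores the chosen response above the rejected one, and
# each section score averages over that section's prompts.
# All names are illustrative; this is not the leaderboard's real code.

def win_rate(pairs):
    """pairs: list of (chosen_score, rejected_score) tuples for one section."""
    wins = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return wins / len(pairs)

def overall_score(sections):
    """sections: dict mapping section name -> list of score pairs.

    "Per prompt weighting" is read here as: average within each section
    over its prompts, then weight the four section scores equally.
    """
    rates = {name: win_rate(pairs) for name, pairs in sections.items()}
    return sum(rates.values()) / len(rates), rates

sections = {
    "Chat": [(1.2, 0.3), (0.9, 1.1)],     # 1 win of 2 -> 0.5
    "Chat Hard": [(0.2, 0.8)],            # 0 wins of 1 -> 0.0
    "Safety": [(2.0, -1.0), (1.5, 0.0)],  # 2 wins of 2 -> 1.0
    "Code": [(0.4, 0.1)],                 # 1 win of 1 -> 1.0
}
overall, per_section = overall_score(sections)
print(overall)  # (0.5 + 0.0 + 1.0 + 1.0) / 4 = 0.625
```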

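The DPO entry in the new model-type list mentions `-ref-free` and `-norm` modifiers. As background, DPO's implicit reward is β·(log π(y|x) − log π_ref(y|x)); the sketch below shows one plausible reading of how such modifiers could change the score — the exact semantics in the Space's code are an assumption here, not taken from the diff:

```python
# Hedged sketch of DPO-style scoring. The modifier semantics below are
# one plausible interpretation, not the leaderboard's implementation.

def dpo_score(logp_policy, logp_ref, beta=1.0, ref_free=False, norm=False, length=1):
    """Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x)).

    ref_free: score from the policy log-prob alone, dropping the
              reference-model term (one reading of `-ref-free`).
    norm:     divide by response length in tokens (one reading of `-norm`).
    """
    score = logp_policy if ref_free else logp_policy - logp_ref
    if norm:
        score /= length
    return beta * score

# As in the text above, the chosen response wins if its score is higher.
chosen = dpo_score(-12.0, -15.0)    # 3.0
rejected = dpo_score(-14.0, -15.5)  # 1.5
print(chosen > rejected)  # True
```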