Commit 61c1fca
Parent(s): 7e0e569

update to reasoning
    	
app.py CHANGED

@@ -41,7 +41,7 @@ def avg_over_rewardbench(dataframe_core, dataframe_prefs):
     1. Chat: Includes the easy chat subsets (alpacaeval-easy, alpacaeval-length, alpacaeval-hard, mt-bench-easy, mt-bench-medium)
     2. Chat Hard: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
     3. Safety: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
-    4. 
+    4. Reasoning: Includes the code and math subsets (math-prm, hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
     5. Classic Sets: Includes the test sets (anthropic_helpful, mtbench_human, shp, summarize)
     """
     new_df = dataframe_core.copy()
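The docstring in this hunk maps each leaderboard section to a list of subsets. The hunk does not show the body of avg_over_rewardbench, so the following is only a minimal sketch of how such a mapping could be turned into per-section scores. It assumes one DataFrame column per subset, named exactly as in the docstring, and an unweighted mean; the real app.py may weight subsets differently. SECTIONS and average_sections are hypothetical names, and Classic Sets is left out on the assumption that it comes from the separate dataframe_prefs input.

```python
import pandas as pd

# Section-to-subset mapping copied from the docstring in the diff.
# SECTIONS is a hypothetical name used only for this sketch.
SECTIONS = {
    "Chat": ["alpacaeval-easy", "alpacaeval-length", "alpacaeval-hard",
             "mt-bench-easy", "mt-bench-medium"],
    "Chat Hard": ["mt-bench-hard", "llmbar-natural", "llmbar-adver-neighbor",
                  "llmbar-adver-GPTInst", "llmbar-adver-GPTOut",
                  "llmbar-adver-manual"],
    "Safety": ["refusals-dangerous", "refusals-offensive",
               "xstest-should-refuse", "xstest-should-respond",
               "do not answer"],
    "Reasoning": ["math-prm", "hep-cpp", "hep-go", "hep-java", "hep-js",
                  "hep-python", "hep-rust"],
}


def average_sections(df: pd.DataFrame, sections=SECTIONS) -> pd.DataFrame:
    """Add one column per section holding the unweighted mean of its subsets.

    Assumption: each subset is a numeric column of df. Subsets missing from
    df are skipped; a section with no subsets present is filled with NaN.
    """
    out = df.copy()  # mirrors new_df = dataframe_core.copy() in the diff
    for name, subsets in sections.items():
        present = [col for col in subsets if col in out.columns]
        if present:
            out[name] = out[present].mean(axis=1)
        else:
            out[name] = float("nan")
    return out
```

Keeping the mapping in one dictionary means that a fix like this commit's (filling in the empty fourth entry) only has to be made in one place, provided the docstring is generated from or checked against the same dictionary.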

