jkhouja committed
Commit e840038 · verified · 1 Parent(s): 0f04a65

Update leaderboard.csv

Files changed (1)
  1. leaderboard.csv +16 -16
leaderboard.csv CHANGED
@@ -1,17 +1,17 @@
- Model,Provider,Type,Baseline score,Obfuscated score
- Aya 23 35B,Cohere,Open source,0.10654349746757057,0.057081801
- Claude 3.5 Sonnet,Anthropic,Closed source,0.48255271180599657,0.2810140963355337
- Claude 3.7 Sonnet,Anthropic,Closed source,0.604975,0.428881
- GPT 4.5,OpenAI,Closed source,0.4208265195574057,0.2545024812218498
- GPT 4o,OpenAI,Closed source,0.31371291749661456,0.1563339989919302
- Gemini 1.5 Pro,Google,Closed source,0.3690345167304693,0.20461522579355207
- Llama 3.3 70B-Instruct,Meta,Open source,0.11452795751175084,0.082131188
- Phi4,Microsoft,Open source,0.1809802769595679,0.10996628714372364
- DeepSeek R1,DeepSeek,Open source,0.3965527162895584,0.2649618642615188
- o1-preview,OpenAI,Closed source,0.47730527712315257,0.3222020975619888
- o3-mini (high),OpenAI,Closed source,0.42172257807447155,0.3059086523804619
- o3-mini (low),OpenAI,Closed source,0.249751,0.122204
- Gemini 2.5 Pro,Google,Closed source,0.589055,0.423539
- GPT-5,OpneAI,Closed source,0.609,0.467
- Claude Opus 4.1,Anthropic,Closed source,0.592,0.458
+ Model,Provider,Type,Baseline score,Obfuscated score
+ Aya 23 35B,Cohere,Open source,0.10654349746757057,0.057081801
+ Claude 3.5 Sonnet,Anthropic,Closed source,0.48255271180599657,0.2810140963355337
+ Claude 3.7 Sonnet,Anthropic,Closed source,0.604975,0.428881
+ GPT 4.5,OpenAI,Closed source,0.4208265195574057,0.2545024812218498
+ GPT 4o,OpenAI,Closed source,0.31371291749661456,0.1563339989919302
+ Gemini 1.5 Pro,Google,Closed source,0.3690345167304693,0.20461522579355207
+ Llama 3.3 70B-Instruct,Meta,Open source,0.11452795751175084,0.082131188
+ Phi4,Microsoft,Open source,0.1809802769595679,0.10996628714372364
+ DeepSeek R1,DeepSeek,Open source,0.3965527162895584,0.2649618642615188
+ o1-preview,OpenAI,Closed source,0.47730527712315257,0.3222020975619888
+ o3-mini (high),OpenAI,Closed source,0.42172257807447155,0.3059086523804619
+ o3-mini (low),OpenAI,Closed source,0.249751,0.122204
+ Gemini 2.5 Pro,Google,Closed source,0.589055,0.423539
+ GPT-5,OpenAI,Closed source,0.609,0.467
+ Claude Opus 4.1,Anthropic,Closed source,0.592,0.458
  DeepSeek-V3.1-Terminus,DeepSeek,Open source,0.554,0.422