Create verifact_data.csv
verifact_data.csv +25 -0
verifact_data.csv
ADDED
@@ -0,0 +1,25 @@
+tier,model,f1,precision,recall
+Overall,GPT4o,67.41,80.59,57.93
+FactBench,GPT4o,80.93,85.11,77.13
+Reddit,GPT4o,42.76,74.04,30.06
+Overall,Claude 3.5-Sonnet,63.65,78.37,53.58
+FactBench,Claude 3.5-Sonnet,75.68,83.28,69.35
+Reddit,Claude 3.5-Sonnet,42.90,71.25,30.69
+Overall,Gemini 1.5-Flash,64.10,80.72,53.16
+FactBench,Gemini 1.5-Flash,77.38,85.45,70.71
+Reddit,Gemini 1.5-Flash,40.26,73.87,27.67
+Overall,Llama3.1-8b,48.62,60.91,40.46
+FactBench,Llama3.1-8b,60.71,68.87,54.28
+Reddit,Llama3.1-8b,28.86,49.36,20.39
+Overall,Llama3.1-70b,55.12,68.09,46.30
+FactBench,Llama3.1-70b,65.83,76.05,58.00
+Reddit,Llama3.1-70b,38.61,56.54,29.31
+Overall,Llama3.1-405B,60.61,72.80,51.92
+FactBench,Llama3.1-405B,73.23,78.80,68.40
+Reddit,Llama3.1-405B,38.98,64.10,28.00
+Overall,Qwen2.5-8b,55.78,72.45,45.34
+FactBench,Qwen2.5-8b,69.23,77.18,58.66
+Reddit,Qwen2.5-8b,37.25,65.58,26.01
+Overall,Qwen2.5-32b,60.00,77.79,47.52
+FactBench,Qwen2.5-32b,71.31,82.74,62.77
+Reddit,Qwen2.5-32b,37.34,70.60,25.38
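The added file has a header row (`tier,model,f1,precision,recall`) followed by one row per tier and model. As a rough illustration of how it might be consumed, the sketch below parses a few rows with Python's standard `csv` module and picks the highest-F1 model within a tier. The names `SAMPLE`, `load_scores`, and `best_by_f1` are hypothetical and not part of this commit, and the sample holds only three of the file's rows.

```python
import csv
import io

# Three rows copied from verifact_data.csv for illustration;
# the committed file has one header row plus 24 data rows.
SAMPLE = """tier,model,f1,precision,recall
Overall,GPT4o,67.41,80.59,57.93
Overall,Claude 3.5-Sonnet,63.65,78.37,53.58
Reddit,GPT4o,42.76,74.04,30.06
"""


def load_scores(text):
    """Parse CSV text into a list of dicts, converting score columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in ("f1", "precision", "recall"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


def best_by_f1(rows, tier):
    """Return the row with the highest F1 within one tier (hypothetical helper)."""
    candidates = [r for r in rows if r["tier"] == tier]
    return max(candidates, key=lambda r: r["f1"])


rows = load_scores(SAMPLE)
print(best_by_f1(rows, "Overall")["model"])  # prints "GPT4o"
```

To run this against the full file, one could pass the contents of `verifact_data.csv` to `load_scores` in place of `SAMPLE`, assuming the file is available locally.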