On Robustness and Reliability of Benchmark-Based Evaluation of LLMs Paper • 2509.04013 • Published Sep 4 • 4 • 2
Geospatial Mechanistic Interpretability of Large Language Models Paper • 2505.03368 • Published May 6 • 10 • 1
The COVID-19 Infodemic: Can the Crowd Judge Recent Misinformation Objectively? Paper • 2008.05701 • Published Aug 13, 2020
Can the Crowd Judge Truthfulness? A Longitudinal Study on Recent Misinformation about COVID-19 Paper • 2107.11755 • Published Jul 25, 2021
The Many Dimensions of Truthfulness: Crowdsourcing Misinformation Assessments on a Multidimensional Scale Paper • 2108.01222 • Published Aug 3, 2021
kevinr/Confidence-bert-base-uncased-Loss_MSE-Bin_Nobin Text Classification • Updated Jun 14, 2022 • 1
kevinr/Confidence-bert-base-uncased-Loss_CrossEntropy-Bin_Nobin Text Classification • Updated Jun 14, 2022 • 1
kevinr/Confidence-bert-base-uncased-Loss_CrossEntropy-Bin_01234-5 Text Classification • Updated Jun 14, 2022 • 1
kevinr/Confidence-bert-base-uncased-Loss_CrossEntropy-Bin_0123-45 Text Classification • Updated Jun 14, 2022 • 2
kevinr/Confidence-bert-base-uncased-Loss_CrossEntropy-Bin_012-345 Text Classification • Updated Jun 14, 2022 • 1
kevinr/Confidence-bert-base-uncased-Loss_CrossEntropy-Bin_01-2345 Text Classification • Updated Jun 14, 2022 • 1
kevinr/Confidence-bert-base-uncased-Loss_CrossEntropy-Bin_0-12345 Text Classification • Updated Jun 14, 2022 • 2