Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs Paper • 2510.18279 • Published 7 days ago • 3
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation Paper • 2508.13144 • Published Aug 18
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge Paper • 2404.06664 • Published Apr 10, 2024 • 1
CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs Paper • 2410.02677 • Published Oct 3, 2024 • 1
Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning Paper • 2502.14860 • Published Feb 20
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning Paper • 2406.00922 • Published Jun 3, 2024
PrefPalette: Personalized Preference Modeling with Latent Attributes Paper • 2507.13541 • Published Jul 17 • 8
Medical Hallucinations in Foundation Models and Their Impact on Healthcare Paper • 2503.05777 • Published Feb 26
Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning Paper • 2305.19759 • Published May 31, 2023
Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It Paper • 2510.00177 • Published 27 days ago • 3
Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning Paper • 2508.19202 • Published Aug 26 • 7
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11 • 43
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance Paper • 2502.08395 • Published Feb 12