Ai2

Team

non-profit

Verified

https://allenai.org/

allen_ai

allenai

AI & ML interests

Building breakthrough AI to solve the world's biggest problems.

Recent Activity

stellalisy authored a paper 22 days ago

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

stellalisy authored a paper 22 days ago

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

stellalisy authored a paper 22 days ago

Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Papers

olmOCR 2: Unit Test Rewards for Document OCR

MolmoAct: Action Reasoning Models that can Reason in Space

Articles

Introducing the Open Chain of Thought Leaderboard

yanhong-l authored a paper 5 days ago

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

Paper • 2510.18279 • Published 7 days ago • 3

davidheineman authored 3 papers 10 days ago

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

Paper • 2508.13144 • Published Aug 18

Evaluating LLMs on Chinese Idiom Translation

Paper • 2508.10421 • Published Aug 14

Fluid Language Model Benchmarking

Paper • 2509.11106 • Published Sep 14

stellalisy authored 10 papers 22 days ago

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

Paper • 2404.06664 • Published Apr 10, 2024 • 1

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Paper • 2410.02677 • Published Oct 3, 2024 • 1

Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Paper • 2502.14860 • Published Feb 20

BLAB: Brutally Long Audio Bench

Paper • 2505.03054 • Published May 5 • 1

Spurious Rewards: Rethinking Training Signals in RLVR

Paper • 2506.10947 • Published Jun 12 • 1

MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Paper • 2406.00922 • Published Jun 3, 2024

PrefPalette: Personalized Preference Modeling with Latent Attributes

Paper • 2507.13541 • Published Jul 17 • 8

Medical Hallucinations in Foundation Models and Their Impact on Healthcare

Paper • 2503.05777 • Published Feb 26

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

Paper • 2305.19759 • Published May 31, 2023

Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It

Paper • 2510.00177 • Published 27 days ago • 3

lihaoxin2020 authored a paper about 2 months ago

Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning

Paper • 2508.19202 • Published Aug 26 • 7

jaslee20 authored a paper 3 months ago

MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11 • 43

valpy authored 4 papers 3 months ago

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Paper • 2502.08395 • Published Feb 12

RewardBench 2: Advancing Reward Model Evaluation

Paper • 2506.01937 • Published Jun 2 • 7

Generalizing Verifiable Instruction Following

Paper • 2507.02833 • Published Jul 3 • 1