Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published Mar 10 • 23
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 104
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165