The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published 5 days ago • 42
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs Paper • 2506.15211 • Published Jun 18 • 37
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16 • 60