mini-coder • Collection • Small models for agentic SWE research: https://ricardodominguez.github.io/blogs/minicoder.html • 4 items • Updated Oct 2 • 2
Answer Matching Outperforms Multiple Choice for Language Model Evaluation • Paper • 2507.02856 • Published Jul 3 • 8
answer-matching • Collection • Free-form datasets, human annotations, and sample-level model outputs for "Answer Matching Outperforms Multiple Choice for Language Model Evaluation" • 2 items • Updated Jul 3 • 2