Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation Paper • 2505.09027 • Published May 13
A Case Study of Web App Coding with OpenAI Reasoning Models Paper • 2409.13773 • Published Sep 19, 2024 • 6
WebApp1K: A Practical Code-Generation Benchmark for Web App Development Paper • 2408.00019 • Published Jul 30, 2024 • 1
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published Sep 8, 2024 • 7