import gradio as gr


def build_page():
    with gr.Column(elem_id="about-page-content-wrapper"):
        # --- Section 1: About AstaBench ---
        gr.HTML(
            """

            <h2>About AstaBench</h2>

            <p>AstaBench is an evaluation framework that poses a challenging new test for AI agents: it is the first benchmark challenge to evaluate agents’ scientific abilities across a broad spectrum of research skills, including literature understanding, data analysis, planning, tool use, coding, and search. Asta’s set of standard tools makes it easy to build general-purpose science agents and to compare their performance in an apples-to-apples manner.</p>

""" ) gr.Markdown("---", elem_classes="divider-line") # --- Section 2: Why AstaBench? --- gr.HTML( """

            <h2>Why AstaBench?</h2>

            <p>Most current benchmarks test agentic AI or isolated aspects of scientific reasoning, but rarely evaluate agentic behavior rigorously or capture the full skill set that scientific research requires. Agents can appear effective despite inconsistent results and high compute use, often outperforming others simply by consuming more resources. Advancing scientific AI requires evaluations that emphasize reproducibility, efficiency, and the real complexity of research.</p>


            <p>AstaBench fills this gap: an agent evaluation framework and suite of open benchmarks for evaluating scientific AI assistants on core scientific tasks that require novel reasoning. AstaBench helps scientists identify which agents best support their needs through task-relevant leaderboards, while giving AI developers a standard execution environment and tools to compare their agents’ scientific reasoning capabilities against well-known baselines from the literature, including both open and closed LLM foundation models.</p>

""" ) gr.Markdown("---", elem_classes="divider-line") # --- Section 3: What Does AstaBench Include? --- gr.HTML( """

            <h2>What Does AstaBench Include?</h2>

            <p>AstaBench includes a rigorous agent evaluation framework and a suite of 11 benchmarks comprising over 2,400 problems, organized into four core categories.</p>

            <p>It also provides a large suite of integrated agents, plus leaderboards with results from extensive evaluations of agents and models.</p>

            <p>🔍 Learn more in the AstaBench technical blog post.</p>

""" ) gr.Markdown("---", elem_classes="divider-line") # --- Section 4: Understanding the Leaderboards --- gr.HTML( """

            <h2>Understanding the Leaderboards</h2>

            <p>The AstaBench Overall Leaderboard provides a high-level view of agent performance and efficiency.</p>

            <p>Each category leaderboard provides a focused view of agent performance on that category’s benchmarks.</p>

""" ) gr.Markdown("---", elem_classes="divider-line") # --- Section 5: Scoring & Aggregation --- gr.HTML( """

            <h2>Scoring &amp; Aggregation</h2>

            <p>AstaBench encourages careful, transparent evaluation. Here's how we handle scoring, cost, and partial results:</p>

            <h3>Scores</h3>

            <h3>Cost</h3>

            <p>Note: Cost values reflect pricing and infrastructure conditions at a fixed point in time. We recognize that compute costs may change over time and vary by provider, and we are actively working on methods to keep costs up to date and normalized for fair comparison.</p>

            <h3>Coverage</h3>

            <p>These design choices ensure fair comparison while penalizing cherry-picking and omissions.</p>

""" ) gr.Markdown("---", elem_classes="divider-line") # --- Section 6: Learn More --- gr.HTML( """

            <h2>Learn More</h2>

""" ) # Floating feedback button floating_feedback_button_html = """
Have feedback?
""" gr.HTML(floating_feedback_button_html)