leaderboard / content.py

	OWNER = "meta-agents-research-environments"
	SUBMISSION_DATASET = f"{OWNER}/leaderboard_submissions_internal"
	CONTACT_DATASET = f"{OWNER}/leaderboard_contact_info_internal"
	RESULTS_DATASET = f"{OWNER}/leaderboard_results"
	LEADERBOARD_PATH = f"{OWNER}/leaderboard"

	TITLE = """
	<div style="text-align: center; padding: 20px 0; background: linear-gradient(135deg, #1877f2 0%, #42a5f5 100%); border-radius: 15px; margin-bottom: 30px;">
	<h1 style="color: white; font-size: 2em; margin: 0; font-weight: 700; text-shadow: 2px 2px 4px rgba(0,0,0,0.3);">
	Gaia2 Leaderboard 🏆
	</h1>
	</div>
	"""

	SCENARIO_LIST = [
	"adaptability",
	"mini_noise",
	"time",
	"execution",
	"ambiguity",
	"mini_agent2agent",
	"search",
	]

	MAX_PARALLELISM = 10

	INTRODUCTION_TEXT = """
	[Gaia2](https://huggingface.co/datasets/meta-agents-research-environments/gaia2) is a benchmark designed to measure general agent capabilities. Beyond traditional search and execution tasks, Gaia2 runs asynchronously, requiring agents to handle ambiguities and noise, adapt to dynamic environments, collaborate with other agents, and operate under temporal constraints. As of publication, no system dominates across the task spectrum: stronger reasoning often comes at the cost of efficiency and the ability to complete sensitive tasks in due time.

	Gaia2 evaluates agents across the following dimensions: Execution (instruction following, multi-step tool use), Search (information retrieval), Ambiguity (handling unclear or incomplete instructions), Adaptability (responding to dynamic environment changes), Time (managing temporal constraints and scheduling), Noise (operating effectively despite irrelevant information and random tool failures), and Agent-to-Agent (collaboration and coordination with other agents).

	⚠️ All scores on this page are self-reported. Associated traces are made available to the open-source community to enable deeper study of the trade-offs between model behavior and performance on Gaia2.
	"""

	SUBMISSION_TEXT = """
	You can find a complete setup guide [here](https://facebookresearch.github.io/meta-agents-research-environments/user_guide/gaia2_evaluation.html), but here are some simplified instructions.

	First, install Meta's agent research environment in your Python environment of choice (uv, conda, virtualenv, ...):
	```bash
	pip install meta-agents-research-environments
	```

	Then, run the benchmark for all configurations: adaptability, mini_noise, time, execution, ambiguity, mini_agent2agent, search.
	Don't forget to upload all results to the Hub with the `--hf-upload` flag!

	```bash
	are-benchmark gaia2-run \\
	--hf meta-agents-research-environments/gaia2 \\
	--hf-split validation \\
	--hf-config CONFIGURATION \\
	--model YOUR_MODEL \\
	--provider YOUR_PROVIDER \\
	--agent default \\
	--max-concurrent-scenarios 2 \\
	--scenario-timeout 300 \\
	--output-dir ./monitored_test_results \\
	--hf-upload YOUR_HUB_DATASET_TO_SAVE_RESULTS
	```
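
Since the command has to be repeated once per configuration, a small script can generate the seven invocations for you. The sketch below is only an illustration: `build_gaia2_commands` is a hypothetical helper (it is not part of the `are-benchmark` CLI), it reuses only the flags shown in the command above, and it assumes that the same output directory and the same Hub dataset can be reused for every configuration.

```python
import shlex

# Configuration names copied from the instructions above.
CONFIGURATIONS = [
    "adaptability",
    "mini_noise",
    "time",
    "execution",
    "ambiguity",
    "mini_agent2agent",
    "search",
]


def build_gaia2_commands(model, provider, hub_dataset):
    # Hypothetical helper: returns one argument list per configuration,
    # using only the flags that appear in the command shown above.
    commands = []
    for configuration in CONFIGURATIONS:
        commands.append([
            "are-benchmark", "gaia2-run",
            "--hf", "meta-agents-research-environments/gaia2",
            "--hf-split", "validation",
            "--hf-config", configuration,
            "--model", model,
            "--provider", provider,
            "--agent", "default",
            "--max-concurrent-scenarios", "2",
            "--scenario-timeout", "300",
            "--output-dir", "./monitored_test_results",
            "--hf-upload", hub_dataset,
        ])
    return commands


if __name__ == "__main__":
    placeholders = ("YOUR_MODEL", "YOUR_PROVIDER", "YOUR_HUB_DATASET_TO_SAVE_RESULTS")
    for command in build_gaia2_commands(*placeholders):
        # Print a shell-quoted command line for review before running it.
        print(shlex.join(command))
```

The script only prints the commands so that you can review them first; to execute them from Python, each argument list can be passed to `subprocess.run(command, check=True)`.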

	Add all the relevant information about your model in the README!

	Finally, log in to this page, fill in the requested information, and provide the path to your submission dataset.
	"""

	CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
	CITATION_BUTTON_TEXT = r"""@misc{andrews2025arescalingagentenvironments,
	title={ARE: Scaling Up Agent Environments and Evaluations},
	author={Pierre Andrews and Amine Benhalloum and Gerard Moreno-Torres Bertran and Matteo Bettini and Amar Budhiraja and Ricardo Silveira Cabral and Virginie Do and Romain Froger and Emilien Garreau and Jean-Baptiste Gaya and Hugo Laurençon and Maxime Lecanu and Kunal Malkan and Dheeraj Mekala and Pierre Ménard and Grégoire Mialon and Ulyana Piterbarg and Mikhail Plekhanov and Mathieu Rita and Andrey Rusakov and Thomas Scialom and Vladislav Vorotilov and Mengjue Wang and Ian Yu},
	year={2025},
	eprint={2509.17158},
	archivePrefix={arXiv},
	primaryClass={cs.AI},
	url={https://arxiv.org/abs/2509.17158},
	}"""