title: Browsergym_env Environment Server
emoji: π
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
Browsergym_env Environment Server
FastAPI server for the browsergym_env environment, powered by Meta's OpenEnv.
About
This Space provides a containerized environment for browsergym_env interactions, built with FastAPI and the OpenEnv framework.
Web Interface
This deployment includes an interactive web interface for exploring the environment:
- HumanAgent Interface: Interact with the environment using a web form
- State Observer: Real-time view of environment state and action history
- Live Updates: WebSocket-based real-time updates
Access the web interface at: /web
API Documentation
Visit /docs for interactive API documentation.
Health Check
The environment provides a health check endpoint at /health.
BrowserGym Environment
BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.
Why BrowserGym?
BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.
What are these benchmarks?
MiniWoB++ (Training): 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. No external setup needed - tasks run in isolated browser sessions.
WebArena (Evaluation): 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multi-step, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. Requires running 7 backend services (shopping site, GitLab instance, etc).
VisualWebArena: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.
WorkArena: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.
The training → evaluation pipeline:
- Train on MiniWoB (simple, controlled, fast iterations)
- Evaluate on WebArena (complex, realistic, measures real-world capability)
Key advantage: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.
Quick Start - Training (MiniWoB)
No Setup Required!
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for MiniWoB training task
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-test",  # or "click-button", "click-dialog", etc.
    }
)

# Train your agent!
# Note: `agent` is your own policy object; it is not provided by this package.
for episode in range(1000):
    result = env.reset()
    print(f"Goal: {result.observation.goal}")
    done = False
    while not done:
        # Your agent decides what to do
        action_str = agent.get_action(result.observation.text)
        action = BrowserGymAction(action_str=action_str)
        result = env.step(action)
        done = result.done
    print(f"Reward: {result.reward}")

env.close()
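The loop above calls `agent.get_action(...)` without defining `agent`. Below is a minimal sketch of a baseline that could fill that slot. `RandomClickAgent` is a hypothetical name, not part of the package, and the sketch assumes element ids appear in square brackets in the text observation; the real observation format may differ.

```python
import random
import re


class RandomClickAgent:
    """Hypothetical baseline: clicks a random element id found in the observation text."""

    # Assumption: element ids show up as "[id]" prefixes, e.g. "[13] button 'Submit'".
    _ID_PATTERN = re.compile(r"\[(\w+)\]")

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def get_action(self, observation_text: str) -> str:
        ids = self._ID_PATTERN.findall(observation_text)
        if not ids:
            # Nothing clickable was found, so scroll to reveal more of the page.
            return "scroll('down')"
        return f"click('{self._rng.choice(ids)}')"


agent = RandomClickAgent(seed=0)
```

A random policy like this is mainly useful for checking that the environment loop runs end to end before plugging in a learned agent.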
Available Tasks by Benchmark
MiniWoB++ Tasks (Training - 100+ tasks)
MiniWoB tasks are organized by difficulty and type. Here are the main categories:
Click Tasks (Basic interaction)
| Task Name | Description | Difficulty |
|---|---|---|
| click-test | Click a single button | Easy |
| click-button | Click button with specific text | Easy |
| click-button-sequence | Click buttons in order | Medium |
| click-checkboxes | Select specific checkboxes | Medium |
| click-checkboxes-soft | Select checkboxes (multiple valid) | Medium |
| click-checkboxes-large | Many checkboxes to select from | Medium |
| click-checkboxes-transfer | Transfer learning variation | Medium |
| click-dialog | Click correct button in dialog | Easy |
| click-dialog-2 | More complex dialog | Medium |
| click-link | Click on a link | Easy |
| click-option | Select from dropdown | Medium |
| click-pie | Click on pie chart slice | Medium |
| click-scroll-list | Click item in scrollable list | Hard |
| click-shades | Click on specific color shade | Medium |
| click-shape | Click on specific shape | Medium |
| click-tab | Switch between tabs | Medium |
| click-tab-2 | More complex tab switching | Hard |
| click-widget | Click on UI widget | Medium |
Text Entry Tasks (Typing and forms)
| Task Name | Description | Difficulty |
|---|---|---|
| enter-text | Type text into input field | Easy |
| enter-text-dynamic | Dynamic text entry | Medium |
| enter-text-2 | Multiple text fields | Medium |
| enter-password | Fill password field | Easy |
| enter-date | Enter a date | Medium |
| enter-time | Enter a time | Medium |
| login-user | Complete login form | Medium |
| login-user-popup | Login via popup | Hard |
Navigation Tasks (Multi-step interaction)
| Task Name | Description | Difficulty |
|---|---|---|
| navigate-tree | Navigate through tree structure | Hard |
| search-engine | Use search interface | Medium |
| use-autocomplete | Interact with autocomplete | Hard |
| book-flight | Book a flight (complex form) | Very Hard |
| choose-date | Pick date from calendar | Hard |
| choose-date-easy | Simplified date picker | Medium |
| choose-date-medium | Medium difficulty date picker | Hard |
| choose-list | Select from long list | Medium |
Visual/Spatial Tasks (Requires visual understanding)
| Task Name | Description | Difficulty |
|---|---|---|
| count-sides | Count sides of shape | Medium |
| count-shape | Count specific shapes | Medium |
| find-word | Find word in text | Medium |
| focus-text | Focus on text element | Easy |
| focus-text-2 | More complex focus task | Medium |
| grid-coordinate | Click grid coordinate | Medium |
| guess-number | Guess a number game | Hard |
| identify-shape | Identify shape type | Medium |
| read-table | Extract info from table | Hard |
| read-table-2 | More complex table reading | Hard |
Email/Social Tasks (Realistic scenarios)
| Task Name | Description | Difficulty |
|---|---|---|
| email-inbox | Manage email inbox | Very Hard |
| email-inbox-forward | Forward emails | Very Hard |
| email-inbox-nl | Natural language email task | Very Hard |
| email-inbox-star-reply | Star and reply to emails | Very Hard |
| social-media | Social media interaction | Very Hard |
| social-media-some | Partial social media task | Hard |
Total: 100+ tasks across all categories
Usage:
# Easy task for quick testing
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"})
# Medium difficulty for training
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"})
# Hard task for evaluation
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"})
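The difficulty labels in the tables above suggest a simple curriculum: start with easy tasks and move up. A minimal sketch of how the `environment` dicts for such a curriculum could be generated follows; `MINIWOB_CURRICULUM`, `miniwob_environment`, and `curriculum` are illustrative names, and only a few task names per level are copied from the tables.

```python
from typing import Dict, Iterator, List, Sequence

# A few task names per difficulty level, copied from the tables above (not the full list).
MINIWOB_CURRICULUM: Dict[str, List[str]] = {
    "easy": ["click-test", "click-button", "click-dialog", "enter-text"],
    "medium": ["click-checkboxes", "click-option", "enter-date", "login-user"],
    "hard": ["click-scroll-list", "navigate-tree", "choose-date", "read-table"],
    "very_hard": ["book-flight", "email-inbox", "social-media"],
}


def miniwob_environment(task_name: str) -> Dict[str, str]:
    """Build the `environment` dict for a single MiniWoB task."""
    return {"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": task_name}


def curriculum(
    levels: Sequence[str] = ("easy", "medium", "hard", "very_hard"),
) -> Iterator[Dict[str, str]]:
    """Yield environment dicts in order of increasing difficulty."""
    for level in levels:
        for task_name in MINIWOB_CURRICULUM[level]:
            yield miniwob_environment(task_name)
```

Each yielded dict can be passed as the `environment` argument shown in the usage lines above.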
WebArena Tasks (Evaluation - 812 tasks)
WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811.
By Website:
| Website | Task Count | Description | Example Tasks |
|---|---|---|---|
| Shopping | ~200 | E-commerce site | Search products, add to cart, checkout |
| Shopping Admin | ~150 | Admin panel | Manage products, orders, customers |
| Reddit | ~150 | Forum/social | Post, comment, search discussions |
| GitLab | ~200 | Code repository | Create issues, merge requests, review code |
| Wikipedia | ~100 | Knowledge base | Search, read, extract information |
| Map | ~12 | Location service | Find places, get directions |
By Difficulty:
| Difficulty | Task Count | Steps Required | Example |
|---|---|---|---|
| Easy | ~200 | 1-5 steps | "Find the price of product X" |
| Medium | ~400 | 5-15 steps | "Add cheapest laptop to cart" |
| Hard | ~212 | 15+ steps | "Create merge request for bug fix" |
Usage:
# Task 0 (usually easy)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "0",
    "SHOPPING": "http://your-server:7770",
    # ... other URLs
})

# Task 156 (GitLab merge request)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "156",
    # ... URLs
})
Note: WebArena tasks require the full backend infrastructure. See WebArena setup guide.
VisualWebArena Tasks (910 tasks)
Similar to WebArena but requires visual understanding. Tasks involve:
- Image-based reasoning
- Visual element identification
- Multimodal interaction (text + images)
WorkArena Tasks
Enterprise software automation tasks:
- CRM operations
- Project management
- Business workflows
Full task lists:
Evaluation (WebArena)
Prerequisites
WebArena requires setting up backend infrastructure. See the WebArena documentation.
Usage
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for WebArena evaluation
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",  # Task ID
        # WebArena backend URLs (required)
        "SHOPPING": "http://your-server:7770",
        "SHOPPING_ADMIN": "http://your-server:7780/admin",
        "REDDIT": "http://your-server:9999",
        "GITLAB": "http://your-server:8023",
        "MAP": "http://your-server:3000",
        "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "HOMEPAGE": "http://your-server:4399",
    }
)

# Evaluate your trained agent (`agent` is your own policy object)
result = env.reset()
while not result.done:
    # Pass the text observation, as in the training loop
    action_str = agent.get_action(result.observation.text)
    action = BrowserGymAction(action_str=action_str)
    result = env.step(action)
print(f"Success: {result.reward}")

env.close()
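Because the seven backend URLs differ only by port and path, the `environment` dict can be generated from a single base URL. This is a sketch under the assumption that all services run on one host with the ports shown above; `webarena_environment` is a hypothetical helper, not part of the package.

```python
from typing import Dict

# Path copied from the WIKIPEDIA URL in the example above.
_WIKIPEDIA_PATH = "/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"


def webarena_environment(base_url: str, task_id: int) -> Dict[str, str]:
    """Build the `environment` dict for one WebArena task on a single-host deployment."""
    if not 0 <= task_id <= 811:
        # WebArena tasks are numbered 0-811.
        raise ValueError(f"task_id must be in 0..811, got {task_id}")
    base = base_url.rstrip("/")
    return {
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": str(task_id),
        "SHOPPING": f"{base}:7770",
        "SHOPPING_ADMIN": f"{base}:7780/admin",
        "REDDIT": f"{base}:9999",
        "GITLAB": f"{base}:8023",
        "MAP": f"{base}:3000",
        "WIKIPEDIA": f"{base}:8888{_WIKIPEDIA_PATH}",
        "HOMEPAGE": f"{base}:4399",
    }
```

For example, `webarena_environment("http://your-server", 0)` reproduces the dict used in the evaluation example.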
Building the Docker Image
Prerequisites
- Base Image: Build the OpenEnv base image first:
# From the OpenEnv repository root
docker build -t openenv-base:latest -f src/core/containers/images/Dockerfile .
Build the BrowserGym Environment
# From the OpenEnv repository root
docker build -t browsergym-env:latest -f src/envs/browsergym_env/server/Dockerfile .
Run the Server
For MiniWoB (Training):
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="miniwob" \
  -e BROWSERGYM_TASK_NAME="click-test" \
  browsergym-env:latest
For WebArena (Evaluation):
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="webarena" \
  -e BROWSERGYM_TASK_NAME="0" \
  -e SHOPPING="http://your-server:7770" \
  -e SHOPPING_ADMIN="http://your-server:7780/admin" \
  -e REDDIT="http://your-server:9999" \
  -e GITLAB="http://your-server:8023" \
  -e MAP="http://your-server:3000" \
  -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
  -e HOMEPAGE="http://your-server:4399" \
  browsergym-env:latest
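After `docker run`, the server may take a moment to start. One way to wait for it is to poll the `/health` endpoint. The sketch below assumes a healthy server answers with HTTP 200 (the response body is not inspected); `http_ok` and `wait_for_health` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_health(
    base_url: str = "http://localhost:8000",
    attempts: int = 30,
    delay_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `<base_url>/health` until it succeeds or the attempts run out."""
    health_url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if probe(health_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

The `probe` parameter exists so the polling logic can be exercised without a running server, for instance in unit tests.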
Environment Details
Action
Actions in BrowserGym are strings that describe browser operations using a function-call syntax:
from envs.browsergym_env import BrowserGymAction
# Click actions
action = BrowserGymAction(action_str="click('Submit button')")
action = BrowserGymAction(action_str="click('element_id_123')")
# Type actions
action = BrowserGymAction(action_str="fill('username', 'john@example.com')")
action = BrowserGymAction(action_str="fill('password', 'secret123')")
# Navigate actions
action = BrowserGymAction(action_str="goto('https://example.com')")
# Keyboard actions
action = BrowserGymAction(action_str="press('Enter')")
action = BrowserGymAction(action_str="press('Tab')")
# Scroll actions
action = BrowserGymAction(action_str="scroll('down')")
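Building these strings by hand is error-prone when values contain quotes. A minimal sketch of helper functions that quote arguments with Python's `repr` is shown below. The helper names are hypothetical, and the sketch assumes the server accepts any valid Python string literal as an argument, which the examples above suggest but do not guarantee.

```python
def click(target: str) -> str:
    """Build a click action string, e.g. click('Submit button')."""
    return f"click({target!r})"


def fill(field: str, value: str) -> str:
    """Build a fill action string; repr() handles embedded quotes."""
    return f"fill({field!r}, {value!r})"


def goto(url: str) -> str:
    """Build a navigation action string."""
    return f"goto({url!r})"


def press(key: str) -> str:
    """Build a keyboard action string."""
    return f"press({key!r})"


def scroll(direction: str) -> str:
    """Build a scroll action string."""
    return f"scroll({direction!r})"
```

The resulting strings can be passed as `action_str`, for example `BrowserGymAction(action_str=fill("username", "john@example.com"))`.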
Observation
Observations contain multiple modalities:
result = env.step(action)
obs = result.observation
# Text observations
print(obs.text) # Primary text representation (AXTree or DOM)
print(obs.axtree_txt) # Accessibility tree
print(obs.pruned_html) # Pruned HTML (interactive elements only)
# Page metadata
print(obs.url) # Current URL
print(obs.goal) # Task goal/instruction
# Visual (if enabled)
if obs.screenshot is not None:
    print(obs.screenshot.shape)  # [height, width, channels]
# Error handling
if obs.last_action_error:
    print(f"Action failed: {obs.error}")
# Episode status
print(obs.done) # True if episode ended
print(obs.reward) # Reward for the step
# Access full BrowserGym data (includes timestamps, etc.)
print(obs.metadata["browsergym_obs"]) # Full observation dict from BrowserGym
print(obs.metadata["browsergym_info"]) # Full info dict (timestamps, page state, etc.)
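Since several text representations are available, an agent usually needs to pick one and keep it within a size budget. The following sketch shows one way to do that. `ObservationStub` is a stand-in so the example runs on its own (the real observation class comes from the package), and the fallback order and the 4000-character limit are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class ObservationStub:
    """Stand-in holding only the text fields listed above."""

    text: str = ""
    axtree_txt: str = ""
    pruned_html: str = ""


def prompt_text(obs, max_chars: int = 4000) -> str:
    """Return the first non-empty text representation, truncated to max_chars."""
    for candidate in (obs.text, obs.axtree_txt, obs.pruned_html):
        if candidate:
            return candidate[:max_chars]
    return ""
```

Any object exposing `text`, `axtree_txt`, and `pruned_html` attributes works with `prompt_text`.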
Advanced: Accessing Raw BrowserGym Data
For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in metadata:
result = env.step(action)
# Access timestamps (if available)
info = result.observation.metadata["browsergym_info"]
if "timestamp" in info:
    print(f"Action timestamp: {info['timestamp']}")
# Access additional observation fields
obs_dict = result.observation.metadata["browsergym_obs"]
if "dom_object" in obs_dict:
    dom = obs_dict["dom_object"]
    # Work with raw DOM object
# Access page performance data
if "performance" in info:
    print(f"Page load time: {info['performance']}")
State
The environment state tracks progress:
state = env.state()
print(f"Benchmark: {state.benchmark}") # 'miniwob', 'webarena', etc.
print(f"Task: {state.task_name}") # Task name/ID
print(f"Episode: {state.episode_id}") # Unique episode ID
print(f"Steps: {state.step_count}") # Number of steps taken
print(f"Total Reward: {state.cum_reward}") # Cumulative reward
print(f"Goal: {state.goal}") # Task instruction
print(f"URL: {state.current_url}") # Current page URL
Configuration
Environment variables:
Common Settings
- BROWSERGYM_BENCHMARK: Benchmark to use (miniwob, webarena, visualwebarena, workarena)
- BROWSERGYM_TASK_NAME: Specific task name (optional, will use first available if not set)
- BROWSERGYM_HEADLESS: Run browser in headless mode (default: true)
- BROWSERGYM_VIEWPORT_WIDTH: Browser viewport width (default: 1280)
- BROWSERGYM_VIEWPORT_HEIGHT: Browser viewport height (default: 720)
- BROWSERGYM_TIMEOUT: Action timeout in milliseconds (default: 10000)
WebArena-Specific (only needed for WebArena benchmark)
- SHOPPING: Shopping website URL
- SHOPPING_ADMIN: Shopping admin panel URL
- REDDIT: Reddit-like forum URL
- GITLAB: GitLab instance URL
- MAP: Map service URL
- WIKIPEDIA: Wikipedia instance URL
- HOMEPAGE: Homepage URL
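To illustrate how the common settings and their documented defaults fit together, here is a sketch of parsing them from a mapping such as `os.environ`. It is not the server's actual parsing code. `BrowserGymSettings` and `load_settings` are hypothetical names, and the accepted spellings for the headless flag are an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_BENCHMARKS = ("miniwob", "webarena", "visualwebarena", "workarena")


@dataclass(frozen=True)
class BrowserGymSettings:
    benchmark: str
    task_name: Optional[str]
    headless: bool
    viewport_width: int
    viewport_height: int
    timeout_ms: int


def load_settings(env: Mapping[str, str]) -> BrowserGymSettings:
    """Read the common settings, applying the defaults documented above."""
    benchmark = env.get("BROWSERGYM_BENCHMARK", "")
    if benchmark not in SUPPORTED_BENCHMARKS:
        raise ValueError(f"BROWSERGYM_BENCHMARK must be one of {SUPPORTED_BENCHMARKS}")
    # Assumption: "true", "1", and "yes" (case-insensitive) all enable headless mode.
    headless = env.get("BROWSERGYM_HEADLESS", "true").strip().lower() in ("true", "1", "yes")
    return BrowserGymSettings(
        benchmark=benchmark,
        task_name=env.get("BROWSERGYM_TASK_NAME"),  # None means "first available"
        headless=headless,
        viewport_width=int(env.get("BROWSERGYM_VIEWPORT_WIDTH", "1280")),
        viewport_height=int(env.get("BROWSERGYM_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSERGYM_TIMEOUT", "10000")),
    )
```

Calling `load_settings(os.environ)` after `import os` would read the values from the current process environment.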
Supported Benchmarks
1. MiniWoB++ (Training) - Recommended for Training
- 100+ tasks ranging from simple (click buttons) to complex (form filling, navigation)
- Fast: Instant resets, quick episodes
- Randomized: Task variations for generalization
- No setup: Works out-of-the-box
- Dense rewards: Immediate feedback for learning
Use Case: Train agents on fundamental web navigation skills
2. WebArena (Evaluation) - Benchmark
- 812 realistic tasks across 6 websites
- Complex: Multi-step reasoning, real web interfaces
- Requires setup: Need to run 7 backend services
- Sparse rewards: Binary success/failure
- Evaluation-focused: Test real-world performance
Use Case: Evaluate agents on realistic web tasks
3. VisualWebArena (Evaluation) - Visual Benchmark
- 910 tasks requiring visual understanding
- Multimodal: Both text and visual observations
- Requires setup: Similar to WebArena
- Challenging: Requires visual reasoning
Use Case: Test visual web navigation capabilities
4. WorkArena (Evaluation) - Enterprise Benchmark
- Enterprise tasks: CRM, project management, etc.
- Realistic workflows: Real enterprise software
- Requires setup: Enterprise software instances
Use Case: Evaluate on business automation tasks
Typical Training Pipeline
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Note: `agent` is a placeholder for your own agent; `train` and `evaluate`
# stand in for your own training and evaluation code.

# Stage 1: Train on MiniWoB (simple tasks, fast)
train_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-button",
    }
)

# Train your agent (RL, imitation learning, etc.)
agent.train(train_env, num_episodes=10000)
train_env.close()

# Stage 2: Evaluate on WebArena (complex tasks, realistic)
eval_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",
        # ... WebArena URLs
    }
)

# Test performance
success_rate = agent.evaluate(eval_env, num_tasks=812)
print(f"WebArena Success Rate: {success_rate:.2%}")
eval_env.close()
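The success rate printed above has to be aggregated over many task IDs. A minimal sketch of that aggregation follows, assuming a reward greater than zero marks a successful episode (WebArena rewards are described as binary success/failure). `run_episode` is a hypothetical callback that would create an environment for the given task ID, run the agent, and return the final reward.

```python
from typing import Callable, Iterable


def success_rate(task_ids: Iterable[int], run_episode: Callable[[int], float]) -> float:
    """Run one episode per task ID and return the fraction that succeeded."""
    rewards = [run_episode(task_id) for task_id in task_ids]
    if not rewards:
        return 0.0
    # Assumption: any positive final reward counts as success.
    successes = sum(1 for reward in rewards if reward > 0)
    return successes / len(rewards)
```

Passing `range(812)` as `task_ids` would cover the full WebArena task range of 0-811.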
Development & Testing
Running Tests
# From the OpenEnv repository root
pytest tests/envs/test_browsergym_env.py
Local Development
# Install in development mode
cd /path/to/OpenEnv
pip install -e .
# Install BrowserGym
pip install browsergym browsergym-miniwob browsergym-webarena
# Run the server locally
cd src/envs/browsergym_env/server
export BROWSERGYM_BENCHMARK=miniwob
export BROWSERGYM_TASK_NAME=click-test
python app.py
Project Structure
browsergym_env/
├── __init__.py                    # Module exports
├── models.py                      # Action, Observation, State dataclasses
├── client.py                      # HTTPEnvClient implementation
├── README.md                      # This file
└── server/
    ├── __init__.py
    ├── app.py                     # FastAPI application
    ├── browsergym_environment.py  # Environment implementation
    ├── Dockerfile                 # Container specification
    └── requirements.txt           # Python dependencies