---
title: MCP-Bench Leaderboard
emoji: 🏆
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
short_description: Leaderboard for MCP-Bench
tags:
  - benchmark
  - leaderboard
  - llm
  - mcp
  - evaluation
  - performance
  - tool-use
  - agents
---

# MCP-Bench Leaderboard

A modern, interactive web application that displays performance metrics for large language models (LLMs) evaluated on MCP-Bench.

## 🏆 Features

- **Interactive Leaderboard**: Sort by any metric column
- **Real-time Search**: Filter models by name
- **Responsive Design**: Optimized for desktop and mobile
- **Visual Indicators**: Color-coded performance levels and progress bars
- **Modern UI**: Clean, professional Material Design interface
- **Dark Mode Support**: Automatic dark/light theme detection
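
The sorting and search features listed above could be implemented along these lines. This is a minimal sketch, not the actual contents of `script.js`: the function names `sortModels` and `filterModels` and the sample entries are hypothetical, and the field names follow the `data.json` example in this README.

```javascript
// Hypothetical helpers; the real script.js may be organized differently.

// Return a new array sorted by a numeric metric column.
function sortModels(models, key, descending = true) {
  const direction = descending ? -1 : 1;
  // Copy first so the caller's array is not mutated.
  return [...models].sort((a, b) => direction * (a[key] - b[key]));
}

// Case-insensitive substring match on the model name.
function filterModels(models, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [...models];
  return models.filter((m) => m.name.toLowerCase().includes(needle));
}

// Illustrative data only, not real benchmark results.
const sampleModels = [
  { name: "model-a", overall_score: 0.61 },
  { name: "model-b", overall_score: 0.75 },
  { name: "other-c", overall_score: 0.58 },
];

console.log(sortModels(sampleModels, "overall_score").map((m) => m.name));
console.log(filterModels(sampleModels, "MODEL").map((m) => m.name));
```

Both helpers return new arrays, so the table can be re-rendered from the result without touching the loaded data.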

## 📊 Metrics Displayed

The leaderboard shows comprehensive performance metrics:

- **Overall Score**: Combined performance metric
- **Valid Tool Schema**: Percentage of valid tool schemas
- **Compliance**: Rule compliance percentage  
- **Task Success**: Task completion success rate
- **Schema Understanding**: Understanding of tool schemas
- **Task Completion**: Task completion effectiveness
- **Tool Usage**: Tool utilization efficiency
- **Planning Effectiveness**: Planning and execution quality
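
To drive the progress bars and color-coded levels, each metric has to be mapped onto a common scale. The sketch below assumes scores on a 0-1 scale and percentages on a 0-100 scale, as in the `data.json` example in this README. The names `toPercent`, `performanceLevel`, and `LEVELS`, and the threshold values, are hypothetical; the cutoffs the leaderboard actually uses are not documented here.

```javascript
// Hypothetical thresholds, checked from highest to lowest.
const LEVELS = [
  { min: 80, label: "high" },
  { min: 50, label: "medium" },
  { min: 0, label: "low" },
];

// Convert a raw metric to a 0-100 percentage, clamped for use as a bar width.
function toPercent(value, maxValue) {
  const percent = (value / maxValue) * 100;
  return Math.min(100, Math.max(0, percent));
}

// Map a percentage to a level label, for example to pick a CSS class.
function performanceLevel(percent) {
  if (!Number.isFinite(percent)) return "unknown";
  const level = LEVELS.find((entry) => percent >= entry.min);
  return level ? level.label : "low";
}

console.log(performanceLevel(toPercent(0.75, 1)));   // overall score on a 0-1 scale
console.log(performanceLevel(toPercent(99.5, 100))); // percentage metric
```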

## 🚀 Quick Start

### Local Development

1. Clone this repository
2. Open `index.html` in your web browser
3. Or serve using a local HTTP server:

```bash
# Using Python
python -m http.server 8000

# Using Node.js
npx serve .

# Using PHP
php -S localhost:8000
```

### Hugging Face Spaces Deployment

This project is optimized for deployment on Hugging Face Spaces:

1. Create a new Space on [Hugging Face](https://huggingface.co/spaces)
2. Choose **Gradio** as the SDK
3. Upload all files to your Space
4. Rename `requirements-hf.txt` to `requirements.txt`
5. Your Space will automatically build and deploy

The `app.py` file provides Gradio integration for Hugging Face Spaces compatibility.

## 📁 Project Structure

```
mcp-bench-leaderboard/
├── index.html          # Main HTML page
├── style.css           # Responsive CSS styling
├── script.js           # Interactive JavaScript functionality
├── data.json           # Leaderboard data
├── app.py              # Gradio app for HF Spaces
├── requirements-hf.txt # Dependencies for HF deployment
└── README.md           # Documentation
```

## 🎨 Customization

### Update Data

Modify `data.json` to add new models or update scores:

```json
{
  "lastUpdated": "2025-09-05",
  "models": [
    {
      "name": "your-model-name",
      "overall_score": 0.750,
      "valid_tool_schema": 99.5,
      "compliance": 98.2,
      // ... other metrics
    }
  ]
}
```
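
Before committing a new entry, it can help to check it against the expected shape. The following is a minimal sketch of such a check. The function name `validateModel` is hypothetical, only the fields shown in the example above are checked, and the value ranges (0-1 for `overall_score`, 0-100 for the percentage fields) are assumptions inferred from the example values.

```javascript
// Assumed ranges, inferred from the example values in this README.
const FIELD_RANGES = {
  overall_score: [0, 1],
  valid_tool_schema: [0, 100],
  compliance: [0, 100],
};

// Return a list of problems found in one entry of the "models" array.
// An empty list means the entry passed every check.
function validateModel(model) {
  const errors = [];
  if (typeof model.name !== "string" || model.name.trim() === "") {
    errors.push("name must be a non-empty string");
  }
  for (const [field, [min, max]] of Object.entries(FIELD_RANGES)) {
    const value = model[field];
    const isNumber = typeof value === "number" && Number.isFinite(value);
    if (!isNumber || value < min || value > max) {
      errors.push(`${field} must be a number between ${min} and ${max}`);
    }
  }
  return errors;
}

console.log(validateModel({ name: "your-model-name", overall_score: 75 }));
```

Note that JSON does not allow comments or trailing commas, so the `// ... other metrics` placeholder in the example must be replaced with real fields before the file will parse.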

### Styling

Edit `style.css` to customize:
- Colors and themes
- Layout and spacing
- Responsive breakpoints
- Animation effects

### Functionality

Extend `script.js` to add:
- New sorting algorithms
- Additional filtering options
- Export functionality
- Chart visualizations
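
As one illustration of the export idea above, the sketch below converts model entries to CSV text. The function names `csvEscape` and `modelsToCsv` and the sample row are hypothetical and not part of the existing `script.js`.

```javascript
// Quote a value if it contains a comma, double quote, or line break,
// doubling any embedded double quotes.
function csvEscape(value) {
  const text = String(value ?? "");
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with a header row followed by one row per model.
function modelsToCsv(models, columns) {
  const header = columns.map(csvEscape).join(",");
  const rows = models.map((model) =>
    columns.map((column) => csvEscape(model[column])).join(",")
  );
  return [header, ...rows].join("\n");
}

// Illustrative data only, not real benchmark results.
const exportSample = [{ name: "model, v2", overall_score: 0.75 }];
console.log(modelsToCsv(exportSample, ["name", "overall_score"]));
```

In the browser, the resulting string could be wrapped in a `Blob` and offered as a download through a temporary link; that wiring is left out here to keep the sketch independent of the page markup.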

## 🌐 Browser Support

- Chrome 60+
- Firefox 55+
- Safari 12+
- Edge 79+

## 📱 Mobile Compatibility

The application is fully responsive and optimized for:
- Mobile phones (320px - 767px)
- Tablets (768px - 1024px)
- Large screens (1200px+)

## 🔧 Technical Details

- **Pure Frontend**: No backend dependencies
- **Vanilla JavaScript**: No frameworks required
- **Modern CSS**: Flexbox, Grid, CSS Variables
- **Progressive Enhancement**: Works without JavaScript
- **SEO Friendly**: Semantic HTML structure

## 📈 Performance

- Lightweight (~50KB total)
- Fast loading times
- Optimized images and assets
- Efficient DOM updates
- Smooth animations

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test across browsers
5. Submit a pull request

## 📄 License

This project is open source and available under the MIT License.

## 🙏 Acknowledgments

- Data sourced from MCP Benchmark Results
- Icons from Font Awesome
- Fonts from Google Fonts
- Hosted on Hugging Face Spaces

---

*Last updated: September 2025*