---
title: MCP-Bench Leaderboard
emoji: π
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
short_description: Leaderboard for MCP-Bench
tags:
- benchmark
- leaderboard
- llm
- mcp
- evaluation
- performance
- tool-use
- agents
---
# MCP-Bench Leaderboard
A modern, interactive web application that displays performance metrics for large language models (LLMs) evaluated on MCP-Bench.
## Features
- **Interactive Leaderboard**: Sort by any metric column
- **Real-time Search**: Filter models by name
- **Responsive Design**: Optimized for desktop and mobile
- **Visual Indicators**: Color-coded performance levels and progress bars
- **Modern UI**: Clean, professional Material Design interface
- **Dark Mode Support**: Automatic dark/light theme detection
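The sorting and search features listed above could be implemented along the following lines. This is a minimal sketch, not the project's actual `script.js`: the function names `sortModels` and `filterModels` are hypothetical, the field names follow the `data.json` example in this README, and the sample values are made up for illustration.

```javascript
// Hypothetical helpers; the real script.js may be organized differently.

// Return a new array sorted by a numeric metric key (descending by default).
function sortModels(models, key, descending = true) {
  const sign = descending ? -1 : 1;
  // Copy first so the caller's array is not mutated.
  return [...models].sort((a, b) => sign * (a[key] - b[key]));
}

// Return the models whose name contains the query, ignoring case.
function filterModels(models, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [...models];
  return models.filter((m) => m.name.toLowerCase().includes(needle));
}

// Illustrative data only; these are not real benchmark results.
const sample = [
  { name: "model-a", overall_score: 0.61 },
  { name: "model-b", overall_score: 0.75 },
  { name: "other-c", overall_score: 0.58 },
];

console.log(sortModels(sample, "overall_score").map((m) => m.name));
console.log(filterModels(sample, "MODEL").map((m) => m.name));
```

Keeping both helpers free of DOM access means they can be tested in Node.js and reused by whatever rendering code draws the table.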
## Metrics Displayed
The leaderboard shows the following performance metrics:
- **Overall Score**: Combined performance metric
- **Valid Tool Schema**: Percentage of valid tool schemas
- **Compliance**: Rule compliance percentage
- **Task Success**: Task completion success rate
- **Schema Understanding**: Understanding of tool schemas
- **Task Completion**: Task completion effectiveness
- **Tool Usage**: Tool utilization efficiency
- **Planning Effectiveness**: Planning and execution quality
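In the `data.json` example in this README, `overall_score` is stored as a fraction (0.750) while `valid_tool_schema` and `compliance` are stored as percentages (99.5, 98.2). A display helper might normalize both to a percentage and map the result to a color-coded level. The sketch below is an assumption about how that could work: `toPercent` and `performanceLevel` are hypothetical names, and the thresholds of 80 and 50 are illustrative, not taken from the project.

```javascript
// Hypothetical helpers; thresholds and names are assumptions for illustration.

// Convert a metric to a 0-100 percentage. `scale` states how the value is stored:
// "fraction" for 0-1 values (like overall_score), "percent" for 0-100 values.
function toPercent(value, scale) {
  const pct = scale === "fraction" ? value * 100 : value;
  // Clamp so a progress bar never overflows its track.
  return Math.min(100, Math.max(0, pct));
}

// Map a percentage to a level name that a stylesheet could color.
function performanceLevel(pct, high = 80, medium = 50) {
  if (pct >= high) return "high";
  if (pct >= medium) return "medium";
  return "low";
}

console.log(performanceLevel(toPercent(0.75, "fraction"))); // "medium"
console.log(performanceLevel(toPercent(99.5, "percent"))); // "high"
```

Passing the scale explicitly avoids guessing from the value itself, which would misread a genuine 0.5% as a fraction.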
## Quick Start
### Local Development
1. Clone this repository
2. Open `index.html` in your web browser
3. Alternatively, serve the directory with a local HTTP server (use any one of the commands below):
```bash
# Using Python
python -m http.server 8000
# Using Node.js
npx serve .
# Using PHP
php -S localhost:8000
```
### Hugging Face Spaces Deployment
This project can be deployed on Hugging Face Spaces:
1. Create a new Space on [Hugging Face](https://huggingface.co/spaces)
2. Choose **Gradio** as the SDK
3. Upload all files to your Space
4. Rename `requirements-hf.txt` to `requirements.txt`
5. Your Space will automatically build and deploy
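The `app.py` file provides Gradio integration for Hugging Face Spaces compatibility. Note that the front matter of this README declares `sdk: static`, which serves `index.html` directly; set the `sdk` field to match the SDK you choose in step 2.
## Project Structure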
```
mcp-bench-leaderboard/
├── index.html           # Main HTML page
├── style.css            # Responsive CSS styling
├── script.js            # Interactive JavaScript functionality
├── data.json            # Leaderboard data
├── app.py               # Gradio app for HF Spaces
├── requirements-hf.txt  # Dependencies for HF deployment
└── README.md            # Documentation
```
## Customization
### Update Data
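Modify `data.json` to add new models or update scores. The `// ... other metrics` line below is a placeholder for the remaining metric fields; JSON does not allow comments or trailing commas, so remove it and the comma before it in the real file:
```json
{
  "lastUpdated": "2025-09-05",
  "models": [
    {
      "name": "your-model-name",
      "overall_score": 0.750,
      "valid_tool_schema": 99.5,
      "compliance": 98.2,
      // ... other metrics
    }
  ]
}
```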
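Before committing an edited `data.json`, it can help to check that each entry has the expected shape. The following is a minimal sketch, not part of the project: `validateModel` and `validateData` are hypothetical names, and only the keys visible in the example above are checked, since the remaining metric keys are not shown in this README.

```javascript
// Hypothetical validator; the required keys are only those shown in the README example.
const REQUIRED_NUMERIC_KEYS = ["overall_score", "valid_tool_schema", "compliance"];

// Return a list of human-readable problems; an empty list means the entry looks valid.
function validateModel(model) {
  if (model === null || typeof model !== "object") {
    return ["entry must be an object"];
  }
  const problems = [];
  if (typeof model.name !== "string" || model.name.trim() === "") {
    problems.push("name must be a non-empty string");
  }
  for (const key of REQUIRED_NUMERIC_KEYS) {
    if (typeof model[key] !== "number" || !Number.isFinite(model[key])) {
      problems.push(`${key} must be a finite number`);
    }
  }
  return problems;
}

// Validate a whole data.json payload given as text.
function validateData(jsonText) {
  // JSON.parse throws a SyntaxError on comments or trailing commas.
  const data = JSON.parse(jsonText);
  if (!Array.isArray(data.models)) return ["models must be an array"];
  const problems = [];
  data.models.forEach((model, i) => {
    validateModel(model).forEach((p) => problems.push(`models[${i}]: ${p}`));
  });
  return problems;
}

// Example: an entry that is missing the compliance field.
const text = JSON.stringify({
  lastUpdated: "2025-09-05",
  models: [{ name: "your-model-name", overall_score: 0.75, valid_tool_schema: 99.5 }],
});
console.log(validateData(text)); // [ 'models[0]: compliance must be a finite number' ]
```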
### Styling
Edit `style.css` to customize:
- Colors and themes
- Layout and spacing
- Responsive breakpoints
- Animation effects
### Functionality
Extend `script.js` to add:
- New sorting algorithms
- Additional filtering options
- Export functionality
- Chart visualizations
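As one example of such an extension, export functionality could start from a function that turns the model list into CSV text. This is a sketch under assumptions: `csvCell` and `modelsToCsv` are hypothetical names that do not exist in the current `script.js`, and the sample row is illustrative.

```javascript
// Hypothetical export helper; not part of the existing script.js.

// Quote a cell when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function csvCell(value) {
  const text = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text from model objects, one column per key in `columns`.
function modelsToCsv(models, columns) {
  const header = columns.map(csvCell).join(",");
  const rows = models.map((m) => columns.map((c) => csvCell(m[c])).join(","));
  return [header].concat(rows).join("\r\n");
}

// Illustrative row only; not a real benchmark result.
console.log(modelsToCsv([{ name: "model, a", overall_score: 0.75 }], ["name", "overall_score"]));
```

In the browser, the returned string could then be wrapped in a `Blob` and offered as a download through a temporary object URL; that wiring is left out here so the helper stays testable outside the page.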
## Browser Support
- Chrome 60+
- Firefox 55+
- Safari 12+
- Edge 79+
## Mobile Compatibility
The application is fully responsive and optimized for:
- Tablets (768px - 1024px)
- Mobile phones (320px - 767px)
- Large screens (1200px+)
## Technical Details
- **Pure Frontend**: No backend dependencies
- **Vanilla JavaScript**: No frameworks required
- **Modern CSS**: Flexbox, Grid, CSS Variables
- **Progressive Enhancement**: Works without JavaScript
- **SEO Friendly**: Semantic HTML structure
## Performance
- Lightweight (~50KB total)
- Fast loading times
- Optimized images and assets
- Efficient DOM updates
- Smooth animations
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test across browsers
5. Submit a pull request
## License
This project is open source and available under the MIT License.
## Acknowledgments
- Data sourced from MCP Benchmark Results
- Icons from Font Awesome
- Fonts from Google Fonts
- Hosted on Hugging Face Spaces
---
*Last updated: September 2025*