title: MCP-Bench Leaderboard
emoji: 🏆
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
short_description: Leaderboard for MCP-Bench
tags:
  - benchmark
  - leaderboard
  - llm
  - mcp
  - evaluation
  - performance
  - tool-use
  - agents

MCP-Bench Leaderboard

A modern, interactive web application that displays performance metrics for large language models (LLMs) evaluated on MCP-Bench.

🏆 Features

  • Interactive Leaderboard: Sort by any metric column
  • Real-time Search: Filter models by name
  • Responsive Design: Optimized for desktop and mobile
  • Visual Indicators: Color-coded performance levels and progress bars
  • Modern UI: Clean, professional Material Design interface
  • Dark Mode Support: Automatic dark/light theme detection
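
The sorting and search features above can be sketched as two small pure functions. This is a minimal illustration, not the code in script.js: the names `sortModels` and `filterByName` are hypothetical, and the sample records only reuse the `name` and `overall_score` fields from the data.json example.

```javascript
// Hypothetical helpers; the real script.js may be organized differently.

// Return a new array sorted by a numeric metric key (highest first by default).
function sortModels(models, key, descending = true) {
  const sign = descending ? -1 : 1;
  return [...models].sort((a, b) => {
    const av = a[key];
    const bv = b[key];
    // Models missing the metric always sort to the end.
    const aMissing = typeof av !== "number" || Number.isNaN(av);
    const bMissing = typeof bv !== "number" || Number.isNaN(bv);
    if (aMissing && bMissing) return 0;
    if (aMissing) return 1;
    if (bMissing) return -1;
    return sign * (av - bv);
  });
}

// Case-insensitive substring match on the model name.
function filterByName(models, query) {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [...models];
  return models.filter((m) => String(m.name).toLowerCase().includes(needle));
}

// Made-up sample rows for illustration only.
const sample = [
  { name: "model-a", overall_score: 0.61 },
  { name: "model-b", overall_score: 0.75 },
  { name: "other-c", overall_score: 0.68 },
];
const ranked = sortModels(filterByName(sample, "MODEL"), "overall_score");
console.log(ranked.map((m) => m.name)); // [ 'model-b', 'model-a' ]
```

Both functions return new arrays rather than mutating their input, so the original data can be re-filtered or re-sorted whenever the search box or sort column changes.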

📊 Metrics Displayed

The leaderboard reports the following metrics for each model:

  • Overall Score: Combined performance metric
  • Valid Tool Schema: Percentage of valid tool schemas
  • Compliance: Rule compliance percentage
  • Task Success: Task completion success rate
  • Schema Understanding: Understanding of tool schemas
  • Task Completion: Task completion effectiveness
  • Tool Usage: Tool utilization efficiency
  • Planning Effectiveness: Planning and execution quality

🚀 Quick Start

Local Development

  1. Clone this repository
  2. Open index.html in your web browser
  3. Or serve using a local HTTP server:
# Using Python
python -m http.server 8000

# Using Node.js
npx serve .

# Using PHP
php -S localhost:8000

Hugging Face Spaces Deployment

This project is optimized for deployment on Hugging Face Spaces:

  1. Create a new Space on Hugging Face
  2. Choose Gradio as the SDK
  3. Upload all files to your Space
  4. Rename requirements-hf.txt to requirements.txt
  5. Your Space will automatically build and deploy

The app.py file provides Gradio integration for Hugging Face Spaces compatibility.

📁 Project Structure

mcp-bench-leaderboard/
├── index.html          # Main HTML page
├── style.css           # Responsive CSS styling
├── script.js           # Interactive JavaScript functionality
├── data.json           # Leaderboard data
├── app.py              # Gradio app for HF Spaces
├── requirements-hf.txt # Dependencies for HF deployment
└── README.md           # Documentation

🎨 Customization

Update Data

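Modify data.json to add new models or update scores. The comment line in the example below is only a placeholder for the remaining metric fields; leave it out of the real file, since JSON does not allow comments: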

{
  "lastUpdated": "2025-09-05",
  "models": [
    {
      "name": "your-model-name",
      "overall_score": 0.750,
      "valid_tool_schema": 99.5,
      "compliance": 98.2,
      // ... other metrics
    }
  ]
}
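
Before publishing an edited data.json, it can help to check its shape. The following is a minimal sketch of such a check, not part of the project: `validateLeaderboard` is a hypothetical name, the field names are taken from the example above, and the numeric ranges (0 to 1 for `overall_score`, 0 to 100 for the percentage fields) are assumptions inferred from the sample values.

```javascript
// Hypothetical validator; field names come from the data.json example,
// while the numeric ranges are assumptions.
function validateLeaderboard(data) {
  const errors = [];
  if (typeof data !== "object" || data === null || !Array.isArray(data.models)) {
    return ["top-level object must contain a 'models' array"];
  }
  if (typeof data.lastUpdated !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(data.lastUpdated)) {
    errors.push("lastUpdated should be a YYYY-MM-DD string");
  }
  // Assumed valid range for each metric shown in the example.
  const ranges = {
    overall_score: [0, 1],
    valid_tool_schema: [0, 100],
    compliance: [0, 100],
  };
  data.models.forEach((model, i) => {
    if (typeof model !== "object" || model === null) {
      errors.push(`models[${i}]: entry must be an object`);
      return;
    }
    if (typeof model.name !== "string" || model.name.trim() === "") {
      errors.push(`models[${i}]: name must be a non-empty string`);
    }
    for (const [key, [min, max]] of Object.entries(ranges)) {
      const value = model[key];
      if (typeof value !== "number" || !Number.isFinite(value)) {
        errors.push(`models[${i}]: ${key} must be a number`);
      } else if (value < min || value > max) {
        errors.push(`models[${i}]: ${key}=${value} outside ${min}..${max}`);
      }
    }
  });
  return errors;
}

const example = {
  lastUpdated: "2025-09-05",
  models: [
    { name: "your-model-name", overall_score: 0.75, valid_tool_schema: 99.5, compliance: 98.2 },
  ],
};
console.log(validateLeaderboard(example)); // []
```

An empty array means no problems were found. To check the real file under Node.js, the parsed result of `JSON.parse(fs.readFileSync("data.json", "utf8"))` could be passed in; parsing will also fail early if a placeholder comment was left in the file.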

Styling

Edit style.css to customize:

  • Colors and themes
  • Layout and spacing
  • Responsive breakpoints
  • Animation effects

Functionality

Extend script.js to add:

  • New sorting algorithms
  • Additional filtering options
  • Export functionality
  • Chart visualizations
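
As one illustration of the export idea above, the sketch below turns a list of model records into CSV text. It is an assumption about how such a feature might look, not existing project code: `csvCell` and `modelsToCsv` are hypothetical names, and the column list is supplied by the caller.

```javascript
// Hypothetical export helper; not part of the original script.js.

// Quote a cell when it contains a comma, quote, or line break (RFC 4180 style).
function csvCell(value) {
  const text = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with a header row followed by one row per model.
function modelsToCsv(models, columns) {
  const header = columns.map(csvCell).join(",");
  const rows = models.map((m) => columns.map((c) => csvCell(m[c])).join(","));
  return [header, ...rows].join("\r\n");
}

const csv = modelsToCsv(
  [{ name: "your-model-name", overall_score: 0.75, compliance: 98.2 }],
  ["name", "overall_score", "compliance"]
);
console.log(csv);
```

In the browser, the resulting string could be wrapped in a `Blob` and offered as a download through `URL.createObjectURL`; that wiring is left out here so the helper stays a pure function.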

🌐 Browser Support

  • Chrome 60+
  • Firefox 55+
  • Safari 12+
  • Edge 79+

📱 Mobile Compatibility

The application is fully responsive and optimized for:

  • Tablets (768px - 1024px)
  • Mobile phones (320px - 767px)
  • Large screens (1200px+)

🔧 Technical Details

  • Pure Frontend: No backend dependencies
  • Vanilla JavaScript: No frameworks required
  • Modern CSS: Flexbox, Grid, CSS Variables
  • Progressive Enhancement: Works without JavaScript
  • SEO Friendly: Semantic HTML structure

📈 Performance

  • Lightweight (~50KB total)
  • Fast loading times
  • Optimized images and assets
  • Efficient DOM updates
  • Smooth animations

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test across browsers
  5. Submit a pull request

📄 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

  • Data sourced from MCP Benchmark Results
  • Icons from Font Awesome
  • Fonts from Google Fonts
  • Hosted on Hugging Face Spaces

Last updated: September 2025