---
title: AI Evaluation Dashboard
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 3000
---
# AI Evaluation Dashboard

This repository is a Next.js application for viewing and authoring AI evaluations. It provides a comprehensive platform for documenting and sharing AI system evaluations across multiple dimensions, including capabilities and risks.
## Project Goals

The AI Evaluation Dashboard aims to:

- **Standardize AI evaluation reporting** across different AI systems and models
- **Facilitate transparency** by providing detailed evaluation cards for AI systems
- **Enable comparative analysis** of AI capabilities and risks
- **Support research and policy** by consolidating evaluation data in an accessible format
- **Promote responsible AI development** through comprehensive risk assessment
## For External Collaborators

### Making Changes to Evaluation Categories and Schema

All evaluation categories, form fields, and data structures are centrally managed in the `schema/` folder. **This is the primary location for making structural changes to the evaluation framework.**

Key schema files:

- **`schema/evaluation-schema.json`** - Defines all evaluation categories (capabilities and risks)
- **`schema/output-schema.json`** - Defines the complete data structure for evaluation outputs
- **`schema/system-info-schema.json`** - Defines form field options for system information
- **`schema/category-details.json`** - Contains detailed descriptions and criteria for each category
- **`schema/form-hints.json`** - Provides help text and guidance for form fields
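The authoritative shape of each file is the file itself rather than anything documented here; a quick, non-authoritative way to get oriented is with `jq` (assuming it is installed locally):

```bash
# List the top-level keys of a schema file to see how it is organized
jq 'keys' schema/evaluation-schema.json

# Pretty-print the full output schema
jq '.' schema/output-schema.json
```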
### Standards and Frameworks Used

The evaluation framework is based on established standards:

- **Risk categories** are derived from **NIST AI 600-1** (the Generative AI Profile of the NIST AI Risk Management Framework)
- **Capability categories** are based on the **OECD AI Classification Framework**

This ensures consistency with international AI governance standards and facilitates interoperability with other evaluation systems.
### Contributing Evaluation Data

Evaluation data files are stored in `public/evaluations/` as JSON files. Each file represents a complete evaluation of an AI system and must conform to the schema defined in `schema/output-schema.json`.

To add a new evaluation:

1. Create a new JSON file in `public/evaluations/`
2. Follow the structure defined in `schema/output-schema.json`
3. Ensure all required fields are populated
4. Validate against the schema before submission (see the sketch below)
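For step 4, one option is a generic JSON Schema validator such as `ajv-cli`; it is not a project dependency, and the evaluation file name below is a placeholder:

```bash
# Validate a new evaluation file against the output schema
# (ajv-cli is an assumption, not part of this repo; pass --spec if the schema uses a newer draft)
npx ajv-cli validate -s schema/output-schema.json -d public/evaluations/my-evaluation.json
```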
### Development Setup

#### Run locally

Install dependencies and run the dev server:

```bash
npm ci
npm run dev
```
Build for production and run:

```bash
npm ci
npm run build
NODE_ENV=production PORT=3000 npm run start
```
#### Docker (recommended for Hugging Face Spaces)

A `Dockerfile` is included for deploying this app as a dynamic service on Hugging Face Spaces (Docker runtime).
Build the image locally:

```bash
docker build -t ai-eval-dashboard .
```
Run the container (expose port 3000):

```bash
docker run -p 3000:3000 -e HF_TOKEN="$HF_TOKEN" ai-eval-dashboard
```

Visit `http://localhost:3000` to verify, or smoke-test from the command line as shown below.
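A minimal check, assuming `curl` is available:

```bash
# Fails with a non-zero exit code if the app is not serving successfully
curl -sSf http://localhost:3000 > /dev/null && echo "OK"
```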
#### Deploy to Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space and choose **Docker** as the runtime.
2. Push this repository to the Space's Git remote (or upload the files through the UI). The Space will build the Docker image using the included `Dockerfile` and serve your app on port 3000.
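For the Git route, the flow looks roughly like this (`<username>` and `<space-name>` are placeholders for your own Space):

```bash
# Add the Space as a remote and push; replace the placeholders
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```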
Notes:

- If your build needs native dependencies (e.g. `sharp`), the Docker image may require extra apt packages; update the Dockerfile accordingly (see the example below).
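For instance, `sharp` compiles against libvips when no prebuilt binary matches the image's platform. In that case, a Debian-based image would need something along these lines before `npm ci` (the package name is an assumption and varies by base image):

```bash
# Inside the Dockerfile, as a RUN step on Debian/Ubuntu-based images
apt-get update && apt-get install -y --no-install-recommends libvips-dev
```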