# HuggingFace Spaces Configuration Guide

**Essential configuration options for your AI Dataset Studio Space**

---

## **Required README.md Header**

Every HuggingFace Space **must** have this YAML frontmatter at the very beginning of README.md:

### **Basic Configuration (Recommended)**

```yaml
---
title: AI Dataset Studio
emoji: 📊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

### **Alternative Configurations**

#### **Professional/Business Version**

```yaml
---
title: Enterprise Dataset Studio
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: true
license: mit
tags:
- machine-learning
- datasets
- nlp
- data-science
- perplexity-ai
---
```

#### **Research/Academic Version**

```yaml
---
title: Research Dataset Creator
emoji: 🔬
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
- research
- academic
- datasets
- nlp
- ai
---
```

#### **Creative/Colorful Version**

```yaml
---
title: AI Dataset Magic ✨
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
tags:
- datasets
- creative
- ai-tools
- machine-learning
---
```
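
The variants above differ from the basic configuration in only a few fields, so they can be generated from one base dictionary. The following is a minimal sketch of that idea, not part of the original project: `BASE`, `variant`, and `render_frontmatter` are hypothetical names, the emoji values are placeholders, and the renderer handles only flat scalars and lists of plain strings (it does no YAML escaping).

```python
# Base header values, taken from the basic configuration in this guide.
BASE = {
    "title": "AI Dataset Studio",
    "emoji": "📊",  # placeholder emoji; pick any single emoji
    "colorFrom": "blue",
    "colorTo": "purple",
    "sdk": "gradio",
    "sdk_version": "4.44.0",
    "app_file": "app.py",
    "pinned": False,
}


def variant(**overrides):
    """Return a copy of BASE with some fields replaced or added."""
    return {**BASE, **overrides}


def render_frontmatter(meta):
    """Render a flat dict as a README.md header block.

    Simplification: values are written as-is, so titles containing
    characters such as ':' or '#' would need manual quoting.
    """
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, bool):
            lines.append(f"{key}: {'true' if value else 'false'}")
        elif isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        elif key == "sdk_version":
            # Always quote the version so it stays a string.
            lines.append(f'{key}: "{value}"')
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


research = variant(
    title="Research Dataset Creator",
    emoji="🔬",  # placeholder emoji
    colorFrom="green",
    colorTo="blue",
    license="apache-2.0",
    tags=["research", "academic", "datasets", "nlp", "ai"],
)
header = render_frontmatter(research)
```

Writing `header` followed by the rest of the README body to `README.md` would reproduce the research variant; the other variants only need different overrides.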

---

## **Configuration Options Explained**

### **Required Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `title` | Space name displayed in the UI | `AI Dataset Studio` |
| `emoji` | Icon shown next to the title | `📊`, `🚀`, `🔍`, `🎯` |
| `colorFrom` | Gradient start color | `blue`, `red`, `green`, `purple` |
| `colorTo` | Gradient end color | `purple`, `pink`, `yellow`, `blue` |
| `sdk` | Framework used | `gradio` (for our app) |
| `sdk_version` | SDK version | `"4.44.0"` |
| `app_file` | Main application file | `app.py` |

### **Optional Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `pinned` | Pin to your profile | `true`, `false` |
| `license` | Software license | `mit`, `apache-2.0`, `gpl-3.0` |
| `tags` | Searchable keywords | `machine-learning`, `nlp`, `datasets` |
| `models` | Referenced models | `facebook/bart-large-cnn` |
| `datasets` | Referenced datasets | `imdb`, `sentiment140` |
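
The field tables can also be turned into a quick programmatic check. The sketch below assumes the header has already been parsed into a Python dict; `check_fields` is a hypothetical helper, `REQUIRED` mirrors the fields this guide lists as required, and `ALLOWED_COLORS` is the set of color names the Spaces configuration reference accepts for `colorFrom` and `colorTo`.

```python
# Fields this guide lists as required.
REQUIRED = ("title", "emoji", "colorFrom", "colorTo", "sdk", "sdk_version", "app_file")

# Color names accepted for the thumbnail gradient.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}


def check_fields(meta):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field in REQUIRED:
        if field not in meta or meta[field] in ("", None):
            problems.append(f"missing field: {field}")
    for field in ("colorFrom", "colorTo"):
        color = meta.get(field)
        if color is not None and color not in ALLOWED_COLORS:
            problems.append(f"{field}: unsupported color {color!r}")
    version = meta.get("sdk_version")
    if version is not None and not isinstance(version, str):
        # An unquoted value such as 4.40 is parsed by YAML as a number.
        problems.append("sdk_version should be a quoted string")
    return problems
```

For example, a header with `colorTo: teal` would be reported, because `teal` is not one of the accepted color names.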

---

## **Popular Color Combinations**

`colorFrom` and `colorTo` accept only these values: `red`, `yellow`, `green`, `blue`, `indigo`, `purple`, `pink`, `gray`. The combinations below use only those names.

### **Professional Themes**

```yaml
# Corporate Blue
colorFrom: blue
colorTo: indigo
# Business Gray
colorFrom: gray
colorTo: blue
# Tech Green
colorFrom: green
colorTo: indigo
```

### **Creative Themes**

```yaml
# Sunset
colorFrom: yellow
colorTo: red
# Ocean
colorFrom: blue
colorTo: green
# Forest
colorFrom: green
colorTo: yellow
# Galaxy
colorFrom: purple
colorTo: pink
```

### **AI/Tech Themes**

```yaml
# Matrix
colorFrom: green
colorTo: gray
# Cyberpunk
colorFrom: purple
colorTo: blue
# Neural
colorFrom: blue
colorTo: purple
```

---

## **Recommended Tags**

### **For AI Dataset Studio**

```yaml
tags:
- machine-learning
- datasets
- nlp
- data-science
- perplexity-ai
- web-scraping
- sentiment-analysis
- text-classification
- ai-tools
- data-collection
```

### **By Use Case**

#### **Business/Enterprise**

```yaml
tags:
- business-intelligence
- enterprise
- data-analytics
- market-research
- customer-insights
```

#### **Research/Academic**

```yaml
tags:
- research
- academic
- scientific
- literature-review
- research-tools
```

#### **Developer Tools**

```yaml
tags:
- developer-tools
- api
- automation
- productivity
- data-engineering
```

---

## **Hardware Configuration**

Hardware is chosen in the Space settings rather than in the README.md frontmatter:

### **Hardware Options**

```yaml
# In Space settings (not README.md).
# Prices are as quoted in this guide and may change; check the current pricing page.
# - CPU Basic (free)
# - CPU Upgrade ($0.03/hour)
# - T4 Small ($0.60/hour), recommended
# - T4 Medium ($1.20/hour)
# - A10G Small ($1.05/hour)
# - A10G Large ($3.15/hour)
```

### **Memory Requirements**

```yaml
# Our application needs:
# - Base app: ~200MB
# - AI models: ~2-4GB
# - Processing: ~1-2GB
# Total: ~4-6GB recommended (T4 Small = 16GB)
```
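
To make the figures above concrete, the sketch below adds up the memory estimates and converts an hourly price into a monthly bill. It is only arithmetic on the numbers quoted in this guide: the tier keys (`"t4-small"` and so on) are hypothetical labels, and the prices are a snapshot that may no longer be current.

```python
# Hourly prices in USD, copied from the hardware list in this guide.
HOURLY_USD = {
    "cpu-basic": 0.00,
    "cpu-upgrade": 0.03,
    "t4-small": 0.60,
    "t4-medium": 1.20,
    "a10g-small": 1.05,
    "a10g-large": 3.15,
}

# (low, high) memory estimates in GB, copied from the memory list.
MEMORY_GB = {
    "base app": (0.2, 0.2),
    "AI models": (2, 4),
    "processing": (1, 2),
}


def memory_range_gb():
    """Sum the low and high estimates across all components."""
    low = sum(lo for lo, _ in MEMORY_GB.values())
    high = sum(hi for _, hi in MEMORY_GB.values())
    return round(low, 1), round(high, 1)


def monthly_cost_usd(tier, hours_per_day=24, days=30):
    """Cost of keeping a tier running for the given hours per day."""
    return round(HOURLY_USD[tier] * hours_per_day * days, 2)
```

With these inputs the memory estimate sums to roughly 3.2 to 6.2 GB, in line with the 4-6 GB recommendation, and a T4 Small left running around the clock comes to 0.60 × 24 × 30 = 432 USD per 30 days at the quoted rate.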

---

## **Environment Variables**

Set these in Space Settings → Repository secrets:

### **Required**

```bash
PERPLEXITY_API_KEY="your_perplexity_api_key_here"
```

### **Optional**

```bash
# HuggingFace integration
HF_TOKEN="your_huggingface_token"
# Performance tuning
MAX_SOURCES_PER_SEARCH="50"
REQUEST_TIMEOUT="30"
LOG_LEVEL="INFO"
# Feature flags
ENABLE_DEBUG_MODE="false"
ENABLE_CACHING="true"
```
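
Secrets and variables set on a Space are exposed to the running app as environment variables. The sketch below shows one way `app.py` might read them; `load_settings` and `_as_bool` are hypothetical helpers rather than the project's actual code, and the defaults simply mirror the example values listed above.

```python
import os


def _as_bool(value):
    """Interpret common truthy strings such as 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read the variables from this guide into a plain dict.

    `env` defaults to os.environ; passing a dict makes testing easy.
    """
    env = os.environ if env is None else env
    api_key = env.get("PERPLEXITY_API_KEY")
    if not api_key:
        raise RuntimeError(
            "PERPLEXITY_API_KEY is not set; add it in the Space settings"
        )
    return {
        "perplexity_api_key": api_key,
        "hf_token": env.get("HF_TOKEN"),  # optional, may be None
        "max_sources_per_search": int(env.get("MAX_SOURCES_PER_SEARCH", "50")),
        "request_timeout": int(env.get("REQUEST_TIMEOUT", "30")),
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
        "enable_debug_mode": _as_bool(env.get("ENABLE_DEBUG_MODE", "false")),
        "enable_caching": _as_bool(env.get("ENABLE_CACHING", "true")),
    }
```

Failing fast on the missing required key surfaces the problem in the build or runtime logs instead of at the first search request.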

---

## **Validation Checklist**

Before deploying, ensure:

- [ ] YAML frontmatter is at the very beginning of README.md
- [ ] No spaces before the opening `---`
- [ ] Proper YAML syntax (quotes around version numbers)
- [ ] `app_file: app.py` matches your main file name
- [ ] `sdk_version` matches the Gradio version in your requirements.txt
- [ ] Title and emoji are appropriate for your audience
- [ ] Tags are relevant and searchable
- [ ] `PERPLEXITY_API_KEY` is set in Space secrets
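
Several checklist items can be verified automatically before pushing. The following is a minimal sketch using only the standard library; all function names are hypothetical, the header reader is deliberately not a full YAML parser (it reads only top-level `key: value` lines), and it assumes Gradio is pinned in requirements.txt as `gradio==<version>` if it is pinned at all.

```python
import re


def split_frontmatter(readme_text):
    """Return the lines between the opening and closing '---'."""
    lines = [line.rstrip("\r") for line in readme_text.split("\n")]
    if lines[0] != "---":
        raise ValueError("README.md must start with '---' on its first line")
    for i in range(1, len(lines)):
        if lines[i] == "---":
            return lines[1:i]
    raise ValueError("closing '---' of the frontmatter not found")


def top_level_scalars(header_lines):
    """Read simple 'key: value' lines; lists and nesting are skipped."""
    values = {}
    for line in header_lines:
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", line)
        if match and match.group(2):
            values[match.group(1)] = match.group(2).strip().strip("\"'")
    return values


def pinned_gradio(requirements_text):
    """Return the version from a 'gradio==X' line, or None."""
    match = re.search(r"^gradio==([\w.]+)", requirements_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def run_checks(readme_text, requirements_text, existing_files):
    """Return a list of checklist problems; empty means all checks passed."""
    try:
        meta = top_level_scalars(split_frontmatter(readme_text))
    except ValueError as exc:
        return [str(exc)]
    problems = []
    app_file = meta.get("app_file")
    if app_file not in existing_files:
        problems.append(f"app_file {app_file!r} not found in the repository")
    pin = pinned_gradio(requirements_text)
    if pin and pin != meta.get("sdk_version"):
        problems.append(
            f"sdk_version {meta.get('sdk_version')!r} differs from gradio=={pin}"
        )
    return problems
```

In practice `readme_text` and `requirements_text` would come from reading the two files, and `existing_files` from listing the repository root.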

---

## **Common Configuration Errors**

### **❌ Missing Frontmatter**

```markdown
# AI Dataset Studio   <!-- ERROR: no YAML header -->
```

### **✅ Correct Format**

```markdown
---
title: AI Dataset Studio
emoji: 📊
sdk: gradio
---

# AI Dataset Studio   <!-- Correct: content after the YAML header -->
```

### **❌ Unquoted SDK Version**

```yaml
sdk_version: 4.44.0  # RISKY: unquoted; a shorter version such as 4.40 would be parsed as the number 4.4
```

### **✅ Correct Format**

```yaml
sdk_version: "4.44.0"  # Correct: quoted string
```

### **❌ Invalid App File**

```yaml
app_file: main.py  # ERROR: file doesn't exist in the repository
```

### **✅ Correct Format**

```yaml
app_file: app.py  # Correct: matches the actual filename
```

---

## **Updating Configuration**

To change your Space configuration:

1. **Edit README.md**
   - Update the YAML frontmatter
   - Commit and push the change
2. **The Space rebuilds automatically**
   - Changes take effect once the rebuild finishes
   - Monitor the build logs for errors
3. **Hardware changes**
   - Go to Space Settings
   - Change the hardware tier
   - Restart the Space

---

## **Example Complete README.md Start**

Here's how your README.md should begin:

```markdown
---
title: AI Dataset Studio
emoji: 📊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
tags:
- machine-learning
- datasets
- nlp
- perplexity-ai
- data-science
---

# AI Dataset Studio

**Create high-quality training datasets with AI-powered source discovery**

A comprehensive platform for building ML datasets that combines web scraping, AI processing, and smart source discovery using Perplexity AI...
```

---

## **Pro Tips**

1. **Choose memorable titles** - They appear in search results
2. **Use relevant emojis** - They make your Space stand out
3. **Pick good color combinations** - They create visual appeal
4. **Add comprehensive tags** - They improve discoverability
5. **Pin important Spaces** - They appear prominently on your profile
6. **Use appropriate licenses** - MIT or Apache-2.0 for open source

---

**Your Space configuration is now properly set up for deployment!**