
OllamaSpace Technical Specifications

Project Overview

OllamaSpace is a web-based chat application that serves as a frontend interface for interacting with Ollama language models. The application provides a real-time chat interface where users can communicate with AI models through a web browser.

Architecture

Backend

  • Framework: FastAPI (Python)
  • API Gateway: Acts as a proxy between the frontend and Ollama API
  • Streaming: Supports real-time streaming of model responses
  • Default Model: qwen3:4b

Frontend

  • Technology: Pure HTML/CSS/JavaScript (no frameworks)
  • Interface: Simple chat interface with message history
  • Interaction: Real-time message streaming with typing indicators
  • Styling: Clean, minimal design with distinct user/bot message styling

Components

main.py

  • Framework: FastAPI
  • Authentication: Implements Bearer token authentication using HTTPBearer
  • Endpoints:
    • GET / - Redirects to /chat
    • GET /chat - Serves the chat HTML page
    • POST /chat_api - API endpoint that forwards requests to Ollama (requires authentication)
  • Functionality:
    • Proxies requests to local Ollama API (http://localhost:11434)
    • Streams model responses back to the frontend
    • Handles error cases and validation
    • Auto-generates secure API key if not provided via environment variable
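
The key-handling logic described above can be sketched in plain Python. This is a minimal, stdlib-only illustration, not the actual main.py: the real application parses the header via FastAPI's HTTPBearer, and the helper name `verify_bearer` is illustrative.

```python
import hmac
import os
import secrets

# Load the API key from the environment, or auto-generate a secure one,
# exactly as the spec describes (secrets.token_urlsafe(32)).
API_KEY = os.environ.get("OLLAMA_API_KEY") or secrets.token_urlsafe(32)


def verify_bearer(authorization_header: str) -> bool:
    """Check an `Authorization: Bearer <key>` header against API_KEY.

    Uses a constant-time comparison to avoid leaking key material
    through timing differences.
    """
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer" or not token:
        return False
    return hmac.compare_digest(token, API_KEY)
```

In the real endpoint, a failed check would translate into the 401 response documented under the API specification.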

chat.html

  • Template: HTML structure for the chat interface with API key management
  • Layout:
    • Header with API key input and save button
    • Chat window area with message history
    • Message input field
    • Send button
  • Static Assets: Links to CSS and JavaScript files

static/script.js

  • Features:
    • Real-time message streaming from the API
    • Message display in chat format
    • Enter key support for sending messages
    • Stream parsing to handle JSON responses
    • API key management with localStorage persistence
    • API key input UI with save functionality
  • API Communication:
    • Includes API key in Authorization header as Bearer token
    • POSTs to /chat_api endpoint
    • Receives streaming responses and displays incrementally
    • Handles error cases gracefully
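
The stream parsing that script.js performs in the browser can be sketched in Python (the project's backend language). This assumes Ollama-style newline-delimited JSON lines such as `{"response": "...", "done": false}`, and that network chunks may split a line at arbitrary byte boundaries; the function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_tokens(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield incremental text from a newline-delimited JSON stream.

    Buffers partial lines across chunks, parses each complete line,
    and stops once a payload reports "done": true.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            if not line.strip():
                continue  # ignore keep-alive blank lines
            payload = json.loads(line)
            if payload.get("response"):
                yield payload["response"]
            if payload.get("done"):
                return
```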

static/style.css

  • Design: Minimal, clean chat interface with API key management section
  • Styling:
    • Distinct colors for user vs. bot messages
    • Responsive layout
    • API key section in header with input field and save button
    • Auto-scrolling to latest messages

Deployment

Dockerfile

  • Base Image: ollama/ollama
  • Environment: Sets up Ollama server and FastAPI gateway
  • Port Configuration: Listens on port 7860 (Hugging Face Spaces default)
  • Model Setup: Downloads specified model during build process
  • Dependencies: Installs Python, FastAPI, and related libraries

start.sh

  • Initialization Sequence:
    1. Starts Ollama server in background
    2. Health checks the Ollama server
    3. Starts FastAPI gateway on port 7860
  • Error Handling: Waits for Ollama to be ready before starting the gateway
  • API Key: If auto-generated, the API key will be displayed in the console logs during startup
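
The wait-for-Ollama step can be sketched as a simple retry loop. This is an illustration of the logic, not the shell code in start.sh; the probe is injected so the retry behavior can be shown in isolation (in practice it would issue an HTTP request to http://localhost:11434).

```python
import time
from typing import Callable


def wait_for_ollama(probe: Callable[[], bool],
                    retries: int = 30,
                    delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until the Ollama server reports ready.

    Returns True as soon as a probe succeeds, or False after
    `retries` failed attempts, sleeping `delay` seconds between tries.
    """
    for _ in range(retries):
        if probe():
            return True
        sleep(delay)
    return False
```

Only if this returns True would the gateway be started on port 7860, matching the initialization sequence above.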

Configuration

Environment Variables

  • OLLAMA_HOST: 0.0.0.0 (allows external connections)
  • OLLAMA_ORIGINS: '*' (allows CORS requests)
  • OLLAMA_MODEL: qwen3:4b (default model, can be overridden)
  • OLLAMA_API_KEY: (optional) Secure API key (auto-generated if not provided)
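
The variables above resolve to a configuration like the following sketch. The defaults mirror the spec; the function name and dict shape are illustrative, and in real use the mapping passed in would be `os.environ`.

```python
def load_config(env: dict) -> dict:
    """Resolve OllamaSpace configuration from an environment mapping.

    Each value falls back to the documented default when the
    corresponding environment variable is unset.
    """
    return {
        "host": env.get("OLLAMA_HOST", "0.0.0.0"),
        "origins": env.get("OLLAMA_ORIGINS", "*"),
        "model": env.get("OLLAMA_MODEL", "qwen3:4b"),
        # None signals that a secure key should be auto-generated at startup.
        "api_key": env.get("OLLAMA_API_KEY"),
    }
```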

Default Model

  • Model: qwen3:4b
  • Fallback: If no model is specified in the request, the backend falls back to qwen3:4b

API Key Management

  • Generation: If no OLLAMA_API_KEY environment variable is set, a cryptographically secure random key is generated at startup
  • Access: Generated API key is displayed in the application logs during startup
  • Frontend Storage: API key is stored in browser's localStorage after being entered once
  • Authentication: All API requests require a valid Bearer token in the Authorization header

API Specification

/chat_api Endpoint

  • Method: POST
  • Authentication: Requires Bearer token in Authorization header
  • Content-Type: application/json
  • Request Headers:
    • Authorization: Bearer {your_api_key}
    • Content-Type: application/json
  • Request Body:
    {
      "model": "string (optional, defaults to qwen3:4b)",
      "prompt": "string (required)"
    }
    
  • Response: Streaming response with incremental model output
  • Error Handling:
    • Returns 401 for invalid API key
    • Returns 400 for missing prompt

Data Flow

  1. Frontend sends user message to /chat_api
  2. Backend forwards request to local Ollama API
  3. Ollama processes request with specified model
  4. Response is streamed back to frontend in real-time
  5. Frontend displays response incrementally as it arrives
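
Step 4 of this flow reduces to the gateway relaying upstream chunks without modification. A minimal sketch, with the Ollama response body stubbed as any iterable of bytes:

```python
from typing import Iterable, Iterator


def relay_stream(upstream: Iterable[bytes]) -> Iterator[bytes]:
    """Forward response chunks from Ollama to the browser unchanged.

    In the real gateway `upstream` is the body of a streaming HTTP
    response from http://localhost:11434; empty keep-alive chunks
    are dropped so the frontend only receives payload bytes.
    """
    for chunk in upstream:
        if chunk:
            yield chunk
```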

Security Considerations

  • API Key Authentication: Required for all API access using Bearer token authentication
  • Secure Key Generation: The API key is auto-generated using a cryptographically secure random generator (secrets.token_urlsafe(32))
  • Configurable Keys: API key can be set via environment variable (OLLAMA_API_KEY) or auto-generated
  • Storage: Client-side API key stored in browser's localStorage
  • CORS: Enabled for all origins (potential security concern in production)
  • Input Validation: Validates presence of prompt parameter
  • Local API: Communicates with Ollama through localhost only
  • Key Exposure: Auto-generated API key is displayed in console logs during startup (should be secured in production)

Performance Features

  • Streaming: Real-time response streaming for better UX
  • Client-side Display: Incremental message display as responses arrive
  • Efficient Communication: Uses streaming HTTP responses to minimize latency

Security Features

  • Authentication: Bearer token authentication for all API endpoints
  • Key Generation: Cryptographically secure random API key generation using secrets module
  • Key Storage: API key stored in browser localStorage (with option to enter via UI)
  • Transport Security: API key transmitted via Authorization header (should use HTTPS in production)

Technologies Used

  • Backend: Python, FastAPI
  • Frontend: HTML5, CSS3, JavaScript (ES6+)
  • Containerization: Docker
  • AI Model: Ollama with qwen3:4b by default
  • Web Server: Uvicorn ASGI server

File Structure

OllamaSpace/
├── main.py (FastAPI application)
├── chat.html (Chat interface)
├── start.sh (Container startup script)
├── Dockerfile (Container configuration)
├── README.md (Project description)
└── static/
    ├── script.js (Frontend JavaScript)
    └── style.css (Frontend styling)

Build Process

  1. Container built with Ollama and Python dependencies
  2. Model specified by OLLAMA_MODEL environment variable is pre-pulled
  3. Application files are copied into container
  4. FastAPI dependencies are installed
  5. Container starts with Ollama server and FastAPI gateway

Deployment Target

  • Platform: Designed for Hugging Face Spaces
  • Port: 7860 (standard for Hugging Face Spaces)
  • Runtime: Docker container
  • Model Serving: Ollama with FastAPI gateway