# OllamaSpace Technical Specifications
## Project Overview
OllamaSpace is a web-based chat application that serves as a frontend interface for interacting with Ollama language models. The application provides a real-time chat interface where users can communicate with AI models through a web browser.
## Architecture
### Backend
- **Framework**: FastAPI (Python)
- **API Gateway**: Acts as a proxy between the frontend and Ollama API
- **Streaming**: Supports real-time streaming of model responses
- **Default Model**: qwen3:4b
### Frontend
- **Technology**: Pure HTML/CSS/JavaScript (no frameworks)
- **Interface**: Simple chat interface with message history
- **Interaction**: Real-time message streaming with typing indicators
- **Styling**: Clean, minimal design with distinct user/bot message styling
## Components
### main.py
- **Framework**: FastAPI
- **Authentication**: Implements Bearer token authentication using HTTPBearer
- **Endpoints**:
- `GET /` - Redirects to `/chat`
- `GET /chat` - Serves the chat HTML page
- `POST /chat_api` - API endpoint that forwards requests to Ollama (requires authentication)
- **Functionality**:
- Proxies requests to the local Ollama API (`http://localhost:11434`)
- Streams model responses back to the frontend
- Handles error cases and validation
- Auto-generates secure API key if not provided via environment variable
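The Bearer-token check described above can be sketched as follows. This is a stdlib-only illustration; the actual main.py uses FastAPI's HTTPBearer dependency, and the helper name here is hypothetical:

```python
import secrets

def is_authorized(auth_header, expected_key):
    """Validate an `Authorization: Bearer <key>` header value.

    Uses secrets.compare_digest so the comparison takes constant
    time regardless of where the strings first differ.
    """
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    return secrets.compare_digest(auth_header[len(prefix):], expected_key)
```

In the FastAPI version, the same logic lives inside a dependency that raises a 401 `HTTPException` when the check fails.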
### chat.html
- **Template**: HTML structure for the chat interface with API key management
- **Layout**:
- Header with API key input and save button
- Chat window area with message history
- Message input field
- Send button
- **Static Assets**: Links to CSS and JavaScript files
### static/script.js
- **Features**:
- Real-time message streaming from the API
- Message display in chat format
- Enter key support for sending messages
- Stream parsing to handle JSON responses
- API key management with localStorage persistence
- API key input UI with save functionality
- **API Communication**:
- Includes API key in Authorization header as Bearer token
- POSTs to `/chat_api` endpoint
- Receives streaming responses and displays incrementally
- Handles error cases gracefully
### static/style.css
- **Design**: Minimal, clean chat interface with API key management section
- **Styling**:
- Distinct colors for user vs. bot messages
- Responsive layout
- API key section in header with input field and save button
- Auto-scrolling to latest messages
## Deployment
### Dockerfile
- **Base Image**: ollama/ollama
- **Environment**: Sets up Ollama server and FastAPI gateway
- **Port Configuration**: Listens on port 7860 (Hugging Face Spaces default)
- **Model Setup**: Downloads specified model during build process
- **Dependencies**: Installs Python, FastAPI, and related libraries
### start.sh
- **Initialization Sequence**:
1. Starts Ollama server in background
2. Health checks the Ollama server
3. Starts FastAPI gateway on port 7860
- **Error Handling**: Waits for Ollama to be ready before starting the gateway
- **API Key**: If auto-generated, the API key will be displayed in the console logs during startup
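The wait-for-ready step can be modeled as a small polling loop. This is a sketch in Python for clarity; the actual start.sh presumably does the equivalent in shell against the Ollama port, and the names `probe` and `wait_for_ready` are illustrative:

```python
import time

def wait_for_ready(probe, retries=30, delay=1.0):
    """Poll `probe` until it reports success or retries run out.

    `probe` is any zero-argument callable that returns True once the
    Ollama server answers, e.g. an HTTP GET to http://localhost:11434.
    """
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False
```

Starting the gateway only after this returns True avoids a race where the first chat request arrives before Ollama can serve it.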
## Configuration
### Environment Variables
- `OLLAMA_HOST`: 0.0.0.0 (allows external connections)
- `OLLAMA_ORIGINS`: '*' (allows CORS requests)
- `OLLAMA_MODEL`: qwen3:4b (default model, can be overridden)
- `OLLAMA_API_KEY`: (optional) Secure API key (auto-generated if not provided)
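Reading this configuration in Python could look like the following sketch. Only the environment variable names and defaults come from the spec; the module-level layout is illustrative:

```python
import os
import secrets

# Defaults mirror the table above; the API key is auto-generated
# from a cryptographically secure source when the variable is unset.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "0.0.0.0")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "qwen3:4b")
OLLAMA_API_KEY = os.environ.get("OLLAMA_API_KEY") or secrets.token_urlsafe(32)
```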
### Default Model
- **Model**: qwen3:4b
- **Fallback**: If no model is specified in the request, qwen3:4b is used
### API Key Management
- **Generation**: If no OLLAMA_API_KEY environment variable is set, a cryptographically secure random key is generated at startup
- **Access**: Generated API key is displayed in the application logs during startup
- **Frontend Storage**: API key is stored in browser's localStorage after being entered once
- **Authentication**: All API requests require a valid Bearer token in the Authorization header
## API Specification
### `/chat_api` Endpoint
- **Method**: POST
- **Authentication**: Requires Bearer token in Authorization header
- **Content-Type**: application/json
- **Request Headers**:
- `Authorization`: Bearer {your_api_key}
- `Content-Type`: application/json
- **Request Body**:
```json
{
  "model": "string (optional, defaults to qwen3:4b)",
  "prompt": "string (required)"
}
```
- **Response**: Streaming response with incremental model output
- **Error Handling**:
- Returns 401 for invalid API key
- Returns 400 for missing prompt
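A client could assemble such a request as follows. This is a stdlib sketch; `build_chat_request` is a hypothetical helper, not part of the project:

```python
import json

def build_chat_request(prompt, api_key, model=None):
    """Return (headers, body) for POST /chat_api per the spec above."""
    if not prompt:
        raise ValueError("prompt is required")  # server would answer 400
    payload = {"prompt": prompt}
    if model:  # optional; the server falls back to qwen3:4b
        payload["model"] = model
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return headers, json.dumps(payload)
```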
### Data Flow
1. Frontend sends user message to `/chat_api`
2. Backend forwards request to local Ollama API
3. Ollama processes request with specified model
4. Response is streamed back to frontend in real-time
5. Frontend displays response incrementally as it arrives
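Steps 4 and 5 hinge on parsing the stream incrementally. Assuming the newline-delimited JSON format used by Ollama's generate API, where each object carries a `response` fragment and a `done` flag, assembly could be sketched as:

```python
import json

def assemble_stream(chunks):
    """Join `response` fragments from newline-delimited JSON chunks.

    `chunks` is an iterable of raw text pieces as they arrive over the
    wire; a piece may contain part of a line, one line, or several.
    """
    parts = []
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if not line.strip():
                continue
            obj = json.loads(line)
            parts.append(obj.get("response", ""))
            if obj.get("done"):
                return "".join(parts)
    return "".join(parts)
```

The browser-side script.js does the same job in JavaScript, appending each fragment to the chat window as it arrives rather than waiting for `done`.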
## Security Considerations
- **API Key Authentication**: Required for all API access using Bearer token authentication
- **Secure Key Generation**: API key is auto-generated using a cryptographically secure random generator (`secrets.token_urlsafe(32)`)
- **Configurable Keys**: API key can be set via environment variable (OLLAMA_API_KEY) or auto-generated
- **Storage**: Client-side API key stored in browser's localStorage
- **CORS**: Enabled for all origins (potential security concern in production)
- **Input Validation**: Validates presence of prompt parameter
- **Local API**: Communicates with Ollama through localhost only
- **Key Exposure**: Auto-generated API key is displayed in console logs during startup (should be secured in production)
## Performance Features
- **Streaming**: Real-time response streaming for better UX
- **Client-side Display**: Incremental message display as responses arrive
- **Efficient Communication**: Uses streaming HTTP responses to minimize latency
## Security Features
- **Authentication**: Bearer token authentication for all API endpoints
- **Key Generation**: Cryptographically secure random API key generation using secrets module
- **Key Storage**: API key stored in browser localStorage (with option to enter via UI)
- **Transport Security**: API key transmitted via Authorization header (should use HTTPS in production)
## Technologies Used
- **Backend**: Python, FastAPI
- **Frontend**: HTML5, CSS3, JavaScript (ES6+)
- **Containerization**: Docker
- **AI Model**: Ollama serving qwen3:4b by default
- **Web Server**: Uvicorn ASGI server
## File Structure
```
OllamaSpace/
├── main.py (FastAPI application)
├── chat.html (Chat interface)
├── start.sh (Container startup script)
├── Dockerfile (Container configuration)
├── README.md (Project description)
└── static/
    ├── script.js (Frontend JavaScript)
    └── style.css (Frontend styling)
```
## Build Process
1. Container built with Ollama and Python dependencies
2. Model specified by OLLAMA_MODEL environment variable is pre-pulled
3. Application files are copied into container
4. FastAPI dependencies are installed
5. Container starts with Ollama server and FastAPI gateway
## Deployment Target
- **Platform**: Designed for Hugging Face Spaces
- **Port**: 7860 (standard for Hugging Face Spaces)
- **Runtime**: Docker container
- **Model Serving**: Ollama with FastAPI gateway