OllamaSpace Technical Specifications
Project Overview
OllamaSpace is a web-based chat application that serves as a frontend interface for interacting with Ollama language models. The application provides a real-time chat interface where users can communicate with AI models through a web browser.
Architecture
Backend
- Framework: FastAPI (Python)
- API Gateway: Acts as a proxy between the frontend and Ollama API
- Streaming: Supports real-time streaming of model responses
- Default Model: qwen3:4b
Frontend
- Technology: Pure HTML/CSS/JavaScript (no frameworks)
- Interface: Simple chat interface with message history
- Interaction: Real-time message streaming with typing indicators
- Styling: Clean, minimal design with distinct user/bot message styling
Components
main.py
- Framework: FastAPI
- Authentication: Implements Bearer token authentication using HTTPBearer
- Endpoints:
- GET / - Redirects to /chat
- GET /chat - Serves the chat HTML page
- POST /chat_api - API endpoint that forwards requests to Ollama (requires authentication)
- Functionality:
- Proxies requests to local Ollama API (http://localhost:11434)
- Streams model responses back to the frontend
- Handles error cases and validation
- Auto-generates a secure API key if none is provided via an environment variable (see the sketch after this list)
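A minimal sketch of this gateway logic, illustrating the described behavior rather than the exact contents of main.py (names such as check_key, relay, and OLLAMA_URL are invented for the example):

```python
import os
import secrets

import httpx
from fastapi import Depends, FastAPI, HTTPException, Request
from fastapi.responses import StreamingResponse
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()

# Use the operator-supplied key, or fall back to a freshly generated one.
API_KEY = os.environ.get("OLLAMA_API_KEY") or secrets.token_urlsafe(32)
OLLAMA_URL = "http://localhost:11434/api/generate"
DEFAULT_MODEL = os.environ.get("OLLAMA_MODEL", "qwen3:4b")

def check_key(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Reject any token that does not match the configured key.
    if not secrets.compare_digest(creds.credentials, API_KEY):
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/chat_api")
async def chat_api(request: Request, _: None = Depends(check_key)):
    body = await request.json()
    if not body.get("prompt"):
        raise HTTPException(status_code=400, detail="Missing prompt")
    payload = {
        "model": body.get("model", DEFAULT_MODEL),
        "prompt": body["prompt"],
        "stream": True,
    }

    async def relay():
        # Forward the request to Ollama and yield response chunks as they arrive.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", OLLAMA_URL, json=payload) as resp:
                async for chunk in resp.aiter_bytes():
                    yield chunk

    return StreamingResponse(relay(), media_type="application/x-ndjson")
```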
chat.html
- Template: HTML structure for the chat interface with API key management
- Layout:
- Header with API key input and save button
- Chat window area with message history
- Message input field
- Send button
- Static Assets: Links to CSS and JavaScript files
static/script.js
- Features:
- Real-time message streaming from the API
- Message display in chat format
- Enter key support for sending messages
- Stream parsing to handle newline-delimited JSON responses (see the sketch after this list)
- API key management with localStorage persistence
- API key input UI with save functionality
- API Communication:
- Includes API key in Authorization header as Bearer token
- POSTs to the /chat_api endpoint
- Receives streaming responses and displays them incrementally
- Handles error cases gracefully
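script.js performs this parsing in JavaScript; because network chunks rarely align with line breaks, incomplete lines must be buffered until their closing newline arrives. The same logic, sketched in Python for illustration (parse_ndjson is an invented name):

```python
import json
from typing import Iterable, Iterator

def parse_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Reassemble newline-delimited JSON from arbitrary network chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may hold several complete lines, or only part of one;
        # keep the unterminated tail in the buffer for the next chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)

# Example: three chunks that split one JSON object across a boundary.
stream = [b'{"response": "Hel', b'lo"}\n{"response": " world", ', b'"done": true}\n']
for obj in parse_ndjson(stream):
    print(obj.get("response", ""), end="", flush=True)  # prints "Hello world"
```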
static/style.css
- Design: Minimal, clean chat interface with API key management section
- Styling:
- Distinct colors for user vs. bot messages
- Responsive layout
- API key section in header with input field and save button
- Auto-scrolling to latest messages
Deployment
Dockerfile
- Base Image: ollama/ollama
- Environment: Sets up Ollama server and FastAPI gateway
- Port Configuration: Listens on port 7860 (Hugging Face Spaces default)
- Model Setup: Downloads specified model during build process
- Dependencies: Installs Python, FastAPI, and related libraries
start.sh
- Initialization Sequence:
- Starts Ollama server in background
- Health checks the Ollama server
- Starts FastAPI gateway on port 7860
- Error Handling: Waits for Ollama to be ready before starting the gateway (the polling logic is sketched below)
- API Key: If auto-generated, the API key will be displayed in the console logs during startup
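start.sh presumably implements this wait as a shell loop; the equivalent readiness check, sketched in Python (the function name, URL, and timeout are illustrative assumptions):

```python
import time
import urllib.error
import urllib.request

def wait_for_ollama(url: str = "http://localhost:11434", timeout: float = 60.0) -> None:
    """Poll the Ollama server until it answers, or fail after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return  # server responded; safe to start the gateway
        except (urllib.error.URLError, OSError):
            time.sleep(1)  # not up yet; retry shortly
    raise RuntimeError("Ollama did not become ready in time")
```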
Configuration
Environment Variables
- OLLAMA_HOST: 0.0.0.0 (allows external connections)
- OLLAMA_ORIGINS: '*' (allows CORS requests)
- OLLAMA_MODEL: qwen3:4b (default model, can be overridden)
- OLLAMA_API_KEY: (optional) secure API key (auto-generated if not provided)
Default Model
- Model: qwen3:4b
- Fallback: If no model is specified in the request, qwen3:4b is used (see the sketch below)
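The fallback amounts to an environment lookup with a hard-coded default; a minimal sketch (resolve_model is an illustrative name, not necessarily a function in main.py):

```python
import os

# Assumes OLLAMA_MODEL is unset in this example environment.
DEFAULT_MODEL = os.environ.get("OLLAMA_MODEL", "qwen3:4b")

def resolve_model(request_body: dict) -> str:
    # A model named in the request wins; otherwise use the configured default.
    return request_body.get("model") or DEFAULT_MODEL

assert resolve_model({}) == "qwen3:4b"
assert resolve_model({"model": "llama3"}) == "llama3"
```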
API Key Management
- Generation: If no OLLAMA_API_KEY environment variable is set, a cryptographically secure random key is generated at startup (sketched after this list)
- Access: Generated API key is displayed in the application logs during startup
- Frontend Storage: API key is stored in browser's localStorage after being entered once
- Authentication: All API requests require a valid Bearer token in the Authorization header
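A minimal sketch of this generate-or-reuse behavior, assuming the key is surfaced with a plain print/log call (the exact log format in main.py may differ):

```python
import os
import secrets

# Prefer an operator-supplied key; otherwise generate one and log it once
# so it can be copied into the chat UI's API key field.
api_key = os.environ.get("OLLAMA_API_KEY")
if not api_key:
    api_key = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    print(f"Generated API key: {api_key}")  # shown in the startup logs
```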
API Specification
/chat_api Endpoint
- Method: POST
- Authentication: Requires Bearer token in Authorization header
- Content-Type: application/json
- Request Headers:
- Authorization: Bearer {your_api_key}
- Content-Type: application/json
- Request Body:
{ "model": "string (optional, defaults to qwen3:4b)", "prompt": "string (required)" }
- Response: Streaming response with incremental model output (see the usage example after this section)
- Error Handling:
- Returns 401 for invalid API key
- Returns 400 for missing prompt
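A hedged usage example against a local deployment (the key value and prompt are placeholders, requests is one possible client, and it is assumed the gateway relays Ollama's newline-delimited JSON unchanged, so each line carries a "response" fragment):

```python
import json
import requests

API_KEY = "paste-key-from-startup-logs"  # placeholder

resp = requests.post(
    "http://localhost:7860/chat_api",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "Why is the sky blue?"},  # "model" may be added; it is optional
    stream=True,
)

if resp.status_code == 401:
    raise SystemExit("Invalid API key")
resp.raise_for_status()

# Print the answer incrementally, one NDJSON line at a time.
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)
```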
Data Flow
- Frontend sends the user message to /chat_api
- Backend forwards the request to the local Ollama API
- Ollama processes request with specified model
- Response is streamed back to frontend in real-time
- Frontend displays response incrementally as it arrives
Security Considerations
- API Key Authentication: Required for all API access using Bearer token authentication
- Secure Key Generation: API key is auto-generated using cryptographically secure random generator (secrets.token_urlsafe(32))
- Configurable Keys: API key can be set via environment variable (OLLAMA_API_KEY) or auto-generated
- Storage: Client-side API key stored in browser's localStorage
- CORS: Enabled for all origins (potential security concern in production)
- Input Validation: Validates presence of prompt parameter
- Local API: Communicates with Ollama through localhost only
- Key Exposure: Auto-generated API key is displayed in console logs during startup (should be secured in production)
Performance Features
- Streaming: Real-time response streaming for better UX
- Client-side Display: Incremental message display as responses arrive
- Efficient Communication: Uses streaming HTTP responses to minimize latency
Security Features
- Authentication: Bearer token authentication for all API endpoints
- Key Generation: Cryptographically secure random API key generation using secrets module
- Key Storage: API key stored in browser localStorage (with option to enter via UI)
- Transport Security: API key transmitted via Authorization header (should use HTTPS in production)
Technologies Used
- Backend: Python, FastAPI
- Frontend: HTML5, CSS3, JavaScript (ES6+)
- Containerization: Docker
- AI Model: Ollama with qwen3:4b by default
- Web Server: Uvicorn ASGI server
File Structure
OllamaSpace/
├── main.py (FastAPI application)
├── chat.html (Chat interface)
├── start.sh (Container startup script)
├── Dockerfile (Container configuration)
├── README.md (Project description)
└── static/
    ├── script.js (Frontend JavaScript)
    └── style.css (Frontend styling)
Build Process
- Container built with Ollama and Python dependencies
- Model specified by OLLAMA_MODEL environment variable is pre-pulled
- Application files are copied into container
- FastAPI dependencies are installed
- Container starts with Ollama server and FastAPI gateway
Deployment Target
- Platform: Designed for Hugging Face Spaces
- Port: 7860 (standard for Hugging Face Spaces)
- Runtime: Docker container
- Model Serving: Ollama with FastAPI gateway