# OllamaSpace Technical Specifications
## Project Overview
OllamaSpace is a web-based chat application that serves as a frontend interface for interacting with Ollama language models. The application provides a real-time chat interface where users can communicate with AI models through a web browser.
## Architecture
### Backend
- **Framework**: FastAPI (Python)
- **API Gateway**: Acts as a proxy between the frontend and the Ollama API
- **Streaming**: Supports real-time streaming of model responses
- **Default Model**: qwen3:4b
### Frontend
- **Technology**: Pure HTML/CSS/JavaScript (no frameworks)
- **Interface**: Simple chat interface with message history
- **Interaction**: Real-time message streaming with typing indicators
- **Styling**: Clean, minimal design with distinct user/bot message styling
## Components
### main.py
- **Framework**: FastAPI
- **Authentication**: Implements Bearer token authentication using HTTPBearer
- **Endpoints**:
- `GET /` - Redirects to `/chat`
- `GET /chat` - Serves the chat HTML page
- `POST /chat_api` - API endpoint that forwards requests to Ollama (requires authentication)
- **Functionality**:
- Proxies requests to the local Ollama API (http://localhost:11434)
- Streams model responses back to the frontend
- Handles error cases and validation
- Auto-generates a secure API key when none is provided via the environment (sketched below)
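A minimal sketch of this proxy pattern, assuming an `httpx`-based relay to Ollama's `/api/generate` endpoint; names and structure are illustrative, not the actual main.py source:

```python
import os
import secrets

import httpx
from fastapi import Depends, FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
security = HTTPBearer()

# Fall back to a generated key when OLLAMA_API_KEY is unset
# (assumption: the real main.py may structure this differently).
API_KEY = os.environ.get("OLLAMA_API_KEY") or secrets.token_urlsafe(32)
OLLAMA_URL = "http://localhost:11434/api/generate"


def verify_key(creds: HTTPAuthorizationCredentials = Depends(security)) -> None:
    # Constant-time comparison of the Bearer token against the configured key.
    if not secrets.compare_digest(creds.credentials, API_KEY):
        raise HTTPException(status_code=401, detail="Invalid API key")


@app.post("/chat_api")
async def chat_api(body: dict, _: None = Depends(verify_key)) -> StreamingResponse:
    if not body.get("prompt"):
        raise HTTPException(status_code=400, detail="Missing prompt")
    payload = {
        "model": body.get("model") or os.environ.get("OLLAMA_MODEL", "qwen3:4b"),
        "prompt": body["prompt"],
    }

    async def relay():
        # Forward the request to the local Ollama API and pass its
        # streamed chunks through to the browser unchanged.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", OLLAMA_URL, json=payload) as resp:
                async for chunk in resp.aiter_bytes():
                    yield chunk

    return StreamingResponse(relay(), media_type="application/x-ndjson")
```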
### chat.html
- **Template**: HTML structure for the chat interface with API key management
- **Layout**:
- Header with API key input and save button
- Chat window area with message history
- Message input field
- Send button
- **Static Assets**: Links to CSS and JavaScript files
### static/script.js
- **Features**:
- Real-time message streaming from the API
- Message display in chat format
- Enter key support for sending messages
- Stream parsing to handle JSON responses
- API key management with localStorage persistence
- API key input UI with save functionality
- **API Communication**:
- Includes API key in Authorization header as Bearer token
- POSTs to `/chat_api` endpoint
- Receives streaming responses and displays incrementally
- Handles error cases gracefully
### static/style.css
- **Design**: Minimal, clean chat interface with API key management section
- **Styling**:
- Distinct colors for user vs. bot messages
- Responsive layout
- API key section in header with input field and save button
- Auto-scrolling to latest messages
## Deployment
### Dockerfile
- **Base Image**: ollama/ollama
- **Environment**: Sets up Ollama server and FastAPI gateway
- **Port Configuration**: Listens on port 7860 (Hugging Face Spaces default)
- **Model Setup**: Downloads the specified model during the build process
- **Dependencies**: Installs Python, FastAPI, and related libraries
### start.sh
- **Initialization Sequence**:
1. Starts Ollama server in background
2. Health-checks the Ollama server
3. Starts FastAPI gateway on port 7860
- **Error Handling**: Waits for Ollama to be ready before starting the gateway (readiness check illustrated below)
- **API Key**: If auto-generated, the API key will be displayed in the console logs during startup
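start.sh itself is a shell script; the Python sketch below only illustrates the readiness check it performs, assuming Ollama's root endpoint (which answers 200 once the server is up):

```python
import time
import urllib.request


def wait_for_ollama(url: str = "http://localhost:11434", timeout: float = 60.0) -> None:
    """Block until the Ollama server answers, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:  # Ollama replies "Ollama is running"
                    return
        except OSError:
            pass  # not listening yet; retry shortly
        time.sleep(1)
    raise RuntimeError("Ollama server did not become ready in time")
```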
## Configuration
### Environment Variables
- `OLLAMA_HOST`: 0.0.0.0 (allows external connections)
- `OLLAMA_ORIGINS`: '*' (allows CORS requests)
- `OLLAMA_MODEL`: qwen3:4b (default model, can be overridden)
- `OLLAMA_API_KEY`: (optional) Secure API key (auto-generated if not provided)
### Default Model
- **Model**: qwen3:4b
- **Fallback**: If no model is specified in the request, qwen3:4b is used
### API Key Management
- **Generation**: If no OLLAMA_API_KEY environment variable is set, a cryptographically secure random key is generated at startup (sketched after this list)
- **Access**: Generated API key is displayed in the application logs during startup
- **Frontend Storage**: API key is stored in browser's localStorage after being entered once
- **Authentication**: All API requests require a valid Bearer token in the Authorization header
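A hedged sketch of that generate-and-log behavior; the exact wording of the log line is an assumption:

```python
import os
import secrets

api_key = os.environ.get("OLLAMA_API_KEY")
if not api_key:
    # 32 bytes of randomness from the OS CSPRNG, URL-safe encoded.
    api_key = secrets.token_urlsafe(32)
    # Surfaced once at startup so the operator can paste it into the UI.
    print(f"Generated API key: {api_key}")
```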
## API Specification
### `/chat_api` Endpoint
- **Method**: POST
- **Authentication**: Requires Bearer token in Authorization header
- **Content-Type**: application/json
- **Request Headers**:
- `Authorization`: Bearer {your_api_key}
- `Content-Type`: application/json
- **Request Body**:
```json
{
"model": "string (optional, defaults to qwen3:4b)",
"prompt": "string (required)"
}
```
- **Response**: Streaming response with incremental model output (client sketch below)
- **Error Handling**:
- Returns 401 for an invalid API key
- Returns 400 for a missing prompt
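For illustration, a minimal Python client consuming this endpoint; the `response`/`done` field names follow Ollama's streaming generate format, and the key and host values are placeholders:

```python
import json

import requests

resp = requests.post(
    "http://localhost:7860/chat_api",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
    json={"prompt": "Why is the sky blue?"},
    stream=True,
)
resp.raise_for_status()

# The body arrives as newline-delimited JSON; print each fragment as it lands.
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
print()
```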
### Data Flow
1. Frontend sends user message to `/chat_api`
2. Backend forwards request to local Ollama API
3. Ollama processes request with specified model
4. Response is streamed back to frontend in real-time
5. Frontend displays response incrementally as it arrives
## Security Considerations
- **API Key Authentication**: Required for all API access using Bearer token authentication
- **Secure Key Generation**: API key is auto-generated using a cryptographically secure random generator (`secrets.token_urlsafe(32)`)
- **Configurable Keys**: API key can be set via environment variable (OLLAMA_API_KEY) or auto-generated
- **Storage**: Client-side API key stored in browser's localStorage
- **CORS**: Enabled for all origins (potential security concern in production)
- **Input Validation**: Validates presence of prompt parameter
- **Local API**: Communicates with Ollama through localhost only
- **Key Exposure**: Auto-generated API key is displayed in console logs during startup (should be secured in production)
## Performance Features
- **Streaming**: Real-time response streaming for better UX
- **Client-side Display**: Incremental message display as responses arrive
- **Efficient Communication**: Uses streaming HTTP responses to minimize latency
## Security Features
- **Authentication**: Bearer token authentication for all API endpoints
- **Key Generation**: Cryptographically secure random API key generation using the `secrets` module
- **Key Storage**: API key stored in browser localStorage (with option to enter via UI)
- **Transport Security**: API key is transmitted via the Authorization header, so HTTPS should be used in production (see the sketch below)
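If the gateway were ever exposed directly rather than behind the Hugging Face Spaces proxy (which already terminates TLS), Uvicorn can serve HTTPS itself. A minimal sketch; the certificate paths are placeholders:

```python
import uvicorn

uvicorn.run(
    "main:app",
    host="0.0.0.0",
    port=7860,
    ssl_keyfile="/path/to/key.pem",    # hypothetical certificate paths
    ssl_certfile="/path/to/cert.pem",
)
```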
## Technologies Used
- **Backend**: Python, FastAPI
- **Frontend**: HTML5, CSS3, JavaScript (ES6+)
- **Containerization**: Docker
- **AI Model**: Ollama with qwen3:4b by default
- **Web Server**: Uvicorn ASGI server
## File Structure
```
OllamaSpace/
├── main.py (FastAPI application)
├── chat.html (Chat interface)
├── start.sh (Container startup script)
├── Dockerfile (Container configuration)
├── README.md (Project description)
└── static/
    ├── script.js (Frontend JavaScript)
    └── style.css (Frontend styling)
```
## Build Process
1. Container built with Ollama and Python dependencies
2. Model specified by OLLAMA_MODEL environment variable is pre-pulled
3. Application files are copied into container
4. FastAPI dependencies are installed
5. Container starts with Ollama server and FastAPI gateway
## Deployment Target
- **Platform**: Designed for Hugging Face Spaces
- **Port**: 7860 (standard for Hugging Face Spaces)
- **Runtime**: Docker container
- **Model Serving**: Ollama with FastAPI gateway