crosse712 committed
Commit 0bfacdd · 1 parent: 08511e7

Configure for Hugging Face Spaces deployment

Files changed (4):
  1. Dockerfile +64 -37
  2. Dockerfile.original +61 -0
  3. README.md +50 -161
  4. README_ORIGINAL.md +167 -0
Dockerfile CHANGED
@@ -1,61 +1,88 @@
-# Multi-stage build for optimized image size
-FROM python:3.9-slim as builder
-
-WORKDIR /app
-
-# Install build dependencies
-RUN apt-get update && apt-get install -y \
-    gcc \
-    g++ \
-    git \
-    && rm -rf /var/lib/apt/lists/*
-
-# Copy and install Python dependencies
-COPY backend/requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Production stage
 FROM python:3.9-slim

 WORKDIR /app

-# Install runtime dependencies
 RUN apt-get update && apt-get install -y \
     libgomp1 \
     libglib2.0-0 \
     libsm6 \
     libxext6 \
     libxrender1 \
-    libgomp1 \
-    wget \
     && rm -rf /var/lib/apt/lists/*

-# Copy Python packages from builder
-COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
-COPY --from=builder /usr/local/bin /usr/local/bin
-
-# Copy application code
 COPY backend/ ./backend/

-# Set environment variables for memory optimization
-ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
-ENV OMP_NUM_THREADS=4
-ENV MKL_NUM_THREADS=4
-ENV NUMEXPR_NUM_THREADS=4
-ENV TOKENIZERS_PARALLELISM=false
-
-# Enable extreme memory optimization
-ENV USE_EXTREME_OPTIMIZATION=true
-ENV MAX_MEMORY_GB=3
-
-WORKDIR /app/backend
-
-# Expose port
-EXPOSE 8000

 # Health check
 HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
-    CMD curl -f http://localhost:8000/ || exit 1
-
-# Start the application with memory-limited configuration
-CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]

+# Hugging Face Spaces Dockerfile - Frontend + Backend
+FROM node:18-slim as frontend-builder
+
+WORKDIR /app/frontend
+COPY frontend/package*.json ./
+RUN npm ci
+COPY frontend/ ./
+RUN npm run build
+
+# Python backend stage
 FROM python:3.9-slim

 WORKDIR /app

+# Install system dependencies
 RUN apt-get update && apt-get install -y \
+    nginx \
+    supervisor \
     libgomp1 \
     libglib2.0-0 \
     libsm6 \
     libxext6 \
     libxrender1 \
+    curl \
     && rm -rf /var/lib/apt/lists/*

+# Install Python dependencies
+COPY backend/requirements.txt ./backend/
+RUN pip install --no-cache-dir -r backend/requirements.txt
+
+# Copy backend code
 COPY backend/ ./backend/

+# Copy frontend build from builder stage
+COPY --from=frontend-builder /app/frontend/dist /usr/share/nginx/html
+
+# Configure nginx to serve frontend and proxy to backend
+RUN echo 'server { \n\
+    listen 7860; \n\
+    root /usr/share/nginx/html; \n\
+    index index.html; \n\
+    \n\
+    location / { \n\
+        try_files $uri $uri/ /index.html; \n\
+    } \n\
+    \n\
+    location /api/ { \n\
+        proxy_pass http://127.0.0.1:8000/; \n\
+        proxy_http_version 1.1; \n\
+        proxy_set_header Upgrade $http_upgrade; \n\
+        proxy_set_header Connection "upgrade"; \n\
+        proxy_set_header Host $host; \n\
+        proxy_set_header X-Real-IP $remote_addr; \n\
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; \n\
+        proxy_set_header X-Forwarded-Proto $scheme; \n\
+        proxy_buffering off; \n\
+    } \n\
+}' > /etc/nginx/sites-available/default
+
+# Create supervisor config
+RUN echo '[supervisord] \n\
+nodaemon=true \n\
+\n\
+[program:nginx] \n\
+command=nginx -g "daemon off;" \n\
+autostart=true \n\
+autorestart=true \n\
+stderr_logfile=/var/log/nginx.err.log \n\
+stdout_logfile=/var/log/nginx.out.log \n\
+\n\
+[program:backend] \n\
+command=python -m uvicorn backend.app.main:app --host 127.0.0.1 --port 8000 \n\
+directory=/app \n\
+autostart=true \n\
+autorestart=true \n\
+stderr_logfile=/var/log/backend.err.log \n\
+stdout_logfile=/var/log/backend.out.log \n\
+environment=USE_EXTREME_OPTIMIZATION="true",MAX_MEMORY_GB="3"' > /etc/supervisor/conf.d/supervisord.conf
+
+# Expose Hugging Face Spaces default port
+EXPOSE 7860

 # Health check
 HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+    CMD curl -f http://localhost:7860/ || exit 1
+
+# Start supervisor
+CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
Dockerfile.original ADDED
@@ -0,0 +1,61 @@
+# Multi-stage build for optimized image size
+FROM python:3.9-slim as builder
+
+WORKDIR /app
+
+# Install build dependencies
+RUN apt-get update && apt-get install -y \
+    gcc \
+    g++ \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy and install Python dependencies
+COPY backend/requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Production stage
+FROM python:3.9-slim
+
+WORKDIR /app
+
+# Install runtime dependencies
+RUN apt-get update && apt-get install -y \
+    libgomp1 \
+    libglib2.0-0 \
+    libsm6 \
+    libxext6 \
+    libxrender1 \
+    libgomp1 \
+    wget \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy Python packages from builder
+COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
+COPY --from=builder /usr/local/bin /usr/local/bin
+
+# Copy application code
+COPY backend/ ./backend/
+
+# Set environment variables for memory optimization
+ENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
+ENV OMP_NUM_THREADS=4
+ENV MKL_NUM_THREADS=4
+ENV NUMEXPR_NUM_THREADS=4
+ENV TOKENIZERS_PARALLELISM=false
+
+# Enable extreme memory optimization
+ENV USE_EXTREME_OPTIMIZATION=true
+ENV MAX_MEMORY_GB=3
+
+WORKDIR /app/backend
+
+# Expose port
+EXPOSE 8000
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+    CMD curl -f http://localhost:8000/ || exit 1
+
+# Start the application with memory-limited configuration
+CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]
README.md CHANGED
@@ -1,167 +1,56 @@
-# FastVLM-7B Screen Observer
-
-A local web application for real-time screen observation and analysis using Apple's FastVLM-7B model via HuggingFace.
-
 ## Features
-
-- **Real-time Screen Capture**: Capture and analyze screen content on-demand or automatically
-- **FastVLM-7B Integration**: Uses Apple's vision-language model for intelligent screen analysis
-- **UI Element Detection**: Identifies buttons, links, forms, and other interface elements
-- **Text Extraction**: Captures text snippets from the screen
-- **Risk Detection**: Flags potential security or privacy concerns
-- **Automation Demo**: Demonstrates browser automation capabilities
-- **NDJSON Logging**: Comprehensive logging in NDJSON format with timestamps
-- **Export Functionality**: Download logs and captured frames as ZIP archive
-
-## Specifications
-
-- **Frontend**: React + Vite on `http://localhost:5173`
-- **Backend**: FastAPI on `http://localhost:8000`
-- **Model**: Apple FastVLM-7B with `trust_remote_code=True`
-- **Image Token**: `IMAGE_TOKEN_INDEX = -200`
-- **Output Format**: JSON with summary, ui_elements, text_snippets, risk_flags
-
-## Prerequisites
-
-- Python 3.8+
-- Node.js 16+
-- Chrome/Chromium browser (for automation demo)
-- 14GB+ RAM (required for FastVLM-7B model weights)
-- CUDA-capable GPU or Apple Silicon (recommended for FastVLM-7B)
-
-## Installation
-
-1. Clone this repository:
-```bash
-cd fastvlm-screen-observer
-```
-
-2. Install Python dependencies:
-```bash
-cd backend
-python3 -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
-pip install -r requirements.txt
-```
-
-3. Install Node.js dependencies:
-```bash
-cd ../frontend
-npm install
-```
-
-## Running the Application
-
-### Option 1: Using the start script (Recommended)
-```bash
-./start.sh
-```
-
-### Option 2: Manual start
-
-Terminal 1 - Backend:
-```bash
-cd backend
-source venv/bin/activate
-uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
-```
-
-Terminal 2 - Frontend:
-```bash
-cd frontend
-npm run dev
-```
-
-## Usage
-
-1. Open your browser and navigate to `http://localhost:5173`
-2. Click "Capture Screen" to analyze the current screen
-3. Enable "Auto Capture" for continuous monitoring
-4. Use "Run Demo" to see browser automation in action
-5. Click "Export Logs" to download analysis data

 ## API Endpoints
-
-- `GET /` - API status check
-- `POST /analyze` - Capture and analyze screen
-- `POST /demo` - Run automation demo
-- `GET /export` - Export logs as ZIP
-- `GET /logs/stream` - Stream logs via SSE
-- `GET /docs` - Interactive API documentation
-
-## Project Structure
-
-```
-fastvlm-screen-observer/
-├── backend/
-│   ├── app/
-│   │   └── main.py                # FastAPI application
-│   ├── models/
-│   │   ├── fastvlm_model.py       # FastVLM-7B main integration
-│   │   ├── fastvlm_optimized.py   # Memory optimization strategies
-│   │   ├── fastvlm_extreme.py     # Extreme optimization (4-bit)
-│   │   └── use_fastvlm_small.py   # Alternative 1.5B model
-│   ├── utils/
-│   │   ├── screen_capture.py      # Screen capture utilities
-│   │   ├── automation.py          # Browser automation
-│   │   └── logger.py              # NDJSON logging
-│   └── requirements.txt
-├── frontend/
-│   ├── src/
-│   │   ├── App.jsx                # React main component (with error handling)
-│   │   ├── ScreenCapture.jsx      # WebRTC screen capture
-│   │   └── App.css                # Styling
-│   ├── package.json
-│   └── vite.config.js
-├── logs/                          # Generated logs and frames
-├── start.sh                       # Startup script
-└── README.md
-
-```
-
-## Model Notes
-
-The application uses Apple's FastVLM-7B model with the following specifications:
-- **Model ID**: `apple/FastVLM-7B` from HuggingFace
-- **Tokenizer**: Qwen2Tokenizer (requires `transformers>=4.40.0`)
-- **IMAGE_TOKEN_INDEX**: -200 (special token for image placeholders)
-- **trust_remote_code**: True (required for model loading)
-
-### Memory Requirements:
-- **Minimum**: 14GB RAM for model weights
-- **Recommended**: 16GB+ RAM for smooth operation
-- The model will download automatically on first run (~14GB)
-
-### Current Implementation:
-The system includes multiple optimization strategies:
-1. **Standard Mode**: Full precision (float16) - requires 14GB+ RAM
-2. **Optimized Mode**: 8-bit quantization - requires 8-10GB RAM
-3. **Extreme Mode**: 4-bit quantization with disk offloading - requires 6-8GB RAM
-
-If the model fails to load due to memory constraints, the application will:
-- Display a user-friendly error message
-- Continue operating with graceful error handling
-- NOT show "ANALYSIS_ERROR" in risk flags
-
-## Acceptance Criteria
-
-✅ Local web app running on localhost:5173
-✅ FastAPI backend on localhost:8000
-✅ FastVLM-7B integration with trust_remote_code=True
-✅ IMAGE_TOKEN_INDEX = -200 configured
-✅ JSON output format with required fields
-✅ Demo automation functionality
-✅ NDJSON logging with timestamps
-✅ ZIP export with logs and frames
-✅ Project structure matches specifications
-
-## Troubleshooting
-
-- **Model Loading Issues**: Check GPU memory and CUDA installation
-- **Screen Capture Errors**: Ensure proper display permissions
-- **Browser Automation**: Install Chrome/Chromium and check WebDriver
-- **Port Conflicts**: Ensure ports 5173 and 8000 are available
-
-## License
-
-MIT

+---
+title: FastVLM Screen Observer
+emoji: 🖥️👁️
+colorFrom: blue
+colorTo: purple
+sdk: docker
+sdk_version: "3.9"
+app_port: 7860
+pinned: false
+license: mit
+models:
+  - apple/FastVLM-7B
+suggested_hardware: t4-small
+custom_headers:
+  cross-origin-embedder-policy: require-corp
+  cross-origin-opener-policy: same-origin
+---
+
+# FastVLM Screen Observer 🖥️👁️
+
+Real-time screen observation and analysis using Apple's FastVLM-7B model, optimized for low-RAM systems (3-8GB).

 ## Features
+- 🎯 Real-time screen capture and analysis
+- 🤖 FastVLM-7B vision-language model integration
+- 🔍 UI element detection
+- 📝 Text extraction from screenshots
+- ⚠️ Risk detection for security concerns
+- 🎮 Browser automation demo
+- 💾 Export logs and captured frames
+- 🚀 Optimized for 3-8GB RAM with 4-bit quantization
+
+## How to Use
+1. Click "Capture Screen" to analyze your current screen
+2. Enable "Auto Capture" for continuous monitoring
+3. Use "Run Demo" to see browser automation
+4. Export logs as ZIP archive
+
+## Model Information
+- **Model**: Apple FastVLM-7B
+- **Optimization**: Extreme memory optimization with 4-bit quantization
+- **Memory**: Runs on 3-8GB RAM systems
+- **Device**: Supports CPU, CUDA, and MPS (Apple Silicon)

 ## API Endpoints
+- `GET /api/` - Status check
+- `POST /api/analyze` - Screen analysis
+- `POST /api/demo` - Automation demo
+- `GET /api/export` - Export logs
+- `GET /api/logs/stream` - Stream logs via SSE

+## GitHub Repository
+https://github.com/crosse712/fastvlm-screen-observer

+---
+Built with ❤️ using FastAPI, React, and FastVLM-7B
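
For reference, a minimal Python client against the `/api/` routes listed in the new README. It assumes the Space is reachable at `BASE` and that `POST /api/analyze` accepts an empty JSON body; the exact request schema is not part of this commit, and the response is expected to carry the documented `summary`, `ui_elements`, `text_snippets`, and `risk_flags` fields.

```python
# Client sketch for the proxied API. Assumptions: the Space is served on
# port 7860 (or replace BASE with the deployed Space URL) and an empty
# JSON body is acceptable for /api/analyze.
import json
import urllib.request

BASE = "http://localhost:7860"  # replace with the deployed Space URL


def analyze() -> dict:
    """Trigger a screen analysis and return the parsed JSON result."""
    req = urllib.request.Request(
        f"{BASE}/api/analyze",
        data=json.dumps({}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def export_logs(path: str = "logs_export.zip") -> None:
    """Download the NDJSON logs + frames archive from GET /api/export."""
    with urllib.request.urlopen(f"{BASE}/api/export", timeout=60) as resp:
        with open(path, "wb") as fh:
            fh.write(resp.read())


if __name__ == "__main__":
    print(analyze())
    export_logs()
```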
README_ORIGINAL.md ADDED
@@ -0,0 +1,167 @@
+# FastVLM-7B Screen Observer
+
+A local web application for real-time screen observation and analysis using Apple's FastVLM-7B model via HuggingFace.
+
+## Features
+
+- **Real-time Screen Capture**: Capture and analyze screen content on-demand or automatically
+- **FastVLM-7B Integration**: Uses Apple's vision-language model for intelligent screen analysis
+- **UI Element Detection**: Identifies buttons, links, forms, and other interface elements
+- **Text Extraction**: Captures text snippets from the screen
+- **Risk Detection**: Flags potential security or privacy concerns
+- **Automation Demo**: Demonstrates browser automation capabilities
+- **NDJSON Logging**: Comprehensive logging in NDJSON format with timestamps
+- **Export Functionality**: Download logs and captured frames as ZIP archive
+
+## Specifications
+
+- **Frontend**: React + Vite on `http://localhost:5173`
+- **Backend**: FastAPI on `http://localhost:8000`
+- **Model**: Apple FastVLM-7B with `trust_remote_code=True`
+- **Image Token**: `IMAGE_TOKEN_INDEX = -200`
+- **Output Format**: JSON with summary, ui_elements, text_snippets, risk_flags
+
+## Prerequisites
+
+- Python 3.8+
+- Node.js 16+
+- Chrome/Chromium browser (for automation demo)
+- 14GB+ RAM (required for FastVLM-7B model weights)
+- CUDA-capable GPU or Apple Silicon (recommended for FastVLM-7B)
+
+## Installation
+
+1. Clone this repository:
+```bash
+cd fastvlm-screen-observer
+```
+
+2. Install Python dependencies:
+```bash
+cd backend
+python3 -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+pip install -r requirements.txt
+```
+
+3. Install Node.js dependencies:
+```bash
+cd ../frontend
+npm install
+```
+
+## Running the Application
+
+### Option 1: Using the start script (Recommended)
+```bash
+./start.sh
+```
+
+### Option 2: Manual start
+
+Terminal 1 - Backend:
+```bash
+cd backend
+source venv/bin/activate
+uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
+```
+
+Terminal 2 - Frontend:
+```bash
+cd frontend
+npm run dev
+```
+
+## Usage
+
+1. Open your browser and navigate to `http://localhost:5173`
+2. Click "Capture Screen" to analyze the current screen
+3. Enable "Auto Capture" for continuous monitoring
+4. Use "Run Demo" to see browser automation in action
+5. Click "Export Logs" to download analysis data
+
+## API Endpoints
+
+- `GET /` - API status check
+- `POST /analyze` - Capture and analyze screen
+- `POST /demo` - Run automation demo
+- `GET /export` - Export logs as ZIP
+- `GET /logs/stream` - Stream logs via SSE
+- `GET /docs` - Interactive API documentation
+
+## Project Structure
+
+```
+fastvlm-screen-observer/
+├── backend/
+│   ├── app/
+│   │   └── main.py                # FastAPI application
+│   ├── models/
+│   │   ├── fastvlm_model.py       # FastVLM-7B main integration
+│   │   ├── fastvlm_optimized.py   # Memory optimization strategies
+│   │   ├── fastvlm_extreme.py     # Extreme optimization (4-bit)
+│   │   └── use_fastvlm_small.py   # Alternative 1.5B model
+│   ├── utils/
+│   │   ├── screen_capture.py      # Screen capture utilities
+│   │   ├── automation.py          # Browser automation
+│   │   └── logger.py              # NDJSON logging
+│   └── requirements.txt
+├── frontend/
+│   ├── src/
+│   │   ├── App.jsx                # React main component (with error handling)
+│   │   ├── ScreenCapture.jsx      # WebRTC screen capture
+│   │   └── App.css                # Styling
+│   ├── package.json
+│   └── vite.config.js
+├── logs/                          # Generated logs and frames
+├── start.sh                       # Startup script
+└── README.md
+
+```
+
+## Model Notes
+
+The application uses Apple's FastVLM-7B model with the following specifications:
+- **Model ID**: `apple/FastVLM-7B` from HuggingFace
+- **Tokenizer**: Qwen2Tokenizer (requires `transformers>=4.40.0`)
+- **IMAGE_TOKEN_INDEX**: -200 (special token for image placeholders)
+- **trust_remote_code**: True (required for model loading)
+
+### Memory Requirements:
+- **Minimum**: 14GB RAM for model weights
+- **Recommended**: 16GB+ RAM for smooth operation
+- The model will download automatically on first run (~14GB)
+
+### Current Implementation:
+The system includes multiple optimization strategies:
+1. **Standard Mode**: Full precision (float16) - requires 14GB+ RAM
+2. **Optimized Mode**: 8-bit quantization - requires 8-10GB RAM
+3. **Extreme Mode**: 4-bit quantization with disk offloading - requires 6-8GB RAM
+
+If the model fails to load due to memory constraints, the application will:
+- Display a user-friendly error message
+- Continue operating with graceful error handling
+- NOT show "ANALYSIS_ERROR" in risk flags
+
+## Acceptance Criteria
+
+✅ Local web app running on localhost:5173
+✅ FastAPI backend on localhost:8000
+✅ FastVLM-7B integration with trust_remote_code=True
+✅ IMAGE_TOKEN_INDEX = -200 configured
+✅ JSON output format with required fields
+✅ Demo automation functionality
+✅ NDJSON logging with timestamps
+✅ ZIP export with logs and frames
+✅ Project structure matches specifications
+
+## Troubleshooting
+
+- **Model Loading Issues**: Check GPU memory and CUDA installation
+- **Screen Capture Errors**: Ensure proper display permissions
+- **Browser Automation**: Install Chrome/Chromium and check WebDriver
+- **Port Conflicts**: Ensure ports 5173 and 8000 are available
+
+## License
+
+MIT
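
The Model Notes translate roughly into the loading sketch below. This is a hedged approximation, not the repository's code (the actual logic lives in `backend/models/fastvlm_model.py` and `fastvlm_extreme.py` and may differ): image preprocessing and generation are model-specific and omitted, and the 4-bit path additionally requires the `bitsandbytes` package.

```python
# Approximate loading path per the Model Notes: apple/FastVLM-7B with
# trust_remote_code=True, IMAGE_TOKEN_INDEX = -200, optional 4-bit
# quantization for the "Extreme Mode" case. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "apple/FastVLM-7B"
IMAGE_TOKEN_INDEX = -200  # special placeholder token id for image inputs


def load_model(extreme: bool = False):
    """Load tokenizer and model; `extreme=True` approximates the 4-bit mode."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    quant = (
        BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
        if extreme
        else None
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,   # required: model code ships with the checkpoint
        torch_dtype=torch.float16,
        device_map="auto",
        quantization_config=quant,
    )
    return tokenizer, model
```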