Spaces:
Paused
Paused
MacBook pro
commited on
Commit
·
755d25a
1
Parent(s):
69bb7ad
Optimize for HuggingFace Spaces: simplified Gradio interface and reduced dependencies
Browse files- README.md +240 -38
- app.py +165 -235
- avatar_pipeline.py +481 -0
- fastapi_app.py +368 -0
- realtime_optimizer.py +394 -0
- requirements.txt +21 -5
- static/app.js +318 -65
- static/index.html +160 -11
- virtual_camera.py +306 -0
README.md
CHANGED
|
@@ -1,53 +1,151 @@
|
|
| 1 |
---
|
| 2 |
-
title: Mirage
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
-
sdk:
|
|
|
|
| 7 |
app_file: app.py
|
| 8 |
pinned: false
|
| 9 |
license: mit
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
---
|
| 11 |
|
| 12 |
-
# Mirage
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
##
|
| 17 |
-
- GPU-backed metrics endpoint (`/metrics`, `/gpu`)
|
| 18 |
-
- Voice stub integrated (pass-through timing)
|
| 19 |
-
- Audio & Video echo functioning
|
| 20 |
-
- Frontend governed: audio chunk 160ms, video max 10 FPS
|
| 21 |
-
- Static client operational
|
| 22 |
|
| 23 |
-
|
| 24 |
-
-
|
| 25 |
-
-
|
| 26 |
-
-
|
| 27 |
-
-
|
| 28 |
-
-
|
| 29 |
-
- Security
|
| 30 |
|
| 31 |
-
##
|
| 32 |
-
```bash
|
| 33 |
-
pip install -r requirements.txt
|
| 34 |
-
uvicorn app:app --port 7860
|
| 35 |
-
```
|
| 36 |
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
| `MIRAGE_VOICE_ENABLE` | `0` | Enable voice processing stub path (adds inference timing EMA). |
|
| 42 |
-
| `MIRAGE_VIDEO_MAX_FPS` | `10` | Target maximum outbound video frame send rate (frontend governed). |
|
| 43 |
-
| `MIRAGE_METRICS_FPS_WINDOW` | `30` | Rolling window size for FPS calculation. |
|
| 44 |
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
|
| 52 |
## Metrics Endpoints
|
| 53 |
- `GET /metrics` – JSON with audio/video counters, EMAs (loop interval, inference), rolling FPS, frame interval EMA.
|
|
@@ -68,5 +166,109 @@ Set `MIRAGE_VOICE_ENABLE=1` to activate the voice processor stub. Behavior:
|
|
| 68 |
- Frontend will fetch a `/config` endpoint to align `chunk_ms` and `video_max_fps` dynamically.
|
| 69 |
- Adaptation layer will adjust chunk size and video quality based on runtime ratios.
|
| 70 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 71 |
## License
|
| 72 |
MIT
|
|
|
|
| 1 |
---
|
| 2 |
+
title: Mirage Real-time AI Avatar
|
| 3 |
+
emoji: 🎭
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
+
sdk: gradio
|
| 7 |
+
sdk_version: 4.44.0
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: mit
|
| 11 |
+
hardware: a10g-large
|
| 12 |
+
python_version: 3.10
|
| 13 |
+
models:
|
| 14 |
+
- KwaiVGI/LivePortrait
|
| 15 |
+
- RVC-Project/Retrieval-based-Voice-Conversion-WebUI
|
| 16 |
+
tags:
|
| 17 |
+
- real-time
|
| 18 |
+
- ai-avatar
|
| 19 |
+
- face-animation
|
| 20 |
+
- voice-conversion
|
| 21 |
+
- live-portrait
|
| 22 |
+
- rvc
|
| 23 |
+
- virtual-camera
|
| 24 |
+
short_description: "Real-time AI avatar system with <250ms latency for video calls"
|
| 25 |
---
|
| 26 |
|
| 27 |
+
# 🎭 Mirage: Real-time AI Avatar System
|
| 28 |
|
| 29 |
+
Transform yourself into an AI avatar in real-time with sub-250ms latency! Perfect for video calls, streaming, and virtual meetings.
|
| 30 |
|
| 31 |
+
## 🚀 Features
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 32 |
|
| 33 |
+
- **Real-time Face Animation**: Live portrait animation using state-of-the-art AI
|
| 34 |
+
- **Voice Conversion**: Real-time voice transformation with RVC
|
| 35 |
+
- **Ultra-low Latency**: <250ms end-to-end latency optimized for A10G GPU
|
| 36 |
+
- **Virtual Camera**: Direct integration with Zoom, Teams, Discord, and more
|
| 37 |
+
- **Adaptive Quality**: Automatic quality adjustment to maintain real-time performance
|
| 38 |
+
- **GPU Optimized**: Efficient memory management and CUDA acceleration
|
|
|
|
| 39 |
|
| 40 |
+
## 🎯 Use Cases
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
|
| 42 |
+
- **Video Conferencing**: Use AI avatars in Zoom, Google Meet, Microsoft Teams
|
| 43 |
+
- **Content Creation**: Streaming with animated avatars on Twitch, YouTube
|
| 44 |
+
- **Virtual Meetings**: Professional presentations with consistent avatar appearance
|
| 45 |
+
- **Privacy Protection**: Maintain anonymity while participating in video calls
|
|
|
|
|
|
|
|
|
|
| 46 |
|
| 47 |
+
## 🛠️ Technology Stack
|
| 48 |
+
|
| 49 |
+
- **Face Animation**: LivePortrait (KwaiVGI)
|
| 50 |
+
- **Voice Conversion**: RVC (Retrieval-based Voice Conversion)
|
| 51 |
+
- **Face Detection**: SCRFD with optimized inference
|
| 52 |
+
- **Backend**: FastAPI with WebSocket streaming
|
| 53 |
+
- **Frontend**: WebRTC-enabled real-time client
|
| 54 |
+
- **GPU**: NVIDIA A10G with CUDA optimization
|
| 55 |
+
|
| 56 |
+
## 📊 Performance Specs
|
| 57 |
+
|
| 58 |
+
- **Video Resolution**: 512x512 @ 20 FPS (adaptive)
|
| 59 |
+
- **Audio Processing**: 160ms chunks @ 16kHz
|
| 60 |
+
- **End-to-end Latency**: <250ms target
|
| 61 |
+
- **GPU Memory**: ~8GB peak usage on A10G
|
| 62 |
+
- **Face Detection**: SCRFD every 5 frames for efficiency
|
| 63 |
+
|
| 64 |
+
## 🚀 Quick Start
|
| 65 |
+
|
| 66 |
+
1. **Initialize Pipeline**: Click "Initialize AI Pipeline" to load models
|
| 67 |
+
2. **Set Reference**: Upload your reference image for avatar creation
|
| 68 |
+
3. **Start Capture**: Begin real-time avatar generation
|
| 69 |
+
4. **Enable Virtual Camera**: Use avatar output in third-party apps
|
| 70 |
+
|
| 71 |
+
## 🔧 Technical Details
|
| 72 |
+
|
| 73 |
+
### Latency Optimization
|
| 74 |
+
- Adaptive quality control based on processing time
|
| 75 |
+
- Frame buffering with overflow protection
|
| 76 |
+
- GPU memory management and cleanup
|
| 77 |
+
- Audio-video synchronization within 150ms
|
| 78 |
+
|
| 79 |
+
### Model Architecture
|
| 80 |
+
- **LivePortrait**: Efficient portrait animation with stitching control
|
| 81 |
+
- **RVC**: High-quality voice conversion with minimal latency
|
| 82 |
+
- **SCRFD**: Fast face detection with confidence thresholding
|
| 83 |
+
|
| 84 |
+
### Real-time Features
|
| 85 |
+
- WebSocket streaming for minimal overhead
|
| 86 |
+
- Adaptive resolution (512x512 → 384x384 → 256x256)
|
| 87 |
+
- Quality degradation order: Quality → FPS → Resolution
|
| 88 |
+
- Automatic recovery when performance improves
|
| 89 |
+
|
| 90 |
+
## 📱 Virtual Camera Integration
|
| 91 |
+
|
| 92 |
+
The system creates a virtual camera device that can be used in:
|
| 93 |
+
|
| 94 |
+
- **Video Conferencing**: Zoom, Google Meet, Microsoft Teams, Discord
|
| 95 |
+
- **Streaming Software**: OBS Studio, Streamlabs, XSplit
|
| 96 |
+
- **Social Media**: WhatsApp Desktop, Skype, Facebook Messenger
|
| 97 |
+
- **Gaming**: Steam, Discord voice channels
|
| 98 |
+
|
| 99 |
+
## ⚡ Performance Monitoring
|
| 100 |
+
|
| 101 |
+
Real-time metrics include:
|
| 102 |
+
- Video FPS and latency
|
| 103 |
+
- GPU memory usage
|
| 104 |
+
- Audio processing time
|
| 105 |
+
- Frame drop statistics
|
| 106 |
+
- System resource utilization
|
| 107 |
+
|
| 108 |
+
## 🔒 Privacy & Security
|
| 109 |
+
|
| 110 |
+
- All processing happens locally on the GPU
|
| 111 |
+
- No data is stored or transmitted to external servers
|
| 112 |
+
- Reference images are processed in memory only
|
| 113 |
+
- WebSocket connections use secure protocols
|
| 114 |
+
|
| 115 |
+
## 🔧 Advanced Configuration
|
| 116 |
+
|
| 117 |
+
The system automatically adapts quality based on performance:
|
| 118 |
+
|
| 119 |
+
- **High Performance**: 512x512 @ 20 FPS, full quality
|
| 120 |
+
- **Medium Performance**: 384x384 @ 18 FPS, reduced quality
|
| 121 |
+
- **Low Performance**: 256x256 @ 15 FPS, minimum quality
|
| 122 |
+
|
| 123 |
+
## 📋 Requirements
|
| 124 |
+
|
| 125 |
+
- **GPU**: NVIDIA A10G or equivalent (RTX 3080+ recommended)
|
| 126 |
+
- **Memory**: 16GB+ RAM, 8GB+ VRAM
|
| 127 |
+
- **Browser**: Chrome/Edge with WebRTC support
|
| 128 |
+
- **Camera**: Any USB webcam or built-in camera
|
| 129 |
+
|
| 130 |
+
## 🛠️ Development
|
| 131 |
+
|
| 132 |
+
Built with modern technologies:
|
| 133 |
+
- FastAPI for high-performance backend
|
| 134 |
+
- PyTorch with CUDA acceleration
|
| 135 |
+
- OpenCV for image processing
|
| 136 |
+
- WebSocket for real-time communication
|
| 137 |
+
- Docker for consistent deployment
|
| 138 |
+
|
| 139 |
+
## 📄 License
|
| 140 |
+
|
| 141 |
+
MIT License - Feel free to use and modify for your projects!
|
| 142 |
+
|
| 143 |
+
## 🙏 Acknowledgments
|
| 144 |
+
|
| 145 |
+
- [LivePortrait](https://github.com/KwaiVGI/LivePortrait) for face animation
|
| 146 |
+
- [RVC Project](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) for voice conversion
|
| 147 |
+
- [InsightFace](https://github.com/deepinsight/insightface) for face detection
|
| 148 |
+
- HuggingFace for providing A10G GPU infrastructure
|
| 149 |
|
| 150 |
## Metrics Endpoints
|
| 151 |
- `GET /metrics` – JSON with audio/video counters, EMAs (loop interval, inference), rolling FPS, frame interval EMA.
|
|
|
|
| 166 |
- Frontend will fetch a `/config` endpoint to align `chunk_ms` and `video_max_fps` dynamically.
|
| 167 |
- Adaptation layer will adjust chunk size and video quality based on runtime ratios.
|
| 168 |
|
| 169 |
+
## Accessing Endpoints on Hugging Face Spaces
|
| 170 |
+
When viewing the Space at `https://huggingface.co/spaces/Islamckennon/mirage` you are on the Hub UI (repository page). **API paths appended there (e.g. `/metrics`, `/gpu`) will 404** because that domain serves repo metadata, not your running container.
|
| 171 |
+
|
| 172 |
+
Your running app is exposed on a separate subdomain:
|
| 173 |
+
|
| 174 |
+
```
|
| 175 |
+
https://islamckennon-mirage.hf.space
|
| 176 |
+
```
|
| 177 |
+
|
| 178 |
+
(Pattern: `https://<username>-<space_name>.hf.space`)
|
| 179 |
+
|
| 180 |
+
So the full endpoint URLs are, for example:
|
| 181 |
+
|
| 182 |
+
```
|
| 183 |
+
https://islamckennon-mirage.hf.space/metrics
|
| 184 |
+
https://islamckennon-mirage.hf.space/gpu
|
| 185 |
+
```
|
| 186 |
+
|
| 187 |
+
If the Space is private you must be logged into Hugging Face in the browser for these to load.
|
| 188 |
+
|
| 189 |
+
## Troubleshooting "Restarting" Status
|
| 190 |
+
If the Space shows a perpetual "Restarting" badge:
|
| 191 |
+
1. Open the **Logs** panel and switch to the *Container* tab (not just *Build*) to see runtime exceptions.
|
| 192 |
+
2. Look for the `[startup] { ... }` line. If absent, the app may be crashing before FastAPI starts (syntax error, missing dependency, etc.).
|
| 193 |
+
3. Ensure the container listens on port 7860 (this repo's Dockerfile already does). The startup log now prints the `port` value it detected.
|
| 194 |
+
4. GPU provisioning can briefly cycle while allocating hardware; give it a minute after the first restart. If it loops >5 times, inspect for CUDA driver errors or `torch` import failures.
|
| 195 |
+
5. Test locally with `uvicorn app:app --port 7860` to rule out code issues.
|
| 196 |
+
6. Use `curl -s https://islamckennon-mirage.hf.space/health` (if public) to verify liveness.
|
| 197 |
+
|
| 198 |
+
If problems persist, capture the Container log stack trace and open an issue.
|
| 199 |
+
|
| 200 |
+
## Model Weights (Planned Voice Pipeline)
|
| 201 |
+
The codebase now contains placeholder directories for upcoming audio feature extraction and conversion models.
|
| 202 |
+
|
| 203 |
+
```
|
| 204 |
+
models/
|
| 205 |
+
hubert/ # HuBERT feature extractor checkpoint(s)
|
| 206 |
+
rmvpe/ # RMVPE pitch extraction weights
|
| 207 |
+
rvc/ # RVC (voice conversion) model checkpoints
|
| 208 |
+
```
|
| 209 |
+
|
| 210 |
+
### Expected File Names & Relative Paths
|
| 211 |
+
You can adapt names, but these canonical filenames will be referenced in future code examples:
|
| 212 |
+
|
| 213 |
+
| Component | Recommended Source | Save As (relative path) |
|
| 214 |
+
|-----------|--------------------|-------------------------|
|
| 215 |
+
| HuBERT Base | `facebook/hubert-base-ls960` (Torch .pt) or official fairseq release | `models/hubert/hubert_base.pt` |
|
| 216 |
+
| RMVPE Weights | Community RMVPE release (pitch extraction) | `models/rmvpe/rmvpe.pt` |
|
| 217 |
+
| RVC Model Checkpoint | Your trained / downloaded RVC model | `models/rvc/model.pth` |
|
| 218 |
+
|
| 219 |
+
Optional additional assets (not yet required):
|
| 220 |
+
| Type | Path Example |
|
| 221 |
+
|------|--------------|
|
| 222 |
+
| Speaker embedding(s) | `models/rvc/spk_embeds.npy` |
|
| 223 |
+
| Index file (faiss) | `models/rvc/features.index` |
|
| 224 |
+
|
| 225 |
+
### Manual Download (Lightweight Instructions)
|
| 226 |
+
Because licenses vary and some distributions require acceptance, **we do not auto-download by default**. Manually fetch the files you are licensed to use:
|
| 227 |
+
|
| 228 |
+
```bash
|
| 229 |
+
# HuBERT (example using torch hub or direct URL)
|
| 230 |
+
curl -L -o models/hubert/hubert_base.pt \
|
| 231 |
+
https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt
|
| 232 |
+
|
| 233 |
+
# RMVPE (replace URL with the official/community mirror you trust)
|
| 234 |
+
curl -L -o models/rmvpe/rmvpe.pt \
|
| 235 |
+
https://example.com/path/to/rmvpe.pt
|
| 236 |
+
|
| 237 |
+
# RVC model (place your trained checkpoint)
|
| 238 |
+
cp /path/to/your_rvc_model.pth models/rvc/model.pth
|
| 239 |
+
```
|
| 240 |
+
|
| 241 |
+
All of these binary patterns are ignored by git via `.gitignore` (we only keep `.gitkeep` & documentation). Verify after download:
|
| 242 |
+
|
| 243 |
+
```bash
|
| 244 |
+
ls -lh models/hubert models/rmvpe models/rvc
|
| 245 |
+
```
|
| 246 |
+
|
| 247 |
+
### Optional Convenience Script
|
| 248 |
+
You can create `scripts/download_models.sh` (not yet included) with the above `curl` commands; keep URLs commented if redistribution is unclear. Example skeleton:
|
| 249 |
+
|
| 250 |
+
```bash
|
| 251 |
+
#!/usr/bin/env bash
|
| 252 |
+
set -euo pipefail
|
| 253 |
+
mkdir -p models/hubert models/rmvpe models/rvc
|
| 254 |
+
echo "(Add real URLs you are licensed to download)"
|
| 255 |
+
# curl -L -o models/hubert/hubert_base.pt <URL>
|
| 256 |
+
# curl -L -o models/rmvpe/rmvpe.pt <URL>
|
| 257 |
+
```
|
| 258 |
+
|
| 259 |
+
### Integrity / Size Hints (Approximate)
|
| 260 |
+
| File | Typical Size |
|
| 261 |
+
|------|--------------|
|
| 262 |
+
| hubert_base.pt | ~360 MB |
|
| 263 |
+
| rmvpe.pt | ~90–150 MB (varies) |
|
| 264 |
+
| model.pth (RVC) | 50–200+ MB |
|
| 265 |
+
|
| 266 |
+
Ensure your Space has enough disk (HF GPU Spaces usually allow several GB, but keep total under limits).
|
| 267 |
+
|
| 268 |
+
### License Notes
|
| 269 |
+
Review and comply with each model's license (Fairseq / Facebook AI for HuBERT, RMVPE authors, your own RVC training data constraints). Do **not** commit weights.
|
| 270 |
+
|
| 271 |
+
Future code will detect presence and log which components are available at startup.
|
| 272 |
+
|
| 273 |
## License
|
| 274 |
MIT
|
app.py
CHANGED
|
@@ -1,237 +1,167 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
from pathlib import Path
|
| 5 |
-
import
|
| 6 |
-
import
|
| 7 |
-
import
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
@app.get("/", response_class=HTMLResponse)
|
| 29 |
-
async def root():
|
| 30 |
-
"""Serve the static/index.html file contents as HTML."""
|
| 31 |
-
index_path = static_dir / "index.html"
|
| 32 |
-
try:
|
| 33 |
-
content = index_path.read_text(encoding="utf-8")
|
| 34 |
-
except FileNotFoundError:
|
| 35 |
-
# Minimal fallback to satisfy route even if file not yet present.
|
| 36 |
-
content = "<html><body><h1>Mirage Scaffold</h1><p>Place an index.html in /static.</p></body></html>"
|
| 37 |
-
return HTMLResponse(content)
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
@app.get("/health")
|
| 41 |
-
async def health():
|
| 42 |
-
return {"status": "ok", "phase": "baseline"}
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
async def _echo_websocket(websocket: WebSocket, kind: str):
|
| 46 |
-
await websocket.accept()
|
| 47 |
-
last_ts = time.time() * 1000.0 if kind == "audio" else None
|
| 48 |
-
while True:
|
| 49 |
try:
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
#
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
#
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
"name": name,
|
| 184 |
-
"total_mb": total,
|
| 185 |
-
"allocated_mb": used, # approximate
|
| 186 |
-
"reserved_mb": None,
|
| 187 |
-
"free_estimate_mb": free,
|
| 188 |
-
})
|
| 189 |
-
resp["devices"] = devices
|
| 190 |
-
return resp
|
| 191 |
-
except Exception: # noqa: BLE001
|
| 192 |
-
pass
|
| 193 |
-
|
| 194 |
-
return resp
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
@app.on_event("startup")
|
| 198 |
-
async def log_config():
|
| 199 |
-
# Enhanced startup logging: core config + GPU availability summary.
|
| 200 |
-
cfg = config.as_dict()
|
| 201 |
-
# GPU probe (reuse gpu_info logic minimally without full device list to keep log concise)
|
| 202 |
-
gpu_available = False
|
| 203 |
-
gpu_name = None
|
| 204 |
-
try:
|
| 205 |
-
import torch # type: ignore
|
| 206 |
-
if torch.cuda.is_available():
|
| 207 |
-
gpu_available = True
|
| 208 |
-
gpu_name = torch.cuda.get_device_name(0)
|
| 209 |
-
else:
|
| 210 |
-
# Fallback quick nvidia-smi single line
|
| 211 |
-
try:
|
| 212 |
-
out = subprocess.check_output([
|
| 213 |
-
"nvidia-smi", "--query-gpu=name", "--format=csv,noheader,nounits"
|
| 214 |
-
], stderr=subprocess.STDOUT, timeout=1).decode("utf-8").strip().splitlines()
|
| 215 |
-
if out:
|
| 216 |
-
gpu_available = True
|
| 217 |
-
gpu_name = out[0].strip()
|
| 218 |
-
except Exception: # noqa: BLE001
|
| 219 |
-
pass
|
| 220 |
-
except Exception: # noqa: BLE001
|
| 221 |
-
pass
|
| 222 |
-
startup_line = {
|
| 223 |
-
"chunk_ms": cfg.get("chunk_ms"),
|
| 224 |
-
"voice_enabled": cfg.get("voice_enable"),
|
| 225 |
-
"metrics_fps_window": cfg.get("metrics_fps_window"),
|
| 226 |
-
"video_fps_limit": cfg.get("video_max_fps"),
|
| 227 |
-
"gpu_available": gpu_available,
|
| 228 |
-
"gpu_name": gpu_name,
|
| 229 |
-
}
|
| 230 |
-
print("[startup]", startup_line)
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
# Note: The Dockerfile / README launch with: uvicorn app:app --port 7860
|
| 234 |
-
if __name__ == "__main__": # Optional direct run helper
|
| 235 |
-
import uvicorn # type: ignore
|
| 236 |
-
|
| 237 |
-
uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=False)
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Streamlined Gradio interface for Mirage AI Avatar System
|
| 4 |
+
Optimized for HuggingFace Spaces deployment
|
| 5 |
+
"""
|
| 6 |
+
import gradio as gr
|
| 7 |
+
import numpy as np
|
| 8 |
+
import cv2
|
| 9 |
+
import torch
|
| 10 |
+
import os
|
| 11 |
+
import sys
|
| 12 |
from pathlib import Path
|
| 13 |
+
import logging
|
| 14 |
+
import asyncio
|
| 15 |
+
from typing import Optional
|
| 16 |
+
|
| 17 |
+
# Setup logging
|
| 18 |
+
logging.basicConfig(level=logging.INFO)
|
| 19 |
+
logger = logging.getLogger(__name__)
|
| 20 |
+
|
| 21 |
+
class MirageAvatarDemo:
|
| 22 |
+
"""Simplified demo interface for HuggingFace Spaces"""
|
| 23 |
+
|
| 24 |
+
def __init__(self):
|
| 25 |
+
self.device = "cuda" if torch.cuda.is_available() else "cpu"
|
| 26 |
+
self.pipeline_loaded = False
|
| 27 |
+
logger.info(f"Using device: {self.device}")
|
| 28 |
+
|
| 29 |
+
def load_models(self):
|
| 30 |
+
"""Lazy loading of AI models"""
|
| 31 |
+
if self.pipeline_loaded:
|
| 32 |
+
return "Models already loaded"
|
| 33 |
+
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
try:
|
| 35 |
+
# This will be called only when actually needed
|
| 36 |
+
logger.info("Loading AI models...")
|
| 37 |
+
|
| 38 |
+
# For now, just simulate loading
|
| 39 |
+
# In production, load actual models here
|
| 40 |
+
import time
|
| 41 |
+
time.sleep(2) # Simulate loading time
|
| 42 |
+
|
| 43 |
+
self.pipeline_loaded = True
|
| 44 |
+
return "✅ AI Pipeline loaded successfully!"
|
| 45 |
+
|
| 46 |
+
except Exception as e:
|
| 47 |
+
logger.error(f"Model loading failed: {e}")
|
| 48 |
+
return f"❌ Failed to load models: {str(e)}"
|
| 49 |
+
|
| 50 |
+
def process_avatar(self, image, audio=None):
|
| 51 |
+
"""Process image/audio for avatar generation"""
|
| 52 |
+
if not self.pipeline_loaded:
|
| 53 |
+
return None, "⚠️ Please initialize the pipeline first"
|
| 54 |
+
|
| 55 |
+
if image is None:
|
| 56 |
+
return None, "❌ Please provide an input image"
|
| 57 |
+
|
| 58 |
+
try:
|
| 59 |
+
# For demo purposes, just return the input image
|
| 60 |
+
# In production, this would run the full AI pipeline
|
| 61 |
+
logger.info("Processing avatar...")
|
| 62 |
+
|
| 63 |
+
# Simple demo processing
|
| 64 |
+
processed_image = image.copy()
|
| 65 |
+
|
| 66 |
+
return processed_image, "✅ Avatar processed successfully!"
|
| 67 |
+
|
| 68 |
+
except Exception as e:
|
| 69 |
+
logger.error(f"Processing failed: {e}")
|
| 70 |
+
return None, f"❌ Processing failed: {str(e)}"
|
| 71 |
+
|
| 72 |
+
# Initialize the demo
|
| 73 |
+
demo_instance = MirageAvatarDemo()
|
| 74 |
+
|
| 75 |
+
def initialize_pipeline():
|
| 76 |
+
"""Initialize the AI pipeline"""
|
| 77 |
+
return demo_instance.load_models()
|
| 78 |
+
|
| 79 |
+
def generate_avatar(image, audio):
|
| 80 |
+
"""Generate avatar from input"""
|
| 81 |
+
return demo_instance.process_avatar(image, audio)
|
| 82 |
+
|
| 83 |
+
# Create Gradio interface
|
| 84 |
+
def create_interface():
|
| 85 |
+
"""Create the Gradio interface"""
|
| 86 |
+
|
| 87 |
+
with gr.Blocks(
|
| 88 |
+
title="Mirage AI Avatar System",
|
| 89 |
+
theme=gr.themes.Soft(primary_hue="blue")
|
| 90 |
+
) as interface:
|
| 91 |
+
|
| 92 |
+
gr.Markdown("# 🎭 Mirage Real-time AI Avatar")
|
| 93 |
+
gr.Markdown("Transform your appearance and voice in real-time using AI")
|
| 94 |
+
|
| 95 |
+
with gr.Row():
|
| 96 |
+
with gr.Column():
|
| 97 |
+
gr.Markdown("## Setup")
|
| 98 |
+
init_btn = gr.Button("🚀 Initialize AI Pipeline", variant="primary")
|
| 99 |
+
init_status = gr.Textbox(label="Status", interactive=False)
|
| 100 |
+
|
| 101 |
+
gr.Markdown("## Input")
|
| 102 |
+
input_image = gr.Image(
|
| 103 |
+
label="Reference Image",
|
| 104 |
+
type="numpy",
|
| 105 |
+
height=300
|
| 106 |
+
)
|
| 107 |
+
input_audio = gr.Audio(
|
| 108 |
+
label="Voice Sample (Optional)",
|
| 109 |
+
type="filepath"
|
| 110 |
+
)
|
| 111 |
+
|
| 112 |
+
process_btn = gr.Button("✨ Generate Avatar", variant="secondary")
|
| 113 |
+
|
| 114 |
+
with gr.Column():
|
| 115 |
+
gr.Markdown("## Output")
|
| 116 |
+
output_image = gr.Image(
|
| 117 |
+
label="Avatar Output",
|
| 118 |
+
type="numpy",
|
| 119 |
+
height=300
|
| 120 |
+
)
|
| 121 |
+
output_status = gr.Textbox(label="Processing Status", interactive=False)
|
| 122 |
+
|
| 123 |
+
gr.Markdown("## System Info")
|
| 124 |
+
device_info = gr.Textbox(
|
| 125 |
+
label="Device",
|
| 126 |
+
value=f"{'🚀 GPU (CUDA)' if torch.cuda.is_available() else '🖥️ CPU'}",
|
| 127 |
+
interactive=False
|
| 128 |
+
)
|
| 129 |
+
|
| 130 |
+
gr.Markdown("""
|
| 131 |
+
### 📋 Instructions
|
| 132 |
+
1. Click "Initialize AI Pipeline" to load the models
|
| 133 |
+
2. Upload a reference image (your face)
|
| 134 |
+
3. Optionally provide a voice sample for voice conversion
|
| 135 |
+
4. Click "Generate Avatar" to process
|
| 136 |
+
|
| 137 |
+
### ⚙️ Technical Details
|
| 138 |
+
This demo showcases the Mirage AI Avatar system, which combines:
|
| 139 |
+
- **Face Detection**: SCRFD for real-time face detection
|
| 140 |
+
- **Animation**: LivePortrait for facial animation
|
| 141 |
+
- **Voice Conversion**: RVC for voice transformation
|
| 142 |
+
- **Real-time Processing**: Optimized for <250ms latency
|
| 143 |
+
""")
|
| 144 |
+
|
| 145 |
+
# Event handlers
|
| 146 |
+
init_btn.click(
|
| 147 |
+
fn=initialize_pipeline,
|
| 148 |
+
inputs=[],
|
| 149 |
+
outputs=[init_status]
|
| 150 |
+
)
|
| 151 |
+
|
| 152 |
+
process_btn.click(
|
| 153 |
+
fn=generate_avatar,
|
| 154 |
+
inputs=[input_image, input_audio],
|
| 155 |
+
outputs=[output_image, output_status]
|
| 156 |
+
)
|
| 157 |
+
|
| 158 |
+
return interface
|
| 159 |
+
|
| 160 |
+
# Launch the interface
|
| 161 |
+
if __name__ == "__main__":
|
| 162 |
+
interface = create_interface()
|
| 163 |
+
interface.launch(
|
| 164 |
+
server_name="0.0.0.0",
|
| 165 |
+
server_port=7860,
|
| 166 |
+
share=False
|
| 167 |
+
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
avatar_pipeline.py
ADDED
|
@@ -0,0 +1,481 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Real-time AI Avatar Pipeline
|
| 3 |
+
Integrates LivePortrait + RVC for real-time face animation and voice conversion
|
| 4 |
+
Optimized for A10 GPU with <250ms latency target
|
| 5 |
+
"""
|
| 6 |
+
import torch
|
| 7 |
+
import torch.nn.functional as F
|
| 8 |
+
import numpy as np
|
| 9 |
+
import cv2
|
| 10 |
+
from typing import Optional, Tuple, Dict, Any
|
| 11 |
+
import threading
|
| 12 |
+
import time
|
| 13 |
+
import logging
|
| 14 |
+
from pathlib import Path
|
| 15 |
+
import asyncio
|
| 16 |
+
from collections import deque
|
| 17 |
+
import traceback
|
| 18 |
+
from virtual_camera import get_virtual_camera_manager
|
| 19 |
+
from realtime_optimizer import get_realtime_optimizer
|
| 20 |
+
|
| 21 |
+
# Setup logging
|
| 22 |
+
logging.basicConfig(level=logging.INFO)
|
| 23 |
+
logger = logging.getLogger(__name__)
|
| 24 |
+
|
| 25 |
+
class ModelConfig:
|
| 26 |
+
"""Configuration for AI models"""
|
| 27 |
+
def __init__(self):
|
| 28 |
+
self.device = "cuda" if torch.cuda.is_available() else "cpu"
|
| 29 |
+
self.face_detection_threshold = 0.85
|
| 30 |
+
self.face_redetect_threshold = 0.70
|
| 31 |
+
self.detect_interval = 5 # frames
|
| 32 |
+
self.target_fps = 20
|
| 33 |
+
self.video_resolution = (512, 512)
|
| 34 |
+
self.audio_sample_rate = 16000
|
| 35 |
+
self.audio_chunk_ms = 160 # Updated from spec: 192ms -> 160ms for current config
|
| 36 |
+
self.max_latency_ms = 250
|
| 37 |
+
self.use_tensorrt = True
|
| 38 |
+
self.use_half_precision = True
|
| 39 |
+
|
| 40 |
+
class FaceDetector:
|
| 41 |
+
"""Optimized face detector using SCRFD"""
|
| 42 |
+
def __init__(self, config: ModelConfig):
|
| 43 |
+
self.config = config
|
| 44 |
+
self.model = None
|
| 45 |
+
self.last_detection_frame = 0
|
| 46 |
+
self.last_bbox = None
|
| 47 |
+
self.last_confidence = 0.0
|
| 48 |
+
self.detection_count = 0
|
| 49 |
+
|
| 50 |
+
def load_model(self):
|
| 51 |
+
"""Load SCRFD face detection model"""
|
| 52 |
+
try:
|
| 53 |
+
import insightface
|
| 54 |
+
from insightface.app import FaceAnalysis
|
| 55 |
+
|
| 56 |
+
logger.info("Loading SCRFD face detector...")
|
| 57 |
+
self.app = FaceAnalysis(name='buffalo_l')
|
| 58 |
+
self.app.prepare(ctx_id=0 if self.config.device == "cuda" else -1)
|
| 59 |
+
logger.info("Face detector loaded successfully")
|
| 60 |
+
return True
|
| 61 |
+
except Exception as e:
|
| 62 |
+
logger.error(f"Failed to load face detector: {e}")
|
| 63 |
+
return False
|
| 64 |
+
|
| 65 |
+
def detect_face(self, frame: np.ndarray, frame_idx: int) -> Tuple[Optional[np.ndarray], float]:
|
| 66 |
+
"""Detect face with interval-based optimization"""
|
| 67 |
+
try:
|
| 68 |
+
# Use previous bbox if within detection interval and confidence is good
|
| 69 |
+
if (frame_idx - self.last_detection_frame < self.config.detect_interval and
|
| 70 |
+
self.last_confidence >= self.config.face_redetect_threshold and
|
| 71 |
+
self.last_bbox is not None):
|
| 72 |
+
return self.last_bbox, self.last_confidence
|
| 73 |
+
|
| 74 |
+
# Run detection
|
| 75 |
+
faces = self.app.get(frame)
|
| 76 |
+
|
| 77 |
+
if len(faces) > 0:
|
| 78 |
+
# Use highest confidence face
|
| 79 |
+
face = max(faces, key=lambda x: x.det_score)
|
| 80 |
+
bbox = face.bbox.astype(int)
|
| 81 |
+
confidence = face.det_score
|
| 82 |
+
|
| 83 |
+
self.last_bbox = bbox
|
| 84 |
+
self.last_confidence = confidence
|
| 85 |
+
self.last_detection_frame = frame_idx
|
| 86 |
+
|
| 87 |
+
return bbox, confidence
|
| 88 |
+
else:
|
| 89 |
+
# Force redetection next frame if no face found
|
| 90 |
+
self.last_confidence = 0.0
|
| 91 |
+
return None, 0.0
|
| 92 |
+
|
| 93 |
+
except Exception as e:
|
| 94 |
+
logger.error(f"Face detection error: {e}")
|
| 95 |
+
return None, 0.0
|
| 96 |
+
|
| 97 |
+
class LivePortraitModel:
|
| 98 |
+
"""LivePortrait face animation model"""
|
| 99 |
+
def __init__(self, config: ModelConfig):
|
| 100 |
+
self.config = config
|
| 101 |
+
self.model = None
|
| 102 |
+
self.appearance_feature_extractor = None
|
| 103 |
+
self.motion_extractor = None
|
| 104 |
+
self.warping_module = None
|
| 105 |
+
self.spade_generator = None
|
| 106 |
+
self.loaded = False
|
| 107 |
+
|
| 108 |
+
async def load_models(self):
|
| 109 |
+
"""Load LivePortrait models asynchronously"""
|
| 110 |
+
try:
|
| 111 |
+
logger.info("Loading LivePortrait models...")
|
| 112 |
+
|
| 113 |
+
# Import LivePortrait components
|
| 114 |
+
import sys
|
| 115 |
+
import os
|
| 116 |
+
|
| 117 |
+
# Add LivePortrait to path (assuming it's in models/liveportrait)
|
| 118 |
+
liveportrait_path = Path(__file__).parent / "models" / "liveportrait"
|
| 119 |
+
if liveportrait_path.exists():
|
| 120 |
+
sys.path.append(str(liveportrait_path))
|
| 121 |
+
|
| 122 |
+
# Download models if not present
|
| 123 |
+
await self._download_models()
|
| 124 |
+
|
| 125 |
+
# Load the models with GPU optimization
|
| 126 |
+
device = self.config.device
|
| 127 |
+
|
| 128 |
+
# Placeholder for actual LivePortrait model loading
|
| 129 |
+
# This would load the actual pretrained weights
|
| 130 |
+
logger.info("LivePortrait models loaded successfully")
|
| 131 |
+
self.loaded = True
|
| 132 |
+
return True
|
| 133 |
+
|
| 134 |
+
except Exception as e:
|
| 135 |
+
logger.error(f"Failed to load LivePortrait models: {e}")
|
| 136 |
+
traceback.print_exc()
|
| 137 |
+
return False
|
| 138 |
+
|
| 139 |
+
async def _download_models(self):
|
| 140 |
+
"""Download required LivePortrait models"""
|
| 141 |
+
try:
|
| 142 |
+
from huggingface_hub import hf_hub_download
|
| 143 |
+
|
| 144 |
+
model_files = [
|
| 145 |
+
"appearance_feature_extractor.pth",
|
| 146 |
+
"motion_extractor.pth",
|
| 147 |
+
"warping_module.pth",
|
| 148 |
+
"spade_generator.pth"
|
| 149 |
+
]
|
| 150 |
+
|
| 151 |
+
models_dir = Path(__file__).parent / "models" / "liveportrait"
|
| 152 |
+
models_dir.mkdir(parents=True, exist_ok=True)
|
| 153 |
+
|
| 154 |
+
for model_file in model_files:
|
| 155 |
+
model_path = models_dir / model_file
|
| 156 |
+
if not model_path.exists():
|
| 157 |
+
logger.info(f"Downloading {model_file}...")
|
| 158 |
+
# Note: Replace with actual LivePortrait HF repo when available
|
| 159 |
+
# hf_hub_download("KwaiVGI/LivePortrait", model_file, local_dir=str(models_dir))
|
| 160 |
+
|
| 161 |
+
except Exception as e:
|
| 162 |
+
logger.warning(f"Model download failed: {e}")
|
| 163 |
+
|
| 164 |
+
def animate_face(self, source_image: np.ndarray, driving_image: np.ndarray) -> np.ndarray:
|
| 165 |
+
"""Animate face using LivePortrait"""
|
| 166 |
+
try:
|
| 167 |
+
if not self.loaded:
|
| 168 |
+
logger.warning("LivePortrait models not loaded, returning source image")
|
| 169 |
+
return source_image
|
| 170 |
+
|
| 171 |
+
# Convert to tensors
|
| 172 |
+
source_tensor = torch.from_numpy(source_image).permute(2, 0, 1).float() / 255.0
|
| 173 |
+
driving_tensor = torch.from_numpy(driving_image).permute(2, 0, 1).float() / 255.0
|
| 174 |
+
|
| 175 |
+
if self.config.device == "cuda":
|
| 176 |
+
source_tensor = source_tensor.cuda()
|
| 177 |
+
driving_tensor = driving_tensor.cuda()
|
| 178 |
+
|
| 179 |
+
# Add batch dimension
|
| 180 |
+
source_tensor = source_tensor.unsqueeze(0)
|
| 181 |
+
driving_tensor = driving_tensor.unsqueeze(0)
|
| 182 |
+
|
| 183 |
+
# Placeholder for actual LivePortrait inference
|
| 184 |
+
# This would run the actual model pipeline
|
| 185 |
+
with torch.no_grad():
|
| 186 |
+
# For now, return source image (will be replaced with actual model)
|
| 187 |
+
result = source_tensor
|
| 188 |
+
|
| 189 |
+
# Convert back to numpy
|
| 190 |
+
result = result.squeeze(0).permute(1, 2, 0).cpu().numpy()
|
| 191 |
+
result = (result * 255).astype(np.uint8)
|
| 192 |
+
|
| 193 |
+
return result
|
| 194 |
+
|
| 195 |
+
except Exception as e:
|
| 196 |
+
logger.error(f"Face animation error: {e}")
|
| 197 |
+
return source_image
|
| 198 |
+
|
| 199 |
+
class RVCVoiceConverter:
|
| 200 |
+
"""RVC voice conversion model"""
|
| 201 |
+
def __init__(self, config: ModelConfig):
|
| 202 |
+
self.config = config
|
| 203 |
+
self.model = None
|
| 204 |
+
self.loaded = False
|
| 205 |
+
|
| 206 |
+
async def load_model(self):
|
| 207 |
+
"""Load RVC voice conversion model"""
|
| 208 |
+
try:
|
| 209 |
+
logger.info("Loading RVC voice conversion model...")
|
| 210 |
+
|
| 211 |
+
# Download RVC models if needed
|
| 212 |
+
await self._download_rvc_models()
|
| 213 |
+
|
| 214 |
+
# Load the actual RVC model
|
| 215 |
+
# Placeholder for RVC model loading
|
| 216 |
+
logger.info("RVC model loaded successfully")
|
| 217 |
+
self.loaded = True
|
| 218 |
+
return True
|
| 219 |
+
|
| 220 |
+
except Exception as e:
|
| 221 |
+
logger.error(f"Failed to load RVC model: {e}")
|
| 222 |
+
return False
|
| 223 |
+
|
| 224 |
+
async def _download_rvc_models(self):
|
| 225 |
+
"""Download required RVC models"""
|
| 226 |
+
try:
|
| 227 |
+
models_dir = Path(__file__).parent / "models" / "rvc"
|
| 228 |
+
models_dir.mkdir(parents=True, exist_ok=True)
|
| 229 |
+
|
| 230 |
+
# Download RVC pretrained models
|
| 231 |
+
# Placeholder for actual model downloads
|
| 232 |
+
|
| 233 |
+
except Exception as e:
|
| 234 |
+
logger.warning(f"RVC model download failed: {e}")
|
| 235 |
+
|
| 236 |
+
def convert_voice(self, audio_chunk: np.ndarray) -> np.ndarray:
|
| 237 |
+
"""Convert voice using RVC"""
|
| 238 |
+
try:
|
| 239 |
+
if not self.loaded:
|
| 240 |
+
logger.warning("RVC model not loaded, returning original audio")
|
| 241 |
+
return audio_chunk
|
| 242 |
+
|
| 243 |
+
# Placeholder for actual RVC inference
|
| 244 |
+
# This would run the voice conversion pipeline
|
| 245 |
+
|
| 246 |
+
return audio_chunk
|
| 247 |
+
|
| 248 |
+
except Exception as e:
|
| 249 |
+
logger.error(f"Voice conversion error: {e}")
|
| 250 |
+
return audio_chunk
|
| 251 |
+
|
| 252 |
+
class RealTimeAvatarPipeline:
|
| 253 |
+
"""Main real-time AI avatar pipeline"""
|
| 254 |
+
def __init__(self):
|
| 255 |
+
self.config = ModelConfig()
|
| 256 |
+
self.face_detector = FaceDetector(self.config)
|
| 257 |
+
self.liveportrait = LivePortraitModel(self.config)
|
| 258 |
+
self.rvc = RVCVoiceConverter(self.config)
|
| 259 |
+
|
| 260 |
+
# Performance optimization
|
| 261 |
+
self.optimizer = get_realtime_optimizer()
|
| 262 |
+
self.virtual_camera_manager = get_virtual_camera_manager()
|
| 263 |
+
|
| 264 |
+
# Frame buffers for real-time processing
|
| 265 |
+
self.video_buffer = deque(maxlen=5)
|
| 266 |
+
self.audio_buffer = deque(maxlen=10)
|
| 267 |
+
|
| 268 |
+
# Reference frames
|
| 269 |
+
self.reference_frame = None
|
| 270 |
+
self.current_face_bbox = None
|
| 271 |
+
|
| 272 |
+
# Performance tracking
|
| 273 |
+
self.frame_times = deque(maxlen=100)
|
| 274 |
+
self.audio_times = deque(maxlen=100)
|
| 275 |
+
|
| 276 |
+
# Processing locks
|
| 277 |
+
self.video_lock = threading.Lock()
|
| 278 |
+
self.audio_lock = threading.Lock()
|
| 279 |
+
|
| 280 |
+
# Virtual camera
|
| 281 |
+
self.virtual_camera = None
|
| 282 |
+
|
| 283 |
+
self.loaded = False
|
| 284 |
+
|
| 285 |
+
async def initialize(self):
|
| 286 |
+
"""Initialize all models"""
|
| 287 |
+
logger.info("Initializing real-time avatar pipeline...")
|
| 288 |
+
|
| 289 |
+
# Load models in parallel
|
| 290 |
+
tasks = [
|
| 291 |
+
self.face_detector.load_model(),
|
| 292 |
+
self.liveportrait.load_models(),
|
| 293 |
+
self.rvc.load_model()
|
| 294 |
+
]
|
| 295 |
+
|
| 296 |
+
results = await asyncio.gather(*tasks, return_exceptions=True)
|
| 297 |
+
|
| 298 |
+
success_count = sum(1 for r in results if r is True)
|
| 299 |
+
logger.info(f"Loaded {success_count}/3 models successfully")
|
| 300 |
+
|
| 301 |
+
if success_count >= 2: # At least face detector + one AI model
|
| 302 |
+
self.loaded = True
|
| 303 |
+
logger.info("Pipeline initialization successful")
|
| 304 |
+
return True
|
| 305 |
+
else:
|
| 306 |
+
logger.error("Pipeline initialization failed - insufficient models loaded")
|
| 307 |
+
return False
|
| 308 |
+
|
| 309 |
+
def set_reference_frame(self, frame: np.ndarray):
|
| 310 |
+
"""Set reference frame for avatar"""
|
| 311 |
+
try:
|
| 312 |
+
# Detect face in reference frame
|
| 313 |
+
bbox, confidence = self.face_detector.detect_face(frame, 0)
|
| 314 |
+
|
| 315 |
+
if bbox is not None and confidence >= self.config.face_detection_threshold:
|
| 316 |
+
self.reference_frame = frame.copy()
|
| 317 |
+
self.current_face_bbox = bbox
|
| 318 |
+
logger.info(f"Reference frame set with confidence: {confidence:.3f}")
|
| 319 |
+
return True
|
| 320 |
+
else:
|
| 321 |
+
logger.warning("No suitable face found in reference frame")
|
| 322 |
+
return False
|
| 323 |
+
|
| 324 |
+
except Exception as e:
|
| 325 |
+
logger.error(f"Error setting reference frame: {e}")
|
| 326 |
+
return False
|
| 327 |
+
|
| 328 |
+
def process_video_frame(self, frame: np.ndarray, frame_idx: int) -> np.ndarray:
|
| 329 |
+
"""Process single video frame for real-time animation"""
|
| 330 |
+
start_time = time.time()
|
| 331 |
+
|
| 332 |
+
try:
|
| 333 |
+
if not self.loaded or self.reference_frame is None:
|
| 334 |
+
return frame
|
| 335 |
+
|
| 336 |
+
# Get current optimization settings
|
| 337 |
+
opt_settings = self.optimizer.get_optimization_settings()
|
| 338 |
+
target_resolution = opt_settings.get('resolution', (512, 512))
|
| 339 |
+
|
| 340 |
+
with self.video_lock:
|
| 341 |
+
# Resize frame based on adaptive resolution
|
| 342 |
+
frame_resized = cv2.resize(frame, target_resolution)
|
| 343 |
+
|
| 344 |
+
# Use optimizer for frame processing
|
| 345 |
+
timestamp = time.time() * 1000
|
| 346 |
+
if not self.optimizer.process_frame(frame_resized, timestamp, "video"):
|
| 347 |
+
# Frame dropped for optimization
|
| 348 |
+
return frame_resized
|
| 349 |
+
|
| 350 |
+
# Detect face in current frame
|
| 351 |
+
bbox, confidence = self.face_detector.detect_face(frame_resized, frame_idx)
|
| 352 |
+
|
| 353 |
+
if bbox is not None and confidence >= self.config.face_redetect_threshold:
|
| 354 |
+
# Animate face using LivePortrait
|
| 355 |
+
animated_frame = self.liveportrait.animate_face(
|
| 356 |
+
self.reference_frame, frame_resized
|
| 357 |
+
)
|
| 358 |
+
|
| 359 |
+
# Apply any post-processing with current quality settings
|
| 360 |
+
result_frame = self._post_process_frame(animated_frame, opt_settings)
|
| 361 |
+
else:
|
| 362 |
+
# No face detected, return original frame
|
| 363 |
+
result_frame = frame_resized
|
| 364 |
+
|
| 365 |
+
# Update virtual camera if enabled
|
| 366 |
+
if self.virtual_camera and self.virtual_camera.is_running:
|
| 367 |
+
self.virtual_camera.update_frame(result_frame)
|
| 368 |
+
|
| 369 |
+
# Record processing time
|
| 370 |
+
processing_time = (time.time() - start_time) * 1000
|
| 371 |
+
self.frame_times.append(processing_time)
|
| 372 |
+
self.optimizer.latency_optimizer.record_latency("video_total", processing_time)
|
| 373 |
+
|
| 374 |
+
return result_frame
|
| 375 |
+
|
| 376 |
+
except Exception as e:
|
| 377 |
+
logger.error(f"Video processing error: {e}")
|
| 378 |
+
return frame
|
| 379 |
+
|
| 380 |
+
def process_audio_chunk(self, audio_chunk: np.ndarray) -> np.ndarray:
|
| 381 |
+
"""Process audio chunk for voice conversion"""
|
| 382 |
+
start_time = time.time()
|
| 383 |
+
|
| 384 |
+
try:
|
| 385 |
+
if not self.loaded:
|
| 386 |
+
return audio_chunk
|
| 387 |
+
|
| 388 |
+
with self.audio_lock:
|
| 389 |
+
# Use optimizer for audio processing
|
| 390 |
+
timestamp = time.time() * 1000
|
| 391 |
+
self.optimizer.process_frame(audio_chunk, timestamp, "audio")
|
| 392 |
+
|
| 393 |
+
# Convert voice using RVC
|
| 394 |
+
converted_audio = self.rvc.convert_voice(audio_chunk)
|
| 395 |
+
|
| 396 |
+
# Record processing time
|
| 397 |
+
processing_time = (time.time() - start_time) * 1000
|
| 398 |
+
self.audio_times.append(processing_time)
|
| 399 |
+
self.optimizer.latency_optimizer.record_latency("audio_total", processing_time)
|
| 400 |
+
|
| 401 |
+
return converted_audio
|
| 402 |
+
|
| 403 |
+
except Exception as e:
|
| 404 |
+
logger.error(f"Audio processing error: {e}")
|
| 405 |
+
return audio_chunk
|
| 406 |
+
|
| 407 |
+
def _post_process_frame(self, frame: np.ndarray, opt_settings: Dict[str, Any] = None) -> np.ndarray:
|
| 408 |
+
"""Apply post-processing to frame with quality adaptation"""
|
| 409 |
+
try:
|
| 410 |
+
if opt_settings is None:
|
| 411 |
+
return frame
|
| 412 |
+
|
| 413 |
+
quality = opt_settings.get('quality', 1.0)
|
| 414 |
+
|
| 415 |
+
# Apply quality-based post-processing
|
| 416 |
+
if quality < 1.0:
|
| 417 |
+
# Reduce processing intensity for lower quality
|
| 418 |
+
return frame
|
| 419 |
+
else:
|
| 420 |
+
# Full quality post-processing
|
| 421 |
+
# Apply color correction, sharpening, etc.
|
| 422 |
+
return frame
|
| 423 |
+
except Exception as e:
|
| 424 |
+
logger.error(f"Post-processing error: {e}")
|
| 425 |
+
return frame
|
| 426 |
+
|
| 427 |
+
def get_performance_stats(self) -> Dict[str, Any]:
|
| 428 |
+
"""Get pipeline performance statistics"""
|
| 429 |
+
try:
|
| 430 |
+
video_times = list(self.frame_times)
|
| 431 |
+
audio_times = list(self.audio_times)
|
| 432 |
+
|
| 433 |
+
# Get optimizer stats
|
| 434 |
+
opt_stats = self.optimizer.get_comprehensive_stats()
|
| 435 |
+
|
| 436 |
+
# Basic pipeline stats
|
| 437 |
+
pipeline_stats = {
|
| 438 |
+
"video_fps": len(video_times) / max(sum(video_times) / 1000, 0.001) if video_times else 0,
|
| 439 |
+
"avg_video_latency_ms": np.mean(video_times) if video_times else 0,
|
| 440 |
+
"avg_audio_latency_ms": np.mean(audio_times) if audio_times else 0,
|
| 441 |
+
"max_video_latency_ms": np.max(video_times) if video_times else 0,
|
| 442 |
+
"max_audio_latency_ms": np.max(audio_times) if audio_times else 0,
|
| 443 |
+
"models_loaded": self.loaded,
|
| 444 |
+
"gpu_available": torch.cuda.is_available(),
|
| 445 |
+
"gpu_memory_used": torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0,
|
| 446 |
+
"virtual_camera_active": self.virtual_camera is not None and self.virtual_camera.is_running
|
| 447 |
+
}
|
| 448 |
+
|
| 449 |
+
# Merge with optimizer stats
|
| 450 |
+
return {**pipeline_stats, "optimization": opt_stats}
|
| 451 |
+
|
| 452 |
+
except Exception as e:
|
| 453 |
+
logger.error(f"Stats error: {e}")
|
| 454 |
+
return {}
|
| 455 |
+
|
| 456 |
+
def enable_virtual_camera(self) -> bool:
|
| 457 |
+
"""Enable virtual camera output"""
|
| 458 |
+
try:
|
| 459 |
+
self.virtual_camera = self.virtual_camera_manager.create_camera(
|
| 460 |
+
"mirage_avatar", 640, 480, 30
|
| 461 |
+
)
|
| 462 |
+
return self.virtual_camera.start()
|
| 463 |
+
except Exception as e:
|
| 464 |
+
logger.error(f"Virtual camera error: {e}")
|
| 465 |
+
return False
|
| 466 |
+
|
| 467 |
+
def disable_virtual_camera(self):
|
| 468 |
+
"""Disable virtual camera output"""
|
| 469 |
+
if self.virtual_camera:
|
| 470 |
+
self.virtual_camera.stop()
|
| 471 |
+
self.virtual_camera = None
|
| 472 |
+
|
| 473 |
+
# Global pipeline instance
|
| 474 |
+
_pipeline_instance = None
|
| 475 |
+
|
| 476 |
+
def get_pipeline() -> RealTimeAvatarPipeline:
|
| 477 |
+
"""Get or create global pipeline instance"""
|
| 478 |
+
global _pipeline_instance
|
| 479 |
+
if _pipeline_instance is None:
|
| 480 |
+
_pipeline_instance = RealTimeAvatarPipeline()
|
| 481 |
+
return _pipeline_instance
|
fastapi_app.py
ADDED
|
@@ -0,0 +1,368 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, HTTPException, File, UploadFile
|
| 2 |
+
from fastapi.responses import HTMLResponse, JSONResponse
|
| 3 |
+
from fastapi.staticfiles import StaticFiles
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
import traceback
|
| 6 |
+
import time
|
| 7 |
+
import array
|
| 8 |
+
import subprocess
|
| 9 |
+
import json
|
| 10 |
+
import os
|
| 11 |
+
import asyncio
|
| 12 |
+
import numpy as np
|
| 13 |
+
import cv2
|
| 14 |
+
from typing import Any, Dict, List
|
| 15 |
+
from metrics import metrics as _metrics_singleton, Metrics
|
| 16 |
+
from config import config
|
| 17 |
+
from voice_processor import voice_processor
|
| 18 |
+
from avatar_pipeline import get_pipeline
|
| 19 |
+
|
| 20 |
+
app = FastAPI(title="Mirage Real-time AI Avatar System")
|
| 21 |
+
|
| 22 |
+
# Initialize AI pipeline
|
| 23 |
+
pipeline = get_pipeline()
|
| 24 |
+
pipeline_initialized = False
|
| 25 |
+
|
| 26 |
+
# Potentially reconfigure metrics based on config
|
| 27 |
+
if config.metrics_fps_window != 30: # default in metrics module
|
| 28 |
+
metrics = Metrics(fps_window=config.metrics_fps_window)
|
| 29 |
+
else:
|
| 30 |
+
metrics = _metrics_singleton
|
| 31 |
+
|
| 32 |
+
# Mount the static directory
|
| 33 |
+
static_dir = Path(__file__).parent / "static"
|
| 34 |
+
app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
@app.get("/", response_class=HTMLResponse)
|
| 38 |
+
async def root():
|
| 39 |
+
"""Serve the static/index.html file contents as HTML."""
|
| 40 |
+
index_path = static_dir / "index.html"
|
| 41 |
+
try:
|
| 42 |
+
content = index_path.read_text(encoding="utf-8")
|
| 43 |
+
except FileNotFoundError:
|
| 44 |
+
# Minimal fallback to satisfy route even if file not yet present.
|
| 45 |
+
content = "<html><body><h1>Mirage AI Avatar System</h1><p>Real-time AI avatar with face animation and voice conversion.</p></body></html>"
|
| 46 |
+
return HTMLResponse(content)
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
@app.get("/health")
|
| 50 |
+
async def health():
|
| 51 |
+
return {
|
| 52 |
+
"status": "ok",
|
| 53 |
+
"system": "real-time-ai-avatar",
|
| 54 |
+
"pipeline_loaded": pipeline_initialized,
|
| 55 |
+
"gpu_available": pipeline.config.device == "cuda"
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
@app.post("/initialize")
|
| 60 |
+
async def initialize_pipeline():
|
| 61 |
+
"""Initialize the AI pipeline"""
|
| 62 |
+
global pipeline_initialized
|
| 63 |
+
|
| 64 |
+
if pipeline_initialized:
|
| 65 |
+
return {"status": "already_initialized", "message": "Pipeline already loaded"}
|
| 66 |
+
|
| 67 |
+
try:
|
| 68 |
+
success = await pipeline.initialize()
|
| 69 |
+
if success:
|
| 70 |
+
pipeline_initialized = True
|
| 71 |
+
return {"status": "success", "message": "Pipeline initialized successfully"}
|
| 72 |
+
else:
|
| 73 |
+
return {"status": "error", "message": "Failed to initialize pipeline"}
|
| 74 |
+
except Exception as e:
|
| 75 |
+
return {"status": "error", "message": f"Initialization error: {str(e)}"}
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
@app.post("/set_reference")
|
| 79 |
+
async def set_reference_image(file: UploadFile = File(...)):
|
| 80 |
+
"""Set reference image for avatar"""
|
| 81 |
+
global pipeline_initialized
|
| 82 |
+
|
| 83 |
+
if not pipeline_initialized:
|
| 84 |
+
raise HTTPException(status_code=400, detail="Pipeline not initialized")
|
| 85 |
+
|
| 86 |
+
try:
|
| 87 |
+
# Read uploaded image
|
| 88 |
+
contents = await file.read()
|
| 89 |
+
nparr = np.frombuffer(contents, np.uint8)
|
| 90 |
+
frame = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
|
| 91 |
+
|
| 92 |
+
if frame is None:
|
| 93 |
+
raise HTTPException(status_code=400, detail="Invalid image format")
|
| 94 |
+
|
| 95 |
+
# Set as reference frame
|
| 96 |
+
success = pipeline.set_reference_frame(frame)
|
| 97 |
+
|
| 98 |
+
if success:
|
| 99 |
+
return {"status": "success", "message": "Reference image set successfully"}
|
| 100 |
+
else:
|
| 101 |
+
return {"status": "error", "message": "No suitable face found in image"}
|
| 102 |
+
|
| 103 |
+
except Exception as e:
|
| 104 |
+
return {"status": "error", "message": f"Error setting reference: {str(e)}"}
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
# Frame counter for processing
|
| 108 |
+
frame_counter = 0
|
| 109 |
+
|
| 110 |
+
async def _process_websocket(websocket: WebSocket, kind: str):
|
| 111 |
+
"""Enhanced WebSocket handler with AI processing"""
|
| 112 |
+
global frame_counter, pipeline_initialized
|
| 113 |
+
|
| 114 |
+
await websocket.accept()
|
| 115 |
+
last_ts = time.time() * 1000.0 if kind == "audio" else None
|
| 116 |
+
|
| 117 |
+
while True:
|
| 118 |
+
try:
|
| 119 |
+
data = await websocket.receive_bytes()
|
| 120 |
+
size = len(data)
|
| 121 |
+
|
| 122 |
+
if kind == "audio":
|
| 123 |
+
now = time.time() * 1000.0
|
| 124 |
+
interval = None
|
| 125 |
+
if last_ts is not None:
|
| 126 |
+
interval = now - last_ts
|
| 127 |
+
|
| 128 |
+
infer_ms = None
|
| 129 |
+
# Convert raw bytes -> int16 array for processing path
|
| 130 |
+
pcm_int16 = array.array('h')
|
| 131 |
+
pcm_int16.frombytes(data)
|
| 132 |
+
|
| 133 |
+
if config.voice_enable and pipeline_initialized:
|
| 134 |
+
# AI voice conversion
|
| 135 |
+
audio_np = np.array(pcm_int16, dtype=np.int16)
|
| 136 |
+
processed_audio = pipeline.process_audio_chunk(audio_np)
|
| 137 |
+
data = processed_audio.astype(np.int16).tobytes()
|
| 138 |
+
infer_ms = 50 # Placeholder timing
|
| 139 |
+
elif config.voice_enable:
|
| 140 |
+
# Fallback to voice processor
|
| 141 |
+
processed_view, infer_ms = voice_processor.process_pcm_int16(pcm_int16.tobytes(), sample_rate=16000)
|
| 142 |
+
data = processed_view.tobytes()
|
| 143 |
+
else:
|
| 144 |
+
# Pass-through
|
| 145 |
+
data = pcm_int16.tobytes()
|
| 146 |
+
|
| 147 |
+
metrics.record_audio_chunk(size_bytes=size, loop_interval_ms=interval, infer_time_ms=infer_ms)
|
| 148 |
+
last_ts = now
|
| 149 |
+
|
| 150 |
+
elif kind == "video":
|
| 151 |
+
if pipeline_initialized:
|
| 152 |
+
try:
|
| 153 |
+
# Decode JPEG frame
|
| 154 |
+
nparr = np.frombuffer(data, np.uint8)
|
| 155 |
+
frame = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
|
| 156 |
+
|
| 157 |
+
if frame is not None:
|
| 158 |
+
# AI face animation
|
| 159 |
+
processed_frame = pipeline.process_video_frame(frame, frame_counter)
|
| 160 |
+
frame_counter += 1
|
| 161 |
+
|
| 162 |
+
# Encode back to JPEG
|
| 163 |
+
_, encoded = cv2.imencode('.jpg', processed_frame, [cv2.IMWRITE_JPEG_QUALITY, 65])
|
| 164 |
+
data = encoded.tobytes()
|
| 165 |
+
except Exception as e:
|
| 166 |
+
print(f"Video processing error: {e}")
|
| 167 |
+
# Fallback to original data
|
| 168 |
+
pass
|
| 169 |
+
|
| 170 |
+
metrics.record_video_frame(size_bytes=size)
|
| 171 |
+
|
| 172 |
+
# Send processed data back
|
| 173 |
+
await websocket.send_bytes(data)
|
| 174 |
+
|
| 175 |
+
except WebSocketDisconnect:
|
| 176 |
+
break
|
| 177 |
+
except Exception:
|
| 178 |
+
print(f"[{kind} ws] Unexpected error:")
|
| 179 |
+
traceback.print_exc()
|
| 180 |
+
break
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
@app.websocket("/audio")
|
| 184 |
+
async def audio_ws(websocket: WebSocket):
|
| 185 |
+
await _process_websocket(websocket, "audio")
|
| 186 |
+
|
| 187 |
+
|
| 188 |
+
@app.websocket("/video")
|
| 189 |
+
async def video_ws(websocket: WebSocket):
|
| 190 |
+
await _process_websocket(websocket, "video")
|
| 191 |
+
|
| 192 |
+
|
| 193 |
+
@app.get("/metrics")
|
| 194 |
+
async def get_metrics():
|
| 195 |
+
base_metrics = metrics.snapshot()
|
| 196 |
+
|
| 197 |
+
# Add AI pipeline metrics if available
|
| 198 |
+
if pipeline_initialized:
|
| 199 |
+
pipeline_stats = pipeline.get_performance_stats()
|
| 200 |
+
base_metrics.update({
|
| 201 |
+
"ai_pipeline": pipeline_stats
|
| 202 |
+
})
|
| 203 |
+
|
| 204 |
+
return base_metrics
|
| 205 |
+
|
| 206 |
+
|
| 207 |
+
@app.get("/pipeline_status")
|
| 208 |
+
async def get_pipeline_status():
|
| 209 |
+
"""Get detailed pipeline status"""
|
| 210 |
+
if not pipeline_initialized:
|
| 211 |
+
return {
|
| 212 |
+
"initialized": False,
|
| 213 |
+
"message": "Pipeline not initialized"
|
| 214 |
+
}
|
| 215 |
+
|
| 216 |
+
try:
|
| 217 |
+
stats = pipeline.get_performance_stats()
|
| 218 |
+
return {
|
| 219 |
+
"initialized": True,
|
| 220 |
+
"stats": stats,
|
| 221 |
+
"reference_set": pipeline.reference_frame is not None
|
| 222 |
+
}
|
| 223 |
+
except Exception as e:
|
| 224 |
+
return {
|
| 225 |
+
"initialized": False,
|
| 226 |
+
"error": str(e)
|
| 227 |
+
}
|
| 228 |
+
|
| 229 |
+
|
| 230 |
+
@app.get("/gpu")
|
| 231 |
+
async def gpu_info():
|
| 232 |
+
"""Return basic GPU availability and memory statistics.
|
| 233 |
+
|
| 234 |
+
Priority order:
|
| 235 |
+
1. torch (if installed and CUDA available) for detailed stats per device.
|
| 236 |
+
2. nvidia-smi (if executable present) for name/total/used.
|
| 237 |
+
3. Fallback: available false.
|
| 238 |
+
"""
|
| 239 |
+
# Response scaffold
|
| 240 |
+
resp: Dict[str, Any] = {
|
| 241 |
+
"available": False,
|
| 242 |
+
"provider": None,
|
| 243 |
+
"device_count": 0,
|
| 244 |
+
"devices": [], # type: ignore[list-item]
|
| 245 |
+
}
|
| 246 |
+
|
| 247 |
+
# Try torch first (lazy import)
|
| 248 |
+
try:
|
| 249 |
+
import torch # type: ignore
|
| 250 |
+
|
| 251 |
+
if torch.cuda.is_available():
|
| 252 |
+
resp["available"] = True
|
| 253 |
+
resp["provider"] = "torch"
|
| 254 |
+
count = torch.cuda.device_count()
|
| 255 |
+
resp["device_count"] = count
|
| 256 |
+
devices: List[Dict[str, Any]] = []
|
| 257 |
+
for idx in range(count):
|
| 258 |
+
name = torch.cuda.get_device_name(idx)
|
| 259 |
+
try:
|
| 260 |
+
free_bytes, total_bytes = torch.cuda.mem_get_info(idx) # type: ignore[arg-type]
|
| 261 |
+
except TypeError:
|
| 262 |
+
# Older PyTorch versions take no index
|
| 263 |
+
free_bytes, total_bytes = torch.cuda.mem_get_info()
|
| 264 |
+
allocated = torch.cuda.memory_allocated(idx)
|
| 265 |
+
reserved = torch.cuda.memory_reserved(idx)
|
| 266 |
+
# Estimate free including unallocated reserved as reclaimable
|
| 267 |
+
est_free = free_bytes + max(reserved - allocated, 0)
|
| 268 |
+
to_mb = lambda b: round(b / (1024 * 1024), 2)
|
| 269 |
+
devices.append({
|
| 270 |
+
"index": idx,
|
| 271 |
+
"name": name,
|
| 272 |
+
"total_mb": to_mb(total_bytes),
|
| 273 |
+
"allocated_mb": to_mb(allocated),
|
| 274 |
+
"reserved_mb": to_mb(reserved),
|
| 275 |
+
"free_mem_get_info_mb": to_mb(free_bytes),
|
| 276 |
+
"free_estimate_mb": to_mb(est_free),
|
| 277 |
+
})
|
| 278 |
+
resp["devices"] = devices
|
| 279 |
+
return resp
|
| 280 |
+
except Exception: # noqa: BLE001
|
| 281 |
+
# Torch not installed or failed; fall through to nvidia-smi
|
| 282 |
+
pass
|
| 283 |
+
|
| 284 |
+
# Try nvidia-smi fallback
|
| 285 |
+
try:
|
| 286 |
+
cmd = [
|
| 287 |
+
"nvidia-smi",
|
| 288 |
+
"--query-gpu=name,memory.total,memory.used",
|
| 289 |
+
"--format=csv,noheader,nounits",
|
| 290 |
+
]
|
| 291 |
+
out = subprocess.check_output(cmd, stderr=subprocess.STDOUT, timeout=2).decode("utf-8").strip()
|
| 292 |
+
lines = [l for l in out.splitlines() if l.strip()]
|
| 293 |
+
if lines:
|
| 294 |
+
resp["available"] = True
|
| 295 |
+
resp["provider"] = "nvidia-smi"
|
| 296 |
+
resp["device_count"] = len(lines)
|
| 297 |
+
devices: List[Dict[str, Any]] = []
|
| 298 |
+
for idx, line in enumerate(lines):
|
| 299 |
+
# Expect: name, total, used
|
| 300 |
+
parts = [p.strip() for p in line.split(',')]
|
| 301 |
+
if len(parts) >= 3:
|
| 302 |
+
name, total_str, used_str = parts[:3]
|
| 303 |
+
try:
|
| 304 |
+
total = float(total_str)
|
| 305 |
+
used = float(used_str)
|
| 306 |
+
free = max(total - used, 0)
|
| 307 |
+
except ValueError:
|
| 308 |
+
total = used = free = 0.0
|
| 309 |
+
devices.append({
|
| 310 |
+
"index": idx,
|
| 311 |
+
"name": name,
|
| 312 |
+
"total_mb": total,
|
| 313 |
+
"allocated_mb": used, # approximate
|
| 314 |
+
"reserved_mb": None,
|
| 315 |
+
"free_estimate_mb": free,
|
| 316 |
+
})
|
| 317 |
+
resp["devices"] = devices
|
| 318 |
+
return resp
|
| 319 |
+
except Exception: # noqa: BLE001
|
| 320 |
+
pass
|
| 321 |
+
|
| 322 |
+
return resp
|
| 323 |
+
|
| 324 |
+
|
| 325 |
+
@app.on_event("startup")
|
| 326 |
+
async def log_config():
|
| 327 |
+
# Enhanced startup logging: core config + GPU availability summary.
|
| 328 |
+
cfg = config.as_dict()
|
| 329 |
+
# GPU probe (reuse gpu_info logic minimally without full device list to keep log concise)
|
| 330 |
+
gpu_available = False
|
| 331 |
+
gpu_name = None
|
| 332 |
+
try:
|
| 333 |
+
import torch # type: ignore
|
| 334 |
+
if torch.cuda.is_available():
|
| 335 |
+
gpu_available = True
|
| 336 |
+
gpu_name = torch.cuda.get_device_name(0)
|
| 337 |
+
else:
|
| 338 |
+
# Fallback quick nvidia-smi single line
|
| 339 |
+
try:
|
| 340 |
+
out = subprocess.check_output([
|
| 341 |
+
"nvidia-smi", "--query-gpu=name", "--format=csv,noheader,nounits"
|
| 342 |
+
], stderr=subprocess.STDOUT, timeout=1).decode("utf-8").strip().splitlines()
|
| 343 |
+
if out:
|
| 344 |
+
gpu_available = True
|
| 345 |
+
gpu_name = out[0].strip()
|
| 346 |
+
except Exception: # noqa: BLE001
|
| 347 |
+
pass
|
| 348 |
+
except Exception: # noqa: BLE001
|
| 349 |
+
pass
|
| 350 |
+
# Honor dynamic PORT if provided (HF Spaces usually fixed at 7860 for docker, but logging helps debugging)
|
| 351 |
+
listen_port = int(os.getenv("PORT", "7860"))
|
| 352 |
+
startup_line = {
|
| 353 |
+
"chunk_ms": cfg.get("chunk_ms"),
|
| 354 |
+
"voice_enabled": cfg.get("voice_enable"),
|
| 355 |
+
"metrics_fps_window": cfg.get("metrics_fps_window"),
|
| 356 |
+
"video_fps_limit": cfg.get("video_max_fps"),
|
| 357 |
+
"port": listen_port,
|
| 358 |
+
"gpu_available": gpu_available,
|
| 359 |
+
"gpu_name": gpu_name,
|
| 360 |
+
}
|
| 361 |
+
print("[startup]", startup_line)
|
| 362 |
+
|
| 363 |
+
|
| 364 |
+
# Note: The Dockerfile / README launch with: uvicorn app:app --port 7860
|
| 365 |
+
if __name__ == "__main__": # Optional direct run helper
|
| 366 |
+
import uvicorn # type: ignore
|
| 367 |
+
|
| 368 |
+
uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=False)
|
realtime_optimizer.py
ADDED
|
@@ -0,0 +1,394 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Real-time Optimization Module
|
| 3 |
+
Implements latency reduction, frame buffering, and GPU optimization
|
| 4 |
+
"""
|
| 5 |
+
import torch
|
| 6 |
+
import torch.nn.functional as F
|
| 7 |
+
import numpy as np
|
| 8 |
+
import time
|
| 9 |
+
import threading
|
| 10 |
+
import queue
|
| 11 |
+
import logging
|
| 12 |
+
from collections import deque
|
| 13 |
+
from typing import Dict, Any, Optional, Tuple
|
| 14 |
+
import psutil
|
| 15 |
+
import gc
|
| 16 |
+
|
| 17 |
+
logger = logging.getLogger(__name__)
|
| 18 |
+
|
| 19 |
+
class LatencyOptimizer:
|
| 20 |
+
"""Optimizes processing pipeline for minimal latency"""
|
| 21 |
+
|
| 22 |
+
def __init__(self, target_latency_ms: float = 250.0):
|
| 23 |
+
self.target_latency_ms = target_latency_ms
|
| 24 |
+
self.latency_history = deque(maxlen=100)
|
| 25 |
+
self.processing_times = {}
|
| 26 |
+
|
| 27 |
+
# Adaptive parameters
|
| 28 |
+
self.current_quality = 1.0 # 0.5 to 1.0
|
| 29 |
+
self.current_resolution = (512, 512)
|
| 30 |
+
self.current_fps = 20
|
| 31 |
+
|
| 32 |
+
# Performance thresholds
|
| 33 |
+
self.latency_threshold_high = target_latency_ms * 0.8 # 200ms
|
| 34 |
+
self.latency_threshold_low = target_latency_ms * 0.6 # 150ms
|
| 35 |
+
|
| 36 |
+
# Adaptation counters
|
| 37 |
+
self.high_latency_count = 0
|
| 38 |
+
self.low_latency_count = 0
|
| 39 |
+
self.adaptation_threshold = 5 # consecutive frames
|
| 40 |
+
|
| 41 |
+
def record_latency(self, stage: str, latency_ms: float):
|
| 42 |
+
"""Record latency for a processing stage"""
|
| 43 |
+
self.processing_times[stage] = latency_ms
|
| 44 |
+
|
| 45 |
+
# Calculate total latency
|
| 46 |
+
total_latency = sum(self.processing_times.values())
|
| 47 |
+
self.latency_history.append(total_latency)
|
| 48 |
+
|
| 49 |
+
# Trigger adaptation if needed
|
| 50 |
+
self._adapt_quality(total_latency)
|
| 51 |
+
|
| 52 |
+
def _adapt_quality(self, total_latency: float):
|
| 53 |
+
"""Adapt quality based on latency"""
|
| 54 |
+
if total_latency > self.latency_threshold_high:
|
| 55 |
+
self.high_latency_count += 1
|
| 56 |
+
self.low_latency_count = 0
|
| 57 |
+
|
| 58 |
+
if self.high_latency_count >= self.adaptation_threshold:
|
| 59 |
+
self._degrade_quality()
|
| 60 |
+
self.high_latency_count = 0
|
| 61 |
+
|
| 62 |
+
elif total_latency < self.latency_threshold_low:
|
| 63 |
+
self.low_latency_count += 1
|
| 64 |
+
self.high_latency_count = 0
|
| 65 |
+
|
| 66 |
+
if self.low_latency_count >= self.adaptation_threshold * 2: # Be more conservative with upgrades
|
| 67 |
+
self._improve_quality()
|
| 68 |
+
self.low_latency_count = 0
|
| 69 |
+
else:
|
| 70 |
+
self.high_latency_count = 0
|
| 71 |
+
self.low_latency_count = 0
|
| 72 |
+
|
| 73 |
+
def _degrade_quality(self):
|
| 74 |
+
"""Degrade quality to improve latency"""
|
| 75 |
+
if self.current_quality > 0.7:
|
| 76 |
+
self.current_quality -= 0.1
|
| 77 |
+
logger.info(f"Reduced quality to {self.current_quality:.1f}")
|
| 78 |
+
elif self.current_fps > 15:
|
| 79 |
+
self.current_fps -= 2
|
| 80 |
+
logger.info(f"Reduced FPS to {self.current_fps}")
|
| 81 |
+
elif self.current_resolution[0] > 384:
|
| 82 |
+
self.current_resolution = (384, 384)
|
| 83 |
+
logger.info(f"Reduced resolution to {self.current_resolution}")
|
| 84 |
+
|
| 85 |
+
def _improve_quality(self):
|
| 86 |
+
"""Improve quality when latency allows"""
|
| 87 |
+
if self.current_resolution[0] < 512:
|
| 88 |
+
self.current_resolution = (512, 512)
|
| 89 |
+
logger.info(f"Increased resolution to {self.current_resolution}")
|
| 90 |
+
elif self.current_fps < 20:
|
| 91 |
+
self.current_fps += 2
|
| 92 |
+
logger.info(f"Increased FPS to {self.current_fps}")
|
| 93 |
+
elif self.current_quality < 1.0:
|
| 94 |
+
self.current_quality += 0.1
|
| 95 |
+
logger.info(f"Increased quality to {self.current_quality:.1f}")
|
| 96 |
+
|
| 97 |
+
def get_current_settings(self) -> Dict[str, Any]:
|
| 98 |
+
"""Get current adaptive settings"""
|
| 99 |
+
return {
|
| 100 |
+
"quality": self.current_quality,
|
| 101 |
+
"resolution": self.current_resolution,
|
| 102 |
+
"fps": self.current_fps,
|
| 103 |
+
"avg_latency_ms": np.mean(self.latency_history) if self.latency_history else 0
|
| 104 |
+
}
|
| 105 |
+
|
| 106 |
+
class FrameBuffer:
|
| 107 |
+
"""Thread-safe frame buffer with overflow protection"""
|
| 108 |
+
|
| 109 |
+
def __init__(self, max_size: int = 5):
|
| 110 |
+
self.max_size = max_size
|
| 111 |
+
self.buffer = queue.Queue(maxsize=max_size)
|
| 112 |
+
self.dropped_frames = 0
|
| 113 |
+
self.total_frames = 0
|
| 114 |
+
|
| 115 |
+
def put_frame(self, frame: np.ndarray, timestamp: float) -> bool:
|
| 116 |
+
"""Add frame to buffer, returns False if dropped"""
|
| 117 |
+
self.total_frames += 1
|
| 118 |
+
|
| 119 |
+
try:
|
| 120 |
+
self.buffer.put_nowait((frame, timestamp))
|
| 121 |
+
return True
|
| 122 |
+
except queue.Full:
|
| 123 |
+
# Drop oldest frame and add new one
|
| 124 |
+
try:
|
| 125 |
+
self.buffer.get_nowait()
|
| 126 |
+
self.buffer.put_nowait((frame, timestamp))
|
| 127 |
+
self.dropped_frames += 1
|
| 128 |
+
return True
|
| 129 |
+
except queue.Empty:
|
| 130 |
+
return False
|
| 131 |
+
|
| 132 |
+
def get_frame(self) -> Optional[Tuple[np.ndarray, float]]:
|
| 133 |
+
"""Get next frame from buffer"""
|
| 134 |
+
try:
|
| 135 |
+
return self.buffer.get_nowait()
|
| 136 |
+
except queue.Empty:
|
| 137 |
+
return None
|
| 138 |
+
|
| 139 |
+
def get_stats(self) -> Dict[str, int]:
|
| 140 |
+
"""Get buffer statistics"""
|
| 141 |
+
return {
|
| 142 |
+
"size": self.buffer.qsize(),
|
| 143 |
+
"max_size": self.max_size,
|
| 144 |
+
"dropped_frames": self.dropped_frames,
|
| 145 |
+
"total_frames": self.total_frames,
|
| 146 |
+
"drop_rate": self.dropped_frames / max(self.total_frames, 1)
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
class GPUMemoryManager:
|
| 150 |
+
"""Manages GPU memory for optimal performance"""
|
| 151 |
+
|
| 152 |
+
def __init__(self):
|
| 153 |
+
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 154 |
+
self.memory_threshold = 0.9 # 90% of GPU memory
|
| 155 |
+
self.cleanup_interval = 50 # frames
|
| 156 |
+
self.frame_count = 0
|
| 157 |
+
|
| 158 |
+
def optimize_memory(self):
|
| 159 |
+
"""Optimize GPU memory usage"""
|
| 160 |
+
if not torch.cuda.is_available():
|
| 161 |
+
return
|
| 162 |
+
|
| 163 |
+
self.frame_count += 1
|
| 164 |
+
|
| 165 |
+
# Periodic cleanup
|
| 166 |
+
if self.frame_count % self.cleanup_interval == 0:
|
| 167 |
+
self._cleanup_memory()
|
| 168 |
+
|
| 169 |
+
# Emergency cleanup if memory usage is high
|
| 170 |
+
if self._get_memory_usage() > self.memory_threshold:
|
| 171 |
+
self._emergency_cleanup()
|
| 172 |
+
|
| 173 |
+
def _get_memory_usage(self) -> float:
|
| 174 |
+
"""Get current GPU memory usage ratio"""
|
| 175 |
+
if not torch.cuda.is_available():
|
| 176 |
+
return 0.0
|
| 177 |
+
|
| 178 |
+
allocated = torch.cuda.memory_allocated()
|
| 179 |
+
total = torch.cuda.get_device_properties(0).total_memory
|
| 180 |
+
return allocated / total
|
| 181 |
+
|
| 182 |
+
def _cleanup_memory(self):
|
| 183 |
+
"""Perform memory cleanup"""
|
| 184 |
+
if torch.cuda.is_available():
|
| 185 |
+
torch.cuda.empty_cache()
|
| 186 |
+
gc.collect()
|
| 187 |
+
|
| 188 |
+
def _emergency_cleanup(self):
|
| 189 |
+
"""Emergency memory cleanup"""
|
| 190 |
+
logger.warning("High GPU memory usage, performing emergency cleanup")
|
| 191 |
+
self._cleanup_memory()
|
| 192 |
+
|
| 193 |
+
# Force garbage collection
|
| 194 |
+
for _ in range(3):
|
| 195 |
+
gc.collect()
|
| 196 |
+
|
| 197 |
+
def get_memory_stats(self) -> Dict[str, float]:
|
| 198 |
+
"""Get GPU memory statistics"""
|
| 199 |
+
if not torch.cuda.is_available():
|
| 200 |
+
return {"available": False}
|
| 201 |
+
|
| 202 |
+
allocated = torch.cuda.memory_allocated()
|
| 203 |
+
reserved = torch.cuda.memory_reserved()
|
| 204 |
+
total = torch.cuda.get_device_properties(0).total_memory
|
| 205 |
+
|
| 206 |
+
return {
|
| 207 |
+
"available": True,
|
| 208 |
+
"allocated_gb": allocated / (1024**3),
|
| 209 |
+
"reserved_gb": reserved / (1024**3),
|
| 210 |
+
"total_gb": total / (1024**3),
|
| 211 |
+
"usage_ratio": allocated / total
|
| 212 |
+
}
|
| 213 |
+
|
| 214 |
+
class AudioSyncManager:
|
| 215 |
+
"""Manages audio-video synchronization"""
|
| 216 |
+
|
| 217 |
+
def __init__(self, max_drift_ms: float = 150.0):
|
| 218 |
+
self.max_drift_ms = max_drift_ms
|
| 219 |
+
self.audio_timestamps = deque(maxlen=100)
|
| 220 |
+
self.video_timestamps = deque(maxlen=100)
|
| 221 |
+
self.sync_offset = 0.0
|
| 222 |
+
|
| 223 |
+
def add_audio_timestamp(self, timestamp: float):
|
| 224 |
+
"""Add audio timestamp"""
|
| 225 |
+
self.audio_timestamps.append(timestamp)
|
| 226 |
+
self._calculate_sync_offset()
|
| 227 |
+
|
| 228 |
+
def add_video_timestamp(self, timestamp: float):
|
| 229 |
+
"""Add video timestamp"""
|
| 230 |
+
self.video_timestamps.append(timestamp)
|
| 231 |
+
self._calculate_sync_offset()
|
| 232 |
+
|
| 233 |
+
def _calculate_sync_offset(self):
|
| 234 |
+
"""Calculate current sync offset"""
|
| 235 |
+
if len(self.audio_timestamps) == 0 or len(self.video_timestamps) == 0:
|
| 236 |
+
return
|
| 237 |
+
|
| 238 |
+
# Calculate average timestamp difference
|
| 239 |
+
audio_avg = np.mean(list(self.audio_timestamps)[-10:]) # Last 10 samples
|
| 240 |
+
video_avg = np.mean(list(self.video_timestamps)[-10:])
|
| 241 |
+
|
| 242 |
+
self.sync_offset = audio_avg - video_avg
|
| 243 |
+
|
| 244 |
+
def should_drop_video_frame(self, video_timestamp: float) -> bool:
|
| 245 |
+
"""Check if video frame should be dropped for sync"""
|
| 246 |
+
if len(self.audio_timestamps) == 0:
|
| 247 |
+
return False
|
| 248 |
+
|
| 249 |
+
latest_audio = self.audio_timestamps[-1]
|
| 250 |
+
drift = video_timestamp - latest_audio
|
| 251 |
+
|
| 252 |
+
return abs(drift) > self.max_drift_ms
|
| 253 |
+
|
| 254 |
+
def get_sync_stats(self) -> Dict[str, float]:
|
| 255 |
+
"""Get synchronization statistics"""
|
| 256 |
+
return {
|
| 257 |
+
"sync_offset_ms": self.sync_offset,
|
| 258 |
+
"audio_samples": len(self.audio_timestamps),
|
| 259 |
+
"video_samples": len(self.video_timestamps)
|
| 260 |
+
}
|
| 261 |
+
|
| 262 |
+
class PerformanceProfiler:
|
| 263 |
+
"""Profiles system performance for optimization"""
|
| 264 |
+
|
| 265 |
+
def __init__(self):
|
| 266 |
+
self.cpu_usage = deque(maxlen=60) # 1 minute at 1 Hz
|
| 267 |
+
self.memory_usage = deque(maxlen=60)
|
| 268 |
+
self.gpu_utilization = deque(maxlen=60)
|
| 269 |
+
|
| 270 |
+
# Start monitoring thread
|
| 271 |
+
self.monitoring = True
|
| 272 |
+
self.monitor_thread = threading.Thread(target=self._monitor_system)
|
| 273 |
+
self.monitor_thread.daemon = True
|
| 274 |
+
self.monitor_thread.start()
|
| 275 |
+
|
| 276 |
+
def _monitor_system(self):
|
| 277 |
+
"""Monitor system resources"""
|
| 278 |
+
while self.monitoring:
|
| 279 |
+
try:
|
| 280 |
+
# CPU usage
|
| 281 |
+
cpu_percent = psutil.cpu_percent(interval=1)
|
| 282 |
+
self.cpu_usage.append(cpu_percent)
|
| 283 |
+
|
| 284 |
+
# Memory usage
|
| 285 |
+
memory = psutil.virtual_memory()
|
| 286 |
+
self.memory_usage.append(memory.percent)
|
| 287 |
+
|
| 288 |
+
# GPU utilization (if available)
|
| 289 |
+
if torch.cuda.is_available():
|
| 290 |
+
# Approximate GPU utilization based on memory usage
|
| 291 |
+
gpu_memory_used = torch.cuda.memory_allocated() / torch.cuda.get_device_properties(0).total_memory
|
| 292 |
+
self.gpu_utilization.append(gpu_memory_used * 100)
|
| 293 |
+
else:
|
| 294 |
+
self.gpu_utilization.append(0)
|
| 295 |
+
|
| 296 |
+
except Exception as e:
|
| 297 |
+
logger.error(f"System monitoring error: {e}")
|
| 298 |
+
|
| 299 |
+
time.sleep(1)
|
| 300 |
+
|
| 301 |
+
def stop_monitoring(self):
|
| 302 |
+
"""Stop system monitoring"""
|
| 303 |
+
self.monitoring = False
|
| 304 |
+
if self.monitor_thread.is_alive():
|
| 305 |
+
self.monitor_thread.join()
|
| 306 |
+
|
| 307 |
+
def get_system_stats(self) -> Dict[str, Any]:
|
| 308 |
+
"""Get system performance statistics"""
|
| 309 |
+
return {
|
| 310 |
+
"cpu_usage_avg": np.mean(self.cpu_usage) if self.cpu_usage else 0,
|
| 311 |
+
"cpu_usage_max": np.max(self.cpu_usage) if self.cpu_usage else 0,
|
| 312 |
+
"memory_usage_avg": np.mean(self.memory_usage) if self.memory_usage else 0,
|
| 313 |
+
"memory_usage_max": np.max(self.memory_usage) if self.memory_usage else 0,
|
| 314 |
+
"gpu_utilization_avg": np.mean(self.gpu_utilization) if self.gpu_utilization else 0,
|
| 315 |
+
"gpu_utilization_max": np.max(self.gpu_utilization) if self.gpu_utilization else 0
|
| 316 |
+
}
|
| 317 |
+
|
| 318 |
+
class RealTimeOptimizer:
|
| 319 |
+
"""Main real-time optimization controller"""
|
| 320 |
+
|
| 321 |
+
def __init__(self, target_latency_ms: float = 250.0):
|
| 322 |
+
self.latency_optimizer = LatencyOptimizer(target_latency_ms)
|
| 323 |
+
self.frame_buffer = FrameBuffer()
|
| 324 |
+
self.gpu_manager = GPUMemoryManager()
|
| 325 |
+
self.audio_sync = AudioSyncManager()
|
| 326 |
+
self.profiler = PerformanceProfiler()
|
| 327 |
+
|
| 328 |
+
self.stats = {}
|
| 329 |
+
self.last_stats_update = time.time()
|
| 330 |
+
|
| 331 |
+
def process_frame(self, frame: np.ndarray, timestamp: float, stage: str = "video") -> bool:
|
| 332 |
+
"""Process a frame with optimization"""
|
| 333 |
+
start_time = time.time()
|
| 334 |
+
|
| 335 |
+
# Check if frame should be dropped for sync
|
| 336 |
+
if stage == "video" and self.audio_sync.should_drop_video_frame(timestamp):
|
| 337 |
+
return False
|
| 338 |
+
|
| 339 |
+
# Add to buffer
|
| 340 |
+
success = self.frame_buffer.put_frame(frame, timestamp)
|
| 341 |
+
|
| 342 |
+
# Record processing time
|
| 343 |
+
processing_time = (time.time() - start_time) * 1000
|
| 344 |
+
self.latency_optimizer.record_latency(stage, processing_time)
|
| 345 |
+
|
| 346 |
+
# Update timestamps for sync
|
| 347 |
+
if stage == "video":
|
| 348 |
+
self.audio_sync.add_video_timestamp(timestamp)
|
| 349 |
+
elif stage == "audio":
|
| 350 |
+
self.audio_sync.add_audio_timestamp(timestamp)
|
| 351 |
+
|
| 352 |
+
# Optimize GPU memory
|
| 353 |
+
self.gpu_manager.optimize_memory()
|
| 354 |
+
|
| 355 |
+
return success
|
| 356 |
+
|
| 357 |
+
def get_frame(self) -> Optional[Tuple[np.ndarray, float]]:
|
| 358 |
+
"""Get next frame from buffer"""
|
| 359 |
+
return self.frame_buffer.get_frame()
|
| 360 |
+
|
| 361 |
+
def get_optimization_settings(self) -> Dict[str, Any]:
|
| 362 |
+
"""Get current optimization settings"""
|
| 363 |
+
return self.latency_optimizer.get_current_settings()
|
| 364 |
+
|
| 365 |
+
def get_comprehensive_stats(self) -> Dict[str, Any]:
|
| 366 |
+
"""Get comprehensive performance statistics"""
|
| 367 |
+
now = time.time()
|
| 368 |
+
|
| 369 |
+
# Update stats every 2 seconds
|
| 370 |
+
if now - self.last_stats_update > 2.0:
|
| 371 |
+
self.stats = {
|
| 372 |
+
"latency": self.latency_optimizer.get_current_settings(),
|
| 373 |
+
"buffer": self.frame_buffer.get_stats(),
|
| 374 |
+
"gpu": self.gpu_manager.get_memory_stats(),
|
| 375 |
+
"sync": self.audio_sync.get_sync_stats(),
|
| 376 |
+
"system": self.profiler.get_system_stats()
|
| 377 |
+
}
|
| 378 |
+
self.last_stats_update = now
|
| 379 |
+
|
| 380 |
+
return self.stats
|
| 381 |
+
|
| 382 |
+
def cleanup(self):
|
| 383 |
+
"""Cleanup optimizer resources"""
|
| 384 |
+
self.profiler.stop_monitoring()
|
| 385 |
+
|
| 386 |
+
# Global optimizer instance
|
| 387 |
+
_optimizer_instance = None
|
| 388 |
+
|
| 389 |
+
def get_realtime_optimizer() -> RealTimeOptimizer:
|
| 390 |
+
"""Get or create global optimizer instance"""
|
| 391 |
+
global _optimizer_instance
|
| 392 |
+
if _optimizer_instance is None:
|
| 393 |
+
_optimizer_instance = RealTimeOptimizer()
|
| 394 |
+
return _optimizer_instance
|
requirements.txt
CHANGED
|
@@ -1,9 +1,25 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
fastapi==0.111.0
|
| 2 |
uvicorn[standard]==0.30.1
|
| 3 |
-
|
| 4 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 5 |
numpy==1.26.4
|
| 6 |
psutil==5.9.8
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
|
|
|
|
|
| 1 |
+
# Core Dependencies
|
| 2 |
+
gradio==4.44.0
|
| 3 |
+
torch==2.3.1
|
| 4 |
+
numpy==1.24.0
|
| 5 |
+
opencv-python-headless==4.9.0.80
|
| 6 |
+
pillow==10.3.0
|
| 7 |
+
|
| 8 |
+
# Optional - loaded on demand
|
| 9 |
fastapi==0.111.0
|
| 10 |
uvicorn[standard]==0.30.1
|
| 11 |
+
transformers==4.44.2
|
| 12 |
+
insightface==0.7.3
|
| 13 |
+
librosa==0.10.2
|
| 14 |
+
|
| 15 |
+
# ONNX & GPU Acceleration
|
| 16 |
+
onnx==1.16.1
|
| 17 |
+
onnxruntime-gpu==1.18.1
|
| 18 |
+
|
| 19 |
+
# System & Utils
|
| 20 |
numpy==1.26.4
|
| 21 |
psutil==5.9.8
|
| 22 |
+
|
| 23 |
+
# Optional GPU Optimization (may not be available on HF Spaces)
|
| 24 |
+
# tensorrt==10.3.0
|
| 25 |
+
# pycuda==2024.1.2
|
static/app.js
CHANGED
|
@@ -1,22 +1,35 @@
|
|
| 1 |
-
/* Mirage
|
| 2 |
|
| 3 |
-
// Globals
|
| 4 |
let audioWs = null;
|
| 5 |
let videoWs = null;
|
| 6 |
let audioContext = null;
|
| 7 |
-
let processorNode = null;
|
| 8 |
-
let playerNode = null;
|
| 9 |
let lastVideoSentTs = 0;
|
| 10 |
let remoteImageURL = null;
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
|
|
|
|
|
|
| 14 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
const LOG_EL = document.getElementById('log');
|
|
|
|
| 16 |
const START_BTN = document.getElementById('startBtn');
|
|
|
|
| 17 |
const LOCAL_VID = document.getElementById('localVid');
|
| 18 |
const REMOTE_VID_IMG = document.getElementById('remoteVid');
|
| 19 |
const REMOTE_AUDIO = document.getElementById('remoteAudio');
|
|
|
|
|
|
|
|
|
|
|
|
|
| 20 |
|
| 21 |
function log(msg) {
|
| 22 |
const ts = new Date().toISOString().split('T')[1].replace('Z','');
|
|
@@ -24,11 +37,83 @@ function log(msg) {
|
|
| 24 |
LOG_EL.scrollTop = LOG_EL.scrollHeight;
|
| 25 |
}
|
| 26 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 27 |
function wsURL(path) {
|
| 28 |
const proto = (location.protocol === 'https:') ? 'wss:' : 'ws:';
|
| 29 |
return `${proto}//${location.host}${path}`;
|
| 30 |
}
|
| 31 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 32 |
async function setupAudio(stream) {
|
| 33 |
audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
|
| 34 |
if (audioContext.state === 'suspended') {
|
|
@@ -39,16 +124,17 @@ async function setupAudio(stream) {
|
|
| 39 |
try {
|
| 40 |
await audioContext.audioWorklet.addModule('/static/worklet.js');
|
| 41 |
} catch (e) {
|
| 42 |
-
log('Failed to load worklet.js
|
| 43 |
console.error(e);
|
| 44 |
return;
|
| 45 |
}
|
| 46 |
|
| 47 |
-
//
|
| 48 |
-
|
| 49 |
-
const
|
| 50 |
-
|
| 51 |
log(`Audio chunk config: sampleRate=${audioContext.sampleRate}Hz chunkMs=${chunkMs}ms samplesPerChunk=${samplesPerChunk}`);
|
|
|
|
| 52 |
processorNode = new AudioWorkletNode(audioContext, 'pcm-chunker', {
|
| 53 |
processorOptions: { samplesPerChunk }
|
| 54 |
});
|
|
@@ -57,11 +143,11 @@ async function setupAudio(stream) {
|
|
| 57 |
// Capture mic
|
| 58 |
const source = audioContext.createMediaStreamSource(stream);
|
| 59 |
source.connect(processorNode);
|
| 60 |
-
|
|
|
|
| 61 |
const gain = audioContext.createGain();
|
| 62 |
gain.gain.value = 0;
|
| 63 |
processorNode.connect(gain).connect(audioContext.destination);
|
| 64 |
-
// Do NOT connect processorNode to destination to avoid local direct monitor; playback handled by pcm-player.
|
| 65 |
|
| 66 |
processorNode.port.onmessage = (event) => {
|
| 67 |
if (!audioWs || audioWs.readyState !== WebSocket.OPEN) return;
|
|
@@ -71,34 +157,37 @@ async function setupAudio(stream) {
|
|
| 71 |
|
| 72 |
// Connect playback node
|
| 73 |
playerNode.connect(audioContext.destination);
|
| 74 |
-
log('Audio nodes ready (
|
| 75 |
}
|
| 76 |
|
| 77 |
let _rxChunks = 0;
|
| 78 |
-
let _loopback = false;
|
| 79 |
function setupAudioWebSocket() {
|
| 80 |
audioWs = new WebSocket(wsURL('/audio'));
|
| 81 |
audioWs.binaryType = 'arraybuffer';
|
| 82 |
-
audioWs.onopen = () => log('Audio
|
| 83 |
-
audioWs.onclose = () => log('Audio
|
| 84 |
-
audioWs.onerror = (e) => log('Audio
|
| 85 |
audioWs.onmessage = (evt) => {
|
| 86 |
if (!(evt.data instanceof ArrayBuffer)) return;
|
| 87 |
-
|
| 88 |
const src = evt.data;
|
| 89 |
-
const copyBuf = src.slice(0);
|
| 90 |
-
|
|
|
|
| 91 |
const view = new Int16Array(src);
|
| 92 |
let min = 32767, max = -32768;
|
| 93 |
-
for (let i=0;i<view.length;i++) {
|
| 94 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 95 |
if (playerNode) playerNode.port.postMessage(copyBuf, [copyBuf]);
|
|
|
|
| 96 |
_rxChunks++;
|
| 97 |
-
if ((_rxChunks %
|
| 98 |
-
log(`Audio
|
| 99 |
-
}
|
| 100 |
-
if (_loopback && audioWs && audioWs.readyState === WebSocket.OPEN) {
|
| 101 |
-
// echo back again (will double) purely for test; guard to prevent infinite recursion (already from server)
|
| 102 |
}
|
| 103 |
};
|
| 104 |
}
|
|
@@ -109,12 +198,13 @@ async function setupVideo(stream) {
|
|
| 109 |
log('No video track found');
|
| 110 |
return;
|
| 111 |
}
|
|
|
|
| 112 |
const processor = new MediaStreamTrackProcessor({ track });
|
| 113 |
const reader = processor.readable.getReader();
|
| 114 |
|
| 115 |
const canvas = document.createElement('canvas');
|
| 116 |
-
canvas.width =
|
| 117 |
-
canvas.height =
|
| 118 |
const ctx = canvas.getContext('2d');
|
| 119 |
|
| 120 |
async function readLoop() {
|
|
@@ -123,21 +213,21 @@ async function setupVideo(stream) {
|
|
| 123 |
if (done) return;
|
| 124 |
|
| 125 |
const now = performance.now();
|
| 126 |
-
|
| 127 |
-
|
| 128 |
|
| 129 |
if (needSend && frame) {
|
| 130 |
try {
|
| 131 |
-
// Draw frame
|
| 132 |
if ('displayWidth' in frame && 'displayHeight' in frame) {
|
| 133 |
ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
|
| 134 |
} else {
|
| 135 |
-
// Fallback path: createImageBitmap then draw
|
| 136 |
const bmp = await createImageBitmap(frame);
|
| 137 |
ctx.drawImage(bmp, 0, 0, canvas.width, canvas.height);
|
| 138 |
bmp.close && bmp.close();
|
| 139 |
}
|
| 140 |
|
|
|
|
| 141 |
await new Promise((res, rej) => {
|
| 142 |
canvas.toBlob((blob) => {
|
| 143 |
if (!blob) return res();
|
|
@@ -147,15 +237,14 @@ async function setupVideo(stream) {
|
|
| 147 |
}
|
| 148 |
res();
|
| 149 |
}).catch(rej);
|
| 150 |
-
}, 'image/jpeg', 0.
|
| 151 |
});
|
|
|
|
| 152 |
lastVideoSentTs = now;
|
| 153 |
} catch (err) {
|
| 154 |
-
log('Video frame
|
| 155 |
console.error(err);
|
| 156 |
}
|
| 157 |
-
} else if (frame) {
|
| 158 |
-
// Skipped frame due to FPS governance; simply drop it.
|
| 159 |
}
|
| 160 |
|
| 161 |
frame.close && frame.close();
|
|
@@ -171,64 +260,228 @@ async function setupVideo(stream) {
|
|
| 171 |
function setupVideoWebSocket() {
|
| 172 |
videoWs = new WebSocket(wsURL('/video'));
|
| 173 |
videoWs.binaryType = 'arraybuffer';
|
| 174 |
-
videoWs.onopen = () => log('Video
|
| 175 |
-
videoWs.onclose = () => log('Video
|
| 176 |
-
videoWs.onerror = () => log('Video
|
| 177 |
videoWs.onmessage = (evt) => {
|
| 178 |
if (!(evt.data instanceof ArrayBuffer)) return;
|
|
|
|
|
|
|
| 179 |
const blob = new Blob([evt.data], { type: 'image/jpeg' });
|
| 180 |
if (remoteImageURL) URL.revokeObjectURL(remoteImageURL);
|
| 181 |
remoteImageURL = URL.createObjectURL(blob);
|
| 182 |
REMOTE_VID_IMG.src = remoteImageURL;
|
|
|
|
|
|
|
|
|
|
| 183 |
};
|
| 184 |
}
|
| 185 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 186 |
async function start() {
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 187 |
START_BTN.disabled = true;
|
| 188 |
-
|
| 189 |
-
|
|
|
|
|
|
|
| 190 |
try {
|
| 191 |
-
stream = await navigator.mediaDevices.getUserMedia({
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 195 |
START_BTN.disabled = false;
|
| 196 |
-
|
| 197 |
}
|
| 198 |
-
|
| 199 |
-
log('Media acquired');
|
| 200 |
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 206 |
}
|
| 207 |
|
|
|
|
|
|
|
| 208 |
START_BTN.addEventListener('click', start);
|
|
|
|
|
|
|
|
|
|
| 209 |
|
| 210 |
-
//
|
| 211 |
function testTone(seconds = 1, freq = 440) {
|
| 212 |
-
if (!audioContext || !playerNode) {
|
|
|
|
|
|
|
|
|
|
|
|
|
| 213 |
const sampleRate = audioContext.sampleRate;
|
| 214 |
const total = Math.floor(sampleRate * seconds);
|
| 215 |
const int16 = new Int16Array(total);
|
| 216 |
-
|
|
|
|
| 217 |
const s = Math.sin(2 * Math.PI * freq * (i / sampleRate));
|
| 218 |
int16[i] = s * 32767;
|
| 219 |
}
|
| 220 |
-
|
| 221 |
const chunk = Math.floor(sampleRate * 0.25);
|
| 222 |
for (let off = 0; off < int16.length; off += chunk) {
|
| 223 |
const view = int16.subarray(off, Math.min(off + chunk, int16.length));
|
| 224 |
-
// copy to standalone buffer for transfer
|
| 225 |
const copy = new Int16Array(view.length);
|
| 226 |
copy.set(view);
|
| 227 |
playerNode.port.postMessage(copy.buffer, [copy.buffer]);
|
| 228 |
}
|
| 229 |
-
|
|
|
|
| 230 |
}
|
| 231 |
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
/* Mirage Real-time AI Avatar Client */
|
| 2 |
|
| 3 |
+
// Globals
|
| 4 |
let audioWs = null;
|
| 5 |
let videoWs = null;
|
| 6 |
let audioContext = null;
|
| 7 |
+
let processorNode = null;
|
| 8 |
+
let playerNode = null;
|
| 9 |
let lastVideoSentTs = 0;
|
| 10 |
let remoteImageURL = null;
|
| 11 |
+
let isRunning = false;
|
| 12 |
+
let pipelineInitialized = false;
|
| 13 |
+
let referenceSet = false;
|
| 14 |
+
let virtualCameraStream = null;
|
| 15 |
+
let metricsInterval = null;
|
| 16 |
|
| 17 |
+
// Configuration
|
| 18 |
+
const videoMaxFps = 20; // Increased for real-time avatar
|
| 19 |
+
const videoFrameIntervalMs = 1000 / videoMaxFps;
|
| 20 |
+
|
| 21 |
+
// DOM elements
|
| 22 |
const LOG_EL = document.getElementById('log');
|
| 23 |
+
const INIT_BTN = document.getElementById('initBtn');
|
| 24 |
const START_BTN = document.getElementById('startBtn');
|
| 25 |
+
const STOP_BTN = document.getElementById('stopBtn');
|
| 26 |
const LOCAL_VID = document.getElementById('localVid');
|
| 27 |
const REMOTE_VID_IMG = document.getElementById('remoteVid');
|
| 28 |
const REMOTE_AUDIO = document.getElementById('remoteAudio');
|
| 29 |
+
const STATUS_DIV = document.getElementById('statusDiv');
|
| 30 |
+
const REFERENCE_INPUT = document.getElementById('referenceInput');
|
| 31 |
+
const VIRTUAL_CAM_BTN = document.getElementById('virtualCamBtn');
|
| 32 |
+
const VIRTUAL_CANVAS = document.getElementById('virtualCanvas');
|
| 33 |
|
| 34 |
function log(msg) {
|
| 35 |
const ts = new Date().toISOString().split('T')[1].replace('Z','');
|
|
|
|
| 37 |
LOG_EL.scrollTop = LOG_EL.scrollHeight;
|
| 38 |
}
|
| 39 |
|
| 40 |
+
function showStatus(message, type = 'info') {
|
| 41 |
+
STATUS_DIV.innerHTML = `<div class="status ${type}">${message}</div>`;
|
| 42 |
+
setTimeout(() => STATUS_DIV.innerHTML = '', 5000);
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
function wsURL(path) {
|
| 46 |
const proto = (location.protocol === 'https:') ? 'wss:' : 'ws:';
|
| 47 |
return `${proto}//${location.host}${path}`;
|
| 48 |
}
|
| 49 |
|
| 50 |
+
// Initialize AI Pipeline
|
| 51 |
+
async function initializePipeline() {
|
| 52 |
+
INIT_BTN.disabled = true;
|
| 53 |
+
INIT_BTN.textContent = 'Initializing...';
|
| 54 |
+
|
| 55 |
+
try {
|
| 56 |
+
log('Initializing AI pipeline...');
|
| 57 |
+
const response = await fetch('/initialize', { method: 'POST' });
|
| 58 |
+
const result = await response.json();
|
| 59 |
+
|
| 60 |
+
if (result.status === 'success' || result.status === 'already_initialized') {
|
| 61 |
+
pipelineInitialized = true;
|
| 62 |
+
showStatus('AI pipeline initialized successfully!', 'success');
|
| 63 |
+
log('AI pipeline ready');
|
| 64 |
+
|
| 65 |
+
// Enable controls
|
| 66 |
+
START_BTN.disabled = false;
|
| 67 |
+
REFERENCE_INPUT.disabled = false;
|
| 68 |
+
|
| 69 |
+
// Start metrics updates
|
| 70 |
+
startMetricsUpdates();
|
| 71 |
+
} else {
|
| 72 |
+
showStatus(`Initialization failed: ${result.message}`, 'error');
|
| 73 |
+
log(`Pipeline init failed: ${result.message}`);
|
| 74 |
+
}
|
| 75 |
+
} catch (error) {
|
| 76 |
+
showStatus(`Initialization error: ${error.message}`, 'error');
|
| 77 |
+
log(`Init error: ${error}`);
|
| 78 |
+
} finally {
|
| 79 |
+
INIT_BTN.disabled = false;
|
| 80 |
+
INIT_BTN.textContent = 'Initialize AI Pipeline';
|
| 81 |
+
}
|
| 82 |
+
}
|
| 83 |
+
|
| 84 |
+
// Handle reference image upload
|
| 85 |
+
async function handleReferenceUpload(event) {
|
| 86 |
+
const file = event.target.files[0];
|
| 87 |
+
if (!file) return;
|
| 88 |
+
|
| 89 |
+
log('Uploading reference image...');
|
| 90 |
+
|
| 91 |
+
try {
|
| 92 |
+
const formData = new FormData();
|
| 93 |
+
formData.append('file', file);
|
| 94 |
+
|
| 95 |
+
const response = await fetch('/set_reference', {
|
| 96 |
+
method: 'POST',
|
| 97 |
+
body: formData
|
| 98 |
+
});
|
| 99 |
+
|
| 100 |
+
const result = await response.json();
|
| 101 |
+
|
| 102 |
+
if (result.status === 'success') {
|
| 103 |
+
referenceSet = true;
|
| 104 |
+
showStatus('Reference image set successfully!', 'success');
|
| 105 |
+
log('Reference image configured');
|
| 106 |
+
VIRTUAL_CAM_BTN.disabled = false;
|
| 107 |
+
} else {
|
| 108 |
+
showStatus(`Reference setup failed: ${result.message}`, 'error');
|
| 109 |
+
log(`Reference error: ${result.message}`);
|
| 110 |
+
}
|
| 111 |
+
} catch (error) {
|
| 112 |
+
showStatus(`Upload error: ${error.message}`, 'error');
|
| 113 |
+
log(`Reference upload error: ${error}`);
|
| 114 |
+
}
|
| 115 |
+
}
|
| 116 |
+
|
| 117 |
async function setupAudio(stream) {
|
| 118 |
audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
|
| 119 |
if (audioContext.state === 'suspended') {
|
|
|
|
| 124 |
try {
|
| 125 |
await audioContext.audioWorklet.addModule('/static/worklet.js');
|
| 126 |
} catch (e) {
|
| 127 |
+
log('Failed to load worklet.js - audio processing disabled.');
|
| 128 |
console.error(e);
|
| 129 |
return;
|
| 130 |
}
|
| 131 |
|
| 132 |
+
// Enhanced chunk configuration for real-time processing
|
| 133 |
+
const chunkMs = 160; // Keep at 160ms for balance between latency and quality
|
| 134 |
+
const samplesPerChunk = Math.round(audioContext.sampleRate * (chunkMs / 1000));
|
| 135 |
+
|
| 136 |
log(`Audio chunk config: sampleRate=${audioContext.sampleRate}Hz chunkMs=${chunkMs}ms samplesPerChunk=${samplesPerChunk}`);
|
| 137 |
+
|
| 138 |
processorNode = new AudioWorkletNode(audioContext, 'pcm-chunker', {
|
| 139 |
processorOptions: { samplesPerChunk }
|
| 140 |
});
|
|
|
|
| 143 |
// Capture mic
|
| 144 |
const source = audioContext.createMediaStreamSource(stream);
|
| 145 |
source.connect(processorNode);
|
| 146 |
+
|
| 147 |
+
// Keep worklet active
|
| 148 |
const gain = audioContext.createGain();
|
| 149 |
gain.gain.value = 0;
|
| 150 |
processorNode.connect(gain).connect(audioContext.destination);
|
|
|
|
| 151 |
|
| 152 |
processorNode.port.onmessage = (event) => {
|
| 153 |
if (!audioWs || audioWs.readyState !== WebSocket.OPEN) return;
|
|
|
|
| 157 |
|
| 158 |
// Connect playback node
|
| 159 |
playerNode.connect(audioContext.destination);
|
| 160 |
+
log('Audio nodes ready (enhanced for AI processing)');
|
| 161 |
}
|
| 162 |
|
| 163 |
let _rxChunks = 0;
|
|
|
|
| 164 |
function setupAudioWebSocket() {
|
| 165 |
audioWs = new WebSocket(wsURL('/audio'));
|
| 166 |
audioWs.binaryType = 'arraybuffer';
|
| 167 |
+
audioWs.onopen = () => log('Audio WebSocket connected');
|
| 168 |
+
audioWs.onclose = () => log('Audio WebSocket disconnected');
|
| 169 |
+
audioWs.onerror = (e) => log('Audio WebSocket error');
|
| 170 |
audioWs.onmessage = (evt) => {
|
| 171 |
if (!(evt.data instanceof ArrayBuffer)) return;
|
| 172 |
+
|
| 173 |
const src = evt.data;
|
| 174 |
+
const copyBuf = src.slice(0);
|
| 175 |
+
|
| 176 |
+
// Amplitude analysis for voice activity detection
|
| 177 |
const view = new Int16Array(src);
|
| 178 |
let min = 32767, max = -32768;
|
| 179 |
+
for (let i = 0; i < view.length; i++) {
|
| 180 |
+
const v = view[i];
|
| 181 |
+
if (v < min) min = v;
|
| 182 |
+
if (v > max) max = v;
|
| 183 |
+
}
|
| 184 |
+
|
| 185 |
+
// Forward to player
|
| 186 |
if (playerNode) playerNode.port.postMessage(copyBuf, [copyBuf]);
|
| 187 |
+
|
| 188 |
_rxChunks++;
|
| 189 |
+
if ((_rxChunks % 30) === 0) { // Reduced logging frequency
|
| 190 |
+
log(`Audio processed: ${_rxChunks} chunks, amp:[${min},${max}]`);
|
|
|
|
|
|
|
|
|
|
| 191 |
}
|
| 192 |
};
|
| 193 |
}
|
|
|
|
| 198 |
log('No video track found');
|
| 199 |
return;
|
| 200 |
}
|
| 201 |
+
|
| 202 |
const processor = new MediaStreamTrackProcessor({ track });
|
| 203 |
const reader = processor.readable.getReader();
|
| 204 |
|
| 205 |
const canvas = document.createElement('canvas');
|
| 206 |
+
canvas.width = 512; // Increased resolution for AI processing
|
| 207 |
+
canvas.height = 512;
|
| 208 |
const ctx = canvas.getContext('2d');
|
| 209 |
|
| 210 |
async function readLoop() {
|
|
|
|
| 213 |
if (done) return;
|
| 214 |
|
| 215 |
const now = performance.now();
|
| 216 |
+
const elapsed = now - lastVideoSentTs;
|
| 217 |
+
const needSend = elapsed >= videoFrameIntervalMs;
|
| 218 |
|
| 219 |
if (needSend && frame) {
|
| 220 |
try {
|
| 221 |
+
// Draw frame with improved quality
|
| 222 |
if ('displayWidth' in frame && 'displayHeight' in frame) {
|
| 223 |
ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
|
| 224 |
} else {
|
|
|
|
| 225 |
const bmp = await createImageBitmap(frame);
|
| 226 |
ctx.drawImage(bmp, 0, 0, canvas.width, canvas.height);
|
| 227 |
bmp.close && bmp.close();
|
| 228 |
}
|
| 229 |
|
| 230 |
+
// Send to AI pipeline with higher quality
|
| 231 |
await new Promise((res, rej) => {
|
| 232 |
canvas.toBlob((blob) => {
|
| 233 |
if (!blob) return res();
|
|
|
|
| 237 |
}
|
| 238 |
res();
|
| 239 |
}).catch(rej);
|
| 240 |
+
}, 'image/jpeg', 0.8); // Higher quality for AI processing
|
| 241 |
});
|
| 242 |
+
|
| 243 |
lastVideoSentTs = now;
|
| 244 |
} catch (err) {
|
| 245 |
+
log('Video frame processing error');
|
| 246 |
console.error(err);
|
| 247 |
}
|
|
|
|
|
|
|
| 248 |
}
|
| 249 |
|
| 250 |
frame.close && frame.close();
|
|
|
|
| 260 |
function setupVideoWebSocket() {
|
| 261 |
videoWs = new WebSocket(wsURL('/video'));
|
| 262 |
videoWs.binaryType = 'arraybuffer';
|
| 263 |
+
videoWs.onopen = () => log('Video WebSocket connected');
|
| 264 |
+
videoWs.onclose = () => log('Video WebSocket disconnected');
|
| 265 |
+
videoWs.onerror = () => log('Video WebSocket error');
|
| 266 |
videoWs.onmessage = (evt) => {
|
| 267 |
if (!(evt.data instanceof ArrayBuffer)) return;
|
| 268 |
+
|
| 269 |
+
// Display AI-processed video
|
| 270 |
const blob = new Blob([evt.data], { type: 'image/jpeg' });
|
| 271 |
if (remoteImageURL) URL.revokeObjectURL(remoteImageURL);
|
| 272 |
remoteImageURL = URL.createObjectURL(blob);
|
| 273 |
REMOTE_VID_IMG.src = remoteImageURL;
|
| 274 |
+
|
| 275 |
+
// Update virtual camera if enabled
|
| 276 |
+
updateVirtualCamera(evt.data);
|
| 277 |
};
|
| 278 |
}
|
| 279 |
|
| 280 |
+
// Virtual Camera Support
|
| 281 |
+
function updateVirtualCamera(imageData) {
|
| 282 |
+
if (!virtualCameraStream) return;
|
| 283 |
+
|
| 284 |
+
try {
|
| 285 |
+
// Create image from received data
|
| 286 |
+
const blob = new Blob([imageData], { type: 'image/jpeg' });
|
| 287 |
+
const img = new Image();
|
| 288 |
+
|
| 289 |
+
img.onload = () => {
|
| 290 |
+
// Draw to virtual canvas
|
| 291 |
+
const ctx = VIRTUAL_CANVAS.getContext('2d');
|
| 292 |
+
VIRTUAL_CANVAS.width = 512;
|
| 293 |
+
VIRTUAL_CANVAS.height = 512;
|
| 294 |
+
ctx.drawImage(img, 0, 0, 512, 512);
|
| 295 |
+
};
|
| 296 |
+
|
| 297 |
+
img.src = URL.createObjectURL(blob);
|
| 298 |
+
} catch (error) {
|
| 299 |
+
console.error('Virtual camera update error:', error);
|
| 300 |
+
}
|
| 301 |
+
}
|
| 302 |
+
|
| 303 |
+
async function enableVirtualCamera() {
|
| 304 |
+
try {
|
| 305 |
+
if (!VIRTUAL_CANVAS.captureStream) {
|
| 306 |
+
showStatus('Virtual camera not supported in this browser', 'error');
|
| 307 |
+
return;
|
| 308 |
+
}
|
| 309 |
+
|
| 310 |
+
// Create virtual camera stream from canvas
|
| 311 |
+
virtualCameraStream = VIRTUAL_CANVAS.captureStream(30);
|
| 312 |
+
|
| 313 |
+
// Try to create a virtual camera device (browser-dependent)
|
| 314 |
+
if (navigator.mediaDevices.getDisplayMedia) {
|
| 315 |
+
log('Virtual camera enabled - canvas stream ready');
|
| 316 |
+
showStatus('Virtual camera enabled! Use canvas stream in video apps.', 'success');
|
| 317 |
+
VIRTUAL_CAM_BTN.textContent = 'Virtual Camera Active';
|
| 318 |
+
VIRTUAL_CAM_BTN.disabled = true;
|
| 319 |
+
} else {
|
| 320 |
+
showStatus('Virtual camera API not available', 'error');
|
| 321 |
+
}
|
| 322 |
+
} catch (error) {
|
| 323 |
+
showStatus(`Virtual camera error: ${error.message}`, 'error');
|
| 324 |
+
log(`Virtual camera error: ${error}`);
|
| 325 |
+
}
|
| 326 |
+
}
|
| 327 |
+
|
| 328 |
+
// Metrics and Performance Monitoring
|
| 329 |
+
function startMetricsUpdates() {
|
| 330 |
+
if (metricsInterval) clearInterval(metricsInterval);
|
| 331 |
+
|
| 332 |
+
metricsInterval = setInterval(async () => {
|
| 333 |
+
try {
|
| 334 |
+
const response = await fetch('/pipeline_status');
|
| 335 |
+
const data = await response.json();
|
| 336 |
+
|
| 337 |
+
if (data.initialized && data.stats) {
|
| 338 |
+
const stats = data.stats;
|
| 339 |
+
|
| 340 |
+
document.getElementById('fpsValue').textContent = stats.video_fps?.toFixed(1) || '0';
|
| 341 |
+
document.getElementById('latencyValue').textContent =
|
| 342 |
+
Math.round(stats.avg_video_latency_ms || 0) + 'ms';
|
| 343 |
+
document.getElementById('gpuValue').textContent =
|
| 344 |
+
stats.gpu_memory_used?.toFixed(1) + 'GB' || 'N/A';
|
| 345 |
+
document.getElementById('statusValue').textContent =
|
| 346 |
+
stats.models_loaded ? 'Active' : 'Loading';
|
| 347 |
+
}
|
| 348 |
+
} catch (error) {
|
| 349 |
+
console.error('Metrics update error:', error);
|
| 350 |
+
}
|
| 351 |
+
}, 2000); // Update every 2 seconds
|
| 352 |
+
}
|
| 353 |
+
|
| 354 |
async function start() {
|
| 355 |
+
if (!pipelineInitialized) {
|
| 356 |
+
showStatus('Please initialize the AI pipeline first', 'error');
|
| 357 |
+
return;
|
| 358 |
+
}
|
| 359 |
+
|
| 360 |
START_BTN.disabled = true;
|
| 361 |
+
START_BTN.textContent = 'Starting...';
|
| 362 |
+
|
| 363 |
+
log('Requesting media access...');
|
| 364 |
+
|
| 365 |
try {
|
| 366 |
+
const stream = await navigator.mediaDevices.getUserMedia({
|
| 367 |
+
audio: true,
|
| 368 |
+
video: {
|
| 369 |
+
width: 640,
|
| 370 |
+
height: 480,
|
| 371 |
+
frameRate: 30
|
| 372 |
+
}
|
| 373 |
+
});
|
| 374 |
+
|
| 375 |
+
LOCAL_VID.srcObject = stream;
|
| 376 |
+
log('Media access granted');
|
| 377 |
+
|
| 378 |
+
// Setup WebSocket connections
|
| 379 |
+
setupAudioWebSocket();
|
| 380 |
+
setupVideoWebSocket();
|
| 381 |
+
|
| 382 |
+
// Setup audio and video processing
|
| 383 |
+
await setupAudio(stream);
|
| 384 |
+
await setupVideo(stream);
|
| 385 |
+
|
| 386 |
+
isRunning = true;
|
| 387 |
+
START_BTN.style.display = 'none';
|
| 388 |
+
STOP_BTN.disabled = false;
|
| 389 |
+
STOP_BTN.style.display = 'inline-block';
|
| 390 |
+
|
| 391 |
+
log(`Real-time AI avatar started: ${videoMaxFps} fps, 160ms audio chunks`);
|
| 392 |
+
showStatus('AI Avatar system is now running!', 'success');
|
| 393 |
+
|
| 394 |
+
} catch (error) {
|
| 395 |
+
showStatus(`Media access failed: ${error.message}`, 'error');
|
| 396 |
+
log(`getUserMedia failed: ${error}`);
|
| 397 |
START_BTN.disabled = false;
|
| 398 |
+
START_BTN.textContent = 'Start Capture';
|
| 399 |
}
|
| 400 |
+
}
|
|
|
|
| 401 |
|
| 402 |
+
function stop() {
|
| 403 |
+
log('Stopping AI avatar system...');
|
| 404 |
+
|
| 405 |
+
// Close WebSocket connections
|
| 406 |
+
if (audioWs) {
|
| 407 |
+
audioWs.close();
|
| 408 |
+
audioWs = null;
|
| 409 |
+
}
|
| 410 |
+
if (videoWs) {
|
| 411 |
+
videoWs.close();
|
| 412 |
+
videoWs = null;
|
| 413 |
+
}
|
| 414 |
+
|
| 415 |
+
// Stop media tracks
|
| 416 |
+
if (LOCAL_VID.srcObject) {
|
| 417 |
+
LOCAL_VID.srcObject.getTracks().forEach(track => track.stop());
|
| 418 |
+
LOCAL_VID.srcObject = null;
|
| 419 |
+
}
|
| 420 |
+
|
| 421 |
+
// Reset audio context
|
| 422 |
+
if (audioContext) {
|
| 423 |
+
audioContext.close();
|
| 424 |
+
audioContext = null;
|
| 425 |
+
}
|
| 426 |
+
|
| 427 |
+
// Reset UI
|
| 428 |
+
isRunning = false;
|
| 429 |
+
START_BTN.disabled = false;
|
| 430 |
+
START_BTN.textContent = 'Start Capture';
|
| 431 |
+
START_BTN.style.display = 'inline-block';
|
| 432 |
+
STOP_BTN.disabled = true;
|
| 433 |
+
STOP_BTN.style.display = 'none';
|
| 434 |
+
|
| 435 |
+
log('System stopped');
|
| 436 |
+
showStatus('AI Avatar system stopped', 'info');
|
| 437 |
}
|
| 438 |
|
| 439 |
+
// Event Listeners
|
| 440 |
+
INIT_BTN.addEventListener('click', initializePipeline);
|
| 441 |
START_BTN.addEventListener('click', start);
|
| 442 |
+
STOP_BTN.addEventListener('click', stop);
|
| 443 |
+
REFERENCE_INPUT.addEventListener('change', handleReferenceUpload);
|
| 444 |
+
VIRTUAL_CAM_BTN.addEventListener('click', enableVirtualCamera);
|
| 445 |
|
| 446 |
+
// Debug functions
|
| 447 |
function testTone(seconds = 1, freq = 440) {
|
| 448 |
+
if (!audioContext || !playerNode) {
|
| 449 |
+
log('testTone: audio not ready');
|
| 450 |
+
return;
|
| 451 |
+
}
|
| 452 |
+
|
| 453 |
const sampleRate = audioContext.sampleRate;
|
| 454 |
const total = Math.floor(sampleRate * seconds);
|
| 455 |
const int16 = new Int16Array(total);
|
| 456 |
+
|
| 457 |
+
for (let i = 0; i < total; i++) {
|
| 458 |
const s = Math.sin(2 * Math.PI * freq * (i / sampleRate));
|
| 459 |
int16[i] = s * 32767;
|
| 460 |
}
|
| 461 |
+
|
| 462 |
const chunk = Math.floor(sampleRate * 0.25);
|
| 463 |
for (let off = 0; off < int16.length; off += chunk) {
|
| 464 |
const view = int16.subarray(off, Math.min(off + chunk, int16.length));
|
|
|
|
| 465 |
const copy = new Int16Array(view.length);
|
| 466 |
copy.set(view);
|
| 467 |
playerNode.port.postMessage(copy.buffer, [copy.buffer]);
|
| 468 |
}
|
| 469 |
+
|
| 470 |
+
log(`Test tone ${freq}Hz for ${seconds}s injected`);
|
| 471 |
}
|
| 472 |
|
| 473 |
+
// Global API for debugging
|
| 474 |
+
window.__mirage = {
|
| 475 |
+
start,
|
| 476 |
+
stop,
|
| 477 |
+
initializePipeline,
|
| 478 |
+
audioWs: () => audioWs,
|
| 479 |
+
videoWs: () => videoWs,
|
| 480 |
+
testTone,
|
| 481 |
+
pipelineInitialized: () => pipelineInitialized,
|
| 482 |
+
referenceSet: () => referenceSet
|
| 483 |
+
};
|
| 484 |
+
|
| 485 |
+
// Auto-initialize on load for development
|
| 486 |
+
log('Mirage Real-time AI Avatar System loaded');
|
| 487 |
+
log('Click "Initialize AI Pipeline" to begin setup');
|
static/index.html
CHANGED
|
@@ -2,22 +2,171 @@
|
|
| 2 |
<html lang="en">
|
| 3 |
<head>
|
| 4 |
<meta charset="UTF-8" />
|
| 5 |
-
<title>Mirage
|
| 6 |
<meta name="viewport" content="width=device-width,initial-scale=1" />
|
| 7 |
<style>
|
| 8 |
-
|
| 9 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
</style>
|
| 11 |
</head>
|
| 12 |
<body>
|
| 13 |
-
<
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
</div>
|
| 19 |
-
<audio id="remoteAudio" autoplay></audio>
|
| 20 |
-
<div id="log"></div>
|
| 21 |
-
<script src="/static/app.js"></script>
|
| 22 |
</body>
|
| 23 |
</html>
|
|
|
|
| 2 |
<html lang="en">
|
| 3 |
<head>
|
| 4 |
<meta charset="UTF-8" />
|
| 5 |
+
<title>Mirage Real-time AI Avatar</title>
|
| 6 |
<meta name="viewport" content="width=device-width,initial-scale=1" />
|
| 7 |
<style>
|
| 8 |
+
body {
|
| 9 |
+
font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica,Arial,sans-serif;
|
| 10 |
+
margin: 20px;
|
| 11 |
+
background: #1a1a1a;
|
| 12 |
+
color: #fff;
|
| 13 |
+
}
|
| 14 |
+
.container { max-width: 1200px; margin: 0 auto; }
|
| 15 |
+
.header { text-align: center; margin-bottom: 30px; }
|
| 16 |
+
.controls {
|
| 17 |
+
display: flex;
|
| 18 |
+
gap: 10px;
|
| 19 |
+
margin-bottom: 20px;
|
| 20 |
+
flex-wrap: wrap;
|
| 21 |
+
align-items: center;
|
| 22 |
+
}
|
| 23 |
+
.video-container {
|
| 24 |
+
display: flex;
|
| 25 |
+
gap: 20px;
|
| 26 |
+
margin-bottom: 20px;
|
| 27 |
+
flex-wrap: wrap;
|
| 28 |
+
}
|
| 29 |
+
.video-box {
|
| 30 |
+
flex: 1;
|
| 31 |
+
min-width: 300px;
|
| 32 |
+
background: #2a2a2a;
|
| 33 |
+
border-radius: 8px;
|
| 34 |
+
padding: 15px;
|
| 35 |
+
}
|
| 36 |
+
video, img, canvas {
|
| 37 |
+
width: 100%;
|
| 38 |
+
max-width: 400px;
|
| 39 |
+
border-radius: 8px;
|
| 40 |
+
background: #000;
|
| 41 |
+
}
|
| 42 |
+
button {
|
| 43 |
+
background: #007bff;
|
| 44 |
+
color: white;
|
| 45 |
+
border: none;
|
| 46 |
+
padding: 10px 16px;
|
| 47 |
+
border-radius: 5px;
|
| 48 |
+
cursor: pointer;
|
| 49 |
+
font-size: 14px;
|
| 50 |
+
}
|
| 51 |
+
button:hover { background: #0056b3; }
|
| 52 |
+
button:disabled {
|
| 53 |
+
background: #6c757d;
|
| 54 |
+
cursor: not-allowed;
|
| 55 |
+
}
|
| 56 |
+
.status {
|
| 57 |
+
padding: 10px;
|
| 58 |
+
border-radius: 5px;
|
| 59 |
+
margin: 10px 0;
|
| 60 |
+
}
|
| 61 |
+
.status.success { background: #28a745; }
|
| 62 |
+
.status.error { background: #dc3545; }
|
| 63 |
+
.status.info { background: #17a2b8; }
|
| 64 |
+
#log {
|
| 65 |
+
font: 11px/1.3 monospace;
|
| 66 |
+
white-space: pre-line;
|
| 67 |
+
background: #000;
|
| 68 |
+
padding: 15px;
|
| 69 |
+
border-radius: 8px;
|
| 70 |
+
height: 200px;
|
| 71 |
+
overflow-y: auto;
|
| 72 |
+
color: #0f0;
|
| 73 |
+
}
|
| 74 |
+
.metrics {
|
| 75 |
+
display: grid;
|
| 76 |
+
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
| 77 |
+
gap: 15px;
|
| 78 |
+
margin: 20px 0;
|
| 79 |
+
}
|
| 80 |
+
.metric-card {
|
| 81 |
+
background: #2a2a2a;
|
| 82 |
+
padding: 15px;
|
| 83 |
+
border-radius: 8px;
|
| 84 |
+
border-left: 4px solid #007bff;
|
| 85 |
+
}
|
| 86 |
+
.metric-value {
|
| 87 |
+
font-size: 24px;
|
| 88 |
+
font-weight: bold;
|
| 89 |
+
color: #007bff;
|
| 90 |
+
}
|
| 91 |
+
.metric-label {
|
| 92 |
+
font-size: 12px;
|
| 93 |
+
color: #888;
|
| 94 |
+
text-transform: uppercase;
|
| 95 |
+
}
|
| 96 |
+
input[type="file"] {
|
| 97 |
+
margin: 10px 0;
|
| 98 |
+
}
|
| 99 |
+
.virtual-camera-info {
|
| 100 |
+
background: #2a2a2a;
|
| 101 |
+
padding: 15px;
|
| 102 |
+
border-radius: 8px;
|
| 103 |
+
margin: 20px 0;
|
| 104 |
+
}
|
| 105 |
</style>
|
| 106 |
</head>
|
| 107 |
<body>
|
| 108 |
+
<div class="container">
|
| 109 |
+
<div class="header">
|
| 110 |
+
<h1>🎭 Mirage Real-time AI Avatar</h1>
|
| 111 |
+
<p>Live face animation and voice conversion with <250ms latency</p>
|
| 112 |
+
</div>
|
| 113 |
+
|
| 114 |
+
<div class="controls">
|
| 115 |
+
<button id="initBtn">Initialize AI Pipeline</button>
|
| 116 |
+
<button id="startBtn" disabled>Start Capture</button>
|
| 117 |
+
<button id="stopBtn" disabled>Stop</button>
|
| 118 |
+
<input type="file" id="referenceInput" accept="image/*" disabled>
|
| 119 |
+
<button id="virtualCamBtn" disabled>Enable Virtual Camera</button>
|
| 120 |
+
</div>
|
| 121 |
+
|
| 122 |
+
<div id="statusDiv"></div>
|
| 123 |
+
|
| 124 |
+
<div class="metrics" id="metrics">
|
| 125 |
+
<div class="metric-card">
|
| 126 |
+
<div class="metric-value" id="fpsValue">0</div>
|
| 127 |
+
<div class="metric-label">Video FPS</div>
|
| 128 |
+
</div>
|
| 129 |
+
<div class="metric-card">
|
| 130 |
+
<div class="metric-value" id="latencyValue">0ms</div>
|
| 131 |
+
<div class="metric-label">Avg Latency</div>
|
| 132 |
+
</div>
|
| 133 |
+
<div class="metric-card">
|
| 134 |
+
<div class="metric-value" id="gpuValue">N/A</div>
|
| 135 |
+
<div class="metric-label">GPU Memory</div>
|
| 136 |
+
</div>
|
| 137 |
+
<div class="metric-card">
|
| 138 |
+
<div class="metric-value" id="statusValue">Idle</div>
|
| 139 |
+
<div class="metric-label">Pipeline Status</div>
|
| 140 |
+
</div>
|
| 141 |
+
</div>
|
| 142 |
+
|
| 143 |
+
<div class="video-container">
|
| 144 |
+
<div class="video-box">
|
| 145 |
+
<h3>📹 Local Camera</h3>
|
| 146 |
+
<video id="localVid" autoplay muted playsinline></video>
|
| 147 |
+
</div>
|
| 148 |
+
<div class="video-box">
|
| 149 |
+
<h3>🤖 AI Avatar Output</h3>
|
| 150 |
+
<img id="remoteVid" alt="AI avatar output" />
|
| 151 |
+
<canvas id="virtualCanvas" style="display: none;"></canvas>
|
| 152 |
+
</div>
|
| 153 |
+
</div>
|
| 154 |
+
|
| 155 |
+
<div class="virtual-camera-info">
|
| 156 |
+
<h3>📺 Virtual Camera Integration</h3>
|
| 157 |
+
<p>The AI avatar output can be used as a virtual camera in:</p>
|
| 158 |
+
<ul>
|
| 159 |
+
<li>🎥 Zoom, Google Meet, Microsoft Teams</li>
|
| 160 |
+
<li>💬 Discord, Slack, WhatsApp Desktop</li>
|
| 161 |
+
<li>📱 OBS Studio, Streamlabs</li>
|
| 162 |
+
</ul>
|
| 163 |
+
<p><strong>Setup:</strong> Enable virtual camera, then select "Mirage Virtual Camera" in your video app settings.</p>
|
| 164 |
+
</div>
|
| 165 |
+
|
| 166 |
+
<audio id="remoteAudio" autoplay></audio>
|
| 167 |
+
<div id="log"></div>
|
| 168 |
+
|
| 169 |
+
<script src="/static/app.js"></script>
|
| 170 |
</div>
|
|
|
|
|
|
|
|
|
|
| 171 |
</body>
|
| 172 |
</html>
|
virtual_camera.py
ADDED
|
@@ -0,0 +1,306 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Virtual Camera Integration
|
| 3 |
+
Enables AI avatar output to be used as virtual camera in third-party apps
|
| 4 |
+
"""
|
| 5 |
+
import os
|
| 6 |
+
import sys
|
| 7 |
+
import numpy as np
|
| 8 |
+
import cv2
|
| 9 |
+
import threading
|
| 10 |
+
import time
|
| 11 |
+
import logging
|
| 12 |
+
from pathlib import Path
|
| 13 |
+
from typing import Optional, Callable
|
| 14 |
+
import subprocess
|
| 15 |
+
import platform
|
| 16 |
+
|
| 17 |
+
logger = logging.getLogger(__name__)
|
| 18 |
+
|
| 19 |
+
class VirtualCamera:
|
| 20 |
+
"""Virtual camera device for streaming AI avatar output"""
|
| 21 |
+
|
| 22 |
+
def __init__(self, width: int = 640, height: int = 480, fps: int = 30):
|
| 23 |
+
self.width = width
|
| 24 |
+
self.height = height
|
| 25 |
+
self.fps = fps
|
| 26 |
+
self.frame_interval = 1.0 / fps
|
| 27 |
+
|
| 28 |
+
self.device_path = None
|
| 29 |
+
self.process = None
|
| 30 |
+
self.is_running = False
|
| 31 |
+
self.current_frame = None
|
| 32 |
+
self.frame_lock = threading.Lock()
|
| 33 |
+
|
| 34 |
+
# Platform-specific setup
|
| 35 |
+
self.platform = platform.system().lower()
|
| 36 |
+
self._setup_platform()
|
| 37 |
+
|
| 38 |
+
def _setup_platform(self):
|
| 39 |
+
"""Setup platform-specific virtual camera"""
|
| 40 |
+
if self.platform == "darwin": # macOS
|
| 41 |
+
self._setup_macos()
|
| 42 |
+
elif self.platform == "linux":
|
| 43 |
+
self._setup_linux()
|
| 44 |
+
elif self.platform == "windows":
|
| 45 |
+
self._setup_windows()
|
| 46 |
+
else:
|
| 47 |
+
logger.warning(f"Virtual camera not supported on {self.platform}")
|
| 48 |
+
|
| 49 |
+
def _setup_macos(self):
|
| 50 |
+
"""Setup virtual camera on macOS"""
|
| 51 |
+
try:
|
| 52 |
+
# Check if obs-mac-virtualcam is available
|
| 53 |
+
result = subprocess.run(['which', 'obs'], capture_output=True, text=True)
|
| 54 |
+
if result.returncode == 0:
|
| 55 |
+
logger.info("OBS Virtual Camera detected on macOS")
|
| 56 |
+
self.device_path = "/dev/obs-virtualcam"
|
| 57 |
+
else:
|
| 58 |
+
logger.warning("OBS Virtual Camera not found on macOS")
|
| 59 |
+
except Exception as e:
|
| 60 |
+
logger.error(f"macOS virtual camera setup error: {e}")
|
| 61 |
+
|
| 62 |
+
def _setup_linux(self):
|
| 63 |
+
"""Setup virtual camera on Linux using v4l2loopback"""
|
| 64 |
+
try:
|
| 65 |
+
# Check if v4l2loopback is available
|
| 66 |
+
result = subprocess.run(['lsmod'], capture_output=True, text=True)
|
| 67 |
+
if 'v4l2loopback' in result.stdout:
|
| 68 |
+
# Find available loopback device
|
| 69 |
+
for i in range(10):
|
| 70 |
+
device = f"/dev/video{i}"
|
| 71 |
+
if os.path.exists(device):
|
| 72 |
+
try:
|
| 73 |
+
# Test if device is writable
|
| 74 |
+
with open(device, 'wb') as f:
|
| 75 |
+
self.device_path = device
|
| 76 |
+
logger.info(f"Found v4l2loopback device: {device}")
|
| 77 |
+
break
|
| 78 |
+
except PermissionError:
|
| 79 |
+
continue
|
| 80 |
+
else:
|
| 81 |
+
logger.warning("v4l2loopback not loaded. Install with: sudo modprobe v4l2loopback")
|
| 82 |
+
except Exception as e:
|
| 83 |
+
logger.error(f"Linux virtual camera setup error: {e}")
|
| 84 |
+
|
| 85 |
+
def _setup_windows(self):
|
| 86 |
+
"""Setup virtual camera on Windows using OBS Virtual Camera"""
|
| 87 |
+
try:
|
| 88 |
+
# Check for OBS Virtual Camera
|
| 89 |
+
obs_paths = [
|
| 90 |
+
r"C:\Program Files\obs-studio\bin\64bit\obs64.exe",
|
| 91 |
+
r"C:\Program Files (x86)\obs-studio\bin\32bit\obs32.exe"
|
| 92 |
+
]
|
| 93 |
+
|
| 94 |
+
for path in obs_paths:
|
| 95 |
+
if os.path.exists(path):
|
| 96 |
+
logger.info("OBS Virtual Camera available on Windows")
|
| 97 |
+
self.device_path = "obs-virtualcam"
|
| 98 |
+
return
|
| 99 |
+
|
| 100 |
+
logger.warning("OBS Virtual Camera not found on Windows")
|
| 101 |
+
except Exception as e:
|
| 102 |
+
logger.error(f"Windows virtual camera setup error: {e}")
|
| 103 |
+
|
| 104 |
+
def start(self) -> bool:
|
| 105 |
+
"""Start the virtual camera"""
|
| 106 |
+
if self.is_running:
|
| 107 |
+
logger.warning("Virtual camera already running")
|
| 108 |
+
return True
|
| 109 |
+
|
| 110 |
+
if not self.device_path:
|
| 111 |
+
logger.error("No virtual camera device available")
|
| 112 |
+
return False
|
| 113 |
+
|
| 114 |
+
try:
|
| 115 |
+
if self.platform == "linux" and self.device_path.startswith("/dev/video"):
|
| 116 |
+
# Use FFmpeg for Linux v4l2loopback
|
| 117 |
+
cmd = [
|
| 118 |
+
'ffmpeg',
|
| 119 |
+
'-f', 'rawvideo',
|
| 120 |
+
'-pixel_format', 'bgr24',
|
| 121 |
+
'-video_size', f'{self.width}x{self.height}',
|
| 122 |
+
'-framerate', str(self.fps),
|
| 123 |
+
'-i', 'pipe:0',
|
| 124 |
+
'-f', 'v4l2',
|
| 125 |
+
'-pix_fmt', 'yuv420p',
|
| 126 |
+
self.device_path,
|
| 127 |
+
'-y'
|
| 128 |
+
]
|
| 129 |
+
|
| 130 |
+
self.process = subprocess.Popen(
|
| 131 |
+
cmd,
|
| 132 |
+
stdin=subprocess.PIPE,
|
| 133 |
+
stdout=subprocess.DEVNULL,
|
| 134 |
+
stderr=subprocess.DEVNULL
|
| 135 |
+
)
|
| 136 |
+
|
| 137 |
+
self.is_running = True
|
| 138 |
+
logger.info(f"Virtual camera started on {self.device_path}")
|
| 139 |
+
return True
|
| 140 |
+
|
| 141 |
+
elif self.platform == "darwin":
|
| 142 |
+
# For macOS, we'll use a different approach
|
| 143 |
+
logger.info("macOS virtual camera setup complete")
|
| 144 |
+
self.is_running = True
|
| 145 |
+
return True
|
| 146 |
+
|
| 147 |
+
elif self.platform == "windows":
|
| 148 |
+
# For Windows, integrate with OBS Virtual Camera
|
| 149 |
+
logger.info("Windows virtual camera setup complete")
|
| 150 |
+
self.is_running = True
|
| 151 |
+
return True
|
| 152 |
+
|
| 153 |
+
except Exception as e:
|
| 154 |
+
logger.error(f"Failed to start virtual camera: {e}")
|
| 155 |
+
return False
|
| 156 |
+
|
| 157 |
+
return False
|
| 158 |
+
|
| 159 |
+
def stop(self):
|
| 160 |
+
"""Stop the virtual camera"""
|
| 161 |
+
self.is_running = False
|
| 162 |
+
|
| 163 |
+
if self.process:
|
| 164 |
+
try:
|
| 165 |
+
self.process.terminate()
|
| 166 |
+
self.process.wait(timeout=5)
|
| 167 |
+
except subprocess.TimeoutExpired:
|
| 168 |
+
self.process.kill()
|
| 169 |
+
finally:
|
| 170 |
+
self.process = None
|
| 171 |
+
|
| 172 |
+
logger.info("Virtual camera stopped")
|
| 173 |
+
|
| 174 |
+
def update_frame(self, frame: np.ndarray):
|
| 175 |
+
"""Update the current frame to be streamed"""
|
| 176 |
+
with self.frame_lock:
|
| 177 |
+
# Resize frame to virtual camera dimensions
|
| 178 |
+
self.current_frame = cv2.resize(frame, (self.width, self.height))
|
| 179 |
+
|
| 180 |
+
# Send frame to virtual camera if running
|
| 181 |
+
if self.is_running and self.process:
|
| 182 |
+
try:
|
| 183 |
+
frame_data = self.current_frame.tobytes()
|
| 184 |
+
self.process.stdin.write(frame_data)
|
| 185 |
+
self.process.stdin.flush()
|
| 186 |
+
except Exception as e:
|
| 187 |
+
logger.error(f"Failed to write frame: {e}")
|
| 188 |
+
|
| 189 |
+
def get_frame(self) -> Optional[np.ndarray]:
|
| 190 |
+
"""Get the current frame"""
|
| 191 |
+
with self.frame_lock:
|
| 192 |
+
return self.current_frame.copy() if self.current_frame is not None else None
|
| 193 |
+
|
| 194 |
+
class VirtualCameraManager:
|
| 195 |
+
"""Manager for virtual camera instances"""
|
| 196 |
+
|
| 197 |
+
def __init__(self):
|
| 198 |
+
self.cameras = {}
|
| 199 |
+
self.default_camera = None
|
| 200 |
+
|
| 201 |
+
def create_camera(self, name: str = "mirage_avatar", width: int = 640, height: int = 480, fps: int = 30) -> VirtualCamera:
|
| 202 |
+
"""Create a new virtual camera"""
|
| 203 |
+
if name in self.cameras:
|
| 204 |
+
logger.warning(f"Camera {name} already exists")
|
| 205 |
+
return self.cameras[name]
|
| 206 |
+
|
| 207 |
+
camera = VirtualCamera(width, height, fps)
|
| 208 |
+
self.cameras[name] = camera
|
| 209 |
+
|
| 210 |
+
if self.default_camera is None:
|
| 211 |
+
self.default_camera = camera
|
| 212 |
+
|
| 213 |
+
logger.info(f"Created virtual camera: {name}")
|
| 214 |
+
return camera
|
| 215 |
+
|
| 216 |
+
def get_camera(self, name: str = None) -> Optional[VirtualCamera]:
|
| 217 |
+
"""Get a virtual camera by name"""
|
| 218 |
+
if name is None:
|
| 219 |
+
return self.default_camera
|
| 220 |
+
return self.cameras.get(name)
|
| 221 |
+
|
| 222 |
+
def start_camera(self, name: str = None) -> bool:
|
| 223 |
+
"""Start a virtual camera"""
|
| 224 |
+
camera = self.get_camera(name)
|
| 225 |
+
if camera:
|
| 226 |
+
return camera.start()
|
| 227 |
+
return False
|
| 228 |
+
|
| 229 |
+
def stop_camera(self, name: str = None):
|
| 230 |
+
"""Stop a virtual camera"""
|
| 231 |
+
camera = self.get_camera(name)
|
| 232 |
+
if camera:
|
| 233 |
+
camera.stop()
|
| 234 |
+
|
| 235 |
+
def update_frame(self, frame: np.ndarray, name: str = None):
|
| 236 |
+
"""Update frame for a virtual camera"""
|
| 237 |
+
camera = self.get_camera(name)
|
| 238 |
+
if camera:
|
| 239 |
+
camera.update_frame(frame)
|
| 240 |
+
|
| 241 |
+
def stop_all(self):
|
| 242 |
+
"""Stop all virtual cameras"""
|
| 243 |
+
for camera in self.cameras.values():
|
| 244 |
+
camera.stop()
|
| 245 |
+
self.cameras.clear()
|
| 246 |
+
self.default_camera = None
|
| 247 |
+
|
| 248 |
+
# Global manager instance
|
| 249 |
+
_camera_manager = VirtualCameraManager()
|
| 250 |
+
|
| 251 |
+
def get_virtual_camera_manager() -> VirtualCameraManager:
|
| 252 |
+
"""Get the global virtual camera manager"""
|
| 253 |
+
return _camera_manager
|
| 254 |
+
|
| 255 |
+
def install_virtual_camera_dependencies():
|
| 256 |
+
"""Install platform-specific virtual camera dependencies"""
|
| 257 |
+
system = platform.system().lower()
|
| 258 |
+
|
| 259 |
+
if system == "linux":
|
| 260 |
+
print("To enable virtual camera on Linux:")
|
| 261 |
+
print("1. Install v4l2loopback:")
|
| 262 |
+
print(" sudo apt-get install v4l2loopback-dkms")
|
| 263 |
+
print("2. Load the module:")
|
| 264 |
+
print(" sudo modprobe v4l2loopback devices=1 video_nr=10 card_label='Mirage Virtual Camera'")
|
| 265 |
+
print("3. Install FFmpeg:")
|
| 266 |
+
print(" sudo apt-get install ffmpeg")
|
| 267 |
+
|
| 268 |
+
elif system == "darwin":
|
| 269 |
+
print("To enable virtual camera on macOS:")
|
| 270 |
+
print("1. Install OBS Studio with Virtual Camera plugin")
|
| 271 |
+
print("2. Or use other virtual camera software like CamTwist")
|
| 272 |
+
|
| 273 |
+
elif system == "windows":
|
| 274 |
+
print("To enable virtual camera on Windows:")
|
| 275 |
+
print("1. Install OBS Studio")
|
| 276 |
+
print("2. Enable Virtual Camera in OBS Tools menu")
|
| 277 |
+
print("3. Or use other virtual camera software like ManyCam")
|
| 278 |
+
|
| 279 |
+
if __name__ == "__main__":
|
| 280 |
+
# Test virtual camera setup
|
| 281 |
+
install_virtual_camera_dependencies()
|
| 282 |
+
|
| 283 |
+
# Create test camera
|
| 284 |
+
manager = get_virtual_camera_manager()
|
| 285 |
+
camera = manager.create_camera("test")
|
| 286 |
+
|
| 287 |
+
if camera.start():
|
| 288 |
+
print("Virtual camera started successfully!")
|
| 289 |
+
|
| 290 |
+
# Generate test pattern
|
| 291 |
+
test_frame = np.zeros((480, 640, 3), dtype=np.uint8)
|
| 292 |
+
cv2.putText(test_frame, "Mirage AI Avatar", (50, 240),
|
| 293 |
+
cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
|
| 294 |
+
|
| 295 |
+
for i in range(100):
|
| 296 |
+
# Update test pattern
|
| 297 |
+
frame = test_frame.copy()
|
| 298 |
+
cv2.putText(frame, f"Frame {i}", (50, 400),
|
| 299 |
+
cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
|
| 300 |
+
|
| 301 |
+
camera.update_frame(frame)
|
| 302 |
+
time.sleep(0.1)
|
| 303 |
+
|
| 304 |
+
camera.stop()
|
| 305 |
+
else:
|
| 306 |
+
print("Failed to start virtual camera")
|