---
title: Mirage Real-time AI Avatar
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
hardware: a10g-large
python_version: '3.10'
tags:
  - real-time
  - ai-avatar
  - face-swap
  - voice-conversion
  - virtual-camera
short_description: Real-time AI avatar with face swap + voice conversion
---
# Mirage: Real-time AI Avatar System
Mirage performs real-time identity-preserving face swap plus optional facial enhancement and (stub) voice conversion, streaming back a virtual camera + microphone feed with a sub-250ms target latency. It is designed for live calls, streaming overlays, and privacy scenarios where you want a consistent alternate appearance.
## Features
- Real-time Face Swap (InSwapper): Identity transfer from a single reference image to your live video.
- Enhancement (Optional): CodeFormer restoration (fidelity-controllable) if weights are present.
- Low Latency WebRTC: Bi-directional streaming via aiortc (camera + mic) with adaptive frame scaling.
- Voice Conversion Stub: Pluggable path ready for RVC / HuBERT integration (currently pass-through by default).
- Virtual Camera: Output suitable for Zoom, Meet, Discord, OBS (via local virtual camera module).
- Model Auto-Provisioning: Deterministic downloader for required swap + enhancer weights.
- Metrics & Health: JSON endpoints for latency, FPS, GPU memory, and pipeline stats.
## Use Cases
- Video Conferencing Privacy: Appear as a consistent alternate identity.
- Streaming / VTubing: Lightweight swap + enhancement pipeline for overlays.
- A/B Creative Experiments: Rapid prototyping of face identity transforms.
- Data Minimization: Keep original face private while communicating.
## Technology Stack
- Face Detection & Embedding: InsightFace `buffalo_l` (SCRFD + embedding).
- Face Swap Core: `inswapper_128_fp16.onnx` (InSwapper) via the InsightFace model zoo.
- Enhancer (optional): CodeFormer 0.1 (fidelity controllable).
- Backend: FastAPI + aiortc (WebRTC) + asyncio.
- Metrics: custom endpoints (`/metrics`, `/gpu`) with rolling latency/FPS stats.
- Downloader: atomic, lock-protected model fetcher (`model_downloader.py`).
- Frontend: minimal WebRTC client (`static/`).
## Performance Targets
- Processing Window: <50ms typical swap @ 512px (A10G) with a single face.
- End-to-end Latency Goal: <250ms (capture → swap → enhancement → return).
- Adaptive Scale: frames whose longest side exceeds 512px are downscaled before inference.
- Enhancement Overhead: CodeFormer adds roughly 18–35ms (A10G, single face, 512px); adjust fidelity to trade quality vs. latency.
## Quick Start (Hugging Face Space)
- Open the Space UI and allow camera/microphone access.
- Click Initialize to trigger the model download (if not already cached) and pipeline load.
- Upload a clear, front-facing reference image (only the largest face is used).
- Start streaming; swapped frames appear in the preview.
- (Optional) Provide CodeFormer weights (`models/codeformer/codeformer.pth`) for enhancement.
- Use the virtual camera integration locally (if running self-hosted) to broadcast swapped output to Zoom/OBS.
## Technical Details

### Latency Optimization
- Adaptive quality control based on processing time
- Frame buffering with overflow protection
- GPU memory management and cleanup
- Audio-video synchronization within 150ms
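The frame buffering with overflow protection mentioned above can be sketched with an asyncio bounded queue that drops the oldest pending frame when inference falls behind capture; the names and the drop policy here are illustrative, not Mirage's actual implementation:

```python
import asyncio


async def frame_worker(queue: asyncio.Queue, process) -> None:
    """Consume frames and run the (possibly slow) swap on each one."""
    while True:
        frame = await queue.get()
        if frame is None:          # sentinel: shut down the worker
            queue.task_done()
            break
        await process(frame)
        queue.task_done()


def submit_frame(queue: asyncio.Queue, frame) -> None:
    """Enqueue a frame, discarding the oldest pending one on overflow.

    Dropping stale frames keeps end-to-end latency bounded instead of
    letting a backlog grow when inference is slower than capture.
    """
    if queue.full():
        try:
            queue.get_nowait()     # discard the oldest pending frame
            queue.task_done()
        except asyncio.QueueEmpty:
            pass
    queue.put_nowait(frame)
```

With a capacity-2 queue, submitting frames faster than they are processed silently replaces stale frames rather than blocking the capture loop.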
### Model Flow
- Capture frame → optional downscale to <=512 max side
- InsightFace detector+embedding obtains face bboxes + identity vectors
- InSwapper ONNX performs identity replacement using source embedding
- Optional CodeFormer enhancer refines facial region
- Frame returned to WebRTC outbound track
### Real-time Features
- WebRTC (aiortc) low-latency transport.
- Asynchronous frame processing (background tasks) to avoid blocking capture.
- Adaptive pre-inference downscale heuristic (cap largest dimension to 512).
- Metrics-driven latency tracking for dynamic future pacing.
## Virtual Camera Integration
The system creates a virtual camera device that can be used in:
- Video Conferencing: Zoom, Google Meet, Microsoft Teams, Discord
- Streaming Software: OBS Studio, Streamlabs, XSplit
- Social Media: WhatsApp Desktop, Skype, Facebook Messenger
- Gaming: Steam, Discord voice channels
## Metrics & Observability
Key endpoints (base URL: running server root):
| Endpoint | Description |
|---|---|
| `/metrics` | Core video/audio latency & FPS stats |
| `/gpu` | GPU presence + memory usage (torch / nvidia-smi) |
| `/webrtc/ping` | WebRTC router availability & TURN status |
| `/pipeline_status` (if implemented) | High-level pipeline readiness |
Pipeline stats (subset) from swap pipeline:
```json
{
  "frames": 240,
  "avg_latency_ms": 42.7,
  "swap_faces_last": 1,
  "enhanced_frames": 180,
  "enhancer": "codeformer",
  "codeformer_fidelity": 0.75,
  "codeformer_loaded": true
}
```
## Privacy & Security
- No reference image persisted to disk (processed in-memory).
- Only model weights are cached; media frames are transient.
- Optional API key enforcement via `MIRAGE_API_KEY` + `MIRAGE_REQUIRE_API_KEY=1`.
## Environment Variables (Face Swap & Enhancers)
| Variable | Purpose | Default |
|---|---|---|
| `MIRAGE_DOWNLOAD_MODELS` | Auto download required models on startup | `1` |
| `MIRAGE_INSWAPPER_URL` | Override InSwapper ONNX URL | internal default |
| `MIRAGE_CODEFORMER_URL` | Override CodeFormer weight URL | 0.1 release |
| `MIRAGE_CODEFORMER_FIDELITY` | 0.0 = more detail recovery, 1.0 = preserve input | `0.75` |
| `MIRAGE_MAX_FACES` | Swap up to N largest faces per frame | `1` |
| `MIRAGE_CUDA_ONLY` | Restrict ONNX to CUDA EP + CPU fallback | unset |
| `MIRAGE_API_KEY` | Shared secret for control / TURN token | unset |
| `MIRAGE_REQUIRE_API_KEY` | Enforce API key if set | `0` |
| `MIRAGE_TOKEN_TTL` | Signed token lifetime (seconds) | `300` |
| `MIRAGE_STUN_URLS` | Comma-separated list of STUN servers | Google defaults |
| `MIRAGE_TURN_URL` | TURN URI(s) (comma separated) | unset |
| `MIRAGE_TURN_USER` | TURN username | unset |
| `MIRAGE_TURN_PASS` | TURN credential | unset |
| `MIRAGE_FORCE_RELAY` | Force relay-only traffic | `0` |
| `MIRAGE_TURN_TLS_ONLY` | Filter TURN to TLS/TCP | `1` |
| `MIRAGE_PREFER_H264` | Prefer H264 codec in SDP munging | `0` |
| `MIRAGE_VOICE_ENABLE` | Enable voice processor stub | `0` |
| `MIRAGE_PERSIST_MODELS` | Persist models in `/data/mirage_models` via symlink from `/app/models` | `1` |
| `MIRAGE_PROVISION_FRESH` | Force re-download of required models (ignores sentinel) | `0` |
| `MIRAGE_PROC_MAX_DIM` | Max dimension (longest side) for processing downscale | `512` |
| `MIRAGE_DEBUG_OVERLAY` | Draw green bbox + SWAP label on swapped faces | `0` |
| `MIRAGE_SWAP_DEBUG` | Verbose per-frame swap decision logging | `0` |
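The TURN variables above, together with `MIRAGE_TOKEN_TTL`, suggest the server mints short-lived TURN credentials. A common scheme for this is coturn's REST-API-style ephemeral credentials, where the username embeds an expiry timestamp and the password is an HMAC of that username. The sketch below assumes that scheme and a server-side shared secret; it is not taken from Mirage's code:

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(shared_secret: str, user: str = "mirage", ttl: int = 300):
    """Build coturn 'REST API' style ephemeral TURN credentials.

    The username is '<expiry-unix-ts>:<user>' and the password is
    base64(HMAC-SHA1(secret, username)); the TURN server recomputes the
    HMAC and rejects the credential once the timestamp has passed.
    """
    username = f"{int(time.time()) + ttl}:{user}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()
```

The client receives both values over the authenticated control channel and places them in its RTCPeerConnection ICE server config.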
CodeFormer fidelity example:
```bash
MIRAGE_CODEFORMER_FIDELITY=0.6
```
### Processing Resolution & Visual Debug Overlay
Two new controls help you verify that swapping is occurring and tune visual quality vs latency:
| Control | Effect | Guidance |
|---|---|---|
| `MIRAGE_PROC_MAX_DIM` | Caps the longest side of a frame before inference. Frames larger than this are downscaled for detection/swap, then returned at original size. | Raise (e.g. 640, 720) for crisper facial detail if GPU headroom allows; lower (384–512) to reduce latency on weaker GPUs. Minimum enforced is 64. |
| `MIRAGE_DEBUG_OVERLAY` | When enabled (`1`), draws a green rectangle and the text SWAP over each face region that was swapped in the most recent frame. | Use temporarily to confirm active swapping; disable for production to avoid visual artifacts. |
Example (higher detail + overlay for confirmation):

```bash
MIRAGE_PROC_MAX_DIM=640
MIRAGE_DEBUG_OVERLAY=1
```
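For intuition, the overlay can be approximated in pure NumPy by painting a green border around each swapped bounding box. The real implementation presumably uses OpenCV drawing primitives (including the SWAP text); this helper is only an illustrative sketch:

```python
import numpy as np


def draw_debug_box(frame: np.ndarray, bbox, thickness: int = 2) -> np.ndarray:
    """Paint a green rectangle outline (RGB) over a swapped-face bbox.

    bbox is (x1, y1, x2, y2) in pixel coordinates; the frame is modified
    in place and returned for convenience.
    """
    x1, y1, x2, y2 = (int(v) for v in bbox)
    green = (0, 255, 0)
    frame[y1:y1 + thickness, x1:x2] = green      # top edge
    frame[y2 - thickness:y2, x1:x2] = green      # bottom edge
    frame[y1:y2, x1:x1 + thickness] = green      # left edge
    frame[y1:y2, x2 - thickness:x2] = green      # right edge
    return frame
```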
If you still perceive "no change" while counters show swaps:
- Ensure your reference image is a clear, well-lit, frontal face (avoid extreme angles / occlusions).
- Increase `MIRAGE_PROC_MAX_DIM` to 640 or 720 for sharper results.
- Temporarily enable `MIRAGE_DEBUG_OVERLAY=1` to visualize the swapped region.
- Check `/debug/pipeline` for `total_faces_swapped` and `swap_faces_last` > 0.
## Requirements
- GPU: NVIDIA (Ampere+ recommended). CPU-only will be extremely slow.
- VRAM: ~3–4GB baseline (swap + detector) + optional enhancer overhead.
- RAM: 8GB+ (12–16GB recommended for multitasking).
- Browser: Chromium-based / Firefox with WebRTC.
- Reference Image: Clear, frontal, good lighting, minimal occlusions.
## Development / Running Locally
Download models & start server:
```bash
python model_downloader.py   # or set MIRAGE_DOWNLOAD_MODELS=1 and let startup handle it
uvicorn app:app --port 7860 --host 0.0.0.0
```
Open the browser client at http://localhost:7860.
Set a reference image via the UI (Base64 upload path), then begin a WebRTC session. Inspect `/metrics` for swap latency and `/webrtc/debug_state` for connection internals.
## License

MIT License. Feel free to use and modify for your projects!
## Acknowledgments
- InsightFace (detection + swap)
- CodeFormer (fidelity-controllable enhancement)
- Hugging Face (inference infra)
## Metrics Endpoints (Current Subset)
- GET `/metrics`
- GET `/gpu`
- GET `/webrtc/ping`
- GET `/webrtc/debug_state`
- (Legacy endpoints referenced in SPEC may be pruned in future refactors.)
## Voice Stub Activation
Set `MIRAGE_VOICE_ENABLE=1` to route audio through the placeholder voice processor. Current behavior is pass-through while preserving structural hooks for future RVC model integration.
## Future Parameterization
- Frontend will fetch a `/config` endpoint to align `chunk_ms` and `video_max_fps` dynamically.
- Adaptation layer will adjust chunk size and video quality based on runtime ratios.
## Accessing Endpoints on Hugging Face Spaces
When viewing the Space at https://huggingface.co/spaces/Islamckennon/mirage you are on the Hub UI (repository page). API paths appended there (e.g. `/metrics`, `/gpu`) will 404 because that domain serves repo metadata, not your running container.
Your running app is exposed on a separate subdomain (pattern: `https://<username>-<space_name>.hf.space`):

```
https://islamckennon-mirage.hf.space
```

So the full endpoint URLs are, for example:

```
https://islamckennon-mirage.hf.space/metrics
https://islamckennon-mirage.hf.space/gpu
```
If the Space is private you must be logged into Hugging Face in the browser for these to load.
## Troubleshooting "Restarting" Status
If the Space shows a perpetual "Restarting" badge:
- Open the Logs panel and switch to the Container tab (not just Build) to see runtime exceptions.
- Look for the `[startup] { ... }` line. If absent, the app may be crashing before FastAPI starts (syntax error, missing dependency, etc.).
- Ensure the container listens on port 7860 (this repo's Dockerfile already does). The startup log now prints the `port` value it detected.
- GPU provisioning can briefly cycle while allocating hardware; give it a minute after the first restart. If it loops more than 5 times, inspect for CUDA driver errors or `torch` import failures.
- Test locally with `uvicorn app:app --port 7860` to rule out code issues.
- Use `curl -s https://islamckennon-mirage.hf.space/health` (if public) to verify liveness.
If problems persist, capture the Container log stack trace and open an issue.
## Model Auto-Download
`model_downloader.py` manages required weights with atomic file locks. It supports overriding sources via environment variables and gracefully continues if optional enhancers fail to download.
## Persistent Storage Strategy (Hugging Face Spaces)
By default (`MIRAGE_PERSIST_MODELS=1`), the container will:
- Create (if missing) a persistent directory: `/data/mirage_models`.
- Migrate any existing files from an earlier ephemeral `/app/models` (first run only, if the persistent dir is empty).
- Symlink `/app/models` -> `/data/mirage_models`.
- Run integrity checks each startup: if the sentinel `.provisioned` exists but any required model (currently `inswapper/inswapper_128_fp16.onnx`) is missing, the downloader is re-invoked automatically.
Disable persistence with:

```bash
MIRAGE_PERSIST_MODELS=0
```
This forces models to re-download on each cold start (not recommended for production latency / rate-limits).
Sentinel files:
- `.provisioned`: marker indicating a successful prior provisioning.
- `.provisioned_meta.json`: sizes and metadata of required models at provisioning time (informational).
If you set `MIRAGE_PROVISION_FRESH=1`, the sentinel is removed and a full re-download is attempted (useful when updating model versions or clearing partial/corrupt files).
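The migrate-then-symlink behavior described above can be sketched roughly as follows; the helper name and exact paths are illustrative, not Mirage's actual startup code:

```python
import shutil
from pathlib import Path


def ensure_persistent_models(app_dir: Path, data_dir: Path) -> Path:
    """Point app_dir at data_dir via a symlink so weights survive restarts.

    Mirrors the documented flow: create the persistent directory, migrate
    previously downloaded files once (only if the persistent dir is still
    empty), then replace app_dir with a symlink to data_dir.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    if app_dir.is_symlink():
        return app_dir.resolve()        # already provisioned on a prior run
    if app_dir.exists():
        if not any(data_dir.iterdir()):
            # First run with persistence: migrate earlier ephemeral weights.
            for item in app_dir.iterdir():
                shutil.move(str(item), str(data_dir / item.name))
        shutil.rmtree(app_dir)
    app_dir.symlink_to(data_dir, target_is_directory=True)
    return app_dir.resolve()
```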
Troubleshooting missing models:
- Call `/debug/models`: it reports symlink status, sentinel presence, and sizes.
- If `inswapper` is missing but the sentinel is present, integrity logic should trigger re-provisioning automatically. If not, force it with `MIRAGE_PROVISION_FRESH=1`.
Debug current model status:
```bash
curl -s https://<space-subdomain>.hf.space/debug/models | jq
```
Example response:
```json
{
  "inswapper": {"exists": true, "size": 87916544},
  "codeformer": {"exists": true, "size": 178140560},
  "sentinel": {"exists": true, "meta_exists": true},
  "storage": {
    "root_is_symlink": true,
    "root_path": "/app/models",
    "target": "/data/mirage_models",
    "persist_mode_env": "1"
  },
  "pipeline_initialized": false
}
```
## Endpoints Recap
See Metrics Endpoints section above. Typical usage examples:
```bash
curl -s http://localhost:7860/metrics/async | jq
curl -s http://localhost:7860/metrics/pacing | jq '.latency_ema_ms, .pacing_hint'
curl -s http://localhost:7860/metrics/motion | jq '.recent_motion[-5:]'
```
## Pacing Hint Logic
`pacing_hint` is derived from a latency exponential moving average vs. the target frame time:
- `~1.0`: balanced.
- `<0.85`: system overloaded; consider lowering capture FPS or resolution.
- `>1.15`: headroom available; you may increase FPS modestly.
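A minimal sketch of how such a hint could be computed from a latency EMA; the smoothing factor and target FPS are illustrative defaults, not the server's actual constants:

```python
def update_pacing(ema_ms: float, sample_ms: float, target_fps: float = 20.0,
                  alpha: float = 0.1) -> tuple[float, float]:
    """Update the latency EMA and derive a pacing hint.

    hint = target frame time / smoothed latency, so values below 1.0 mean
    frames take longer than the budget (reduce FPS or resolution) and
    values above 1.0 mean headroom.
    """
    ema_ms = (1 - alpha) * ema_ms + alpha * sample_ms
    target_frame_ms = 1000.0 / target_fps
    return ema_ms, target_frame_ms / ema_ms
```

At a 20 FPS target (50ms budget), a sustained 100ms swap latency yields a hint of 0.5, well into the "overloaded" band.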
## Motion Magnitude
Aggregated from per-frame keypoint motion vectors; higher values trigger more frequent face detection to avoid drift. Low motion stretches automatically reduce detection frequency to save compute.
## Enhancer Fidelity (CodeFormer)
Fidelity weight (w):
- Lower (e.g. 0.3–0.5): more aggressive restoration; may alter identity details.
- Higher (0.7–0.9): preserves more of the original swapped structure; less smoothing.

Tune with `MIRAGE_CODEFORMER_FIDELITY`.
## Latency Histogram Snapshots
`/metrics/stage_histogram` exposes periodic snapshots (e.g. every N frames) of the stage latency distribution to help identify tail regressions. Use it to tune pacing thresholds or to decide on model quantization.
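Fixed-bound bucketing is one simple way to build such snapshots; the bounds below are illustrative, not the endpoint's actual buckets:

```python
from bisect import bisect_right


def latency_histogram(samples_ms, bounds=(10, 25, 50, 100, 250)):
    """Bucket per-stage latencies into fixed bounds plus an overflow bin.

    Returns one count per bound, with a final bin for samples above the
    largest bound; tail regressions show up as growth in the last bins.
    """
    counts = [0] * (len(bounds) + 1)
    for s in samples_ms:
        # bisect_right picks the first bound strictly greater than s.
        counts[bisect_right(bounds, s)] += 1
    return counts
```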
## Security Notes
If exposing publicly:
- Set `MIRAGE_API_KEY` and `MIRAGE_REQUIRE_API_KEY=1`.
- Serve behind TLS (a reverse proxy like Caddy / Nginx for certificate management).
- Optionally restrict TURN server usage or enforce relay-only for stricter NAT traversal control.
## Planned Voice Pipeline (Future)
Placeholder directories exist for future real-time voice conversion integration.
```
models/
  hubert/   # HuBERT feature extractor checkpoint(s)
  rmvpe/    # RMVPE pitch extraction weights
  rvc/      # RVC (voice conversion) model checkpoints
```
### Expected File Names & Relative Paths
You can adapt names, but these canonical filenames will be referenced in future code examples:
| Component | Recommended Source | Save As (relative path) |
|---|---|---|
| HuBERT Base | facebook/hubert-base-ls960 (Torch .pt) or official fairseq release | `models/hubert/hubert_base.pt` |
| RMVPE Weights | Community RMVPE release (pitch extraction) | `models/rmvpe/rmvpe.pt` |
| RVC Model Checkpoint | Your trained / downloaded RVC model | `models/rvc/model.pth` |
Optional additional assets (not yet required):
| Type | Path Example |
|---|---|
| Speaker embedding(s) | models/rvc/spk_embeds.npy |
| Index file (faiss) | models/rvc/features.index |
### Manual Download (Lightweight Instructions)
Because licenses vary and some distributions require acceptance, we do not auto-download by default. Manually fetch the files you are licensed to use:
```bash
# HuBERT (example using torch hub or direct URL)
curl -L -o models/hubert/hubert_base.pt \
  https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt

# RMVPE (replace URL with the official/community mirror you trust)
curl -L -o models/rmvpe/rmvpe.pt \
  https://example.com/path/to/rmvpe.pt

# RVC model (place your trained checkpoint)
cp /path/to/your_rvc_model.pth models/rvc/model.pth
```
All of these binary patterns are ignored by git via `.gitignore` (we only keep `.gitkeep` and documentation). Verify after download:
```bash
ls -lh models/hubert models/rmvpe models/rvc
```
### Optional Convenience Script
You can create `scripts/download_models.sh` (not yet included) with the above curl commands; keep URLs commented if redistribution is unclear. Example skeleton:

```bash
#!/usr/bin/env bash
set -euo pipefail
mkdir -p models/hubert models/rmvpe models/rvc
echo "(Add real URLs you are licensed to download)"
# curl -L -o models/hubert/hubert_base.pt <URL>
# curl -L -o models/rmvpe/rmvpe.pt <URL>
```
### Integrity / Size Hints (Approximate)
| File | Typical Size |
|---|---|
| hubert_base.pt | ~360 MB |
| rmvpe.pt | ~90–150 MB (varies) |
| model.pth (RVC) | 50–200+ MB |
Ensure your Space has enough disk (HF GPU Spaces usually allow several GB, but keep total under limits).
### License Notes
Review and comply with each model's license (Fairseq / Facebook AI for HuBERT, RMVPE authors, your own RVC training data constraints). Do not commit weights.
Future code will detect presence and log which components are available at startup.
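Such presence detection could look roughly like this; the minimum sizes come from the table above, and the paths and helper name are purely illustrative, not the future code itself:

```python
from pathlib import Path

# Illustrative expected paths and rough minimum sizes (MB) from the table above.
VOICE_ASSETS = {
    "hubert": ("models/hubert/hubert_base.pt", 300),
    "rmvpe": ("models/rmvpe/rmvpe.pt", 50),
    "rvc": ("models/rvc/model.pth", 30),
}


def detect_voice_components(root: str = ".") -> dict:
    """Report which optional voice-pipeline weights are present and
    plausibly complete (i.e. above a rough minimum size, to catch
    truncated downloads)."""
    status = {}
    for name, (rel_path, min_mb) in VOICE_ASSETS.items():
        p = Path(root) / rel_path
        ok = p.is_file() and p.stat().st_size >= min_mb * 1024 * 1024
        status[name] = {"path": rel_path, "available": ok}
    return status
```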