---
title: Mirage Real-time AI Avatar
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
hardware: a10g-large
python_version: '3.10'
tags:
  - real-time
  - ai-avatar
  - face-swap
  - voice-conversion
  - virtual-camera
short_description: Real-time AI avatar with face swap + voice conversion
---

# 🎭 Mirage: Real-time AI Avatar System

Mirage performs real-time, identity-preserving face swapping plus optional facial enhancement and (stub) voice conversion, streaming the result back as a virtual camera and microphone feed with a sub-250ms target latency. It is designed for live calls, streaming overlays, and privacy scenarios where you want a consistent alternate appearance.

## 🚀 Features

- Real-time Face Swap (InSwapper): Identity transfer from a single reference image to your live video.
- Enhancement (Optional): CodeFormer restoration (fidelity-controllable) if weights are present.
- Low-Latency WebRTC: Bi-directional streaming via aiortc (camera + mic) with adaptive frame scaling.
- Voice Conversion Stub: Pluggable path ready for RVC / HuBERT integration (currently pass-through by default).
- Virtual Camera: Output suitable for Zoom, Meet, Discord, OBS (via the local virtual camera module).
- Model Auto-Provisioning: Deterministic downloader for required swap + enhancer weights.
- Metrics & Health: JSON endpoints for latency, FPS, GPU memory, and pipeline stats.

## 🎯 Use Cases

- Video Conferencing Privacy: Appear as a consistent alternate identity.
- Streaming / VTubing: Lightweight swap + enhancement pipeline for overlays.
- A/B Creative Experiments: Rapid prototyping of face identity transforms.
- Data Minimization: Keep your original face private while communicating.

πŸ› οΈ Technology Stack

- Face Detection & Embedding: InsightFace `buffalo_l` (SCRFD + embedding).
- Face Swap Core: `inswapper_128_fp16.onnx` (InSwapper) via the InsightFace model zoo.
- Enhancer (optional): CodeFormer 0.1 (fidelity-controllable).
- Backend: FastAPI + aiortc (WebRTC) + asyncio.
- Metrics: Custom endpoints (`/metrics`, `/gpu`) with rolling latency/FPS stats.
- Downloader: Atomic, lock-protected model fetcher (`model_downloader.py`).
- Frontend: Minimal WebRTC client (`static/`).

## 📊 Performance Targets

- Processing Window: <50ms typical swap @ 512px (A10G) with a single face.
- End-to-end Latency Goal: <250ms (capture → swap → enhancement → return).
- Adaptive Scale: Frames whose longest side exceeds 512px are downscaled before inference (see the sketch below).
- Enhancement Overhead: CodeFormer ~18–35ms (A10G, single face, 512px), approximate; adjust fidelity to trade quality against latency.
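
The downscale heuristic is straightforward to reproduce. A minimal sketch, assuming OpenCV and the `MIRAGE_PROC_MAX_DIM` cap documented below (the helper name is illustrative, not the codebase's):

```python
import os

import cv2
import numpy as np

# Cap the longest side at MIRAGE_PROC_MAX_DIM before inference, remembering
# the scale so results can be mapped back to the original resolution.
MAX_DIM = max(64, int(os.getenv("MIRAGE_PROC_MAX_DIM", "512")))

def downscale_for_inference(frame: np.ndarray) -> tuple[np.ndarray, float]:
    h, w = frame.shape[:2]
    longest = max(h, w)
    if longest <= MAX_DIM:
        return frame, 1.0
    scale = MAX_DIM / longest
    resized = cv2.resize(frame, (int(w * scale), int(h * scale)),
                         interpolation=cv2.INTER_AREA)
    return resized, scale
```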

## 🚀 Quick Start (Hugging Face Space)

1. Open the Space UI and allow camera/microphone access.
2. Click Initialize to trigger the model download (if not already cached) and pipeline load.
3. Upload a clear, front-facing reference image (only the largest face is used).
4. Start streaming; swapped frames appear in the preview.
5. (Optional) Provide CodeFormer weights (`models/codeformer/codeformer.pth`) for enhancement.
6. Use the virtual camera integration locally (if running self-hosted) to broadcast the swapped output to Zoom/OBS.

## 🔧 Technical Details

### Latency Optimization

- Adaptive quality control based on processing time
- Frame buffering with overflow protection
- GPU memory management and cleanup
- Audio-video synchronization within 150ms

### Model Flow

1. Capture frame → optional downscale to <=512 max side
2. InsightFace detector + embedding obtains face bboxes + identity vectors
3. InSwapper ONNX performs identity replacement using the source embedding
4. Optional CodeFormer enhancer refines the facial region
5. Frame returned to the WebRTC outbound track (a minimal sketch of steps 2-4 follows)
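
For orientation, here is a minimal offline sketch of steps 2-4 using the public InsightFace API (file paths and the largest-face selection are assumptions; the live pipeline adds the downscale step and asynchronous scheduling):

```python
import cv2
import insightface
from insightface.app import FaceAnalysis

# Detector + embedding (buffalo_l bundles SCRFD detection and identity embedding).
analyzer = FaceAnalysis(name="buffalo_l")
analyzer.prepare(ctx_id=0, det_size=(640, 640))

# InSwapper ONNX model (path assumed to match the downloader layout).
swapper = insightface.model_zoo.get_model(
    "models/inswapper/inswapper_128_fp16.onnx")

# Pick the widest detected face in the reference (assumes one is found).
ref = cv2.imread("reference.jpg")
source_face = max(analyzer.get(ref), key=lambda f: f.bbox[2] - f.bbox[0])

frame = cv2.imread("frame.jpg")
for target_face in analyzer.get(frame):
    # paste_back=True blends the swapped region back into the full frame.
    frame = swapper.get(frame, target_face, source_face, paste_back=True)

cv2.imwrite("swapped.jpg", frame)
```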

### Real-time Features

- WebRTC (aiortc) low-latency transport.
- Asynchronous frame processing (background tasks) to avoid blocking capture.
- Adaptive pre-inference downscale heuristic (caps the largest dimension at 512).
- Metrics-driven latency tracking for dynamic future pacing.

## 📱 Virtual Camera Integration

The system creates a virtual camera device that can be used in:

- Video Conferencing: Zoom, Google Meet, Microsoft Teams, Discord
- Streaming Software: OBS Studio, Streamlabs, XSplit
- Social Media: WhatsApp Desktop, Skype, Facebook Messenger
- Gaming: Steam, Discord voice channels

## ⚡ Metrics & Observability

Key endpoints (relative to the running server root):

| Endpoint | Description |
| --- | --- |
| `/metrics` | Core video/audio latency & FPS stats |
| `/gpu` | GPU presence + memory usage (torch / nvidia-smi) |
| `/webrtc/ping` | WebRTC router availability & TURN status |
| `/pipeline_status` | (If implemented) High-level pipeline readiness |

Example pipeline stats (subset) from the swap pipeline:

```json
{
    "frames": 240,
    "avg_latency_ms": 42.7,
    "swap_faces_last": 1,
    "enhanced_frames": 180,
    "enhancer": "codeformer",
    "codeformer_fidelity": 0.75,
    "codeformer_loaded": true
}
```
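
For quick monitoring from a script, the endpoint can be polled directly. A small sketch, assuming the `/metrics` response is JSON shaped like the sample above (adjust the base URL for your deployment):

```python
import time

import requests

BASE = "http://localhost:7860"  # or your *.hf.space subdomain

while True:
    stats = requests.get(f"{BASE}/metrics", timeout=5).json()
    # Field names mirror the sample payload above; treat them as illustrative.
    print(f"frames={stats.get('frames')} "
          f"avg_latency_ms={stats.get('avg_latency_ms')}")
    time.sleep(2)
```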

## 🔒 Privacy & Security

- No reference image persisted to disk (processed in-memory).
- Only model weights are cached; media frames are transient.
- Optional API key enforcement via `MIRAGE_API_KEY` + `MIRAGE_REQUIRE_API_KEY=1`.

## 🔧 Environment Variables (Face Swap & Enhancers)

| Variable | Purpose | Default |
| --- | --- | --- |
| `MIRAGE_DOWNLOAD_MODELS` | Auto-download required models on startup | 1 |
| `MIRAGE_INSWAPPER_URL` | Override InSwapper ONNX URL | internal default |
| `MIRAGE_CODEFORMER_URL` | Override CodeFormer weight URL | 0.1 release |
| `MIRAGE_CODEFORMER_FIDELITY` | 0.0 = more detail recovery, 1.0 = preserve input | 0.75 |
| `MIRAGE_MAX_FACES` | Swap up to N largest faces per frame | 1 |
| `MIRAGE_CUDA_ONLY` | Restrict ONNX to the CUDA EP + CPU fallback | unset |
| `MIRAGE_API_KEY` | Shared secret for control / TURN token | unset |
| `MIRAGE_REQUIRE_API_KEY` | Enforce API key if set | 0 |
| `MIRAGE_TOKEN_TTL` | Signed token lifetime (seconds) | 300 |
| `MIRAGE_STUN_URLS` | Comma-separated list of STUN servers | Google defaults |
| `MIRAGE_TURN_URL` | TURN URI(s) (comma-separated) | unset |
| `MIRAGE_TURN_USER` | TURN username | unset |
| `MIRAGE_TURN_PASS` | TURN credential | unset |
| `MIRAGE_FORCE_RELAY` | Force relay-only traffic | 0 |
| `MIRAGE_TURN_TLS_ONLY` | Filter TURN to TLS/TCP | 1 |
| `MIRAGE_PREFER_H264` | Prefer the H264 codec in SDP munging | 0 |
| `MIRAGE_VOICE_ENABLE` | Enable voice processor stub | 0 |
| `MIRAGE_PERSIST_MODELS` | Persist models in /data/mirage_models via a symlink from /app/models | 1 |
| `MIRAGE_PROVISION_FRESH` | Force re-download of required models (ignores sentinel) | 0 |
| `MIRAGE_PROC_MAX_DIM` | Max dimension (longest side) for processing downscale | 512 |
| `MIRAGE_DEBUG_OVERLAY` | Draw green bbox + SWAP label on swapped faces | 0 |
| `MIRAGE_SWAP_DEBUG` | Verbose per-frame swap decision logging | 0 |

CodeFormer fidelity example:

```bash
MIRAGE_CODEFORMER_FIDELITY=0.6
```
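
On the application side, a fidelity override like this would typically be read once and clamped to the valid range before being handed to the enhancer. A minimal sketch of such parsing (assumed for illustration, not taken from the codebase):

```python
import os

def codeformer_fidelity(default: float = 0.75) -> float:
    """Read MIRAGE_CODEFORMER_FIDELITY and clamp it to [0.0, 1.0]."""
    try:
        value = float(os.getenv("MIRAGE_CODEFORMER_FIDELITY", default))
    except ValueError:
        value = default  # fall back on malformed input
    return min(1.0, max(0.0, value))
```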

### Processing Resolution & Visual Debug Overlay

Two controls help you verify that swapping is occurring and tune visual quality against latency:

| Control | Effect | Guidance |
| --- | --- | --- |
| `MIRAGE_PROC_MAX_DIM` | Caps the longest side of a frame before inference. Frames larger than this are downscaled for detection/swap, then returned at original size. | Raise (e.g. 640, 720) for crisper facial detail if GPU headroom allows; lower (384–512) to reduce latency on weaker GPUs. The enforced minimum is 64. |
| `MIRAGE_DEBUG_OVERLAY` | When enabled (1), draws a green rectangle and the text SWAP over each face region swapped in the most recent frame. | Use temporarily to confirm active swapping; disable in production to avoid visual artifacts. |

Example (higher detail + overlay for confirmation):

```bash
MIRAGE_PROC_MAX_DIM=640
MIRAGE_DEBUG_OVERLAY=1
```

If you still perceive "no change" while counters show swaps:

1. Ensure your reference image is a clear, well-lit, frontal face (avoid extreme angles / occlusions).
2. Increase `MIRAGE_PROC_MAX_DIM` to 640 or 720 for sharper results.
3. Temporarily enable `MIRAGE_DEBUG_OVERLAY=1` to visualize the swapped region.
4. Check `/debug/pipeline` for `total_faces_swapped` and `swap_faces_last` > 0 (see the sketch below).
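
Step 4 can be scripted. A small check against `/debug/pipeline`, assuming the counter names listed above and an otherwise unspecified JSON shape:

```python
import requests

BASE = "http://localhost:7860"

info = requests.get(f"{BASE}/debug/pipeline", timeout=5).json()
total = info.get("total_faces_swapped", 0)
last = info.get("swap_faces_last", 0)
if total == 0:
    print("No faces swapped yet: check the reference image and detector logs.")
elif last == 0:
    print("Swapping has worked before but not on the latest frame "
          "(face may be out of view or at an extreme angle).")
else:
    print(f"Swapping active: {last} face(s) in the last frame, {total} total.")
```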

## 📋 Requirements

- GPU: NVIDIA (Ampere+ recommended). CPU-only will be extremely slow.
- VRAM: ~3–4GB baseline (swap + detector) + optional enhancer overhead.
- RAM: 8GB+ (12–16GB recommended for multitasking).
- Browser: Chromium-based / Firefox with WebRTC.
- Reference Image: Clear, frontal, good lighting, minimal occlusions.

πŸ› οΈ Development / Running Locally

Download models & start server:

```bash
python model_downloader.py  # or set MIRAGE_DOWNLOAD_MODELS=1 and let startup handle it
uvicorn app:app --port 7860 --host 0.0.0.0
```

Open the browser client at http://localhost:7860.

Set a reference image via the UI (Base64 upload path), then begin a WebRTC session. Inspect `/metrics` for swap latency and `/webrtc/debug_state` for connection internals.

## 📄 License

MIT License - Feel free to use and modify for your projects!

πŸ™ Acknowledgments

- InsightFace (detection + swap)
- CodeFormer (fidelity-controllable enhancement)
- Hugging Face (inference infra)

## Metrics Endpoints (Current Subset)

- `GET /metrics`
- `GET /gpu`
- `GET /webrtc/ping`
- `GET /webrtc/debug_state`
- (Legacy endpoints referenced in SPEC may be pruned in future refactors.)

## Voice Stub Activation

Set `MIRAGE_VOICE_ENABLE=1` to route audio through the placeholder voice processor. Current behavior is pass-through while preserving structural hooks for future RVC model integration.
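
Structurally, the stub can be pictured as an audio processor whose convert step is currently the identity function. A hypothetical sketch (class and method names are illustrative, not the module's actual API):

```python
import numpy as np

class VoiceProcessorStub:
    """Pass-through audio processor keeping the hook points for future RVC.

    Hypothetical sketch: the real module may differ in names and framing.
    """

    def __init__(self, sample_rate: int = 48000):
        self.sample_rate = sample_rate

    def convert(self, pcm: np.ndarray) -> np.ndarray:
        # Future: HuBERT features -> RMVPE pitch -> RVC synthesis.
        # Today: return the chunk unchanged (pass-through).
        return pcm
```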

## Future Parameterization

- The frontend will fetch a `/config` endpoint to align `chunk_ms` and `video_max_fps` dynamically.
- An adaptation layer will adjust chunk size and video quality based on runtime ratios.

## Accessing Endpoints on Hugging Face Spaces

When viewing the Space at https://huggingface.co/spaces/Islamckennon/mirage you are on the Hub UI (repository page). API paths appended there (e.g. `/metrics`, `/gpu`) will 404 because that domain serves repo metadata, not your running container.

Your running app is exposed on a separate subdomain:

```
https://islamckennon-mirage.hf.space
```

(Pattern: `https://<username>-<space_name>.hf.space`)

So the full endpoint URLs are, for example:

```
https://islamckennon-mirage.hf.space/metrics
https://islamckennon-mirage.hf.space/gpu
```

If the Space is private you must be logged into Hugging Face in the browser for these to load.

Troubleshooting "Restarting" Status

If the Space shows a perpetual "Restarting" badge:

1. Open the Logs panel and switch to the Container tab (not just Build) to see runtime exceptions.
2. Look for the `[startup] { ... }` line. If it is absent, the app may be crashing before FastAPI starts (syntax error, missing dependency, etc.).
3. Ensure the container listens on port 7860 (this repo's Dockerfile already does). The startup log prints the port value it detected.
4. GPU provisioning can briefly cycle while allocating hardware; give it a minute after the first restart. If it loops more than 5 times, inspect for CUDA driver errors or torch import failures.
5. Test locally with `uvicorn app:app --port 7860` to rule out code issues.
6. Use `curl -s https://islamckennon-mirage.hf.space/health` (if public) to verify liveness.

If problems persist, capture the Container log stack trace and open an issue.

## Model Auto-Download

`model_downloader.py` manages required weights with atomic file locks. It supports overriding sources via environment variables and continues gracefully if optional enhancers fail to download.

## Persistent Storage Strategy (Hugging Face Spaces)

By default (`MIRAGE_PERSIST_MODELS=1`), the container will:

1. Create (if missing) a persistent directory: `/data/mirage_models`.
2. Migrate any existing files from an earlier ephemeral `/app/models` (first run only, if the persistent directory is empty).
3. Symlink `/app/models` -> `/data/mirage_models`.
4. Run integrity checks on each startup: if the sentinel `.provisioned` exists but any required model (currently `inswapper/inswapper_128_fp16.onnx`) is missing, the downloader is re-invoked automatically. (A simplified sketch of steps 1-3 follows.)
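
The mechanics of steps 1-3 reduce to a few filesystem operations. A simplified sketch under the paths named above (the real startup code adds logging and error handling):

```python
import shutil
from pathlib import Path

APP_MODELS = Path("/app/models")
PERSIST_DIR = Path("/data/mirage_models")

def ensure_persistent_models() -> None:
    PERSIST_DIR.mkdir(parents=True, exist_ok=True)
    if APP_MODELS.is_symlink():
        return  # already linked on a previous start
    if APP_MODELS.is_dir() and not any(PERSIST_DIR.iterdir()):
        # First run: migrate files from the ephemeral directory.
        for item in APP_MODELS.iterdir():
            shutil.move(str(item), PERSIST_DIR / item.name)
    if APP_MODELS.exists():
        shutil.rmtree(APP_MODELS)  # remove the now-stale directory
    APP_MODELS.symlink_to(PERSIST_DIR, target_is_directory=True)
```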

Disable persistence with:

```bash
MIRAGE_PERSIST_MODELS=0
```

This forces models to re-download on each cold start (not recommended in production, due to added latency and download rate limits).

Sentinel files:

- `.provisioned` – marker indicating a successful prior provisioning.
- `.provisioned_meta.json` – sizes and metadata of required models at provisioning time (informational).

If you set `MIRAGE_PROVISION_FRESH=1`, the sentinel is removed and a full re-download is attempted (useful when updating model versions or clearing partial/corrupt files).

Troubleshooting missing models:

- Call `/debug/models` – it reports symlink status, sentinel presence, and sizes.
- If inswapper is missing but the sentinel is present, the integrity logic should trigger re-provisioning automatically. If not, force it with `MIRAGE_PROVISION_FRESH=1`.


Debug current model status:

```bash
curl -s https://<space-subdomain>.hf.space/debug/models | jq
```

Example response:

```json
{
    "inswapper": {"exists": true, "size": 87916544},
    "codeformer": {"exists": true, "size": 178140560},
    "sentinel": {"exists": true, "meta_exists": true},
    "storage": {
        "root_is_symlink": true,
        "root_path": "/app/models",
        "target": "/data/mirage_models",
        "persist_mode_env": "1"
    },
    "pipeline_initialized": false
}
```

## Endpoints Recap

See the Metrics Endpoints section above. Typical usage examples:

```bash
curl -s http://localhost:7860/metrics/async | jq
curl -s http://localhost:7860/metrics/pacing | jq '.latency_ema_ms, .pacing_hint'
curl -s http://localhost:7860/metrics/motion | jq '.recent_motion[-5:]'
```

## Pacing Hint Logic

`pacing_hint` is derived from a latency exponential moving average versus the target frame time (a sketch of this mapping follows the list):

- ~1.0: balanced.
- <0.85: system overloaded; consider lowering capture FPS or resolution.
- >1.15: headroom available; you may increase FPS modestly.
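
Read as a ratio of target frame time to the latency EMA, the thresholds map to actions as sketched below (the formula is assumed from the description; the server computes the real value):

```python
def pacing_action(latency_ema_ms: float, target_fps: float = 20.0) -> str:
    """Map a latency EMA to a pacing suggestion using the thresholds above."""
    target_frame_ms = 1000.0 / target_fps
    # Assumed definition of pacing_hint; guard against a zero EMA.
    hint = target_frame_ms / max(latency_ema_ms, 1e-3)
    if hint < 0.85:
        return "overloaded: lower capture FPS or resolution"
    if hint > 1.15:
        return "headroom: FPS can be raised modestly"
    return "balanced"
```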

## Motion Magnitude

Aggregated from per-frame keypoint motion vectors; higher values trigger more frequent face detection to avoid drift. Low motion stretches automatically reduce detection frequency to save compute.
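
One way to picture that scheduling (an illustrative sketch; the thresholds and interval bounds are invented, not the pipeline's actual values):

```python
def detection_interval(motion_magnitude: float) -> int:
    """Frames to skip between full face detections, by recent motion.

    Illustrative thresholds only: high motion forces detection every frame,
    low motion stretches the interval to save compute.
    """
    if motion_magnitude > 5.0:   # fast movement: re-detect every frame
        return 1
    if motion_magnitude > 1.0:   # moderate motion
        return 3
    return 8                     # near-static scene
```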

## Enhancer Fidelity (CodeFormer)

Fidelity weight (w):

- Lower (e.g. 0.3–0.5): more aggressive restoration; may alter identity details.
- Higher (0.7–0.9): preserves more of the original swapped structure, with less smoothing.

Tune with `MIRAGE_CODEFORMER_FIDELITY`.

## Latency Histogram Snapshots

`/metrics/stage_histogram` exposes periodic snapshots (e.g. every N frames) of the stage latency distribution to help identify tail regressions. Use it to tune pacing thresholds or to decide on model quantization.

## Security Notes

If exposing publicly:

- Set `MIRAGE_API_KEY` and `MIRAGE_REQUIRE_API_KEY=1`.
- Serve behind TLS (a reverse proxy such as Caddy or Nginx for certificate management).
- Optionally restrict TURN server usage or enforce relay-only mode for stricter NAT traversal control.

## Planned Voice Pipeline (Future)

Placeholder directories exist for future real-time voice conversion integration.

```
models/
    hubert/            # HuBERT feature extractor checkpoint(s)
    rmvpe/             # RMVPE pitch extraction weights
    rvc/               # RVC (voice conversion) model checkpoints
```

### Expected File Names & Relative Paths

You can adapt names, but these canonical filenames will be referenced in future code examples:

| Component | Recommended Source | Save As (relative path) |
| --- | --- | --- |
| HuBERT Base | facebook/hubert-base-ls960 (Torch .pt) or official fairseq release | `models/hubert/hubert_base.pt` |
| RMVPE Weights | Community RMVPE release (pitch extraction) | `models/rmvpe/rmvpe.pt` |
| RVC Model Checkpoint | Your trained / downloaded RVC model | `models/rvc/model.pth` |

Optional additional assets (not yet required):

| Type | Path Example |
| --- | --- |
| Speaker embedding(s) | `models/rvc/spk_embeds.npy` |
| Index file (faiss) | `models/rvc/features.index` |

### Manual Download (Lightweight Instructions)

Because licenses vary and some distributions require acceptance, we do not auto-download by default. Manually fetch the files you are licensed to use:

```bash
# HuBERT (example using torch hub or direct URL)
curl -L -o models/hubert/hubert_base.pt \
    https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt

# RMVPE (replace URL with the official/community mirror you trust)
curl -L -o models/rmvpe/rmvpe.pt \
    https://example.com/path/to/rmvpe.pt

# RVC model (place your trained checkpoint)
cp /path/to/your_rvc_model.pth models/rvc/model.pth
```

All of these binary patterns are ignored by git via `.gitignore` (we only keep `.gitkeep` & documentation). Verify after download:

```bash
ls -lh models/hubert models/rmvpe models/rvc
```

### Optional Convenience Script

You can create `scripts/download_models.sh` (not yet included) with the above curl commands; keep URLs commented if redistribution is unclear. Example skeleton:

```bash
#!/usr/bin/env bash
set -euo pipefail
mkdir -p models/hubert models/rmvpe models/rvc
echo "(Add real URLs you are licensed to download)"
# curl -L -o models/hubert/hubert_base.pt <URL>
# curl -L -o models/rmvpe/rmvpe.pt <URL>
```

### Integrity / Size Hints (Approximate)

| File | Typical Size |
| --- | --- |
| `hubert_base.pt` | ~360 MB |
| `rmvpe.pt` | ~90–150 MB (varies) |
| `model.pth` (RVC) | 50–200+ MB |

Ensure your Space has enough disk (HF GPU Spaces usually allow several GB, but keep total under limits).

### License Notes

Review and comply with each model's license (Fairseq / Facebook AI for HuBERT, RMVPE authors, your own RVC training data constraints). Do not commit weights.

Future code will detect presence and log which components are available at startup.
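
A presence check along those lines could look like the following sketch (paths from the table above; the logging style is an assumption):

```python
import logging
from pathlib import Path

log = logging.getLogger("mirage.voice")

VOICE_COMPONENTS = {
    "hubert": Path("models/hubert/hubert_base.pt"),
    "rmvpe": Path("models/rmvpe/rmvpe.pt"),
    "rvc": Path("models/rvc/model.pth"),
}

def report_voice_components() -> dict[str, bool]:
    """Log which optional voice components are present and return the map."""
    status = {name: path.is_file() for name, path in VOICE_COMPONENTS.items()}
    for name, present in status.items():
        log.info("voice component %s: %s", name, "found" if present else "missing")
    return status
```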
