---
title: Voice Agent WebRTC + LangGraph
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
suggested_hardware: t4-small
short_description: Voice agent with LangGraph, WebRTC, ASR & TTS
---
# Voice Agent WebRTC + LangGraph (Quick Start)
This repository includes a complete voice agent stack:
- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LangGraph LLM adapter, TTS)
- Static UI you can open in a browser
Primary example: `examples/voice_agent_webrtc_langgraph/`
## 1) Mandatory environment variables
Create `.env` in `examples/voice_agent_webrtc_langgraph/` (copy from `env.example`) and set at least:
- `RIVA_API_KEY` or `NVIDIA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)
Optional but useful:
- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot with a custom audio prompt
- `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- `LANGGRAPH_AUTH_TOKEN` (or `AUTH0_ACCESS_TOKEN`/`AUTH_BEARER_TOKEN`) if your LangGraph server requires auth
- For WebRTC behind NAT, if needed: either Twilio credentials (`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`) or a TURN server (`TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`)
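The variables above can be resolved in one place before the pipeline starts. The following is a minimal sketch, not the example's own loader: `load_settings` is a hypothetical helper, the defaults are copied from the lists above, and both the preference for `RIVA_API_KEY` over `NVIDIA_API_KEY` and the auth-token precedence order are assumptions.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the variable lists in this README.
DEFAULTS = {
    "LANGGRAPH_BASE_URL": "http://127.0.0.1:2024",
    "LANGGRAPH_ASSISTANT": "ace-base-agent",
    "LANGGRAPH_STREAM_MODE": "values",
    "LANGGRAPH_DEBUG_STREAM": "true",
    "RIVA_ASR_LANGUAGE": "en-US",
    "RIVA_TTS_LANGUAGE": "en-US",
    "ENABLE_SPECULATIVE_SPEECH": "true",
}

# Assumed precedence: the first non-empty token in this order wins.
AUTH_TOKEN_NAMES = ("LANGGRAPH_AUTH_TOKEN", "AUTH0_ACCESS_TOKEN", "AUTH_BEARER_TOKEN")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: resolve the documented variables into one dict."""
    env = os.environ if env is None else env

    # Either key is accepted; preferring RIVA_API_KEY is an assumption.
    api_key = env.get("RIVA_API_KEY") or env.get("NVIDIA_API_KEY")
    if not api_key:
        raise RuntimeError("Set RIVA_API_KEY or NVIDIA_API_KEY in .env")

    # Empty values fall back to the documented default.
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    settings["API_KEY"] = api_key
    settings["USER_EMAIL"] = env.get("USER_EMAIL")
    settings["AUTH_TOKEN"] = next(
        (env[name] for name in AUTH_TOKEN_NAMES if env.get(name)),
        None,  # no token configured: the LangGraph server is assumed to be open
    )
    return settings
```

Failing early on a missing API key gives a clearer error than a rejected ASR/TTS request later in the call.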
## 2) What it does
- Starts the LangGraph dev server, which serves agents from `examples/voice_agent_webrtc_langgraph/agents/`.
- Starts the Pipecat pipeline (`pipeline.py`) exposing:
  - HTTP: `http://<host>:7860` (health, RTC config)
  - WebSocket: `ws://<host>:7860/ws` (audio + transcripts)
  - Static UI: `http://<host>:7860/` (served by FastAPI)
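To confirm the HTTP endpoint is up from a script, a small probe is enough. This is a sketch under assumptions: `is_healthy` is a hypothetical helper, and it uses the `/get_prompt` path that the Troubleshooting section uses as a health check. The `opener` parameter exists only so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, path: str = "/get_prompt", timeout: float = 5.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET base_url + path answers with HTTP 200."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

For example, `is_healthy("http://localhost:7860")` can gate a test run or a deployment step until the pipeline responds.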
Defaults:
- ASR: NVIDIA Riva (NIM) via `RIVA_API_KEY` and built-in `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter, streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) via `RIVA_API_KEY` and built-in `NVIDIA_TTS_FUNCTION_ID`
## 3) Run
### Option A: Docker (recommended)
From `examples/voice_agent_webrtc_langgraph/`:
```bash
docker compose up --build -d
```
Then open `http://<machine-ip>:7860/`.
Chrome blocks microphone access on plain `http` origins. To allow it, open `chrome://flags/`, enable "Insecure origins treated as secure", and add `http://<machine-ip>:7860`.
#### Building for Different Examples
The Dockerfile in the repository root is generalized to work with any example. Use the `EXAMPLE_NAME` build argument to specify which example to use:
**For voice_agent_webrtc_langgraph (default):**
```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_webrtc_langgraph -t my-voice-agent .
docker run -p 7860:7860 --env-file examples/voice_agent_webrtc_langgraph/.env my-voice-agent
```
**For voice_agent_multi_thread:**
```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_multi_thread -t my-voice-agent .
docker run -p 7860:7860 --env-file examples/voice_agent_multi_thread/.env my-voice-agent
```
The Dockerfile will automatically:
- Build the UI for the specified example
- Copy only the files for that example
- Set up the correct working directory
- Configure the start script to run the correct example
**Note:** The UI is served on the same port as the API (7860). The FastAPI app serves both the WebSocket/HTTP endpoints and the static UI files.
### Option B: Python (local)
Requires Python 3.12 and `uv`.
```bash
cd examples/voice_agent_webrtc_langgraph
uv run pipeline.py
```
Then start the UI from `ui/` (see `examples/voice_agent_webrtc_langgraph/ui/README.md`).
## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)
The default TTS in `examples/voice_agent_webrtc_langgraph/pipeline.py` is NVIDIA Riva Magpie via NIM:
```python
import os
from pathlib import Path

from nvidia_pipecat.services.riva_speech import RivaTTSService

# Default TTS: NVIDIA Riva Magpie via NIM, configured from the environment.
tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    # Pass a prompt file only when ZERO_SHOT_AUDIO_PROMPT is set.
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```
To use ElevenLabs instead:
1) Ensure ElevenLabs support is available (included via project deps).
2) Set environment:
   - `ELEVENLABS_API_KEY`
   - Optionally `ELEVENLABS_VOICE_ID` and any model-specific settings
3) Edit `examples/voice_agent_webrtc_langgraph/pipeline.py` to import and construct ElevenLabs TTS:
```python
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace the RivaTTSService(...) block with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
```
No other pipeline changes are required; transcript synchronization supports ElevenLabs end‑of‑speech events.
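If you switch providers often, the choice can be driven by the environment instead of editing `pipeline.py` each time. The sketch below only builds the constructor arguments shown in the two snippets above; `TTS_PROVIDER` and `tts_config` are hypothetical names that the example does not read or define.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def tts_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, dict]:
    """Return (provider, constructor kwargs) for the TTS service.

    TTS_PROVIDER is a hypothetical variable, not part of the example.
    """
    env = os.environ if env is None else env
    provider = (env.get("TTS_PROVIDER") or "riva").lower()

    if provider == "elevenlabs":
        return provider, {
            "api_key": env.get("ELEVENLABS_API_KEY"),
            "voice_id": env.get("ELEVENLABS_VOICE_ID", "Rachel"),
            "sample_rate": 16000,
            "channels": 1,
        }
    if provider == "riva":
        prompt = env.get("ZERO_SHOT_AUDIO_PROMPT")
        return provider, {
            "api_key": env.get("RIVA_API_KEY"),
            "function_id": env.get("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
            "voice_id": env.get("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
            "model": env.get("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
            "language": env.get("RIVA_TTS_LANGUAGE", "en-US"),
            # Pass a prompt file only when ZERO_SHOT_AUDIO_PROMPT is set.
            "zero_shot_audio_prompt_file": Path(prompt) if prompt else None,
        }
    raise ValueError(f"Unknown TTS_PROVIDER: {provider!r}")
```

In `pipeline.py` the result could then be unpacked into whichever class matches, for example `RivaTTSService(**kwargs)` when `provider == "riva"` and `ElevenLabsTTSServiceWithEndOfSpeech(**kwargs)` otherwise.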
Notes for Magpie Zero‑shot:
- Set `RIVA_TTS_VOICE_ID` to a Zero‑shot voice such as `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` to a matching model such as `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`, or set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download on startup.
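The download-on-startup behaviour for `ZERO_SHOT_AUDIO_PROMPT_URL` can be pictured roughly as follows. This is an illustrative sketch, not the example's actual startup code: `ensure_audio_prompt` is a hypothetical helper, and skipping the download when the file already exists is an assumption.

```python
import shutil
import urllib.request
from pathlib import Path


def ensure_audio_prompt(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Download url to dest unless dest already exists; return dest."""
    if dest.exists():
        return dest  # reuse the cached prompt instead of downloading again
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary name first so a failed download never leaves
    # a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

The returned path is the kind of value `ZERO_SHOT_AUDIO_PROMPT` would then point at.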
## 5) Troubleshooting
- Health check: run `curl -f http://localhost:7860/get_prompt`. With `-f`, curl exits non-zero if the server returns an HTTP error.
- If the UI can't access the mic on http, use the Chrome flag above or host the UI via HTTPS.
- For NAT/firewall issues, configure TURN or provide Twilio credentials.
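For the TURN case, the three `TURN_*` variables map naturally onto the `urls`/`username`/`credential` fields of a WebRTC ICE server entry. The following is a hedged sketch: `ice_servers` is a hypothetical helper, the public STUN server is an assumed baseline, and the Twilio path (which fetches short-lived credentials from Twilio) is not covered.

```python
import os
from typing import Mapping, Optional


def ice_servers(env: Optional[Mapping[str, str]] = None) -> list:
    """Build an RTCIceServer-style list from the TURN_* variables."""
    env = os.environ if env is None else env

    # Baseline STUN entry; this particular public server is an assumption.
    servers = [{"urls": "stun:stun.l.google.com:19302"}]

    url = env.get("TURN_SERVER_URL")
    if url:
        turn = {"urls": url}
        # Attach credentials only when both halves are present.
        if env.get("TURN_USERNAME") and env.get("TURN_PASSWORD"):
            turn["username"] = env["TURN_USERNAME"]
            turn["credential"] = env["TURN_PASSWORD"]
        servers.append(turn)
    return servers
```

A list in this shape is what a browser's `RTCPeerConnection` accepts under `iceServers`, so it can be returned as JSON from an RTC config endpoint.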
## 6) Multi-threaded Voice Agent (voice_agent_multi_thread)
The `voice_agent_multi_thread` example includes a non-blocking multi-threaded agent implementation that allows users to continue conversing while long-running operations execute in the background.
### Build the Docker image:
```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_multi_thread -t voice-agent-multi-thread .
```
### Run the container:
```bash
docker run -d --name voice-agent-multi-thread \
  -p 2024:2024 \
  -p 7862:7860 \
  --env-file examples/voice_agent_multi_thread/.env \
  voice-agent-multi-thread
```
Then access:
- **LangGraph API**: `http://localhost:2024`
- **Web UI**: `http://localhost:7862`
- **Pipeline WebSocket**: `ws://localhost:7862/ws`
Multi-threaded mode is enabled automatically for `telco-agent` and `wire-transfer-agent`. While the main thread processes long-running tools, a secondary thread handles status checks and interim conversation.
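The non-blocking idea can be illustrated with the standard library alone. This is a conceptual sketch, not the agent's implementation: `long_running_tool`, `answer_status`, and `demo` are hypothetical names, and a thread pool stands in for whatever mechanism the example uses.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def long_running_tool(seconds: float) -> str:
    """Stand-in for a slow tool call such as a transfer (hypothetical)."""
    time.sleep(seconds)
    return "transfer complete"


def answer_status(job: Future) -> str:
    """Reply to a status question without blocking on the running job."""
    return job.result() if job.done() else "still working on it"


def demo() -> list:
    replies = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        job = pool.submit(long_running_tool, 0.2)  # slow work runs in the background
        replies.append(answer_status(job))         # user asks while it is running
        job.result()                                # wait until the tool finishes
        replies.append(answer_status(job))         # user asks again afterwards
    return replies
```

The key property is that `answer_status` never waits on the job, so the conversation stays responsive while the tool runs.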
### Stop and remove the container:
```bash
docker stop voice-agent-multi-thread && docker rm voice-agent-multi-thread
```
## 7) Manual Docker Commands (voice_agent_webrtc_langgraph)
If you prefer manual Docker commands instead of `docker compose`:
```bash
docker build -t ace-voice-webrtc:latest \
  -f examples/voice_agent_webrtc_langgraph/Dockerfile \
  .
docker run --name ace-voice-webrtc -d \
  -p 7860:7860 \
  -p 2024:2024 \
  --env-file examples/voice_agent_webrtc_langgraph/.env \
  -e LANGGRAPH_ASSISTANT=healthcare-agent \
  ace-voice-webrtc:latest
```