---
title: Voice Agent WebRTC + LangGraph
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
suggested_hardware: t4-small
short_description: Voice agent with LangGraph, WebRTC, ASR & TTS
---

# Voice Agent WebRTC + LangGraph (Quick Start)

This repository includes a complete voice agent stack:
- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LangGraph LLM adapter, TTS)
- Static UI you can open in a browser

Primary example: `examples/voice_agent_webrtc_langgraph/`


## 1) Mandatory environment variables
Create `.env` in `examples/voice_agent_webrtc_langgraph/` (copy from `env.example`) and set at least the following (a minimal sketch appears after the lists):

- `RIVA_API_KEY` or `NVIDIA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)

Optional but useful:
- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot with a custom audio prompt
- `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- `LANGGRAPH_AUTH_TOKEN` (or `AUTH0_ACCESS_TOKEN`/`AUTH_BEARER_TOKEN`) if your LangGraph server requires auth
- TURN/Twilio for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`
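
A minimal `.env` sketch for the NIM-hosted defaults (the API key is a placeholder; the other values are the defaults listed above, so adjust them to your deployment):

```bash
# Required for NVIDIA NIM-hosted Riva ASR/TTS
RIVA_API_KEY=nvapi-xxxxxxxxxxxxxxxx

# LangGraph server and assistant selection
LANGGRAPH_BASE_URL=http://127.0.0.1:2024
LANGGRAPH_ASSISTANT=ace-base-agent
LANGGRAPH_STREAM_MODE=values
LANGGRAPH_DEBUG_STREAM=true
USER_EMAIL=test@example.com

# Optional: language and voice selection
RIVA_ASR_LANGUAGE=en-US
RIVA_TTS_LANGUAGE=en-US
RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
```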


## 2) What it does
- Starts LangGraph dev server serving agents from `examples/voice_agent_webrtc_langgraph/agents/`.
- Starts the Pipecat pipeline (`pipeline.py`) exposing:
  - HTTP: `http://<host>:7860` (health, RTC config)
  - WebSocket: `ws://<host>:7860/ws` (audio + transcripts)
  - Static UI: `http://<host>:7860/` (served by FastAPI)

Defaults:
- ASR: NVIDIA Riva (NIM) via `RIVA_API_KEY` and built-in `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter, streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) via `RIVA_API_KEY` and built-in `NVIDIA_TTS_FUNCTION_ID`
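
Once the stack is running (see the next section), you can verify the HTTP endpoint is reachable with the same health path used in the Troubleshooting section (assuming you are on the host machine):

```bash
curl -f http://localhost:7860/get_prompt
```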


## 3) Run

### Option A: Docker (recommended)
From `examples/voice_agent_webrtc_langgraph/`:

```bash
docker compose up --build -d
```

Then open `http://<machine-ip>:7860/`.

Chrome blocks microphone access on plain-http origins other than `localhost`. To work around this, enable "Insecure origins treated as secure" at `chrome://flags/` and add `http://<machine-ip>:7860`.

#### Building for Different Examples
The Dockerfile in the repository root is generalized to work with any example; use the `EXAMPLE_NAME` build argument to select which one to build:

**For voice_agent_webrtc_langgraph (default):**
```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_webrtc_langgraph -t my-voice-agent .
docker run -p 7860:7860 --env-file examples/voice_agent_webrtc_langgraph/.env my-voice-agent
```

**For voice_agent_multi_thread:**
```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_multi_thread -t my-voice-agent .
docker run -p 7860:7860 --env-file examples/voice_agent_multi_thread/.env my-voice-agent
```

The Dockerfile will automatically:
- Build the UI for the specified example
- Copy only the files for that example
- Set up the correct working directory
- Configure the start script to run the correct example

**Note:** The UI is served on the same port as the API (7860). The FastAPI app serves both the WebSocket/HTTP endpoints and the static UI files.

### Option B: Python (local)
Requires Python 3.12 and `uv`.

```bash
cd examples/voice_agent_webrtc_langgraph
uv run pipeline.py
```
Then start the UI from `ui/` (see `examples/voice_agent_webrtc_langgraph/ui/README.md`).
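
Note that `docker compose` starts the LangGraph dev server for you; when running locally you need to start it yourself. A sketch using the standard LangGraph CLI, assuming `langgraph-cli` is available in the project environment and the agents directory contains a `langgraph.json` (check the example's own README for the exact invocation):

```bash
cd examples/voice_agent_webrtc_langgraph/agents
uv run langgraph dev --port 2024   # matches the default LANGGRAPH_BASE_URL
```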


## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)
The default TTS in `examples/voice_agent_webrtc_langgraph/pipeline.py` is NVIDIA Riva Magpie via NIM:

```python
from nvidia_pipecat.services.riva_speech import RivaTTSService

tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```

To use ElevenLabs instead:
1) Ensure ElevenLabs support is available (included via project deps).
2) Set environment:
   - `ELEVENLABS_API_KEY`
   - Optionally `ELEVENLABS_VOICE_ID` and any model-specific settings
3) Edit `examples/voice_agent_webrtc_langgraph/pipeline.py` to import and construct ElevenLabs TTS:

```python
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace the RivaTTSService(...) block with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
```

No other pipeline changes are required; transcript synchronization supports ElevenLabs end‑of‑speech events.

Notes for Magpie Zero‑shot:
- Set `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`, or set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download on startup.
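
For example, a zero-shot configuration in `.env` might look like the following (the prompt path and URL are placeholders for your own audio file):

```bash
RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
# Either point at a file mounted into the container...
# ZERO_SHOT_AUDIO_PROMPT=/path/to/voice_prompt.wav
# ...or have it downloaded on startup:
ZERO_SHOT_AUDIO_PROMPT_URL=https://example.com/voice_prompt.wav
```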


## 5) Troubleshooting
- Healthcheck: `curl -f http://localhost:7860/get_prompt`
- If the UI can't access the mic on http, use the Chrome flag above or host the UI via HTTPS.
- For NAT/firewall issues, configure TURN or provide Twilio credentials.
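
If clients sit behind restrictive NATs, a TURN configuration in `.env` can look like the following (placeholder values; alternatively, set the Twilio credentials listed in section 1):

```bash
TURN_SERVER_URL=turn:turn.example.com:3478
TURN_USERNAME=your-turn-username
TURN_PASSWORD=your-turn-password
```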


## 6) Multi-threaded Voice Agent (voice_agent_multi_thread)

The `voice_agent_multi_thread` example includes a non-blocking multi-threaded agent implementation that allows users to continue conversing while long-running operations execute in the background.

### Build the Docker image:
```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_multi_thread -t voice-agent-multi-thread .
```

### Run the container:
```bash
docker run -d --name voice-agent-multi-thread \
  -p 2024:2024 \
  -p 7862:7860 \
  --env-file examples/voice_agent_multi_thread/.env \
  voice-agent-multi-thread
```

Then access:
- **LangGraph API**: `http://localhost:2024`
- **Web UI**: `http://localhost:7862`
- **Pipeline WebSocket**: `ws://localhost:7862/ws`

Multi-threading is enabled automatically for the `telco-agent` and `wire-transfer-agent` assistants: a secondary thread handles status checks and interim conversation while the main thread processes long-running tools.
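
To target one of these agents, set the assistant selection in `examples/voice_agent_multi_thread/.env` before starting the container (this assumes the example reads the same `LANGGRAPH_ASSISTANT` variable as the main example):

```bash
LANGGRAPH_ASSISTANT=telco-agent
```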

### Stop and remove the container:
```bash
docker stop voice-agent-multi-thread && docker rm voice-agent-multi-thread
```


## 7) Manual Docker Commands (voice_agent_webrtc_langgraph)

If you prefer manual Docker commands instead of docker-compose:

```bash
docker build -t ace-voice-webrtc:latest \
  -f examples/voice_agent_webrtc_langgraph/Dockerfile \
  .

docker run --name ace-voice-webrtc -d \
  -p 7860:7860 \
  -p 2024:2024 \
  --env-file examples/voice_agent_webrtc_langgraph/.env \
  -e LANGGRAPH_ASSISTANT=healthcare-agent \
  ace-voice-webrtc:latest
```