Commit 06523e9 · Added the readme
Parent: e9446cb

Files changed:
- README.md (+85 / -60)
- examples/voice_agent_webrtc_langgraph/README.md (+82 / -156)

README.md
CHANGED
@@ -1,92 +1,117 @@

Removed:

---
title: Ace Controller Pipeline
emoji: 🐠
colorFrom: indigo
colorTo: gray
sdk: docker
pinned: false
short_description: Voice Demos with Ace Controller
---

## Main Features

The ACE Controller SDK was used to build the [ACE Controller Microservice](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html). Check out the [ACE documentation](https://docs.nvidia.com/ace/tokkio/latest/customization/customization-options.html) for more details on how to configure the ACE Controller MS with your custom pipelines.

The NVIDIA Pipecat package is released as a wheel on PyPI. Create a Python virtual environment and use the pip command to install the nvidia-pipecat package:

```bash
pip install nvidia-pipecat
```

You can start building Pipecat pipelines utilizing services from the NVIDIA Pipecat package. For more details, follow [the ACE Controller](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html) and [the Pipecat Framework](https://docs.pipecat.ai/getting-started/overview) documentation.
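
As an illustration, a hypothetical minimal wiring of such services into a Pipecat pipeline (`RivaTTSService` and its module appear later in this commit; `RivaASRService` and the exact constructor arguments are assumptions, not a documented API):

```python
import os

from pipecat.pipeline.pipeline import Pipeline

# RivaASRService is an assumed class name; RivaTTSService is shown later in this commit.
from nvidia_pipecat.services.riva_speech import RivaASRService, RivaTTSService

asr = RivaASRService(api_key=os.getenv("RIVA_API_KEY"))  # speech-to-text
tts = RivaTTSService(api_key=os.getenv("RIVA_API_KEY"))  # text-to-speech

# A real agent adds a transport (e.g. WebRTC) and an LLM service around these.
pipeline = Pipeline([asr, tts])
```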

## Hacking on the framework itself

### Using UV

To get started, first install the [UV package manager](https://docs.astral.sh/uv/#highlights).

Then, create a virtual environment with all the required dependencies by running the following commands:

```bash
uv sync
source .venv/bin/activate
```

To run the tests:

```bash
uv run pytest
```

To run the linter, use:

```bash
ruff check
```

## CONTRIBUTING

Added:

# Voice Agent WebRTC + LangGraph (Quick Start)

This repository includes a complete voice agent stack:

- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LangGraph LLM adapter, TTS)
- Static UI you can open in a browser

Primary example: `examples/voice_agent_webrtc_langgraph/`

## 1) Mandatory environment variables

Create `.env` in `examples/voice_agent_webrtc_langgraph/` (copy from `env.example`) and set at least the variables below (a minimal sketch follows these lists):

- `RIVA_API_KEY` or `NVIDIA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)

Optional but useful:

- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot with a custom audio prompt
- `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- `LANGGRAPH_AUTH_TOKEN` (or `AUTH0_ACCESS_TOKEN`/`AUTH_BEARER_TOKEN`) if your LangGraph server requires auth
- TURN/Twilio for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`
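
A minimal `.env` sketch covering just the mandatory values (all values are placeholders):

```bash
# Or set NVIDIA_API_KEY instead; the key format here is illustrative
RIVA_API_KEY=nvapi-xxxxxxxxxxxxxxxx
LANGGRAPH_BASE_URL=http://127.0.0.1:2024
LANGGRAPH_ASSISTANT=ace-base-agent
USER_EMAIL=test@example.com
LANGGRAPH_STREAM_MODE=values
LANGGRAPH_DEBUG_STREAM=true
```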

## 2) What it does

- Starts LangGraph dev server serving agents from `examples/voice_agent_webrtc_langgraph/agents/`.
- Starts the Pipecat pipeline (`pipeline.py`) exposing:
  - HTTP: `http://<host>:7860` (health, RTC config)
  - WebSocket: `ws://<host>:7860/ws` (audio + transcripts)
- Serves the built UI at `http://<host>:9000/` (via Docker).

Defaults:

- ASR: NVIDIA Riva (NIM) via `RIVA_API_KEY` and built-in `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter, streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) via `RIVA_API_KEY` and built-in `NVIDIA_TTS_FUNCTION_ID`

## 3) Run

### Option A: Docker (recommended)

From `examples/voice_agent_webrtc_langgraph/`:

```bash
docker compose up --build -d
```

Then open `http://<machine-ip>:9000/`.

Chrome on http origins: enable “Insecure origins treated as secure” at `chrome://flags/` and add `http://<machine-ip>:9000`.

### Option B: Python (local)

Requires Python 3.12 and `uv`.

```bash
cd examples/voice_agent_webrtc_langgraph
uv run pipeline.py
```

Then start the UI from `ui/` (see `examples/voice_agent_webrtc_langgraph/ui/README.md`).
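
The Docker option starts the LangGraph dev server for you; when running locally you may need to launch it yourself. A sketch assuming the standard LangGraph CLI (`langgraph-cli`), whose dev server listens on port 2024 by default, matching `LANGGRAPH_BASE_URL` above:

```bash
# Run from the directory containing the LangGraph agents/config
uv run langgraph dev
```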

## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)

The default TTS in `examples/voice_agent_webrtc_langgraph/pipeline.py` is NVIDIA Riva Magpie via NIM:

```python
from nvidia_pipecat.services.riva_speech import RivaTTSService

tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```

To use ElevenLabs instead:

1) Ensure ElevenLabs support is available (included via project deps).
2) Set environment:
   - `ELEVENLABS_API_KEY`
   - Optionally `ELEVENLABS_VOICE_ID` and any model-specific settings
3) Edit `examples/voice_agent_webrtc_langgraph/pipeline.py` to import and construct ElevenLabs TTS:

```python
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace the RivaTTSService(...) block with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
```

No other pipeline changes are required; transcript synchronization supports ElevenLabs end‑of‑speech events.

Notes for Magpie Zero‑shot:

- Set `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`, or set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup; see the sketch below.
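
As settings, the two prompt options look like this (the local path mirrors the volume example elsewhere in this commit; the URL is a placeholder):

```bash
# Option 1: a prompt file mounted into the container
ZERO_SHOT_AUDIO_PROMPT=audio_prompts/voice_sample.wav
# Option 2: fetch the prompt at startup (placeholder URL)
ZERO_SHOT_AUDIO_PROMPT_URL=https://example.com/voice_sample.wav
```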

## 5) Troubleshooting

- Healthcheck: `curl -f http://localhost:7860/get_prompt`
- If the UI can’t access the mic on http, use the Chrome flag above or host the UI via HTTPS.
- For NAT/firewall issues, configure TURN or provide Twilio credentials.

examples/voice_agent_webrtc_langgraph/README.md
CHANGED
@@ -1,186 +1,112 @@

Removed:

## Prerequisites

- You have access and are logged into NVIDIA NGC. For step-by-step instructions, refer to [the NGC Getting Started Guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#registering-activating-ngc-account).

```bash
cp env.example .env  # and add your credentials
```

```bash
docker compose up --build -d
```

Note: To enable microphone access in Chrome, go to `chrome://flags/`, enable "Insecure origins treated as secure", add `http://<machine-ip>:9000` to the list, and restart Chrome.

## Option 2: Deploy Using Python Environment

### Requirements

- [uv](https://github.com/astral-sh/uv)

### Run

```bash
uv run pipeline.py
```

#### Update ui/src/config.ts

Add the following configuration to your `ui/src/config.ts` file to use the coturn server:

```typescript
export const RTC_CONFIG: ConstructorParameters<typeof RTCPeerConnection>[0] = {
  iceServers: [
    {
      urls: "<turn_server_url>",
      username: "<turn_server_username>",
      credential: "<turn_server_credential>",
    },
  ],
};
```

## Bot customizations

### Enabling Speculative Speech Processing

Speculative speech processing reduces bot response latency by working directly on early interim Riva ASR user transcripts instead of waiting for final transcripts. This feature only works when using Riva ASR.

- Set the `ENABLE_SPECULATIVE_SPEECH` environment variable to `true` in `docker-compose.yml` under the `python-app` service.
- See the [ACE Controller Microservice documentation on Speculative Speech Processing](https://docs.nvidia.com/ace/ace-controller-microservice/1.0/user-guide.html#speculative-speech-processing) for more details.

### Switching ASR, LLM, and TTS Models

You can customize the ASR (Automatic Speech Recognition), LLM (Large Language Model), and TTS (Text-to-Speech) services by configuring environment variables. This allows you to switch between NIM cloud-hosted models and locally deployed models.

The following environment variables control the endpoints and models:

- `RIVA_ASR_URL`: Address of the Riva ASR (speech-to-text) service (e.g., `localhost:50051` for local, `grpc.nvcf.nvidia.com:443` for the cloud endpoint).
- `RIVA_TTS_URL`: Address of the Riva TTS (text-to-speech) service (e.g., `localhost:50051` for local, `grpc.nvcf.nvidia.com:443` for the cloud endpoint).
- `NVIDIA_LLM_URL`: URL of the NVIDIA LLM service (e.g., `http://<machine-ip>:8000/v1` for local, `https://integrate.api.nvidia.com/v1` for the cloud endpoint).

You can set the model, language, and voice using the `RIVA_ASR_MODEL`, `RIVA_TTS_MODEL`, `NVIDIA_LLM_MODEL`, `RIVA_ASR_LANGUAGE`, `RIVA_TTS_LANGUAGE`, and `RIVA_TTS_VOICE_ID` environment variables.

Update these variables in your Docker Compose configuration to match your deployment and desired models (see the sketch below). For more details on available models and configuration options, refer to the [NIM NVIDIA Magpie](https://build.nvidia.com/nvidia/magpie-tts-multilingual), [NIM NVIDIA Parakeet](https://build.nvidia.com/nvidia/parakeet-ctc-1_1b-asr/api), and [NIM META Llama](https://build.nvidia.com/meta/llama-3_1-8b-instruct) documentation.
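
For example, a sketch of pointing the pipeline at locally deployed services in `docker-compose.yml` (variable names and example values from the list above; service name from this example):

```yaml
services:
  python-app:
    environment:
      - RIVA_ASR_URL=localhost:50051
      - RIVA_TTS_URL=localhost:50051
      - NVIDIA_LLM_URL=http://<machine-ip>:8000/v1
```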

#### Example: Switching to the Llama 3.3-70B Model

To use larger LLMs like the Llama 3.3-70B model in your deployment, you need to update both the Docker Compose configuration and the environment variables for your Python application. Follow these steps (consolidated in the sketch after this list):

- In your `docker-compose.yml` file, find the `nvidia-llm` service section.
- Change the NIM image to the 70B model: `nvcr.io/nim/meta/llama-3.3-70b-instruct:latest`.
- Update the `device_ids` to allocate at least two GPUs (for example, `['2', '3']`).
- Under the `python-app` service, set `NVIDIA_LLM_MODEL=meta/llama-3.3-70b-instruct`.

#### Setting up Zero-shot Magpie Latest Model

Follow these steps to configure and use the latest Zero-shot Magpie TTS model:

1. **Update Docker Compose Configuration**

   Modify the `riva-tts-magpie` service in your docker-compose file with the following configuration:

   ```yaml
   riva-tts-magpie:
     image: <magpie-tts-zeroshot-image:version>  # Replace this with the actual image tag
     environment:
       - NGC_API_KEY=${ZEROSHOT_TTS_NVIDIA_API_KEY}
       - NIM_HTTP_API_PORT=9000
       - NIM_GRPC_API_PORT=50051
     ports:
       - "49000:50051"
     shm_size: 16GB
     deploy:
       resources:
         reservations:
           devices:
             - driver: nvidia
               device_ids: ['0']
               capabilities: [gpu]
   ```

   Set the API key in your environment:

   ```bash
   ZEROSHOT_TTS_NVIDIA_API_KEY=
   ```

2. **Configure TTS Voice Settings**

   Set the TTS voice environment variables:

   ```bash
   RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
   RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
   ```

- Add your audio prompt file to the workspace.
- Mount the audio file into your container by adding a volume in your `docker-compose.yml` under the `python-app` service:

  ```yaml
  services:
    python-app:
      # ... existing code ...
      volumes:
        - ./audio_prompts:/app/audio_prompts
  ```

- Set the `ZERO_SHOT_AUDIO_PROMPT` environment variable to the path relative to your application root:

  ```yaml
  environment:
    - ZERO_SHOT_AUDIO_PROMPT=audio_prompts/voice_sample.wav  # Path relative to app root
  ```

Note: The zero-shot audio prompt is only required when using the Magpie Zero-shot model. For standard Magpie multilingual models, this configuration should be omitted.

Added:

# Voice Agent WebRTC + LangGraph (Quick Start)

This example launches a complete voice agent stack:

- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LLM adapter, TTS)
- Static UI you can open in a browser

## 1) Mandatory environment variables

Create `.env` next to this README (or copy from `env.example`) and set at least:

- `NVIDIA_API_KEY` or `RIVA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `USE_LANGGRAPH=true`: enable LangGraph-backed LLM
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (any email for routing, e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)

Optional but commonly used:

- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot and a custom voice prompt
- `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- TURN/Twilio for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`

## 2) What it does

- Starts LangGraph dev server to serve local agents from `agents/`.
- Starts the Pipecat pipeline (`pipeline.py`) exposing:
  - HTTP: `http://<host>:7860` (health and RTC config)
  - WebSocket: `ws://<host>:7860/ws` for audio and transcripts
- Serves the built UI at `http://<host>:9000/` (via the container).

By default it uses:

- ASR: NVIDIA Riva (NIM) with `RIVA_API_KEY` and `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) with `RIVA_API_KEY` and `NVIDIA_TTS_FUNCTION_ID`

## 3) Run

### Option A: Docker (recommended)

From this directory:

```bash
docker compose up --build -d
```

Then open `http://<machine-ip>:9000/`.

Chrome on http origins: enable “Insecure origins treated as secure” at `chrome://flags/` and add `http://<machine-ip>:9000`.
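
To verify the stack came up, standard Docker Compose commands work here (service names depend on this example's `docker-compose.yml`):

```bash
docker compose ps        # all services should be running/healthy
docker compose logs -f   # tail logs if something fails to start
```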

### Option B: Python (local)

Requires Python 3.12 and `uv`.

```bash
uv run pipeline.py
```

Then start the UI from `ui/` (see `ui/README.md`).

## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)

The default TTS in `pipeline.py` is NVIDIA Riva Magpie via NIM:

```python
# From examples/voice_agent_webrtc_langgraph/pipeline.py
tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```

To use ElevenLabs instead:

1) Ensure the `pipecat` ElevenLabs dependency is available (already included via project deps).
2) Set environment:
   - `ELEVENLABS_API_KEY`
   - Optionally `ELEVENLABS_VOICE_ID` and model settings supported by ElevenLabs
3) Change the TTS construction in `pipeline.py` to use `ElevenLabsTTSServiceWithEndOfSpeech`:

```python
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace RivaTTSService(...) with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
```

That’s it. No other pipeline changes are required. The transcript synchronization already supports ElevenLabs end‑of‑speech events.

Notes for Magpie Zero‑shot:

- Provide `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`. You can also set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download it at startup.

## 5) Troubleshooting

- Healthcheck: `curl -f http://localhost:7860/get_prompt`
- If the UI can’t access the mic on http, use the Chrome flag above or host the UI via HTTPS.
- For NAT/firewall issues, configure TURN or provide Twilio credentials.
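
If you run your own TURN server, the environment variables listed in section 1 are the hook; a placeholder sketch:

```bash
# Placeholder TURN credentials; substitute your own server's values
TURN_SERVER_URL=turn:turn.example.com:3478
TURN_USERNAME=demo
TURN_PASSWORD=secret
```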