fciannella committed
Commit 06523e9 · 1 Parent(s): e9446cb

Added the readme
README.md CHANGED
@@ -1,92 +1,117 @@
- ---
- title: Ace Controller Pipeline
- emoji: 🐠
- colorFrom: indigo
- colorTo: gray
- sdk: docker
- pinned: false
- short_description: Voice Demos with Ace Controller
- ---

- # ACE Controller SDK

- The ACE Controller SDK allows you to build your own ACE Controller service to manage multimodal, real-time interactions with voice bots and avatars using NVIDIA ACE. With the SDK, you can create controllers that leverage the Python-based open-source [Pipecat framework](https://github.com/pipecat-ai/pipecat) for creating real-time, voice-enabled, and multimodal conversational AI agents. The SDK contains enhancements to the Pipecat framework, enabling developers to effortlessly customize, debug, and deploy complex pipelines while integrating robust NVIDIA services into the Pipecat ecosystem.

- ## Main Features

- - **Pipecat Extension:** A Pipecat extension to connect with ACE services and NVIDIA NIMs, facilitating the creation of human-avatar interactions. The NVIDIA Pipecat library augments [the Pipecat framework](https://github.com/pipecat-ai/pipecat) by adding additional frame processors and services, as well as new multimodal frames to enhance avatar interactions. This includes the integration of NVIDIA services and NIMs such as [NVIDIA Riva](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html), [NVIDIA Audio2Face](https://build.nvidia.com/nvidia/audio2face-3d), and [NVIDIA Foundational RAG](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline).

- - **HTTP and WebSocket Server Implementation:** The SDK provides a FastAPI-based HTTP and WebSocket server implementation compatible with ACE. It includes functionality for stream and pipeline management by offering new Pipecat pipeline runners and transports. For ease of use and distribution, this functionality is currently included in the `nvidia-pipecat` Python library as well.

- ## ACE Controller Microservice

- The ACE Controller SDK was used to build the [ACE Controller Microservice](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html). Check out the [ACE documentation](https://docs.nvidia.com/ace/tokkio/latest/customization/customization-options.html) for more details on how to configure the ACE Controller MS with your custom pipelines.

- ## Getting Started

- The NVIDIA Pipecat package is released as a wheel on PyPI. Create a Python virtual environment and use pip to install the `nvidia-pipecat` package.

- ```bash
- pip install nvidia-pipecat
- ```

- You can start building Pipecat pipelines using services from the NVIDIA Pipecat package. For more details, follow [the ACE Controller](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html) and [the Pipecat Framework](https://docs.pipecat.ai/getting-started/overview) documentation.
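For orientation, here is a minimal, hedged sketch of constructing one of these services and placing it in a Pipecat `Pipeline`. The `RivaTTSService` parameters mirror the example pipeline elsewhere in this repository; a real agent would also need a transport, ASR, an LLM service, and a runner, so treat this as a starting point rather than a working pipeline.

```python
# Minimal sketch (assumes RIVA_API_KEY is exported; defaults mirror the
# example pipeline in this repository).
import os

from pipecat.pipeline.pipeline import Pipeline
from nvidia_pipecat.services.riva_speech import RivaTTSService

tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
)

# A complete agent would chain transport input, ASR, LLM, TTS, and transport
# output; see the Pipecat documentation for full examples.
pipeline = Pipeline([tts])
```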

- ## Hacking on the framework itself

- If you wish to work directly with the source code or modify services from the nvidia-pipecat package, you can use either the UV or Nix development setup as outlined below.

- ### Using UV

- To get started, first install the [UV package manager](https://docs.astral.sh/uv/#highlights).

- Then, create a virtual environment with all the required dependencies by running the following commands:
 ```bash
- uv venv
- uv sync
- source .venv/bin/activate
 ```

- Once the environment is set up, you can begin building pipelines or modifying the services in the source code.

- If you wish to contribute your changes to the repository, please ensure you run the unit tests, linter, and formatting tool.

- To run unit tests, use:
- ```bash
- uv run pytest
- ```

- To format the code, use:
 ```bash
- ruff format
- ```

- To run the linter, use:
- ```bash
- ruff check
 ```

- ### Using Nix

- To set up your development environment using [Nix](https://nixos.org/download/#nix-install-linux), follow these steps:

- Initialize the development environment by running the following command:
- ```bash
- nix develop
 ```

- This setup provides you with a fully configured environment, allowing you to focus on development without worrying about dependency management.

- To ensure that all checks, such as formatting and linting, are passing for the repository, use the following command:

- ```bash
- nix flake check
- ```

- ## CONTRIBUTING

- We invite contributions! Open a GitHub issue or pull request! See contributing guidelines [here](./CONTRIBUTING.md).

+ # Voice Agent WebRTC + LangGraph (Quick Start)

+ This repository includes a complete voice agent stack:
+ - LangGraph dev server for local agents
+ - Pipecat-based speech pipeline (WebRTC, ASR, LangGraph LLM adapter, TTS)
+ - Static UI you can open in a browser

+ Primary example: `examples/voice_agent_webrtc_langgraph/`

+ ## 1) Mandatory environment variables
+ Create `.env` in `examples/voice_agent_webrtc_langgraph/` (copy from `env.example`) and set at least the following; a sample `.env` sketch follows the lists below:

+ - `RIVA_API_KEY` or `NVIDIA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
+ - `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
+ - `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
+ - `USER_EMAIL` (e.g. `test@example.com`)
+ - `LANGGRAPH_STREAM_MODE` (default `values`)
+ - `LANGGRAPH_DEBUG_STREAM` (default `true`)

+ Optional but useful:
+ - `RIVA_ASR_LANGUAGE` (default `en-US`)
+ - `RIVA_TTS_LANGUAGE` (default `en-US`)
+ - `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
+ - `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
+ - `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot with a custom audio prompt
+ - `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup
+ - `ENABLE_SPECULATIVE_SPEECH` (default `true`)
+ - `LANGGRAPH_AUTH_TOKEN` (or `AUTH0_ACCESS_TOKEN`/`AUTH_BEARER_TOKEN`) if your LangGraph server requires auth
+ - TURN/Twilio for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`
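For reference, a sample `.env` sketch using the defaults listed above (the API key value is a placeholder, and every entry can be adjusted or omitted per the lists):

```bash
# Example .env — values are illustrative; replace the key with your own.
RIVA_API_KEY=nvapi-REPLACE_ME
LANGGRAPH_BASE_URL=http://127.0.0.1:2024
LANGGRAPH_ASSISTANT=ace-base-agent
USER_EMAIL=test@example.com
LANGGRAPH_STREAM_MODE=values
LANGGRAPH_DEBUG_STREAM=true
# Optional overrides
RIVA_ASR_LANGUAGE=en-US
RIVA_TTS_LANGUAGE=en-US
RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
ENABLE_SPECULATIVE_SPEECH=true
```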

+ ## 2) What it does
+ - Starts the LangGraph dev server serving agents from `examples/voice_agent_webrtc_langgraph/agents/`.
+ - Starts the Pipecat pipeline (`pipeline.py`) exposing:
+   - HTTP: `http://<host>:7860` (health, RTC config)
+   - WebSocket: `ws://<host>:7860/ws` (audio + transcripts)
+ - Serves the built UI at `http://<host>:9000/` (via Docker).

+ Defaults:
+ - ASR: NVIDIA Riva (NIM) via `RIVA_API_KEY` and the built-in `NVIDIA_ASR_FUNCTION_ID`
+ - LLM: LangGraph adapter, streaming from the selected assistant
+ - TTS: NVIDIA Riva Magpie (NIM) via `RIVA_API_KEY` and the built-in `NVIDIA_TTS_FUNCTION_ID`

+ ## 3) Run

+ ### Option A: Docker (recommended)
+ From `examples/voice_agent_webrtc_langgraph/`:

 ```bash
+ docker compose up --build -d
 ```

+ Then open `http://<machine-ip>:9000/`.

+ Chrome on http origins: enable “Insecure origins treated as secure” at `chrome://flags/` and add `http://<machine-ip>:9000`.

+ ### Option B: Python (local)
+ Requires Python 3.12 and `uv`.

 ```bash
+ cd examples/voice_agent_webrtc_langgraph
+ uv run pipeline.py
 ```
+ Then start the UI from `ui/` (see `examples/voice_agent_webrtc_langgraph/ui/README.md`).

+ ## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)
+ The default TTS in `examples/voice_agent_webrtc_langgraph/pipeline.py` is NVIDIA Riva Magpie via NIM:

+ ```python
+ from nvidia_pipecat.services.riva_speech import RivaTTSService
+
+ tts = RivaTTSService(
+     api_key=os.getenv("RIVA_API_KEY"),
+     function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
+     voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
+     model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
+     language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
+     zero_shot_audio_prompt_file=(
+         Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
+     ),
+ )
 ```

+ To use ElevenLabs instead:
+ 1) Ensure ElevenLabs support is available (included via project deps).
+ 2) Set environment:
+    - `ELEVENLABS_API_KEY`
+    - Optionally `ELEVENLABS_VOICE_ID` and any model-specific settings
+ 3) Edit `examples/voice_agent_webrtc_langgraph/pipeline.py` to import and construct the ElevenLabs TTS service:
+
+ ```python
+ from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech
+
+ # Replace the RivaTTSService(...) block with:
+ tts = ElevenLabsTTSServiceWithEndOfSpeech(
+     api_key=os.getenv("ELEVENLABS_API_KEY"),
+     voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
+     sample_rate=16000,
+     channels=1,
+ )
+ ```

+ No other pipeline changes are required; transcript synchronization supports ElevenLabs end‑of‑speech events.

+ Notes for Magpie Zero‑shot:
+ - Set `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
+ - If using a custom voice prompt, mount it via `docker-compose.yml` (as sketched below) and set `ZERO_SHOT_AUDIO_PROMPT`, or set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download on startup.
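A minimal sketch of that mount and environment setting, based on the older example configuration in this repository (the `audio_prompts` directory and file name are illustrative):

```yaml
# docker-compose.yml sketch — the prompt path and file name are illustrative.
services:
  python-app:
    volumes:
      - ./audio_prompts:/app/audio_prompts
    environment:
      - ZERO_SHOT_AUDIO_PROMPT=audio_prompts/voice_sample.wav  # path relative to the app root
```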

+ ## 5) Troubleshooting
+ - Healthcheck: `curl -f http://localhost:7860/get_prompt`
+ - If the UI can’t access the mic on http, use the Chrome flag above or host the UI via HTTPS.
+ - For NAT/firewall issues, configure TURN or provide Twilio credentials.

examples/voice_agent_webrtc_langgraph/README.md CHANGED
@@ -1,186 +1,112 @@
- # Speech to Speech Demo

- In this example, we showcase how to build a speech-to-speech voice assistant pipeline using WebRTC with real-time transcripts. It uses a Pipecat pipeline with FastAPI on the backend and React on the frontend. The pipeline uses a WebRTC-based SmallWebRTCTransport, Riva ASR and TTS models, and the NVIDIA LLM service.

- ## Prerequisites
- - You have access to and are logged into NVIDIA NGC. For step-by-step instructions, refer to [the NGC Getting Started Guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#registering-activating-ngc-account).
- - You have Docker installed with support for NVIDIA GPUs. For more information, refer to [the Support Matrix](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/support-matrix.html#support-matrix).

- ## Set up API keys

- 1. Copy and configure the environment file:

- ```bash
- cp env.example .env # and add your credentials
- ```

- 2. Ensure you have the required API keys:
-    - NVIDIA_API_KEY - Required for accessing NIM ASR, TTS, and LLM models
-    - (Optional) ZEROSHOT_TTS_NVIDIA_API_KEY - Required for zero-shot TTS

- ## Option 1: Deploy Using Docker

- From the `examples/voice_agent_webrtc` directory, run:

 ```bash
 docker compose up --build -d
 ```

- Then visit `http://<machine-ip>:9000/` in your browser to start interacting with the application.

- Note: To enable microphone access in Chrome, go to `chrome://flags/`, enable "Insecure origins treated as secure", add `http://<machine-ip>:9000` to the list, and restart Chrome.

- ## Option 2: Deploy Using Python Environment

- ### Requirements

- - Python (>=3.12)
- - [uv](https://github.com/astral-sh/uv)

- All Python dependencies are listed in `pyproject.toml` and can be installed with `uv`.

- ### Run

 ```bash
 uv run pipeline.py
 ```

- Then run the UI following [`ui/README.md`](ui/README.md).

- ## Using Coturn Server

- If you want to share the demo widely or deploy on cloud platforms, you will need to set up a coturn server. Follow the instructions below for the modifications required in the example code to use coturn:

- ### Deploy Coturn Server

- Update `HOST_IP_EXTERNAL` and run the command below:

- ```bash
- docker run -d --network=host instrumentisto/coturn -n --verbose --log-file=stdout --external-ip=<HOST_IP_EXTERNAL> --listening-ip=<HOST_IP_EXTERNAL> --lt-cred-mech --fingerprint --user=admin:admin --no-multicast-peers --realm=tokkio.realm.org --min-port=51000 --max-port=52000
 ```

- #### Update pipeline.py

- Add the following configuration to your `pipeline.py` file to use the coturn server:

 ```python
- ice_servers = [
-     IceServer(
-         urls="<TURN_SERVER_URL>",
-         username="<TURN_USERNAME>",
-         credential="<TURN_PASSWORD>"
-     )
- ]
- ```

- #### Update ui/src/config.ts

- Add the following configuration to your `ui/src/config.ts` file to use the coturn server:

- ```typescript
- export const RTC_CONFIG: ConstructorParameters<typeof RTCPeerConnection>[0] = {
-   iceServers: [
-     {
-       urls: "<turn_server_url>",
-       username: "<turn_server_username>",
-       credential: "<turn_server_credential>",
-     },
-   ],
- };
- ```

- ## Bot customizations

- ### Enabling Speculative Speech Processing

- Speculative speech processing reduces bot response latency by working directly on early interim user transcripts from Riva ASR instead of waiting for final transcripts. This feature only works when using Riva ASR.

- - Set the `ENABLE_SPECULATIVE_SPEECH` environment variable to `true` in `docker-compose.yml` under the `python-app` service.
- - See the [ACE Controller Microservice documentation on Speculative Speech Processing](https://docs.nvidia.com/ace/ace-controller-microservice/1.0/user-guide.html#speculative-speech-processing) for more details.

- ### Switching ASR, LLM, and TTS Models

- You can easily customize the ASR (Automatic Speech Recognition), LLM (Large Language Model), and TTS (Text-to-Speech) services by configuring environment variables. This allows you to switch between NIM cloud-hosted models and locally deployed models.

- The following environment variables control the endpoints and models:

- - `RIVA_ASR_URL`: Address of the Riva ASR (speech-to-text) service (e.g., `localhost:50051` for local, `grpc.nvcf.nvidia.com:443` for the cloud endpoint).
- - `RIVA_TTS_URL`: Address of the Riva TTS (text-to-speech) service (e.g., `localhost:50051` for local, `grpc.nvcf.nvidia.com:443` for the cloud endpoint).
- - `NVIDIA_LLM_URL`: URL for the NVIDIA LLM service (e.g., `http://<machine-ip>:8000/v1` for local, `https://integrate.api.nvidia.com/v1` for the cloud endpoint).

- You can set the model, language, and voice using the `RIVA_ASR_MODEL`, `RIVA_TTS_MODEL`, `NVIDIA_LLM_MODEL`, `RIVA_ASR_LANGUAGE`, `RIVA_TTS_LANGUAGE`, and `RIVA_TTS_VOICE_ID` environment variables.

- Update these variables in your Docker Compose configuration to match your deployment and desired models; a sample local configuration is sketched below. For more details on available models and configuration options, refer to the [NIM NVIDIA Magpie](https://build.nvidia.com/nvidia/magpie-tts-multilingual), [NIM NVIDIA Parakeet](https://build.nvidia.com/nvidia/parakeet-ctc-1_1b-asr/api), and [NIM META Llama](https://build.nvidia.com/meta/llama-3_1-8b-instruct) documentation.
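For example, a local deployment might be configured as sketched below (the hosts echo the examples above; the model name assumes the Llama 3.1 8B NIM referenced above):

```bash
# Illustrative environment values for locally deployed services.
RIVA_ASR_URL=localhost:50051
RIVA_TTS_URL=localhost:50051
NVIDIA_LLM_URL=http://<machine-ip>:8000/v1
NVIDIA_LLM_MODEL=meta/llama-3.1-8b-instruct
```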

- #### Example: Switching to the Llama 3.3-70B Model

- To use a larger LLM such as the Llama 3.3-70B model in your deployment, you need to update both the Docker Compose configuration and the environment variables for your Python application. Follow these steps (a compose sketch follows the list):

- - In your `docker-compose.yml` file, find the `nvidia-llm` service section.
- - Change the NIM image to the 70B model: `nvcr.io/nim/meta/llama-3.3-70b-instruct:latest`
- - Update the `device_ids` to allocate at least two GPUs (for example, `['2', '3']`).
- - Update the environment variable under the `python-app` service to `NVIDIA_LLM_MODEL=meta/llama-3.3-70b-instruct`
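Taken together, a sketch of the affected parts of `docker-compose.yml` (only the fields mentioned above are shown; the surrounding service definitions and the exact GPU layout are assumptions about your setup):

```yaml
# Sketch: only the fields touched by the steps above.
services:
  nvidia-llm:
    image: nvcr.io/nim/meta/llama-3.3-70b-instruct:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['2', '3']  # at least two GPUs
              capabilities: [gpu]
  python-app:
    environment:
      - NVIDIA_LLM_MODEL=meta/llama-3.3-70b-instruct
```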

- #### Setting up the Latest Zero-shot Magpie Model

- Follow these steps to configure and use the latest Zero-shot Magpie TTS model:

- 1. **Update Docker Compose Configuration**

-    Modify the `riva-tts-magpie` service in your docker-compose file with the following configuration:

-    ```yaml
-    riva-tts-magpie:
-      image: <magpie-tts-zeroshot-image:version> # Replace this with the actual image tag
-      environment:
-        - NGC_API_KEY=${ZEROSHOT_TTS_NVIDIA_API_KEY}
-        - NIM_HTTP_API_PORT=9000
-        - NIM_GRPC_API_PORT=50051
-      ports:
-        - "49000:50051"
-      shm_size: 16GB
-      deploy:
-        resources:
-          reservations:
-            devices:
-              - driver: nvidia
-                device_ids: ['0']
-                capabilities: [gpu]
 ```

-    - Ensure your `ZEROSHOT_TTS_NVIDIA_API_KEY` is properly set in your `.env` file:
-      ```bash
-      ZEROSHOT_TTS_NVIDIA_API_KEY=
-      ```

- 2. **Configure TTS Voice Settings**

-    Update the following environment variables under the `python-app` service:

-    ```bash
-    RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
-    RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
-    ```

- 3. **Zero-shot Audio Prompt Configuration**

-    To use a custom voice with zero-shot learning:

-    - Add your audio prompt file to the workspace.
-    - Mount the audio file into your container by adding a volume in your `docker-compose.yml` under the `python-app` service:
-      ```yaml
-      services:
-        python-app:
-          # ... existing code ...
-          volumes:
-            - ./audio_prompts:/app/audio_prompts
-      ```
-    - Set the `ZERO_SHOT_AUDIO_PROMPT` environment variable to the path relative to your application root:
-      ```yaml
-      environment:
-        - ZERO_SHOT_AUDIO_PROMPT=audio_prompts/voice_sample.wav # Path relative to app root
-      ```

- Note: The zero-shot audio prompt is only required when using the Magpie Zero-shot model. For standard Magpie multilingual models, this configuration should be omitted.

+ # Voice Agent WebRTC + LangGraph (Quick Start)

+ This example launches a complete voice agent stack:
+ - LangGraph dev server for local agents
+ - Pipecat-based speech pipeline (WebRTC, ASR, LLM adapter, TTS)
+ - Static UI you can open in a browser

+ ## 1) Mandatory environment variables
+ Create `.env` next to this README (or copy from `env.example`) and set at least:

+ - `NVIDIA_API_KEY` or `RIVA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
+ - `USE_LANGGRAPH=true`: enable the LangGraph-backed LLM
+ - `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
+ - `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
+ - `USER_EMAIL` (any email for routing, e.g. `test@example.com`)
+ - `LANGGRAPH_STREAM_MODE` (default `values`)
+ - `LANGGRAPH_DEBUG_STREAM` (default `true`)

+ Optional but commonly used:
+ - `RIVA_ASR_LANGUAGE` (default `en-US`)
+ - `RIVA_TTS_LANGUAGE` (default `en-US`)
+ - `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
+ - `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
+ - `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot and a custom voice prompt
+ - `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup
+ - `ENABLE_SPECULATIVE_SPEECH` (default `true`)
+ - TURN/Twilio for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`

+ ## 2) What it does
+ - Starts the LangGraph dev server to serve local agents from `agents/`.
+ - Starts the Pipecat pipeline (`pipeline.py`) exposing:
+   - HTTP: `http://<host>:7860` (health and RTC config)
+   - WebSocket: `ws://<host>:7860/ws` for audio and transcripts
+ - Serves the built UI at `http://<host>:9000/` (via the container).

+ By default it uses:
+ - ASR: NVIDIA Riva (NIM) with `RIVA_API_KEY` and `NVIDIA_ASR_FUNCTION_ID`
+ - LLM: LangGraph adapter streaming from the selected assistant
+ - TTS: NVIDIA Riva Magpie (NIM) with `RIVA_API_KEY` and `NVIDIA_TTS_FUNCTION_ID`

+ ## 3) Run

+ ### Option A: Docker (recommended)
+ From this directory:

 ```bash
 docker compose up --build -d
 ```

+ Then open `http://<machine-ip>:9000/`.

+ Chrome on http origins: enable “Insecure origins treated as secure” at `chrome://flags/` and add `http://<machine-ip>:9000`.

+ ### Option B: Python (local)
+ Requires Python 3.12 and `uv`.

 ```bash
 uv run pipeline.py
 ```
+ Then start the UI from `ui/` (see `ui/README.md`).

+ ## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)
+ The default TTS in `pipeline.py` is NVIDIA Riva Magpie via NIM:

+ ```python
+ tts = RivaTTSService(
+     api_key=os.getenv("RIVA_API_KEY"),
+     function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
+     voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
+     model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
+     language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
+     zero_shot_audio_prompt_file=(
+         Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
+     ),
+ )
 ```

+ To use ElevenLabs instead:
+ 1) Ensure the `pipecat` ElevenLabs dependency is available (already included via project deps).
+ 2) Set environment:
+    - `ELEVENLABS_API_KEY`
+    - Optionally `ELEVENLABS_VOICE_ID` and model settings supported by ElevenLabs
+ 3) Change the TTS construction in `pipeline.py` to use `ElevenLabsTTSServiceWithEndOfSpeech`:

 ```python
+ from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech
+
+ # Replace RivaTTSService(...) with:
+ tts = ElevenLabsTTSServiceWithEndOfSpeech(
+     api_key=os.getenv("ELEVENLABS_API_KEY"),
+     voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
+     sample_rate=16000,
+     channels=1,
+ )
 ```

+ That’s it. No other pipeline changes are required; the transcript synchronization already supports ElevenLabs end‑of‑speech events.

+ Notes for Magpie Zero‑shot:
+ - Provide `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
+ - If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`. You can also set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download it at startup.

+ ## 5) Troubleshooting
+ - Healthcheck: `curl -f http://localhost:7860/get_prompt`
+ - If the UI can’t access the mic on http, use the Chrome flag above or host the UI via HTTPS.
+ - For NAT/firewall issues, configure TURN or Twilio credentials.