# Performance Testing
This directory contains tools for evaluating the voice agent pipeline's latency, throughput, and scalability under various loads. The tests simulate real-world scenarios where multiple users interact with the voice agent simultaneously.
## What the Tests Do
The performance tests:
- Open WebSocket clients that simulate user interactions
- Use pre-recorded audio files from `audio_files/` as user queries (a minimal client sketch follows this list)
- Send these queries to the voice agent pipeline and measure response times
- Track latency metrics, including end-to-end latency and component-wise breakdowns
- Can simulate multiple concurrent clients to test scaling
- Detect any audio glitches during processing
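Conceptually, each simulated user boils down to a loop that streams audio and times the response. The sketch below is only an illustration under assumed details: it uses the `websockets` package, a hypothetical `/ws` endpoint, raw PCM framing, and a placeholder file name `audio_files/query1.wav`; the actual benchmark clients use the pipeline's real transport and message format.

```python
# Minimal sketch of one simulated user, assuming raw PCM audio streamed over a
# plain WebSocket. The "/ws" path, message framing, and the file name
# "audio_files/query1.wav" are placeholders, not the benchmark's actual protocol.
import asyncio
import time
import wave

import websockets  # pip install websockets


async def run_client(uri: str, audio_path: str, chunk_ms: int = 20) -> float:
    """Stream a pre-recorded query and return seconds until the first server message."""
    async with websockets.connect(uri) as ws:
        with wave.open(audio_path, "rb") as wav:
            frames_per_chunk = int(wav.getframerate() * chunk_ms / 1000)
            start = time.monotonic()
            while True:
                chunk = wav.readframes(frames_per_chunk)
                if not chunk:
                    break
                await ws.send(chunk)                 # stream audio in near-real-time chunks
                await asyncio.sleep(chunk_ms / 1000)
        await ws.recv()                              # block until the first response frame
        return time.monotonic() - start


if __name__ == "__main__":
    latency = asyncio.run(run_client("ws://0.0.0.0:8100/ws", "audio_files/query1.wav"))
    print(f"Time to first response: {latency:.3f}s")
```

The real benchmark clients layer glitch detection and per-component timing on top of this basic send-and-measure loop.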
## Running Performance Tests
### 1. Start the Voice Agent Pipeline
First, start the voice agent pipeline and capture server logs for analysis.
See the prerequisites and setup instructions in `examples/speech-to-speech/README.md` before proceeding.
#### If Using Docker
From the `examples/speech-to-speech/` directory, run:
```bash
# Start the services
docker compose up -d

# Capture logs and save them into a file
docker compose logs -f python-app > bot_logs_test1.txt 2>&1
```
Before starting a new performance run:
```bash
# Clear existing Docker logs
sudo truncate -s 0 /var/lib/docker/containers/$(docker compose ps -q python-app)/$(docker compose ps -q python-app)-json.log
```
#### If Using Python Environment
From the `examples/speech-to-speech/` directory, run:
```bash
python bot.py > bot_logs_test1.txt 2>&1
```
### 2. Run the Multi-Client Benchmark
```bash
./run_multi_client_benchmark.sh --host 0.0.0.0 --port 8100 --clients 10 --test-duration 150
```
Parameters (a concurrency-sweep sketch follows this list):
- `--host`: The host address (default: `0.0.0.0`)
- `--port`: The port where your voice agent is running (default: `8100`)
- `--clients`: Number of concurrent clients to simulate (default: `1`)
- `--test-duration`: Duration of the test in seconds (default: `150`)
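To compare scaling behavior across client counts, one option is to re-run the benchmark with increasing `--clients` values. The sketch below is a hypothetical wrapper; only the flags themselves come from the script's documented interface above.

```python
# Hypothetical wrapper that re-runs the benchmark at increasing client counts.
# Only the flag names and values follow the documented interface above.
import subprocess

for clients in (1, 5, 10, 20):
    print(f"--- Running benchmark with {clients} concurrent client(s) ---")
    subprocess.run(
        [
            "./run_multi_client_benchmark.sh",
            "--host", "0.0.0.0",
            "--port", "8100",
            "--clients", str(clients),
            "--test-duration", "150",
        ],
        check=True,  # abort the sweep if a run fails
    )
```

If you sweep like this, clear or rotate the server log file between runs (see step 1) so each log covers exactly one test.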
The script will:
- Start the specified number of concurrent clients
- Simulate user interactions using audio files
- Measure latencies and detect audio glitches
- Save detailed results in the `results/` directory as JSON files (see the post-processing sketch after this list)
- Output a summary to the console
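The schema of the JSON result files is not documented here, so the field name used below is a placeholder; the sketch simply shows one way the files in `results/` could be post-processed.

```python
# Hypothetical post-processing of the per-run JSON files in results/.
# "e2e_latency_ms" is a placeholder field name; adjust it to the actual schema.
import json
import statistics
from pathlib import Path

for path in sorted(Path("results").glob("*.json")):
    data = json.loads(path.read_text())
    latencies = data.get("e2e_latency_ms", [])
    if len(latencies) >= 2:
        p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th percentile
        print(f"{path.name}: avg={statistics.mean(latencies):.1f} ms, p95={p95:.1f} ms")
    else:
        print(f"{path.name}: not enough latency samples")
```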
### 3. Analyze Component-wise Latency
After the benchmark completes, analyze the server logs for detailed latency breakdowns:
```bash
python ttfb_analyzer.py <relative_path_to_bot_logs_test1.txt>
```
This will show:
- Per-client latency metrics for LLM, TTS, and ASR components
- Number of calls made by each client
- Overall averages and P95 values
- Component-wise timing breakdowns
## Understanding the Results
The metrics measured include the following (a TTFB measurement sketch follows this list):
- LLM TTFB: Time to first byte from the LLM model
- TTS TTFB: Time to first byte from the TTS model
- ASR Lat: Compute latency of the ASR model
- LLM 1st: Time taken to generate the first complete sentence from the LLM
- Calls: Number of API calls made to each service
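"Time to first byte" (TTFB) here means the gap between issuing a request to a component and receiving its first streamed chunk. A minimal illustration, where `stream_response` is a stand-in for whatever streaming call the pipeline actually makes:

```python
# Illustration of a TTFB measurement: timestamp the request, then the first
# streamed chunk. "stream_response" is a stand-in for the real streaming call.
import time


def measure_ttfb(stream_response, prompt: str) -> float:
    start = time.monotonic()
    for _first_chunk in stream_response(prompt):
        return time.monotonic() - start  # seconds until the first chunk arrives
    raise RuntimeError("stream produced no output")
```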
The results help identify:
- Performance bottlenecks in specific components
- Scaling behavior under concurrent load
- Potential audio quality issues
- Overall system responsiveness