Healthcare Agent Testing Guide

This guide provides test scenarios for validating the improved tool-calling behavior, especially the two-step secret question authentication flow.

Test Patients

Two test patients are available in mock_data/patients.json:

Patient 1: John Marshall

Full Name: John Marshall
DOB: 1960-01-01 (January 1st, 1960)
MRN Last-4: 0001
Secret Question: "What is your favorite color?"
Secret Answer: blue
Allergies: Penicillin
Medications: Acetaminophen 500mg PRN

Patient 2: Francesco Ciannella

Full Name: Francesco Ciannella
DOB: 1990-01-01 (January 1st, 1990)
MRN Last-4: 6001
Secret Question: "What city were you born in?"
Secret Answer: rome
Allergies: NKA (No Known Allergies)
Medications: Lisinopril 10mg daily

Test Scenarios

✅ Test 1: Secret Question Flow (MOST IMPORTANT)

Purpose: Verify the agent correctly handles the two-step secret question authentication.

Steps:

Say: "My name is John Marshall"
Agent should call find_patient() and greet you
Say: "January 1st, 1960" (DOB)
Agent should call verify_identity(dob=...)
EXPECTED: Agent should ASK: "For security, what is your favorite color?"
Say: "Blue"
Agent should call verify_identity(dob=..., secret_answer="Blue")
EXPECTED: Agent confirms verification and asks about symptoms

Success Criteria:

✅ Agent explicitly asks the secret question
✅ Agent waits for your answer before proceeding
✅ Agent confirms "You're verified" after correct answer
✅ Agent does NOT ask for MRN last-4

Common Failure Modes (Old Behavior):

❌ Agent skips asking the secret question
❌ Agent asks for MRN last-4 instead of secret question
❌ Agent claims verification before getting secret answer
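The two-step pattern in this test can be sketched in a few lines. This is a hypothetical stand-in written only to illustrate the flow, not the project's actual verify_identity tool: the function name, the patient field names, and the "dob" value in needs are assumptions. The verified, needs, and question fields follow the log output and prompt wording quoted later in this guide.

```python
from typing import Optional

# Test patient from this guide. The field names are assumed and may not match
# the real schema in mock_data/patients.json.
PATIENT = {
    "dob": "1960-01-01",
    "mrn_last4": "0001",
    "secret_question": "What is your favorite color?",
    "secret_answer": "blue",
}


def verify_identity_sketch(
    patient: dict,
    dob: str,
    mrn_last4: Optional[str] = None,
    secret_answer: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in for verify_identity, used only to show the flow."""
    if dob != patient["dob"]:
        return {"verified": False, "needs": ["dob"]}
    if mrn_last4 is not None and mrn_last4 == patient["mrn_last4"]:
        return {"verified": True, "needs": []}
    if secret_answer is not None:
        # Compare case-insensitively so "Blue" matches the stored "blue".
        if secret_answer.strip().lower() == patient["secret_answer"]:
            return {"verified": True, "needs": []}
        # Wrong answer: fall back to the MRN last-4 path.
        return {"verified": False, "needs": ["mrn_last4"]}
    # DOB alone is not enough: hand the question back for the agent to ask.
    return {
        "verified": False,
        "needs": ["mrn_last4_or_secret"],
        "question": patient["secret_question"],
    }


# First call: DOB only. Second call: DOB plus the caller's answer.
first = verify_identity_sketch(PATIENT, dob="1960-01-01")
second = verify_identity_sketch(PATIENT, dob="1960-01-01", secret_answer="Blue")
```

Under these assumptions, the first call returns verified=False together with the question, and only the second call returns verified=True. The agent must not claim verification between the two.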

✅ Test 2: Wrong Secret Answer

Purpose: Verify agent handles incorrect secret answers gracefully.

Steps:

Say: "My name is John Marshall"
Say: "January 1st, 1960" (DOB)
Agent asks: "What is your favorite color?"
Say: "Red" (WRONG - correct is "blue")

Expected Behavior:

✅ Agent says verification failed
✅ Agent asks for MRN last-4 as alternative: "Could you provide the last 4 digits of your MRN?"
✅ Agent does NOT proceed to medical questions

✅ Test 3: MRN Last-4 Path (Alternative Auth)

Purpose: Verify agent accepts MRN last-4 without asking secret question.

Steps:

Say: "My name is John Marshall"
Say: "January 1st, 1960, and MRN last-4 is 0001"

Expected Behavior:

✅ Agent calls verify_identity(dob=..., mrn_last4="0001")
✅ Agent confirms verification immediately
✅ Agent does NOT ask secret question (since MRN provided)

✅ Test 4: Allergy Awareness

Purpose: Verify agent considers allergies before recommending medications.

Steps:

Complete authentication as John Marshall
Say: "I have a headache and a sore throat"
Agent calls get_patient_profile_tool() and triage_symptoms_tool()

Expected Behavior:

✅ Agent mentions: "You have a penicillin allergy"
✅ Agent recommends acetaminophen (which the patient already takes as needed)
✅ Agent does NOT recommend penicillin-based antibiotics
✅ Agent references existing medication: "Since you're already taking acetaminophen as needed..."
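If you review many transcripts, a rough keyword check can flag replies that miss these expectations. The helper below is a hypothetical sketch, not part of the agent: the function name, the two-entry drug list, and the sample replies are assumptions made for illustration, and plain keyword matching is no substitute for reading the transcript.

```python
import re
from typing import List

# Illustrative, incomplete list of penicillin-class drug names (assumption).
PENICILLIN_CLASS = ("amoxicillin", "ampicillin")


def check_allergy_reply(reply: str) -> List[str]:
    """Return the problems found in an agent reply; an empty list means pass."""
    text = reply.lower()
    problems = []
    if "penicillin allergy" not in text and "allergic to penicillin" not in text:
        problems.append("allergy not mentioned")
    for drug in PENICILLIN_CLASS:
        if re.search(rf"\b{re.escape(drug)}\b", text):
            problems.append(f"mentions penicillin-class drug: {drug}")
    if "acetaminophen" not in text:
        problems.append("acetaminophen not mentioned")
    return problems


# Sample replies written for this sketch, not captured from the agent.
good_reply = (
    "You have a penicillin allergy. Since you're already taking "
    "acetaminophen as needed, you can use it for the headache."
)
bad_reply = "I recommend amoxicillin for your sore throat."

good_problems = check_allergy_reply(good_reply)
bad_problems = check_allergy_reply(bad_reply)
```

Here good_problems is empty, while bad_problems lists all three issues.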

✅ Test 5: Urgent Triage (Chest Pain)

Purpose: Verify agent correctly escalates urgent symptoms.

Steps:

Complete authentication as Francesco Ciannella
Say: "I'm having severe chest pain and shortness of breath"

Expected Behavior:

✅ Agent calls triage_symptoms_tool(symptoms_text="severe chest pain and shortness of breath")
✅ Tool returns: {"risk": "urgent", "red_flags": ["chest pain"]}
✅ Agent immediately says: "Chest pain can be serious. Please call 911 now or go to the nearest emergency room."
✅ Agent does NOT offer to book a regular appointment
✅ Agent emphasizes urgency
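The escalation rules above can be turned into a small assertion helper for transcript review. This is a minimal sketch under stated assumptions: the helper name is hypothetical, the triage result is assumed to be a dict shaped like the one shown in this test, and the keyword checks are deliberately crude (a reply such as "do not wait for an appointment" would be flagged and needs manual review).

```python
from typing import List


def check_urgent_escalation(triage_result: dict, reply: str) -> List[str]:
    """Return the problems found in an agent reply; an empty list means pass."""
    problems: List[str] = []
    if triage_result.get("risk") != "urgent":
        return problems  # Nothing to check for non-urgent results.
    text = reply.lower()
    if "911" not in text and "emergency" not in text:
        problems.append("no emergency instruction")
    if "appointment" in text:
        problems.append("offers appointment during urgent triage")
    return problems


triage = {"risk": "urgent", "red_flags": ["chest pain"]}
expected_reply = (
    "Chest pain can be serious. Please call 911 now or go to the "
    "nearest emergency room."
)
# Made-up failing reply, used only to exercise the helper.
failing_reply = "Would you like a telehealth appointment?"

ok_problems = check_urgent_escalation(triage, expected_reply)
failing_problems = check_urgent_escalation(triage, failing_reply)
```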

✅ Test 6: Self-Care with Appointment Booking

Purpose: Verify full flow for non-urgent symptoms with appointment.

Steps:

Complete authentication as John Marshall
Say: "I have a mild headache and feel tired, but no fever, no neck stiffness"
Agent triages and provides self-care advice
Say: "Yes, I'd like to schedule an appointment"

Expected Behavior:

✅ Agent calls triage_symptoms_tool() → returns risk: "self_care"
✅ Agent provides advice: "Try rest, hydration, and acetaminophen as directed"
✅ Agent offers appointment: "Would you like a telehealth appointment?"
✅ Agent calls list_providers_tool() → presents options
✅ Agent calls get_provider_slots_tool() → shows times in friendly format
✅ After you choose, agent calls schedule_appointment_tool()
✅ Agent confirms: "Booked. I'll send details to your phone ending in 0101."
✅ Agent calls get_preferred_pharmacy_tool() and confirms pharmacy
✅ At end, agent calls log_call_tool() silently

✅ Test 7: Francesco Ciannella Secret Question

Purpose: Test second patient with different secret question.

Steps:

Say: "My name is Francesco Ciannella"
Say: "January 1st, 1990"
EXPECTED: Agent asks "For security, what city were you born in?"
Say: "Rome"

Expected Behavior:

✅ Agent asks the correct secret question for Francesco
✅ Agent accepts "Rome" (case-insensitive)
✅ Agent confirms verification

✅ Test 8: Profile Not Found

Purpose: Verify agent handles unknown patients gracefully.

Steps:

Say: "My name is Jane Doe"

Expected Behavior:

✅ Agent calls find_patient(full_name="Jane Doe")
✅ Returns: {}
✅ Agent says: "I'm not finding you in our system. Could you verify the spelling of your name?"
✅ Agent does NOT proceed to verification without patient_id

Debugging Tips

Check Logs

Look for these log entries in app.log:

INFO - LLM tool_calls: ['find_patient']
INFO - tool find_patient result: {"patient_id": "pt_jmarshall", ...}
INFO - LLM tool_calls: ['verify_identity']
INFO - verify_identity: verified=False needs=['mrn_last4_or_secret']
INFO - LLM content: For security, what is your favorite color?
INFO - verify_identity: verified=True needs=[]
INFO - LLM tool_calls: ['get_patient_profile_tool']
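Rather than scanning app.log by eye, you can pull the tool-call order and the verification results out with a short script. The sketch below assumes the log lines look exactly like the excerpt above; the function names are hypothetical, and the sample text omits the tool-result line because its payload is elided in the excerpt.

```python
import ast
import re
from typing import List

TOOL_CALLS_RE = re.compile(r"LLM tool_calls: (\[.*\])")
VERIFIED_RE = re.compile(r"verify_identity: verified=(True|False)")


def extract_tool_calls(log_text: str) -> List[str]:
    """Collect tool names, in order, from 'LLM tool_calls: [...]' lines."""
    calls: List[str] = []
    for line in log_text.splitlines():
        match = TOOL_CALLS_RE.search(line)
        if match:
            # The list is printed as a Python literal, so parse it safely.
            calls.extend(ast.literal_eval(match.group(1)))
    return calls


def extract_verified_states(log_text: str) -> List[bool]:
    """Collect the verified=... values logged by verify_identity, in order."""
    return [value == "True" for value in VERIFIED_RE.findall(log_text)]


SAMPLE_LOG = """\
INFO - LLM tool_calls: ['find_patient']
INFO - LLM tool_calls: ['verify_identity']
INFO - verify_identity: verified=False needs=['mrn_last4_or_secret']
INFO - LLM content: For security, what is your favorite color?
INFO - verify_identity: verified=True needs=[]
INFO - LLM tool_calls: ['get_patient_profile_tool']
"""

calls = extract_tool_calls(SAMPLE_LOG)
verified_states = extract_verified_states(SAMPLE_LOG)
```

For the secret question flow, verified_states should go from False to True, and get_patient_profile_tool should appear only after the True entry.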

Common Issues

Issue: Agent doesn't ask secret question

Check: The tool docstring includes the 7-step CRITICAL SECRET QUESTION FLOW section
Check: System prompt has explicit secret question instructions in step 2

Issue: Agent asks for MRN instead of secret question

Fix: Ensure prompt says "If 'question' field is present: READ THE EXACT QUESTION"

Issue: Agent proceeds without verification

Check: Prompt has "NEVER claim verification until verified=true"
Check: Tool auto-injects patient_id

Expected Tool Call Sequences

Minimal Flow (MRN path):

1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...", mrn_last4="0001")
3. get_patient_profile_tool()
4. triage_symptoms_tool(symptoms_text="...")
5. log_call_tool(notes="...", triage_json="...")

Secret Question Flow:

1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...")                    ← First call
3. verify_identity(dob="...", secret_answer="...") ← Second call
4. get_patient_profile_tool()
5. triage_symptoms_tool(symptoms_text="...")
6. log_call_tool(...)

Full Flow with Booking:

1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...")
3. verify_identity(dob="...", secret_answer="...")
4. get_patient_profile_tool()
5. triage_symptoms_tool(symptoms_text="...")
6. list_providers_tool()
7. get_provider_slots_tool(provider_id="...")
8. schedule_appointment_tool(provider_id="...", slot_iso="...")
9. get_preferred_pharmacy_tool()
10. log_call_tool(notes="...", triage_json="...")
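One way to compare an observed run against these sequences is an ordered-subsequence check: every expected tool must appear in order, while extra calls in between are tolerated. The helper and variable names below are hypothetical, and the observed lists are written by hand for illustration, not captured from a real run.

```python
from typing import Iterable, List


def is_subsequence(expected: Iterable[str], observed: Iterable[str]) -> bool:
    """True if every expected tool appears in observed, in the same order."""
    remaining = iter(observed)
    # 'step in remaining' advances the iterator, which enforces ordering.
    return all(step in remaining for step in expected)


SECRET_QUESTION_FLOW: List[str] = [
    "find_patient",
    "verify_identity",
    "verify_identity",
    "get_patient_profile_tool",
    "triage_symptoms_tool",
    "log_call_tool",
]

# Hand-written example of a full booking run.
observed_full = [
    "find_patient",
    "verify_identity",
    "verify_identity",
    "get_patient_profile_tool",
    "triage_symptoms_tool",
    "list_providers_tool",
    "get_provider_slots_tool",
    "schedule_appointment_tool",
    "get_preferred_pharmacy_tool",
    "log_call_tool",
]

# Hand-written example of a run that skipped the second verify_identity call.
observed_skipped = [
    "find_patient",
    "verify_identity",
    "get_patient_profile_tool",
    "triage_symptoms_tool",
    "log_call_tool",
]

full_ok = is_subsequence(SECRET_QUESTION_FLOW, observed_full)
skipped_ok = is_subsequence(SECRET_QUESTION_FLOW, observed_skipped)
```

The full booking run passes, and the run with a single verify_identity call fails, which matches the two-call pattern this guide expects for secret question auth. Note that the skipped run would still pass against the MRN-path sequence, so pick the expected list that fits the scenario under test.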

Success Metrics

After improvements, you should see:

✅ 100% secret question ask rate (when MRN not provided)
✅ Zero skipped verifications
✅ Allergy mentions in every medication recommendation
✅ Proper urgent escalation for chest pain and other severe symptoms
✅ Two-call pattern for secret question auth

Quick Test Script

# Test 1: John Marshall secret question
Agent: May I have your full name?
You: John Marshall
Agent: Please confirm your date of birth.
You: January 1st, 1960
Agent: For security, what is your favorite color?  ← MUST ASK THIS
You: Blue
Agent: Thank you, you're verified. What brings you in today?  ← MUST CONFIRM

# Test 2: Francesco with symptoms
You: Francesco Ciannella
Agent: Date of birth?
You: January 1st, 1990
Agent: For security, what city were you born in?  ← MUST ASK THIS
You: Rome
Agent: What's going on today?
You: I have a mild headache
Agent: [provides advice considering Lisinopril medication]  ← MUST MENTION MEDS

Reporting Issues

If tests fail, provide:

Patient name used
Exact conversation (user input + agent responses)
Expected behavior vs actual behavior
Log excerpt from app.log showing tool calls
Which test scenario from this guide