Spaces:
Running
Healthcare Agent Testing Guide
This guide provides test scenarios to validate the improved tool calling behavior, especially the secret question authentication flow.
Test Patients
Two test patients are available in mock_data/patients.json:
Patient 1: John Marshall
- Full Name: John Marshall
- DOB: 1960-01-01 (January 1st, 1960)
- MRN Last-4: 0001
- Secret Question: "What is your favorite color?"
- Secret Answer: blue
- Allergies: Penicillin
- Medications: Acetaminophen 500mg PRN
Patient 2: Francesco Ciannella
- Full Name: Francesco Ciannella
- DOB: 1990-01-01 (January 1st, 1990)
- MRN Last-4: 6001
- Secret Question: "What city were you born in?"
- Secret Answer: rome
- Allergies: NKA (No Known Allergies)
- Medications: Lisinopril 10mg daily
Test Scenarios
β Test 1: Secret Question Flow (MOST IMPORTANT)
Purpose: Verify the agent correctly handles the two-step secret question authentication.
Steps:
- Say: "My name is John Marshall"
- Agent should call
find_patient()and greet you - Say: "January 1st, 1960" (DOB)
- Agent should call
verify_identity(dob=...) - EXPECTED: Agent should ASK: "For security, what is your favorite color?"
- Say: "Blue"
- Agent should call
verify_identity(dob=..., secret_answer="Blue") - EXPECTED: Agent confirms verification and asks about symptoms
Success Criteria:
- β Agent explicitly asks the secret question
- β Agent waits for your answer before proceeding
- β Agent confirms "You're verified" after correct answer
- β Agent does NOT ask for MRN last-4
Common Failure Modes (Old Behavior):
- β Agent skips asking the secret question
- β Agent asks for MRN last-4 instead of secret question
- β Agent claims verification before getting secret answer
β Test 2: Wrong Secret Answer
Purpose: Verify agent handles incorrect secret answers gracefully.
Steps:
- Say: "My name is John Marshall"
- Say: "January 1st, 1960" (DOB)
- Agent asks: "What is your favorite color?"
- Say: "Red" (WRONG - correct is "blue")
Expected Behavior:
- β Agent says verification failed
- β Agent asks for MRN last-4 as alternative: "Could you provide the last 4 digits of your MRN?"
- β Agent does NOT proceed to medical questions
β Test 3: MRN Last-4 Path (Alternative Auth)
Purpose: Verify agent accepts MRN last-4 without asking secret question.
Steps:
- Say: "My name is John Marshall"
- Say: "January 1st, 1960, and MRN last-4 is 0001"
Expected Behavior:
- β
Agent calls
verify_identity(dob=..., mrn_last4="0001") - β Agent confirms verification immediately
- β Agent does NOT ask secret question (since MRN provided)
β Test 4: Allergy Awareness
Purpose: Verify agent considers allergies before recommending medications.
Steps:
- Complete authentication as John Marshall
- Say: "I have a headache and a sore throat"
- Agent calls
get_patient_profile_tool()andtriage_symptoms_tool()
Expected Behavior:
- β Agent mentions: "You have a penicillin allergy"
- β Agent recommends Acetaminophen (which patient already has)
- β Agent does NOT recommend penicillin-based antibiotics
- β Agent references existing medication: "Since you're already taking acetaminophen as needed..."
β Test 5: Urgent Triage (Chest Pain)
Purpose: Verify agent correctly escalates urgent symptoms.
Steps:
- Complete authentication as Francesco Ciannella
- Say: "I'm having severe chest pain and shortness of breath"
Expected Behavior:
- β
Agent calls
triage_symptoms_tool(symptoms_text="severe chest pain and shortness of breath") - β
Returns:
{risk: "urgent", red_flags: ["chest pain"]} - β Agent immediately says: "Chest pain can be serious. Please call 911 now or go to the nearest emergency room."
- β Agent does NOT offer to book a regular appointment
- β Agent emphasizes urgency
β Test 6: Self-Care with Appointment Booking
Purpose: Verify full flow for non-urgent symptoms with appointment.
Steps:
- Complete authentication as John Marshall
- Say: "I have a mild headache and feel tired, but no fever, no neck stiffness"
- Agent triages and provides self-care advice
- Say: "Yes, I'd like to schedule an appointment"
Expected Behavior:
- β
Agent calls
triage_symptoms_tool()β returnsrisk: "self_care" - β Agent provides advice: "Try rest, hydration, and acetaminophen as directed"
- β Agent offers appointment: "Would you like a telehealth appointment?"
- β
Agent calls
list_providers_tool()β presents options - β
Agent calls
get_provider_slots_tool()β shows times in friendly format - β
After you choose, agent calls
schedule_appointment_tool() - β Agent confirms: "Booked. I'll send details to your phone ending in 0101."
- β
Agent calls
get_preferred_pharmacy_tool()and confirms pharmacy - β
At end, agent calls
log_call_tool()silently
β Test 7: Francesco Ciannella Secret Question
Purpose: Test second patient with different secret question.
Steps:
- Say: "My name is Francesco Ciannella"
- Say: "January 1st, 1990"
- EXPECTED: Agent asks "For security, what city were you born in?"
- Say: "Rome"
Expected Behavior:
- β Agent asks the correct secret question for Francesco
- β Agent accepts "Rome" (case-insensitive)
- β Agent confirms verification
β Test 8: Profile Not Found
Purpose: Verify agent handles unknown patients gracefully.
Steps:
- Say: "My name is Jane Doe"
Expected Behavior:
- β
Agent calls
find_patient(full_name="Jane Doe") - β
Returns:
{} - β Agent says: "I'm not finding you in our system. Could you verify the spelling of your name?"
- β Agent does NOT proceed to verification without patient_id
Debugging Tips
Check Logs
Look for these log entries in app.log:
INFO - LLM tool_calls: ['find_patient']
INFO - tool find_patient result: {"patient_id": "pt_jmarshall", ...}
INFO - LLM tool_calls: ['verify_identity']
INFO - verify_identity: verified=False needs=['mrn_last4_or_secret']
INFO - LLM content: For security, what is your favorite color?
INFO - verify_identity: verified=True needs=[]
INFO - LLM tool_calls: ['get_patient_profile_tool']
Common Issues
Issue: Agent doesn't ask secret question
- Check: Tool docstring has 7-step CRITICAL SECRET QUESTION FLOW section
- Check: System prompt has explicit secret question instructions in step 2
Issue: Agent asks for MRN instead of secret question
- Fix: Ensure prompt says "If 'question' field is present: READ THE EXACT QUESTION"
Issue: Agent proceeds without verification
- Check: Prompt has "NEVER claim verification until verified=true"
- Check: Tool auto-injects patient_id
Expected Tool Call Sequences
Minimal Flow (MRN path):
1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...", mrn_last4="0001")
3. get_patient_profile_tool()
4. triage_symptoms_tool(symptoms_text="...")
5. log_call_tool(notes="...", triage_json="...")
Secret Question Flow:
1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...") β First call
3. verify_identity(dob="...", secret_answer="...") β Second call
4. get_patient_profile_tool()
5. triage_symptoms_tool(symptoms_text="...")
6. log_call_tool(...)
Full Flow with Booking:
1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...")
3. verify_identity(dob="...", secret_answer="...")
4. get_patient_profile_tool()
5. triage_symptoms_tool(symptoms_text="...")
6. list_providers_tool()
7. get_provider_slots_tool(provider_id="...")
8. schedule_appointment_tool(provider_id="...", slot_iso="...")
9. get_preferred_pharmacy_tool()
10. log_call_tool(notes="...", triage_json="...")
Success Metrics
After improvements, you should see:
- β 100% secret question ask rate (when MRN not provided)
- β Zero skipped verifications
- β Allergy mentions in every medication recommendation
- β Proper urgent escalation for chest pain, severe symptoms
- β Two-call pattern for secret question auth
Quick Test Script
# Test 1: John Marshall secret question
Agent: May I have your full name?
You: John Marshall
Agent: Please confirm your date of birth.
You: January 1st, 1960
Agent: For security, what is your favorite color? β MUST ASK THIS
You: Blue
Agent: Thank you, you're verified. What brings you in today? β MUST CONFIRM
# Test 2: Francesco with symptoms
You: Francesco Ciannella
Agent: Date of birth?
You: January 1st, 1990
Agent: For security, what city were you born in? β MUST ASK THIS
You: Rome
Agent: What's going on today?
You: I have a mild headache
Agent: [provides advice considering Lisinopril medication] β MUST MENTION MEDS
Reporting Issues
If tests fail, provide:
- Patient name used
- Exact conversation (user input + agent responses)
- Expected behavior vs actual behavior
- Log excerpt from
app.logshowing tool calls - Which test scenario from this guide