
Healthcare Agent Testing Guide

This guide provides test scenarios to validate the improved tool calling behavior, especially the secret question authentication flow.

Test Patients

Two test patients are available in mock_data/patients.json:

Patient 1: John Marshall

  • Full Name: John Marshall
  • DOB: 1960-01-01 (January 1st, 1960)
  • MRN Last-4: 0001
  • Secret Question: "What is your favorite color?"
  • Secret Answer: blue
  • Allergies: Penicillin
  • Medications: Acetaminophen 500mg PRN

Patient 2: Francesco Ciannella

  • Full Name: Francesco Ciannella
  • DOB: 1990-01-01 (January 1st, 1990)
  • MRN Last-4: 6001
  • Secret Question: "What city were you born in?"
  • Secret Answer: rome
  • Allergies: NKA (No Known Allergies)
  • Medications: Lisinopril 10mg daily
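For orientation, here is a minimal sketch of what the two records in mock_data/patients.json might look like, plus a check that each record carries the fields the tests rely on. The JSON key names, the list representation of allergies and medications, and the `missing_fields` helper are all assumptions made for illustration; check the real file for the actual schema.

```python
import json

# Hypothetical key names; the real mock_data/patients.json may differ.
REQUIRED_FIELDS = {
    "full_name", "dob", "mrn_last4",
    "secret_question", "secret_answer",
    "allergies", "medications",
}

# Inline copy of the two test patients described above.
SAMPLE = """
[
  {"full_name": "John Marshall", "dob": "1960-01-01", "mrn_last4": "0001",
   "secret_question": "What is your favorite color?", "secret_answer": "blue",
   "allergies": ["Penicillin"], "medications": ["Acetaminophen 500mg PRN"]},
  {"full_name": "Francesco Ciannella", "dob": "1990-01-01", "mrn_last4": "6001",
   "secret_question": "What city were you born in?", "secret_answer": "rome",
   "allergies": [], "medications": ["Lisinopril 10mg daily"]}
]
"""


def missing_fields(record):
    """Return the required field names that are absent from a record."""
    return sorted(REQUIRED_FIELDS - set(record))


patients = json.loads(SAMPLE)
for patient in patients:
    print(patient["full_name"], "missing:", missing_fields(patient))
```

Note that the MRN last-4 is kept as a string so that leading zeros ("0001") survive, and that NKA is represented here as an empty list.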

Test Scenarios

✅ Test 1: Secret Question Flow (MOST IMPORTANT)

Purpose: Verify the agent correctly handles the two-step secret question authentication.

Steps:

  1. Say: "My name is John Marshall"
  2. Agent should call find_patient() and greet you
  3. Say: "January 1st, 1960" (DOB)
  4. Agent should call verify_identity(dob=...)
  5. EXPECTED: Agent should ASK: "For security, what is your favorite color?"
  6. Say: "Blue"
  7. Agent should call verify_identity(dob=..., secret_answer="Blue")
  8. EXPECTED: Agent confirms verification and asks about symptoms

Success Criteria:

  • ✅ Agent explicitly asks the secret question
  • ✅ Agent waits for your answer before proceeding
  • ✅ Agent confirms "You're verified" after correct answer
  • ✅ Agent does NOT ask for MRN last-4

Common Failure Modes (Old Behavior):

  • ❌ Agent skips asking the secret question
  • ❌ Agent asks for MRN last-4 instead of secret question
  • ❌ Agent claims verification before getting secret answer
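The two-call pattern this test exercises can be sketched as follows. This is not the project's verify_identity implementation: it assumes the DOB has already been normalized to ISO format, takes the patient record as an explicit argument (the real tool is described as auto-injecting patient_id), and the `needs=["dob"]` result for a wrong DOB is invented for illustration. The `verified`, `needs`, and `question` fields follow the log output and prompt wording quoted under Debugging Tips.

```python
# Illustrative record for John Marshall; key names are assumptions.
JOHN = {
    "dob": "1960-01-01",
    "mrn_last4": "0001",
    "secret_question": "What is your favorite color?",
    "secret_answer": "blue",
}


def verify_identity(patient, dob, mrn_last4=None, secret_answer=None):
    """Sketch of a two-step check: DOB first, then MRN last-4 or secret answer."""
    if dob != patient["dob"]:
        # Assumption: a wrong DOB is reported this way.
        return {"verified": False, "needs": ["dob"]}

    if mrn_last4 is not None:
        ok = mrn_last4 == patient["mrn_last4"]
    elif secret_answer is not None:
        # Case-insensitive comparison, so "Blue" matches "blue".
        ok = secret_answer.strip().casefold() == patient["secret_answer"].casefold()
    else:
        ok = False  # First call: only the DOB was supplied.

    if ok:
        return {"verified": True, "needs": []}
    # Not verified yet: return the question the agent must read out.
    return {
        "verified": False,
        "needs": ["mrn_last4_or_secret"],
        "question": patient["secret_question"],
    }


first = verify_identity(JOHN, "1960-01-01")
second = verify_identity(JOHN, "1960-01-01", secret_answer="Blue")
print(first["question"], "|", second["verified"])
```

The first call returns the question without verifying; only the second call, which carries the answer, can set `verified` to true. That is the behavior the success criteria above check for.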

✅ Test 2: Wrong Secret Answer

Purpose: Verify agent handles incorrect secret answers gracefully.

Steps:

  1. Say: "My name is John Marshall"
  2. Say: "January 1st, 1960" (DOB)
  3. Agent asks: "What is your favorite color?"
  4. Say: "Red" (WRONG - correct is "blue")

Expected Behavior:

  • ✅ Agent says verification failed
  • ✅ Agent asks for MRN last-4 as alternative: "Could you provide the last 4 digits of your MRN?"
  • ✅ Agent does NOT proceed to medical questions

✅ Test 3: MRN Last-4 Path (Alternative Auth)

Purpose: Verify the agent accepts MRN last-4 without asking the secret question.

Steps:

  1. Say: "My name is John Marshall"
  2. Say: "January 1st, 1960, and MRN last-4 is 0001"

Expected Behavior:

  • ✅ Agent calls verify_identity(dob=..., mrn_last4="0001")
  • ✅ Agent confirms verification immediately
  • ✅ Agent does NOT ask secret question (since MRN provided)

✅ Test 4: Allergy Awareness

Purpose: Verify agent considers allergies before recommending medications.

Steps:

  1. Complete authentication as John Marshall
  2. Say: "I have a headache and a sore throat"
  3. Agent calls get_patient_profile_tool() and triage_symptoms_tool()

Expected Behavior:

  • ✅ Agent mentions: "You have a penicillin allergy"
  • ✅ Agent recommends Acetaminophen (which patient already has)
  • ✅ Agent does NOT recommend penicillin-based antibiotics
  • ✅ Agent references existing medication: "Since you're already taking acetaminophen as needed..."
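As a rough illustration of the allergy check this test expects, the sketch below filters candidate medications against a patient's allergy list. The `PENICILLIN_CLASS` set and the `filter_by_allergy` helper are hypothetical and deliberately tiny; in the agent the decision is made by the LLM using the patient profile, and nothing here is clinical guidance.

```python
# Hypothetical, incomplete drug-class table used only for this example.
PENICILLIN_CLASS = {"penicillin", "amoxicillin", "ampicillin"}


def filter_by_allergy(candidates, allergies):
    """Drop candidate medications that belong to a class the patient is allergic to."""
    blocked = set()
    for allergy in allergies:
        if allergy.lower() == "penicillin":
            blocked |= PENICILLIN_CLASS
    return [name for name in candidates if name.lower() not in blocked]


# John Marshall is allergic to penicillin, so amoxicillin is removed.
print(filter_by_allergy(["Amoxicillin", "Acetaminophen"], ["Penicillin"]))
```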

✅ Test 5: Urgent Triage (Chest Pain)

Purpose: Verify agent correctly escalates urgent symptoms.

Steps:

  1. Complete authentication as Francesco Ciannella
  2. Say: "I'm having severe chest pain and shortness of breath"

Expected Behavior:

  • ✅ Agent calls triage_symptoms_tool(symptoms_text="severe chest pain and shortness of breath")
  • ✅ Returns: {risk: "urgent", red_flags: ["chest pain"]}
  • ✅ Agent immediately says: "Chest pain can be serious. Please call 911 now or go to the nearest emergency room."
  • ✅ Agent does NOT offer to book a regular appointment
  • ✅ Agent emphasizes urgency
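To make the expected return value concrete, here is a minimal keyword-based sketch that produces the same shape of result. It is an assumption about how such a tool could work, not the actual triage_symptoms_tool. The `RED_FLAGS` tuple contains only the one flag named in this guide, and plain substring matching does not handle negation such as "no chest pain".

```python
# Only the red flag documented in this guide; the real list is unknown.
RED_FLAGS = ("chest pain",)


def triage_symptoms(symptoms_text):
    """Return a risk level and the red-flag phrases found in the text."""
    text = symptoms_text.lower()
    red_flags = [flag for flag in RED_FLAGS if flag in text]
    risk = "urgent" if red_flags else "self_care"
    return {"risk": risk, "red_flags": red_flags}


print(triage_symptoms("severe chest pain and shortness of breath"))
print(triage_symptoms("I have a mild headache and feel tired"))
```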

✅ Test 6: Self-Care with Appointment Booking

Purpose: Verify full flow for non-urgent symptoms with appointment.

Steps:

  1. Complete authentication as John Marshall
  2. Say: "I have a mild headache and feel tired, but no fever, no neck stiffness"
  3. Agent triages and provides self-care advice
  4. Say: "Yes, I'd like to schedule an appointment"

Expected Behavior:

  • ✅ Agent calls triage_symptoms_tool() → returns risk: "self_care"
  • ✅ Agent provides advice: "Try rest, hydration, and acetaminophen as directed"
  • ✅ Agent offers appointment: "Would you like a telehealth appointment?"
  • ✅ Agent calls list_providers_tool() → presents options
  • ✅ Agent calls get_provider_slots_tool() → shows times in friendly format
  • ✅ After you choose, agent calls schedule_appointment_tool()
  • ✅ Agent confirms: "Booked. I'll send details to your phone ending in 0101."
  • ✅ Agent calls get_preferred_pharmacy_tool() and confirms pharmacy
  • ✅ At end, agent calls log_call_tool() silently

✅ Test 7: Francesco Ciannella Secret Question

Purpose: Test second patient with different secret question.

Steps:

  1. Say: "My name is Francesco Ciannella"
  2. Say: "January 1st, 1990"
  3. EXPECTED: Agent asks "For security, what city were you born in?"
  4. Say: "Rome"

Expected Behavior:

  • ✅ Agent asks the correct secret question for Francesco
  • ✅ Agent accepts "Rome" (case-insensitive)
  • ✅ Agent confirms verification

✅ Test 8: Profile Not Found

Purpose: Verify agent handles unknown patients gracefully.

Steps:

  1. Say: "My name is Jane Doe"

Expected Behavior:

  • ✅ Agent calls find_patient(full_name="Jane Doe")
  • ✅ Returns: {}
  • ✅ Agent says: "I'm not finding you in our system. Could you verify the spelling of your name?"
  • ✅ Agent does NOT proceed to verification without patient_id
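The empty-result behavior can be sketched with a simple name lookup. This is an illustration only: the real find_patient takes just full_name, whereas this version receives the patient list as an explicit argument, and the case- and whitespace-insensitive matching is an assumption. The `pt_jmarshall` identifier comes from the log excerpt under Debugging Tips.

```python
# Minimal in-memory stand-in for the mock patient data.
PATIENTS = [
    {"patient_id": "pt_jmarshall", "full_name": "John Marshall"},
]


def find_patient(full_name, patients):
    """Return the matching patient's id and name, or {} when nobody matches."""
    wanted = " ".join(full_name.split()).casefold()
    for patient in patients:
        if patient["full_name"].casefold() == wanted:
            return {
                "patient_id": patient["patient_id"],
                "full_name": patient["full_name"],
            }
    return {}  # Unknown caller: the agent must not go on to verification.


print(find_patient("Jane Doe", PATIENTS))
print(find_patient("john  marshall", PATIENTS))
```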

Debugging Tips

Check Logs

Look for these log entries in app.log:

INFO - LLM tool_calls: ['find_patient']
INFO - tool find_patient result: {"patient_id": "pt_jmarshall", ...}
INFO - LLM tool_calls: ['verify_identity']
INFO - verify_identity: verified=False needs=['mrn_last4_or_secret']
INFO - LLM content: For security, what is your favorite color?
INFO - verify_identity: verified=True needs=[]
INFO - LLM tool_calls: ['get_patient_profile_tool']
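When a log is long, a small script can pull out just the tool calls in order. The sketch below assumes the `LLM tool_calls: [...]` line format shown in the excerpt above; `extract_tool_calls` is a hypothetical helper, not part of the project.

```python
import ast
import re

# Matches the list literal that follows "LLM tool_calls:" on a log line.
TOOL_CALLS_RE = re.compile(r"LLM tool_calls: (\[.*\])")

SAMPLE_LOG = """\
INFO - LLM tool_calls: ['find_patient']
INFO - tool find_patient result: {"patient_id": "pt_jmarshall", ...}
INFO - LLM tool_calls: ['verify_identity']
INFO - verify_identity: verified=False needs=['mrn_last4_or_secret']
INFO - LLM content: For security, what is your favorite color?
INFO - verify_identity: verified=True needs=[]
INFO - LLM tool_calls: ['get_patient_profile_tool']
"""


def extract_tool_calls(log_text):
    """Return tool names in the order the LLM requested them."""
    calls = []
    for line in log_text.splitlines():
        match = TOOL_CALLS_RE.search(line)
        if match:
            # The logged value is a Python list literal, so parse it safely.
            calls.extend(ast.literal_eval(match.group(1)))
    return calls


print(extract_tool_calls(SAMPLE_LOG))
```

To run it against a real session, read the file first, for example `extract_tool_calls(open("app.log").read())`.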

Common Issues

Issue: Agent doesn't ask secret question

  • Check: Tool docstring has 7-step CRITICAL SECRET QUESTION FLOW section
  • Check: System prompt has explicit secret question instructions in step 2

Issue: Agent asks for MRN instead of secret question

  • Fix: Ensure prompt says "If 'question' field is present: READ THE EXACT QUESTION"

Issue: Agent proceeds without verification

  • Check: Prompt has "NEVER claim verification until verified=true"
  • Check: Tool auto-injects patient_id

Expected Tool Call Sequences

Minimal Flow (MRN path):

1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...", mrn_last4="0001")
3. get_patient_profile_tool()
4. triage_symptoms_tool(symptoms_text="...")
5. log_call_tool(notes="...", triage_json="...")

Secret Question Flow:

1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...")                    ← First call
3. verify_identity(dob="...", secret_answer="...") ← Second call
4. get_patient_profile_tool()
5. triage_symptoms_tool(symptoms_text="...")
6. log_call_tool(...)

Full Flow with Booking:

1. find_patient(full_name="John Marshall")
2. verify_identity(dob="...")
3. verify_identity(dob="...", secret_answer="...")
4. get_patient_profile_tool()
5. triage_symptoms_tool(symptoms_text="...")
6. list_providers_tool()
7. get_provider_slots_tool(provider_id="...")
8. schedule_appointment_tool(provider_id="...", slot_iso="...")
9. get_preferred_pharmacy_tool()
10. log_call_tool(notes="...", triage_json="...")
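One way to check a recorded session against the sequences above is an ordered-subsequence test: every expected call must appear, in order, though other calls may be interleaved. The `follows_sequence` helper is a hypothetical sketch, and treating extra calls as acceptable is an assumption about how strict the check should be.

```python
# Tool names from the "Secret Question Flow" sequence above.
SECRET_QUESTION_FLOW = [
    "find_patient",
    "verify_identity",
    "verify_identity",
    "get_patient_profile_tool",
    "triage_symptoms_tool",
    "log_call_tool",
]


def follows_sequence(expected, actual):
    """True if `expected` occurs within `actual` in order (gaps allowed)."""
    remaining = iter(actual)
    # `name in remaining` advances the iterator past the first match,
    # so each expected call must come after the previous one.
    return all(name in remaining for name in expected)


good_run = [
    "find_patient", "verify_identity", "verify_identity",
    "get_patient_profile_tool", "triage_symptoms_tool",
    "list_providers_tool", "log_call_tool",
]
# Only one verify_identity call: the two-call pattern was skipped.
bad_run = [
    "find_patient", "verify_identity",
    "get_patient_profile_tool", "triage_symptoms_tool", "log_call_tool",
]

print(follows_sequence(SECRET_QUESTION_FLOW, good_run))
print(follows_sequence(SECRET_QUESTION_FLOW, bad_run))
```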

Success Metrics

After improvements, you should see:

  • ✅ 100% secret question ask rate (when MRN not provided)
  • ✅ Zero skipped verifications
  • ✅ Allergy mentions in every medication recommendation
  • ✅ Proper urgent escalation for chest pain, severe symptoms
  • ✅ Two-call pattern for secret question auth

Quick Test Script

# Test 1: John Marshall secret question
Agent: May I have your full name?
You: John Marshall
Agent: Please confirm your date of birth.
You: January 1st, 1960
Agent: For security, what is your favorite color?  ← MUST ASK THIS
You: Blue
Agent: Thank you, you're verified. What brings you in today?  ← MUST CONFIRM

# Test 2: Francesco with symptoms
You: Francesco Ciannella
Agent: Date of birth?
You: January 1st, 1990
Agent: For security, what city were you born in?  ← MUST ASK THIS
You: Rome
Agent: What's going on today?
You: I have a mild headache
Agent: [provides advice considering Lisinopril medication]  ← MUST MENTION MEDS

Reporting Issues

If tests fail, provide:

  1. Patient name used
  2. Exact conversation (user input + agent responses)
  3. Expected behavior vs actual behavior
  4. Log excerpt from app.log showing tool calls
  5. Which test scenario from this guide