care-notes / README.md
Akis Giannoukos
metadata
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression-triage

PHQ-9 Clinician Agent (Voice-first)

A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.

What it does

  • Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
  • Live inference of PHQ‑9 item scores, confidences, total score, and severity.
  • Automatic stop when the minimum confidence across items reaches the threshold (τ), when risk is detected, or when the turn cap (MAX_TURNS) is reached.
  • Optional TTS playback for clinician responses.
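As a rough illustration of the total score and severity mentioned above, the sketch below sums nine item scores (0 to 3 each) and maps the total onto the standard PHQ‑9 severity bands. The names `phq9_total`, `phq9_severity`, and `SEVERITY_BANDS` are hypothetical, and the exact label strings the app emits are an assumption.

```python
from typing import Sequence

# Standard PHQ-9 bands as (inclusive upper bound, label).
# The label strings are an assumption; the app may word them differently.
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine PHQ-9 item scores, each in the range 0..3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly 9 items")
    if any(not 0 <= score <= 3 for score in item_scores):
        raise ValueError("each item score must be between 0 and 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a total score (0..27) to its severity band."""
    if not 0 <= total <= 27:
        raise ValueError("total score must be between 0 and 27")
    for upper_bound, label in SEVERITY_BANDS:
        if total <= upper_bound:
            return label
    raise AssertionError("unreachable: all totals up to 27 are covered")
```

For example, `phq9_severity(phq9_total([1, 1, 2, 0, 1, 2, 1, 1, 1]))` sums to 10 and falls in the "moderate" band.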

UI overview

  • Main tab: Large circular mic “Record” bubble
    • Tap to start, tap again to stop (processing runs on stop)
    • While speaking back (TTS), the bubble shows a speaking state
  • Chat tab: Plain chat transcript (for reviewing turns)
  • Advanced tab:
    • PHQ‑9 Assessment JSON (live)
    • Severity label
    • Confidence threshold slider (τ)
    • Toggle: Speak clinician responses (TTS)
    • Model ID textbox and “Apply model” button

Quick start (local)

  1. Python 3.10+ recommended.
  2. Install deps:
    pip install -r requirements.txt
    
  3. Run the app:
    python app.py
    
  4. Open the URL shown in the console (defaults to http://0.0.0.0:7860). If that address does not load in your browser, try http://localhost:7860 on the same machine. Allow microphone access in your browser.

Configuration

Environment variables (all optional):

  • LLM_MODEL_ID (default google/gemma-2-2b-it): chat model id
  • ASR_MODEL_ID (default openai/whisper-tiny.en): speech-to-text model id
  • CONFIDENCE_THRESHOLD (default 0.8): stop when min item confidence ≥ τ
  • MAX_TURNS (default 12): hard stop cap
  • USE_TTS (default true): enable TTS playback
  • MODEL_CONFIG_PATH (default model_config.json): persisted model id
  • PORT (default 7860): server port
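The variables above could be read along the following lines. This is a minimal sketch, not the app's actual code: `AppConfig`, `load_config`, and `_parse_bool` are hypothetical names, and the set of strings accepted as "true" is an assumption. Only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    llm_model_id: str
    asr_model_id: str
    confidence_threshold: float
    max_turns: int
    use_tts: bool
    model_config_path: str
    port: int


def _parse_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Build a config from environment variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return AppConfig(
        llm_model_id=env.get("LLM_MODEL_ID", "google/gemma-2-2b-it"),
        asr_model_id=env.get("ASR_MODEL_ID", "openai/whisper-tiny.en"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.8")),
        max_turns=int(env.get("MAX_TURNS", "12")),
        use_tts=_parse_bool(env.get("USE_TTS", "true")),
        model_config_path=env.get("MODEL_CONFIG_PATH", "model_config.json"),
        port=int(env.get("PORT", "7860")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise, for example `load_config({"MAX_TURNS": "6", "USE_TTS": "false"})`.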

Notes:

  • If a GPU is available, the app will use it automatically for Transformers pipelines.
  • Changing the model in Advanced will reload the text-generation pipeline on the next turn.
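One way a model id might be persisted at MODEL_CONFIG_PATH is a small JSON file that is written when the model changes and read back on startup. The sketch below assumes a `{"model_id": ...}` layout and the helper names `save_model_id` and `load_model_id`; the file format the app actually uses is not documented here and may differ.

```python
import json
from pathlib import Path

DEFAULT_MODEL_ID = "google/gemma-2-2b-it"


def save_model_id(path: str, model_id: str) -> None:
    """Write the chosen model id to a JSON file (the key name is an assumption)."""
    Path(path).write_text(json.dumps({"model_id": model_id}), encoding="utf-8")


def load_model_id(path: str, default: str = DEFAULT_MODEL_ID) -> str:
    """Return the persisted model id, or the default if the file is missing or unreadable."""
    config_file = Path(path)
    if not config_file.exists():
        return default
    try:
        data = json.loads(config_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return default
    if isinstance(data, dict):
        return data.get("model_id", default)
    return default
```

Falling back to the default on a missing or malformed file keeps startup from failing because of a stale config.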

How to use

  1. Go to Main and tap the mic bubble. Speak naturally.
  2. Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
  3. The Advanced tab updates live with PHQ‑9 scores and severity. Lower the confidence threshold (τ) to end the assessment sooner, or raise it to require higher confidence before stopping.
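The effect of the threshold on stopping can be sketched as a small decision function. This assumes the stop conditions described in this README (risk detected, turn cap reached, or minimum item confidence ≥ τ). The name `stop_reason`, its parameters, the returned strings, and the order in which conditions are checked are hypothetical.

```python
from typing import Optional, Sequence


def stop_reason(
    confidences: Sequence[float],
    threshold: float = 0.8,
    turn: int = 0,
    max_turns: int = 12,
    risk_detected: bool = False,
) -> Optional[str]:
    """Return why the assessment should stop, or None to keep going."""
    if risk_detected:
        return "risk"
    if turn >= max_turns:
        return "max_turns"
    # Stop only when even the least certain item meets the threshold.
    if confidences and min(confidences) >= threshold:
        return "confident"
    return None
```

With item confidences whose minimum is 0.7, the default τ of 0.8 keeps the conversation going, while τ = 0.7 stops it. This is why lowering the slider ends the assessment earlier.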

Troubleshooting

  • No mic input detected:
    • Ensure the site has microphone permission in your browser settings.
    • Try refreshing the page after granting permission.
  • Can’t hear TTS:
    • Enable the “Speak clinician responses (TTS)” toggle in Advanced.
    • Ensure your system audio output is correct. Some browsers block auto‑play without interaction—use the mic once, then it should work.
  • Model download slow or fails:
    • Check internet connectivity and try again. Some models are large.
  • Assessment doesn’t stop:
    • Lower the confidence threshold slider (τ) in Advanced so that the stop condition (minimum item confidence ≥ τ) is met sooner, or wait until the turn cap (MAX_TURNS) is reached.

Safety

This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services or a crisis line (e.g., 988 in the U.S.).

Development notes

  • Framework: Gradio Blocks
  • ASR: Transformers pipeline (Whisper)
  • TTS: gTTS
  • Prosody features: librosa, used to compute lightweight prosody proxies for the scoring prompt

PRs and experiments are welcome. This is a research prototype and not a clinical tool.