---
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression triage
---

# PHQ-9 Clinician Agent (Voice-first)

A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.

## What it does
- Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
- Live inference of PHQ‑9 item scores, confidences, total score, and severity.
- Automatic stop when the minimum confidence across all items reaches a threshold (τ), when risk is detected, or when the turn cap is reached.
- Optional TTS playback for clinician responses.
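The total score and severity label follow standard PHQ-9 scoring: nine items, each scored 0-3, are summed to a 0-27 total that maps to five severity bands. Below is a minimal sketch of that mapping. It assumes item scores arrive as a dict keyed by item number 1-9. The function names are hypothetical and not taken from `app.py`.

```python
from typing import Mapping

# Standard PHQ-9 severity bands: (inclusive upper bound of total, label).
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def phq9_total(item_scores: Mapping[int, int]) -> int:
    """Sum the nine item scores after checking each is in 0..3."""
    if sorted(item_scores) != list(range(1, 10)):
        raise ValueError("expected scores for items 1..9")
    for item, score in item_scores.items():
        if not 0 <= score <= 3:
            raise ValueError(f"item {item} score {score} outside 0..3")
    return sum(item_scores.values())


def phq9_severity(total: int) -> str:
    """Map a 0..27 total to its severity band label."""
    if total < 0:
        raise ValueError(f"total {total} outside 0..27")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise ValueError(f"total {total} outside 0..27")


scores = {1: 2, 2: 1, 3: 3, 4: 2, 5: 0, 6: 1, 7: 1, 8: 0, 9: 0}
total = phq9_total(scores)
print(total, phq9_severity(total))  # 10 moderate
```

In the app, the per-item scores are inferred from dialogue rather than asked directly. The banding step itself is the same once scores exist.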

## UI overview
- Main tab: Large circular mic “Record” bubble
  - Tap to start, tap again to stop (processing runs on stop)
  - While speaking back (TTS), the bubble shows a speaking state
- Chat tab: Plain chat transcript (for reviewing turns)
- Advanced tab:
  - PHQ‑9 Assessment JSON (live)
  - Severity label
  - Confidence threshold slider (τ)
  - Toggle: Speak clinician responses (TTS)
  - Model ID textbox and “Apply model” button

## Quick start (local)
1. Python 3.10+ recommended.
2. Install deps:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the app:
   ```bash
   python app.py
   ```
4. Open the URL shown in the console. By default the server listens on `0.0.0.0:7860`; on the same machine, open `http://localhost:7860`. Allow microphone access in your browser.

## Configuration
Environment variables (all optional):
- `LLM_MODEL_ID` (default `google/gemma-2-2b-it`): chat model id
- `ASR_MODEL_ID` (default `openai/whisper-tiny.en`): speech-to-text model id
- `CONFIDENCE_THRESHOLD` (default `0.8`): stop when min item confidence ≥ τ
- `MAX_TURNS` (default `12`): hard stop cap
- `USE_TTS` (default `true`): enable TTS playback
- `MODEL_CONFIG_PATH` (default `model_config.json`): file where the model id applied in Advanced is persisted
- `PORT` (default `7860`): server port
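The variables above can be read once at startup using only the standard library. The sketch below assumes the defaults listed here. The `Settings` dataclass, `load_settings`, and the set of truthy spellings accepted for `USE_TTS` are hypothetical illustrations, not the app's actual code.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean env var; the accepted truthy spellings are an assumption."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    llm_model_id: str
    asr_model_id: str
    confidence_threshold: float  # tau: stop when min item confidence >= this
    max_turns: int               # hard stop cap
    use_tts: bool
    model_config_path: str
    port: int


def load_settings() -> Settings:
    """Read every setting from the environment, falling back to documented defaults."""
    return Settings(
        llm_model_id=os.environ.get("LLM_MODEL_ID", "google/gemma-2-2b-it"),
        asr_model_id=os.environ.get("ASR_MODEL_ID", "openai/whisper-tiny.en"),
        confidence_threshold=float(os.environ.get("CONFIDENCE_THRESHOLD", "0.8")),
        max_turns=int(os.environ.get("MAX_TURNS", "12")),
        use_tts=_env_bool("USE_TTS", True),
        model_config_path=os.environ.get("MODEL_CONFIG_PATH", "model_config.json"),
        port=int(os.environ.get("PORT", "7860")),
    )


if __name__ == "__main__":
    print(load_settings())
```

For example, `CONFIDENCE_THRESHOLD=0.9 USE_TTS=false python app.py` would raise τ and disable speech for that run, assuming the app parses these variables along these lines.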

Notes:
- If a GPU is available, the app will use it automatically for Transformers pipelines.
- Changing the model in Advanced will reload the text-generation pipeline on the next turn.
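Persisting the model id and reloading only on the next turn can be done with a small JSON file and a cache keyed by model id. The sketch below uses a `{"model_id": ...}` file layout, and the helpers `load_model_id`, `save_model_id`, and `LazyGenerator` are all hypothetical. A stub factory stands in for the real pipeline constructor, which might be something like `transformers.pipeline("text-generation", model=model_id)`. The id `some-org/other-model` is a placeholder.

```python
import json
from typing import Any, Callable, Optional


def load_model_id(path: str, default: str) -> str:
    """Return the persisted model id, or the default if the file is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as f:
            model_id = json.load(f).get("model_id")
    except (OSError, ValueError, AttributeError):
        return default
    return model_id if isinstance(model_id, str) and model_id else default


def save_model_id(path: str, model_id: str) -> None:
    """Persist the model id chosen via "Apply model"."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"model_id": model_id}, f)


class LazyGenerator:
    """Rebuild the generation pipeline only when the requested model id changes."""

    def __init__(self, factory: Callable[[str], Any]):
        self._factory = factory
        self._model_id: Optional[str] = None
        self._pipe: Any = None

    def get(self, model_id: str) -> Any:
        if self._pipe is None or model_id != self._model_id:
            self._pipe = self._factory(model_id)  # expensive load happens here
            self._model_id = model_id
        return self._pipe


built = []


def fake_factory(model_id: str) -> str:
    """Stub standing in for a real pipeline constructor."""
    built.append(model_id)
    return f"pipeline<{model_id}>"


gen = LazyGenerator(fake_factory)
gen.get("google/gemma-2-2b-it")
gen.get("google/gemma-2-2b-it")   # same id: cached, no rebuild
gen.get("some-org/other-model")   # id changed: rebuilt on this turn
print(built)  # ['google/gemma-2-2b-it', 'some-org/other-model']
```

Deferring the rebuild to the next `get` call keeps the "Apply model" click cheap. The download and load cost lands on the turn that first needs the new model.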

## How to use
1. Go to Main and tap the mic bubble. Speak naturally.
2. Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
3. The Advanced tab updates live with PHQ‑9 scores and severity. Lower the confidence threshold to end the assessment sooner, or raise it to gather more evidence before stopping.
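The stopping behavior described in this README combines three conditions: detected risk, minimum item confidence reaching τ, and the `MAX_TURNS` cap. A minimal sketch of that decision follows. Several details are assumptions rather than documented behavior: the check order (risk first), the requirement that all nine items have a confidence, and the names `should_stop` and `StopDecision`.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class StopDecision:
    stop: bool
    reason: str


def should_stop(
    confidences: Mapping[int, float],
    turn: int,
    tau: float = 0.8,
    max_turns: int = 12,
    risk_detected: bool = False,
) -> StopDecision:
    """Decide whether the assessment ends after this turn."""
    if risk_detected:  # safety takes priority over everything else
        return StopDecision(True, "risk")
    # Assumption: every one of the nine items must have a confidence estimate.
    if len(confidences) == 9 and min(confidences.values()) >= tau:
        return StopDecision(True, "confidence")
    if turn >= max_turns:
        return StopDecision(True, "max_turns")
    return StopDecision(False, "continue")


conf = {i: 0.85 for i in range(1, 10)}
conf[4] = 0.7  # item 4 is the least certain
print(should_stop(conf, turn=5).reason)           # continue (0.7 < 0.8)
print(should_stop(conf, turn=5, tau=0.7).reason)  # confidence (lower tau stops sooner)
```

Because the rule uses the *minimum* confidence, a single uncertain item keeps the conversation going. Lowering τ is what makes the assessment stop earlier.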

## Troubleshooting
- No mic input detected:
  - Ensure the site has microphone permission in your browser settings.
  - Try refreshing the page after granting permission.
- Can’t hear TTS:
  - Enable the “Speak clinician responses (TTS)” toggle in Advanced.
  - Check that your system audio output device is set correctly. Some browsers block autoplay until the page has received user interaction; use the mic once, and playback should then work.
- Model download slow or fails:
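  - Check internet connectivity and try again. The first run downloads model weights, and some models are large. Gated models such as `google/gemma-2-2b-it` also require accepting the license on the Hugging Face Hub and authenticating (e.g., with `huggingface-cli login` or an `HF_TOKEN` environment variable).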
- Assessment doesn’t stop:
  - Lower the confidence threshold slider (τ) in Advanced so the stop condition is met sooner, or wait until the turn cap (`MAX_TURNS`) is reached.

## Safety
This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services or a crisis line (e.g., the 988 Suicide & Crisis Lifeline in the U.S.).

## Development notes
- Framework: Gradio Blocks
- ASR: Transformers pipeline (Whisper)
- TTS: gTTS
- Prosody features: lightweight proxies computed with librosa and passed to the scoring prompt
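To illustrate what "lightweight prosody proxies" can mean, the sketch below computes two simple features: frame-level RMS energy and a pause ratio (the fraction of near-silent frames). It then formats them as a line for a prompt. This is a standard-library stand-in, not the app's librosa code; librosa provides `librosa.feature.rms` for the energy part. The frame sizes assume 16 kHz audio, and the silence threshold and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """RMS energy per frame; 400/160 samples are 25 ms / 10 ms at 16 kHz (assumed rate)."""
    out: List[float] = []
    for start in range(0, max(len(samples) - frame_len + 1, 1), hop):
        frame = samples[start:start + frame_len]
        if not frame:  # empty input yields no frames
            break
        out.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return out


def prosody_summary(samples: Sequence[float], silence_rms: float = 0.01) -> Dict[str, float]:
    """Mean frame energy and the fraction of frames quieter than silence_rms."""
    rms = frame_rms(samples)
    if not rms:
        return {"mean_rms": 0.0, "pause_ratio": 1.0}
    pauses = sum(1 for r in rms if r < silence_rms)
    return {"mean_rms": sum(rms) / len(rms), "pause_ratio": pauses / len(rms)}


def prosody_prompt_line(summary: Dict[str, float]) -> str:
    """Render the proxies as a short line for the scoring prompt."""
    return (
        f"Prosody (proxy): mean energy {summary['mean_rms']:.3f}, "
        f"pause ratio {summary['pause_ratio']:.2f}"
    )


# One second at 16 kHz: half silence, then a 220 Hz tone at amplitude 0.5.
sr = 16000
audio = [0.0] * (sr // 2) + [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 2)]
print(prosody_prompt_line(prosody_summary(audio)))
```

Features like these are coarse by design. They give the language model a rough signal about energy and hesitancy without any heavy audio modeling.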

PRs and experiments are welcome. This is a research prototype and not a clinical tool.