Commit ada1ece · Parent(s): 5731404
Akis Giannoukos committed: changed heading and removed Apply model and restart button
README.md
CHANGED
@@ -32,7 +32,7 @@ Stop automatically when confidence in all PHQ-9 items is sufficiently high.
 
 Produce a final PHQ-9 severity report.
 
-The system will use MedGemma-4B-IT (instruction-tuned medical LLM) as the base model for both:
+The system will use a configurable LLM (e.g., Gemma-2-2B-IT or MedGemma-4B-IT) as the base model for both:
 
 -A Recording Agent (conversational component)
 
@@ -44,11 +44,11 @@ The system will use MedGemma-4B-IT (instruction-tuned medical LLM) as the base model for both:
 Component Description
 -Frontend Client: Handles user interaction, voice input/output, and UI display.
 -Speech I/O Module: Converts speech to text (ASR) and text to speech (TTS).
--Feature Extraction Module: Extracts acoustic and prosodic features via OpenSmile for emotional/speech analysis.
+-Feature Extraction Module: Extracts acoustic and prosodic features via librosa (lightweight prosody proxies) for emotional/speech analysis.
 -Recording Agent (Chatbot): Conducts clinician-like conversation with adaptive questioning.
 -Scoring Agent: Evaluates PHQ-9 symptom probabilities after each exchange and determines confidence in final diagnosis.
 Controller / Orchestrator: Manages communication between agents and triggers scoring cycles.
-Model Backend: Hosts MedGemma-4B-IT, fine-tuned or prompted for clinician reasoning.
+Model Backend: Hosts a configurable LLM (e.g., Gemma-2-2B-IT, MedGemma-4B-IT), prompted for clinician reasoning.
 
 2.2 Architecture Diagram (Text Description)
 ┌───────────────────────┐
@@ -68,8 +68,8 @@ Model Backend: Hosts MedGemma-4B-IT, fine-tuned or prompted for clinician reasoning.
            │
            ▼
 ┌─────────────────────────────┐
-
-
+│ Feature Extraction Module │
+│ - librosa (prosody pitch, energy/loudness, timing/phonation)│
 └──────────┬──────────────────┘
            │
            ▼
@@ -106,7 +106,7 @@ Maintain conversational context.
 
 Adapt follow-up questions based on inferred patient state.
 
-Produce text responses using MedGemma-4B-IT with a clinician-style prompt template.
+Produce text responses using a configurable LLM (e.g. Gemma-2-2B-IT, MedGemma-4B-IT) with a clinician-style prompt template.
 
 After each user response, trigger the Scoring Agent to reassess.
 
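The changed line above has the Recording Agent produce responses from a configurable LLM driven by a clinician-style prompt template. As a rough illustration only, such a template could look like the sketch below; the wording, constant, and function name are hypothetical, not taken from the Space:

```python
# Hypothetical clinician-style prompt template (illustrative only; the
# Space's actual prompt is not shown in this diff).
CLINICIAN_TEMPLATE = (
    "You are an empathetic clinician administering a PHQ-9 interview.\n"
    "Ask one brief, open-ended follow-up question at a time.\n"
    "Conversation so far:\n{history}\n"
    "Patient: {user_utterance}\n"
    "Clinician:"
)

def next_clinician_prompt(history: str, user_utterance: str) -> str:
    """Fill the template; the result goes to the configurable chat LLM."""
    return CLINICIAN_TEMPLATE.format(history=history, user_utterance=user_utterance)
```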
@@ -143,7 +143,10 @@ Parse the full transcript and extract statements relevant to each PHQ-9 item.
 
 Combine textual cues + acoustic cues.
 
-
+Fusion mechanism: Acoustic features are summarized into a compact JSON and included in the scoring prompt alongside the transcript (early, prompt-level fusion).
+
+Use the LLM's reasoning chain to map features to PHQ-9 scores.
+
 
 When confidence for all ≥ threshold τ (e.g., 0.8), finalize results and signal termination.
 
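The new "fusion mechanism" line describes early, prompt-level fusion: acoustic features serialized as compact JSON next to the transcript. A minimal sketch of that idea, assuming a plain feature dict and a hypothetical `build_scoring_prompt` helper:

```python
# Minimal sketch of prompt-level (early) fusion: the acoustic summary is
# serialized to compact JSON and placed in the scoring prompt alongside
# the transcript. Helper name and prompt wording are assumptions.
import json

def build_scoring_prompt(transcript: str, acoustic_features: dict) -> str:
    features_json = json.dumps(acoustic_features, separators=(",", ":"))
    return (
        "You are a clinician scoring PHQ-9 items.\n"
        f"Acoustic summary (JSON): {features_json}\n"
        f"Transcript:\n{transcript}\n"
        "For each PHQ-9 item, return a score (0-3) and a confidence in [0, 1]."
    )
```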
@@ -153,7 +156,7 @@ User speaks → Audio captured.
 
 ASR transcribes text.
 
-OpenSmile extracts voice features.
+librosa/OpenSmile extracts voice features (prosody proxies).
 
 Recording Agent uses transcript (and optionally summarized features) → next conversational message.
 
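For this feature-extraction step, the 5.1 table below names RMS, ZCR, spectral centroid, f0, energy, and duration. A sketch of computing those prosody proxies with standard librosa calls; the function name and choice of summary statistics are illustrative assumptions, not the Space's actual code:

```python
# Illustrative prosody-proxy extraction with librosa, based on the feature
# list in section 5.1 (RMS, ZCR, spectral centroid, f0, duration).
import numpy as np
import librosa

def extract_voice_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000)
    # pyin returns per-frame f0, with NaN where the frame is unvoiced.
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    return {
        "rms_mean": float(np.mean(librosa.feature.rms(y=y))),
        "zcr_mean": float(np.mean(librosa.feature.zero_crossing_rate(y))),
        "spectral_centroid_mean": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "f0_mean_hz": float(np.nanmean(f0)) if np.isfinite(f0).any() else None,
        "duration_s": float(librosa.get_duration(y=y, sr=sr)),
    }
```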
@@ -167,13 +170,13 @@ TTS module vocalizes clinician output.
 
 5.1 Models and Libraries
 Function Tool / Library
-Base LLM MedGemma-4B-IT
+Base LLM Configurable (e.g. Gemma-2-2B-IT, MedGemma-4B-IT)
 Whisper
 gTTS (preferably), TTS Coqui TTS, gTTS, or Bark
-Audio Features
-Backend Python /
+Audio Features librosa (RMS, ZCR, spectral centroid, f0, energy, duration)
+Backend Python / Gradio (Spaces)
 Frontend Gradio
-Communication
+Communication Gradio UI
 
 5.2 Confidence Computation
 
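The termination rule quoted earlier ("when confidence for all ≥ threshold τ, e.g., 0.8, finalize results") reduces to a simple all-items check. A minimal sketch, assuming per-item confidences arrive as a dict keyed by the nine PHQ-9 items:

```python
# Minimal sketch of the stopping rule: terminate once every PHQ-9 item's
# confidence reaches the threshold tau (e.g., 0.8). Function name assumed.
TAU = 0.8

def should_terminate(item_confidences: dict) -> bool:
    """True when confidences for all nine PHQ-9 items are >= TAU."""
    return len(item_confidences) == 9 and all(
        c >= TAU for c in item_confidences.values()
    )
```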
app.py
CHANGED
@@ -579,8 +579,8 @@ def create_demo():
     with gr.Blocks(theme=gr.themes.Soft()) as demo:
         gr.Markdown(
             """
-            ###
-            Engage in a brief
+            ### Conversational Assessment for Responsive Engagement (CARE) Notes
+            Engage in a brief conversation. Your audio is transcribed, analyzed, and used to infer PHQ-9 scores.
             The system stops when confidence is high enough or any safety risk is detected. It does not provide therapy or emergency counseling.
             """
         )
@@ -596,7 +596,7 @@ def create_demo():
         model_id_tb = gr.Textbox(value=current_model_id, label="Chat Model ID", info="e.g., google/gemma-2-2b-it or google/medgemma-4b-it")
         with gr.Row():
             apply_model_btn = gr.Button("Apply model (no restart)")
-            apply_model_restart_btn = gr.Button("Apply model and restart")
+            # apply_model_restart_btn = gr.Button("Apply model and restart")
         model_status = gr.Markdown(value=f"Current model: `{current_model_id}`")
 
         with gr.Row():
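The retained "Apply model (no restart)" button implies the chat model can be swapped in place from the `model_id_tb` textbox. One way such a handler could be wired is sketched below; `apply_model`, `chat_pipe`, and the pipeline choice are assumptions, not the Space's actual implementation:

```python
# Hypothetical handler for the "Apply model (no restart)" button; the real
# app.py wiring may differ.
from transformers import pipeline

chat_pipe = None  # module-level handle the agents would call into

def apply_model(model_id: str) -> str:
    """Reload the chat model in place and report which model is active."""
    global chat_pipe
    chat_pipe = pipeline("text-generation", model=model_id)
    return f"Current model: `{model_id}`"

# Inside create_demo():
# apply_model_btn.click(apply_model, inputs=[model_id_tb], outputs=[model_status])
```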