# WARP.md

This file provides guidance to WARP (warp.dev) when working with code in this repository.

## Project Overview

**Wisal QA** is an AI assistant specialized in Autism Spectrum Disorders (ASD), developed by Compumacy AI. The system scores each query's relevance to autism (0-100% confidence) and filters on that score to keep responses accurate and autism-focused, while still recognizing comorbid conditions and indirect relationships.

### Key Features

- Multi-tiered relevance assessment (0-100% confidence scoring)
- Comorbidity recognition (depression, anxiety, ADHD, sleep issues in autism context)
- Smart query rewriting with autism-specific context
- Multi-source information combining (web search, RAG, LLM generation)
- Real-time voice chat with transcription and TTS
- Document upload and Q&A functionality

## Development Commands

### Environment Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Set up environment variables (copy from .env.example if available)
cp .env.example .env
# Edit .env with your API keys:
# - SILICONFLOW_API_KEY
# - GEMINI_API_KEY  
# - WEAVIATE_API_KEY
# - QDRANT_API_KEY
# - TAVILY_API_KEY
```

### Running the Application

```bash
# Main application (Gradio interface)
python main.py

# Alternative runner with pre-flight checks
python run_main.py

# Test functional components
python test.py
```

### Development and Testing

```bash
# Run a single test file
python -m pytest test.py -v

# Test specific functionality
python -c "from utils import process_query; print(process_query('What is autism?'))"

# Test query processing pipeline
python -c "from query_utils import process_query_for_rewrite; print(process_query_for_rewrite('My child has depression'))"

# Functional RAG pipeline testing
python -c "from src.pipeline import create_rag_pipeline; pipeline = create_rag_pipeline(); print(pipeline('What is autism?', ['Autism is a developmental disorder...']))"
```

### Configuration Management

```bash
# View current configuration
cat config.yaml
cat src/config.yaml

# Test configuration loading
python -c "from src.config import load_config; print(load_config())"
```

### Logging and Debugging

```bash
# View recent logs
ls -la logs/
tail -f logs/log_*.txt

# Check logger functionality
python -c "from logger.custom_logger import CustomLoggerTracker; logger = CustomLoggerTracker().get_logger('test'); logger.info('Test message')"
```

## Architecture Overview

### High-Level System Design

The codebase follows a **multi-paradigm architecture** combining:

1. **Functional Programming** (src/ directory) - Pure functions, immutable data structures
2. **Traditional OOP** (root directory) - Gradio interface, handlers, utilities

### Core Components

#### 1. Confidence Scoring Pipeline (`query_utils.py`)

- **Enhanced relevance checking**: 6-tier confidence system (0-100%)
- **Smart rewriting**: Automatically frames questions in autism context
- **Comorbidity awareness**: Recognizes depression, anxiety, ADHD as autism-relevant

**Key Functions:**

- `enhanced_autism_relevance_check()` - Main confidence scoring
- `process_query_for_rewrite()` - Complete query processing pipeline
- `rewrite_query_for_autism()` - Context-aware query rewriting

#### 2. Multi-Source Processing Pipeline

The system combines three information sources:

- **Web Search** (`web_search.py`) - Real-time autism information via Tavily API
- **RAG Systems** (`rag.py`, `rag_domain_know_doc.py`) - Domain knowledge retrieval
- **LLM Generation** (`utils.py`) - Direct autism expertise via SiliconFlow/Qwen
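
One way the three sources could be queried concurrently is sketched below. The fetchers `search_web`, `query_rag`, and `ask_llm` are hypothetical stand-ins, not the functions in `web_search.py`, `rag.py`, or `utils.py`; the sketch only assumes that each source is exposed as an async callable returning text.

```python
import asyncio

# Hypothetical stand-ins for the real sources; each returns a text snippet.
async def search_web(query: str) -> str:
    return f"web: {query}"

async def query_rag(query: str) -> str:
    return f"rag: {query}"

async def ask_llm(query: str) -> str:
    # Simulate one source being unavailable.
    raise RuntimeError("simulated outage")

async def gather_sources(query: str) -> dict[str, str]:
    """Query all sources concurrently and keep only those that succeeded."""
    sources = {"web": search_web, "rag": query_rag, "llm": ask_llm}
    results = await asyncio.gather(
        *(fetch(query) for fetch in sources.values()),
        return_exceptions=True,  # a failing source should not sink the others
    )
    return {
        name: result
        for name, result in zip(sources, results)
        if not isinstance(result, BaseException)
    }

answers = asyncio.run(gather_sources("What is autism?"))
print(answers)
```

With `return_exceptions=True`, a failed source shows up as an exception object in the result list instead of cancelling the whole batch, so the remaining sources can still be combined.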

#### 3. Functional RAG Architecture (`src/` directory)

Modern functional programming approach with:

- **Immutable data structures** (`@dataclass(frozen=True)`)
- **Pure functions** with consistent interfaces
- **Composable pipeline** (`src/pipeline.py`)
- **Model factories** (`src/models.py`) for API/local model abstraction
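
As a rough illustration of the first two points, the sketch below pairs a frozen dataclass with a pure chunking function. `Chunk` and `chunk_text` are hypothetical names chosen for this example and are not taken from `src/`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    """Hypothetical immutable record for one piece of a document."""
    doc_id: str
    index: int
    text: str

def chunk_text(doc_id: str, text: str, size: int) -> tuple[Chunk, ...]:
    """Pure function: split text into fixed-size chunks with no side effects."""
    if size <= 0:
        raise ValueError("size must be positive")
    return tuple(
        Chunk(doc_id, i, text[start:start + size])
        for i, start in enumerate(range(0, len(text), size))
    )

chunks = chunk_text("doc-1", "abcdefgh", 3)
print([c.text for c in chunks])
```

Because `Chunk` is frozen, assigning to one of its fields raises `dataclasses.FrozenInstanceError`, so later pipeline stages cannot modify earlier results in place.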

#### 4. Quality Control Layer

Multi-stage validation:

- **Pre-processing**: Query relevance filtering
- **Post-processing**: Answer autism-relevance checking  
- **Hallucination detection**: 5-point accuracy scoring
- **Translation support**: Auto-detect and translate responses

### Configuration Architecture

**Dual Configuration System:**

- `config.yaml` (root) - Application-level settings, API keys
- `src/config.yaml` - Functional pipeline configuration (models, chunking, performance)

**Model Support:**

- **API Services**: SiliconFlow and Gemini (models); Weaviate and Qdrant (vector databases)
- **Local Models**: Hugging Face transformers, sentence-transformers
- **Configurable switching** between API and local inference

### Key Architectural Patterns

#### 1. Confidence-Driven Processing

```python
# Query processing follows confidence scoring
if confidence_score >= 70:
    process_directly()
elif confidence_score >= 25:
    rewrite_for_autism_context()
else:
    politely_reject()
```

#### 2. Functional Composition

```python
# Pipeline composition in src/
pipeline = compose(
    chunk_documents,
    embed_chunks, 
    retrieve_similar_chunks,
    rerank_documents,
    generate_answer
)
```
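
For readers unfamiliar with the pattern, a left-to-right `compose` helper can be sketched as follows. This is an assumption about how such a helper might work, not the implementation in `src/pipeline.py`, and the toy stages stand in for the real pipeline steps.

```python
from functools import reduce
from typing import Any, Callable

def compose(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain steps left to right: compose(f, g)(x) == g(f(x))."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run

# Toy stages standing in for the real pipeline steps.
def split_words(text: str) -> list[str]:
    return text.split()

def drop_short(words: list[str]) -> list[str]:
    return [w for w in words if len(w) > 2]

pipeline = compose(split_words, drop_short, len)
print(pipeline("Autism is a developmental condition"))
```

Each stage receives the previous stage's output, so stages stay independently testable as long as their input and output types line up.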

#### 3. Multi-Modal Integration

- **Text Input/Output**: Primary interface
- **Voice Input**: Gemini transcription via WebRTC
- **Voice Output**: Gemini TTS with multiple voice options
- **Document Processing**: PDF, DOCX, TXT support

## Important Implementation Details

### Confidence Scoring Thresholds

```python
DIRECT_AUTISM_THRESHOLD = 85      # score >= 85: accept as-is
HIGH_RELEVANCE_THRESHOLD = 70     # 70-84: accept as-is
SIGNIFICANT_THRESHOLD = 55        # 55-69: rewrite for autism
MODERATE_THRESHOLD = 40           # 40-54: rewrite for autism
SOMEWHAT_THRESHOLD = 25           # 25-39: conditional rewrite
REJECTION_THRESHOLD = 24          # score <= 24: reject
```
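
The thresholds above can be read as a simple score-to-action mapping. The sketch below assumes inclusive lower bounds; `action_for_score` and every action label except `rewrite_for_autism` are hypothetical names used only for illustration.

```python
HIGH_RELEVANCE_THRESHOLD = 70
MODERATE_THRESHOLD = 40
SOMEWHAT_THRESHOLD = 25

def action_for_score(score: int) -> str:
    """Map a 0-100 confidence score to a processing action."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= HIGH_RELEVANCE_THRESHOLD:
        return "accept"               # covers both the 70+ and 85+ tiers
    if score >= MODERATE_THRESHOLD:
        return "rewrite_for_autism"   # covers the 40+ and 55+ tiers
    if score >= SOMEWHAT_THRESHOLD:
        return "conditional_rewrite"
    return "reject"                   # 24 and below

for score in (90, 65, 30, 10):
    print(score, action_for_score(score))
```

Boundary values (24, 25, 39, 40, 69, 70) are the ones most worth covering in tests, since an off-by-one there changes whether a query is answered, rewritten, or rejected.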

### Comorbidity Recognition Logic

The system specifically boosts scores for:

- **Depression in children/teens**: +15 points (65-75% final score)
- **Anxiety disorders**: +15 points
- **ADHD symptoms**: +15 points
- **Sleep disorders**: +15 points
- **Sensory processing**: +20 points
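
A minimal sketch of how such boosts might be applied is shown below. The condition keys, the `boosted_score` function, and the choice to apply only the largest matching boost (rather than summing boosts) are all assumptions made for this example; the actual logic lives in `query_utils.py` and may differ.

```python
# Boost values from the list above; the dictionary keys are illustrative.
COMORBIDITY_BOOSTS = {
    "depression": 15,
    "anxiety": 15,
    "adhd": 15,
    "sleep": 15,
    "sensory": 20,
}

def boosted_score(base_score: int, conditions: set[str]) -> int:
    """Add the largest applicable boost and clamp the result to 0-100."""
    boost = max((COMORBIDITY_BOOSTS.get(c, 0) for c in conditions), default=0)
    return max(0, min(100, base_score + boost))

print(boosted_score(50, {"depression"}))
```

Under these assumptions, a base score of 50 for a teen-depression query becomes 65, which falls in the rewrite range described by the thresholds.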

### Model Configuration Patterns

```python
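# API vs Local model switching
# (create_model is an illustrative wrapper name; the factory lives in src/models.py)
def create_model(model_config):
    if model_config.type == ModelType.API:
        return create_api_model(model_config)
    return create_local_model(model_config)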
```
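
To make the types that this switch relies on concrete, here is a self-contained sketch. `ModelConfig`, the enum values, and the placeholder model bodies are assumptions for illustration; the real factories in `src/models.py` would call an API client or load a local model instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class ModelType(Enum):
    API = "api"
    LOCAL = "local"

@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical configuration record for one model."""
    name: str
    type: ModelType

def create_api_model(config: ModelConfig) -> Callable[[str], str]:
    # Placeholder: a real implementation would call the remote API here.
    return lambda prompt: f"[api:{config.name}] {prompt}"

def create_local_model(config: ModelConfig) -> Callable[[str], str]:
    # Placeholder: a real implementation would load local weights here.
    return lambda prompt: f"[local:{config.name}] {prompt}"

def create_model(config: ModelConfig) -> Callable[[str], str]:
    """Return a callable model; callers never need to know which kind it is."""
    if config.type is ModelType.API:
        return create_api_model(config)
    return create_local_model(config)

model = create_model(ModelConfig(name="demo", type=ModelType.LOCAL))
print(model("What is autism?"))
```

Because both factories return the same callable shape, switching between API and local inference becomes a configuration change rather than a code change.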

### Error Handling Strategy

- **Graceful degradation**: Fallback to simpler models/methods
- **Comprehensive logging**: All failures logged with context
- **User-friendly messages**: Technical errors translated to helpful responses
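
The three points above can be combined into a small fallback chain, sketched here under stated assumptions: it uses the standard `logging` module rather than `CustomLoggerTracker`, and `answer_with_fallback`, the handler functions, and the fallback message are hypothetical.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("fallback_demo")

FALLBACK_MESSAGE = "Sorry, I could not answer that right now. Please try again."

def answer_with_fallback(
    query: str, handlers: Sequence[Callable[[str], str]]
) -> str:
    """Try each handler in order; return a friendly message if all fail."""
    for handler in handlers:
        try:
            return handler(query)
        except Exception:
            # Log the failure with context, then fall through to the next handler.
            logger.exception("Handler %s failed for query %r", handler.__name__, query)
    return FALLBACK_MESSAGE

def primary_model(query: str) -> str:
    raise TimeoutError("simulated timeout")

def simple_model(query: str) -> str:
    return f"Short answer to: {query}"

print(answer_with_fallback("What is autism?", [primary_model, simple_model]))
```

Ordering the handlers from most capable to simplest keeps the best answer when everything works and still returns something useful when it does not.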

## Development Guidelines

### Working with Confidence Scoring

- **Test edge cases**: Borderline queries (scores 25-75)
- **Validate comorbidity detection**: Depression/anxiety in autism context
- **Monitor false positives/negatives**: Use logging to track decision quality

### Adding New Features

1. **Functional approach**: Add to `src/` directory for pipeline components
2. **Integration**: Use existing confidence scoring for relevance checking
3. **Logging**: Integrate with `CustomLoggerTracker` for consistency
4. **Configuration**: Add settings to appropriate config.yaml

### Model Integration

- **API models**: Add to model factory in `src/models.py`
- **Local models**: Ensure HuggingFace compatibility
- **Configuration**: Update model configs in `src/config.yaml`

### Testing Autism Relevance

```python
# Test confidence scoring
from query_utils import enhanced_autism_relevance_check
result = enhanced_autism_relevance_check("My teenager seems depressed")
# Expected: score=65, action="rewrite_for_autism"
```

### Audio/Voice Features

- **WebRTC integration**: Real-time voice chat via `fastrtc`
- **Gemini STT/TTS**: Voice input/output processing
- **VAD (Voice Activity Detection)**: Automatic speech detection

## Common Development Patterns

### Adding a New Information Source

1. Create async function in dedicated module
2. Integrate with reranking system in `utils.py`
3. Add to multi-source processing pipeline
4. Update confidence thresholds if needed

### Extending Comorbidity Recognition  

1. Update confidence scoring prompts in `prompt_template.py`
2. Add condition-specific scoring logic in `query_utils.py`
3. Test with representative queries
4. Update documentation with new thresholds

### Document Processing Workflow

1. Upload via Gradio interface
2. Route to appropriate handler (`old_docs.py`, `rag_domain_know_doc.py`, `user_specific_documents.py`)
3. Chunk and embed using functional pipeline
4. Store in vector database (Weaviate/Qdrant)
5. Integrate with RAG retrieval

This codebase combines traditional OOP (root directory) and functional (`src/`) components in one autism-focused AI system. The confidence scoring system and comorbidity awareness are its key differentiators; preserve them and extend them carefully.