---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---
# gprMax AI Support Assistant (GSoC 2025)
**What it is:** a small web app that helps people write gprMax `.in` files, understand commands, and troubleshoot simulations in a simple chat UI.
**Why it matters:** new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.
**Live demo:** [Gprmax Support - a Hugging Face Space by jfang](https://huggingface.co/spaces/jfang/gprmax-support-gsoc25)
**Main model used by the app:** `jfang/gprmax-ft-Qwen3-4B-Instruct`. The app loads this model with Hugging Face Transformers and streams responses, including a separate “thinking” pane for learning and transparency.
---
## What I built (GSoC progress)
- **Fine‑tuned model for gprMax**. I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **RAG (Retrieval‑Augmented Generation)** on top of the official gprMax documentation. On first run, the app clones the repo, chunks `/docs` files, and creates a **persistent ChromaDB** store. Then the model can “call a tool” to search docs and show sources.
- **Friendly UI** with Gradio: left side is chat; right side has two collapsible panels: **AI Thinking Process** and **Documentation Sources**. There are also **Settings** so people can tune temperature, max tokens, etc.
- **Reproducible fine‑tuning recipe** with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.
- **Model Zoo (finetuned weights)**: I trained several variants and organized them here:
[https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)
> The evaluation plan and overall approach follow the project proposal: set baselines, fine‑tune with LoRA, add RAG, and then test by pass rate on required fields plus flexible checks on “creative” parts.
---
## Quick start
### 1) Use it online (Hugging Face Space)
1. Open the Space.
2. Ask a question like “How do I add a Ricker wavelet source?” or paste part of an input file.
3. Check the right panels:
- **AI Thinking Process** shows the model’s step‑by‑step reasoning (what it’s thinking).
- **Documentation Sources** shows the retriever’s citations and short previews.
> The Space wraps generation with `@spaces.GPU(duration=60)` to keep GPU usage small and predictable.
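For illustration, the pattern looks roughly like this (the function name and signature are placeholders, not the exact `app.py` code):

```python
# Sketch of wrapping generation for ZeroGPU Spaces; names are illustrative.
import spaces

@spaces.GPU(duration=60)  # borrow a GPU for at most ~60 seconds per call
def generate_response(prompt: str) -> str:
    # tokenize, run model.generate(...), decode, and stream the text back
    ...
```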
### 2) Run it locally
```bash
pip install "torch" "transformers" "accelerate" "gradio" "chromadb" "gitpython" "tqdm" "spaces"
gradio app.py
```
- First run: if the vector DB is missing, the app will **auto‑build** it (clone gprMax, chunk docs, and index). You’ll see logs about generating the database and then “RAG database loaded.”
- The database is **persistent** (on disk), so later runs are faster. The builder stores a `metadata.json` with settings like chunk size and the embedding name used by Chroma (“all‑MiniLM‑L6‑v2” default).
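For illustration, you can inspect that metadata once the build finishes (the key names here are a sketch based on the settings described in this README; the builder's exact schema may differ):

```python
# Peek at the builder's metadata; the commented values mirror the defaults
# described in this README (collection name, chunking, embedding label).
import json
from pathlib import Path

meta = json.loads(Path("rag-db/chroma_db/metadata.json").read_text())
print(json.dumps(meta, indent=2))
# e.g.
# {
#   "collection_name": "gprmax_docs_v1",
#   "chunk_size": 1000,
#   "chunk_overlap": 200,
#   "embedding_model": "ChromaDB Default (all-MiniLM-L6-v2)"
# }
```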
---
## Using the app (what to try)
Ask things like:
- “How do I create a basic gprMax input file for a simple GPR simulation?”
- “What’s the difference between `#domain` and `#dx_dy_dz`?”
- “How do I add a Ricker wavelet source?”
- “My simulation is taking too long—any tips to speed it up?”
- “How do I model a soil with different dielectric properties?”
When the model needs context, it emits a small JSON “tool call” to **search_documentation**. The retriever queries ChromaDB and the UI shows top matches in the right panel with file names and a short preview. Then the model writes a final answer that uses those snippets.
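For illustration, such a tool call could be detected and dispatched roughly like this (the exact JSON schema and retriever API used in `app.py` may differ):

```python
# Sketch of handling a search_documentation tool call emitted by the model.
# The JSON field names and the retriever.search(...) method are assumptions.
import json

def maybe_handle_tool_call(model_output: str, retriever):
    try:
        call = json.loads(model_output.strip())
    except json.JSONDecodeError:
        return None  # plain text, not a tool call
    if call.get("tool") == "search_documentation":
        return retriever.search(call.get("query", ""))  # query ChromaDB for doc chunks
    return None
```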
---
## Design principles (in simple terms)
- **Keep it modular.** Model, retriever, and UI are separate pieces. We can upgrade any part later.
- **Ground answers in docs.** The model can look things up and show sources, not just “guess.”
- **Make it light.** A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.
- **Be transparent.** Show what the model is thinking and where facts come from.
- **Future‑proof.** Rebuild the DB when docs change; swap in new models or embeddings later.
---
## Architecture (at a glance)
```
User ↔ Gradio Chat UI
│
▼
Transformers (Qwen3‑4B fine‑tuned) → streams text + <think> ... </think>
│
(optional tool call as JSON)
▼
search_documentation(query)
│
▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
│ │
▼ ▼
gprMax docs (cloned → chunked → indexed)
```
- **Model loading & streaming.** The app uses `AutoTokenizer/AutoModelForCausalLM` with `device_map="auto"`. The generator splits `<think>…</think>` into a separate “AI Thinking Process” pane.
- **Tool calling.** The system prompt describes a `search_documentation` tool and the exact JSON format for calling it.
- **RAG database.** The builder clones the official `gprMax` repo, reads `/docs` (`.rst`, `.md`, `.txt`), chunks with **size 1000 / overlap 200**, and stores to a **ChromaDB** collection named `gprmax_docs_v1`. Metadata includes `embedding_model: "ChromaDB Default (all‑MiniLM‑L6‑v2)"`.
- **Retriever.** Uses a persistent Chroma client and queries via `query_texts`. Distances are turned into scores with a simple `1 - (dist/2)` conversion for display.
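A minimal sketch of that retrieval step, assuming the DB path and collection name from this README (the `source` metadata key is illustrative):

```python
# Query the persistent ChromaDB store and convert distances to display scores.
import chromadb

client = chromadb.PersistentClient(path="rag-db/chroma_db")
collection = client.get_collection("gprmax_docs_v1")

results = collection.query(query_texts=["How do I add a Ricker wavelet source?"], n_results=5)
for doc, meta, dist in zip(results["documents"][0],
                           results["metadatas"][0],
                           results["distances"][0]):
    score = 1 - dist / 2  # same simple conversion the retriever uses for display
    print(f"{(meta or {}).get('source', 'unknown')}  score={score:.2f}")
    print(doc[:120], "...")
```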
---
## Technical choices (frameworks and why)
- **Transformers** to load and run the fine‑tuned Qwen 4B model, with `device_map="auto"` and `trust_remote_code=True`. This keeps the code short and makes GPU/CPU selection automatic.
- **Gradio** for the web UI (Blocks + Chatbot + Accordions + Sliders). It’s easy to read and extend; a minimal layout sketch follows this list.
- **ChromaDB** for a simple, persistent vector store that ships with the app. No external service is required.
- **GitPython + tqdm** to clone gprMax docs and show progress when building the DB.
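A minimal sketch of the layout those pieces produce (the real `app.py` wires these components to the model and retriever, which is omitted here):

```python
# Skeleton of the UI described above: chat on the left, collapsible panels
# on the right. The respond() body is a placeholder for streamed generation.
import gradio as gr

def respond(message, history):
    history = history + [
        {"role": "user", "content": message},
        {"role": "assistant", "content": "(model response would stream here)"},
    ]
    return history, ""

with gr.Blocks(title="gprMax AI Support Assistant") as demo:
    with gr.Row():
        with gr.Column(scale=2):
            chatbot = gr.Chatbot(type="messages", label="gprMax Assistant")
            msg = gr.Textbox(label="Ask about gprMax")
        with gr.Column(scale=1):
            with gr.Accordion("AI Thinking Process", open=False):
                gr.Markdown()
            with gr.Accordion("Documentation Sources", open=False):
                gr.Markdown()
            with gr.Accordion("Settings", open=False):
                gr.Slider(0.0, 1.5, value=0.7, label="Temperature")
                gr.Slider(64, 4096, value=1024, step=64, label="Max New Tokens")
    msg.submit(respond, [msg, chatbot], [chatbot, msg])

demo.launch()
```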
---
## Reproducible fine‑tuning (LoRA / PEFT)
This is the core of the work. Below is **exactly** how the 4B model was trained and how someone else can redo it.
### What I trained
- **Base model:** `Qwen/Qwen3-4B` (using the Qwen3 chat template).
- **Method:** LoRA adapters (**rank=8**, **alpha=16**, **dropout=0.0**) applied to attention and MLP projection layers.
- **Outputs:** adapters + merged weights; the app uses the merged variant `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **Other models I trained:** see my collection:
[https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)
### Exact config used (YAML)
```yaml
bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
```
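With `per_device_train_batch_size: 4` and `gradient_accumulation_steps: 8`, the effective batch size on a single GPU works out to 4 × 8 = 32 samples per optimizer step.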
**Metrics reported (4B run):**
```json
{
"epoch": 2.0,
"num_input_tokens_seen": 48562016,
"total_flos": 1.0635160197775688e+18,
"train_loss": 0.3312762507200241,
"train_runtime": 16760.735,
"train_samples_per_second": 1.909,
"train_steps_per_second": 0.06
}
```
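For scale, the reported `train_runtime` of about 16,761 seconds corresponds to roughly 4.7 hours for the two epochs.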
**Loss curve:**
![Training loss curve](training_loss.png)
### Simple HF/PEFT training script
```python
# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig
BASE = "Qwen/Qwen3-4B"
tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
tok.pad_token = tok.eos_token
ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})
def to_text(ex):
return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}
ds = ds.map(to_text, remove_columns=ds["train"].column_names)
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)
peft_cfg = LoraConfig(
r=8, lora_alpha=16, lora_dropout=0.0,
target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
task_type="CAUSAL_LM"
)
args = TrainingArguments(
output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
per_device_train_batch_size=4,
gradient_accumulation_steps=8,
learning_rate=5e-5,
num_train_epochs=2,
lr_scheduler_type="cosine",
logging_steps=5,
save_steps=100,
bf16=True,
report_to="none",
max_grad_norm=1.0
)
trainer = SFTTrainer(
model=model,
peft_config=peft_cfg,
tokenizer=tok,
train_dataset=ds["train"],
dataset_text_field="text",
max_seq_length=2048,
packing=False
)
trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
```
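Assuming the training data is already in chat format at `data/gpr-train.jsonl` (a `messages` list per example), the script can be run directly with `python train_lora_peft.py`.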
**Inference with adapter (or merge):**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"
tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)
prompt = tok.apply_chat_template(
[{"role":"user","content":"Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))
# Optional: merge LoRA into base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")
```
### How the fine‑tuned model plugs into the app
- `app.py` sets `MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"` and uses `AutoTokenizer/AutoModelForCausalLM` with `device_map="auto"`.
It also streams the **thinking** text (between `<think>...</think>`) to a separate UI pane; a minimal splitting sketch follows this list.
- When the model emits the tool call JSON for `search_documentation`, the app uses the retriever to query the local ChromaDB and shows sources in the right pane.
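A minimal sketch of that splitting step (the real app does this incrementally while streaming; the regex approach here is illustrative):

```python
# Separate the <think>...</think> block from the visible answer.
import re

def split_thinking(text: str):
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

thinking, answer = split_thinking(
    "<think>Check the #waveform syntax first.</think>Use: #waveform: ricker 1 100e6 my_src"
)
print(thinking)  # -> Check the #waveform syntax first.
print(answer)    # -> Use: #waveform: ricker 1 100e6 my_src
```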
---
## Project layout
```
.
├── app.py # Main Gradio app: model load, streaming, tool-calling
└── rag-db/
├── generate_db.py # Clone gprMax, chunk docs, build ChromaDB, save metadata
├── retriever.py # Persistent Chroma client + search utilities
└── chroma_db/ # (created at runtime) persistent vector DB + metadata.json
```
- If the DB is missing, the app will **auto‑build** it by **pulling the gprMax GitHub repo and embedding the *latest* docs**, then load it for searches.
- The builder saves `metadata.json` with the collection name (`gprmax_docs_v1`), chunking settings, and the embedding label.
- The retriever uses a persistent client and turns distances into a simple score for display.
---
## Tips & troubleshooting
- **GPU out‑of‑memory?** Lower **Max New Tokens** in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.
- **No docs in sources panel?** Build the DB manually:
```bash
python rag-db/generate_db.py --recreate
```
This clones the official repo, chunks `/docs` (size **1000**, overlap **200**), builds the `gprmax_docs_v1` collection, and writes metadata.
- **First response is slow.** That’s probably first‑time model load and DB creation. Later runs cache the DB, so it’s faster.
- Smaller models tend to **overthink** ([Cuadron et al., 2025](https://arxiv.org/abs/2502.08235)). We expect open-source models to keep improving, and because the pipeline is modular, newer models can be swapped in as they arrive.
## License note
The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.
**Thanks:** the gprMax team and community, plus the open‑source ML stack (Transformers, Gradio, ChromaDB).