---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---

# gprMax AI Support Assistant (GSoC 2025)

**What it is:** a small web app that helps people write gprMax `.in` files, understand commands, and troubleshoot simulations in a simple chat UI.  
**Why it matters:** new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.

**Live demo:** [Gprmax Support - a Hugging Face Space by jfang](https://huggingface.co/spaces/jfang/gprmax-support-gsoc25)  
**Main model used by the app:** `jfang/gprmax-ft-Qwen3-4B-Instruct`. The app loads this model with Hugging Face Transformers and streams responses, including a separate “thinking” pane for learning and transparency.

---

## What I built (GSoC progress)

- **Fine‑tuned model for gprMax**. I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads `jfang/gprmax-ft-Qwen3-4B-Instruct`.
    
- **RAG (Retrieval‑Augmented Generation)** on top of the official gprMax documentation. On first run, the app clones the repo, chunks `/docs` files, and creates a **persistent ChromaDB** store. Then the model can “call a tool” to search docs and show sources.
    
- **Friendly UI** with Gradio: left side is chat; right side has two collapsible panels: **AI Thinking Process** and **Documentation Sources**. There are also **Settings** so people can tune temperature, max tokens, etc.
    
- **Reproducible fine‑tuning recipe** with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.
    
- **Model Zoo (finetuned weights)**: I trained several variants and organized them here:  
    [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)
    

> The evaluation plan and overall approach follow the project proposal: set baselines, fine‑tune with LoRA, add RAG, and then test by pass rate on required fields plus flexible checks on “creative” parts (a toy version of the required‑field check is sketched below).

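To make the required‑field idea concrete, here is a toy version of such a check; the command list and scoring are illustrative, not the actual evaluation harness:

```python
# Toy "required fields" check for a generated gprMax .in file (illustrative only).
REQUIRED_COMMANDS = ["#domain:", "#dx_dy_dz:", "#time_window:"]  # example subset

def required_fields_present(in_file_text: str) -> dict:
    """Map each required command to whether it appears in the generated file."""
    lines = [line.strip() for line in in_file_text.splitlines()]
    return {cmd: any(line.startswith(cmd) for line in lines) for cmd in REQUIRED_COMMANDS}

def pass_rate(generations: list[str]) -> float:
    """Fraction of generations in which every required command is present."""
    checks = [required_fields_present(g) for g in generations]
    return sum(all(c.values()) for c in checks) / max(len(checks), 1)
```
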
---

## Quick start

### 1) Use it online (Hugging Face Space)

1. Open the Space.
    
2. Ask a question like “How do I add a Ricker wavelet source?” or paste part of an input file.
    
3. Check the right panels:
    
    - **AI Thinking Process** shows the model’s step‑by‑step reasoning.
        
    - **Documentation Sources** shows the retriever’s citations and short previews.
        

> The Space wraps generation with `@spaces.GPU(duration=60)` to keep GPU usage small and predictable.
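
For context, the wrapper looks roughly like this in `app.py` (a sketch; the real function name, arguments, and body differ):

```python
import spaces

@spaces.GPU(duration=60)  # reserve a GPU for at most ~60 s per call on Spaces
def generate_response(message, history, temperature=0.7, max_new_tokens=1024):
    # model.generate(...) / streaming happens here on the GPU
    ...
```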

### 2) Run it locally

```bash
pip install "torch" "transformers" "gradio" "chromadb" "gitpython" "tqdm" "spaces"

gradio app.py
```

- First run: if the vector DB is missing, the app will **auto‑build** it (clone gprMax, chunk docs, and index; see the sketch below). You’ll see logs about generating the database and then “RAG database loaded.”
    
- The database is **persistent** (on disk), so later runs are faster. The builder stores a `metadata.json` with settings like chunk size and the embedding model used by Chroma (“all‑MiniLM‑L6‑v2” by default).
    

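For reference, a minimal sketch of what the auto‑build does (clone, chunk, index); the real `rag-db/generate_db.py` differs in details such as CLI flags, error handling, and chunk boundaries:

```python
# Minimal sketch: clone gprMax, chunk /docs, index chunks into a persistent ChromaDB.
from pathlib import Path
import chromadb
from git import Repo

CHUNK_SIZE, OVERLAP = 1000, 200  # matches the settings described in this README

def chunk(text: str):
    step = CHUNK_SIZE - OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

def build(db_dir="rag-db/chroma_db", repo_dir="gprMax"):
    if not Path(repo_dir).exists():
        Repo.clone_from("https://github.com/gprMax/gprMax.git", repo_dir)
    client = chromadb.PersistentClient(path=db_dir)
    collection = client.get_or_create_collection("gprmax_docs_v1")
    for doc in Path(repo_dir, "docs").rglob("*"):
        if doc.suffix in {".rst", ".md", ".txt"}:
            for i, piece in enumerate(chunk(doc.read_text(errors="ignore"))):
                collection.add(ids=[f"{doc.name}-{i}"], documents=[piece],
                               metadatas=[{"source": str(doc)}])
```
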
---

## Using the app (what to try)

Ask things like:

- “How do I create a basic gprMax input file for a simple GPR simulation?”
    
- “What’s the difference between `#domain` and `#dx_dy_dz`?”
    
- “How do I add a Ricker wavelet source?”
    
- “My simulation is taking too long—any tips to speed it up?”
    
- “How do I model a soil with different dielectric properties?”
    

When the model needs context, it emits a small JSON “tool call” to **search_documentation**. The retriever queries ChromaDB and the UI shows top matches in the right panel with file names and a short preview. Then the model writes a final answer that uses those snippets.
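
The exact schema is defined in the system prompt inside `app.py`; conceptually the tool call looks something like this (field names are illustrative):

```json
{"tool": "search_documentation", "arguments": {"query": "ricker wavelet source syntax"}}
```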

---

## Design principles (in simple terms)

- **Keep it modular.** Model, retriever, and UI are separate pieces. We can upgrade any part later.
    
- **Ground answers in docs.** The model can look things up and show sources, not just “guess.”
    
- **Make it light.** A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.
    
- **Be transparent.** Show what the model is thinking and where facts come from.
    
- **Future‑proof.** Rebuild the DB when docs change; swap in new models or embeddings later.
    

---

## Architecture (at a glance)

```
User ↔ Gradio Chat UI
         │
         ▼
Transformers (Qwen3‑4B fine‑tuned) → streams text + <think> ... </think>
         │
         ▼  (optional tool call as JSON)
search_documentation(query)
         │
         ▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
          │                 │
          ▼                 ▼
     gprMax docs (cloned → chunked → indexed)
```

- **Model loading & streaming.** The app uses `AutoTokenizer/AutoModelForCausalLM` with `device_map="auto"`. The generator splits `<think>…</think>` into a separate “AI Thinking Process” pane.
    
- **Tool calling.** The system prompt describes a `search_documentation` tool and the exact JSON format for calling it.
    
- **RAG database.** The builder clones the official `gprMax` repo, reads `/docs` (`.rst`, `.md`, `.txt`), chunks with **size 1000 / overlap 200**, and stores to a **ChromaDB** collection named `gprmax_docs_v1`. Metadata includes `embedding_model: "ChromaDB Default (all‑MiniLM‑L6‑v2)"`.
    
- **Retriever.** Uses a persistent Chroma client and queries via `query_texts`. Distances are turned into scores with a simple `1 - (dist/2)` conversion for display; a minimal sketch of this query path follows this list.
    

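The collection name and the `1 - (dist/2)` conversion below are from this README; everything else is illustrative:

```python
# Minimal sketch of the retriever: query the persistent collection and convert
# distances into display scores.
import chromadb

client = chromadb.PersistentClient(path="rag-db/chroma_db")
collection = client.get_collection("gprmax_docs_v1")

def search_documentation(query: str, n_results: int = 3):
    res = collection.query(query_texts=[query], n_results=n_results)
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        hits.append({
            "source": meta.get("source", "unknown"),
            "score": 1 - dist / 2,        # distance -> rough similarity for display
            "preview": doc[:200],
        })
    return hits
```
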
---

## Technical choices (frameworks and why)

- **Transformers** to load and run the fine‑tuned Qwen 4B model, with `device_map="auto"` and `trust_remote_code=True`. This keeps the code short and makes GPU/CPU selection automatic.
    
- **Gradio** for the web UI (Blocks + Chatbot + Accordions + Sliders). It’s easy to read and extend.
    
- **ChromaDB** for a simple, persistent vector store that ships with the app. No external service is required.
    
- **GitPython + tqdm** to clone gprMax docs and show progress when building the DB.
    

---

## Reproducible fine‑tuning (LoRA / PEFT)

This is the core of the work. Below is **exactly** how the 4B model was trained and how someone else can redo it.

### What I trained

- **Base model:** `Qwen/Qwen3-4B` (using the Qwen3 chat template).
    
- **Method:** LoRA adapters (**rank=8**, **alpha=16**, **dropout=0.0**) applied to attention and MLP projection layers.
    
- **Outputs:** adapters + merged weights; the app uses the merged variant `jfang/gprmax-ft-Qwen3-4B-Instruct`.
    
- **Other models I trained:** see my collection:  
    [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)
    


### Exact config used (YAML)

```yaml
bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
```

**Metrics reported (4B run):**

```json
{
  "epoch": 2.0,
  "num_input_tokens_seen": 48562016,
  "total_flos": 1.0635160197775688e+18,
  "train_loss": 0.3312762507200241,
  "train_runtime": 16760.735,
  "train_samples_per_second": 1.909,
  "train_steps_per_second": 0.06
}
```

**Loss curve:**

![Training loss curve](training_loss.png)

### Path A — Simple HF/PEFT training script

```python
# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

BASE = "Qwen/Qwen3-4B"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})

def to_text(ex):
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}

ds = ds.map(to_text, remove_columns=ds["train"].column_names)

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)

peft_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    task_type="CAUSAL_LM"
)

args = TrainingArguments(
    output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    logging_steps=5,
    save_steps=100,
    bf16=True,
    report_to="none",
    max_grad_norm=1.0
)

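# Note: the argument names below (tokenizer=, dataset_text_field=, max_seq_length=,
# packing=) follow the older TRL SFTTrainer API; newer TRL releases move these into
# an SFTConfig passed as `args`. Pin a compatible trl version or adapt accordingly.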
trainer = SFTTrainer(
    model=model,
    peft_config=peft_cfg,
    tokenizer=tok,
    train_dataset=ds["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False
)

trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
```

**Inference with adapter (or merge):**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)

prompt = tok.apply_chat_template(
    [{"role":"user","content":"Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
    tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))

# Optional: merge LoRA into base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")
```

### How the fine‑tuned model plugs into the app

- `app.py` sets `MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"` and uses `AutoTokenizer/AutoModelForCausalLM` with `device_map="auto"`.  
    It also streams the **thinking** text (between `<think>...</think>`) to a separate UI pane; a minimal sketch of this split follows this list.
    
- When the model emits the tool call JSON for `search_documentation`, the app uses the retriever to query the local ChromaDB and shows sources in the right pane.
    
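A minimal sketch of that split during streaming (the logic in `app.py` differs in details such as buffering and tag handling):

```python
# Sketch: route streamed text either to the "AI Thinking Process" pane or the answer pane.
def split_stream(piece_iter):
    thinking, answer, in_think = [], [], False
    for piece in piece_iter:            # e.g. pieces from transformers.TextIteratorStreamer
        if "<think>" in piece:
            in_think = True
            piece = piece.replace("<think>", "")
        if "</think>" in piece:
            in_think = False
            piece = piece.replace("</think>", "")
        (thinking if in_think else answer).append(piece)
        yield "".join(thinking), "".join(answer)   # update both panes incrementally
```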

---

## Project layout

```
.
├── app.py                          # Main Gradio app: model load, streaming, tool-calling
└── rag-db/
    ├── generate_db.py              # Clone gprMax, chunk docs, build ChromaDB, save metadata
    ├── retriever.py                # Persistent Chroma client + search utilities
    └── chroma_db/                  # (created at runtime) persistent vector DB + metadata.json
```

- If the DB is missing, the app will **auto‑build** it by **cloning the gprMax GitHub repo and embedding the *latest* documentation**, then load it for searches.
    
- The builder saves `metadata.json` with the collection name (`gprmax_docs_v1`), chunking settings, and the embedding label.
    
- The retriever uses a persistent client and turns distances into a simple score for display.
    

---


## Tips & troubleshooting

- **GPU out‑of‑memory?** Lower **Max New Tokens** in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.

- **No docs in sources panel?** Build the DB manually:

  ```bash
  python rag-db/generate_db.py --recreate
  ```

  This clones the official repo, chunks `/docs` (size **1000**, overlap **200**), builds the `gprmax_docs_v1` collection, and writes metadata.

- **First response is slow.** That’s probably first‑time model load and DB creation. Later runs cache the DB, so it’s faster.

- Smaller models tend to **overthink** ([Cuadron et al., 2025](https://arxiv.org/abs/2502.08235)). We expect open-source models to keep improving, and the pipeline is built so newer models can be swapped in.

## License note

The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.

**Thanks:** the gprMax team and community, plus the open‑source ML stack (Transformers, Gradio, ChromaDB).