---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---

gprMax AI Support Assistant (GSoC 2025)

What it is: a small web app that helps people write gprMax .in files, understand commands, and troubleshoot simulations in a simple chat UI.
Why it matters: new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.

Live demo: Gprmax Support, a Hugging Face Space by jfang. Main model used by the app: jfang/gprmax-ft-Qwen3-4B-Instruct. The app loads this model with Hugging Face Transformers and streams responses, including a separate “thinking” pane for learning and transparency.


What I built (GSoC progress)

  • Fine‑tuned model for gprMax. I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads jfang/gprmax-ft-Qwen3-4B-Instruct.

  • RAG (Retrieval‑Augmented Generation) on top of the official gprMax documentation. On first run, the app clones the repo, chunks /docs files, and creates a persistent ChromaDB store. Then the model can “call a tool” to search docs and show sources.

  • Friendly UI with Gradio: chat on the left, and two collapsible panels on the right (AI Thinking Process and Documentation Sources). A Settings section lets people tune temperature, max tokens, and so on.

  • Reproducible fine‑tuning recipe with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.

  • Model Zoo (finetuned weights): I trained several variants and organized them here:
    https://huggingface.co/collections/jfang/gprmax-command-finetuned

The evaluation plan and overall approach follow the project proposal: set baselines, fine‑tune with LoRA, add RAG, and then evaluate generated input files by pass rate on required fields plus more flexible checks on the “creative” parts.


Quick start

1) Use it online (Hugging Face Space)

  1. Open the Space.

  2. Ask a question like “How do I add a Ricker wavelet source?” or paste part of an input file.

  3. Check the right panels:

    • AI Thinking Process shows the model’s step‑by‑step reasoning.

    • Documentation Sources shows the retriever’s citations and short previews.

The Space wraps generation with @spaces.GPU(duration=60) to keep GPU usage small and predictable.
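
For reference, the wrapper looks roughly like this in app.py (the function name and body here are illustrative placeholders, not the app’s exact code):

import spaces

@spaces.GPU(duration=60)  # request a GPU slice for at most ~60 s per call
def chat_response(message, history, temperature, max_new_tokens):
    # illustrative: the real function builds the prompt, runs model.generate
    # with a streamer, and yields text chunks back to the Gradio UI
    ...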

2) Run it locally

pip install "torch" "transformers" "gradio" "chromadb" "gitpython" "tqdm" "spaces"

gradio app.py

  • First run: if the vector DB is missing, the app will auto‑build it (clone gprMax, chunk docs, and index). You’ll see logs about generating the database and then “RAG database loaded.”

  • The database is persistent (on disk), so later runs are faster. The builder stores a metadata.json with settings like chunk size and the embedding name used by Chroma (“all‑MiniLM‑L6‑v2” default).
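
As a rough illustration (the exact key names may differ), metadata.json records settings like:

{
  "collection_name": "gprmax_docs_v1",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "embedding_model": "ChromaDB Default (all-MiniLM-L6-v2)"
}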


Using the app (what to try)

Ask things like:

  • “How do I create a basic gprMax input file for a simple GPR simulation?”

  • “What’s the difference between #domain and #dx_dy_dz?”

  • “How do I add a Ricker wavelet source?”

  • “My simulation is taking too long—any tips to speed it up?”

  • “How do I model a soil with different dielectric properties?”

When the model needs context, it emits a small JSON “tool call” to search_documentation. The retriever queries ChromaDB and the UI shows top matches in the right panel with file names and a short preview. Then the model writes a final answer that uses those snippets.
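
The exact JSON schema is defined in the app’s system prompt; an illustrative tool call (field names here are assumptions) might look like:

{"tool": "search_documentation", "query": "how to add a Ricker wavelet source"}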


Design principles (in simple terms)

  • Keep it modular. Model, retriever, and UI are separate pieces. We can upgrade any part later.

  • Ground answers in docs. The model can look things up and show sources, not just “guess.”

  • Make it light. A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.

  • Be transparent. Show what the model is thinking and where facts come from.

  • Future‑proof. Rebuild the DB when docs change; swap in new models or embeddings later.


Architecture (at a glance)

User ↔ Gradio Chat UI
          │
          ▼
 Transformers (Qwen3‑4B fine‑tuned) → streams text + <think> ... </think>
          │
   (optional tool call as JSON)
          ▼
search_documentation(query)
          │
          ▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
          │                 │
          ▼                 ▼
     gprMax docs (cloned → chunked → indexed)

  • Model loading & streaming. The app uses AutoTokenizer/AutoModelForCausalLM with device_map="auto". The generator splits <think>…</think> into a separate “AI Thinking Process” pane.

  • Tool calling. The system prompt describes a search_documentation tool and the exact JSON format for calling it.

  • RAG database. The builder clones the official gprMax repo, reads /docs (.rst, .md, .txt), chunks with size 1000 / overlap 200, and stores to a ChromaDB collection named gprmax_docs_v1. Metadata includes embedding_model: "ChromaDB Default (all‑MiniLM‑L6‑v2)".

  • Retriever. Uses a persistent Chroma client and queries via query_texts. Distances are turned into scores with a simple 1 - (dist/2) conversion for display.
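
A minimal sketch of that retrieval step (the class and field names are illustrative; only the DB path from the project layout, the collection name, query_texts usage, and the score formula come from the app):

import chromadb

# persistent DB built by rag-db/generate_db.py (path per the project layout)
client = chromadb.PersistentClient(path="rag-db/chroma_db")
collection = client.get_collection("gprmax_docs_v1")

def search_documentation(query: str, n_results: int = 5):
    res = collection.query(query_texts=[query], n_results=n_results)
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        hits.append({
            "text": doc,
            "source": meta.get("source", "unknown"),   # metadata key is an assumption
            "score": 1 - (dist / 2),                   # simple distance-to-score conversion for display
        })
    return hits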


Technical choices (frameworks and why)

  • Transformers to load and run the fine‑tuned Qwen 4B model, with device_map="auto" and trust_remote_code=True. This keeps the code short and makes GPU/CPU selection automatic.

  • Gradio for the web UI (Blocks + Chatbot + Accordions + Sliders). It’s easy to read and extend.

  • ChromaDB for a simple, persistent vector store that ships with the app. No external service is required.

  • GitPython + tqdm to clone gprMax docs and show progress when building the DB.


Reproducible fine‑tuning (LoRA / PEFT)

This is the core of the work. Below is exactly how the 4B model was trained and how someone else can redo it.

What I trained

  • Base model: Qwen/Qwen3-4B (using the Qwen3 chat template).

  • Method: LoRA adapters (rank=8, alpha=16, dropout=0.0) applied to attention and MLP projection layers.

  • Outputs: adapters + merged weights; the app uses the merged variant jfang/gprmax-ft-Qwen3-4B-Instruct.

  • Other models I trained: see my collection:
    https://huggingface.co/collections/jfang/gprmax-command-finetuned

Exact config used (YAML)

bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
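
With per_device_train_batch_size: 4 and gradient_accumulation_steps: 8, the effective batch size is 4 × 8 = 32 sequences per optimizer step per device, at a cutoff length of 2048 tokens.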

Metrics reported (4B run):

{
  "epoch": 2.0,
  "num_input_tokens_seen": 48562016,
  "total_flos": 1.0635160197775688e+18,
  "train_loss": 0.3312762507200241,
  "train_runtime": 16760.735,
  "train_samples_per_second": 1.909,
  "train_steps_per_second": 0.06
}

Loss curve: ![Training loss](training_loss.png)

Simple HF/PEFT training script

# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

BASE = "Qwen/Qwen3-4B"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# expects chat-format records: each line has a "messages" list of role/content turns
ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})

def to_text(ex):
    # render each conversation with the Qwen3 chat template into a single training string
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}

ds = ds.map(to_text, remove_columns=ds["train"].column_names)

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)

peft_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    task_type="CAUSAL_LM"
)

args = TrainingArguments(
    output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    logging_steps=5,
    save_steps=100,
    bf16=True,
    report_to="none",
    max_grad_norm=1.0
)

# Note: this uses the classic SFTTrainer signature; recent TRL releases move
# dataset_text_field/max_seq_length/packing into SFTConfig and replace
# tokenizer with processing_class, so pin an older TRL or adapt accordingly.
trainer = SFTTrainer(
    model=model,
    peft_config=peft_cfg,
    tokenizer=tok,
    train_dataset=ds["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False
)

trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
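
The script expects the training data at data/gpr-train.jsonl (the path used above), one JSON object per line with a "messages" list; the contents here are illustrative:

{"messages": [{"role": "user", "content": "How do I set the simulation domain?"}, {"role": "assistant", "content": "Use the #domain command, e.g. #domain: 0.5 0.5 0.5"}]}

Then run it with:

python train_lora_peft.py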

Inference with adapter (or merge):

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)

prompt = tok.apply_chat_template(
    [{"role":"user","content":"Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
    tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))

# Optional: merge LoRA into base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")

How the fine‑tuned model plugs into the app

  • app.py sets MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct" and uses AutoTokenizer/AutoModelForCausalLM with device_map="auto".
    It also streams the thinking text (between <think>...</think>) to a separate UI pane; a minimal sketch of that split follows after this list.

  • When the model emits the tool call JSON for search_documentation, the app uses the retriever to query the local ChromaDB and shows sources in the right pane.
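
A minimal sketch of the <think> split, assuming the full text is available (the app’s actual streaming logic works incrementally and is more involved):

def split_thinking(text: str):
    # separate the model's <think>...</think> reasoning from the visible answer
    if "<think>" in text and "</think>" in text:
        start = text.index("<think>") + len("<think>")
        end = text.index("</think>")
        thinking = text[start:end].strip()
        answer = (text[:text.index("<think>")] + text[end + len("</think>"):]).strip()
        return thinking, answer
    return "", text.strip()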


Project layout

.
├── app.py                          # Main Gradio app: model load, streaming, tool-calling
└── rag-db/
    ├── generate_db.py              # Clone gprMax, chunk docs, build ChromaDB, save metadata
    ├── retriever.py                # Persistent Chroma client + search utilities
    └── chroma_db/                  # (created at runtime) persistent vector DB + metadata.json

  • If the vector DB is missing, the app auto‑builds it by cloning the gprMax GitHub repo and embedding the latest documentation, then loads it for searches (a rough sketch of this build step follows after this list).

  • The builder saves metadata.json with the collection name (gprmax_docs_v1), chunking settings, and the embedding label.

  • The retriever uses a persistent client and turns distances into a simple score for display.
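
A rough sketch of what generate_db.py does (function and argument names are illustrative, and the repo URL is assumed to be the official gprMax repository; chunk sizes and the collection name match the settings above):

import os
from git import Repo
import chromadb

def build_db(repo_url="https://github.com/gprMax/gprMax.git",
             work_dir="rag-db", chunk_size=1000, overlap=200):
    # 1) clone the official repo (GitPython)
    repo_path = os.path.join(work_dir, "gprMax")
    if not os.path.exists(repo_path):
        Repo.clone_from(repo_url, repo_path)

    # 2) read /docs files and chunk with a sliding window (size 1000, overlap 200)
    chunks, ids = [], []
    docs_dir = os.path.join(repo_path, "docs")
    step = chunk_size - overlap
    for root, _, files in os.walk(docs_dir):
        for name in files:
            if not name.endswith((".rst", ".md", ".txt")):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            rel = os.path.relpath(path, docs_dir)
            for i in range(0, len(text), step):
                chunks.append(text[i:i + chunk_size])
                ids.append(f"{rel}:{i}")

    # 3) index into a persistent ChromaDB collection
    client = chromadb.PersistentClient(path=os.path.join(work_dir, "chroma_db"))
    collection = client.get_or_create_collection("gprmax_docs_v1")
    if chunks:
        collection.add(documents=chunks, ids=ids)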


Tips & troubleshooting

  • GPU out‑of‑memory? Lower Max New Tokens in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.

  • No docs in sources panel? Build the DB manually:

       python rag-db/generate_db.py --recreate
    

    This clones the official repo, chunks /docs (size 1000, overlap 200), builds the gprmax_docs_v1 collection, and writes metadata.

  • First response is slow. That’s probably first‑time model load and DB creation. Later runs cache the DB, so it’s faster.

  • Smaller models tend to overthink (Cuadron et al., 2025). We expect open‑source models to keep improving, and because the model, retriever, and UI are separate pieces, a newer model can be swapped in without changing the rest of the pipeline.

License note

The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.

Thanks: the gprMax team and community, plus the open‑source ML stack (Transformers, Gradio, ChromaDB).