---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---
# gprMax AI Support Assistant (GSoC 2025)

**What it is:** a small web app that helps people write gprMax `.in` files, understand commands, and troubleshoot simulations in a simple chat UI.

**Why it matters:** new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.

**Live demo:** [Gprmax Support - a Hugging Face Space by jfang](https://huggingface.co/spaces/jfang/gprmax-support-gsoc25)

**Main model used by the app:** `jfang/gprmax-ft-Qwen3-4B-Instruct`. The app loads this model with Hugging Face Transformers and streams responses, including a separate “thinking” pane for learning and transparency.

---
## What I built (GSoC progress)

- **Fine‑tuned model for gprMax**. I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **RAG (Retrieval‑Augmented Generation)** on top of the official gprMax documentation. On first run, the app clones the repo, chunks `/docs` files, and creates a **persistent ChromaDB** store. Then the model can “call a tool” to search docs and show sources.
- **Friendly UI** with Gradio: the left side is chat; the right side has two collapsible panels, **AI Thinking Process** and **Documentation Sources**. There are also **Settings** so people can tune temperature, max tokens, etc.
- **Reproducible fine‑tuning recipe** with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.
- **Model Zoo (fine‑tuned weights)**: I trained several variants and organized them here:
  [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)

> The evaluation plan and overall approach follow the project proposal: set baselines, fine‑tune with LoRA, add RAG, and then test by pass rate on required fields plus flexible checks on “creative” parts.
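To make the “pass rate on required fields” idea concrete, here is a minimal, illustrative check; the command list and file handling are assumptions, not the project’s actual evaluation harness:

```python
# Illustrative only: checks that a generated .in file contains the commands
# every runnable gprMax model needs, leaving "creative" parts unchecked.
REQUIRED_COMMANDS = ["#domain:", "#dx_dy_dz:", "#time_window:"]  # assumed minimal set

def passes_required_fields(in_file_text: str) -> bool:
    lines = [line.strip() for line in in_file_text.splitlines()]
    return all(any(line.startswith(cmd) for line in lines) for cmd in REQUIRED_COMMANDS)

def pass_rate(generated_files: list[str]) -> float:
    """Fraction of generated .in files that contain all required commands."""
    if not generated_files:
        return 0.0
    return sum(passes_required_fields(text) for text in generated_files) / len(generated_files)
```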
---

## Quick start

### 1) Use it online (Hugging Face Space)

1. Open the Space.
2. Ask a question like “How do I add a Ricker wavelet source?” or paste part of an input file.
3. Check the right panels:
   - **AI Thinking Process** shows the model’s step‑by‑step reasoning.
   - **Documentation Sources** shows the retriever’s citations and short previews.

> The Space wraps generation with `@spaces.GPU(duration=60)` to keep GPU usage small and predictable.
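For reference, the ZeroGPU pattern looks roughly like this (a sketch; the actual function name and body in `app.py` may differ):

```python
import spaces  # helper available on Hugging Face Spaces (ZeroGPU)

@spaces.GPU(duration=60)  # attach a GPU for at most ~60 s per call
def generate(prompt, history):
    # model.generate(...) runs here while the GPU is attached;
    # everything outside this function runs on CPU.
    ...
```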
### 2) Run it locally

```bash
pip install "torch" "transformers" "gradio" "chromadb" "gitpython" "tqdm" "spaces"
gradio app.py
```
- First run: if the vector DB is missing, the app will **auto‑build** it (clone gprMax, chunk the docs, and index them). You’ll see logs about generating the database and then “RAG database loaded.”
- The database is **persistent** (on disk), so later runs are faster. The builder stores a `metadata.json` with settings such as chunk size and the embedding name used by Chroma (`all-MiniLM-L6-v2` by default); a sketch of the startup check is below.
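A minimal sketch of that startup check (paths follow the project layout shown later; the exact logic in `app.py` may differ):

```python
import json
from pathlib import Path

DB_DIR = Path("rag-db/chroma_db")  # persistent ChromaDB location (see project layout)

if not DB_DIR.exists():
    # First run: build the database (clone gprMax, chunk docs, index into Chroma).
    # In this project that work is done by rag-db/generate_db.py.
    print("Generating RAG database...")
else:
    meta = json.loads((DB_DIR / "metadata.json").read_text())
    print("RAG database loaded:", meta.get("collection_name"), meta.get("embedding_model"))
```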
---

## Using the app (what to try)

Ask things like:

- “How do I create a basic gprMax input file for a simple GPR simulation?”
- “What’s the difference between `#domain` and `#dx_dy_dz`?”
- “How do I add a Ricker wavelet source?”
- “My simulation is taking too long—any tips to speed it up?”
- “How do I model a soil with different dielectric properties?”
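For context, this is the kind of file the assistant helps produce: a minimal 2D model with a 1.5 GHz Ricker source (values are illustrative, not a vetted simulation):

```
#title: Minimal 2D model with a Ricker source
#domain: 0.6 0.4 0.002
#dx_dy_dz: 0.002 0.002 0.002
#time_window: 6e-9
#material: 6 0 1 0 half_space
#box: 0 0 0 0.6 0.2 0.002 half_space
#waveform: ricker 1 1.5e9 my_ricker
#hertzian_dipole: z 0.1 0.25 0 my_ricker
#rx: 0.2 0.25 0
```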
When the model needs context, it emits a small JSON “tool call” to **search_documentation**. The retriever queries ChromaDB and the UI shows the top matches in the right panel with file names and a short preview. The model then writes a final answer that uses those snippets.
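The exact tool-call schema lives in the app’s system prompt; a plausible shape, and how the app might detect it, is sketched below (the field names are assumptions):

```python
import json

# Example of the kind of JSON tool call the model might emit (schema assumed):
raw = '{"tool": "search_documentation", "query": "ricker wavelet source"}'

def try_parse_tool_call(text: str):
    """Return (tool_name, query) if text is a search_documentation call, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if obj.get("tool") == "search_documentation" and "query" in obj:
        return obj["tool"], obj["query"]
    return None

print(try_parse_tool_call(raw))  # ('search_documentation', 'ricker wavelet source')
```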
---

## Design principles (in simple terms)

- **Keep it modular.** Model, retriever, and UI are separate pieces. We can upgrade any part later.
- **Ground answers in docs.** The model can look things up and show sources, not just “guess.”
- **Make it light.** A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.
- **Be transparent.** Show what the model is thinking and where facts come from.
- **Future‑proof.** Rebuild the DB when docs change; swap in new models or embeddings later.

---
## Architecture (at a glance)

```
User ↔ Gradio Chat UI
         │
         ▼
Transformers (Qwen3‑4B fine‑tuned) → streams text + <think> ... </think>
         │
  (optional tool call as JSON)
         ▼
search_documentation(query)
         │
         ▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
         │                   │
         ▼                   ▼
   gprMax docs (cloned → chunked → indexed)
```
- **Model loading & streaming.** The app uses `AutoTokenizer`/`AutoModelForCausalLM` with `device_map="auto"`. The generator splits `<think>…</think>` into a separate “AI Thinking Process” pane.
- **Tool calling.** The system prompt describes a `search_documentation` tool and the exact JSON format for calling it.
- **RAG database.** The builder clones the official `gprMax` repo, reads `/docs` (`.rst`, `.md`, `.txt`), chunks with **size 1000 / overlap 200**, and stores to a **ChromaDB** collection named `gprmax_docs_v1`. Metadata includes `embedding_model: "ChromaDB Default (all-MiniLM-L6-v2)"`.
- **Retriever.** Uses a persistent Chroma client and queries via `query_texts`. Distances are turned into display scores with a simple `1 - (dist / 2)` conversion, sketched below.
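A minimal sketch of that retrieval path with the `chromadb` API (collection name and settings from above; the real `retriever.py` adds more handling):

```python
import chromadb

# Persistent client pointing at the on-disk store (path assumed from the project layout).
client = chromadb.PersistentClient(path="rag-db/chroma_db")
collection = client.get_collection("gprmax_docs_v1")

def search_documentation(query: str, n_results: int = 3):
    res = collection.query(query_texts=[query], n_results=n_results)
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        score = 1 - dist / 2  # map Chroma distance to a rough 0..1 display score
        hits.append({
            "source": meta.get("source"),  # metadata key "source" is an assumption
            "score": round(score, 3),
            "preview": doc[:200],
        })
    return hits
```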
---

## Technical choices (frameworks and why)

- **Transformers** to load and run the fine‑tuned Qwen 4B model, with `device_map="auto"` and `trust_remote_code=True`. This keeps the code short and makes GPU/CPU selection automatic.
- **Gradio** for the web UI (Blocks + Chatbot + Accordions + Sliders). It’s easy to read and extend.
- **ChromaDB** for a simple, persistent vector store that ships with the app. No external service is required.
- **GitPython + tqdm** to clone the gprMax docs and show progress when building the DB (see the sketch below).
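The clone-and-chunk step can be as small as this (a sketch; the builder in `rag-db/generate_db.py` may differ, and only `.rst` files are globbed here for brevity):

```python
from pathlib import Path

import git  # GitPython
from tqdm import tqdm

repo_dir = Path("gprMax")
if not repo_dir.exists():
    # Shallow clone keeps the download small; we only need the current /docs tree.
    git.Repo.clone_from("https://github.com/gprMax/gprMax.git", repo_dir, depth=1)

doc_files = sorted(repo_dir.glob("docs/**/*.rst"))
for path in tqdm(doc_files, desc="Chunking docs"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    # ...chunk `text` (size 1000, overlap 200) and add the chunks to ChromaDB...
```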
---

## Reproducible fine‑tuning (LoRA / PEFT)

This is the core of the work. Below is **exactly** how the 4B model was trained and how someone else can redo it.

### What I trained

- **Base model:** `Qwen/Qwen3-4B` (using the Qwen3 chat template).
- **Method:** LoRA adapters (**rank=8**, **alpha=16**, **dropout=0.0**) applied to attention and MLP projection layers.
- **Outputs:** adapters + merged weights; the app uses the merged variant `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **Other models I trained:** see my collection:
  [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)

### Exact config used (YAML)
```yaml
bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
```
**Metrics reported (4B run):**

```json
{
  "epoch": 2.0,
  "num_input_tokens_seen": 48562016,
  "total_flos": 1.0635160197775688e+18,
  "train_loss": 0.3312762507200241,
  "train_runtime": 16760.735,
  "train_samples_per_second": 1.909,
  "train_steps_per_second": 0.06
}
```
**Loss curve:**

![Training loss](training_loss.png)
### Path A — Simple HF/PEFT training script

```python
# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

BASE = "Qwen/Qwen3-4B"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Expects chat-format JSONL: each record has a "messages" list of {role, content}.
ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})

def to_text(ex):
    # Render each conversation into a single string with the Qwen3 chat template.
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}

ds = ds.map(to_text, remove_columns=ds["train"].column_names)

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)

peft_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    task_type="CAUSAL_LM"
)

args = TrainingArguments(
    output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    logging_steps=5,
    save_steps=100,
    bf16=True,
    report_to="none",
    max_grad_norm=1.0
)

# Note: tokenizer/dataset_text_field/max_seq_length match older trl releases;
# newer trl expects an SFTConfig in place of TrainingArguments.
trainer = SFTTrainer(
    model=model,
    args=args,  # pass the TrainingArguments defined above
    peft_config=peft_cfg,
    tokenizer=tok,
    train_dataset=ds["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False
)
trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
```
**Inference with adapter (or merge):**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
    tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))

# Optional: merge LoRA into base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")
```
### How the fine‑tuned model plugs into the app

- `app.py` sets `MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"` and uses `AutoTokenizer`/`AutoModelForCausalLM` with `device_map="auto"`. It also streams the **thinking** text (between `<think>...</think>`) to a separate UI pane, as sketched below.
- When the model emits the tool-call JSON for `search_documentation`, the app uses the retriever to query the local ChromaDB and shows sources in the right pane.
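A minimal sketch of the `<think>` split on a fully generated string (the app does this incrementally while streaming; the helper name is mine):

```python
def split_thinking(generated: str) -> tuple[str, str]:
    """Separate <think>...</think> content from the visible answer."""
    start, end = "<think>", "</think>"
    if start in generated and end in generated:
        before, rest = generated.split(start, 1)
        thinking, after = rest.split(end, 1)
        return thinking.strip(), (before + after).strip()
    return "", generated.strip()

thinking, answer = split_thinking("<think>User wants a Ricker source.</think>Use #waveform: ricker ...")
print(thinking)  # -> "User wants a Ricker source."
print(answer)    # -> "Use #waveform: ricker ..."
```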
---

## Project layout

```
.
├── app.py                 # Main Gradio app: model load, streaming, tool-calling
└── rag-db/
    ├── generate_db.py     # Clone gprMax, chunk docs, build ChromaDB, save metadata
    ├── retriever.py       # Persistent Chroma client + search utilities
    └── chroma_db/         # (created at runtime) persistent vector DB + metadata.json
```

- If the DB is missing, the app **auto‑builds** it by **pulling the gprMax GitHub repo and embedding the *latest* docs**, then loads it for searches.
- The builder saves `metadata.json` with the collection name (`gprmax_docs_v1`), chunking settings, and the embedding label.
- The retriever uses a persistent client and turns distances into a simple score for display.
---

## Tips & troubleshooting

- **GPU out‑of‑memory?** Lower **Max New Tokens** in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.
- **No docs in the sources panel?** Build the DB manually:

  ```bash
  python rag-db/generate_db.py --recreate
  ```

  This clones the official repo, chunks `/docs` (size **1000**, overlap **200**), builds the `gprmax_docs_v1` collection, and writes metadata.
- **First response is slow.** That’s probably first‑time model load and DB creation. Later runs reuse the cached DB, so they’re faster.
- Smaller models tend to **overthink** ([Cuadron et al., 2025](https://arxiv.org/abs/2502.08235)). We expect open‑source models to keep improving, and the modular pipeline makes it easy to swap in better ones.
---

## License note

The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.

**Thanks:** the gprMax team and community, plus the open‑source ML stack (Transformers, Gradio, ChromaDB).