---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---
# gprMax AI Support Assistant (GSoC 2025)

**What it is:** a small web app that helps people write gprMax `.in` files, understand commands, and troubleshoot simulations in a simple chat UI.

**Why it matters:** new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.

**Live demo:** [Gprmax Support - a Hugging Face Space by jfang](https://huggingface.co/spaces/jfang/gprmax-support-gsoc25)

**Main model used by the app:** `jfang/gprmax-ft-Qwen3-4B-Instruct`. The app loads this model with Hugging Face Transformers and streams responses, including a separate “thinking” pane for learning and transparency.

---
## What I built (GSoC progress)

- **Fine‑tuned model for gprMax**. I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **RAG (Retrieval‑Augmented Generation)** on top of the official gprMax documentation. On first run, the app clones the repo, chunks `/docs` files, and creates a **persistent ChromaDB** store. Then the model can “call a tool” to search docs and show sources.
- **Friendly UI** with Gradio: the left side is chat; the right side has two collapsible panels, **AI Thinking Process** and **Documentation Sources**. There are also **Settings** so people can tune temperature, max tokens, etc.
- **Reproducible fine‑tuning recipe** with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.
- **Model Zoo (fine‑tuned weights)**: I trained several variants and organized them here:
  [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)

> The evaluation plan and overall approach follow the project proposal: set baselines, fine‑tune with LoRA, add RAG, and then test by pass rate on required fields plus flexible checks on “creative” parts.
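To make the “pass rate on required fields” idea concrete, here is a minimal, illustrative check; the command list and file handling are assumptions, not the project’s actual evaluation harness:

```python
# Illustrative only: checks that a generated .in file contains the commands
# every runnable gprMax model needs, leaving "creative" parts unchecked.
REQUIRED_COMMANDS = ["#domain:", "#dx_dy_dz:", "#time_window:"]  # assumed minimal set

def passes_required_fields(in_file_text: str) -> bool:
    lines = [line.strip() for line in in_file_text.splitlines()]
    return all(any(line.startswith(cmd) for line in lines) for cmd in REQUIRED_COMMANDS)

def pass_rate(generated_files: list[str]) -> float:
    """Fraction of generated .in files that contain all required commands."""
    if not generated_files:
        return 0.0
    return sum(passes_required_fields(text) for text in generated_files) / len(generated_files)
```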
---

## Quick start

### 1) Use it online (Hugging Face Space)

1. Open the Space.
2. Ask a question like “How do I add a Ricker wavelet source?” or paste part of an input file.
3. Check the right panels:
   - **AI Thinking Process** shows the model’s step‑by‑step reasoning.
   - **Documentation Sources** shows the retriever’s citations and short previews.

> The Space wraps generation with `@spaces.GPU(duration=60)` to keep GPU usage small and predictable.
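For reference, the ZeroGPU pattern looks roughly like this (a sketch; the actual function name and body in `app.py` may differ):

```python
import spaces  # helper available on Hugging Face Spaces (ZeroGPU)

@spaces.GPU(duration=60)  # attach a GPU for at most ~60 s per call
def generate(prompt, history):
    # model.generate(...) runs here while the GPU is attached;
    # everything outside this function runs on CPU.
    ...
```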
### 2) Run it locally

```bash
pip install "torch" "transformers" "gradio" "chromadb" "gitpython" "tqdm" "spaces"
gradio app.py
```
- First run: if the vector DB is missing, the app will **auto‑build** it (clone gprMax, chunk the docs, and index them). You’ll see logs about generating the database and then “RAG database loaded.”
- The database is **persistent** (on disk), so later runs are faster. The builder stores a `metadata.json` with settings such as chunk size and the embedding name used by Chroma (`all-MiniLM-L6-v2` by default); a sketch of the startup check is below.
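A minimal sketch of that startup check (paths follow the project layout shown later; the exact logic in `app.py` may differ):

```python
import json
from pathlib import Path

DB_DIR = Path("rag-db/chroma_db")  # persistent ChromaDB location (see project layout)

if not DB_DIR.exists():
    # First run: build the database (clone gprMax, chunk docs, index into Chroma).
    # In this project that work is done by rag-db/generate_db.py.
    print("Generating RAG database...")
else:
    meta = json.loads((DB_DIR / "metadata.json").read_text())
    print("RAG database loaded:", meta.get("collection_name"), meta.get("embedding_model"))
```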
---

## Using the app (what to try)

Ask things like:

- “How do I create a basic gprMax input file for a simple GPR simulation?”
- “What’s the difference between `#domain` and `#dx_dy_dz`?”
- “How do I add a Ricker wavelet source?”
- “My simulation is taking too long—any tips to speed it up?”
- “How do I model a soil with different dielectric properties?”
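For context, this is the kind of file the assistant helps produce: a minimal 2D model with a 1.5 GHz Ricker source (values are illustrative, not a vetted simulation):

```
#title: Minimal 2D model with a Ricker source
#domain: 0.6 0.4 0.002
#dx_dy_dz: 0.002 0.002 0.002
#time_window: 6e-9
#material: 6 0 1 0 half_space
#box: 0 0 0 0.6 0.2 0.002 half_space
#waveform: ricker 1 1.5e9 my_ricker
#hertzian_dipole: z 0.1 0.25 0 my_ricker
#rx: 0.2 0.25 0
```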
When the model needs context, it emits a small JSON “tool call” to **search_documentation**. The retriever queries ChromaDB and the UI shows the top matches in the right panel with file names and a short preview. The model then writes a final answer that uses those snippets.
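The exact tool-call schema lives in the app’s system prompt; a plausible shape, and how the app might detect it, is sketched below (the field names are assumptions):

```python
import json

# Example of the kind of JSON tool call the model might emit (schema assumed):
raw = '{"tool": "search_documentation", "query": "ricker wavelet source"}'

def try_parse_tool_call(text: str):
    """Return (tool_name, query) if text is a search_documentation call, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if obj.get("tool") == "search_documentation" and "query" in obj:
        return obj["tool"], obj["query"]
    return None

print(try_parse_tool_call(raw))  # ('search_documentation', 'ricker wavelet source')
```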
---

## Design principles (in simple terms)

- **Keep it modular.** Model, retriever, and UI are separate pieces. We can upgrade any part later.
- **Ground answers in docs.** The model can look things up and show sources, not just “guess.”
- **Make it light.** A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.
- **Be transparent.** Show what the model is thinking and where facts come from.
- **Future‑proof.** Rebuild the DB when docs change; swap in new models or embeddings later.

---
## Architecture (at a glance)

```
User ↔ Gradio Chat UI
         │
         ▼
Transformers (Qwen3‑4B fine‑tuned) → streams text + <think> ... </think>
         │
  (optional tool call as JSON)
         ▼
search_documentation(query)
         │
         ▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
         │                   │
         ▼                   ▼
   gprMax docs (cloned → chunked → indexed)
```
- **Model loading & streaming.** The app uses `AutoTokenizer`/`AutoModelForCausalLM` with `device_map="auto"`. The generator splits `<think>…</think>` into a separate “AI Thinking Process” pane.
- **Tool calling.** The system prompt describes a `search_documentation` tool and the exact JSON format for calling it.
- **RAG database.** The builder clones the official `gprMax` repo, reads `/docs` (`.rst`, `.md`, `.txt`), chunks with **size 1000 / overlap 200**, and stores to a **ChromaDB** collection named `gprmax_docs_v1`. Metadata includes `embedding_model: "ChromaDB Default (all-MiniLM-L6-v2)"`.
- **Retriever.** Uses a persistent Chroma client and queries via `query_texts`. Distances are turned into display scores with a simple `1 - (dist / 2)` conversion, sketched below.
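A minimal sketch of that retrieval path with the `chromadb` API (collection name and settings from above; the real `retriever.py` adds more handling):

```python
import chromadb

# Persistent client pointing at the on-disk store (path assumed from the project layout).
client = chromadb.PersistentClient(path="rag-db/chroma_db")
collection = client.get_collection("gprmax_docs_v1")

def search_documentation(query: str, n_results: int = 3):
    res = collection.query(query_texts=[query], n_results=n_results)
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        score = 1 - dist / 2  # map Chroma distance to a rough 0..1 display score
        hits.append({
            "source": meta.get("source"),  # metadata key "source" is an assumption
            "score": round(score, 3),
            "preview": doc[:200],
        })
    return hits
```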
---

## Technical choices (frameworks and why)

- **Transformers** to load and run the fine‑tuned Qwen 4B model, with `device_map="auto"` and `trust_remote_code=True`. This keeps the code short and makes GPU/CPU selection automatic.
- **Gradio** for the web UI (Blocks + Chatbot + Accordions + Sliders). It’s easy to read and extend.
- **ChromaDB** for a simple, persistent vector store that ships with the app. No external service is required.
- **GitPython + tqdm** to clone the gprMax docs and show progress when building the DB (see the sketch below).
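The clone-and-chunk step can be as small as this (a sketch; the builder in `rag-db/generate_db.py` may differ, and only `.rst` files are globbed here for brevity):

```python
from pathlib import Path

import git  # GitPython
from tqdm import tqdm

repo_dir = Path("gprMax")
if not repo_dir.exists():
    # Shallow clone keeps the download small; we only need the current /docs tree.
    git.Repo.clone_from("https://github.com/gprMax/gprMax.git", repo_dir, depth=1)

doc_files = sorted(repo_dir.glob("docs/**/*.rst"))
for path in tqdm(doc_files, desc="Chunking docs"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    # ...chunk `text` (size 1000, overlap 200) and add the chunks to ChromaDB...
```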
---

## Reproducible fine‑tuning (LoRA / PEFT)

This is the core of the work. Below is **exactly** how the 4B model was trained and how someone else can redo it.

### What I trained

- **Base model:** `Qwen/Qwen3-4B` (using the Qwen3 chat template).
- **Method:** LoRA adapters (**rank=8**, **alpha=16**, **dropout=0.0**) applied to attention and MLP projection layers.
- **Outputs:** adapters + merged weights; the app uses the merged variant `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **Other models I trained:** see my collection:
  [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)

### Exact config used (YAML)
```yaml
bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
```
**Metrics reported (4B run):**

```json
{
  "epoch": 2.0,
  "num_input_tokens_seen": 48562016,
  "total_flos": 1.0635160197775688e+18,
  "train_loss": 0.3312762507200241,
  "train_runtime": 16760.735,
  "train_samples_per_second": 1.909,
  "train_steps_per_second": 0.06
}
```
**Loss curve:**

![Training loss](training_loss.png)
### Path A — Simple HF/PEFT training script

```python
# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

BASE = "Qwen/Qwen3-4B"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Expects chat-format JSONL: each record has a "messages" list of {role, content}.
ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})

def to_text(ex):
    # Render each conversation into a single string with the Qwen3 chat template.
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}

ds = ds.map(to_text, remove_columns=ds["train"].column_names)

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)

peft_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    task_type="CAUSAL_LM"
)

args = TrainingArguments(
    output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    logging_steps=5,
    save_steps=100,
    bf16=True,
    report_to="none",
    max_grad_norm=1.0
)

# Note: tokenizer/dataset_text_field/max_seq_length match older trl releases;
# newer trl expects an SFTConfig in place of TrainingArguments.
trainer = SFTTrainer(
    model=model,
    args=args,  # pass the TrainingArguments defined above
    peft_config=peft_cfg,
    tokenizer=tok,
    train_dataset=ds["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False
)
trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
```
**Inference with adapter (or merge):**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
    tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))

# Optional: merge LoRA into base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")
```
### How the fine‑tuned model plugs into the app

- `app.py` sets `MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"` and uses `AutoTokenizer`/`AutoModelForCausalLM` with `device_map="auto"`. It also streams the **thinking** text (between `<think>...</think>`) to a separate UI pane, as sketched below.
- When the model emits the tool-call JSON for `search_documentation`, the app uses the retriever to query the local ChromaDB and shows sources in the right pane.
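A minimal sketch of the `<think>` split on a fully generated string (the app does this incrementally while streaming; the helper name is mine):

```python
def split_thinking(generated: str) -> tuple[str, str]:
    """Separate <think>...</think> content from the visible answer."""
    start, end = "<think>", "</think>"
    if start in generated and end in generated:
        before, rest = generated.split(start, 1)
        thinking, after = rest.split(end, 1)
        return thinking.strip(), (before + after).strip()
    return "", generated.strip()

thinking, answer = split_thinking("<think>User wants a Ricker source.</think>Use #waveform: ricker ...")
print(thinking)  # -> "User wants a Ricker source."
print(answer)    # -> "Use #waveform: ricker ..."
```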
---

## Project layout

```
.
├── app.py                 # Main Gradio app: model load, streaming, tool-calling
└── rag-db/
    ├── generate_db.py     # Clone gprMax, chunk docs, build ChromaDB, save metadata
    ├── retriever.py       # Persistent Chroma client + search utilities
    └── chroma_db/         # (created at runtime) persistent vector DB + metadata.json
```

- If the DB is missing, the app **auto‑builds** it by **pulling the gprMax GitHub repo and embedding the *latest* docs**, then loads it for searches.
- The builder saves `metadata.json` with the collection name (`gprmax_docs_v1`), chunking settings, and the embedding label.
- The retriever uses a persistent client and turns distances into a simple score for display.
---

## Tips & troubleshooting

- **GPU out‑of‑memory?** Lower **Max New Tokens** in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.
- **No docs in the sources panel?** Build the DB manually:

  ```bash
  python rag-db/generate_db.py --recreate
  ```

  This clones the official repo, chunks `/docs` (size **1000**, overlap **200**), builds the `gprmax_docs_v1` collection, and writes metadata.
- **First response is slow.** That’s probably first‑time model load and DB creation. Later runs reuse the cached DB, so they’re faster.
- Smaller models tend to **overthink** ([Cuadron et al., 2025](https://arxiv.org/abs/2502.08235)). We expect open‑source models to keep improving, and the modular pipeline makes it easy to swap in better ones.
---

## License note

The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.

**Thanks:** the gprMax team and community, plus the open‑source ML stack (Transformers, Gradio, ChromaDB).