---
title: medgemma27
emoji: 🏠
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# MedGemma ZeroGPU Gradio Space

This repository contains a minimal **Gradio** application that wraps
Google’s `medgemma-27b-it` multi-modal model and exposes it via a
browser-based interface. The app is designed to run on **Hugging Face
Spaces** configured with the **ZeroGPU (Dynamic resources)** option.
ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on
demand. Existing ZeroGPU Spaces can be used for free, and the
infrastructure supports multi-GPU allocation for large models.
However, hosting your own ZeroGPU Space requires a PRO or Enterprise
Hub subscription. ZeroGPU Spaces are only compatible with the
**Gradio SDK** and specific versions of PyTorch and Python, which is
why this project uses Gradio instead of a raw FastAPI server.
## Features

- **English-only input and output** – The interface accepts a question in
  English and returns the model’s answer in English. A disclaimer is
  appended to every response reminding users to consult a medical
  professional.
- **Multi-modal support** – Optionally upload an image to provide
  additional context for the model. The input text and image are
  processed together.
- **Custom system prompt** – You can supply your own system prompt to
  steer the model’s behaviour. If omitted, a default radiology-assistant
  instruction is used.
- **Optional API key** – If you set an `API_KEY` secret in your Space,
  the UI will display a hidden API key field. Clients must enter the
  same value when calling the model; otherwise the request is rejected.
- **ZeroGPU integration** – The heavy computation is wrapped in a
  function decorated with `@spaces.GPU`, which allocates an H200 slice
  for the duration of the call and releases it afterwards (see the
  sketch after this list).
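To show how these pieces fit together, here is a minimal sketch of the core of `app.py`. It is illustrative, not the exact file in this repository: the function names, the `duration` value, and the use of `AutoModelForImageTextToText` (the class the MedGemma model card suggests) are assumptions.

```python
import os
import spaces
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "google/medgemma-27b-it"
_model, _processor = None, None  # lazy-loaded on first request


def _load():
    global _model, _processor
    if _model is None:
        _processor = AutoProcessor.from_pretrained(MODEL_ID, token=os.environ["HF_TOKEN"])
        _model = AutoModelForImageTextToText.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            token=os.environ["HF_TOKEN"],
        )
    return _model, _processor


@spaces.GPU(duration=120)  # hold the GPU slice for at most 120 s per call (illustrative value)
def generate(prompt, image=None, system_prompt=None, api_key=None):
    # Reject the request if an API_KEY secret is configured and does not match.
    expected = os.environ.get("API_KEY")
    if expected and api_key != expected:
        return "Invalid API key."
    model, processor = _load()
    messages = [
        {"role": "system", "content": [{"type": "text",
            "text": system_prompt or "You are a concise radiology assistant."}]},
        {"role": "user", "content":
            ([{"type": "image", "image": image}] if image else [])
            + [{"type": "text", "text": prompt}]},
    ]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, then append the disclaimer.
    text = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    return text + "\n\n*Not medical advice – always consult a medical professional.*"
```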
## Setup

1. **Create a Gradio Space** on Hugging Face. Choose the **ZeroGPU
   (Dynamic resources)** hardware option and select the **NVIDIA H200**
   accelerator. If ZeroGPU or H200 does not appear in the hardware
   selector, you may need to upgrade to a PRO plan.
2. **Add secrets** in your Space settings. Under **Settings → Secrets**
   (they are exposed to the app as environment variables; see the sketch
   after this list):
   - `HF_TOKEN` – a Hugging Face access token with permission to
     download `google/medgemma-27b-it`. Without this token the model
     cannot be loaded. The Hugging Face documentation recommends
     storing tokens and API keys in secrets rather than hard-coding
     them.
   - `API_KEY` (optional) – a random string used to protect your Space.
     If set, callers must provide the same value in the API key field
     when using the interface or when calling the model programmatically.
3. **Upload the files** in this repository to your Space. The
   `app.py` file defines the Gradio interface and lazy-loads the model.
   The `requirements.txt` file lists the Python dependencies.
4. Once the Space is built, open it in your browser. Enter your
   question, optionally upload an image, and click *Submit*. The model
   will run on an H200 slice and return an answer.
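Because Space secrets are injected as environment variables, `app.py` can read them with `os.environ`. A minimal sketch:

```python
import os

hf_token = os.environ.get("HF_TOKEN")  # required: used to download the gated model
api_key = os.environ.get("API_KEY")    # optional: enables the API-key check in the UI

if not hf_token:
    raise RuntimeError("Set the HF_TOKEN secret so the model can be downloaded.")
```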
## Programmatic access with `gradio_client`

You can call this Space from your own Python code using the
[`gradio_client`](https://github.com/gradio-app/gradio/tree/main/client) package. The client
connects to the Space and invokes the `/predict` endpoint. If you have
configured an API key, supply it as the last argument. Example:
```python
from gradio_client import Client

space_name = "<user>/<space>"  # replace with your Space
client = Client(space_name)

# Inputs in order: prompt, image, system prompt, api_key
result = client.predict(
    "Please examine this chest X-ray.",
    None,                                      # no image
    "You are a concise radiology assistant.",
    "my_secret_key",                           # or None if API_KEY is not set
    api_name="/predict",
)
print(result)
```
The inputs must be provided in the same order as defined in `app.py`:

1. **prompt** (string) – required
2. **image** (`PIL.Image.Image` or `None`) – optional; see the note after
   this list on uploading a file
3. **system_prompt** (string or `None`) – optional
4. **api_key** (string or `None`) – required only if you set `API_KEY`
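To send an actual image, recent `gradio_client` releases (1.x) expect local paths or URLs to be wrapped with `handle_file`. A sketch, with `xray.png` standing in for a hypothetical local file:

```python
from gradio_client import Client, handle_file

client = Client("<user>/<space>")
result = client.predict(
    "Please examine this chest X-ray.",
    handle_file("xray.png"),  # hypothetical local file; a URL also works
    "You are a concise radiology assistant.",
    "my_secret_key",
    api_name="/predict",
)
print(result)
```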
If you prefer cURL, Gradio 4.x exposes a two-step REST flow on the
Space's own host (`https://<user>-<space>.hf.space`, not the
`huggingface.co` page URL): POST the JSON payload to `/call/predict` to
obtain an event ID, then GET the result for that event. For example:

```bash
# Step 1: submit the request; the JSON response contains an event_id
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}' \
  https://<user>-<space>.hf.space/call/predict

# Step 2: stream the result using the event_id from step 1
curl -s -N https://<user>-<space>.hf.space/call/predict/<event_id>
```
Note that Gradio sets the `api_name` of the prediction endpoint to
`/predict` by default when using `gr.Interface(fn=...)`.
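For reference, the interface wiring that produces this endpoint and input order looks roughly like the sketch below; the component labels are illustrative, so consult `app.py` for the actual definitions:

```python
import gradio as gr

# Four inputs, in the same order as the client calls above.
demo = gr.Interface(
    fn=generate,  # the @spaces.GPU-decorated function sketched earlier
    inputs=[
        gr.Textbox(label="Question"),
        gr.Image(type="pil", label="Optional image"),
        gr.Textbox(label="System prompt (optional)"),
        gr.Textbox(label="API key", type="password"),
    ],
    outputs=gr.Textbox(label="Answer"),
)
demo.launch()
```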
## Running locally

You can also run this application locally for testing. Install the
dependencies and start the Gradio server:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
python app.py
```

Open `http://localhost:7860` in your browser. Running locally will
execute the model on your machine’s CPU or GPU; the ZeroGPU dynamic
allocation only works within a Hugging Face Space, and the
`@spaces.GPU` decorator has no effect elsewhere.
## Dependencies

The `requirements.txt` file specifies the Python packages needed to run
this project. It includes Gradio, `spaces` for ZeroGPU support, and the
`transformers` library, at versions selected to be compatible with
ZeroGPU.
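For illustration, a `requirements.txt` for this setup might look like the sketch below; the exact pins are assumptions and should be checked against the PyTorch and Python versions ZeroGPU currently supports:

```text
gradio==4.44.0
spaces
torch==2.4.0
transformers>=4.50.0
accelerate
Pillow
```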
## Disclaimer

The MedGemma model is for research and educational purposes only. It
may generate incorrect or harmful content and should **not** be used for
medical diagnosis or treatment. Always consult a licensed medical
professional for health questions. This application appends a
disclaimer to every response to remind users of these limitations.