---
title: medgemma27
emoji: 🏠
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---


# MedGemma ZeroGPU Gradio Space

This repository contains a minimal **Gradio** application that wraps
Google's `medgemma-27b-it` multi-modal model and exposes it via a
browser-based interface.  The app is designed to run on **Hugging Face
Spaces** configured with the **ZeroGPU (Dynamic resources)** option.
ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on
demand.  Existing ZeroGPU Spaces can be used for free, and the
infrastructure supports multi-GPU allocation for large models.
However, hosting your own ZeroGPU Space requires a PRO or Enterprise
Hub subscription.  ZeroGPU Spaces are only compatible with the
**Gradio SDK** and specific versions of PyTorch and Python, which is
why this project uses Gradio instead of a raw FastAPI server.

## Features

- **English‑only input and output** – The interface accepts a question in
  English and returns the model’s answer in English.  A disclaimer is
  appended to every response reminding users to consult a medical
  professional.
- **Multi‑modal support** – Optionally upload an image to provide
  additional context for the model.  The input text and image are
  processed together.
- **Custom system prompt** – You can supply your own system prompt to
  steer the model’s behaviour.  If omitted a default radiology assistant
  instruction is used.
- **Optional API key** – If you set an `API_KEY` secret in your Space,
  the UI will display a hidden API key field.  Clients must enter the
  same value when calling the model; otherwise the request is rejected.
- **ZeroGPU integration** – The heavy computation is wrapped in a
  function decorated with `@spaces.GPU`, which allocates an H200 slice
  for the duration of the call and releases it afterwards (see the
  sketch below).
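
In outline, the wiring looks something like the sketch below.  Treat it
as illustrative rather than a copy of `app.py`: the function names, the
`duration` value, and the use of the transformers `image-text-to-text`
pipeline are assumptions.

```python
import os

import spaces  # ZeroGPU helper package, available on Spaces
import torch
from transformers import pipeline

MODEL_ID = "google/medgemma-27b-it"
API_KEY = os.environ.get("API_KEY")  # optional secret, see Setup below

_pipe = None  # lazy-loaded so the Space starts before the weights download


def _get_pipe():
    global _pipe
    if _pipe is None:
        _pipe = pipeline(
            "image-text-to-text",
            model=MODEL_ID,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            token=os.environ.get("HF_TOKEN"),
        )
    return _pipe


@spaces.GPU(duration=120)  # hold an H200 slice for at most ~120 s per call
def generate(prompt, image=None, system_prompt=None, api_key=None):
    # Reject the request early if the Space is key-protected.
    if API_KEY and api_key != API_KEY:
        return "Error: invalid or missing API key."
    user_content = [{"type": "text", "text": prompt}]
    if image is not None:
        user_content.insert(0, {"type": "image", "image": image})
    messages = [
        {"role": "system", "content": [{"type": "text", "text":
            system_prompt or "You are a concise radiology assistant."}]},
        {"role": "user", "content": user_content},
    ]
    out = _get_pipe()(text=messages, max_new_tokens=300)
    answer = out[0]["generated_text"][-1]["content"]
    return answer + "\n\nThis is not medical advice; consult a professional."
```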

## Setup

1. **Create a Gradio Space** on Hugging Face.  Choose the **ZeroGPU
   (Dynamic resources)** hardware option and select the **NVIDIA H200**
   accelerator.  If ZeroGPU or H200 does not appear in the hardware
   selector you may need to upgrade to a PRO plan.

2. **Add secrets** in your Space settings.  Under **Settings → Secrets**:
   - `HF_TOKEN` – a Hugging Face access token with permission to
     download `google/medgemma-27b-it`.  Without this token the model
     cannot be loaded.  The Hugging Face documentation recommends
     storing tokens and API keys in secrets rather than hard-coding
     them.
   - `API_KEY` (optional) – a random string used to protect your Space.
     If set, callers must provide the same value in the API key field
     when using the interface or when calling the model programmatically.

   Both secrets reach the app as ordinary environment variables, as in
   the sketch above (`os.environ.get(...)`).

3. **Upload the files** in this repository to your Space, either
   through the web UI or from the command line (see the example after
   this list).  The `app.py` file defines the Gradio interface and
   lazy-loads the model.  The `requirements.txt` lists the Python
   dependencies.

4. Once the Space is built, open it in your browser.  Enter your
   question, optionally upload an image, and click *Submit*.  The model
   will run on an H200 slice and return an answer.
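
For step 3, one command-line alternative to uploading through the web
UI is the `huggingface_hub` CLI.  This is a sketch; it assumes you have
already run `huggingface-cli login` with a write token:

```bash
pip install -U huggingface_hub
# Push app.py and requirements.txt from the current directory to the Space
huggingface-cli upload <user>/<space> . . --repo-type=space
```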

## Programmatic access with `gradio_client`

You can call this Space from your own Python code using the
[`gradio_client`](https://github.com/gradio-app/gradio/tree/main/client) package.  The client
connects to the Space and invokes the `/predict` endpoint.  If you have
configured an API key, supply it as the last argument.  Example:

```python
from gradio_client import Client

space_name = "<user>/<space>"  # replace with your Space
client = Client(space_name)

# Prepare inputs: prompt, image (None), system prompt, api_key
result = client.predict(
    "Please examine this chest X‑ray.",
    None,
    "You are a concise radiology assistant.",
    "my_secret_key",  # or omit if API_KEY is not set
    api_name="/predict",
)
print(result)
```

The inputs must be provided in the same order as defined in `app.py`:
1. **prompt** (string) – required
2. **image** (`PIL.Image.Image` or `None`) – optional
3. **system_prompt** (string or `None`) – optional
4. **api_key** (string or `None`) – required only if you set `API_KEY`
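
To send an actual image rather than `None`, pass a local path or URL
through the `handle_file` helper (available in `gradio_client` 1.x; the
filename below is a placeholder):

```python
from gradio_client import Client, handle_file

client = Client("<user>/<space>")  # replace with your Space
result = client.predict(
    "Please examine this chest X-ray.",
    handle_file("chest_xray.png"),  # local path or URL to the image
    "You are a concise radiology assistant.",
    "my_secret_key",
    api_name="/predict",
)
print(result)
```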

If you prefer cURL, note that Gradio 4.x exposes a two-step REST API
rather than a single `/predict` URL.  Send the JSON payload to the
Space's direct URL (typically `https://<user>-<space>.hf.space`), first
POSTing to `/call/predict` to obtain an event ID and then GETting the
streamed result.  For example:

```bash
# Step 1: submit the request; the response contains an event ID
EVENT_ID=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}' \
  https://<user>-<space>.hf.space/call/predict | awk -F'"' '{print $4}')

# Step 2: stream the result for that event ID
curl -sN https://<user>-<space>.hf.space/call/predict/$EVENT_ID
```

Note that Gradio sets the `api_name` of the prediction endpoint to
`/predict` by default when using `gr.Interface(fn=...)`.
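
If you are unsure which endpoints a Space exposes, or in what order the
parameters go, `gradio_client` can print the schema:

```python
from gradio_client import Client

Client("<user>/<space>").view_api()  # lists named endpoints and parameters
```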

## Running locally

You can also run this application locally for testing.  Install the
dependencies and start the Gradio server:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
python app.py
```

Open `http://localhost:7860` in your browser.  Running locally will
execute the model on your machine’s CPU or GPU; the ZeroGPU dynamic
allocation only works within a Hugging Face Space.

## Dependencies

The `requirements.txt` file specifies the Python packages needed to run
this project.  It includes Gradio, `spaces` for ZeroGPU support, and the
transformers library.  These versions are selected to be compatible
with ZeroGPU.
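
For reference, a plausible pin set is sketched below; defer to the
`requirements.txt` shipped with the Space, since only the Gradio
version is known from the front matter above:

```text
gradio==4.44.0
spaces
torch
transformers
accelerate
Pillow
```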

## Disclaimer

The MedGemma model is for research and educational purposes only.  It
may generate incorrect or harmful content and should **not** be used for
medical diagnosis or treatment.  Always consult a licensed medical
professional for health questions.  This application appends a
disclaimer to every response to remind users of these limitations.