---
title: medgemma27
emoji: 🏠
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

MedGemma ZeroGPU Gradio Space

This repository contains a minimal Gradio application that wraps Google’s medgemma‑27b‑it multi‑modal model and exposes it via a browser‑based interface. The app is designed to run on Hugging Face Spaces configured with the ZeroGPU (Dynamic resources) option. ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on demand. Existing ZeroGPU Spaces can be used for free, and the infrastructure supports multi‑GPU allocation for large models. However, hosting your own ZeroGPU Space requires a PRO or Enterprise Hub subscription. ZeroGPU Spaces are only compatible with the Gradio SDK and specific versions of PyTorch and Python, which is why this project uses Gradio instead of a raw FastAPI server.

Features

  • English‑only input and output – The interface accepts a question in English and returns the model’s answer in English. A disclaimer is appended to every response reminding users to consult a medical professional.
  • Multi‑modal support – Optionally upload an image to provide additional context for the model. The input text and image are processed together.
  • Custom system prompt – You can supply your own system prompt to steer the model’s behaviour. If omitted, a default radiology assistant instruction is used.
  • Optional API key – If you set an API_KEY secret in your Space, the UI will display a hidden API key field. Clients must enter the same value when calling the model; otherwise the request is rejected.
  • ZeroGPU integration – The heavy computation is wrapped in a function decorated with @spaces.GPU, which allocates an H200 slice for the duration of the call and releases it afterwards; a minimal sketch of this pattern follows this list.
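
To make the API‑key and ZeroGPU behaviour concrete, here is a minimal sketch of such a handler. The names generate, run_model and DISCLAIMER are hypothetical stand‑ins; the authoritative implementation is app.py:

import os
import spaces

DISCLAIMER = (
    "\n\nThis response is for research and educational purposes only and is "
    "not medical advice. Please consult a licensed medical professional."
)

def run_model(prompt, image, system_prompt):
    # Hypothetical stand-in; the real inference code lives in app.py.
    return "(model output)"

@spaces.GPU  # allocates an H200 slice for this call and releases it afterwards
def generate(prompt, image=None, system_prompt=None, api_key=None):
    # Reject the request if an API_KEY secret is set but does not match.
    expected = os.environ.get("API_KEY")
    if expected and api_key != expected:
        return "Error: invalid or missing API key."
    answer = run_model(prompt, image, system_prompt)
    return answer + DISCLAIMER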

Setup

  1. Create a Gradio Space on Hugging Face. Choose the ZeroGPU (Dynamic resources) hardware option and select the NVIDIA H200 accelerator. If ZeroGPU or H200 does not appear in the hardware selector, you may need to upgrade to a PRO plan.

  2. Add secrets in your Space settings. Under Settings → Secrets:

    • HF_TOKEN – a Hugging Face access token with permission to download google/medgemma-27b-it. Without this token the model cannot be loaded. The Hugging Face documentation recommends storing tokens and API keys in secrets rather than hard‑coding them.
    • API_KEY (optional) – a random string used to protect your Space. If set, callers must provide the same value in the API key field when using the interface or when calling the model programmatically.
  3. Upload the files in this repository to your Space. The app.py file defines the Gradio interface and lazy‑loads the model (a sketch of the lazy‑loading pattern follows these steps). The requirements.txt lists the Python dependencies.

  4. Once the Space is built, open it in your browser. Enter your question, optionally upload an image, and click Submit. The model will run on an H200 slice and return an answer.
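
For reference, the lazy‑loading mentioned in step 3 can look roughly like the following. This is a sketch rather than the exact contents of app.py; the image-text-to-text pipeline mirrors the usage on the MedGemma model card, and get_pipe is a hypothetical name:

import os
import torch
from transformers import pipeline

_pipe = None  # created on first request so the Space can build without the model

def get_pipe():
    global _pipe
    if _pipe is None:
        _pipe = pipeline(
            "image-text-to-text",
            model="google/medgemma-27b-it",
            torch_dtype=torch.bfloat16,
            device_map="auto",
            token=os.environ["HF_TOKEN"],  # the secret from your Space settings
        )
    return _pipe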

Programmatic access with gradio_client

You can call this Space from your own Python code using the gradio_client package. The client connects to the Space and invokes the /predict endpoint. If you have configured an API key, supply it as the last argument. Example:

from gradio_client import Client

space_name = "<user>/<space>"  # replace with your Space
client = Client(space_name)

# Prepare inputs: prompt, image (None), system prompt, api_key
result = client.predict(
    "Please examine this chest X‑ray.",
    None,
    "You are a concise radiology assistant.",
    "my_secret_key",  # or omit if API_KEY is not set
    api_name="/predict",
)
print(result)

The inputs must be provided in the same order as defined in app.py (an image‑upload variant is sketched after this list):

  1. prompt (string) – required
  2. image (PIL.Image.Image or None) – optional
  3. system_prompt (string or None) – optional
  4. api_key (string or None) – required only if you set API_KEY
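
To attach an image from client code, wrap a local file path or URL in gradio_client’s handle_file helper rather than passing a PIL object; the client uploads the file for you. The file name below is only an example:

from gradio_client import Client, handle_file

client = Client("<user>/<space>")
result = client.predict(
    "Please examine this chest X-ray.",
    handle_file("chest_xray.png"),  # local path or public URL
    "You are a concise radiology assistant.",
    "my_secret_key",  # or None if API_KEY is not set
    api_name="/predict",
)
print(result)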

If you prefer cURL, note that the API is served from the Space’s direct URL (https://<user>-<space>.hf.space), not from the huggingface.co page URL. With the Gradio 4.x REST API the call happens in two steps: a POST that returns an event ID, then a GET that streams the result. For example:

curl -X POST https://<user>-<space>.hf.space/call/predict \
  -H "Content-Type: application/json" \
  -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}'
# → returns {"event_id": "..."}

curl -N https://<user>-<space>.hf.space/call/predict/<event_id>

Gradio sets the api_name of the prediction endpoint to /predict by default when using gr.Interface(fn=...), which is why the path ends in /call/predict.

Running locally

You can also run this application locally for testing. Install the dependencies and start the Gradio server:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
python app.py

Open http://localhost:7860 in your browser. Running locally will execute the model on your machine’s CPU or GPU; the ZeroGPU dynamic allocation only works within a Hugging Face Space.
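
The gradio_client example shown earlier works unchanged against a local server; simply point the client at the local URL (assuming the default port 7860):

from gradio_client import Client

client = Client("http://localhost:7860")
result = client.predict(
    "Please examine this chest X-ray.",
    None,
    "You are a concise radiology assistant.",
    None,  # api_key: pass your key here if you exported API_KEY locally
    api_name="/predict",
)
print(result)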

Dependencies

The requirements.txt file specifies the Python packages needed to run this project. It includes Gradio, spaces for ZeroGPU support, and the transformers library. These versions are selected to be compatible with ZeroGPU.
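
The exact pins live in requirements.txt itself; purely as an illustration, a minimal file matching this description might look like the one below (the versions shown are assumptions, not the repo’s actual pins):

gradio==4.44.0
spaces
transformers
torch
accelerate
Pillow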

Disclaimer

The MedGemma model is for research and educational purposes only. It may generate incorrect or harmful content and should not be used for medical diagnosis or treatment. Always consult a licensed medical professional for health questions. This application appends a disclaimer to every response to remind users of these limitations.