---
title: medgemma27
emoji: 🏠
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
# MedGemma ZeroGPU Gradio Space
This repository contains a minimal Gradio application that wraps
Google's `medgemma-27b-it` multi-modal model and exposes it via a
browser-based interface. The app is designed to run on Hugging Face
Spaces configured with the ZeroGPU (Dynamic resources) option.
ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on
demand. Existing ZeroGPU Spaces can be used for free, and the
infrastructure supports multi-GPU allocation for large models.
However, hosting your own ZeroGPU Space requires a PRO or Enterprise
Hub subscription. ZeroGPU Spaces are only compatible with the Gradio
SDK and specific versions of PyTorch and Python, which is why this
project uses Gradio instead of a raw FastAPI server.
## Features
- English-only input and output – The interface accepts a question in English and returns the model's answer in English. A disclaimer is appended to every response reminding users to consult a medical professional.
- Multi-modal support – Optionally upload an image to provide additional context for the model. The input text and image are processed together.
- Custom system prompt – You can supply your own system prompt to steer the model's behaviour. If omitted, a default radiology assistant instruction is used.
- Optional API key – If you set an `API_KEY` secret in your Space, the UI will display a hidden API key field. Clients must enter the same value when calling the model; otherwise the request is rejected.
- ZeroGPU integration – The heavy computation is wrapped in a function decorated with `@spaces.GPU`, which allocates an H200 slice for the duration of the call and releases it afterwards.
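The ZeroGPU integration can be sketched as follows. This is an illustrative skeleton, not the actual `app.py`: the `generate` function name and its body are placeholders, and the fallback class exists only so the sketch runs outside a Space. Only the `@spaces.GPU` decorator pattern (with its optional `duration` argument) comes from the `spaces` package.

```python
# Illustrative sketch of the ZeroGPU pattern; the real app.py differs.
try:
    import spaces  # provided inside a Hugging Face Space
except ImportError:
    # Local fallback so the sketch runs outside a Space: a no-op decorator.
    class spaces:
        @staticmethod
        def GPU(fn=None, duration=60):
            return fn if fn is not None else (lambda f: f)

@spaces.GPU(duration=120)  # an H200 slice is held only while this call runs
def generate(prompt: str) -> str:
    # Model loading and inference would happen here; placeholder answer.
    return f"Answer to: {prompt}"
```

Everything outside the decorated function runs on CPU; the GPU slice is allocated when `generate` is called and released when it returns.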
## Setup
1. Create a Gradio Space on Hugging Face. Choose the ZeroGPU (Dynamic resources) hardware option and select the NVIDIA H200 accelerator. If ZeroGPU or H200 does not appear in the hardware selector, you may need to upgrade to a PRO plan.
2. Add secrets in your Space settings. Under Settings → Secrets:
   - `HF_TOKEN` – a Hugging Face access token with permission to download `google/medgemma-27b-it`. Without this token the model cannot be loaded. The Hugging Face documentation recommends storing tokens and API keys in secrets rather than hard-coding them.
   - `API_KEY` (optional) – a random string used to protect your Space. If set, callers must provide the same value in the API key field when using the interface or when calling the model programmatically.
3. Upload the files in this repository to your Space. The `app.py` file defines the Gradio interface and lazy-loads the model. The `requirements.txt` lists the Python dependencies.
4. Once the Space is built, open it in your browser. Enter your question, optionally upload an image, and click Submit. The model will run on an H200 slice and return an answer.
## Programmatic access with gradio_client
You can call this Space from your own Python code using the
`gradio_client` package. The client connects to the Space and invokes
the `/predict` endpoint. If you have configured an API key, supply it
as the last argument. Example:

```python
from gradio_client import Client

space_name = "<user>/<space>"  # replace with your Space
client = Client(space_name)

# Prepare inputs: prompt, image (None), system prompt, api_key
result = client.predict(
    "Please examine this chest X-ray.",
    None,
    "You are a concise radiology assistant.",
    "my_secret_key",  # or omit if API_KEY is not set
    api_name="/predict",
)
print(result)
```
The inputs must be provided in the same order as defined in `app.py`:

- `prompt` (string) – required
- `image` (`PIL.Image.Image` or `None`) – optional
- `system_prompt` (string or `None`) – optional
- `api_key` (string or `None`) – required only if you set `API_KEY`
If you prefer a raw HTTP call, send the JSON payload to the Space's
direct domain (`https://<user>-<space>.hf.space`), not the
`huggingface.co/spaces/...` page URL. With the Gradio 4.x SDK the REST
endpoint is `/call/predict`: the POST returns an event ID, which you
then fetch with a follow-up GET request. For example:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}' \
  https://<user>-<space>.hf.space/call/predict
```

Note that Gradio sets the `api_name` of the prediction endpoint to
`/predict` by default when using `gr.Interface(fn=...)`.
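For scripting without `gradio_client`, the same two-step flow can be sketched with the standard library alone. The base URL is a placeholder, and the second response is really a server-sent-events stream that is returned here unparsed; treat this as a sketch under those assumptions, not a production client.

```python
import json
import urllib.request

BASE = "https://<user>-<space>.hf.space"  # placeholder: your Space's direct domain

def build_payload(prompt, image=None, system_prompt=None, api_key=None):
    """JSON body with inputs in the exact order app.py expects."""
    return json.dumps(
        {"data": [prompt, image, system_prompt, api_key]}
    ).encode("utf-8")

def call_predict(prompt, api_key=None):
    # Step 1: POST the inputs; Gradio 4.x answers with an event ID.
    req = urllib.request.Request(
        f"{BASE}/call/predict",
        data=build_payload(prompt, api_key=api_key),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        event_id = json.loads(resp.read())["event_id"]
    # Step 2: GET the result stream for that event ID.
    with urllib.request.urlopen(f"{BASE}/call/predict/{event_id}") as resp:
        return resp.read().decode("utf-8")
```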
## Running locally
You can also run this application locally for testing. Install the dependencies and start the Gradio server:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
python app.py
```
Open http://localhost:7860 in your browser. Running locally will
execute the model on your machine’s CPU or GPU; the ZeroGPU dynamic
allocation only works within a Hugging Face Space.
## Dependencies
The `requirements.txt` file specifies the Python packages needed to run
this project. It includes Gradio, `spaces` for ZeroGPU support, and the
`transformers` library. These versions are selected to be compatible
with ZeroGPU.
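For illustration, a minimal `requirements.txt` consistent with the packages named above might look like this. The Gradio pin comes from the Space metadata; the remaining entries are left unpinned (and `torch`/`Pillow` are assumptions for inference and image input) because the exact ZeroGPU-compatible versions depend on the current Spaces images:

```
gradio==4.44.0
spaces
transformers
torch
Pillow
```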
## Disclaimer
The MedGemma model is for research and educational purposes only. It may generate incorrect or harmful content and should not be used for medical diagnosis or treatment. Always consult a licensed medical professional for health questions. This application appends a disclaimer to every response to remind users of these limitations.