---
title: medgemma27
emoji: 🏠
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# MedGemma ZeroGPU Gradio Space

This repository contains a minimal **Gradio** application that wraps
Google’s `medgemma-27b-it` multi-modal model and exposes it via a
browser-based interface. The app is designed to run on **Hugging Face
Spaces** configured with the **ZeroGPU (Dynamic resources)** option.
ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on
demand. Existing ZeroGPU Spaces can be used for free, and the
infrastructure supports multi-GPU allocation for large models.
However, hosting your own ZeroGPU Space requires a PRO or Enterprise
Hub subscription. ZeroGPU Spaces are only compatible with the
**Gradio SDK** and specific versions of PyTorch and Python, which is
why this project uses Gradio instead of a raw FastAPI server.
## Features

- **English-only input and output** – The interface accepts a question in
  English and returns the model’s answer in English. A disclaimer is
  appended to every response reminding users to consult a medical
  professional.
- **Multi-modal support** – Optionally upload an image to provide
  additional context for the model. The input text and image are
  processed together.
- **Custom system prompt** – You can supply your own system prompt to
  steer the model’s behaviour. If omitted, a default radiology-assistant
  instruction is used.
- **Optional API key** – If you set an `API_KEY` secret in your Space,
  the UI will display a hidden API key field. Clients must enter the
  same value when calling the model; otherwise the request is rejected.
- **ZeroGPU integration** – The heavy computation is wrapped in a
  function decorated with `@spaces.GPU`, which allocates an H200 slice
  for the duration of the call and releases it afterwards (see the
  sketch after this list).
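To show how these pieces fit together, here is a minimal sketch of the core of `app.py`. It is illustrative, not the exact file in this repository: the function names, the `duration` value, and the use of `AutoModelForImageTextToText` (the class the MedGemma model card suggests) are assumptions.

```python
import os
import spaces
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "google/medgemma-27b-it"
_model, _processor = None, None  # lazy-loaded on first request


def _load():
    global _model, _processor
    if _model is None:
        _processor = AutoProcessor.from_pretrained(MODEL_ID, token=os.environ["HF_TOKEN"])
        _model = AutoModelForImageTextToText.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            token=os.environ["HF_TOKEN"],
        )
    return _model, _processor


@spaces.GPU(duration=120)  # hold the GPU slice for at most 120 s per call (illustrative value)
def generate(prompt, image=None, system_prompt=None, api_key=None):
    # Reject the request if an API_KEY secret is configured and does not match.
    expected = os.environ.get("API_KEY")
    if expected and api_key != expected:
        return "Invalid API key."
    model, processor = _load()
    messages = [
        {"role": "system", "content": [{"type": "text",
            "text": system_prompt or "You are a concise radiology assistant."}]},
        {"role": "user", "content":
            ([{"type": "image", "image": image}] if image else [])
            + [{"type": "text", "text": prompt}]},
    ]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, then append the disclaimer.
    text = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    return text + "\n\n*Not medical advice – always consult a medical professional.*"
```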
## Setup

1. **Create a Gradio Space** on Hugging Face. Choose the **ZeroGPU
   (Dynamic resources)** hardware option and select the **NVIDIA H200**
   accelerator. If ZeroGPU or H200 does not appear in the hardware
   selector, you may need to upgrade to a PRO plan.
2. **Add secrets** in your Space settings. Under **Settings → Secrets**
   (they are exposed to the app as environment variables; see the sketch
   after this list):
   - `HF_TOKEN` – a Hugging Face access token with permission to
     download `google/medgemma-27b-it`. Without this token the model
     cannot be loaded. The Hugging Face documentation recommends
     storing tokens and API keys in secrets rather than hard-coding
     them.
   - `API_KEY` (optional) – a random string used to protect your Space.
     If set, callers must provide the same value in the API key field
     when using the interface or when calling the model programmatically.
3. **Upload the files** in this repository to your Space. The
   `app.py` file defines the Gradio interface and lazy-loads the model.
   The `requirements.txt` file lists the Python dependencies.
4. Once the Space is built, open it in your browser. Enter your
   question, optionally upload an image, and click *Submit*. The model
   will run on an H200 slice and return an answer.
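Because Space secrets are injected as environment variables, `app.py` can read them with `os.environ`. A minimal sketch:

```python
import os

hf_token = os.environ.get("HF_TOKEN")  # required: used to download the gated model
api_key = os.environ.get("API_KEY")    # optional: enables the API-key check in the UI

if not hf_token:
    raise RuntimeError("Set the HF_TOKEN secret so the model can be downloaded.")
```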
## Programmatic access with `gradio_client`

You can call this Space from your own Python code using the
[`gradio_client`](https://github.com/gradio-app/gradio/tree/main/client) package. The client
connects to the Space and invokes the `/predict` endpoint. If you have
configured an API key, supply it as the last argument. Example:
```python
from gradio_client import Client

space_name = "<user>/<space>"  # replace with your Space
client = Client(space_name)

# Inputs in order: prompt, image, system prompt, api_key
result = client.predict(
    "Please examine this chest X-ray.",
    None,                                      # no image
    "You are a concise radiology assistant.",
    "my_secret_key",                           # or None if API_KEY is not set
    api_name="/predict",
)
print(result)
```
The inputs must be provided in the same order as defined in `app.py`:

1. **prompt** (string) – required
2. **image** (`PIL.Image.Image` or `None`) – optional; see the note after
   this list on uploading a file
3. **system_prompt** (string or `None`) – optional
4. **api_key** (string or `None`) – required only if you set `API_KEY`
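To send an actual image, recent `gradio_client` releases (1.x) expect local paths or URLs to be wrapped with `handle_file`. A sketch, with `xray.png` standing in for a hypothetical local file:

```python
from gradio_client import Client, handle_file

client = Client("<user>/<space>")
result = client.predict(
    "Please examine this chest X-ray.",
    handle_file("xray.png"),  # hypothetical local file; a URL also works
    "You are a concise radiology assistant.",
    "my_secret_key",
    api_name="/predict",
)
print(result)
```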
If you prefer cURL, Gradio 4.x exposes a two-step REST flow on the
Space's own host (`https://<user>-<space>.hf.space`, not the
`huggingface.co` page URL): POST the JSON payload to `/call/predict` to
obtain an event ID, then GET the result for that event. For example:

```bash
# Step 1: submit the request; the JSON response contains an event_id
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}' \
  https://<user>-<space>.hf.space/call/predict

# Step 2: stream the result using the event_id from step 1
curl -s -N https://<user>-<space>.hf.space/call/predict/<event_id>
```
Note that Gradio sets the `api_name` of the prediction endpoint to
`/predict` by default when using `gr.Interface(fn=...)`.
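For reference, the interface wiring that produces this endpoint and input order looks roughly like the sketch below; the component labels are illustrative, so consult `app.py` for the actual definitions:

```python
import gradio as gr

# Four inputs, in the same order as the client calls above.
demo = gr.Interface(
    fn=generate,  # the @spaces.GPU-decorated function sketched earlier
    inputs=[
        gr.Textbox(label="Question"),
        gr.Image(type="pil", label="Optional image"),
        gr.Textbox(label="System prompt (optional)"),
        gr.Textbox(label="API key", type="password"),
    ],
    outputs=gr.Textbox(label="Answer"),
)
demo.launch()
```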
## Running locally

You can also run this application locally for testing. Install the
dependencies and start the Gradio server:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
python app.py
```

Open `http://localhost:7860` in your browser. Running locally will
execute the model on your machine’s CPU or GPU; the ZeroGPU dynamic
allocation only works within a Hugging Face Space, and the
`@spaces.GPU` decorator has no effect elsewhere.
## Dependencies

The `requirements.txt` file specifies the Python packages needed to run
this project. It includes Gradio, `spaces` for ZeroGPU support, and the
`transformers` library, at versions selected to be compatible with
ZeroGPU.
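For illustration, a `requirements.txt` for this setup might look like the sketch below; the exact pins are assumptions and should be checked against the PyTorch and Python versions ZeroGPU currently supports:

```text
gradio==4.44.0
spaces
torch==2.4.0
transformers>=4.50.0
accelerate
Pillow
```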
## Disclaimer

The MedGemma model is for research and educational purposes only. It
may generate incorrect or harmful content and should **not** be used for
medical diagnosis or treatment. Always consult a licensed medical
professional for health questions. This application appends a
disclaimer to every response to remind users of these limitations.