dflehel committed
Commit 224b1c2 · verified · 1 Parent(s): 25e91dd
Files changed (3)
  1. README.md +133 -12
  2. app.py +255 -0
  3. requirements.txt +8 -0
README.md CHANGED
@@ -1,12 +1,133 @@
- ---
- title: Medgemmank
- emoji: 👀
- colorFrom: indigo
- colorTo: green
- sdk: gradio
- sdk_version: 5.49.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # MedGemma ZeroGPU Gradio Space
+
+ This repository contains a minimal **Gradio** application that wraps
+ Google's `medgemma-27b-it` multi-modal model and exposes it via a
+ browser-based interface. The app is designed to run on **Hugging Face
+ Spaces** configured with the **ZeroGPU (Dynamic resources)** option.
+ ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on
+ demand. Existing ZeroGPU Spaces can be used for free, and the
+ infrastructure supports multi-GPU allocation for large models.
+ However, hosting your own ZeroGPU Space requires a PRO or Enterprise
+ Hub subscription. ZeroGPU Spaces are only compatible with the
+ **Gradio SDK** and specific versions of PyTorch and Python, which is
+ why this project uses Gradio instead of a raw FastAPI server.
+
+ ## Features
+
+ - **English-only input and output** – The interface accepts a question in
+   English and returns the model's answer in English. A disclaimer is
+   appended to every response reminding users to consult a medical
+   professional.
+ - **Multi-modal support** – Optionally upload an image to provide
+   additional context for the model. The input text and image are
+   processed together.
+ - **Custom system prompt** – You can supply your own system prompt to
+   steer the model's behaviour. If omitted, a default radiology assistant
+   instruction is used.
+ - **Optional API key** – If you set an `API_KEY` secret in your Space,
+   the UI will display a hidden API key field. Clients must enter the
+   same value when calling the model; otherwise the request is rejected.
+ - **ZeroGPU integration** – The heavy computation is wrapped in a
+   function decorated with `@spaces.GPU`, which allocates an H200 slice
+   for the duration of the call and releases it afterwards (see the
+   sketch after this list).
+
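+ As a minimal sketch of the ZeroGPU pattern (illustrative only; the real
+ implementation lives in `app.py`), a function decorated with
+ `@spaces.GPU` gets CUDA access for the duration of the call:
+
+ ```python
+ import spaces
+ import torch
+
+ @spaces.GPU(duration=60)  # request a GPU slice for up to 60 s per call
+ def generate(prompt: str) -> str:
+     # On a ZeroGPU Space, CUDA becomes available inside this function;
+     # outside a Space the decorator has no effect and this runs on CPU.
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     return f"(would run generation for {prompt!r} on {device})"
+ ```
+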
+ ## Setup
+
+ 1. **Create a Gradio Space** on Hugging Face. Choose the **ZeroGPU
+    (Dynamic resources)** hardware option and select the **NVIDIA H200**
+    accelerator. If ZeroGPU or H200 does not appear in the hardware
+    selector, you may need to upgrade to a PRO plan.
+
+ 2. **Add secrets** in your Space settings. Under **Settings → Secrets**:
+    - `HF_TOKEN` – a Hugging Face access token with permission to
+      download `google/medgemma-27b-it`. Without this token the model
+      cannot be loaded. The Hugging Face documentation recommends
+      storing tokens and API keys in secrets rather than hard-coding
+      them.
+    - `API_KEY` (optional) – a random string used to protect your Space.
+      If set, callers must provide the same value in the API key field
+      when using the interface or when calling the model programmatically.
+
+ 3. **Upload the files** in this repository to your Space. The
+    `app.py` file defines the Gradio interface and lazy-loads the model.
+    The `requirements.txt` file lists the Python dependencies.
+
+ 4. Once the Space is built, open it in your browser. Enter your
+    question, optionally upload an image, and click *Submit*. The model
+    will run on an H200 slice and return an answer.
+
+ ## Programmatic access with `gradio_client`
+
+ You can call this Space from your own Python code using the
+ [`gradio_client`](https://github.com/gradio-app/gradio/tree/main/client)
+ package. The client connects to the Space and invokes the `/predict`
+ endpoint. If you have configured an API key, supply it as the last
+ argument. Example:
+
+ ```python
+ from gradio_client import Client
+
+ space_name = "<user>/<space>"  # replace with your Space
+ client = Client(space_name)
+
+ # Prepare inputs: prompt, image (None), system prompt, api_key
+ result = client.predict(
+     "Please examine this chest X-ray.",
+     None,
+     "You are a concise radiology assistant.",
+     "my_secret_key",  # or omit if API_KEY is not set
+     api_name="/predict",
+ )
+ print(result)
+ ```
+
+ The inputs must be provided in the same order as defined in `app.py`:
+
+ 1. **prompt** (string) – required
+ 2. **image** (`PIL.Image.Image` or `None`) – optional; see the upload
+    example after this list
+ 3. **system_prompt** (string or `None`) – optional
+ 4. **api_key** (string or `None`) – required only if you set `API_KEY`
+
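+ To send a local image programmatically, recent versions of
+ `gradio_client` expect files to be wrapped with `handle_file`. A short
+ sketch (`chest_xray.png` is a placeholder path):
+
+ ```python
+ from gradio_client import Client, handle_file
+
+ client = Client("<user>/<space>")  # replace with your Space
+
+ # handle_file uploads the local file and passes it to the image input
+ result = client.predict(
+     "Is there evidence of pneumonia in this chest X-ray?",
+     handle_file("chest_xray.png"),  # local path or URL (placeholder)
+     None,   # use the default system prompt
+     None,   # omit if API_KEY is not set
+     api_name="/predict",
+ )
+ print(result)
+ ```
+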
+ If you prefer cURL, note that the HTTP API is served from the Space's
+ own subdomain (`https://<user>-<space>.hf.space`), not from the
+ `huggingface.co/spaces/...` page URL. Recent Gradio versions expose a
+ two-step `/call` protocol: POST the inputs to receive an event ID, then
+ stream the result. For example:
+
+ ```bash
+ EVENT_ID=$(curl -s -X POST \
+   -H "Content-Type: application/json" \
+   -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}' \
+   https://<user>-<space>.hf.space/call/predict | awk -F'"' '{print $4}')
+
+ curl -N https://<user>-<space>.hf.space/call/predict/$EVENT_ID
+ ```
+
+ Note that Gradio sets the `api_name` of the prediction endpoint to
+ `/predict` by default when using `gr.Interface(fn=...)`.
+
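+ If you are unsure which endpoints your Space actually exposes,
+ `gradio_client` can print the API schema for you:
+
+ ```python
+ from gradio_client import Client
+
+ client = Client("<user>/<space>")  # replace with your Space
+ client.view_api()  # prints endpoint names and their expected parameters
+ ```
+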
+ ## Running locally
+
+ You can also run this application locally for testing. Install the
+ dependencies and start the Gradio server:
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ pip install -r requirements.txt
+ export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
+ python app.py
+ ```
+
+ Open `http://localhost:7860` in your browser. Running locally will
+ execute the model on your machine's CPU or GPU; the ZeroGPU dynamic
+ allocation only works within a Hugging Face Space.
+
+ ## Dependencies
+
+ The `requirements.txt` file specifies the Python packages needed to run
+ this project. It includes Gradio, `spaces` for ZeroGPU support, and the
+ `transformers` library. These versions are selected to be compatible
+ with ZeroGPU.
+
+ ## Disclaimer
+
+ The MedGemma model is for research and educational purposes only. It
+ may generate incorrect or harmful content and should **not** be used for
+ medical diagnosis or treatment. Always consult a licensed medical
+ professional for health questions. This application appends a
+ disclaimer to every response to remind users of these limitations.
app.py ADDED
@@ -0,0 +1,255 @@
+ """
+ Gradio application for MedGemma inference with ZeroGPU.
+
+ This script defines a minimal Gradio interface around Google's
+ ``medgemma-27b-it`` multi-modal model. It is designed to run on
+ Hugging Face Spaces using the **ZeroGPU** hardware option. ZeroGPU
+ allocates an NVIDIA H200 GPU slice for the duration of each call and
+ releases it afterwards. The interface accepts a textual **prompt**
+ (English only), an optional image upload and an optional **system
+ prompt** to steer the model. All responses are returned in English and
+ include a short disclaimer reminding users to consult a medical
+ professional.
+
+ If you set an ``API_KEY`` secret in your Space, callers must supply the
+ same value in the hidden API key field. Otherwise the endpoint will be
+ publicly accessible. See the README for details.
+
+ Note: ZeroGPU Spaces currently only work with the **Gradio** SDK and
+ support specific versions of PyTorch and Python. Running this script
+ outside of a Space will work on CPU or dedicated GPU hardware, but
+ ZeroGPU GPU allocation only takes effect when the Space hardware is set
+ to *ZeroGPU (Dynamic resources)*.
+ """
+
+ import os
+ from typing import Optional
+
+ import gradio as gr
+ from PIL import Image
+ import torch
+ from transformers import (
+     AutoProcessor,
+     AutoModelForImageTextToText,
+     GenerationConfig,
+     pipeline,
+ )
+ import spaces  # for the @spaces.GPU decorator
+
+ # ----------------------------------------------------------------------------
+ # Configuration
+ # ----------------------------------------------------------------------------
+
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ if HF_TOKEN is None:
+     raise RuntimeError(
+         "HF_TOKEN environment variable must be set as a Secret in the Space."
+     )
+
+ # Optional API key: when set, clients must provide the same value in the
+ # hidden ``api_key`` field of the Gradio interface. If not set, no
+ # authentication is enforced.
+ API_KEY = os.getenv("API_KEY")
+
+ MODEL_ID = "google/medgemma-27b-it"
+
+ # Load the processor outside of the GPU context; this is lightweight
+ processor = AutoProcessor.from_pretrained(MODEL_ID, token=HF_TOKEN, trust_remote_code=True)
+
+ eos_id = processor.tokenizer.eos_token_id
+ pad_id = processor.tokenizer.pad_token_id or eos_id
+
+ # Banned phrases to reduce chatty or irrelevant responses
+ ban_list = [
+     "Disclaimer",
+     "disclaimer",
+     "As an AI Chatbot",
+     "as an AI Chatbot",
+     "I cannot give medical advice",
+     "I cannot provide medical advice",
+     "I cannot give medical advise",
+     "user",
+     "response",
+     "display",
+     "response>",
+     "```",
+     "label",
+     "tool_code",
+ ]
+ bad_words_ids = [processor.tokenizer(b, add_special_tokens=False).input_ids for b in ban_list]
+
+ gen_cfg = GenerationConfig(
+     max_new_tokens=120,
+     do_sample=False,  # greedy decoding; the temperature below is ignored
+     repetition_penalty=1.12,
+     no_repeat_ngram_size=6,
+     length_penalty=1.0,
+     temperature=0.0,
+     eos_token_id=eos_id,
+     pad_token_id=pad_id,
+     bad_words_ids=bad_words_ids,
+ )
+
+ # We'll load the model lazily inside run_model to ensure GPU allocation
+ # occurs within the ZeroGPU context. Cache the model and pipeline on
+ # first use so subsequent calls are faster. A simple attribute on the
+ # function serves as a persistent cache.
+
+
+ @spaces.GPU(duration=120)
+ def run_model(prompt: str, image: Optional[Image.Image], system_prompt: Optional[str]) -> str:
+     """Execute the MedGemma model.
+
+     This function runs inside the ZeroGPU allocation context. It
+     lazily loads the model and pipeline on first invocation and reuses
+     them for subsequent calls. Inputs are combined with an optional
+     system prompt to produce the full prompt. The model's output is
+     returned as a plain English string.
+
+     Args:
+         prompt: The user's question (English only).
+         image: An optional PIL Image. If provided, the model will use
+             both text and image modalities; otherwise text-only.
+         system_prompt: An optional system prompt to steer the model. If
+             None or empty, a default instruction is used.
+
+     Returns:
+         The raw English output from the model (without disclaimer).
+     """
+     # Lazy-load the model and pipeline on first use
+     if not hasattr(run_model, "model"):
+         # Determine the appropriate dtype and device map. device_map="auto"
+         # splits the model across the available devices if necessary. Use
+         # bfloat16 when CUDA is available to save memory on the H200.
+         model_kwargs: dict = {
+             "torch_dtype": torch.bfloat16 if torch.cuda.is_available() else torch.float32,
+             "token": HF_TOKEN,
+         }
+         if torch.cuda.is_available():
+             model_kwargs["device_map"] = "auto"
+         model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, **model_kwargs)
+         # Create a pipeline for convenience
+         vlm = pipeline(
+             task="image-text-to-text",
+             model=model,
+             processor=processor,
+             generation_config=gen_cfg,
+         )
+         # Store for reuse
+         run_model.model = model
+         run_model.vlm = vlm
+     else:
+         vlm = run_model.vlm
+
+     # Compose the full prompt
+     sys_prompt = (
+         system_prompt.strip()
+         if system_prompt and system_prompt.strip()
+         else "You are a concise radiology assistant. Answer the user's question based on the image and text."
+     )
+     full_prompt = sys_prompt + "\n" + prompt.strip()
+
+     # Run inference. Use keyword arguments so a text-only request is not
+     # mistaken for an image input (the pipeline's first positional
+     # argument is ``images``).
+     if image is not None:
+         result = vlm(images=image, text=full_prompt)
+     else:
+         result = vlm(text=full_prompt)
+     output = result[0]["generated_text"]
+     return output
+
+
+ def predict(
+     prompt: str,
+     image: Optional[Image.Image] = None,
+     system_prompt: Optional[str] = None,
+     api_key: Optional[str] = None,
+ ) -> str:
+     """Wrapper function for Gradio.
+
+     Handles optional API key authentication and appends a disclaimer to
+     the model's output. See the README for details.
+
+     Args:
+         prompt: The user's question in English.
+         image: An optional PIL image.
+         system_prompt: Optional system prompt to steer the model.
+         api_key: Optional API key supplied by the client. If the
+             ``API_KEY`` secret is set and this does not match, the
+             request is rejected.
+
+     Returns:
+         A string containing the model's answer followed by a
+         disclaimer. If authentication fails, an error message is
+         returned instead.
+     """
+     # Enforce API key if configured
+     if API_KEY:
+         if api_key is None or api_key != API_KEY:
+             return "Error: Invalid or missing API key."
+
+     # Validate prompt
+     if not prompt or not prompt.strip():
+         return "Error: Prompt cannot be empty."
+
+     try:
+         answer = run_model(prompt, image, system_prompt)
+     except Exception as e:
+         return f"Error during inference: {e}"
+     disclaimer = (
+         "\n\nThis response is generated by an AI model and may be incorrect. "
+         "Always consult a licensed medical professional for health questions."
+     )
+     return answer.strip() + disclaimer
+
+
+ def build_demo() -> gr.Interface:
+     """Construct the Gradio UI for this application."""
+     # Define inputs: prompt, optional image, optional system prompt, and
+     # an API key field that is hidden when API_KEY is not configured (in
+     # that case the api_key input is ignored).
+     inputs = [
+         gr.Textbox(
+             label="Prompt (English only)",
+             lines=4,
+             placeholder="Describe the medical image or ask a question.",
+         ),
+         gr.Image(
+             type="pil",
+             label="Optional image",
+         ),
+         gr.Textbox(
+             label="Optional system prompt",
+             lines=2,
+             placeholder="e.g. You are a concise radiology assistant.",
+         ),
+         gr.Textbox(
+             label="API key",
+             lines=1,
+             placeholder="Enter API key if required",
+             type="password",
+             visible=bool(API_KEY),
+         ),
+     ]
+     outputs = gr.Textbox(label="Answer")
+     description = (
+         "Ask MedGemma a question about a medical image or condition. "
+         "Optionally provide a system prompt to guide the model's behaviour. "
+         "All responses are in English and include a disclaimer."
+     )
+     demo = gr.Interface(
+         fn=predict,
+         inputs=inputs,
+         outputs=outputs,
+         title="MedGemma ZeroGPU (Gradio)",
+         description=description,
+         allow_flagging="never",
+     )
+     return demo
+
+
+ demo = build_demo()
+
+ if __name__ == "__main__":
+     # When run directly (e.g. locally), start the Gradio server on the
+     # default port. On Spaces, the Gradio SDK imports this module and
+     # serves the module-level ``demo`` object.
+     demo.launch()
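+
+ # A quick local smoke test (a sketch; assumes HF_TOKEN is set and the
+ # machine has enough memory for the 27B model):
+ #
+ #     from app import predict
+ #     print(predict("What does consolidation on a chest X-ray indicate?"))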
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.0.0,<5.0.0
+ spaces
+ transformers
+ torch>=2.1.0
+ huggingface-hub>=0.19.1
+ Pillow
+ # gradio_client is optional but useful for programmatic access
+ gradio_client