dflehel committed
Commit 224b1c2 · verified · 1 Parent(s): 25e91dd
Files changed (3)
  1. README.md +133 -12
  2. app.py +255 -0
  3. requirements.txt +8 -0
README.md CHANGED
@@ -1,12 +1,133 @@
- ---
- title: Medgemmank
- emoji: 👀
- colorFrom: indigo
- colorTo: green
- sdk: gradio
- sdk_version: 5.49.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # MedGemma ZeroGPU Gradio Space
+
+ This repository contains a minimal **Gradio** application that wraps
+ Google's `medgemma-27b-it` multi-modal model and exposes it via a
+ browser-based interface. The app is designed to run on **Hugging Face
+ Spaces** configured with the **ZeroGPU (Dynamic resources)** option.
+ ZeroGPU dynamically allocates and releases NVIDIA H200 GPU slices on
+ demand. Existing ZeroGPU Spaces can be used for free, and the
+ infrastructure supports multi-GPU allocation for large models.
+ However, hosting your own ZeroGPU Space requires a PRO or Enterprise
+ Hub subscription. ZeroGPU Spaces are only compatible with the
+ **Gradio SDK** and specific versions of PyTorch and Python, which is
+ why this project uses Gradio instead of a raw FastAPI server.
+
+ ## Features
+
+ - **English-only input and output** – The interface accepts a question in
+   English and returns the model's answer in English. A disclaimer is
+   appended to every response reminding users to consult a medical
+   professional.
+ - **Multi-modal support** – Optionally upload an image to provide
+   additional context for the model. The input text and image are
+   processed together.
+ - **Custom system prompt** – You can supply your own system prompt to
+   steer the model's behaviour. If omitted, a default radiology assistant
+   instruction is used.
+ - **Optional API key** – If you set an `API_KEY` secret in your Space,
+   the UI will display a hidden API key field. Clients must enter the
+   same value when calling the model; otherwise the request is rejected.
+ - **ZeroGPU integration** – The heavy computation is wrapped in a
+   function decorated with `@spaces.GPU`, which allocates an H200 slice
+   for the duration of the call and releases it afterwards (see the
+   sketch after this list).
+
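+ As a minimal sketch of the ZeroGPU pattern (illustrative only; the real
+ implementation lives in `app.py`), a function decorated with
+ `@spaces.GPU` gets CUDA access for the duration of the call:
+
+ ```python
+ import spaces
+ import torch
+
+ @spaces.GPU(duration=60)  # request a GPU slice for up to 60 s per call
+ def generate(prompt: str) -> str:
+     # On a ZeroGPU Space, CUDA becomes available inside this function;
+     # outside a Space the decorator has no effect and this runs on CPU.
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     return f"(would run generation for {prompt!r} on {device})"
+ ```
+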
+ ## Setup
+
+ 1. **Create a Gradio Space** on Hugging Face. Choose the **ZeroGPU
+    (Dynamic resources)** hardware option and select the **NVIDIA H200**
+    accelerator. If ZeroGPU or H200 does not appear in the hardware
+    selector, you may need to upgrade to a PRO plan.
+
+ 2. **Add secrets** in your Space settings. Under **Settings → Secrets**:
+    - `HF_TOKEN` – a Hugging Face access token with permission to
+      download `google/medgemma-27b-it`. Without this token the model
+      cannot be loaded. The Hugging Face documentation recommends
+      storing tokens and API keys in secrets rather than hard-coding
+      them.
+    - `API_KEY` (optional) – a random string used to protect your Space.
+      If set, callers must provide the same value in the API key field
+      when using the interface or when calling the model programmatically.
+
+ 3. **Upload the files** in this repository to your Space. The
+    `app.py` file defines the Gradio interface and lazy-loads the model.
+    The `requirements.txt` file lists the Python dependencies.
+
+ 4. Once the Space is built, open it in your browser. Enter your
+    question, optionally upload an image, and click *Submit*. The model
+    will run on an H200 slice and return an answer.
+
+ ## Programmatic access with `gradio_client`
+
+ You can call this Space from your own Python code using the
+ [`gradio_client`](https://github.com/gradio-app/gradio/tree/main/client)
+ package. The client connects to the Space and invokes the `/predict`
+ endpoint. If you have configured an API key, supply it as the last
+ argument. Example:
+
+ ```python
+ from gradio_client import Client
+
+ space_name = "<user>/<space>"  # replace with your Space
+ client = Client(space_name)
+
+ # Prepare inputs: prompt, image (None), system prompt, api_key
+ result = client.predict(
+     "Please examine this chest X-ray.",
+     None,
+     "You are a concise radiology assistant.",
+     "my_secret_key",  # or omit if API_KEY is not set
+     api_name="/predict",
+ )
+ print(result)
+ ```
+
+ The inputs must be provided in the same order as defined in `app.py`:
+
+ 1. **prompt** (string) – required
+ 2. **image** (`PIL.Image.Image` or `None`) – optional; see the upload
+    example after this list
+ 3. **system_prompt** (string or `None`) – optional
+ 4. **api_key** (string or `None`) – required only if you set `API_KEY`
+
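+ To send a local image programmatically, recent versions of
+ `gradio_client` expect files to be wrapped with `handle_file`. A short
+ sketch (`chest_xray.png` is a placeholder path):
+
+ ```python
+ from gradio_client import Client, handle_file
+
+ client = Client("<user>/<space>")  # replace with your Space
+
+ # handle_file uploads the local file and passes it to the image input
+ result = client.predict(
+     "Is there evidence of pneumonia in this chest X-ray?",
+     handle_file("chest_xray.png"),  # local path or URL (placeholder)
+     None,   # use the default system prompt
+     None,   # omit if API_KEY is not set
+     api_name="/predict",
+ )
+ print(result)
+ ```
+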
+ If you prefer cURL, note that the HTTP API is served from the Space's
+ own subdomain (`https://<user>-<space>.hf.space`), not from the
+ `huggingface.co/spaces/...` page URL. Recent Gradio versions expose a
+ two-step `/call` protocol: POST the inputs to receive an event ID, then
+ stream the result. For example:
+
+ ```bash
+ EVENT_ID=$(curl -s -X POST \
+   -H "Content-Type: application/json" \
+   -d '{"data": ["Please examine this CT scan.", null, "You are a concise radiology assistant.", "my_secret_key"]}' \
+   https://<user>-<space>.hf.space/call/predict | awk -F'"' '{print $4}')
+
+ curl -N https://<user>-<space>.hf.space/call/predict/$EVENT_ID
+ ```
+
+ Note that Gradio sets the `api_name` of the prediction endpoint to
+ `/predict` by default when using `gr.Interface(fn=...)`.
+
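+ If you are unsure which endpoints your Space actually exposes,
+ `gradio_client` can print the API schema for you:
+
+ ```python
+ from gradio_client import Client
+
+ client = Client("<user>/<space>")  # replace with your Space
+ client.view_api()  # prints endpoint names and their expected parameters
+ ```
+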
+ ## Running locally
+
+ You can also run this application locally for testing. Install the
+ dependencies and start the Gradio server:
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ pip install -r requirements.txt
+ export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx  # set your token
+ python app.py
+ ```
+
+ Open `http://localhost:7860` in your browser. Running locally will
+ execute the model on your machine's CPU or GPU; the ZeroGPU dynamic
+ allocation only works within a Hugging Face Space.
+
+ ## Dependencies
+
+ The `requirements.txt` file specifies the Python packages needed to run
+ this project. It includes Gradio, `spaces` for ZeroGPU support, and the
+ `transformers` library. These versions are selected to be compatible
+ with ZeroGPU.
+
+ ## Disclaimer
+
+ The MedGemma model is for research and educational purposes only. It
+ may generate incorrect or harmful content and should **not** be used for
+ medical diagnosis or treatment. Always consult a licensed medical
+ professional for health questions. This application appends a
+ disclaimer to every response to remind users of these limitations.
app.py ADDED
@@ -0,0 +1,255 @@
+ """
+ Gradio application for MedGemma inference with ZeroGPU.
+
+ This script defines a minimal Gradio interface around Google's
+ ``medgemma-27b-it`` multi-modal model. It is designed to run on
+ Hugging Face Spaces using the **ZeroGPU** hardware option. ZeroGPU
+ allocates an NVIDIA H200 GPU slice for the duration of each call and
+ releases it afterwards. The interface accepts a textual **prompt**
+ (English only), an optional image upload and an optional **system
+ prompt** to steer the model. All responses are returned in English and
+ include a short disclaimer reminding users to consult a medical
+ professional.
+
+ If you set an ``API_KEY`` secret in your Space, callers must supply the
+ same value in the hidden API key field. Otherwise the endpoint will be
+ publicly accessible. See the README for details.
+
+ Note: ZeroGPU Spaces currently only work with the **Gradio** SDK and
+ support specific versions of PyTorch and Python. Running this script
+ outside of a Space will work on CPU or dedicated GPU hardware, but
+ ZeroGPU GPU allocation only takes effect when the Space hardware is set
+ to *ZeroGPU (Dynamic resources)*.
+ """
+
+ import os
+ from typing import Optional
+
+ import gradio as gr
+ from PIL import Image
+ import torch
+ from transformers import (
+     AutoProcessor,
+     AutoModelForImageTextToText,
+     GenerationConfig,
+     pipeline,
+ )
+ import spaces  # for the @spaces.GPU decorator
+
+ # ----------------------------------------------------------------------------
+ # Configuration
+ # ----------------------------------------------------------------------------
+
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ if HF_TOKEN is None:
+     raise RuntimeError(
+         "HF_TOKEN environment variable must be set as a Secret in the Space."
+     )
+
+ # Optional API key: when set, clients must provide the same value in the
+ # hidden ``api_key`` field of the Gradio interface. If not set, no
+ # authentication is enforced.
+ API_KEY = os.getenv("API_KEY")
+
+ MODEL_ID = "google/medgemma-27b-it"
+
+ # Load the processor outside of the GPU context; this is lightweight
+ processor = AutoProcessor.from_pretrained(MODEL_ID, token=HF_TOKEN, trust_remote_code=True)
+
+ eos_id = processor.tokenizer.eos_token_id
+ pad_id = processor.tokenizer.pad_token_id or eos_id
+
+ # Banned phrases to reduce chatty or irrelevant responses
+ ban_list = [
+     "Disclaimer",
+     "disclaimer",
+     "As an AI Chatbot",
+     "as an AI Chatbot",
+     "I cannot give medical advice",
+     "I cannot provide medical advice",
+     "I cannot give medical advise",
+     "user",
+     "response",
+     "display",
+     "response>",
+     "```",
+     "label",
+     "tool_code",
+ ]
+ bad_words_ids = [processor.tokenizer(b, add_special_tokens=False).input_ids for b in ban_list]
+
+ gen_cfg = GenerationConfig(
+     max_new_tokens=120,
+     do_sample=False,  # greedy decoding; the temperature below is ignored
+     repetition_penalty=1.12,
+     no_repeat_ngram_size=6,
+     length_penalty=1.0,
+     temperature=0.0,
+     eos_token_id=eos_id,
+     pad_token_id=pad_id,
+     bad_words_ids=bad_words_ids,
+ )
+
+ # We'll load the model lazily inside run_model to ensure GPU allocation
+ # occurs within the ZeroGPU context. Cache the model and pipeline on
+ # first use so subsequent calls are faster. A simple attribute on the
+ # function serves as a persistent cache.
+
+
+ @spaces.GPU(duration=120)
+ def run_model(prompt: str, image: Optional[Image.Image], system_prompt: Optional[str]) -> str:
+     """Execute the MedGemma model.
+
+     This function runs inside the ZeroGPU allocation context. It
+     lazily loads the model and pipeline on first invocation and reuses
+     them for subsequent calls. Inputs are combined with an optional
+     system prompt to produce the full prompt. The model's output is
+     returned as a plain English string.
+
+     Args:
+         prompt: The user's question (English only).
+         image: An optional PIL Image. If provided, the model will use
+             both text and image modalities; otherwise text-only.
+         system_prompt: An optional system prompt to steer the model. If
+             None or empty, a default instruction is used.
+
+     Returns:
+         The raw English output from the model (without disclaimer).
+     """
+     # Lazy-load the model and pipeline on first use
+     if not hasattr(run_model, "model"):
+         # Determine the appropriate dtype and device map. device_map="auto"
+         # splits the model across the available devices if necessary. Use
+         # bfloat16 when CUDA is available to save memory on the H200.
+         model_kwargs: dict = {
+             "torch_dtype": torch.bfloat16 if torch.cuda.is_available() else torch.float32,
+             "token": HF_TOKEN,
+         }
+         if torch.cuda.is_available():
+             model_kwargs["device_map"] = "auto"
+         model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, **model_kwargs)
+         # Create a pipeline for convenience
+         vlm = pipeline(
+             task="image-text-to-text",
+             model=model,
+             processor=processor,
+             generation_config=gen_cfg,
+         )
+         # Store for reuse
+         run_model.model = model
+         run_model.vlm = vlm
+     else:
+         vlm = run_model.vlm
+
+     # Compose the full prompt
+     sys_prompt = (
+         system_prompt.strip()
+         if system_prompt and system_prompt.strip()
+         else "You are a concise radiology assistant. Answer the user's question based on the image and text."
+     )
+     full_prompt = sys_prompt + "\n" + prompt.strip()
+
+     # Run inference. Use keyword arguments so a text-only request is not
+     # mistaken for an image input (the pipeline's first positional
+     # argument is ``images``).
+     if image is not None:
+         result = vlm(images=image, text=full_prompt)
+     else:
+         result = vlm(text=full_prompt)
+     output = result[0]["generated_text"]
+     return output
+
+
+ def predict(
+     prompt: str,
+     image: Optional[Image.Image] = None,
+     system_prompt: Optional[str] = None,
+     api_key: Optional[str] = None,
+ ) -> str:
+     """Wrapper function for Gradio.
+
+     Handles optional API key authentication and appends a disclaimer to
+     the model's output. See the README for details.
+
+     Args:
+         prompt: The user's question in English.
+         image: An optional PIL image.
+         system_prompt: Optional system prompt to steer the model.
+         api_key: Optional API key supplied by the client. If the
+             ``API_KEY`` secret is set and this does not match, the
+             request is rejected.
+
+     Returns:
+         A string containing the model's answer followed by a
+         disclaimer. If authentication fails, an error message is
+         returned instead.
+     """
+     # Enforce API key if configured
+     if API_KEY:
+         if api_key is None or api_key != API_KEY:
+             return "Error: Invalid or missing API key."
+
+     # Validate prompt
+     if not prompt or not prompt.strip():
+         return "Error: Prompt cannot be empty."
+
+     try:
+         answer = run_model(prompt, image, system_prompt)
+     except Exception as e:
+         return f"Error during inference: {e}"
+     disclaimer = (
+         "\n\nThis response is generated by an AI model and may be incorrect. "
+         "Always consult a licensed medical professional for health questions."
+     )
+     return answer.strip() + disclaimer
+
+
+ def build_demo() -> gr.Interface:
+     """Construct the Gradio UI for this application."""
+     # Define inputs: prompt, optional image, optional system prompt, and
+     # an API key field that is hidden when API_KEY is not configured (in
+     # that case the api_key input is ignored).
+     inputs = [
+         gr.Textbox(
+             label="Prompt (English only)",
+             lines=4,
+             placeholder="Describe the medical image or ask a question.",
+         ),
+         gr.Image(
+             type="pil",
+             label="Optional image",
+         ),
+         gr.Textbox(
+             label="Optional system prompt",
+             lines=2,
+             placeholder="e.g. You are a concise radiology assistant.",
+         ),
+         gr.Textbox(
+             label="API key",
+             lines=1,
+             placeholder="Enter API key if required",
+             type="password",
+             visible=bool(API_KEY),
+         ),
+     ]
+     outputs = gr.Textbox(label="Answer")
+     description = (
+         "Ask MedGemma a question about a medical image or condition. "
+         "Optionally provide a system prompt to guide the model's behaviour. "
+         "All responses are in English and include a disclaimer."
+     )
+     demo = gr.Interface(
+         fn=predict,
+         inputs=inputs,
+         outputs=outputs,
+         title="MedGemma ZeroGPU (Gradio)",
+         description=description,
+         allow_flagging="never",
+     )
+     return demo
+
+
+ demo = build_demo()
+
+ if __name__ == "__main__":
+     # When run directly (e.g. locally), start the Gradio server on the
+     # default port. On Spaces, the Gradio SDK imports this module and
+     # serves the module-level ``demo`` object.
+     demo.launch()
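+
+ # A quick local smoke test (a sketch; assumes HF_TOKEN is set and the
+ # machine has enough memory for the 27B model):
+ #
+ #     from app import predict
+ #     print(predict("What does consolidation on a chest X-ray indicate?"))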
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.0.0,<5.0.0
+ spaces
+ transformers
+ torch>=2.1.0
+ huggingface-hub>=0.19.1
+ Pillow
+ # gradio_client is optional but useful for programmatic access
+ gradio_client