| import gradio as gr | |
| import os | |
| with gr.Blocks(title="Technical Documentation", css="footer {visibility: hidden}") as docs_demo: | |
| with gr.Column(): | |
| gr.Markdown(""" | |
| # Technical Documentation | |
| ## Overview | |
| This page provides details about the architecture, API, and usage of the MedGemma Agent application. | |
| ## Features | |
| - Multimodal (text + image) | |
| - Wikipedia tool integration | |
| - Real-time streaming | |
| - Medical knowledge base | |
| --- | |
| ## Architecture | |
| - **Frontend:** Gradio Blocks, custom CSS | |
| - **Backend:** Modal, FastAPI, VLLM, MedGemma-4B | |
| - **Security:** API key authentication | |
| ### ποΈ Technical Stack | |
| - Streaming responses for real-time interaction | |
| - Secure API key authentication | |
| - Base64 image processing for multimodal inputs | |
| ### Frontend Interface | |
| - Built with Gradio for seamless user interaction | |
| - Custom CSS theming for professional appearance | |
| - Example queries for common medical scenarios | |
| ```mermaid | |
| graph TD | |
| A[MedGemma Agent] --> B[Backend] | |
| A --> C[Frontend] | |
| A --> D[Model] | |
| B --> B1[Modal] | |
| B --> B2[FastAPI] | |
| B --> B3[VLLM] | |
| C --> C1[Gradio] | |
| C --> C2[Custom CSS] | |
| D --> D1[MedGemma-4B] | |
| D --> D2[4-bit Quantization] | |
| ``` | |
| """) | |
| gr.Markdown(""" | |
| ## Backend Architecture | |
| ### π― Performance Features | |
| - Optimized for low latency responses | |
| - GPU-accelerated inference | |
| - Efficient memory utilization with 4-bit quantization | |
| - Maximum context length of 8192 tokens | |
| ### π Security Measures | |
| - API key authentication for all requests | |
| - Secure image processing | |
| - Protected model endpoints | |
| ```mermaid | |
| flowchart LR | |
| A[Client] --> B[FastAPI] | |
| B --> C[Modal Container] | |
| C --> D[VLLM] | |
| D --> E[MedGemma-4B] | |
| B --> F[Wikipedia API] | |
| ``` | |
| """) | |
| with gr.Row(): | |
| with gr.Column(): | |
| gr.Markdown(""" | |
| ## πΎ Model Deployment | |
| ### Model | |
| - **Model:** unsloth/medgemma-4b-it-unsloth-bnb-4bit | |
| - **Context Length:** 8192 tokens | |
| - **Quantization:** 4-bit, bfloat16 | |
| - Utilizes Modal's GPU-accelerated containers | |
| - Implements efficient model loading with VLLM | |
| - Supports bfloat16 precision for optimal performance | |
| """) | |
| with gr.Column(): | |
| gr.Markdown(""" | |
| ```mermaid | |
| graph TD | |
| A[Model Loading] --> B[GPU Acceleration] | |
| B --> C[4-bit Quantization] | |
| C --> D[8192 Token Context] | |
| D --> E[Streaming Response] | |
| ``` | |
| """) | |
| with gr.Column(): | |
| gr.Markdown(""" | |
| ## π System Architecture | |
| ```mermaid | |
| flowchart TD | |
| A[User Interface] --> B[API Gateway] | |
| B --> C[Authentication] | |
| C --> D[Model Service] | |
| D --> E[Wikipedia Service] | |
| D --> F[Image Processing] | |
| F --> G[Model Inference] | |
| E --> H[Response Generation] | |
| G --> H | |
| H --> I[Stream Response] | |
| I --> A | |
| ``` | |
| """) | |
| gr.Markdown(""" | |
| [Back to Main Application](https://huggingface.co/spaces/Agents-MCP-Hackathon/agentic-coach-advisor-medgemma) | |
| """) | |
| if __name__ == "__main__": | |
| docs_demo.launch() |