metadata
title: Granite Docling 258M Demo
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
models:
- ibm-granite/granite-docling-258M
Granite Docling 258M - Online Demo
Experience IBM's cutting-edge Vision-Language Model for document processing and conversion directly in your browser on Hugging Face Spaces!
This Space uses the IBM Granite Docling 258M Vision-Language Model hosted on Hugging Face Hub.
What is Granite Docling 258M?
The IBM Granite Docling 258M is a state-of-the-art Vision-Language Model (VLM) designed for advanced document understanding and conversion. This model excels at:
- Multi-Format Processing: PDF, DOCX, images
- Intelligent Analysis: Document structure detection
- Smart Conversion: Semantic Markdown generation
- Fast Processing: Document analysis runs 19x faster than full conversion
- Vision Understanding: OCR and image analysis
Features Available in This Demo
Document Analysis (Fast) - Recommended
- 19x faster than full conversion
- Quick structural insights and metadata
- Perfect for understanding document layout
- Ideal for the free tier with processing time limits
Full Markdown Conversion
- Complete document-to-Markdown transformation
- Preserves formatting and structure
- Comprehensive text extraction
Table Extraction
- Detects and extracts tabular data
- Maintains table structure in Markdown format
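As an illustration of what "table structure in Markdown format" means, here is a minimal sketch that renders already-extracted rows as a Markdown table. The helper name `rows_to_markdown` is hypothetical and is not taken from this Space's code; how the demo itself builds its tables is not shown in this README.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row is the header) as a Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell):
        # Escape pipes and flatten newlines so a cell cannot break the table.
        return str(cell).replace("|", "\\|").replace("\n", " ").strip()

    def line(row):
        # Pad short rows with empty cells so every line has the same width.
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


if __name__ == "__main__":
    print(rows_to_markdown([["Item", "Qty"], ["Paper", 2], ["Ink", 1]]))
```

Padding ragged rows keeps the column count constant, which most Markdown renderers require for a table to display.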
Quick Preview
- Fast content sampling
- Great for quick document verification
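The four modes above could be wired to a single entry point with a simple dispatch table. The sketch below is an assumption about structure, not this Space's actual code: the handler functions are hypothetical stand-ins that operate on Markdown text that has already been produced, and only the mode names come from this README.

```python
from typing import Callable, Dict


def analyze(markdown: str) -> str:
    # Stand-in for "Document Analysis": count lines and Markdown headings.
    lines = markdown.splitlines()
    headings = [ln for ln in lines if ln.startswith("#")]
    return f"{len(lines)} lines, {len(headings)} headings"


def full_markdown(markdown: str) -> str:
    # Stand-in for "Full Markdown Conversion": return the text unchanged.
    return markdown


def extract_tables(markdown: str) -> str:
    # Stand-in for "Table Extraction": keep only Markdown table rows.
    return "\n".join(ln for ln in markdown.splitlines() if ln.lstrip().startswith("|"))


def preview(markdown: str, limit: int = 200) -> str:
    # Stand-in for "Quick Preview": return the first `limit` characters.
    return markdown[:limit]


MODES: Dict[str, Callable[[str], str]] = {
    "Document Analysis": analyze,
    "Full Markdown Conversion": full_markdown,
    "Table Extraction": extract_tables,
    "Quick Preview": preview,
}


def process(markdown: str, mode: str = "Document Analysis") -> str:
    """Run the handler registered for `mode`, defaulting to the fast mode."""
    try:
        handler = MODES[mode]
    except KeyError:
        raise ValueError(f"Unknown mode: {mode!r}") from None
    return handler(markdown)
```

Defaulting to "Document Analysis" mirrors the recommendation above to try the fast mode first.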
How to Use
- Upload your document (PDF, DOCX, or image)
- Select a processing mode (try "Document Analysis" first!)
- Click "Process Document"
- View results in the tabs below
Performance & Tips
- Document Analysis mode is optimized for speed and works great on the free tier
- CPU processing optimized for reliable performance
- Processing time varies based on document size and complexity
- Free CPU tier provides reliable processing with timeout limitations for very large documents
- Upgrade to GPU tier for faster processing speeds
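One way to cope with timeout limits on large documents is to give each processing call a time budget and fall back to a lighter mode when it is exceeded. The following is a minimal sketch using only the standard library; `run_with_timeout` and the 60-second default are assumptions for illustration, not documented Spaces limits or code from this demo.

```python
import concurrent.futures
import time


def run_with_timeout(func, *args, timeout_s=60.0):
    """Call func(*args) and return (result, timed_out).

    If the call does not finish within timeout_s seconds, return (None, True).
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s), False
    except concurrent.futures.TimeoutError:
        return None, True
    finally:
        # Do not block on the worker: a running thread cannot be force-stopped,
        # so it keeps running in the background until it finishes on its own.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    def slow_conversion():
        time.sleep(0.3)
        return "done"

    print(run_with_timeout(slow_conversion, timeout_s=0.05))
```

Because the worker thread is not cancelled, this pattern only frees the caller; a process-based worker would be needed to actually reclaim CPU time.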
Technical Details
- Model: IBM Granite Docling 258M Vision-Language Model
- Model Hub: Automatically loaded from ibm-granite/granite-docling-258M on Hugging Face
- Backend: Docling framework with PyMuPDF optimization
- Processing: CPU optimized (GPU available with paid tier upgrade)
- Hosting: Hugging Face Spaces (Free CPU Tier)
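For readers who want to load the same checkpoint outside this Space, the sketch below picks a device and loads the model directly with Hugging Face Transformers. This is an assumption about one possible setup: the demo itself goes through the Docling framework, and the use of `AutoModelForVision2Seq` here should be checked against the model card and your installed Transformers version. The function names `pick_device` and `load_model` are hypothetical.

```python
MODEL_ID = "ibm-granite/granite-docling-258M"


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to probe; assume CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_model(device=None):
    """Download (or reuse the cached) checkpoint and move it to `device`.

    Imports are local so that importing this module does not require
    transformers to be installed.
    """
    from transformers import AutoModelForVision2Seq, AutoProcessor

    device = device or pick_device()
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForVision2Seq.from_pretrained(MODEL_ID).to(device)
    return processor, model, device


if __name__ == "__main__":
    print(f"Would load {MODEL_ID} on {pick_device()}")
```

On the free CPU tier `pick_device()` would return "cpu"; calling `load_model()` triggers a download from the Hub on first use.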
Links & Resources
- GitHub Repository: granite-docling-implementation
- Model Hub: IBM Granite Docling 258M
- Documentation: Docling Framework
- Production Ready: Full security audit with zero vulnerabilities
Perfect For
- Document Analysis: Quick insights into document structure
- Format Conversion: PDF/DOCX to clean Markdown
- Data Extraction: Tables and structured content
- Research: Testing document processing capabilities
- Prototyping: Exploring Vision-Language Model capabilities
Built With
- IBM Granite Docling 258M - State-of-the-art VLM
- Gradio - Interactive web interface
- PyMuPDF - Fast PDF processing optimization
- Hugging Face Transformers - Model inference
- PyTorch - Deep learning framework
Try it now! Upload a document above and experience IBM's Granite Docling model running on the free CPU tier.
This demo showcases a production-ready implementation with comprehensive security auditing and performance optimizations.