---
title: KOSMOS-2.5 Document AI Demo
emoji: 📄
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# KOSMOS-2.5 Document AI Demo

This Space demonstrates the capabilities of Microsoft's **KOSMOS-2.5**, a multimodal literate model for machine reading of text-intensive images.

## Features

🔥 **Three powerful modes**:

1. **📝 Markdown Generation**: Convert document images to clean markdown format
2. **🔍 OCR with Bounding Boxes**: Extract text with precise spatial coordinates and visualization
3. **💬 Document Q&A**: Ask questions about document content using KOSMOS-2.5 Chat

## What is KOSMOS-2.5?

KOSMOS-2.5 is Microsoft's document AI model for understanding text-rich images. It can:

- Generate spatially-aware text blocks with bounding-box coordinates
- Produce structured markdown output that preserves document styling
- Answer questions about document content through the chat variant

The model was pre-trained on 357.4 million text-rich document images and, at 1.3B parameters, achieves performance comparable to much larger 7B-parameter models on visual question answering benchmarks. A minimal `transformers` usage sketch for the markdown and OCR modes appears at the end of this README.

## Example Use Cases

- **Receipts**: Extract itemized information or ask "What's the total amount?"
- **Forms**: Convert to structured format or query specific fields
- **Articles**: Get clean markdown or ask content-specific questions
- **Screenshots**: Extract UI text or get information about elements

## Model Information

- **Base Model**: [microsoft/kosmos-2.5](https://huggingface.co/microsoft/kosmos-2.5)
- **Chat Model**: [microsoft/kosmos-2.5-chat](https://huggingface.co/microsoft/kosmos-2.5-chat)
- **Paper**: [Kosmos-2.5: A Multimodal Literate Model](https://arxiv.org/abs/2309.11419)

## Note

KOSMOS-2.5 is a generative model and may occasionally produce inaccurate results. Please verify outputs for critical applications.

## Citation

```bibtex
@article{lv2023kosmos,
  title={Kosmos-2.5: A multimodal literate model},
  author={Lv, Tengchao and Huang, Yupan and Chen, Jingye and Cui, Lei and Ma, Shuming and Chang, Yaoyao and Huang, Shaohan and Wang, Wenhui and Dong, Li and Luo, Weiyao and others},
  journal={arXiv preprint arXiv:2309.11419},
  year={2023}
}
```
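
## Usage Sketch (Transformers)

The sketch below shows one way to call the base model directly through the Hugging Face `transformers` integration, which is what this Space's markdown and OCR modes build on. It assumes a recent `transformers` release that ships `Kosmos2_5ForConditionalGeneration`, and `example_receipt.png` is a placeholder image path; treat it as a starting point rather than the exact implementation in `app.py`.

```python
# Minimal sketch: running microsoft/kosmos-2.5 in markdown-generation mode.
# Assumes a recent transformers release that includes Kosmos2_5ForConditionalGeneration;
# "example_receipt.png" is a placeholder document image.
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2_5ForConditionalGeneration

repo = "microsoft/kosmos-2.5"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

model = Kosmos2_5ForConditionalGeneration.from_pretrained(repo, torch_dtype=dtype).to(device)
processor = AutoProcessor.from_pretrained(repo)

image = Image.open("example_receipt.png")  # placeholder document image
prompt = "<md>"                            # "<md>" -> markdown; "<ocr>" -> text with bounding boxes

inputs = processor(text=prompt, images=image, return_tensors="pt")
# The processor also returns the resized height/width (useful for rescaling
# bounding boxes back to the original image); they are not model inputs.
inputs.pop("height", None)
inputs.pop("width", None)
inputs = {k: v.to(device) if v is not None else None for k, v in inputs.items()}
inputs["flattened_patches"] = inputs["flattened_patches"].to(dtype)

generated_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

With the `<ocr>` prompt, the generated text interleaves bounding-box markup with the recognized text; the demo parses those coordinates (rescaled via the height/width values above) to draw the visualization shown in the OCR tab.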