ai_ocr / model_card.md
utarn's picture
Update model
dfdb180

A newer version of the Gradio SDK is available: 5.49.1

Upgrade
metadata
license: mit
tags:
  - gradio
  - omni-api
  - multimodal
  - chat-interface
  - pdf-processing
  - image-processing
  - audio-processing
  - llm
  - api-client
  - chatbot
  - text-generation
  - document-analysis
  - ocr
  - transcription
widget:
  - src: https://api.modelharbor.com

Omni API Gradio UI

This is a Gradio-based user interface for the Omni API that supports multimodal interactions with various file types including text, PDF documents, images, and audio files.

Model Description

The Omni API Gradio UI provides an easy-to-use web interface for interacting with the Omni API, which supports advanced multimodal AI capabilities. Users can send text prompts along with various file types and receive intelligent responses.

Supported Models

The interface supports several state-of-the-art models:

  • typhoon-ocr-preview
  • openai/gpt-5
  • meta-llama/llama-4-maverick
  • qwen/qwen3-vl-235b-a22b-instruct
  • gemini/gemini-2.5-pro
  • gemini/gemini-2.5-flash

Features

  • Multimodal Support: Process text, PDFs, images, and audio files in a single interface
  • File Ordering: Upload multiple files in a specific order for precise control
  • Configurable Models: Switch between different AI models for different tasks
  • Real-time Responses: Get immediate feedback from the API
  • Customizable Parameters: Adjust max tokens and other settings

Intended Uses & Limitations

Intended Uses

  • Document analysis and summarization
  • Image OCR and analysis
  • Audio transcription and analysis
  • Multimodal chat applications
  • Content extraction from various file formats

Limitations

  • Requires access to the Omni API
  • Dependent on network connectivity
  • File size limitations based on API constraints
  • Some models may require API keys

How to Use

  1. Configure the API base URL (defaults to https://api.modelharbor.com)
  2. Select your preferred model from the dropdown
  3. Enter your text message in the input box
  4. Upload files (PDF, images, or audio) as needed
  5. Click "Send Request" to interact with the API
  6. View the response in the output panel

Supported File Types

  • PDFs: Document processing and analysis
  • Images: JPG, PNG, GIF, BMP, WEBP for OCR and visual analysis
  • Audio: MP3, WAV, M4A, FLAC, OGG for transcription

Technical Details

Frameworks and Libraries

  • Gradio 4.0+
  • Python 3.8+
  • Requests library for API communication

Installation

# Install dependencies
uv sync

# Run the application
uv run python app.py

Development Mode

# Run with auto-reload for development
uv run python dev.py

Citation

If you use this interface in your work, please cite:

@misc{omni_api_gradio_ui,
  title={Omni API Gradio UI},
  author={ModelHarbor Team},
  year={2025},
  howpublished={\url{https://github.com/your-username/omni-api-gradio-ui}}
}