Spaces:

utarn
/

ai_ocr

Running

App Files Files Community

ai_ocr / model_card.md

utarn

Update model

dfdb180 about 1 month ago

preview code

raw

history blame contribute delete

2.84 kB

A newer version of the Gradio SDK is available: 5.49.1

Upgrade

metadata

license: mit
tags:
  - gradio
  - omni-api
  - multimodal
  - chat-interface
  - pdf-processing
  - image-processing
  - audio-processing
  - llm
  - api-client
  - chatbot
  - text-generation
  - document-analysis
  - ocr
  - transcription
widget:
  - src: https://api.modelharbor.com

Omni API Gradio UI

This is a Gradio-based user interface for the Omni API that supports multimodal interactions with various file types including text, PDF documents, images, and audio files.

Model Description

The Omni API Gradio UI provides an easy-to-use web interface for interacting with the Omni API, which supports advanced multimodal AI capabilities. Users can send text prompts along with various file types and receive intelligent responses.

Supported Models

The interface supports several state-of-the-art models:

typhoon-ocr-preview
openai/gpt-5
meta-llama/llama-4-maverick
qwen/qwen3-vl-235b-a22b-instruct
gemini/gemini-2.5-pro
gemini/gemini-2.5-flash

Features

Multimodal Support: Process text, PDFs, images, and audio files in a single interface
File Ordering: Upload multiple files in a specific order for precise control
Configurable Models: Switch between different AI models for different tasks
Real-time Responses: Get immediate feedback from the API
Customizable Parameters: Adjust max tokens and other settings

Intended Uses & Limitations

Intended Uses

Document analysis and summarization
Image OCR and analysis
Audio transcription and analysis
Multimodal chat applications
Content extraction from various file formats

Limitations

Requires access to the Omni API
Dependent on network connectivity
File size limitations based on API constraints
Some models may require API keys

How to Use

Configure the API base URL (defaults to https://api.modelharbor.com)
Select your preferred model from the dropdown
Enter your text message in the input box
Upload files (PDF, images, or audio) as needed
Click "Send Request" to interact with the API
View the response in the output panel

Supported File Types

PDFs: Document processing and analysis
Images: JPG, PNG, GIF, BMP, WEBP for OCR and visual analysis
Audio: MP3, WAV, M4A, FLAC, OGG for transcription

Technical Details

Frameworks and Libraries

Gradio 4.0+
Python 3.8+
Requests library for API communication

Installation

# Install dependencies
uv sync

# Run the application
uv run python app.py

Development Mode

# Run with auto-reload for development
uv run python dev.py

Citation

If you use this interface in your work, please cite:

@misc{omni_api_gradio_ui,
  title={Omni API Gradio UI},
  author={ModelHarbor Team},
  year={2025},
  howpublished={\url{https://github.com/your-username/omni-api-gradio-ui}}
}