---
license: apache-2.0
datasets:
- SouthpawIN/senter-omni-data
language:
- en
base_model:
- unsloth/Qwen2.5-Omni-3B-GGUF
tags:
- any-to-any
pipeline_tag: text-generation
---
# Senter-Omni
**Multimodal AI Assistant with Cross-Modal Embeddings**

## Overview
Senter-Omni is a 4B-parameter multimodal AI assistant that understands and reasons across text, images, audio, and video simultaneously. It is built on Qwen2.5-Omni, extends the context window to 128K tokens via RoPE scaling, and is released under the Apache 2.0 license.
## Key Features
- **ONE MODEL, ALL MODALITIES** - Single model for text, image, audio, and video
- **TRUE STREAMING** - Real-time token generation (~0.234s time-to-first-token)
- **OPEN & UNCENSORED** - Apache 2.0 licensed with unrestricted responses
- **128K CONTEXT** - Extended RoPE scaling for massive documents (a token-budget sketch follows this list)
- **MEMORY EFFICIENT** - 4-bit quantized model for consumer GPUs
- **CROSS-MODAL EMBEDDINGS** - Unified 1024D space for all modalities
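The 128K context above is a token budget, not a character count, so it is worth counting tokens before handing the model a very long document. Below is a minimal sketch using the upstream Qwen2.5-Omni tokenizer from Hugging Face; the `Qwen/Qwen2.5-Omni-3B` repo id, and the assumption that `AutoTokenizer` resolves it, are mine rather than something this card states (it only names `unsloth/Qwen2.5-Omni-3B-GGUF` as the base weights).
```python
from transformers import AutoTokenizer

# Assumption: the upstream Qwen/Qwen2.5-Omni-3B tokenizer matches this model's
# vocabulary; this card only names unsloth/Qwen2.5-Omni-3B-GGUF as base weights.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Omni-3B")

with open("big_document.txt") as f:  # placeholder file
    text = f.read()

n_tokens = len(tokenizer(text)["input_ids"])
print(f"{n_tokens} tokens used of a 131,072-token (128K) window ({n_tokens / 131072:.1%})")
```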
## Quick Start
```python
from omni import OmniClient

# Initialize Senter-Omni
client = OmniClient()

# Multimodal chat: mix image and text in a single user turn
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "What do you see?"}
    ]}
])

# Cross-modal embeddings
embedding = client.embed("any content", modality="auto")
```
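Because the card advertises one 1024-dimensional embedding space for every modality, vectors returned by `client.embed` can be compared directly across modalities. The sketch below assumes `embed` returns a flat vector and accepts explicit modality names (`"text"`, `"image"`) in addition to the `"auto"` shown above; neither assumption is confirmed by this card.
```python
import numpy as np
from omni import OmniClient

client = OmniClient()

# Assumed: embed() returns a 1024-dimensional vector and accepts explicit
# modality names; only modality="auto" appears in the Quick Start above.
text_vec = np.asarray(client.embed("a dog catching a frisbee", modality="text"))
image_vec = np.asarray(client.embed("photo.jpg", modality="image"))

# Cosine similarity in the shared embedding space
cos_sim = float(text_vec @ image_vec / (np.linalg.norm(text_vec) * np.linalg.norm(image_vec)))
print(f"cross-modal similarity: {cos_sim:.3f}")
```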
## Model Specifications
- **Parameters**: 4B (quantized to 4-bit)
- **Context Length**: 128K tokens (RoPE scaled)
- **Memory Usage**: ~8GB VRAM
- **Modalities**: Text, Image, Audio, Video
- **License**: Apache 2.0
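As a back-of-envelope check on the VRAM figure: 4B parameters at 4 bits each is roughly 2 GB of raw weights, and the rest of the ~8GB budget goes to the KV cache, activations, and runtime buffers. The overhead number in this sketch is an assumption, not a measurement.
```python
# Rough VRAM estimate for a 4B-parameter model quantized to 4-bit weights.
params = 4e9                                     # parameter count from the specs above
bits_per_weight = 4                              # 4-bit quantization
weight_gb = params * bits_per_weight / 8 / 1e9   # about 2.0 GB of raw weights

overhead_gb = 6.0  # assumed KV cache + activations + runtime buffers at long context
print(f"weights ~= {weight_gb:.1f} GB, estimated total ~= {weight_gb + overhead_gb:.1f} GB")
# Roughly consistent with the ~8GB VRAM quoted above; measure on your own hardware.
```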
## Links
- **GitHub Repository**: https://github.com/SouthpawIN/senter-omni
- **Training Dataset**: https://huggingface.co/datasets/SouthpawIN/senter-omni-data
- **Demo Script**: Run `python senter_omni_demo.py` in the GitHub repo
## Performance
- **Time to First Token**: ~0.234s
- **Text Generation**: 2-5 seconds
- **Image Analysis**: 3-6 seconds
- **Audio Processing**: 4-8 seconds
- **Multimodal Chat**: 5-10 seconds
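These timings are from the author's setup; a quick way to sanity-check them locally is to time a request end to end with the same API as the Quick Start. The image path and prompt below are placeholders.
```python
import time
from omni import OmniClient

client = OmniClient()

start = time.perf_counter()
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},      # placeholder image path
        {"type": "text", "text": "What do you see?"}
    ]}
])
elapsed = time.perf_counter() - start

# Compare against the Image Analysis row above (3-6 seconds on the author's setup)
print(f"end-to-end latency: {elapsed:.2f}s")
```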
## Installation
```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt
python senter_omni_demo.py
```
## Citation
```bibtex
@misc{senter-omni,
  title={Senter-Omni: Multimodal AI Assistant with Cross-Modal Embeddings},
  author={Chris at Alignment Lab AI},
  year={2024},
  url={https://github.com/SouthpawIN/senter-omni}
}
```
---
**Built with ❤️ by Chris at Alignment Lab AI**