---
license: apache-2.0
datasets:
- SouthpawIN/senter-omni-data
language:
- en
base_model:
- unsloth/Qwen2.5-Omni-3B-GGUF
tags:
- any-to-any
pipeline_tag: text-generation
---

# 🎭 Senter-Omni

**Multimodal AI Assistant with Cross-Modal Embeddings**

![Senter-Omni Fixed Banner](https://github.com/SouthpawIN/senter-omni/raw/main/senter-fixed-banner.gif)

## 🌟 Overview

Senter-Omni is a 4B-parameter multimodal AI assistant that understands and reasons across text, images, audio, and video simultaneously. It is built on Qwen2.5-Omni, extends the context window to 128K tokens via RoPE scaling, and is released under the Apache 2.0 license.

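The 128K window is reached with RoPE scaling rather than native pre-training length. For orientation, long-context extension in the Qwen2.5 family is usually expressed as a `rope_scaling` block in the model config; the values below sketch a 32K → 128K YaRN-style extension and are an assumption based on the Qwen2.5 documentation, not necessarily Senter-Omni's exact settings.

```python
# Illustrative YaRN-style rope_scaling block for a 32K -> 128K extension.
# These values are assumed from the Qwen2.5 family docs, not read from
# Senter-Omni's actual config.
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # 32,768 * 4 = 131,072 positions
    "original_max_position_embeddings": 32768,
}
```
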
## ✨ Key Features

- **🎯 ONE MODEL, ALL MODALITIES** - Single model for text, image, audio, and video
- **⚡ TRUE STREAMING** - Real-time token generation (~0.234s time-to-first-token); see the streaming sketch after this list
- **🔓 OPEN & UNCENSORED** - Apache 2.0 licensed with unrestricted responses
- **🧠 128K CONTEXT** - Extended RoPE scaling for massive documents
- **💾 MEMORY EFFICIENT** - 4-bit quantized model for consumer GPUs
- **🔍 CROSS-MODAL EMBEDDINGS** - Unified 1024D space for all modalities

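A minimal streaming sketch for the feature above. The `stream=True` flag and the per-chunk strings are assumptions used for illustration; the actual streaming interface is defined in the GitHub repo.

```python
from omni import OmniClient

client = OmniClient()

# Hypothetical streaming call: stream=True and plain-text chunks are assumed
# here for illustration; see the repo for the real streaming API.
for chunk in client.chat(
    [{"role": "user", "content": [{"type": "text", "text": "Tell me a short story."}]}],
    stream=True,
):
    print(chunk, end="", flush=True)  # tokens are printed as soon as they arrive
print()
```
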
## 🚀 Quick Start

```python
from omni import OmniClient

# Initialize Senter-Omni
client = OmniClient()

# Multimodal chat
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "What do you see?"}
    ]}
])

# Cross-modal embeddings
embedding = client.embed("any content", modality="auto")
```

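Because every modality is projected into the same 1024-dimensional space, embeddings from different inputs can be compared directly. The sketch below extrapolates from the `embed` call above: the explicit `modality="text"` / `modality="image"` values and the image-path input are assumptions, so check the repo for the exact signature.

```python
import numpy as np
from omni import OmniClient

client = OmniClient()

# Embed a caption and an image into the shared 1024-D space, then score them
# with cosine similarity. The modality values and the image-path input are assumed.
text_vec = np.asarray(client.embed("a dog playing fetch", modality="text"), dtype=np.float32)
image_vec = np.asarray(client.embed("dog.jpg", modality="image"), dtype=np.float32)

cosine = float(text_vec @ image_vec / (np.linalg.norm(text_vec) * np.linalg.norm(image_vec)))
print(f"Cross-modal similarity: {cosine:.3f}")
```
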
## 📊 Model Specifications

- **Parameters**: 4B (quantized to 4-bit)
- **Context Length**: 128K tokens (RoPE scaled)
- **Memory Usage**: ~8GB VRAM
- **Modalities**: Text, Image, Audio, Video
- **License**: Apache 2.0

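As a rough sanity check on the ~8GB figure: the quantized weights themselves account for only about 2GB, and the rest of the budget goes to the KV cache (which grows with context length), the vision/audio encoders, and activation buffers. A back-of-envelope sketch:

```python
# Back-of-envelope estimate for the quantized weight memory (not a measurement).
params = 4e9              # 4B parameters, per the spec above
bytes_per_param = 0.5     # 4-bit quantization = 0.5 bytes per weight
weight_gib = params * bytes_per_param / 1024**3
print(f"Quantized weights: ~{weight_gib:.1f} GiB")  # ~1.9 GiB
# KV cache, encoders, and activations consume the remainder of the ~8GB budget.
```
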
## 🔗 Links

- **GitHub Repository**: https://github.com/SouthpawIN/senter-omni
- **Training Dataset**: https://huggingface.co/datasets/SouthpawIN/senter-omni-data
- **Demo Script**: Run `python senter_omni_demo.py` in the GitHub repo

## 🎯 Performance

- **Time to First Token**: ~0.234s
- **Text Generation**: 2-5 seconds
- **Image Analysis**: 3-6 seconds
- **Audio Processing**: 4-8 seconds
- **Multimodal Chat**: 5-10 seconds

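These figures are indicative and will vary with hardware, prompt length, and output length. A minimal way to reproduce a comparable end-to-end timing on your own machine (measuring time-to-first-token additionally requires the streaming interface):

```python
import time

from omni import OmniClient

client = OmniClient()

start = time.perf_counter()
response = client.chat([
    {"role": "user", "content": [
        {"type": "text", "text": "Summarize the benefits of cross-modal embeddings."}
    ]}
])
elapsed = time.perf_counter() - start
print(f"End-to-end text generation: {elapsed:.2f}s")  # roughly 2-5s per the figures above
```
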
## 🛠️ Installation

```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt
python senter_omni_demo.py
```

## 📝 Citation

```bibtex
@misc{senter-omni,
  title={Senter-Omni: Multimodal AI Assistant with Cross-Modal Embeddings},
  author={Chris at Alignment Lab AI},
  year={2024},
  url={https://github.com/SouthpawIN/senter-omni}
}
```

---

**Built with ❤️ by Chris at Alignment Lab AI**