---
license: apache-2.0
datasets:
- SouthpawIN/senter-omni-data
language:
- en
base_model:
- unsloth/Qwen2.5-Omni-3B-GGUF
tags:
- any-to-any
pipeline_tag: text-generation
---

# 🎭 Senter-Omni

**Multimodal AI Assistant with Cross-Modal Embeddings**

![Senter-Omni Fixed Banner](https://github.com/SouthpawIN/senter-omni/raw/main/senter-fixed-banner.gif)

## 🌟 Overview

Senter-Omni is a 4B-parameter multimodal AI assistant that understands and reasons across text, images, audio, and video simultaneously. It is built on Qwen2.5-Omni, extends the context window to 128K tokens via RoPE scaling, and is released under the Apache 2.0 license.

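The 128K window is reached with RoPE scaling rather than native pre-training length. For orientation, long-context extension in the Qwen2.5 family is usually expressed as a `rope_scaling` block in the model config; the values below sketch a 32K → 128K YaRN-style extension and are an assumption based on the Qwen2.5 documentation, not necessarily Senter-Omni's exact settings.

```python
# Illustrative YaRN-style rope_scaling block for a 32K -> 128K extension.
# These values are assumed from the Qwen2.5 family docs, not read from
# Senter-Omni's actual config.
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # 32,768 * 4 = 131,072 positions
    "original_max_position_embeddings": 32768,
}
```
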
## ✨ Key Features

- **🎯 ONE MODEL, ALL MODALITIES** - Single model for text, image, audio, and video
- **⚡ TRUE STREAMING** - Real-time token generation (~0.234s time-to-first-token); see the streaming sketch after this list
- **🔓 OPEN & UNCENSORED** - Apache 2.0 licensed with unrestricted responses
- **🧠 128K CONTEXT** - Extended RoPE scaling for massive documents
- **💾 MEMORY EFFICIENT** - 4-bit quantized model for consumer GPUs
- **🔍 CROSS-MODAL EMBEDDINGS** - Unified 1024D space for all modalities

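A minimal streaming sketch for the feature above. The `stream=True` flag and the per-chunk strings are assumptions used for illustration; the actual streaming interface is defined in the GitHub repo.

```python
from omni import OmniClient

client = OmniClient()

# Hypothetical streaming call: stream=True and plain-text chunks are assumed
# here for illustration; see the repo for the real streaming API.
for chunk in client.chat(
    [{"role": "user", "content": [{"type": "text", "text": "Tell me a short story."}]}],
    stream=True,
):
    print(chunk, end="", flush=True)  # tokens are printed as soon as they arrive
print()
```
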
## 🚀 Quick Start

```python
from omni import OmniClient

# Initialize Senter-Omni
client = OmniClient()

# Multimodal chat
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "What do you see?"}
    ]}
])

# Cross-modal embeddings
embedding = client.embed("any content", modality="auto")
```

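Because every modality is projected into the same 1024-dimensional space, embeddings from different inputs can be compared directly. The sketch below extrapolates from the `embed` call above: the explicit `modality="text"` / `modality="image"` values and the image-path input are assumptions, so check the repo for the exact signature.

```python
import numpy as np
from omni import OmniClient

client = OmniClient()

# Embed a caption and an image into the shared 1024-D space, then score them
# with cosine similarity. The modality values and the image-path input are assumed.
text_vec = np.asarray(client.embed("a dog playing fetch", modality="text"), dtype=np.float32)
image_vec = np.asarray(client.embed("dog.jpg", modality="image"), dtype=np.float32)

cosine = float(text_vec @ image_vec / (np.linalg.norm(text_vec) * np.linalg.norm(image_vec)))
print(f"Cross-modal similarity: {cosine:.3f}")
```
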
## 📊 Model Specifications

- **Parameters**: 4B (quantized to 4-bit)
- **Context Length**: 128K tokens (RoPE scaled)
- **Memory Usage**: ~8GB VRAM
- **Modalities**: Text, Image, Audio, Video
- **License**: Apache 2.0

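As a rough sanity check on the ~8GB figure: the quantized weights themselves account for only about 2GB, and the rest of the budget goes to the KV cache (which grows with context length), the vision/audio encoders, and activation buffers. A back-of-envelope sketch:

```python
# Back-of-envelope estimate for the quantized weight memory (not a measurement).
params = 4e9              # 4B parameters, per the spec above
bytes_per_param = 0.5     # 4-bit quantization = 0.5 bytes per weight
weight_gib = params * bytes_per_param / 1024**3
print(f"Quantized weights: ~{weight_gib:.1f} GiB")  # ~1.9 GiB
# KV cache, encoders, and activations consume the remainder of the ~8GB budget.
```
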
## 🔗 Links

- **GitHub Repository**: https://github.com/SouthpawIN/senter-omni
- **Training Dataset**: https://huggingface.co/datasets/SouthpawIN/senter-omni-data
- **Demo Script**: Run `python senter_omni_demo.py` in the GitHub repo

## 🎯 Performance

- **Time to First Token**: ~0.234s
- **Text Generation**: 2-5 seconds
- **Image Analysis**: 3-6 seconds
- **Audio Processing**: 4-8 seconds
- **Multimodal Chat**: 5-10 seconds

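These figures are indicative and will vary with hardware, prompt length, and output length. A minimal way to reproduce a comparable end-to-end timing on your own machine (measuring time-to-first-token additionally requires the streaming interface):

```python
import time

from omni import OmniClient

client = OmniClient()

start = time.perf_counter()
response = client.chat([
    {"role": "user", "content": [
        {"type": "text", "text": "Summarize the benefits of cross-modal embeddings."}
    ]}
])
elapsed = time.perf_counter() - start
print(f"End-to-end text generation: {elapsed:.2f}s")  # roughly 2-5s per the figures above
```
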
## 🛠️ Installation

```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt
python senter_omni_demo.py
```

## 📝 Citation

```bibtex
@misc{senter-omni,
  title={Senter-Omni: Multimodal AI Assistant with Cross-Modal Embeddings},
  author={Chris at Alignment Lab AI},
  year={2024},
  url={https://github.com/SouthpawIN/senter-omni}
}
```

---

**Built with ❤️ by Chris at Alignment Lab AI**