sovthpaw committed on
Commit d04908e · verified
1 Parent(s): a1f9f22

Upload hf_model_readme.md with huggingface_hub

Files changed (1)
1. hf_model_readme.md +101 -0
hf_model_readme.md ADDED
@@ -0,0 +1,101 @@
---
license: apache-2.0
base_model: unsloth/Qwen2.5-Omni-3B
tags:
- multimodal
- text
- image
- audio
- video
- senter
- omnimodal
- 4b
- 128k
- uncensored
pipeline_tag: text-generation
---

# 🎭 Senter-Omni

**Multimodal AI Assistant with Cross-Modal Embeddings**

![Senter Banner](senter-banner.png)

## 🌟 Overview

Senter-Omni is a 4B-parameter multimodal AI assistant that understands and reasons across text, images, audio, and video simultaneously. It is built on Qwen2.5-Omni with an extended 128K context and released under the Apache 2.0 license.

## ✨ Key Features

- **🎯 ONE MODEL, ALL MODALITIES** - Single model for text, image, audio, and video
- **⚡ TRUE STREAMING** - Real-time token generation (~0.234s time-to-first-token)
- **🔓 OPEN & UNCENSORED** - Apache 2.0 licensed with unrestricted responses
- **🧠 128K CONTEXT** - Extended RoPE scaling for massive documents
- **💾 MEMORY EFFICIENT** - 4-bit quantized model for consumer GPUs
- **🔍 CROSS-MODAL EMBEDDINGS** - Unified 1024D space for all modalities

## 🚀 Quick Start

```python
from omni import OmniClient

# Initialize Senter-Omni
client = OmniClient()

# Multimodal chat
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "What do you see?"}
    ]}
])

# Cross-modal embeddings
embedding = client.embed("any content", modality="auto")
```
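
Because every modality is projected into the same 1024-dimensional space, vectors returned by `embed` can be compared directly across modalities. The snippet below is a minimal sketch of that idea, not documented API usage: it assumes `client.embed` returns a 1024-dimensional vector and accepts explicit `modality` values such as `"text"` and `"image"` alongside `"auto"`, and the image file name is a placeholder.

```python
import numpy as np

# Embed a caption and an image into the shared space.
# ("dog_photo.jpg" is a placeholder; explicit modality values are assumptions.)
text_vec = np.asarray(client.embed("a dog catching a frisbee", modality="text"), dtype=np.float32)
image_vec = np.asarray(client.embed("dog_photo.jpg", modality="image"), dtype=np.float32)

# Cosine similarity: values near 1.0 suggest the caption matches the image.
cosine = float(
    np.dot(text_vec, image_vec)
    / (np.linalg.norm(text_vec) * np.linalg.norm(image_vec))
)
print(f"text-image similarity: {cosine:.3f}")
```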

## 📊 Model Specifications

- **Parameters**: 4B (quantized to 4-bit)
- **Context Length**: 128K tokens (RoPE scaled)
- **Memory Usage**: ~8GB VRAM
- **Modalities**: Text, Image, Audio, Video
- **License**: Apache 2.0
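
If you prefer loading the raw weights with Hugging Face `transformers` instead of the `OmniClient` wrapper, a typical 4-bit setup looks like the sketch below. Treat it as an illustrative pattern only: the repository id, the use of `AutoModelForCausalLM`, and the NF4 settings are assumptions and may need adjusting for this multimodal checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed repo id; substitute the actual Hugging Face model id.
model_id = "SouthpawIN/senter-omni"

# Standard 4-bit NF4 quantization, in line with the ~8GB VRAM figure above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPU memory
)
```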

## 🔗 Links

- **GitHub Repository**: https://github.com/SouthpawIN/senter-omni
- **Training Dataset**: https://huggingface.co/datasets/SouthpawIN/senter-omni-data
- **Demo Script**: Run `python senter_omni_demo.py` in the GitHub repo

## 🎯 Performance

- **Time to First Token**: ~0.234s
- **Text Generation**: 2-5 seconds
- **Image Analysis**: 3-6 seconds
- **Audio Processing**: 4-8 seconds
- **Multimodal Chat**: 5-10 seconds
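
Time to first token is the delay between sending a request and receiving the first streamed token. A rough way to measure it is sketched below; the `stream=True` flag and token-by-token iteration are hypothetical, since the streaming interface of `OmniClient` is not documented here.

```python
import time

start = time.perf_counter()
first_token_time = None

# Hypothetical streaming call: assumes chat(..., stream=True) yields tokens incrementally.
for token in client.chat(
    [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}],
    stream=True,
):
    if first_token_time is None:
        first_token_time = time.perf_counter()
    print(token, end="", flush=True)

print(f"\ntime to first token: {first_token_time - start:.3f}s")
```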

## 🛠️ Installation

```bash
git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt
python senter_omni_demo.py
```

## 📝 Citation

```bibtex
@misc{senter-omni,
  title={Senter-Omni: Multimodal AI Assistant with Cross-Modal Embeddings},
  author={Chris at Alignment Lab AI},
  year={2024},
  url={https://github.com/SouthpawIN/senter-omni}
}
```

---

**Built with ❤️ by Chris at Alignment Lab AI**