utter-project/TowerVision-2B
UTTER - Unified Transcription and Translation for Extended Reality
Tags: Image-Text-to-Text, Transformers, Safetensors, 18 languages, llava_next, image-to-text, multimodal, multilingual, vlm, translation, conversational, text-generation-inference
arXiv: 2510.21849
License: cc-by-nc-sa-4.0
Files and versions
TowerVision-2B (revision cc4328d): 12.1 GB, 1 contributor, History: 24 commits
Latest commit: GuilhermeNunes, "Upload Tower.png" (cc4328d, verified), 23 days ago
| File | Flags | Size | Last commit | Updated |
| --- | --- | --- | --- | --- |
| .gitattributes | Safe | 1.71 kB | Upload Tower.png | 23 days ago |
| README.md | | 11.6 kB | Update README.md | 28 days ago |
| Tower.png | xet | 135 kB | Upload Tower.png | 23 days ago |
| added_tokens.json | Safe | 24 Bytes | Upload processor | 3 months ago |
| chat_template.jinja | Safe | 305 Bytes | Upload processor | 3 months ago |
| config.json | | 4.02 kB | Update config.json | 2 months ago |
| generation_config.json | Safe | 170 Bytes | Upload LlavaNextForConditionalGeneration | 2 months ago |
| mc-eval1.png | xet | 154 kB | Upload 3 files | 29 days ago |
| mc-eval2.png | xet | 179 kB | Upload 3 files | 29 days ago |
| model-00001-of-00003.safetensors | Safe, xet | 4.97 GB | Upload LlavaNextForConditionalGeneration | 2 months ago |
| model-00002-of-00003.safetensors | Safe, xet | 4.98 GB | Upload LlavaNextForConditionalGeneration | 2 months ago |
| model-00003-of-00003.safetensors | Safe, xet | 2.12 GB | Upload LlavaNextForConditionalGeneration | 2 months ago |
| model.safetensors.index.json | Safe | 76.2 kB | Upload LlavaNextForConditionalGeneration | 2 months ago |
| preprocessor_config.json | Safe | 1.09 kB | Upload processor | 3 months ago |
| processor_config.json | Safe | 174 Bytes | Upload processor | 3 months ago |
| special_tokens_map.json | Safe | 644 Bytes | Upload processor | 3 months ago |
| tokenizer.json | Safe, xet | 34.4 MB | Upload processor | 3 months ago |
| tokenizer.model | Safe, xet | 4.24 MB | Upload processor | 3 months ago |
| tokenizer_config.json | Safe | 46.6 kB | Upload processor | 2 months ago |
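
As a quick sanity check, the per-file sizes in the listing can be summed and compared with the 12.1 GB repository total shown in the header. The sketch below is illustrative only: the helper names (`to_bytes`, `total_gb`) are hypothetical, the sizes are the rounded values displayed on the page, and the units are assumed to be decimal (1 kB = 1000 bytes), which the page does not state.

```python
# Sizes copied from the file listing; values are rounded as displayed on the page.
# Assumption: decimal units (1 kB = 1000 bytes); the page does not say which it uses.
UNITS = {"Bytes": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}

LISTED_SIZES = {
    ".gitattributes": "1.71 kB",
    "README.md": "11.6 kB",
    "Tower.png": "135 kB",
    "added_tokens.json": "24 Bytes",
    "chat_template.jinja": "305 Bytes",
    "config.json": "4.02 kB",
    "generation_config.json": "170 Bytes",
    "mc-eval1.png": "154 kB",
    "mc-eval2.png": "179 kB",
    "model-00001-of-00003.safetensors": "4.97 GB",
    "model-00002-of-00003.safetensors": "4.98 GB",
    "model-00003-of-00003.safetensors": "2.12 GB",
    "model.safetensors.index.json": "76.2 kB",
    "preprocessor_config.json": "1.09 kB",
    "processor_config.json": "174 Bytes",
    "special_tokens_map.json": "644 Bytes",
    "tokenizer.json": "34.4 MB",
    "tokenizer.model": "4.24 MB",
    "tokenizer_config.json": "46.6 kB",
}


def to_bytes(size: str) -> float:
    """Convert a display string such as '4.97 GB' into a byte count."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


def total_gb(sizes: dict) -> float:
    """Sum a mapping of file name -> display size and return the total in GB."""
    return sum(to_bytes(s) for s in sizes.values()) / UNITS["GB"]


# Weight shards only, then every listed file.
weights_gb = total_gb(
    {name: s for name, s in LISTED_SIZES.items() if name.endswith(".safetensors")}
)
repo_gb = total_gb(LISTED_SIZES)
print(f"weight shards: {weights_gb:.2f} GB, all files: {repo_gb:.2f} GB")
```

Under these assumptions the three safetensors shards account for nearly all of the repository size, and the overall sum rounds to the 12.1 GB reported in the header.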