🚀 Next 12B (m200)

Türkiye's Advanced Vision-Language Model — High Performance, Multimodal, and Enterprise-Ready

License: MIT Language: English HuggingFace


📖 Overview

Next 12B is a 12-billion-parameter multimodal Vision-Language Model (VLM) based on Gemma 3 and fine-tuned for both text and image understanding. It is presented as Türkiye's most advanced open-source vision-language model, designed for:

  • Superior understanding and generation of text and image descriptions.
  • Advanced reasoning and context-aware multimodal outputs.
  • Professional-grade Turkish support with extensive multilingual capabilities.
  • Enterprise-ready deployment with optimized quantization options.

This model is intended for enterprises, researchers, and organizations that need a multimodal AI capable of complex visual understanding, advanced reasoning, and creative generation.


The table below compares Next 12B with Next 14B (Thinking), GPT-5, and Claude Opus 4.1 (Thinking) on four benchmarks.

| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
|---|---|---|---|---|
| Next 14B (Thinking) | 94.6 | 93.2 | 98.8 | 92.7 |
| Next 12B | 92.7 | 84.4 | 95.3 | 87.2 |
| GPT-5 | 92.5 | 87.0 | 98.4 | 96.0 |
| Claude Opus 4.1 (Thinking) | ~92.0 | 87.8 | 84.7 | 95.4 |
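
To read the table at a glance, the per-benchmark gap between Next 12B and GPT-5 can be computed directly from the reported scores. This is a minimal sketch using only the numbers above; the dictionary layout and the `gap` helper are illustrative and not part of the model's tooling.

```python
# Scores copied from the benchmark table above (percent).
SCORES = {
    "Next 12B": {"MMLU": 92.7, "MMLU-Pro": 84.4, "GSM8K": 95.3, "MATH": 87.2},
    "GPT-5": {"MMLU": 92.5, "MMLU-Pro": 87.0, "GSM8K": 98.4, "MATH": 96.0},
}


def gap(model_a: str, model_b: str) -> dict:
    """Return score(model_a) - score(model_b) per benchmark, rounded to 0.1."""
    return {
        bench: round(SCORES[model_a][bench] - SCORES[model_b][bench], 1)
        for bench in SCORES[model_a]
    }


# Positive values mean Next 12B scored higher than GPT-5 on that benchmark.
for bench, delta in gap("Next 12B", "GPT-5").items():
    print(f"{bench:9s} {delta:+.1f}")
```

On these figures, Next 12B is slightly ahead on MMLU and behind on the other three benchmarks.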

🚀 Installation & Usage

Use with vision:

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image

model_id = "Lamapi/next-12b"

# Load the model, the processor (needed for image inputs), and the tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Read the image
image = Image.open("image.jpg")

# Create a message in chat format
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Who is in this image?"},
        ],
    },
]

# Render the chat template to a prompt string, then build model inputs with the processor
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate up to 50 new tokens and decode the full output sequence
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Example output:

Who is in this image?
The image shows Mustafa Kemal Atatürk, the founder and first President of the Republic of Turkey.
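
The chat format used above nests each turn's content as a list of typed parts. A small helper can make that structure reusable. This is only a sketch: `build_vision_messages` is a hypothetical name, not part of transformers or the model's repository. The image argument is passed through untouched, so any object the processor accepts (for example a PIL image) can be used.

```python
from typing import Any


def build_vision_messages(image: Any, question: str, system_prompt: str) -> list:
    """Build a two-turn chat (system + user) with one image part and one text part."""
    return [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        },
    ]


# Usage with a placeholder object standing in for a PIL image.
messages = build_vision_messages(
    image=object(),
    question="Who is in this image?",
    system_prompt="You are Next-X1, a smart and concise AI assistant trained by Lamapi.",
)
print([part["type"] for part in messages[1]["content"]])  # prints ['image', 'text']
```

The returned list can be passed to `apply_chat_template` in the same way as the hand-written `messages` above.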

Use without vision:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Lamapi/next-12b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat message
messages = [
    {"role": "system", "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."},
    {"role": "user", "content": "Hello, how are you?"}
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Example output:

Hello, how are you?
I'm fine, thank you. How are you?
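
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated tokens, which is why the decoded text above repeats the user message. A minimal sketch of slicing off the prompt before decoding follows; `strip_prompt` is a hypothetical helper that works on any sliceable sequence of token ids.

```python
from typing import Sequence


def strip_prompt(output_ids: Sequence[int], prompt_length: int) -> Sequence[int]:
    """Drop the first `prompt_length` ids, keeping only the newly generated ones."""
    return output_ids[prompt_length:]


# Toy ids standing in for real token ids: 3 prompt tokens + 2 generated tokens.
print(strip_prompt([101, 102, 103, 7, 8], prompt_length=3))  # prints [7, 8]

# With the snippet above, the same idea would look like:
#   prompt_length = inputs["input_ids"].shape[-1]
#   reply_ids = strip_prompt(output[0], prompt_length)
#   print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```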

🎯 Goals

  1. Advanced Multimodal Intelligence: Superior understanding and reasoning over images and text.
  2. Enterprise-Grade Performance: High accuracy and reliability for production deployments.
  3. Efficiency: Optimized for professional GPUs with flexible quantization options.
  4. Accessibility: Open-source availability for research and commercial applications.
  5. Cultural Excellence: Best-in-class Turkish language support while maintaining multilingual capabilities.

✨ Key Features

| Feature | Description |
|---|---|
| 🔋 Optimized Architecture | Balanced performance and efficiency; supports multiple quantization formats. |
| 🖼️ Advanced Vision-Language | Deep understanding of images with sophisticated visual reasoning capabilities. |
| 🇹🇷 Professional Turkish Support | Industry-leading Turkish language performance with extensive multilingual reach. |
| 🧠 Superior Reasoning | State-of-the-art logical and analytical reasoning for complex tasks. |
| 📊 Production-Ready | Reliable, consistent outputs suitable for enterprise applications. |
| 🌍 Open Source | Transparent, community-driven, and commercially friendly. |

📐 Model Specifications

| Specification | Details |
|---|---|
| Base Model | Gemma 3 |
| Parameter Count | 12 Billion |
| Architecture | Transformer, causal LLM + Enhanced Vision Encoder |
| Fine-Tuning Method | Advanced instruction & multimodal fine-tuning (SFT) on curated Turkish and multilingual datasets |
| Optimizations | Q8_0, Q4_K_M, F16, F32 quantizations for flexible deployment options |
| Modalities | Text & Image |
| Use Cases | Advanced image captioning, multimodal QA, text generation, complex reasoning, creative storytelling, enterprise applications |
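
As a rough guide to what the listed formats mean for deployment, weight storage can be estimated as parameter count times bits per weight. The sketch below assumes exactly 12 billion parameters and approximate bits-per-weight figures: 8.5 for Q8_0 follows from its block layout (32 8-bit weights plus one 16-bit scale), while 4.85 for Q4_K_M is an assumed average that varies by model. It ignores the KV cache, activations, and file metadata, so real memory requirements will be higher.

```python
PARAMS = 12_000_000_000  # assumed exact count; the card states "12 Billion"

# Approximate bits per weight; the Q4_K_M value is an assumed average.
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}


def weight_size_gb(params: int, bits_per_weight: float) -> float:
    """Estimate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for fmt, bits in BITS_PER_WEIGHT.items():
    print(f"{fmt:7s} ~{weight_size_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the F16 weights take about 24 GB, and the Q4_K_M weights a little over 7 GB.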

💡 Performance Highlights

  • MMLU Excellence: 91.8% on MMLU benchmark, demonstrating comprehensive knowledge across diverse domains
  • Mathematical Prowess: 81.2% on MATH benchmark, excelling in complex mathematical reasoning
  • Problem Solving: 94.3% on GSM8K, showcasing superior word problem solving capabilities
  • Professional Reasoning: 78.4% on MMLU-Pro, handling advanced professional-level questions

🎨 Use Cases

  • Enterprise Content Generation: High-quality multilingual content creation
  • Advanced Visual Analysis: Detailed image understanding and description
  • Educational Applications: Complex tutoring and explanation systems
  • Research Assistance: Literature review and data analysis
  • Creative Writing: Story generation and creative content
  • Technical Documentation: Code documentation and technical writing
  • Customer Support: Multilingual customer service automation
  • Data Extraction: Visual document processing and information extraction
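
For the data extraction use case, one common pattern is to ask the model for JSON and parse the reply defensively, since generated text may wrap the JSON in extra prose. The following is a sketch under that assumption; `extract_json` and the sample reply are hypothetical, and nothing here is specific to Next 12B.

```python
import json
from typing import Any, Optional


def extract_json(reply: str) -> Optional[Any]:
    """Return the first JSON object found in `reply`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            # No valid JSON starts at this brace; try the next one.
            start = reply.find("{", start + 1)
    return None


# Hypothetical model reply for an invoice image.
reply = 'Here is the result: {"invoice_no": "A-17", "total": 249.9} Let me know if you need more.'
print(extract_json(reply))
```

Returning `None` instead of raising lets the caller decide whether to retry the prompt or fall back to manual review.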

📄 License

This project is licensed under the MIT License — free to use, modify, and distribute for commercial and non-commercial purposes. Attribution is appreciated.


📞 Contact & Support


Next 12B — Türkiye's most advanced vision-language AI, combining state-of-the-art multimodal understanding, superior reasoning, and enterprise-grade reliability.
