# Kubernetes AI - GGUF Quantized Models
Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.
## Model Description
This repository contains GGUF quantized versions of the Kubernetes AI model, optimized to run on consumer hardware with no GPU required. The model was built by fine-tuning LoRA adapters on unsloth/gemma-3-12b-it-qat-bnb-4bit, merging them into the base model, and converting the result to GGUF format for llama.cpp compatibility.
**Primary Purpose:** Answer Kubernetes-related questions in Turkish on local machines.
## Available Models
| Model | Size | Download |
|---|---|---|
| Unquantized | 22.0 GB | kubernetes-ai.gguf |
| Q8_0 | 12.5 GB | kubernetes-ai-Q8_0.gguf |
| Q5_K_M | 8.45 GB | kubernetes-ai-Q5_K_M.gguf |
| Q4_K_M | 7.3 GB | kubernetes-ai-Q4_K_M.gguf |
| Q4_K_S | 6.9 GB | kubernetes-ai-Q4_K_S.gguf |
| Q3_K_M | 6.0 GB | kubernetes-ai-Q3_K_M.gguf |
| IQ3_M | 5.6 GB | kubernetes-ai-IQ3_M.gguf |
**Recommended:** Q4_K_M for the best balance of quality and size, or IQ3_M for low-end systems.
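As an alternative to direct links, individual files can be pulled with the Hugging Face CLI. A minimal sketch, assuming the `huggingface_hub` package is installed:

```bash
# Install the Hugging Face CLI (ships with huggingface_hub)
pip install -U huggingface_hub

# Fetch a single quantization into the current directory
huggingface-cli download aciklab/kubernetes-ai-GGUF \
  kubernetes-ai-Q4_K_M.gguf --local-dir .
```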
## Quick Start

### Using Ollama (Recommended)

Ollama provides the easiest way to run GGUF models locally.

#### 1. Install Ollama
```bash
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows: download the installer from https://ollama.com/download
```
#### 2. Download Model
```bash
# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf
```
#### 3. Create Modelfile
```bash
cat > Modelfile << 'EOF'
# Point FROM at the GGUF file you downloaded (Q4_K_M from step 2;
# adjust if you chose a different quantization)
FROM ./kubernetes-ai-Q4_K_M.gguf
TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

# System prompt (Turkish): "You are an AI assistant specialized in Kubernetes.
# You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""
EOF
```
#### 4. Create and Run Model
```bash
# Create model
ollama create kubernetes-ai -f Modelfile

# Run interactive chat
ollama run kubernetes-ai

# Example query: "How do I create a deployment with 3 replicas in Kubernetes?"
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"
```
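Ollama also serves a local REST API on port 11434, so the model can be queried from scripts once the service is running. A minimal sketch against the standard `/api/generate` endpoint; the prompt text is an illustrative example:

```bash
# One-shot, non-streaming completion (prompt: "How is a pod restarted?")
curl http://localhost:11434/api/generate -d '{
  "model": "kubernetes-ai",
  "prompt": "Bir pod nasıl yeniden başlatılır?",
  "stream": false
}'
```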
## Training Details
This model is based on the aciklab/kubernetes-ai LoRA adapters:
- Base Model: unsloth/gemma-3-12b-it-qat-bnb-4bit
- Training Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 8
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Dataset: ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- Training Time: 28 hours on NVIDIA RTX 5070 12GB
- Max Sequence Length: 1024 tokens
### Training Dataset Summary
| Dataset Category | Count | Description |
|---|---|---|
| Kubernetes Official Docs | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| Stack Overflow | 52,000 | Kubernetes Q&A from community |
| DevOps Datasets | 62,500 | General DevOps and Kubernetes content |
| Configurations & CLI | 36,800 | Kubernetes configs, kubectl examples, operators |
| Total | ~157,210 | Comprehensive Kubernetes knowledge base |
## Quantization Details
All models were quantized using llama.cpp with importance-matrix optimization, as sketched after the list below:
- Source: Merged LoRA adapters with base model
- Quantization Tool: llama.cpp (latest)
- Method: K-quant and IQ-quant mixtures
- Optimization: Importance matrix for better quality
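The exact commands used are not published here, but a representative llama.cpp workflow matching the steps above would look like the following sketch; the merged-model directory and calibration file names are illustrative assumptions:

```bash
# 1. Convert the merged HF model (base + LoRA) to an F16 GGUF
python convert_hf_to_gguf.py ./kubernetes-ai-merged \
  --outfile kubernetes-ai-f16.gguf --outtype f16

# 2. Build an importance matrix from a calibration text file
./llama-imatrix -m kubernetes-ai-f16.gguf -f calibration.txt -o imatrix.dat

# 3. Quantize, guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
  kubernetes-ai-f16.gguf kubernetes-ai-Q4_K_M.gguf Q4_K_M
```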
### Quantization Quality
- Q4_K_M: Best balance - recommended for most users
- Q4_K_S: Slightly smaller with minimal quality loss
- Q3_K_M: Good for memory-constrained systems
- IQ3_M: Advanced 3-bit quantization for laptops
- Unquantized: Original F16/F32 precision
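To compare these trade-offs on your own hardware, llama.cpp ships a perplexity tool; lower perplexity means output closer to the unquantized model. A sketch, assuming `eval.txt` is a plain-text evaluation corpus of your choosing:

```bash
# Measure a quantization's perplexity against your own text corpus
./llama-perplexity -m kubernetes-ai-Q4_K_M.gguf -f eval.txt
```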
## Hardware Requirements

### Minimum
- CPU: 4+ cores
- RAM: 8GB (for IQ3_M/Q3_K_M quantizations)
- Storage: 6-8GB free space
- GPU: Not required (CPU inference)
### Recommended
- CPU: 8+ cores
- RAM: 16GB (for Q4_K_M/Q4_K_S quantizations)
- Storage: 10GB free space
- GPU: Optional (can accelerate inference; see the offload sketch below)
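With llama.cpp, a GPU can help even when the whole model does not fit in VRAM, by offloading only some layers; Ollama does the equivalent automatically when it detects a GPU. A sketch using llama.cpp's standard `-ngl` flag; the layer count of 20 is an illustrative value to tune against your VRAM:

```bash
# Offload 20 layers to the GPU, keep the rest on the CPU; -cnv starts chat mode
./llama-cli -m kubernetes-ai-Q4_K_M.gguf -ngl 20 -cnv
```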
## License
This model is released under the MIT License and is free to use in both commercial and open-source projects.
## Contact
**Produced by:** HAVELSAN/Açıklab
For questions or feedback, please open an issue on the model repository.
**Note:** These are GGUF quantized versions ready for immediate use; no additional model loading or merging is required.