Kubernetes AI - GGUF Quantized Models

Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.

Model Description

This repository contains GGUF quantized versions of the Kubernetes AI model, optimized to run on consumer hardware with no GPU required. The model was built by fine-tuning LoRA adapters on unsloth/gemma-3-12b-it-qat-bnb-4bit, merging them into the base model, and converting the result to GGUF format for llama.cpp compatibility.

Primary Purpose: Answer Kubernetes-related questions in Turkish on local machines.

Available Models

Model        Size     Download
Unquantized  22.0 GB  kubernetes-ai.gguf
Q8_0         12.5 GB  kubernetes-ai-Q8_0.gguf
Q5_K_M       8.45 GB  kubernetes-ai-Q5_K_M.gguf
Q4_K_M       7.3 GB   kubernetes-ai-Q4_K_M.gguf
Q4_K_S       6.9 GB   kubernetes-ai-Q4_K_S.gguf
Q3_K_M       6.0 GB   kubernetes-ai-Q3_K_M.gguf
IQ3_M        5.6 GB   kubernetes-ai-IQ3_M.gguf

Recommended: Q4_K_M for the best balance of quality and size, or IQ3_M for low-end systems.

Quick Start

Using Ollama (Recommended)

Ollama provides the easiest way to run GGUF models locally.

1. Install Ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows - Download from https://ollama.com/download
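
Once installed, a quick sanity check confirms the CLI is on your PATH:

# Verify the installation
ollama --version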

2. Download Model

# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf
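
Alternatively, the Hugging Face CLI can fetch the same file; a sketch, assuming the huggingface_hub package is installed (pip install -U huggingface_hub):

# Download the same quantization with the Hugging Face CLI
huggingface-cli download aciklab/kubernetes-ai-GGUF kubernetes-ai-Q4_K_M.gguf --local-dir .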

3. Create Modelfile

cat > Modelfile << 'EOF'
# Point FROM at whichever quantization you downloaded in step 2
FROM ./kubernetes-ai-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

# System prompt (Turkish): "You are an AI assistant specialized in Kubernetes.
# You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""
EOF

4. Create and Run Model

# Create model
ollama create kubernetes-ai -f Modelfile

# Run interactive chat
ollama run kubernetes-ai

# Example query ("How do I create a deployment with 3 replicas in Kubernetes?")
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

Training Details

This model is based on the aciklab/kubernetes-ai LoRA adapters:

  • Base Model: unsloth/gemma-3-12b-it-qat-bnb-4bit
  • Training Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 8
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training Dataset: ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
  • Training Time: 28 hours on NVIDIA RTX 5070 12GB
  • Max Sequence Length: 1024 tokens

Training Dataset Summary

Dataset Category          Count     Description
Kubernetes Official Docs  8,910     Concepts, kubectl, setup, tasks, tutorials
Stack Overflow            52,000    Kubernetes Q&A from community
DevOps Datasets           62,500    General DevOps and Kubernetes content
Configurations & CLI      36,800    Kubernetes configs, kubectl examples, operators
Total                     ~157,210  Comprehensive Kubernetes knowledge base

Quantization Details

All models were quantized using llama.cpp with importance matrix optimization; a command sketch follows the list below:

  • Source: Merged LoRA adapters with base model
  • Quantization Tool: llama.cpp (latest)
  • Method: K-quant and IQ-quant mixtures
  • Optimization: Importance matrix for better quality
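
As a reference, this pipeline roughly corresponds to the two llama.cpp commands below. This is a sketch only: the calibration corpus and file names shown are illustrative, not the exact ones used for this release.

# 1. Build an importance matrix from a calibration corpus (illustrative names)
./llama-imatrix -m kubernetes-ai-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the merged F16 model, guided by the importance matrix
./llama-quantize --imatrix imatrix.dat kubernetes-ai-f16.gguf kubernetes-ai-Q4_K_M.gguf Q4_K_M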

Quantization Quality

  • Q4_K_M: Best balance - recommended for most users
  • Q4_K_S: Slightly smaller with minimal quality loss
  • Q3_K_M: Good for memory-constrained systems
  • IQ3_M: Advanced 3-bit quantization for laptops
  • Unquantized: Original F16/F32 precision

Hardware Requirements

Minimum

  • CPU: 4+ cores
  • RAM: 8GB (for IQ3_M/Q3_K_M quantizations)
  • Storage: 6-8GB free space
  • GPU: Not required (CPU inference)

Recommended

  • CPU: 8+ cores
  • RAM: 16GB (for Q4_K_M/Q4_K_S quantizations)
  • Storage: 10GB free space
  • GPU: Optional (can accelerate inference; see the note below)
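
If a supported GPU is present, Ollama offloads layers to it automatically. With llama.cpp the offload is explicit; a sketch, assuming a CUDA, ROCm, or Metal build:

# Offload up to 99 transformer layers to the GPU (lower the number if VRAM runs out)
llama-cli -m ./kubernetes-ai-Q4_K_M.gguf -cnv -ngl 99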

License

This model is released under the MIT License and is free to use in commercial and open-source projects.

Contact

Produced by: HAVELSAN/Açıklab

For questions or feedback, please open an issue on the model repository.


Note: These GGUF quantized versions are ready for immediate use; no additional adapter loading or merging is required.
