Kubernetes AI - GGUF Quantized Models

Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to GGUF format for efficient local inference.

Model Description

This repository contains GGUF quantized versions of the Kubernetes AI model, optimized to run on consumer hardware with no GPU required. The model was built by fine-tuning LoRA adapters on unsloth/gemma-3-12b-it-qat-bnb-4bit, merging them into the base model, and converting the result to GGUF format for llama.cpp compatibility.

Primary Purpose: Answer Kubernetes-related questions in Turkish on local machines.

Available Models

Model        Size     Download
Unquantized  22.0 GB  kubernetes-ai.gguf
Q8_0         12.5 GB  kubernetes-ai-Q8_0.gguf
Q5_K_M       8.45 GB  kubernetes-ai-Q5_K_M.gguf
Q4_K_M       7.3 GB   kubernetes-ai-Q4_K_M.gguf
Q4_K_S       6.9 GB   kubernetes-ai-Q4_K_S.gguf
Q3_K_M       6.0 GB   kubernetes-ai-Q3_K_M.gguf
IQ3_M        5.6 GB   kubernetes-ai-IQ3_M.gguf

Recommended: Q4_K_M for the best balance of quality and size, or IQ3_M for low-end systems.

Quick Start

Using Ollama (Recommended)

Ollama provides the easiest way to run GGUF models locally.

1. Install Ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows - Download from https://ollama.com/download
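
Once installed, a quick sanity check confirms the CLI is on your PATH:

# Verify the installation
ollama --version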

2. Download Model

# Download your preferred quantization
wget https://huggingface.co/aciklab/kubernetes-ai-GGUF/resolve/main/kubernetes-ai-Q4_K_M.gguf
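
Alternatively, the Hugging Face CLI can fetch the same file; a sketch, assuming the huggingface_hub package is installed (pip install -U huggingface_hub):

# Download the same quantization with the Hugging Face CLI
huggingface-cli download aciklab/kubernetes-ai-GGUF kubernetes-ai-Q4_K_M.gguf --local-dir .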

3. Create Modelfile

cat > Modelfile << 'EOF'
# Point FROM at whichever quantization you downloaded in step 2
FROM ./kubernetes-ai-Q4_K_M.gguf

TEMPLATE """{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

# Model parameters
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

# System prompt (Turkish): "You are an AI assistant specialized in Kubernetes.
# You answer Kubernetes-related questions in Turkish."
SYSTEM """Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."""
EOF

4. Create and Run Model

# Create model
ollama create kubernetes-ai -f Modelfile

# Run interactive chat
ollama run kubernetes-ai

# Example query ("How do I create a deployment with 3 replicas in Kubernetes?")
ollama run kubernetes-ai "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

Training Details

This model is based on the aciklab/kubernetes-ai LoRA adapters:

  • Base Model: unsloth/gemma-3-12b-it-qat-bnb-4bit
  • Training Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 8
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training Dataset: ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
  • Training Time: 28 hours on NVIDIA RTX 5070 12GB
  • Max Sequence Length: 1024 tokens

Training Dataset Summary

Dataset Category          Count     Description
Kubernetes Official Docs  8,910     Concepts, kubectl, setup, tasks, tutorials
Stack Overflow            52,000    Kubernetes Q&A from community
DevOps Datasets           62,500    General DevOps and Kubernetes content
Configurations & CLI      36,800    Kubernetes configs, kubectl examples, operators
Total                     ~157,210  Comprehensive Kubernetes knowledge base

Quantization Details

All models were quantized using llama.cpp with importance matrix optimization; a command sketch follows the list below:

  • Source: Merged LoRA adapters with base model
  • Quantization Tool: llama.cpp (latest)
  • Method: K-quant and IQ-quant mixtures
  • Optimization: Importance matrix for better quality
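
As a reference, this pipeline roughly corresponds to the two llama.cpp commands below. This is a sketch only: the calibration corpus and file names shown are illustrative, not the exact ones used for this release.

# 1. Build an importance matrix from a calibration corpus (illustrative names)
./llama-imatrix -m kubernetes-ai-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the merged F16 model, guided by the importance matrix
./llama-quantize --imatrix imatrix.dat kubernetes-ai-f16.gguf kubernetes-ai-Q4_K_M.gguf Q4_K_M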

Quantization Quality

  • Q4_K_M: Best balance - recommended for most users
  • Q4_K_S: Slightly smaller with minimal quality loss
  • Q3_K_M: Good for memory-constrained systems
  • IQ3_M: Advanced 3-bit quantization for laptops
  • Unquantized: Original F16/F32 precision

Hardware Requirements

Minimum

  • CPU: 4+ cores
  • RAM: 8GB (for IQ3_M/Q3_K_M quantizations)
  • Storage: 6-8GB free space
  • GPU: Not required (CPU inference)

Recommended

  • CPU: 8+ cores
  • RAM: 16GB (for Q4_K_M/Q4_K_S quantizations)
  • Storage: 10GB free space
  • GPU: Optional (can accelerate inference; see the note below)
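
If a supported GPU is present, Ollama offloads layers to it automatically. With llama.cpp the offload is explicit; a sketch, assuming a CUDA, ROCm, or Metal build:

# Offload up to 99 transformer layers to the GPU (lower the number if VRAM runs out)
llama-cli -m ./kubernetes-ai-Q4_K_M.gguf -cnv -ngl 99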

License

This model is released under the MIT License and is free to use in commercial and open-source projects.

Contact

Produced by: HAVELSAN/Açıklab

For questions or feedback, please open an issue on the model repository.


Note: These GGUF quantized versions are ready for immediate use; no additional adapter loading or merging is required.
