# 🧠 Image Classification AI Model (CIFAR-100)

This repository contains a Vision Transformer (ViT) model fine-tuned for **image classification** on the CIFAR-100 dataset. The model is built on `google/vit-base-patch16-224`, quantized to **FP16** (half precision) for efficient inference, and reaches roughly 70–80% accuracy on this 100-class task, depending on configuration.

---
## 🚀 Features
- 🖼️ **Task**: Image Classification
- 🧠 **Base Model**: `google/vit-base-patch16-224` (Vision Transformer)
- 🧪 **Quantized**: FP16 for faster, more memory-efficient inference
- 🎯 **Dataset**: CIFAR-100 (100 fine-grained object categories)
- ⚡ **CUDA Enabled**: Optimized for GPU acceleration
- 📈 **High Accuracy**: Fine-tuned and evaluated on the validation split
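
The memory saving from FP16 follows directly from storing each weight in 2 bytes instead of 4. The sketch below is back-of-the-envelope arithmetic only: the parameter count of about 86 million is an assumption (the figure commonly quoted for ViT-Base), not a value measured from this checkpoint, and it covers weights only, not activations.

```python
# Rough weight-memory estimate for FP32 vs FP16 storage.
# ASSUMPTION: ~86 million parameters, the figure commonly quoted for ViT-Base.
NUM_PARAMS = 86_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


fp32_mb = weight_memory_mb(NUM_PARAMS, "fp32")
fp16_mb = weight_memory_mb(NUM_PARAMS, "fp16")
print(f"FP32: {fp32_mb:.0f} MB, FP16: {fp16_mb:.0f} MB")  # FP32: 344 MB, FP16: 172 MB
```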
---

## 📊 Dataset Used
**Hugging Face Dataset**: [`tanganke/cifar100`](https://huggingface.co/datasets/tanganke/cifar100)
- **Description**: CIFAR-100 contains 60,000 32×32 color images in 100 classes (600 images per class)
- **Split**: 50,000 training images and 10,000 test images
- **Categories**: Animals, vehicles, food, household items, etc.
- **License**: MIT License (from source)

```python
from datasets import load_dataset

# Download CIFAR-100 from the Hugging Face Hub
dataset = load_dataset("tanganke/cifar100")
```
## 🛠️ Model & Training Configuration
- **Model**: `google/vit-base-patch16-224`
- **Image Size**: 224×224 (resized from 32×32)
- **Framework**: Hugging Face Transformers & Datasets
- **Training Environment**: Kaggle Notebook with CUDA
- **Epochs**: 5–10 (with early stopping)
- **Batch Size**: 32
- **Optimizer**: AdamW
- **Loss Function**: CrossEntropyLoss
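
The "5–10 epochs with early stopping" setting listed above could be implemented along the following lines. This is a minimal, framework-free sketch, not the logic from `train.py`: the `EarlyStopper` class, the `epochs_run` helper, the patience of 2, and the choice of validation accuracy as the monitored metric are all assumptions made for illustration.

```python
class EarlyStopper:
    """Hypothetical helper: stop when validation accuracy stops improving."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_accuracy: float) -> bool:
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_accuracies, max_epochs=10, patience=2):
    """Return how many epochs run before early stopping or the epoch cap."""
    stopper = EarlyStopper(patience=patience)
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        if stopper.should_stop(acc):
            return epoch
    return min(len(val_accuracies), max_epochs)


# Made-up validation accuracies, for illustration only
history = [0.52, 0.64, 0.71, 0.70, 0.71, 0.73]
print(epochs_run(history))  # 5: epochs 4 and 5 fail to beat the best value of 0.71
```

In a real training loop, `should_stop` would be called once per epoch with the measured validation accuracy, and the best checkpoint would be saved whenever the score improves.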
## ✅ Evaluation & Scoring
- **Accuracy**: ~70–80% (varies by configuration)
- **Validation Tool**: `evaluate` or `sklearn.metrics`
- **Metrics**: Top-1 and Top-5 accuracy
- **Inference Speed**: Significantly faster after quantization
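
To make the Top-1 and Top-5 metrics listed above concrete, here is a small dependency-free sketch of top-k accuracy. The `top_k_accuracy` function and the toy scores are illustrative assumptions; in practice a library routine from `evaluate` or `sklearn.metrics` would do the same job on the real model outputs.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores for 3 samples over 4 classes (illustrative values only)
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 1: ranked second
    [0.70, 0.15, 0.10, 0.05],  # true class 3: ranked last
]
labels = [1, 1, 3]

print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 correct
```

For CIFAR-100, Top-5 accuracy counts a prediction as correct when the true class appears anywhere among the five highest-scoring classes, so it is always at least as high as Top-1.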
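## 🔍 Inference Example
```python
from PIL import Image
import torch
from datasets import load_dataset
from transformers import ViTFeatureExtractor, ViTForImageClassification

# Assumed checkpoint location: the `vit-cifar100-fp16` folder from this repository
dataset = load_dataset("tanganke/cifar100")
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("vit-cifar100-fp16", torch_dtype=torch.float16).to("cuda").eval()

def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt").to("cuda")
    with torch.no_grad():  # inference only; cast pixels to match the FP16 weights
        outputs = model(pixel_values=inputs["pixel_values"].half())
    logits = outputs.logits
    predicted_class = logits.argmax(-1).item()
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))
```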
## 📁 Folder Structure
📦image-classification-vit
┣ 📂vit-cifar100-fp16
┣ 📜train.py
┣ 📜inference.py
┣ 📜README.md
┗ 📜requirements.txt