|
|
--- |
|
|
title: XAI Image Classifier |
|
|
emoji: ๐ฌ |
|
|
colorFrom: blue |
|
|
colorTo: purple |
|
|
sdk: gradio |
|
|
sdk_version: 5.49.1 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
license: mit |
|
|
tags: |
|
|
- computer-vision |
|
|
- image-classification |
|
|
- explainable-ai |
|
|
- grad-cam |
|
|
- resnet |
|
|
- pytorch |
|
|
- interpretability |
|
|
--- |
|
|
|
|
|
# ๐ฌ XAI Image Classifier: ResNet-152 with Grad-CAM |
|
|
|
|
|
[](https://pytorch.org/) |
|
|
[](https://gradio.app) |
|
|
[](LICENSE) |
|
|
|
|
|
> **Production-grade explainable image classification** powered by ResNet-152 architecture with gradient-based visual attribution via Grad-CAM. |
|
|
|
|
|
## ๐ฏ Overview |
|
|
|
|
|
This space provides **transparent AI decision-making** for image classification tasks. Built on ResNet-152 (82.3% ImageNet Top-1 accuracy), it integrates Captum's LayerGradCam to generate pixel-level attribution maps, revealing which spatial regions drive class-specific predictions. |
|
|
|
|
|
## โจ Key Features |
|
|
|
|
|
| Feature | Description | |
|
|
|---------|-------------| |
|
|
| **๐ง ResNet-152 Architecture** | 60M parameters, 82.3% ImageNet accuracy | |
|
|
| **๐ฅ Grad-CAM Visualization** | Gradient-weighted class activation mapping | |
|
|
| **โก GPU-Optimized Inference** | FP16 mixed-precision (~4-5ms latency on A100) | |
|
|
| **๐ Multi-View Analysis** | Original + Heatmap + Overlay + Contours | |
|
|
| **๐จ 1000 ImageNet Classes** | Comprehensive object recognition | |
|
|
|
|
|
## ๐ How to Use |
|
|
|
|
|
1. **Upload an image** (JPG, PNG, WebP supported) |
|
|
2. Click **"๐ Analyze"** to run inference |
|
|
3. View **Top-10 predictions** with confidence scores |
|
|
4. Examine **Grad-CAM heatmaps** showing model attention |
|
|
5. Compare **multiple colormap visualizations** |
|
|
|
|
|
## ๐ฌ Technical Architecture |
|
|
```python |
|
|
Model: ResNet-152 (torchvision.models.resnet152) |
|
|
Weights: IMAGENET1K_V2 (pretrained) |
|
|
XAI Method: Layer Grad-CAM (Captum) |
|
|
Target Layer: layer4[-1] (final conv block) |
|
|
Input Size: 224ร224 RGB |
|
|
Precision: FP16 (GPU) / FP32 (CPU) |
|
|
``` |
|
|
|
|
|
### Performance Metrics |
|
|
|
|
|
| Hardware | Inference Time | Memory Usage | |
|
|
|----------|---------------|--------------| |
|
|
| NVIDIA A100 | ~3-4ms | 1.2GB | |
|
|
| NVIDIA T4 | ~8-10ms | 1.2GB | |
|
|
| CPU (16 cores) | ~200ms | 2.5GB | |
|
|
|
|
|
## ๐ Model Accuracy |
|
|
|
|
|
- **Top-1 Accuracy:** 82.3% (ImageNet validation set) |
|
|
- **Top-5 Accuracy:** 96.1% |
|
|
- **Parameter Count:** 60.2M |
|
|
- **FLOPs:** 11.6B |
|
|
|
|
|
## ๐ ๏ธ Optimizations Applied |
|
|
|
|
|
- **FP16 Mixed Precision:** 2x inference speedup on GPU |
|
|
- **cuDNN Benchmark:** Auto-tuned convolution algorithms |
|
|
- **TF32 Operations:** 8x faster matmuls on Ampere GPUs |
|
|
- **Gradient Checkpointing:** Memory-efficient Grad-CAM computation |
|
|
|
|
|
## ๐จ Visualization Outputs |
|
|
|
|
|
1. **Original Image** - Input as-is |
|
|
2. **Grad-CAM Heatmap** - Pure activation visualization |
|
|
3. **Overlay** - Heatmap superimposed on original |
|
|
4. **Multi-Colormap Comparison** - Jet, Hot, Viridis with contours |
|
|
|
|
|
## ๐ Use Cases |
|
|
|
|
|
| Domain | Application | |
|
|
|--------|-------------| |
|
|
| **Medical Imaging** | Validate diagnostic AI attention regions | |
|
|
| **Autonomous Systems** | Debug object detection focus | |
|
|
| **Security & Surveillance** | Audit algorithmic decision-making | |
|
|
| **Research** | Study CNN feature representations | |
|
|
| **Education** | Teach explainable AI concepts | |
|
|
|
|
|
## ๐ Privacy & Ethics |
|
|
|
|
|
- โ
**No data retention** - Images processed in-memory only |
|
|
- โ
**Zero telemetry** - No usage tracking |
|
|
- โ
**Open source** - Full code transparency |
|
|
- โ
**Bias auditing** - Visual inspection of model biases |
|
|
|
|
|
## ๐ References |
|
|
|
|
|
### Model Architecture |
|
|
- He, K., et al. (2016). *Deep Residual Learning for Image Recognition.* CVPR. |
|
|
|
|
|
### Explainability Method |
|
|
- Selvaraju, R. R., et al. (2017). *Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.* ICCV. |
|
|
|
|
|
### Framework |
|
|
- PyTorch Team. *PyTorch: An Imperative Style, High-Performance Deep Learning Library.* NeurIPS 2019. |
|
|
|
|
|
## ๐ Links |
|
|
|
|
|
- **GitHub Repository:** [0AnshuAditya0/xai](https://github.com/0AnshuAditya0/xai) |
|
|
- **Documentation:** [Full Technical Docs](https://github.com/0AnshuAditya0/xai/wiki) |
|
|
- **Paper (Grad-CAM):** [arXiv:1610.02391](https://arxiv.org/abs/1610.02391) |
|
|
- **Paper (ResNet):** [arXiv:1512.03385](https://arxiv.org/abs/1512.03385) |
|
|
|
|
|
## โ๏ธ Technical Requirements |
|
|
```bash |
|
|
# Core Dependencies |
|
|
torch>=2.0.0 |
|
|
torchvision>=0.15.0 |
|
|
gradio>=4.44.0 |
|
|
captum>=0.6.0 |
|
|
Pillow>=9.0.0 |
|
|
numpy>=1.23.0 |
|
|
matplotlib>=3.5.0 |
|
|
``` |
|
|
|
|
|
## ๐ Known Limitations |
|
|
|
|
|
- **Memory:** Requires ~1.2GB GPU memory (FP16 mode) |
|
|
- **Latency:** CPU inference slower (~200ms vs ~5ms GPU) |
|
|
- **Classes:** Limited to 1000 ImageNet categories |
|
|
- **Input Format:** RGB images only (grayscale not supported) |
|
|
|
|
|
## ๐ฎ Roadmap |
|
|
|
|
|
- [ ] Add support for custom model fine-tuning |
|
|
- [ ] Implement batch processing API |
|
|
- [ ] Integrate additional XAI methods (SHAP, Integrated Gradients) |
|
|
- [ ] Add uncertainty quantification |
|
|
- [ ] Support for video frame analysis |
|
|
|
|
|
## ๐ License |
|
|
|
|
|
MIT License - Free for research, education, and commercial use. |
|
|
|
|
|
## ๐จโ๐ป Author |
|
|
|
|
|
**Anshu Aditya** |
|
|
AI Engineer | Explainable AI Researcher |
|
|
|
|
|
[](https://github.com/0AnshuAditya0) |
|
|
[](https://linkedin.com/in/your-profile) |
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
**Built with โค๏ธ for transparent and accountable AI** |
|
|
|
|
|
*Making deep learning interpretable, one image at a time* |
|
|
|
|
|
โญ Star this space if you find it useful! |
|
|
|
|
|
</div> |