---
license: mit
datasets:
- UserNae3/LLVIP
pipeline_tag: image-to-image
---

# 🌙 Conditional GAN for Visible → Infrared (LLVIP)

> **High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization**

---

## 🧩 Overview

This project implements a **Conditional Generative Adversarial Network (cGAN)** trained to translate **visible-light (RGB)** images into **infrared (IR)** representations.

It leverages **multi-loss optimization** — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both **scene structure** and **thermal contrast**.

Greater emphasis is placed on the **L1 loss**, ensuring that overall brightness and object boundaries remain consistent between the visible and infrared domains.

---

## 📁 Dataset

- **Dataset:** [LLVIP Dataset](https://huggingface.co/datasets/UserNae3/LLVIP)
  Paired **visible (RGB)** and **infrared (IR)** images under diverse lighting and background conditions.
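
For quick experimentation, the paired images can be pulled from the Hub with 🤗 `datasets`. A minimal sketch; the split and field names below are assumptions for illustration, so check the dataset card for the actual schema:

```python
from datasets import load_dataset

# Load the paired LLVIP data from the Hugging Face Hub.
# NOTE: "train" and the "visible"/"infrared" field names are assumptions;
# consult the dataset card for the real split and column names.
llvip = load_dataset("UserNae3/LLVIP", split="train")

example = llvip[0]
rgb_image = example["visible"]    # assumed PIL image of the visible (RGB) frame
ir_image = example["infrared"]    # assumed PIL image of the paired IR frame
```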

---

## 🧠 Model Architecture

- **Type:** Conditional GAN (cGAN)
- **Direction:** *Visible → Infrared*
- **Framework:** TensorFlow
- **Pipeline Tag:** `image-to-image`
- **License:** MIT

### 🧱 Generator
- U-Net encoder–decoder with skip connections
- Conditioned on RGB input
- Output: single-channel IR image

### ⚔️ Discriminator
- PatchGAN (70×70 receptive field)
- Evaluates realism of local patches for fine detail learning
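
A minimal TensorFlow/Keras sketch of the two components described above (layer widths, depths, and the 256×256 input size are illustrative assumptions, not the exact released architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers


def downsample(filters, size, norm=True):
    """Strided conv block shared by the U-Net encoder and the PatchGAN."""
    block = tf.keras.Sequential(
        [layers.Conv2D(filters, size, strides=2, padding="same", use_bias=False)])
    if norm:
        block.add(layers.BatchNormalization())
    block.add(layers.LeakyReLU(0.2))
    return block


def upsample(filters, size):
    """Transposed-conv block for the U-Net decoder."""
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, size, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])


def build_generator(img_size=256):
    """U-Net with skip connections: RGB in, single-channel IR out (tanh)."""
    rgb = layers.Input((img_size, img_size, 3))
    skips, x = [], rgb
    for f in [64, 128, 256, 512, 512, 512]:                               # encoder
        x = downsample(f, 4, norm=(f != 64))(x)
        skips.append(x)
    for f, skip in zip([512, 512, 256, 128, 64], reversed(skips[:-1])):   # decoder
        x = upsample(f, 4)(x)
        x = layers.Concatenate()([x, skip])                               # skip connection
    ir = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(x)
    return tf.keras.Model(rgb, ir, name="generator")


def build_discriminator(img_size=256):
    """70x70 PatchGAN conditioned on the RGB input; outputs per-patch logits."""
    rgb = layers.Input((img_size, img_size, 3))
    ir = layers.Input((img_size, img_size, 1))
    x = layers.Concatenate()([rgb, ir])
    for f in [64, 128, 256]:
        x = downsample(f, 4, norm=(f != 64))(x)
    x = layers.Conv2D(512, 4, strides=1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.2)(x)
    patch_logits = layers.Conv2D(1, 4, strides=1, padding="same")(x)
    return tf.keras.Model([rgb, ir], patch_logits, name="discriminator")
```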

---

## ⚙️ Training Configuration

| Setting | Value |
|----------|--------|
| **Epochs** | 100 |
| **Steps per Epoch** | 376 |
| **Batch Size** | 4 |
| **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
| **Learning Rate** | 2e-4 |
| **Precision** | Mixed (FP16/32) |
| **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
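
This configuration maps onto TensorFlow roughly as follows (a sketch; the actual training script is not reproduced here):

```python
import tensorflow as tf

# Mixed FP16/FP32 precision, as listed in the table above.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

EPOCHS = 100
STEPS_PER_EPOCH = 376
BATCH_SIZE = 4


def make_optimizer():
    # Adam with beta_1 = 0.5, beta_2 = 0.999 and a 2e-4 learning rate.
    # LossScaleOptimizer keeps FP16 gradients numerically stable in a custom GAN loop.
    return tf.keras.mixed_precision.LossScaleOptimizer(
        tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999))


generator_optimizer = make_optimizer()
discriminator_optimizer = make_optimizer()
```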

---

## 💡 Multi-Loss Function Design

| Loss Type | Description | Weight (λ) | Purpose |
|------------|--------------|-------------|----------|
| **L1 Loss** | Pixel-wise mean absolute error between generated and real IR | **100** | Ensures global brightness & shape consistency |
| **Perceptual Loss (VGG)** | Feature loss from `conv5_block4` of pretrained VGG-19 | **10** | Captures high-level texture and semantic alignment |
| **Adversarial Loss** | Binary cross-entropy loss from the PatchGAN discriminator | **1** | Encourages realistic IR texture generation |
| **Edge Loss** | Sobel/gradient difference between real & generated images | **5** | Enhances sharpness and edge clarity |

The **total generator loss** is computed as:

$$
L_{G} = \lambda_{L1} L_{L1} + \lambda_{perc} L_{perc} + \lambda_{adv} L_{adv} + \lambda_{edge} L_{edge}
$$
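
A sketch of how these four terms and weights could be combined in TensorFlow (helper names are illustrative; `block5_conv4` is the Keras VGG-19 layer assumed to correspond to the feature layer cited above, and IR tensors are assumed to lie in [-1, 1]):

```python
import tensorflow as tf

# Loss weights from the table above.
LAMBDA_L1, LAMBDA_PERC, LAMBDA_ADV, LAMBDA_EDGE = 100.0, 10.0, 1.0, 5.0

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Frozen VGG-19 feature extractor for the perceptual term.
# "block5_conv4" is the Keras layer name assumed to match the layer named above.
_vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg_features = tf.keras.Model(_vgg.input, _vgg.get_layer("block5_conv4").output)
vgg_features.trainable = False


def _vgg_prep(ir):
    # VGG expects 3-channel inputs in [0, 255]; tile the IR channel and rescale
    # from the assumed [-1, 1] generator output range.
    rgb = tf.image.grayscale_to_rgb(ir)
    return tf.keras.applications.vgg19.preprocess_input((rgb + 1.0) * 127.5)


def generator_loss(disc_fake_logits, real_ir, fake_ir):
    l1 = tf.reduce_mean(tf.abs(real_ir - fake_ir))                       # pixel term
    perc = tf.reduce_mean(tf.abs(vgg_features(_vgg_prep(real_ir)) -
                                 vgg_features(_vgg_prep(fake_ir))))      # perceptual term
    adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)          # adversarial term
    edge = tf.reduce_mean(tf.abs(tf.image.sobel_edges(real_ir) -
                                 tf.image.sobel_edges(fake_ir)))         # edge term
    return (LAMBDA_L1 * l1 + LAMBDA_PERC * perc +
            LAMBDA_ADV * adv + LAMBDA_EDGE * edge)
```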

---

## 📊 Evaluation Metrics

| Metric | Definition | Result |
|---------|-------------|--------|
| **L1 Loss** | Mean absolute difference between generated and ground-truth IR | **0.0611** |
| **PSNR (Peak Signal-to-Noise Ratio)** | Measures reconstruction quality (higher is better) | **24.3096 dB** |
| **SSIM (Structural Similarity Index Measure)** | Perceptual similarity between generated & target images | **0.8386** |
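
These metrics can be reproduced with standard TensorFlow image ops; a minimal sketch, assuming the generated and ground-truth IR tensors are scaled to [0, 1]:

```python
import tensorflow as tf


def evaluate_pair(real_ir, fake_ir):
    """real_ir, fake_ir: float tensors of shape (batch, H, W, 1), values in [0, 1]."""
    l1 = tf.reduce_mean(tf.abs(real_ir - fake_ir))
    psnr = tf.reduce_mean(tf.image.psnr(real_ir, fake_ir, max_val=1.0))
    ssim = tf.reduce_mean(tf.image.ssim(real_ir, fake_ir, max_val=1.0))
    return {"L1": float(l1), "PSNR_dB": float(psnr), "SSIM": float(ssim)}
```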

---

## 🏗️ Model Architectures

| Model | Visualization |
|-------|---------------|
| **Generator** |  |
| **Discriminator** |  |
| **Combined GAN** |  |

---

## 🖼️ Visual Results

### 🎞️ Training Progress (Sample Evolution)

<img src="ezgif-58298bca2da920.gif" alt="Training Progress" width="700"/>

### ✨ Final Convergence Samples

| Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
|--------------------------------------|---------------------------------------|
| <img src="./epoch_007.png" width="550"/> | <img src="epoch_100.png" width="550"/> |

### Comparison: Input vs Ground Truth vs Generated

| RGB Input · Ground Truth IR · Predicted IR |
|--------------------------------------------|
| <img src="test_1179.png" width="750"/> |
| <img src="test_001.png" width="750"/> |
| <img src="test_4884.png" width="750"/> |
| <img src="test_5269.png" width="750"/> |
| <img src="test_5361.png" width="750"/> |
| <img src="test_7255.png" width="750"/> |
| <img src="test_7362.png" width="750"/> |
| <img src="test_12015.png" width="750"/> |

---

## 📈 Loss Curves

### Generator & Discriminator Loss

<img src="./train_loss_curve.png" alt="Training Loss Curve" width="600"/>

### Validation Loss per Epoch

<img src="./val_loss_curve.png" alt="Validation Loss Curve" width="600"/>

All training metrics are logged in:

```bash
/
├── logs.log
└── loss_summary.csv
```
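
To inspect the curves locally, the CSV can be plotted with pandas/matplotlib. A minimal sketch, assuming `loss_summary.csv` contains per-epoch columns such as `gen_loss` and `disc_loss` (the actual column names may differ):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names below are assumptions; inspect the CSV header first.
summary = pd.read_csv("loss_summary.csv")

plt.figure(figsize=(8, 4))
plt.plot(summary["gen_loss"], label="Generator loss")
plt.plot(summary["disc_loss"], label="Discriminator loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.tight_layout()
plt.show()
```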

---

## 🧩 Observations

- The model **captures IR brightness and object distinction**, but early epochs show slight blur because the L1 term dominates at that stage.
- **Contrast and edge sharpness improve** after roughly 70 epochs as the adversarial and perceptual terms exert more influence.
- Background variation in LLVIP introduces challenges; future fine-tuning on domain-aligned subsets could further improve realism.

---

## 🚀 Future Work

- Apply **feature matching loss** for smoother discriminator gradients
- Introduce **spectral normalization** for training stability
- Add **temporal or sequence consistency** for video IR translation
- Explore **adaptive loss balancing** with epoch-based dynamic weighting

---

## ❤️ Acknowledgements

- **LLVIP Dataset** for paired RGB–IR samples
- **TensorFlow** and **VGG-19** for perceptual feature extraction
- **Kaggle GPU** runtime for high-performance model training

---

## 📜 License

**MIT License © 2025**
Author: **Sai Sumanth Appala**

---

## 🧾 Citation

If you use this work, please cite:

```bibtex
@misc{appala2025visible2ir,
  author    = {Appala, Sai Sumanth},
  title     = {Conditional GAN for Visible-to-Infrared Translation with Multi-Loss Training},
  year      = {2025},
  license   = {MIT},
  dataset   = {UserNae3/LLVIP},
  framework = {TensorFlow},
}
```