---
license: mit
datasets:
- UserNae3/LLVIP
pipeline_tag: image-to-image
---
# Conditional GAN for Visible → Infrared (LLVIP)
> **High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization**
---
## Overview
This project implements a **Conditional Generative Adversarial Network (cGAN)** trained to translate **visible-light (RGB)** images into **infrared (IR)** representations.
It leverages **multi-loss optimization** — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both **scene structure** and **thermal contrast**.
The largest weight is placed on the **L1 loss**, ensuring that overall brightness and object boundaries remain consistent between the visible and infrared domains.
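As a quick illustration of how the trained generator can be used at inference time, here is a minimal sketch; the model filename (`generator.h5`), the 256×256 input size, and the [-1, 1] scaling are assumptions, so adjust them to the files and preprocessing actually used in this repository.

```python
import tensorflow as tf

# Hypothetical filename; replace with the generator weights shipped in this repo.
generator = tf.keras.models.load_model("generator.h5", compile=False)

# Load a visible-light image and scale it to [-1, 1] (assumed preprocessing).
rgb = tf.image.decode_jpeg(tf.io.read_file("visible.jpg"), channels=3)
rgb = tf.image.resize(tf.cast(rgb, tf.float32), (256, 256)) / 127.5 - 1.0

# Translate to infrared and write the single-channel result back to disk.
ir = generator(rgb[tf.newaxis, ...], training=False)[0]   # (256, 256, 1) in [-1, 1]
ir = tf.cast((ir + 1.0) * 127.5, tf.uint8)
tf.io.write_file("predicted_ir.png", tf.io.encode_png(ir))
```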
---
## Dataset
- **Dataset:** [LLVIP Dataset](https://huggingface.co/datasets/UserNae3/LLVIP)
Paired **visible (RGB)** and **infrared (IR)** images under diverse lighting and background conditions.
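A sketch of a paired `tf.data` input pipeline for training; the directory layout (`visible/` and `infrared/` folders with matching filenames) and the 256×256, [-1, 1] preprocessing are assumptions, so adjust them to the actual LLVIP download.

```python
import tensorflow as tf

IMG_SIZE = 256  # assumed training resolution

def load_pair(vis_path, ir_path):
    # Read the visible (3-channel) and infrared (1-channel) images of a pair.
    vis = tf.image.decode_jpeg(tf.io.read_file(vis_path), channels=3)
    ir = tf.image.decode_jpeg(tf.io.read_file(ir_path), channels=1)
    # Resize and scale both to [-1, 1] for a tanh-output generator.
    vis = tf.image.resize(tf.cast(vis, tf.float32), (IMG_SIZE, IMG_SIZE)) / 127.5 - 1.0
    ir = tf.image.resize(tf.cast(ir, tf.float32), (IMG_SIZE, IMG_SIZE)) / 127.5 - 1.0
    return vis, ir

# Assumed directory layout; matching filenames keep the pairs aligned after sorting.
vis_files = sorted(tf.io.gfile.glob("LLVIP/visible/train/*.jpg"))
ir_files = sorted(tf.io.gfile.glob("LLVIP/infrared/train/*.jpg"))

dataset = (tf.data.Dataset.from_tensor_slices((vis_files, ir_files))
           .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(512)
           .batch(4)            # batch size from the training configuration below
           .prefetch(tf.data.AUTOTUNE))
```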
---
## Model Architecture
- **Type:** Conditional GAN (cGAN)
- **Direction:** *Visible → Infrared*
- **Framework:** TensorFlow
- **Pipeline Tag:** `image-to-image`
- **License:** MIT
### Generator
- U-Net encoder–decoder with skip connections
- Conditioned on RGB input
- Output: single-channel IR image
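A minimal Keras sketch of such a U-Net generator; the depth and filter counts are illustrative rather than the exact configuration used for this model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(img_size=256):
    inp = layers.Input((img_size, img_size, 3))          # RGB conditioning input

    # Encoder: strided convolutions, keeping feature maps for skip connections.
    skips, x = [], inp
    for filters in (64, 128, 256, 512):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        skips.append(x)

    # Decoder: transposed convolutions with skip connections from the encoder.
    for filters, skip in zip((256, 128, 64), reversed(skips[:-1])):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.ReLU()(x)
        x = layers.Concatenate()([x, skip])

    # Final upsample to the input resolution; single-channel IR output in [-1, 1].
    ir = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(x)
    return tf.keras.Model(inp, ir, name="generator")
```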
### Discriminator
- PatchGAN discriminator that scores local patches for realism, driving fine-detail learning
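A matching PatchGAN-style discriminator sketch (the loss table below refers to a PatchGAN discriminator); feeding it the RGB image alongside the IR image, as in pix2pix, is an assumption here, and the layer sizes are again illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(img_size=256):
    rgb = layers.Input((img_size, img_size, 3))   # conditioning visible image
    ir = layers.Input((img_size, img_size, 1))    # real or generated IR image
    x = layers.Concatenate()([rgb, ir])

    # Stacked strided convolutions -> a grid of patch-level real/fake logits.
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    patch_logits = layers.Conv2D(1, 4, padding="same")(x)   # one logit per patch

    return tf.keras.Model([rgb, ir], patch_logits, name="discriminator")
```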
---
## ⚙️ Training Configuration
| Setting | Value |
|----------|--------|
| **Epochs** | 100 |
| **Steps per Epoch** | 376 |
| **Batch Size** | 4 |
| **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
| **Learning Rate** | 2e-4 |
| **Precision** | Mixed (32) |
| **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
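In Keras terms, the optimizer settings above correspond to something like the sketch below; how exactly the "Mixed (32)" precision setting was configured is not specified here, so the mixed-precision policy is shown commented out as an assumption.

```python
import tensorflow as tf

# Two Adam optimizers with the hyperparameters from the table above.
gen_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
disc_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)

# One possible reading of "Mixed (32)": enable the standard Keras mixed-precision
# policy (float16 compute, float32 variables). Uncomment if that matches the setup.
# tf.keras.mixed_precision.set_global_policy("mixed_float16")
```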
---
## Multi-Loss Function Design
| Loss Type | Description | Weight (λ) | Purpose |
|------------|--------------|-------------|----------|
| **L1 Loss** | Pixel-wise mean absolute error between generated and real IR | **100** | Ensures global brightness & shape consistency |
| **Perceptual Loss (VGG)** | Feature loss from `conv5_block4` of pretrained VGG-19 | **10** | Captures high-level texture and semantic alignment |
| **Adversarial Loss** | Binary cross-entropy loss from PatchGAN discriminator | **1** | Encourages realistic IR texture generation |
| **Edge Loss** | Sobel/gradient difference between real & generated images | **5** | Enhances sharpness and edge clarity |
---
The **total generator loss** is computed as:
$$
L_{G} = \lambda_{L1}\,L_{L1} + \lambda_{\text{perc}}\,L_{\text{perc}} + \lambda_{\text{adv}}\,L_{\text{adv}} + \lambda_{\text{edge}}\,L_{\text{edge}}
$$
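A TensorFlow sketch of this weighted sum; the layer listed as `conv5_block4` in the table is assumed to correspond to `block5_conv4` in `keras.applications.VGG19` naming, and the from-logits adversarial term is likewise an assumption about the discriminator output.

```python
import tensorflow as tf

# Frozen VGG-19 feature extractor for the perceptual term.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)
feature_extractor.trainable = False
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

LAMBDA_L1, LAMBDA_PERC, LAMBDA_ADV, LAMBDA_EDGE = 100.0, 10.0, 1.0, 5.0

def generator_loss(disc_fake_logits, fake_ir, real_ir):
    # L1: pixel-wise mean absolute error.
    l1 = tf.reduce_mean(tf.abs(real_ir - fake_ir))

    # Perceptual: distance between VGG-19 features of the (3-channel) images.
    def to_vgg(x):
        x = tf.image.grayscale_to_rgb((x + 1.0) * 127.5)      # [-1, 1] -> [0, 255], 3 ch
        return tf.keras.applications.vgg19.preprocess_input(x)
    perc = tf.reduce_mean(tf.square(feature_extractor(to_vgg(real_ir)) -
                                    feature_extractor(to_vgg(fake_ir))))

    # Adversarial: the discriminator should call the generated IR "real".
    adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)

    # Edge: difference between Sobel gradients of real and generated IR.
    edge = tf.reduce_mean(tf.abs(tf.image.sobel_edges(real_ir) -
                                 tf.image.sobel_edges(fake_ir)))

    return LAMBDA_L1 * l1 + LAMBDA_PERC * perc + LAMBDA_ADV * adv + LAMBDA_EDGE * edge
```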
## Evaluation Metrics
| Metric | Definition | Result |
|---------|-------------|--------|
| **L1 Loss** | Mean absolute difference between generated and ground truth IR | **0.0611** |
| **PSNR (Peak Signal-to-Noise Ratio)** | Measures reconstruction quality (higher is better) | **24.3096 dB** |
| **SSIM (Structural Similarity Index Measure)** | Perceptual similarity between generated & target images | **0.8386** |
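These metrics can be reproduced with TensorFlow's built-in image ops; the sketch below assumes generated and ground-truth IR tensors rescaled to [0, 1].

```python
import tensorflow as tf

def evaluate_pair(fake_ir, real_ir):
    # Both tensors assumed shaped (batch, H, W, 1) with values in [0, 1].
    l1 = tf.reduce_mean(tf.abs(real_ir - fake_ir))
    psnr = tf.reduce_mean(tf.image.psnr(real_ir, fake_ir, max_val=1.0))
    ssim = tf.reduce_mean(tf.image.ssim(real_ir, fake_ir, max_val=1.0))
    return {"l1": float(l1), "psnr_db": float(psnr), "ssim": float(ssim)}
```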
---
## Architecture Visualizations
| Model | Visualization |
|-------|---------------|
| **Generator** |  |
| **Discriminator** |  |
| **Combined GAN** |  |
---
## Data Exploration
We analysed the LLVIP dataset and found that ~70% of image pairs are captured at < 50 lux and ~30% at 50-200 lux.
The average pedestrian height in the IR channel was X pixels; outliers with a height below 20 pixels were excluded.
## Visual Results
### Training Progress (Sample Evolution)
<img src="ezgif-58298bca2da920.gif" alt="Training Progress" width="700"/>
### ✨ Final Convergence Samples
| Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
|--------------------------------------|---------------------------------------|
| <img src="./epoch_007.png" width="550"/> | <img src="epoch_100.png" width="550"/> |
### Comparison: Input vs Ground Truth vs Generated
| RGB Input · Ground Truth IR · Predicted IR |
|--------------------------------------------|
| <img src="test_1179.png" width="750"/> |
| <img src="test_001.png" width="750"/> |
| <img src="test_4884.png" width="750"/> |
| <img src="test_5269.png" width="750"/> |
| <img src="test_5361.png" width="750"/> |
| <img src="test_7255.png" width="750"/> |
| <img src="test_7362.png" width="750"/> |
| <img src="test_12015.png" width="750"/> |
---
## Loss Curves
### Generator & Discriminator Loss
<img src="./train_loss_curve.png" alt="Training Loss Curve" width="600"/>
### Validation Loss per Epoch
<img src="./val_loss_curve.png" alt="Validation Loss Curve" width="600"/>
All training metrics are logged in:
```bash
/
├── logs.log
└── loss_summary.csv
```
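A small sketch for re-plotting the curves from `loss_summary.csv`; the column names (`epoch`, `gen_loss`, `disc_loss`) are assumptions, so adjust them to the actual CSV header.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("loss_summary.csv")

# Assumed column names; rename to match the CSV produced during training.
plt.plot(df["epoch"], df["gen_loss"], label="Generator")
plt.plot(df["epoch"], df["disc_loss"], label="Discriminator")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.savefig("train_loss_curve.png", dpi=150)
```
---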
## Observations
- The model **captures IR brightness and object distinction**, but early epochs show slight blur while the heavily weighted L1 term dominates training.
- **Contrast and edge sharpness improve** after ~70 epochs as the adversarial and perceptual terms exert more influence on the generated textures.
- Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
- We compared three variants: (i) U-Net regression (L1 only), SSIM = 0.80; (ii) cGAN with L1 + adversarial loss, SSIM = 0.83; (iii) cGAN with L1 + adversarial + perceptual + edge losses (our final model), SSIM = 0.8386.
---
## Future Work
- Apply **feature matching loss** for smoother discriminator gradients
- Add **temporal or sequence consistency** for video IR translation
- Adaptive loss balancing with epoch-based dynamic weighting
---
## Acknowledgements
- **LLVIP Dataset** for paired RGB–IR samples
- **TensorFlow** and **VGG-19** for perceptual feature extraction
- **Kaggle GPU runtime** for high-performance model training