hash-map committed on
Commit 1a24ba3 · verified · 1 parent: 5e61f2a

Update README.md

Files changed (1)
  1. README.md +182 -176
README.md CHANGED
@@ -1,176 +1,182 @@
1
- # 🌙 Conditional GAN for Visible → Infrared (LLVIP)
2
-
3
- > **High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization**
4
-
5
- ---
6
-
7
- ## 🧩 Overview
8
-
9
- This project implements a **Conditional Generative Adversarial Network (cGAN)** trained to translate **visible-light (RGB)** images into **infrared (IR)** representations.
10
-
11
- It leverages **multi-loss optimization** — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both **scene structure** and **thermal contrast**.
12
-
13
- A higher emphasis is given to **L1 loss**, ensuring that overall brightness and object boundaries remain consistent between visible and infrared domains.
14
-
15
- ---
16
-
17
- ## 📁 Dataset
18
-
19
- - **Dataset:** [LLVIP Dataset](https://huggingface.co/datasets/UserNae3/LLVIP)
20
- Paired **visible (RGB)** and **infrared (IR)** images under diverse lighting and background conditions.
21
-
22
- ---
23
-
24
- ## 🧠 Model Architecture
25
-
26
- - **Type:** Conditional GAN (cGAN)
27
- - **Direction:** *Visible → Infrared*
28
- - **Framework:** TensorFlow
29
- - **Pipeline Tag:** `image-to-image`
30
- - **License:** MIT
31
-
32
- ### 🧱 Generator
33
- - U-Net encoder–decoder with skip connections
34
- - Conditioned on RGB input
35
- - Output: single-channel IR image
36
-
37
- ### ⚔️ Discriminator
38
- - PatchGAN (70×70 receptive field)
39
- - Evaluates realism of local patches for fine detail learning
40
-
41
- ---
42
-
43
- ## ⚙️ Training Configuration
44
-
45
- | Setting | Value |
46
- |----------|--------|
47
- | **Epochs** | 100 |
48
- | **Steps per Epoch** | 376 |
49
- | **Batch Size** | 4 |
50
- | **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
51
- | **Learning Rate** | 2e-4 |
52
- | **Precision** | Mixed (FP16/32) |
53
- | **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
54
-
55
- ---
56
-
57
- ## 💡 Multi-Loss Function Design
58
-
59
- | Loss Type | Description | Weight (λ) | Purpose |
60
- |------------|--------------|-------------|----------|
61
- | **L1 Loss** | Pixel-wise mean absolute error between generated and real IR | **100** | Ensures global brightness & shape consistency |
62
- | **Perceptual Loss (VGG)** | Feature loss from `conv5_block4` of pretrained VGG-19 | **10** | Captures high-level texture and semantic alignment |
63
- | **Adversarial Loss** | Binary cross-entropy loss from PatchGAN discriminator | **1** | Encourages realistic IR texture generation |
64
- | **Edge Loss** | Sobel/gradient difference between real & generated images | **5** | Enhances sharpness and edge clarity |
65
-
66
- The **total generator loss** is computed as:
67
- \[
68
- L_{G} = \lambda_{L1} L_{L1} + \lambda_{perc} L_{perc} + \lambda_{adv} L_{adv} + \lambda_{edge} L_{edge}
69
- \]
70
-
71
- ---
72
-
73
- ## 📊 Evaluation Metrics
74
-
75
- | Metric | Definition | Result |
76
- |---------|-------------|--------|
77
- | **L1 Loss** | Mean absolute difference between generated and ground truth IR | **0.0611** |
78
- | **PSNR (Peak Signal-to-Noise Ratio)** | Measures reconstruction quality (higher is better) | **24.3096 dB** |
79
- | **SSIM (Structural Similarity Index Measure)** | Perceptual similarity between generated & target images | **0.8386** |
80
-
81
- ---
82
- ## 🏗️ Model Architectures
83
-
84
- | Model | Visualization |
85
- |-------|---------------|
86
- | **Generator** | ![Generator Architecture](generator.png) |
87
- | **Discriminator** | ![Discriminator Architecture](discriminator.png) |
88
- | **Combined GAN** | ![GAN Architecture Combined](gan_architecture_combined.png) |
89
-
90
- ---
91
-
92
-
93
-
94
- ## 🖼️ Visual Results
95
-
96
- ### 🎞️ Training Progress (Sample Evolution)
97
- <img src="ezgif-58298bca2da920.gif" alt="Training Progress" width="700"/>
98
-
99
- ### ✨ Final Convergence Samples
100
- | Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
101
- |--------------------------------------|---------------------------------------|
102
- | <img src="./epoch_007.png" width="550"/> | <img src="epoch_100.png" width="550"/> |
103
-
104
- ### Comparison: Input vs Ground Truth vs Generated
105
- | RGB Input- Ground Truth IR - Predicted IR |
106
-
107
- | <img src="test_1179.png" width="750"/>
108
- | <img src="test_001.png" width="750"/>
109
- | <img src="test_4884.png" width="750"/>
110
- | <img src="test_5269.png" width="750"/>
111
- | <img src="test_5361.png" width="750"/>
112
- | <img src="test_7255.png" width="750"/>
113
- | <img src="test_7362.png" width="750"/>
114
- | <img src="test_12015.png" width="750"/>
115
- ---
116
-
117
- ## 📈 Loss Curves
118
-
119
- ### Generator & Discriminator Loss
120
- <img src="./train_loss_curve.png" alt="Training Loss Curve" width="600"/>
121
-
122
- ### Validation Loss per Epoch
123
- <img src="./val_loss_curve.png" alt="Validation Loss Curve" width="600"/>
124
-
125
- All training metrics are logged in:
126
-
127
- ---
128
- ```bash
129
- /
130
- ├── logs.log
131
- └── loss_summary.csv
132
- ```
133
- ## 🧩 Observations
134
-
135
- - The model **captures IR brightness and object distinction**, but early epochs show slight blur due to L1-dominant stages.
136
- - **Contrast and edge sharpness improve** after ~70 epochs as adversarial and perceptual losses gain weight.
137
- - Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
138
-
139
- ---
140
-
141
- ## 🚀 Future Work
142
-
143
- - Apply **feature matching loss** for smoother discriminator gradients
144
- - Introduce **spectral normalization** for training stability
145
- - Add **temporal or sequence consistency** for video IR translation
146
- - Adaptive loss balancing with epoch-based dynamic weighting
147
-
148
- ---
149
- ❤️ Acknowledgements
150
-
151
- LLVIP Dataset for paired RGB–IR samples
152
-
153
- TensorFlow and VGG-19 for perceptual feature extraction
154
-
155
- Kaggle GPU for high-performance model training
156
-
157
- ## 📜 License
158
-
159
- **MIT License © 2025**
160
- Author: **Sai Sumanth Appala**
161
-
162
- ---
163
-
164
- ## 🧾 Citation
165
-
166
- If you use this work, please cite:
167
-
168
- ```bibtex
169
- @misc{appala2025visible2ir,
170
- author = {Appala, Sai Sumanth},
171
- title = {Conditional GAN for Visible-to-Infrared Translation with Multi-Loss Training},
172
- year = {2025},
173
- license = {MIT},
174
- dataset = {UserNae3/LLVIP},
175
- framework = {TensorFlow},
176
- }
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ datasets:
4
+ - UserNae3/LLVIP
5
+ pipeline_tag: image-to-image
6
+ ---
7
+ # 🌙 Conditional GAN for Visible → Infrared (LLVIP)
8
+
9
+ > **High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization**
10
+
11
+ ---
12
+
13
+ ## 🧩 Overview
14
+
15
+ This project implements a **Conditional Generative Adversarial Network (cGAN)** trained to translate **visible-light (RGB)** images into **infrared (IR)** representations.
16
+
17
+ It leverages **multi-loss optimization** — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both **scene structure** and **thermal contrast**.
18
+
19
+ Greater emphasis is placed on the **L1 loss**, ensuring that overall brightness and object boundaries remain consistent between the visible and infrared domains.
20
+
21
+ ---
22
+
23
+ ## 📁 Dataset
24
+
25
+ - **Dataset:** [LLVIP Dataset](https://huggingface.co/datasets/UserNae3/LLVIP)
26
+ Paired **visible (RGB)** and **infrared (IR)** images under diverse lighting and background conditions.
27
+
28
+ ---
29
+
30
+ ## 🧠 Model Architecture
31
+
32
+ - **Type:** Conditional GAN (cGAN)
33
+ - **Direction:** *Visible → Infrared*
34
+ - **Framework:** TensorFlow
35
+ - **Pipeline Tag:** `image-to-image`
36
+ - **License:** MIT
37
+
38
+ ### 🧱 Generator
39
+ - U-Net encoder–decoder with skip connections
40
+ - Conditioned on RGB input
41
+ - Output: single-channel IR image
42
+
43
+ ### ⚔️ Discriminator
44
+ - PatchGAN (70×70 receptive field)
45
+ - Evaluates realism of local patches for fine detail learning
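+
+ A minimal tf.keras sketch of this generator/discriminator pairing, following the standard pix2pix layout. Filter counts, network depth, and the 256×256 input size are illustrative assumptions, not the exact training code.
+
+ ```python
+ # Sketch only: a small U-Net generator and a 70x70 PatchGAN discriminator.
+ import tensorflow as tf
+ from tensorflow.keras import layers
+
+ def downsample(filters, size, norm=True):
+     block = tf.keras.Sequential()
+     block.add(layers.Conv2D(filters, size, strides=2, padding="same", use_bias=not norm))
+     if norm:
+         block.add(layers.BatchNormalization())
+     block.add(layers.LeakyReLU(0.2))
+     return block
+
+ def upsample(filters, size):
+     return tf.keras.Sequential([
+         layers.Conv2DTranspose(filters, size, strides=2, padding="same", use_bias=False),
+         layers.BatchNormalization(),
+         layers.ReLU(),
+     ])
+
+ def build_generator():
+     inp = layers.Input(shape=(256, 256, 3))                     # RGB condition
+     skips, x = [], inp
+     for f in (64, 128, 256, 512, 512, 512):                     # encoder
+         x = downsample(f, 4, norm=(f != 64))(x)
+         skips.append(x)
+     for f, skip in zip((512, 512, 256, 128, 64), reversed(skips[:-1])):  # decoder
+         x = upsample(f, 4)(x)
+         x = layers.Concatenate()([x, skip])                     # skip connection
+     out = layers.Conv2DTranspose(1, 4, strides=2, padding="same",
+                                  activation="tanh")(x)          # single-channel IR
+     return tf.keras.Model(inp, out, name="unet_generator")
+
+ def build_discriminator():
+     rgb = layers.Input(shape=(256, 256, 3))
+     ir = layers.Input(shape=(256, 256, 1))
+     x = layers.Concatenate()([rgb, ir])                         # condition on the RGB input
+     for f in (64, 128, 256):                                    # three stride-2 blocks
+         x = downsample(f, 4, norm=(f != 64))(x)
+     x = layers.Conv2D(512, 4, strides=1, padding="same")(x)
+     x = layers.BatchNormalization()(x)
+     x = layers.LeakyReLU(0.2)(x)
+     patch = layers.Conv2D(1, 4, strides=1, padding="same")(x)   # per-patch real/fake logits
+     return tf.keras.Model([rgb, ir], patch, name="patchgan_discriminator")
+ ```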
46
+
47
+ ---
48
+
49
+ ## ⚙️ Training Configuration
50
+
51
+ | Setting | Value |
52
+ |----------|--------|
53
+ | **Epochs** | 100 |
54
+ | **Steps per Epoch** | 376 |
55
+ | **Batch Size** | 4 |
56
+ | **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
57
+ | **Learning Rate** | 2e-4 |
58
+ | **Precision** | Mixed (FP16/32) |
59
+ | **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
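+
+ A sketch of the optimizer and precision setup implied by this table; the exact training script may differ.
+
+ ```python
+ import tensorflow as tf
+
+ # Mixed precision: FP16 compute with FP32 master weights.
+ tf.keras.mixed_precision.set_global_policy("mixed_float16")
+
+ EPOCHS = 100
+ STEPS_PER_EPOCH = 376   # with batch size 4 this implies ~1,504 pairs seen per epoch (assumption)
+ BATCH_SIZE = 4
+
+ adam_args = dict(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
+ gen_optimizer = tf.keras.optimizers.Adam(**adam_args)
+ disc_optimizer = tf.keras.optimizers.Adam(**adam_args)
+
+ # In a custom training loop, the optimizers are usually wrapped for loss scaling:
+ # gen_optimizer = tf.keras.mixed_precision.LossScaleOptimizer(gen_optimizer)
+ ```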
60
+
61
+ ---
62
+
63
+ ## 💡 Multi-Loss Function Design
64
+
65
+ | Loss Type | Description | Weight (λ) | Purpose |
66
+ |------------|--------------|-------------|----------|
67
+ | **L1 Loss** | Pixel-wise mean absolute error between generated and real IR | **100** | Ensures global brightness & shape consistency |
68
+ | **Perceptual Loss (VGG)** | Feature loss from `conv5_block4` of pretrained VGG-19 | **10** | Captures high-level texture and semantic alignment |
69
+ | **Adversarial Loss** | Binary cross-entropy loss from PatchGAN discriminator | **1** | Encourages realistic IR texture generation |
70
+ | **Edge Loss** | Sobel/gradient difference between real & generated images | **5** | Enhances sharpness and edge clarity |
71
+
72
+ The **total generator loss** is computed as:
73
+ \[
74
+ L_{G} = \lambda_{L1} L_{L1} + \lambda_{perc} L_{perc} + \lambda_{adv} L_{adv} + \lambda_{edge} L_{edge}
75
+ \]
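+
+ A sketch of this weighted objective in TensorFlow. The λ values match the table; the image value ranges and VGG preprocessing are assumptions, and the feature layer is taken as `block5_conv4`, the tf.keras VGG-19 layer name corresponding to the `conv5_block4` cited above.
+
+ ```python
+ import tensorflow as tf
+
+ LAMBDA_L1, LAMBDA_PERC, LAMBDA_ADV, LAMBDA_EDGE = 100.0, 10.0, 1.0, 5.0
+ bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
+
+ # Frozen VGG-19 feature extractor for the perceptual term.
+ vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
+ feat = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)
+ feat.trainable = False
+
+ def perceptual_loss(real_ir, fake_ir):
+     # VGG expects 3-channel images in [0, 255]; IR tensors in [-1, 1] are rescaled and tiled (assumption).
+     def prep(x):
+         x = tf.image.grayscale_to_rgb((x + 1.0) * 127.5)
+         return tf.keras.applications.vgg19.preprocess_input(x)
+     return tf.reduce_mean(tf.abs(feat(prep(real_ir)) - feat(prep(fake_ir))))
+
+ def edge_loss(real_ir, fake_ir):
+     # Sobel gradient difference between real and generated IR.
+     return tf.reduce_mean(tf.abs(tf.image.sobel_edges(real_ir) - tf.image.sobel_edges(fake_ir)))
+
+ def generator_loss(disc_fake_logits, real_ir, fake_ir):
+     l1  = tf.reduce_mean(tf.abs(real_ir - fake_ir))
+     adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)
+     return (LAMBDA_L1 * l1
+             + LAMBDA_PERC * perceptual_loss(real_ir, fake_ir)
+             + LAMBDA_ADV * adv
+             + LAMBDA_EDGE * edge_loss(real_ir, fake_ir))
+ ```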
76
+
77
+ ---
78
+
79
+ ## 📊 Evaluation Metrics
80
+
81
+ | Metric | Definition | Result |
82
+ |---------|-------------|--------|
83
+ | **L1 Loss** | Mean absolute difference between generated and ground truth IR | **0.0611** |
84
+ | **PSNR (Peak Signal-to-Noise Ratio)** | Measures reconstruction quality (higher is better) | **24.3096 dB** |
85
+ | **SSIM (Structural Similarity Index Measure)** | Perceptual similarity between generated & target images | **0.8386** |
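+
+ These metrics can be reproduced with TensorFlow's built-in image ops, as sketched below; the value range (here assumed [0, 1]) must match whatever range the evaluation script actually uses.
+
+ ```python
+ import tensorflow as tf
+
+ def evaluate_pair(real_ir, fake_ir):
+     # real_ir / fake_ir: float tensors of shape [batch, H, W, 1] scaled to [0, 1] (assumption).
+     l1   = tf.reduce_mean(tf.abs(real_ir - fake_ir))
+     psnr = tf.reduce_mean(tf.image.psnr(real_ir, fake_ir, max_val=1.0))
+     ssim = tf.reduce_mean(tf.image.ssim(real_ir, fake_ir, max_val=1.0))
+     return l1, psnr, ssim
+ ```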
86
+
87
+ ---
88
+ ## 🏗️ Model Architectures
89
+
90
+ | Model | Visualization |
91
+ |-------|---------------|
92
+ | **Generator** | ![Generator Architecture](generator.png) |
93
+ | **Discriminator** | ![Discriminator Architecture](discriminator.png) |
94
+ | **Combined GAN** | ![GAN Architecture Combined](gan_architecture_combined.png) |
95
+
96
+ ---
97
+
98
+
99
+
100
+ ## 🖼️ Visual Results
101
+
102
+ ### 🎞️ Training Progress (Sample Evolution)
103
+ <img src="ezgif-58298bca2da920.gif" alt="Training Progress" width="700"/>
104
+
105
+ ### Final Convergence Samples
106
+ | Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
107
+ |--------------------------------------|---------------------------------------|
108
+ | <img src="./epoch_007.png" width="550"/> | <img src="epoch_100.png" width="550"/> |
109
+
110
+ ### Comparison: Input vs Ground Truth vs Generated
111
+ | RGB Input / Ground Truth IR / Predicted IR |
+ |---------------------------------------------|
+ | <img src="test_1179.png" width="750"/> |
+ | <img src="test_001.png" width="750"/> |
+ | <img src="test_4884.png" width="750"/> |
+ | <img src="test_5269.png" width="750"/> |
+ | <img src="test_5361.png" width="750"/> |
+ | <img src="test_7255.png" width="750"/> |
+ | <img src="test_7362.png" width="750"/> |
+ | <img src="test_12015.png" width="750"/> |
+
121
+ ---
122
+
123
+ ## 📈 Loss Curves
124
+
125
+ ### Generator & Discriminator Loss
126
+ <img src="./train_loss_curve.png" alt="Training Loss Curve" width="600"/>
127
+
128
+ ### Validation Loss per Epoch
129
+ <img src="./val_loss_curve.png" alt="Validation Loss Curve" width="600"/>
130
+
131
+ All training metrics are logged in:
132
+
133
+ ```text
+ /
+ ├── logs.log
+ └── loss_summary.csv
+ ```
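+
+ A small helper for re-plotting the curves from `loss_summary.csv`; the column names used here (`epoch`, `gen_loss`, `disc_loss`) are placeholders, since the CSV schema is not documented in this card.
+
+ ```python
+ import pandas as pd
+ import matplotlib.pyplot as plt
+
+ history = pd.read_csv("loss_summary.csv")
+ # Adapt the column names below to the actual CSV header.
+ history.plot(x="epoch", y=["gen_loss", "disc_loss"], figsize=(8, 4))
+ plt.xlabel("Epoch")
+ plt.ylabel("Loss")
+ plt.title("Generator vs. Discriminator loss")
+ plt.savefig("train_loss_curve_replot.png", dpi=150)
+ ```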
139
+ ## 🧩 Observations
140
+
141
+ - The model **captures IR brightness and object distinction**, though early epochs show slight blur while the heavily weighted L1 term dominates.
+ - **Contrast and edge sharpness improve** after ~70 epochs as the adversarial and perceptual terms exert more influence on the generated textures.
143
+ - Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
144
+
145
+ ---
146
+
147
+ ## 🚀 Future Work
148
+
149
+ - Apply **feature matching loss** for smoother discriminator gradients
150
+ - Introduce **spectral normalization** for training stability
151
+ - Add **temporal or sequence consistency** for video IR translation
152
+ - Adaptive loss balancing with epoch-based dynamic weighting
153
+
154
+ ---
155
+ ## ❤️ Acknowledgements
+
+ - **LLVIP Dataset** for paired RGB–IR samples
+ - **TensorFlow** and **VGG-19** for perceptual feature extraction
+ - **Kaggle GPU runtime** for high-performance model training
162
+
163
+ ## 📜 License
164
+
165
+ **MIT License © 2025**
166
+ Author: **Sai Sumanth Appala**
167
+
168
+ ---
169
+
170
+ ## 🧾 Citation
171
+
172
+ If you use this work, please cite:
173
+
174
+ ```bibtex
175
+ @misc{appala2025visible2ir,
176
+ author = {Appala, Sai Sumanth},
177
+ title = {Conditional GAN for Visible-to-Infrared Translation with Multi-Loss Training},
178
+ year = {2025},
179
+ license = {MIT},
180
+ dataset = {UserNae3/LLVIP},
181
+ framework = {TensorFlow},
182
+ }
+ ```