hash-map committed on
Commit 1a24ba3 · verified · 1 parent: 5e61f2a

Update README.md

Files changed (1)
  1. README.md +182 -176
README.md CHANGED
@@ -1,176 +1,182 @@
1
- # 🌙 Conditional GAN for Visible → Infrared (LLVIP)
2
-
3
- > **High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization**
4
-
5
- ---
6
-
7
- ## 🧩 Overview
8
-
9
- This project implements a **Conditional Generative Adversarial Network (cGAN)** trained to translate **visible-light (RGB)** images into **infrared (IR)** representations.
10
-
11
- It leverages **multi-loss optimization** — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both **scene structure** and **thermal contrast**.
12
-
13
- A higher emphasis is given to **L1 loss**, ensuring that overall brightness and object boundaries remain consistent between visible and infrared domains.
14
-
15
- ---
16
-
17
- ## 📁 Dataset
18
-
19
- - **Dataset:** [LLVIP Dataset](https://huggingface.co/datasets/UserNae3/LLVIP)
20
- Paired **visible (RGB)** and **infrared (IR)** images under diverse lighting and background conditions.
21
-
22
- ---
23
-
24
- ## 🧠 Model Architecture
25
-
26
- - **Type:** Conditional GAN (cGAN)
27
- - **Direction:** *Visible → Infrared*
28
- - **Framework:** TensorFlow
29
- - **Pipeline Tag:** `image-to-image`
30
- - **License:** MIT
31
-
32
- ### 🧱 Generator
33
- - U-Net encoder–decoder with skip connections
34
- - Conditioned on RGB input
35
- - Output: single-channel IR image
36
-
37
- ### ⚔️ Discriminator
38
- - PatchGAN (70×70 receptive field)
39
- - Evaluates realism of local patches for fine detail learning
40
-
41
- ---
42
-
43
- ## ⚙️ Training Configuration
44
-
45
- | Setting | Value |
46
- |----------|--------|
47
- | **Epochs** | 100 |
48
- | **Steps per Epoch** | 376 |
49
- | **Batch Size** | 4 |
50
- | **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
51
- | **Learning Rate** | 2e-4 |
52
- | **Precision** | Mixed (FP16/32) |
53
- | **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
54
-
55
- ---
56
-
57
- ## 💡 Multi-Loss Function Design
58
-
59
- | Loss Type | Description | Weight (λ) | Purpose |
60
- |------------|--------------|-------------|----------|
61
- | **L1 Loss** | Pixel-wise mean absolute error between generated and real IR | **100** | Ensures global brightness & shape consistency |
62
- | **Perceptual Loss (VGG)** | Feature loss from `conv5_block4` of pretrained VGG-19 | **10** | Captures high-level texture and semantic alignment |
63
- | **Adversarial Loss** | Binary cross-entropy loss from PatchGAN discriminator | **1** | Encourages realistic IR texture generation |
64
- | **Edge Loss** | Sobel/gradient difference between real & generated images | **5** | Enhances sharpness and edge clarity |
65
-
66
- The **total generator loss** is computed as:
67
- \[
68
- L_{G} = \lambda_{L1} L_{L1} + \lambda_{perc} L_{perc} + \lambda_{adv} L_{adv} + \lambda_{edge} L_{edge}
69
- \]
70
-
71
- ---
72
-
73
- ## 📊 Evaluation Metrics
74
-
75
- | Metric | Definition | Result |
76
- |---------|-------------|--------|
77
- | **L1 Loss** | Mean absolute difference between generated and ground truth IR | **0.0611** |
78
- | **PSNR (Peak Signal-to-Noise Ratio)** | Measures reconstruction quality (higher is better) | **24.3096 dB** |
79
- | **SSIM (Structural Similarity Index Measure)** | Perceptual similarity between generated & target images | **0.8386** |
80
-
81
- ---
82
- ## 🏗️ Model Architectures
83
-
84
- | Model | Visualization |
85
- |-------|---------------|
86
- | **Generator** | ![Generator Architecture](generator.png) |
87
- | **Discriminator** | ![Discriminator Architecture](discriminator.png) |
88
- | **Combined GAN** | ![GAN Architecture Combined](gan_architecture_combined.png) |
89
-
90
- ---
91
-
92
-
93
-
94
- ## 🖼️ Visual Results
95
-
96
- ### 🎞️ Training Progress (Sample Evolution)
97
- <img src="ezgif-58298bca2da920.gif" alt="Training Progress" width="700"/>
98
-
99
- ### ✨ Final Convergence Samples
100
- | Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
101
- |--------------------------------------|---------------------------------------|
102
- | <img src="./epoch_007.png" width="550"/> | <img src="epoch_100.png" width="550"/> |
103
-
104
- ### Comparison: Input vs Ground Truth vs Generated
105
- | RGB Input- Ground Truth IR - Predicted IR |
106
-
107
- | <img src="test_1179.png" width="750"/>
108
- | <img src="test_001.png" width="750"/>
109
- | <img src="test_4884.png" width="750"/>
110
- | <img src="test_5269.png" width="750"/>
111
- | <img src="test_5361.png" width="750"/>
112
- | <img src="test_7255.png" width="750"/>
113
- | <img src="test_7362.png" width="750"/>
114
- | <img src="test_12015.png" width="750"/>
115
- ---
116
-
117
- ## 📈 Loss Curves
118
-
119
- ### Generator & Discriminator Loss
120
- <img src="./train_loss_curve.png" alt="Training Loss Curve" width="600"/>
121
-
122
- ### Validation Loss per Epoch
123
- <img src="./val_loss_curve.png" alt="Validation Loss Curve" width="600"/>
124
-
125
- All training metrics are logged in:
126
-
127
- ---
128
- ```bash
129
- /
130
- ├── logs.log
131
- └── loss_summary.csv
132
- ```
133
- ## 🧩 Observations
134
-
135
- - The model **captures IR brightness and object distinction**, but early epochs show slight blur due to L1-dominant stages.
136
- - **Contrast and edge sharpness improve** after ~70 epochs as adversarial and perceptual losses gain weight.
137
- - Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
138
-
139
- ---
140
-
141
- ## 🚀 Future Work
142
-
143
- - Apply **feature matching loss** for smoother discriminator gradients
144
- - Introduce **spectral normalization** for training stability
145
- - Add **temporal or sequence consistency** for video IR translation
146
- - Adaptive loss balancing with epoch-based dynamic weighting
147
-
148
- ---
149
- ❤️ Acknowledgements
150
-
151
- LLVIP Dataset for paired RGB–IR samples
152
-
153
- TensorFlow and VGG-19 for perceptual feature extraction
154
-
155
- Kaggle GPU for high-performance model training
156
-
157
- ## 📜 License
158
-
159
- **MIT License © 2025**
160
- Author: **Sai Sumanth Appala**
161
-
162
- ---
163
-
164
- ## 🧾 Citation
165
-
166
- If you use this work, please cite:
167
-
168
- ```bibtex
169
- @misc{appala2025visible2ir,
170
- author = {Appala, Sai Sumanth},
171
- title = {Conditional GAN for Visible-to-Infrared Translation with Multi-Loss Training},
172
- year = {2025},
173
- license = {MIT},
174
- dataset = {UserNae3/LLVIP},
175
- framework = {TensorFlow},
176
- }
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ datasets:
4
+ - UserNae3/LLVIP
5
+ pipeline_tag: image-to-image
6
+ ---
7
+ # 🌙 Conditional GAN for Visible → Infrared (LLVIP)
8
+
9
+ > **High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization**
10
+
11
+ ---
12
+
13
+ ## 🧩 Overview
14
+
15
+ This project implements a **Conditional Generative Adversarial Network (cGAN)** trained to translate **visible-light (RGB)** images into **infrared (IR)** representations.
16
+
17
+ It leverages **multi-loss optimization** — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both **scene structure** and **thermal contrast**.
18
+
19
+ Greater emphasis is placed on the **L1 loss**, ensuring that overall brightness and object boundaries remain consistent between the visible and infrared domains.
20
+
21
+ ---
22
+
23
+ ## 📁 Dataset
24
+
25
+ - **Dataset:** [LLVIP Dataset](https://huggingface.co/datasets/UserNae3/LLVIP)
26
+ Paired **visible (RGB)** and **infrared (IR)** images under diverse lighting and background conditions.
27
+
28
+ ---
29
+
30
+ ## 🧠 Model Architecture
31
+
32
+ - **Type:** Conditional GAN (cGAN)
33
+ - **Direction:** *Visible → Infrared*
34
+ - **Framework:** TensorFlow
35
+ - **Pipeline Tag:** `image-to-image`
36
+ - **License:** MIT
37
+
38
+ ### 🧱 Generator
39
+ - U-Net encoder–decoder with skip connections
40
+ - Conditioned on RGB input
41
+ - Output: single-channel IR image
42
+
43
+ ### ⚔️ Discriminator
44
+ - PatchGAN (70×70 receptive field)
45
+ - Evaluates realism of local patches for fine detail learning
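+
+ A minimal tf.keras sketch of this generator/discriminator pairing, following the standard pix2pix layout. Filter counts, network depth, and the 256×256 input size are illustrative assumptions, not the exact training code.
+
+ ```python
+ # Sketch only: a small U-Net generator and a 70x70 PatchGAN discriminator.
+ import tensorflow as tf
+ from tensorflow.keras import layers
+
+ def downsample(filters, size, norm=True):
+     block = tf.keras.Sequential()
+     block.add(layers.Conv2D(filters, size, strides=2, padding="same", use_bias=not norm))
+     if norm:
+         block.add(layers.BatchNormalization())
+     block.add(layers.LeakyReLU(0.2))
+     return block
+
+ def upsample(filters, size):
+     return tf.keras.Sequential([
+         layers.Conv2DTranspose(filters, size, strides=2, padding="same", use_bias=False),
+         layers.BatchNormalization(),
+         layers.ReLU(),
+     ])
+
+ def build_generator():
+     inp = layers.Input(shape=(256, 256, 3))                     # RGB condition
+     skips, x = [], inp
+     for f in (64, 128, 256, 512, 512, 512):                     # encoder
+         x = downsample(f, 4, norm=(f != 64))(x)
+         skips.append(x)
+     for f, skip in zip((512, 512, 256, 128, 64), reversed(skips[:-1])):  # decoder
+         x = upsample(f, 4)(x)
+         x = layers.Concatenate()([x, skip])                     # skip connection
+     out = layers.Conv2DTranspose(1, 4, strides=2, padding="same",
+                                  activation="tanh")(x)          # single-channel IR
+     return tf.keras.Model(inp, out, name="unet_generator")
+
+ def build_discriminator():
+     rgb = layers.Input(shape=(256, 256, 3))
+     ir = layers.Input(shape=(256, 256, 1))
+     x = layers.Concatenate()([rgb, ir])                         # condition on the RGB input
+     for f in (64, 128, 256):                                    # three stride-2 blocks
+         x = downsample(f, 4, norm=(f != 64))(x)
+     x = layers.Conv2D(512, 4, strides=1, padding="same")(x)
+     x = layers.BatchNormalization()(x)
+     x = layers.LeakyReLU(0.2)(x)
+     patch = layers.Conv2D(1, 4, strides=1, padding="same")(x)   # per-patch real/fake logits
+     return tf.keras.Model([rgb, ir], patch, name="patchgan_discriminator")
+ ```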
46
+
47
+ ---
48
+
49
+ ## ⚙️ Training Configuration
50
+
51
+ | Setting | Value |
52
+ |----------|--------|
53
+ | **Epochs** | 100 |
54
+ | **Steps per Epoch** | 376 |
55
+ | **Batch Size** | 4 |
56
+ | **Optimizer** | Adam (β₁ = 0.5, β₂ = 0.999) |
57
+ | **Learning Rate** | 2e-4 |
58
+ | **Precision** | Mixed (FP16/32) |
59
+ | **Hardware** | NVIDIA T4 (Kaggle GPU Runtime) |
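+
+ A sketch of the optimizer and precision setup implied by this table; the exact training script may differ.
+
+ ```python
+ import tensorflow as tf
+
+ # Mixed precision: FP16 compute with FP32 master weights.
+ tf.keras.mixed_precision.set_global_policy("mixed_float16")
+
+ EPOCHS = 100
+ STEPS_PER_EPOCH = 376   # with batch size 4 this implies ~1,504 pairs seen per epoch (assumption)
+ BATCH_SIZE = 4
+
+ adam_args = dict(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
+ gen_optimizer = tf.keras.optimizers.Adam(**adam_args)
+ disc_optimizer = tf.keras.optimizers.Adam(**adam_args)
+
+ # In a custom training loop, the optimizers are usually wrapped for loss scaling:
+ # gen_optimizer = tf.keras.mixed_precision.LossScaleOptimizer(gen_optimizer)
+ ```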
60
+
61
+ ---
62
+
63
+ ## 💡 Multi-Loss Function Design
64
+
65
+ | Loss Type | Description | Weight (λ) | Purpose |
66
+ |------------|--------------|-------------|----------|
67
+ | **L1 Loss** | Pixel-wise mean absolute error between generated and real IR | **100** | Ensures global brightness & shape consistency |
68
+ | **Perceptual Loss (VGG)** | Feature loss from `conv5_block4` of pretrained VGG-19 | **10** | Captures high-level texture and semantic alignment |
69
+ | **Adversarial Loss** | Binary cross-entropy loss from PatchGAN discriminator | **1** | Encourages realistic IR texture generation |
70
+ | **Edge Loss** | Sobel/gradient difference between real & generated images | **5** | Enhances sharpness and edge clarity |
71
+
72
+ The **total generator loss** is computed as:
73
+ \[
74
+ L_{G} = \lambda_{L1} L_{L1} + \lambda_{perc} L_{perc} + \lambda_{adv} L_{adv} + \lambda_{edge} L_{edge}
75
+ \]
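+
+ A sketch of this weighted objective in TensorFlow. The λ values match the table; the image value ranges and VGG preprocessing are assumptions, and the feature layer is taken as `block5_conv4`, the tf.keras VGG-19 layer name corresponding to the `conv5_block4` cited above.
+
+ ```python
+ import tensorflow as tf
+
+ LAMBDA_L1, LAMBDA_PERC, LAMBDA_ADV, LAMBDA_EDGE = 100.0, 10.0, 1.0, 5.0
+ bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
+
+ # Frozen VGG-19 feature extractor for the perceptual term.
+ vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
+ feat = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)
+ feat.trainable = False
+
+ def perceptual_loss(real_ir, fake_ir):
+     # VGG expects 3-channel images in [0, 255]; IR tensors in [-1, 1] are rescaled and tiled (assumption).
+     def prep(x):
+         x = tf.image.grayscale_to_rgb((x + 1.0) * 127.5)
+         return tf.keras.applications.vgg19.preprocess_input(x)
+     return tf.reduce_mean(tf.abs(feat(prep(real_ir)) - feat(prep(fake_ir))))
+
+ def edge_loss(real_ir, fake_ir):
+     # Sobel gradient difference between real and generated IR.
+     return tf.reduce_mean(tf.abs(tf.image.sobel_edges(real_ir) - tf.image.sobel_edges(fake_ir)))
+
+ def generator_loss(disc_fake_logits, real_ir, fake_ir):
+     l1  = tf.reduce_mean(tf.abs(real_ir - fake_ir))
+     adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)
+     return (LAMBDA_L1 * l1
+             + LAMBDA_PERC * perceptual_loss(real_ir, fake_ir)
+             + LAMBDA_ADV * adv
+             + LAMBDA_EDGE * edge_loss(real_ir, fake_ir))
+ ```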
76
+
77
+ ---
78
+
79
+ ## 📊 Evaluation Metrics
80
+
81
+ | Metric | Definition | Result |
82
+ |---------|-------------|--------|
83
+ | **L1 Loss** | Mean absolute difference between generated and ground truth IR | **0.0611** |
84
+ | **PSNR (Peak Signal-to-Noise Ratio)** | Measures reconstruction quality (higher is better) | **24.3096 dB** |
85
+ | **SSIM (Structural Similarity Index Measure)** | Perceptual similarity between generated & target images | **0.8386** |
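+
+ These metrics can be reproduced with TensorFlow's built-in image ops, as sketched below; the value range (here assumed [0, 1]) must match whatever range the evaluation script actually uses.
+
+ ```python
+ import tensorflow as tf
+
+ def evaluate_pair(real_ir, fake_ir):
+     # real_ir / fake_ir: float tensors of shape [batch, H, W, 1] scaled to [0, 1] (assumption).
+     l1   = tf.reduce_mean(tf.abs(real_ir - fake_ir))
+     psnr = tf.reduce_mean(tf.image.psnr(real_ir, fake_ir, max_val=1.0))
+     ssim = tf.reduce_mean(tf.image.ssim(real_ir, fake_ir, max_val=1.0))
+     return l1, psnr, ssim
+ ```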
86
+
87
+ ---
88
+ ## 🏗️ Model Architectures
89
+
90
+ | Model | Visualization |
91
+ |-------|---------------|
92
+ | **Generator** | ![Generator Architecture](generator.png) |
93
+ | **Discriminator** | ![Discriminator Architecture](discriminator.png) |
94
+ | **Combined GAN** | ![GAN Architecture Combined](gan_architecture_combined.png) |
95
+
96
+ ---
97
+
98
+
99
+
100
+ ## 🖼️ Visual Results
101
+
102
+ ### 🎞️ Training Progress (Sample Evolution)
103
+ <img src="ezgif-58298bca2da920.gif" alt="Training Progress" width="700"/>
104
+
105
+ ### Final Convergence Samples
106
+ | Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
107
+ |--------------------------------------|---------------------------------------|
108
+ | <img src="./epoch_007.png" width="550"/> | <img src="epoch_100.png" width="550"/> |
109
+
110
+ ### Comparison: Input vs Ground Truth vs Generated
111
+ | RGB Input / Ground Truth IR / Predicted IR |
+ |---------------------------------------------|
+ | <img src="test_1179.png" width="750"/> |
+ | <img src="test_001.png" width="750"/> |
+ | <img src="test_4884.png" width="750"/> |
+ | <img src="test_5269.png" width="750"/> |
+ | <img src="test_5361.png" width="750"/> |
+ | <img src="test_7255.png" width="750"/> |
+ | <img src="test_7362.png" width="750"/> |
+ | <img src="test_12015.png" width="750"/> |
+
121
+ ---
122
+
123
+ ## 📈 Loss Curves
124
+
125
+ ### Generator & Discriminator Loss
126
+ <img src="./train_loss_curve.png" alt="Training Loss Curve" width="600"/>
127
+
128
+ ### Validation Loss per Epoch
129
+ <img src="./val_loss_curve.png" alt="Validation Loss Curve" width="600"/>
130
+
131
+ All training metrics are logged in:
132
+
133
+ ```text
+ /
+ ├── logs.log
+ └── loss_summary.csv
+ ```
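+
+ A small helper for re-plotting the curves from `loss_summary.csv`; the column names used here (`epoch`, `gen_loss`, `disc_loss`) are placeholders, since the CSV schema is not documented in this card.
+
+ ```python
+ import pandas as pd
+ import matplotlib.pyplot as plt
+
+ history = pd.read_csv("loss_summary.csv")
+ # Adapt the column names below to the actual CSV header.
+ history.plot(x="epoch", y=["gen_loss", "disc_loss"], figsize=(8, 4))
+ plt.xlabel("Epoch")
+ plt.ylabel("Loss")
+ plt.title("Generator vs. Discriminator loss")
+ plt.savefig("train_loss_curve_replot.png", dpi=150)
+ ```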
139
+ ## 🧩 Observations
140
+
141
+ - The model **captures IR brightness and object distinction**, though early epochs show slight blur while the heavily weighted L1 term dominates.
+ - **Contrast and edge sharpness improve** after ~70 epochs as the adversarial and perceptual terms exert more influence on the generated textures.
143
+ - Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
144
+
145
+ ---
146
+
147
+ ## 🚀 Future Work
148
+
149
+ - Apply **feature matching loss** for smoother discriminator gradients
150
+ - Introduce **spectral normalization** for training stability
151
+ - Add **temporal or sequence consistency** for video IR translation
152
+ - Adaptive loss balancing with epoch-based dynamic weighting
153
+
154
+ ---
155
+ ## ❤️ Acknowledgements
+
+ - **LLVIP Dataset** for paired RGB–IR samples
+ - **TensorFlow** and **VGG-19** for perceptual feature extraction
+ - **Kaggle GPU runtime** for high-performance model training
162
+
163
+ ## 📜 License
164
+
165
+ **MIT License © 2025**
166
+ Author: **Sai Sumanth Appala**
167
+
168
+ ---
169
+
170
+ ## 🧾 Citation
171
+
172
+ If you use this work, please cite:
173
+
174
+ ```bibtex
175
+ @misc{appala2025visible2ir,
176
+ author = {Appala, Sai Sumanth},
177
+ title = {Conditional GAN for Visible-to-Infrared Translation with Multi-Loss Training},
178
+ year = {2025},
179
+ license = {MIT},
180
+ dataset = {UserNae3/LLVIP},
181
+ framework = {TensorFlow},
182
+ }
+ ```