Conditional GAN for Visible → Infrared (LLVIP)
High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization
Overview
This project implements a Conditional Generative Adversarial Network (cGAN) trained to translate visible-light (RGB) images into infrared (IR) representations.
It leverages multi-loss optimization — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both scene structure and thermal contrast.
The L1 term carries the highest weight, keeping overall brightness and object boundaries consistent between the visible and infrared domains.
Dataset
- Dataset: LLVIP Dataset, providing paired visible (RGB) and infrared (IR) images captured under diverse lighting and background conditions.
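A minimal input-pipeline sketch for loading the paired images (a sketch, assuming the standard LLVIP folder layout `visible/train` and `infrared/train`, a 256×256 working resolution, and [-1, 1] scaling; adjust to the actual setup):

```python
import tensorflow as tf

IMG_SIZE = 256  # assumed training resolution

def load_pair(vis_path, ir_path):
    # Visible image: 3-channel RGB, scaled to [-1, 1]
    vis = tf.io.decode_jpeg(tf.io.read_file(vis_path), channels=3)
    vis = tf.cast(tf.image.resize(vis, [IMG_SIZE, IMG_SIZE]), tf.float32) / 127.5 - 1.0
    # Infrared target: single channel, same scaling
    ir = tf.io.decode_jpeg(tf.io.read_file(ir_path), channels=1)
    ir = tf.cast(tf.image.resize(ir, [IMG_SIZE, IMG_SIZE]), tf.float32) / 127.5 - 1.0
    return vis, ir

vis_files = tf.data.Dataset.list_files("LLVIP/visible/train/*.jpg", shuffle=False)
ir_files = tf.data.Dataset.list_files("LLVIP/infrared/train/*.jpg", shuffle=False)
train_ds = (tf.data.Dataset.zip((vis_files, ir_files))
            .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(512)
            .batch(4)
            .prefetch(tf.data.AUTOTUNE))
```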
Model Architecture
- Type: Conditional GAN (cGAN)
- Direction: Visible → Infrared
- Framework: TensorFlow
- Pipeline Tag: image-to-image
- License: MIT
Generator
- U-Net encoder–decoder with skip connections
- Conditioned on RGB input
- Output: single-channel IR image
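A compact sketch of the U-Net generator described above (filter counts and depth are illustrative assumptions; the tanh output matches a single-channel IR target in [-1, 1]):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def downsample(filters):
    return tf.keras.Sequential([
        layers.Conv2D(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
    ])

def upsample(filters):
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

def build_generator():
    inp = layers.Input(shape=[256, 256, 3])  # conditioning RGB image
    # Encoder: progressively halve spatial resolution
    d1 = downsample(64)(inp)   # 128x128
    d2 = downsample(128)(d1)   # 64x64
    d3 = downsample(256)(d2)   # 32x32
    d4 = downsample(512)(d3)   # 16x16
    # Decoder with skip connections back to encoder features
    u1 = layers.Concatenate()([upsample(256)(d4), d3])
    u2 = layers.Concatenate()([upsample(128)(u1), d2])
    u3 = layers.Concatenate()([upsample(64)(u2), d1])
    # Single-channel IR output in [-1, 1]
    out = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(u3)
    return Model(inp, out)
```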
Discriminator
- PatchGAN discriminator that scores local patches of the concatenated (RGB, IR) pair rather than the whole image, pushing the generator toward realistic fine detail
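A sketch of the conditional PatchGAN discriminator (the layer configuration is an assumption; the defining property is one realism logit per local patch instead of a single score per image):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_discriminator():
    rgb = layers.Input(shape=[256, 256, 3])  # conditioning input
    ir = layers.Input(shape=[256, 256, 1])   # real or generated IR
    x = layers.Concatenate()([rgb, ir])
    for filters, stride in [(64, 2), (128, 2), (256, 2), (512, 1)]:
        x = layers.Conv2D(filters, 4, strides=stride, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    # One logit per receptive-field patch; BCE is applied patch-wise
    return Model([rgb, ir], layers.Conv2D(1, 4, strides=1, padding="same")(x))
```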
⚙️ Training Configuration
| Setting | Value |
|---|---|
| Epochs | 100 |
| Steps per Epoch | 376 |
| Batch Size | 4 |
| Optimizer | Adam (β₁ = 0.5, β₂ = 0.999) |
| Learning Rate | 2e-4 |
| Precision | Mixed (32) |
| Hardware | NVIDIA T4 (Kaggle GPU Runtime) |
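The optimizer rows of the table map directly onto Keras (a sketch; separate optimizers for the generator and discriminator are assumed, as is standard for GAN training):

```python
import tensorflow as tf

# Assumed interpretation of "Mixed (32)": float16 compute with float32 variables
tf.keras.mixed_precision.set_global_policy("mixed_float16")

gen_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
disc_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
```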
Multi-Loss Function Design
| Loss Type | Description | Weight (λ) | Purpose |
|---|---|---|---|
| L1 Loss | Pixel-wise mean absolute error between generated and real IR | 100 | Ensures global brightness & shape consistency |
| Perceptual Loss (VGG) | Feature loss from block5_conv4 of pretrained VGG-19 | 10 | Captures high-level texture and semantic alignment |
| Adversarial Loss | Binary cross-entropy loss from PatchGAN discriminator | 1 | Encourages realistic IR texture generation |
| Edge Loss | Sobel/gradient difference between real & generated images | 5 | Enhances sharpness and edge clarity |
The total generator loss is computed as:
$$
L_{G} = \lambda_{L1}\,L_{L1} + \lambda_{\text{perc}}\,L_{\text{perc}} + \lambda_{\text{adv}}\,L_{\text{adv}} + \lambda_{\text{edge}}\,L_{\text{edge}}
$$
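Putting the table and the equation together, a sketch of the total generator loss in TensorFlow (the VGG preprocessing of the single-channel IR images is an assumption):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Perceptual feature extractor: block5_conv4 of an ImageNet-pretrained VGG-19
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feat = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)
feat.trainable = False

def to_vgg_input(x):
    # Map single-channel [-1, 1] IR to 3-channel [0, 255] VGG input (assumed convention)
    return tf.keras.applications.vgg19.preprocess_input(
        tf.image.grayscale_to_rgb((x + 1.0) * 127.5))

def generator_loss(disc_fake_logits, fake_ir, real_ir):
    l1 = tf.reduce_mean(tf.abs(real_ir - fake_ir))
    perc = tf.reduce_mean(tf.abs(feat(to_vgg_input(real_ir)) - feat(to_vgg_input(fake_ir))))
    adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)
    # tf.image.sobel_edges returns stacked dy/dx gradient maps
    edge = tf.reduce_mean(tf.abs(tf.image.sobel_edges(real_ir) - tf.image.sobel_edges(fake_ir)))
    return 100.0 * l1 + 10.0 * perc + 1.0 * adv + 5.0 * edge
```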
Evaluation Metrics
| Metric | Definition | Result |
|---|---|---|
| L1 Loss | Mean absolute difference between generated and ground truth IR | 0.0611 |
| PSNR (Peak Signal-to-Noise Ratio) | Measures reconstruction quality (higher is better) | 24.3096 dB |
| SSIM (Structural Similarity Index Measure) | Perceptual similarity between generated & target images | 0.8386 |
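The table's figures can be reproduced with TensorFlow's built-in image metrics (a sketch; assumes outputs and targets in [-1, 1], rescaled to [0, 1] before scoring):

```python
import tensorflow as tf

def evaluate(generator, dataset):
    l1s, psnrs, ssims = [], [], []
    for rgb, real_ir in dataset:
        fake_ir = generator(rgb, training=False)
        # Rescale from [-1, 1] to [0, 1] for PSNR/SSIM
        real = (real_ir + 1.0) / 2.0
        fake = (fake_ir + 1.0) / 2.0
        l1s.append(tf.reduce_mean(tf.abs(real - fake)))
        psnrs.append(tf.reduce_mean(tf.image.psnr(real, fake, max_val=1.0)))
        ssims.append(tf.reduce_mean(tf.image.ssim(real, fake, max_val=1.0)))
    return {k: float(tf.reduce_mean(v)) for k, v in
            [("l1", l1s), ("psnr", psnrs), ("ssim", ssims)]}
```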
Data Exploration
We analysed the LLVIP dataset and found that ~70% of image pairs were captured below 50 lux and ~30% at 50–200 lux. The average pedestrian height in the IR channel was X pixels; outliers under 20 pixels in height were excluded.
Visual Results
Training Progress (Sample Evolution)
✨ Final Convergence Samples
| Early Epochs (Blurry, Low Brightness) | Later Epochs (Sharper, High Contrast) |
|---|---|
| ![]() | ![]() |
Comparison: Input vs Ground Truth vs Generated
| RGB Input | Ground Truth IR | Predicted IR |
|---|---|---|
Loss Curves
Generator & Discriminator Loss
Validation Loss per Epoch
All training metrics are logged in:

```
/
├── logs.log
└── loss_summary.csv
```
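A sketch for plotting the curves from loss_summary.csv (the column names epoch, gen_loss, disc_loss, and val_loss are assumptions about the CSV schema):

```python
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv("loss_summary.csv")  # assumed columns: epoch, gen_loss, disc_loss, val_loss
plt.plot(log["epoch"], log["gen_loss"], label="Generator")
plt.plot(log["epoch"], log["disc_loss"], label="Discriminator")
plt.plot(log["epoch"], log["val_loss"], label="Validation")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.savefig("loss_curves.png")
```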
Observations
- The model captures IR brightness and object distinction, but early epochs show slight blur due to L1-dominant stages.
- Contrast and edge sharpness improve after ~70 epochs as the adversarial and perceptual terms exert more influence on the output.
- Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
- We compared three variants: (i) U-Net regression (L1 only) → SSIM = 0.80; (ii) cGAN with L1 + adversarial → SSIM = 0.83; (iii) cGAN with L1 + adversarial + perceptual + edge (our final model) → SSIM = 0.8386.
Future Work
- Apply feature matching loss for smoother discriminator gradients
- Add temporal or sequence consistency for video IR translation
- Adaptive loss balancing with epoch-based dynamic weighting
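One possible shape for the epoch-based dynamic weighting in the last bullet (purely illustrative; the ramp schedule and constants are assumptions, not something implemented in this project):

```python
def loss_weights(epoch, total_epochs=100):
    # Keep L1 dominant early; ramp the perceptual/adversarial influence
    # up to full strength by ~70% of training
    ramp = min(1.0, epoch / (0.7 * total_epochs))
    return {"l1": 100.0, "perc": 10.0 * ramp, "adv": 1.0 * ramp, "edge": 5.0}
```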
Acknowledgements
- LLVIP Dataset for paired RGB–IR samples
- TensorFlow and VGG-19 for perceptual feature extraction
- Kaggle GPU runtime for high-performance model training




