Conditional GAN for Visible → Infrared (LLVIP)

High-fidelity Visible-to-Infrared Translation using a Conditional GAN with Multi-Loss Optimization


Overview

This project implements a Conditional Generative Adversarial Network (cGAN) trained to translate visible-light (RGB) images into infrared (IR) representations.

It leverages multi-loss optimization — combining perceptual, pixel, adversarial, and edge-based objectives — to generate sharp, realistic IR outputs that preserve both scene structure and thermal contrast.

Greater emphasis is placed on the L1 loss, ensuring that overall brightness and object boundaries remain consistent between the visible and infrared domains.


Dataset

  • Dataset: LLVIP Dataset
    Paired visible (RGB) and infrared (IR) images under diverse lighting and background conditions (a minimal loading sketch follows below).
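
For concreteness, here is a minimal tf.data loading sketch. The directory layout, 256×256 resolution, and [-1, 1] normalization are assumptions for illustration, not the exact training setup:

```python
import tensorflow as tf

IMG_SIZE = 256  # assumed training resolution

def load_pair(vis_path, ir_path):
    """Load one visible/IR pair and normalize both images to [-1, 1]."""
    vis = tf.io.decode_jpeg(tf.io.read_file(vis_path), channels=3)
    ir = tf.io.decode_jpeg(tf.io.read_file(ir_path), channels=1)
    vis = tf.image.resize(tf.cast(vis, tf.float32), (IMG_SIZE, IMG_SIZE)) / 127.5 - 1.0
    ir = tf.image.resize(tf.cast(ir, tf.float32), (IMG_SIZE, IMG_SIZE)) / 127.5 - 1.0
    return vis, ir

# Assumed layout: LLVIP ships parallel visible/ and infrared/ folders.
vis_files = sorted(tf.io.gfile.glob("LLVIP/visible/train/*.jpg"))
ir_files = sorted(tf.io.gfile.glob("LLVIP/infrared/train/*.jpg"))
dataset = (tf.data.Dataset.from_tensor_slices((vis_files, ir_files))
           .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(256)
           .batch(4)
           .prefetch(tf.data.AUTOTUNE))
```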

Model Architecture

  • Type: Conditional GAN (cGAN)
  • Direction: Visible → Infrared
  • Framework: TensorFlow
  • Pipeline Tag: image-to-image
  • License: MIT

Generator

  • U-Net encoder–decoder with skip connections
  • Conditioned on RGB input
  • Output: single-channel IR image (a sketch of this generator follows below)
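
A minimal sketch of the U-Net described above. The filter counts, depth, and kernel sizes are assumptions in the style of pix2pix defaults, not the exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def downsample(filters):
    # strided conv halves spatial resolution
    return tf.keras.Sequential([
        layers.Conv2D(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
    ])

def upsample(filters):
    # transposed conv doubles spatial resolution
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

def build_generator(img_size=256):
    """U-Net: RGB input -> single-channel IR output, with skip connections."""
    inputs = layers.Input(shape=(img_size, img_size, 3))
    down_stack = [downsample(f) for f in (64, 128, 256, 512)]
    up_stack = [upsample(f) for f in (256, 128, 64)]

    x, skips = inputs, []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    # concatenate each decoder stage with the matching encoder feature map
    for up, skip in zip(up_stack, reversed(skips[:-1])):
        x = up(x)
        x = layers.Concatenate()([x, skip])
    # tanh matches inputs normalized to [-1, 1]
    outputs = layers.Conv2DTranspose(1, 4, strides=2, padding="same",
                                     activation="tanh")(x)
    return tf.keras.Model(inputs, outputs)
```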

Discriminator

  • PatchGAN discriminator: classifies local patches of the (RGB, IR) pair as real or fake, driving fine-detail learning (sketched below)
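
A corresponding PatchGAN sketch, conditioned on the RGB input as in the cGAN setup; layer widths and depth are pix2pix-style assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(img_size=256):
    """PatchGAN: (RGB condition, IR image) -> grid of per-patch real/fake logits."""
    vis = layers.Input(shape=(img_size, img_size, 3))
    ir = layers.Input(shape=(img_size, img_size, 1))
    x = layers.Concatenate()([vis, ir])
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(512, 4, strides=1, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    # each output logit scores one local patch of the conditioned pair
    logits = layers.Conv2D(1, 4, strides=1, padding="same")(x)
    return tf.keras.Model([vis, ir], logits)
```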

⚙️ Training Configuration

| Setting | Value |
|---|---|
| Epochs | 100 |
| Steps per Epoch | 376 |
| Batch Size | 4 |
| Optimizer | Adam (β₁ = 0.5, β₂ = 0.999) |
| Learning Rate | 2e-4 |
| Precision | Mixed (32) |
| Hardware | NVIDIA T4 (Kaggle GPU Runtime) |
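
The optimizer settings above translate directly to Keras. The mixed-precision policy line is an assumption about how "Mixed (32)" was configured:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Assumption: standard Keras mixed precision (float16 compute, float32 variables).
mixed_precision.set_global_policy("mixed_float16")

gen_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
disc_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
```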

Multi-Loss Function Design

| Loss Type | Description | Weight (λ) | Purpose |
|---|---|---|---|
| L1 Loss | Pixel-wise mean absolute error between generated and real IR | 100 | Ensures global brightness & shape consistency |
| Perceptual Loss (VGG) | Feature loss from conv5_block4 of pretrained VGG-19 | 10 | Captures high-level texture and semantic alignment |
| Adversarial Loss | Binary cross-entropy loss from PatchGAN discriminator | 1 | Encourages realistic IR texture generation |
| Edge Loss | Sobel/gradient difference between real & generated images | 5 | Enhances sharpness and edge clarity |

The total generator loss is computed as:
$$
L_{G} = \lambda_{L1}\,L_{L1} + \lambda_{\text{perc}}\,L_{\text{perc}} + \lambda_{\text{adv}}\,L_{\text{adv}} + \lambda_{\text{edge}}\,L_{\text{edge}}
$$
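
A minimal sketch of this weighted sum, assuming PatchGAN logits and images in [-1, 1]. Note that the Keras name for the cited VGG-19 layer is "block5_conv4":

```python
import tensorflow as tf

LAMBDA_L1, LAMBDA_PERC, LAMBDA_ADV, LAMBDA_EDGE = 100.0, 10.0, 1.0, 5.0

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feat = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)

def generator_loss(disc_fake_logits, real_ir, fake_ir):
    # pixel-wise L1 term
    l1 = tf.reduce_mean(tf.abs(real_ir - fake_ir))
    # adversarial term: generator wants fake patches classified as real
    adv = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)
    # VGG expects 3-channel inputs in [0, 255]; tile the IR channel
    def prep(x):
        x = tf.image.grayscale_to_rgb((x + 1.0) * 127.5)
        return tf.keras.applications.vgg19.preprocess_input(x)
    perc = tf.reduce_mean(tf.abs(feat(prep(real_ir)) - feat(prep(fake_ir))))
    # Sobel gradient difference for the edge term
    edge = tf.reduce_mean(tf.abs(tf.image.sobel_edges(real_ir)
                                 - tf.image.sobel_edges(fake_ir)))
    return LAMBDA_L1 * l1 + LAMBDA_PERC * perc + LAMBDA_ADV * adv + LAMBDA_EDGE * edge
```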

Evaluation Metrics

| Metric | Definition | Result |
|---|---|---|
| L1 Loss | Mean absolute difference between generated and ground-truth IR | 0.0611 |
| PSNR (Peak Signal-to-Noise Ratio) | Measures reconstruction quality (higher is better) | 24.3096 dB |
| SSIM (Structural Similarity Index Measure) | Perceptual similarity between generated & target images | 0.8386 |
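
These metrics can be reproduced with tf.image utilities; a sketch assuming 4-D batches of generator outputs in [-1, 1]:

```python
import tensorflow as tf

def evaluate_pair(real_ir, fake_ir):
    """Compute L1 / PSNR / SSIM on [batch, h, w, 1] images, rescaled to [0, 1]."""
    real = (real_ir + 1.0) / 2.0
    fake = (fake_ir + 1.0) / 2.0
    l1 = tf.reduce_mean(tf.abs(real - fake))
    psnr = tf.reduce_mean(tf.image.psnr(real, fake, max_val=1.0))
    ssim = tf.reduce_mean(tf.image.ssim(real, fake, max_val=1.0))
    return l1, psnr, ssim
```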

Model Architectures

Architecture diagrams for the Generator, the Discriminator, and the combined GAN are included in the repository.

Data Exploration

We analysed the LLVIP dataset and found that ~70% of image pairs were captured below 50 lux and ~30% at 50–200 lux. The average pedestrian height in the IR channel was X pixels; outliers shorter than 20 pixels were excluded.

Visual Results

Training Progress (Sample Evolution)

Training Progress

✨ Final Convergence Samples

Side-by-side panels: early epochs (blurry, low brightness) vs. later epochs (sharper, high contrast).

Comparison: Input vs Ground Truth vs Generated

Side-by-side panels show the RGB input, the ground-truth IR, and the predicted IR.

Loss Curves

Generator & Discriminator Loss

Training Loss Curve

Validation Loss per Epoch

Validation Loss Curve

All training metrics are logged in:


/
├── logs.log
└── loss_summary.csv

Observations

  • The model captures IR brightness and object distinction, but early epochs show slight blur due to L1-dominant stages.
  • Contrast and edge sharpness improve after ~70 epochs as adversarial and perceptual losses gain weight.
  • Background variations in LLVIP introduce challenges; future fine-tuning on domain-aligned subsets can further improve realism.
  • We compared three variants: (i) U-Net regression (L1 only) → SSIM = 0.80; (ii) cGAN with L1 + adv → SSIM = 0.83; (iii) cGAN with L1 + adv + perc + edge (our final model) → SSIM = 0.8386.

Future Work

  • Apply feature matching loss for smoother discriminator gradients
  • Add temporal or sequence consistency for video IR translation
  • Explore adaptive loss balancing with epoch-based dynamic weighting (a minimal schedule sketch follows)
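
A hypothetical epoch-based schedule illustrating the idea; the ramp shape and the epoch-70 knee are assumptions motivated by the observations above, not an implemented method:

```python
def loss_weights(epoch, total_epochs=100):
    """Hypothetical schedule: keep L1 dominant early, ramp up adversarial/perceptual terms."""
    ramp = min(1.0, epoch / (0.7 * total_epochs))  # reach full strength around epoch 70
    return {"l1": 100.0, "perc": 10.0 * ramp, "adv": 1.0 * ramp, "edge": 5.0}
```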

Acknowledgements

  • LLVIP Dataset for paired RGB–IR samples
  • TensorFlow and VGG-19 for perceptual feature extraction
  • Kaggle GPU for high-performance model training
