---
license: mit
tags:
- image-classification
- deepfake-detection
- computer-vision
- vision-transformer
- sdxl
- fake-face-detection
datasets:
- xhlulu/140k-real-and-fake-faces
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SDXL-Deepfake-Detector
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: 140k Real and Fake Faces
      type: xhlulu/140k-real-and-fake-faces
    metrics:
    - type: accuracy
      value: 0.86
      name: Accuracy
---

# SDXL-Deepfake-Detector

### Detecting AI-Generated Faces with Precision and Purpose

> *Not just another classifier — a tool for digital truth.*

Developed by **[Sadra Milani Moghaddam](https://sadramilani.ir/)**

---

## Why This Matters

As generative AI (like SDXL, DALL·E, and Midjourney) becomes more accessible, the line between real and synthetic media blurs — especially for vulnerable communities. This project started as a technical experiment but evolved into a **privacy-aware, open-source defense** against visual misinformation, with a focus on **ethical AI deployment**.

---

## Model Overview

**SDXL-Deepfake-Detector** is a fine-tuned vision transformer that classifies human faces as **artificial (0)** or **human (1)**, achieving an accuracy of **86%**.

## Training Approach

This model was obtained by **fine-tuning** the [`Organika/sdxl-detector`](https://huggingface.co/Organika/sdxl-detector) — a vision transformer pre-trained specifically to detect SDXL-generated faces — on the [140k Real and Fake Faces](https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces) dataset.

This approach leverages:

- Prior knowledge of SDXL artifacts from the base model
- Broader generalization from a large-scale real/fake face dataset
- Efficient training on limited hardware (single RTX 3060)

The result is a lightweight, high-accuracy detector optimized for **both SDXL and general diffusion-based deepfakes**.
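
The original training script is not reproduced on this card; the sketch below only illustrates how a fine-tune of this kind can be initialized from the base checkpoint with this model's label mapping. The `ignore_mismatched_sizes` flag and the processor choice are assumptions made for illustration, not details of the actual run.

```python
# Illustrative sketch: initialize a fine-tune from the Organika/sdxl-detector
# checkpoint with the label mapping used by this model (0 = artificial, 1 = human).
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

base_checkpoint = "Organika/sdxl-detector"
id2label = {0: "artificial", 1: "human"}
label2id = {label: idx for idx, label in id2label.items()}

model = AutoModelForImageClassification.from_pretrained(
    base_checkpoint,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # allow a fresh classification head if shapes differ
)
feature_extractor = AutoFeatureExtractor.from_pretrained(base_checkpoint)
```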

### Key Highlights

- **Architecture**: Fine-tuned Vision Transformer (ViT) via Hugging Face `transformers`
- **Dataset**: 140k balanced real/fake face images
- **License**: [MIT](https://opensource.org/licenses/MIT) — free for research and commercial use
- **Hardware**: Trained on a single NVIDIA RTX 3060 (12GB VRAM) — proving high impact doesn’t require massive resources

---

## Quick Start

### Dependencies

```bash
pip install transformers torch pillow
```

### Python Script

```python
# predict.py
import argparse
import os

import torch
from PIL import Image
from transformers import AutoModelForImageClassification, AutoFeatureExtractor


def main():
    parser = argparse.ArgumentParser(
        description="Classify an image as 'artificial' or 'human' using the SDXL-Deepfake-Detector."
    )
    parser.add_argument("--image", type=str, required=True, help="Path to the input image file")
    args = parser.parse_args()

    # Validate image path
    if not os.path.isfile(args.image):
        raise FileNotFoundError(f"Image file not found: {args.image}")

    # Load model and feature extractor from the Hugging Face Hub
    model_name = "SADRACODING/SDXL-Deepfake-Detector"
    print(f"Loading model '{model_name}'...")
    model = AutoModelForImageClassification.from_pretrained(model_name)
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

    # Set device (GPU if available)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()
    print(f"Running on device: {device}")

    # Load and preprocess the image
    image = Image.open(args.image).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt").to(device)

    # Inference
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_class_idx]

    # Output
    print("Prediction Result")
    print(f"Class Index: {predicted_class_idx}")
    print(f"Label: {predicted_label}")


if __name__ == "__main__":
    main()
```

### How to Use

```bash
python predict.py --image path/to/image
```
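
If you prefer a one-liner over the full script, the same checkpoint can also be loaded through the high-level `pipeline` API. This is a minimal sketch rather than part of the repository:

```python
# Minimal sketch: the same model via the transformers pipeline API.
from transformers import pipeline

detector = pipeline("image-classification", model="SADRACODING/SDXL-Deepfake-Detector")

# Returns a list of {"label", "score"} dicts, highest score first.
print(detector("path/to/image.jpg"))
```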

## Performance & Limitations

> **Note**: Final test accuracy will be reported after full evaluation. Preliminary results show strong generalization on SDXL- and diffusion-based face forgeries.

### Known Limitations

- Trained primarily on **frontal, well-lit, aligned face crops** — may underperform on:
  - Low-resolution or blurry images
  - Heavily occluded or non-frontal faces
  - GAN-generated faces (e.g., StyleGAN2/3)
- Label mapping:
  - `0` → `"artificial"` (AI-generated / Deepfake)
  - `1` → `"human"` (authentic human face)

> ⚠️ This tool is **not a forensic proof**, but a probabilistic detector. Use responsibly.
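
Because the detector is probabilistic, a confidence score is often more informative than the hard label printed by the script above. A minimal sketch, assuming `model` and `inputs` have been prepared as in `predict.py`:

```python
# Minimal sketch: report per-class probabilities instead of only the argmax label.
import torch

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]  # one row per image; here a single image
for idx, score in enumerate(probs.tolist()):
    # id2label maps 0 -> "artificial", 1 -> "human"
    print(f"{model.config.id2label[idx]}: {score:.3f}")
```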

---

## Philosophy & Ethics

This model is open-source because:

- **Transparency** is essential in the fight against synthetic media.
- **Accessibility** ensures researchers, journalists, and civil society can audit and use detection tools without gatekeeping.
- **Privacy matters**: The model runs **entirely offline** — your images never leave your device.
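
The only network access is the one-time download of the weights; inference itself stays on your machine. If you want to guarantee that later runs never contact the Hub, you can force loading from the local cache. This is an optional sketch, not part of `predict.py`:

```python
# Optional sketch: after the first download, load strictly from the local cache
# so no network requests are made at all.
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

model_name = "SADRACODING/SDXL-Deepfake-Detector"
model = AutoModelForImageClassification.from_pretrained(model_name, local_files_only=True)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name, local_files_only=True)
# Alternatively, set the environment variable HF_HUB_OFFLINE=1 before running.
```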

As a developer from a vulnerable community, I believe AI safety tools must be **inclusive, ethical, and human-centered** — not just technically accurate.

---

## Acknowledgements

- **Dataset**: [140k Real and Fake Faces](https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces) by xhlulu
- **Framework**: [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- **Model & Code**: [GitHub Repository](https://github.com/SadraCoding/SDXL-Deepfake-Detector) | [Hugging Face Hub](https://huggingface.co/SADRACODING/SDXL-Deepfake-Detector)

---

## How to Contribute

Fine-tune this model on your own domain-specific data using the Hugging Face `Trainer` API; a minimal starting point is sketched below.
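
The sketch is only a starting point, not the original training script. It assumes the `datasets` library is installed, an ImageFolder-style directory with one subfolder per class, and placeholder hyperparameters that you should tune for your own data and hardware:

```python
# Illustrative fine-tuning sketch with the Hugging Face Trainer.
# Assumptions: `pip install datasets`, images laid out as data/train/<class>/*.jpg,
# and hyperparameters chosen only as placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoFeatureExtractor,
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

model_name = "SADRACODING/SDXL-Deepfake-Detector"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

dataset = load_dataset("imagefolder", data_dir="data")

def preprocess(batch):
    # Convert PIL images to pixel values; keep the integer class labels.
    inputs = feature_extractor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

dataset = dataset.with_transform(preprocess)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

training_args = TrainingArguments(
    output_dir="sdxl-deepfake-detector-finetuned",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
    remove_unused_columns=False,  # keep the "image" column so the transform can see it
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    data_collator=collate_fn,
)
trainer.train()
```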

---

> *Built with curiosity, ethics, and a 12GB GPU — because impactful AI doesn’t require a data center, just purpose.*
>
> — Sadra Milani Moghaddam