DGA Transformer Encoder

A custom transformer-based model for detecting Domain Generation Algorithm (DGA) domains used in malware command-and-control (C2) infrastructure.

Model Details

  • Architecture: Custom Transformer Encoder (4 layers, 256 dimensions, 4 attention heads)
  • Parameters: ~3.2M (see the parameter-count check below)
  • Training Data: ExtraHop DGA dataset (500K balanced samples)
  • Performance: 96.78% F1 score on test set
  • Inference Speed: <1ms per domain (GPU), ~10ms (CPU)
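
To sanity-check the parameter count, load the model (the same from_pretrained call used under Usage) and sum its tensors; this is a quick verification sketch, not part of the required workflow.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("ccss17/dga-transformer-encoder")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")  # expected to print roughly 3.2M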

Usage

from transformers import AutoModelForSequenceClassification
import torch

# Character encoding
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_IDX = {c: i + 1 for i, c in enumerate(CHARSET)}
PAD = 0

def encode_domain(domain: str, max_len: int = 64):
    ids = [CHAR_TO_IDX.get(c, PAD) for c in domain.lower()]
    ids = ids[:max_len]
    ids = ids + [PAD] * (max_len - len(ids))
    return ids
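
# Worked example (values follow from CHAR_TO_IDX above; characters outside
# CHARSET fall back to PAD):
# encode_domain("ab.com", max_len=8) -> [1, 2, 38, 3, 15, 13, 0, 0]
#   'a'=1, 'b'=2, '.'=38, 'c'=3, 'o'=15, 'm'=13, then two PAD entries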

# Load model
model = AutoModelForSequenceClassification.from_pretrained("ccss17/dga-transformer-encoder")
model.eval()

# Classify a domain
def predict(domain: str):
    input_ids = torch.tensor([encode_domain(domain, max_len=64)])
    with torch.no_grad():
        logits = model(input_ids).logits
        probs = torch.softmax(logits, dim=-1)
        pred = torch.argmax(probs).item()
    
    label = "Legitimate" if pred == 0 else "DGA (Malicious)"
    confidence = probs[0, pred].item()
    return label, confidence

# Examples
print(predict("google.com"))        # ('Legitimate', 0.998)
print(predict("xjkd8f2h.com"))      # ('DGA (Malicious)', 0.976)

Try it on HuggingFace Spaces

🚀 Interactive Demo

Training Details

  • Framework: PyTorch + HuggingFace Transformers
  • Optimizer: AdamW
  • Learning Rate: 3e-4 with linear warmup
  • Batch Size: 2048 (effective, via gradient accumulation; see the sketch below)
  • Epochs: 5 (early stopping at epoch 2.4)
  • Loss: CrossEntropyLoss
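
The training script itself is not included in this card; the snippet below only sketches how the listed settings fit together (AdamW, linear warmup via transformers' get_linear_schedule_with_warmup, CrossEntropyLoss, and gradient accumulation). The micro-batch size, step counts, and train_loader are illustrative assumptions.

import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=10_000)  # illustrative step counts
criterion = torch.nn.CrossEntropyLoss()
accum_steps = 16  # e.g. micro-batch of 128 x 16 steps = effective batch 2048

model.train()
for step, (input_ids, labels) in enumerate(train_loader):  # train_loader is assumed
    logits = model(input_ids).logits
    loss = criterion(logits, labels) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()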

Model Architecture

Input: Domain string (e.g., "google.com")
  ↓
Character Tokenization: [g, o, o, g, l, e, ., c, o, m]
  ↓
Embedding Layer: 256-dim vectors
  ↓
Positional Encoding: Add position information
  ↓
Transformer Encoder (4 layers):
  - Multi-head Self-Attention (4 heads)
  - Feed-Forward Network (1024 hidden)
  - Layer Normalization
  - Residual Connections
  ↓
[CLS] Token Pooling: Extract sequence representation
  ↓
Classification Head: Linear(256 → 2)
  ↓
Output: [P(Legitimate), P(DGA)]
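
The diagram translates fairly directly into PyTorch. The module below is a minimal sketch of an equivalent architecture, not the released implementation; the learned [CLS] embedding and learned positional encoding are assumptions.

import torch
import torch.nn as nn

class DGATransformerEncoder(nn.Module):
    def __init__(self, vocab_size=39, d_model=256, nhead=4, num_layers=4,
                 dim_ff=1024, max_len=65, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)  # 38 chars + PAD
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))            # assumed learned [CLS] embedding
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))      # assumed learned positions (64 + [CLS])
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_ff, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):                         # (B, L) character ids, PAD = 0
        x = self.embed(input_ids)                         # (B, L, 256)
        cls = self.cls.expand(x.size(0), -1, -1)          # prepend [CLS] to every sequence
        x = torch.cat([cls, x], dim=1) + self.pos[:, : input_ids.size(1) + 1]
        pad = input_ids.eq(0)                             # mask PAD positions
        mask = torch.cat([pad.new_zeros(pad.size(0), 1), pad], dim=1)  # [CLS] never masked
        x = self.encoder(x, src_key_padding_mask=mask)    # 4 layers, 4 heads, FFN 1024
        return self.head(x[:, 0])                         # logits over [Legitimate, DGA]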

Performance

Metric              Score
F1 Score (Macro)    96.78%
F1 Score (Binary)   96.78%
Accuracy            96.78%
Precision           96.5%
Recall              97.1%

Confusion Matrix (Test Set):

              Predicted Legit    Predicted DGA
True Legit    24,180             820
True DGA      790                24,210
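
For reference, the reported numbers can be recomputed from model predictions with scikit-learn; the helper below is a sketch, where y_true and y_pred are 0/1 label lists (0 = legitimate, 1 = DGA) for the test split.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def report_metrics(y_true, y_pred):
    print("Macro F1 :", f1_score(y_true, y_pred, average="macro"))
    print("Binary F1:", f1_score(y_true, y_pred))
    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print(confusion_matrix(y_true, y_pred))  # rows = true label, columns = predicted label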

Limitations

  • Trained primarily on English domains
  • May not generalize to all DGA families (e.g., dictionary-based DGAs)
  • Expects a bare domain, without protocol or path, for best performance (see the normalization sketch after this list)
  • ~3% false positive rate
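
Because the model expects a bare domain, URLs should be normalized before classification. One possible sketch using only the standard library (the function name is illustrative):

from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    # Strip scheme, userinfo, port, and path so only the host name is classified.
    netloc = urlparse(url if "://" in url else "//" + url).netloc
    return netloc.split("@")[-1].split(":")[0].lower()

print(extract_domain("https://Google.com/search?q=x"))  # google.com
print(extract_domain("xjkd8f2h.com"))                   # xjkd8f2h.com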

Citation

If you use this model, please cite:

@misc{dga-transformer-encoder,
  author = {ccss17},
  title = {DGA Transformer Encoder},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ccss17/dga-transformer-encoder}
}

License

MIT License


Built with ❤️ using PyTorch, HuggingFace Transformers, and Gradio
