DGA Transformer Encoder
A custom transformer-based model for detecting Domain Generation Algorithm (DGA) domains used in malware C2 infrastructure.
Model Details
- Architecture: Custom Transformer Encoder (4 layers, 256 dimensions, 4 attention heads)
- Parameters: 3.2M
- Training Data: ExtraHop DGA dataset (500K balanced samples)
- Performance: 96.78% F1 score on test set
- Inference Speed: <1ms per domain (GPU), ~10ms (CPU)
Usage
from transformers import AutoModelForSequenceClassification
import torch
# Character encoding
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_IDX = {c: i + 1 for i, c in enumerate(CHARSET)}
PAD = 0
def encode_domain(domain: str, max_len: int = 64):
ids = [CHAR_TO_IDX.get(c, PAD) for c in domain.lower()]
ids = ids[:max_len]
ids = ids + [PAD] * (max_len - len(ids))
return ids
# Load model
model = AutoModelForSequenceClassification.from_pretrained("ccss17/dga-transformer-encoder")
model.eval()
# Classify a domain
def predict(domain: str):
input_ids = torch.tensor([encode_domain(domain, max_len=64)])
with torch.no_grad():
logits = model(input_ids).logits
probs = torch.softmax(logits, dim=-1)
pred = torch.argmax(probs).item()
label = "Legitimate" if pred == 0 else "DGA (Malicious)"
confidence = probs[0, pred].item()
return label, confidence
# Examples
print(predict("google.com")) # ('Legitimate', 0.998)
print(predict("xjkd8f2h.com")) # ('DGA (Malicious)', 0.976)
Try it on HuggingFace Spaces
π Interactive Demo
Training Details
- Framework: PyTorch + HuggingFace Transformers
- Optimizer: AdamW
- Learning Rate: 3e-4 with linear warmup
- Batch Size: 2048 (gradient accumulation)
- Epochs: 5 (early stopping at epoch 2.4)
- Loss: CrossEntropyLoss
Model Architecture
Input: Domain string (e.g., "google.com")
β
Character Tokenization: [g, o, o, g, l, e, ., c, o, m]
β
Embedding Layer: 256-dim vectors
β
Positional Encoding: Add position information
β
Transformer Encoder (4 layers):
- Multi-head Self-Attention (4 heads)
- Feed-Forward Network (1024 hidden)
- Layer Normalization
- Residual Connections
β
[CLS] Token Pooling: Extract sequence representation
β
Classification Head: Linear(256 β 2)
β
Output: [P(Legitimate), P(DGA)]
Performance
| Metric | Score |
|---|---|
| F1 Score (Macro) | 96.78% |
| F1 Score (Binary) | 96.78% |
| Accuracy | 96.78% |
| Precision | 96.5% |
| Recall | 97.1% |
Confusion Matrix (Test Set):
| Predicted Legit | Predicted DGA | |
|---|---|---|
| True Legit | 24,180 | 820 |
| True DGA | 790 | 24,210 |
Limitations
- Trained primarily on English domains
- May not generalize to all DGA families (e.g., dictionary-based DGAs)
- Requires domain without protocol/path for best performance
- ~3% false positive rate
Citation
If you use this model, please cite:
@misc{dga-transformer-encoder,
author = {ccss17},
title = {DGA Transformer Encoder},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/ccss17/dga-transformer-encoder}
}
References
License
MIT License
Built with β€οΈ using PyTorch, HuggingFace Transformers, and Gradio
- Downloads last month
- 13
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
π
Ask for provider support
Space using ccss17/dga-transformer-encoder 1
Evaluation results
- F1 Score on ExtraHop DGA Datasetself-reported0.968
- Accuracy on ExtraHop DGA Datasetself-reported0.968