DGA Transformer Encoder

A custom transformer-based model for detecting Domain Generation Algorithm (DGA) domains used in malware command-and-control (C2) infrastructure.

Model Details

  • Architecture: Custom Transformer Encoder (4 layers, 256 dimensions, 4 attention heads)
  • Parameters: ~3.2M (see the parameter-count check below)
  • Training Data: ExtraHop DGA dataset (500K balanced samples)
  • Performance: 96.78% F1 score on test set
  • Inference Speed: <1ms per domain (GPU), ~10ms (CPU)
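
To sanity-check the parameter count, load the model (the same from_pretrained call used under Usage) and sum its tensors; this is a quick verification sketch, not part of the required workflow.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("ccss17/dga-transformer-encoder")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")  # expected to print roughly 3.2M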

Usage

from transformers import AutoModelForSequenceClassification
import torch

# Character encoding
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_IDX = {c: i + 1 for i, c in enumerate(CHARSET)}
PAD = 0

def encode_domain(domain: str, max_len: int = 64):
    ids = [CHAR_TO_IDX.get(c, PAD) for c in domain.lower()]
    ids = ids[:max_len]
    ids = ids + [PAD] * (max_len - len(ids))
    return ids
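
# Worked example (values follow from CHAR_TO_IDX above; characters outside
# CHARSET fall back to PAD):
# encode_domain("ab.com", max_len=8) -> [1, 2, 38, 3, 15, 13, 0, 0]
#   'a'=1, 'b'=2, '.'=38, 'c'=3, 'o'=15, 'm'=13, then two PAD entries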

# Load model
model = AutoModelForSequenceClassification.from_pretrained("ccss17/dga-transformer-encoder")
model.eval()

# Classify a domain
def predict(domain: str):
    input_ids = torch.tensor([encode_domain(domain, max_len=64)])
    with torch.no_grad():
        logits = model(input_ids).logits
        probs = torch.softmax(logits, dim=-1)
        pred = torch.argmax(probs).item()
    
    label = "Legitimate" if pred == 0 else "DGA (Malicious)"
    confidence = probs[0, pred].item()
    return label, confidence

# Examples
print(predict("google.com"))        # ('Legitimate', 0.998)
print(predict("xjkd8f2h.com"))      # ('DGA (Malicious)', 0.976)

Try it on HuggingFace Spaces

🚀 Interactive Demo

Training Details

  • Framework: PyTorch + HuggingFace Transformers
  • Optimizer: AdamW
  • Learning Rate: 3e-4 with linear warmup
  • Batch Size: 2048 (effective, via gradient accumulation; see the sketch below)
  • Epochs: 5 (early stopping at epoch 2.4)
  • Loss: CrossEntropyLoss
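
The training script itself is not included in this card; the snippet below only sketches how the listed settings fit together (AdamW, linear warmup via transformers' get_linear_schedule_with_warmup, CrossEntropyLoss, and gradient accumulation). The micro-batch size, step counts, and train_loader are illustrative assumptions.

import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=10_000)  # illustrative step counts
criterion = torch.nn.CrossEntropyLoss()
accum_steps = 16  # e.g. micro-batch of 128 x 16 steps = effective batch 2048

model.train()
for step, (input_ids, labels) in enumerate(train_loader):  # train_loader is assumed
    logits = model(input_ids).logits
    loss = criterion(logits, labels) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()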

Model Architecture

Input: Domain string (e.g., "google.com")
  ↓
Character Tokenization: [g, o, o, g, l, e, ., c, o, m]
  ↓
Embedding Layer: 256-dim vectors
  ↓
Positional Encoding: Add position information
  ↓
Transformer Encoder (4 layers):
  - Multi-head Self-Attention (4 heads)
  - Feed-Forward Network (1024 hidden)
  - Layer Normalization
  - Residual Connections
  ↓
[CLS] Token Pooling: Extract sequence representation
  ↓
Classification Head: Linear(256 → 2)
  ↓
Output: [P(Legitimate), P(DGA)]
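
The diagram translates fairly directly into PyTorch. The module below is a minimal sketch of an equivalent architecture, not the released implementation; the learned [CLS] embedding and learned positional encoding are assumptions.

import torch
import torch.nn as nn

class DGATransformerEncoder(nn.Module):
    def __init__(self, vocab_size=39, d_model=256, nhead=4, num_layers=4,
                 dim_ff=1024, max_len=65, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)  # 38 chars + PAD
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))            # assumed learned [CLS] embedding
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))      # assumed learned positions (64 + [CLS])
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_ff, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):                         # (B, L) character ids, PAD = 0
        x = self.embed(input_ids)                         # (B, L, 256)
        cls = self.cls.expand(x.size(0), -1, -1)          # prepend [CLS] to every sequence
        x = torch.cat([cls, x], dim=1) + self.pos[:, : input_ids.size(1) + 1]
        pad = input_ids.eq(0)                             # mask PAD positions
        mask = torch.cat([pad.new_zeros(pad.size(0), 1), pad], dim=1)  # [CLS] never masked
        x = self.encoder(x, src_key_padding_mask=mask)    # 4 layers, 4 heads, FFN 1024
        return self.head(x[:, 0])                         # logits over [Legitimate, DGA]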

Performance

Metric              Score
F1 Score (Macro)    96.78%
F1 Score (Binary)   96.78%
Accuracy            96.78%
Precision           96.5%
Recall              97.1%

Confusion Matrix (Test Set):

              Predicted Legit    Predicted DGA
True Legit    24,180             820
True DGA      790                24,210
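
For reference, the reported numbers can be recomputed from model predictions with scikit-learn; the helper below is a sketch, where y_true and y_pred are 0/1 label lists (0 = legitimate, 1 = DGA) for the test split.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def report_metrics(y_true, y_pred):
    print("Macro F1 :", f1_score(y_true, y_pred, average="macro"))
    print("Binary F1:", f1_score(y_true, y_pred))
    print("Accuracy :", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall   :", recall_score(y_true, y_pred))
    print(confusion_matrix(y_true, y_pred))  # rows = true label, columns = predicted label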

Limitations

  • Trained primarily on English domains
  • May not generalize to all DGA families (e.g., dictionary-based DGAs)
  • Expects a bare domain, without protocol or path, for best performance (see the normalization sketch after this list)
  • ~3% false positive rate
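
Because the model expects a bare domain, URLs should be normalized before classification. One possible sketch using only the standard library (the function name is illustrative):

from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    # Strip scheme, userinfo, port, and path so only the host name is classified.
    netloc = urlparse(url if "://" in url else "//" + url).netloc
    return netloc.split("@")[-1].split(":")[0].lower()

print(extract_domain("https://Google.com/search?q=x"))  # google.com
print(extract_domain("xjkd8f2h.com"))                   # xjkd8f2h.com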

Citation

If you use this model, please cite:

@misc{dga-transformer-encoder,
  author = {ccss17},
  title = {DGA Transformer Encoder},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ccss17/dga-transformer-encoder}
}

License

MIT License


Built with ❤️ using PyTorch, HuggingFace Transformers, and Gradio
