PurrBERT-v1.1 is a lightweight content-safety classifier built on top of DistilBERT.
It's designed to flag harmful or unsafe user prompts before they reach an AI assistant.
This model is trained on a combination of safety-labeled prompt datasets for binary classification (SAFE vs. FLAGGED), with labels 0 → SAFE and 1 → FLAGGED. It is fine-tuned from distilbert-base-uncased. Loss dropped steadily during training, and metrics were evaluated on a held-out test set.
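As a point of reference, here is a minimal fine-tuning sketch in the spirit of the setup above, using the Hugging Face `Trainer` API. The dataset files, column names, and hyperparameters are illustrative assumptions, not the exact recipe used to train PurrBERT.

```python
# Illustrative fine-tuning sketch (assumed setup, not PurrBERT's exact recipe).
from datasets import load_dataset
from transformers import (
    DistilBertForSequenceClassification,
    DistilBertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "SAFE", 1: "FLAGGED"},
    label2id={"SAFE": 0, "FLAGGED": 1},
)

# Hypothetical CSV files with "text" and "label" columns (0 = SAFE, 1 = FLAGGED).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch the examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Hyperparameters here are placeholders, not the values used for the released model.
args = TrainingArguments(
    output_dir="purrbert-out",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```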
On an Aegis test slice:
| Metric | v1 | v1.1 |
|---|---|---|
| Accuracy | 0.8050 | 0.8200 |
| Precision | 0.7731 | 0.8091 |
| Recall | 0.8846 | 0.8558 |
| F1 Score | 0.8251 | 0.8318 |
Latency per prompt on GPU: ~0.0230 sec
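The sketch below shows one generic way to compute these metrics and a per-prompt latency figure, using scikit-learn and `time.perf_counter`. The `texts` and `labels` lists are placeholders, not the actual Aegis slice.

```python
# Generic evaluation sketch; texts/labels are placeholders, not the real test data.
import time
import torch
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DistilBertForSequenceClassification.from_pretrained("purrgpt-community/purrbert-v1.1").to(device)
tokenizer = DistilBertTokenizerFast.from_pretrained("purrgpt-community/purrbert-v1.1")
model.eval()

texts = ["Hello, how are you?", "You are worthless and nobody likes you!"]  # placeholder prompts
labels = [0, 1]  # placeholder gold labels: 0 = SAFE, 1 = FLAGGED

preds, latencies = [], []
for text in texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    start = time.perf_counter()
    with torch.no_grad():
        # .item() forces the GPU to finish, so the timing covers the full forward pass.
        pred = model(**inputs).logits.argmax(dim=-1).item()
    latencies.append(time.perf_counter() - start)
    preds.append(pred)

print("Accuracy :", accuracy_score(labels, preds))
print("Precision:", precision_score(labels, preds))
print("Recall   :", recall_score(labels, preds))
print("F1 Score :", f1_score(labels, preds))
print(f"Mean latency per prompt: {sum(latencies) / len(latencies):.4f} sec")
```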
```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load the trained model and tokenizer
model = DistilBertForSequenceClassification.from_pretrained("purrgpt-community/purrbert-v1.1")
tokenizer = DistilBertTokenizerFast.from_pretrained("purrgpt-community/purrbert-v1.1")
model.eval()

def classify_prompt(prompt):
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    return "SAFE" if pred == 0 else "FLAGGED"

print(classify_prompt("You are worthless and nobody likes you!"))
# → FLAGGED
```
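Building on the snippet above, a probability can be read out through a softmax instead of a hard SAFE/FLAGGED label, which is useful if you want to apply your own threshold. The 0.9 cutoff below is an arbitrary example, not a tuned value.

```python
import torch.nn.functional as F

def flag_probability(prompt):
    """Return the model's probability that the prompt should be FLAGGED."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return F.softmax(logits, dim=-1)[0, 1].item()  # index 1 = FLAGGED

score = flag_probability("You are worthless and nobody likes you!")
print(f"FLAGGED probability: {score:.3f}")
if score > 0.9:  # example threshold; tune it for your own precision/recall trade-off
    print("Blocking this prompt.")
```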
PurrBERT is intended for moderating prompts before they're passed to AI models or for content-safety tasks. It is not a replacement for professional moderation in high-risk settings.
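As an illustration of that pre-moderation pattern, here is a sketch of gating prompts before they reach a downstream assistant. `call_assistant` is a hypothetical stand-in for whatever model or API you actually use, and `classify_prompt` is the helper defined above.

```python
def call_assistant(prompt):
    # Hypothetical downstream model/API call; replace with your own assistant.
    return f"(assistant reply to: {prompt!r})"

def moderated_chat(prompt):
    # Run PurrBERT first; only forward SAFE prompts to the assistant.
    if classify_prompt(prompt) == "FLAGGED":
        return "Sorry, I can't help with that request."
    return call_assistant(prompt)

print(moderated_chat("What's the weather like today?"))
print(moderated_chat("You are worthless and nobody likes you!"))
```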
Base model: distilbert/distilbert-base-uncased