# Multilingual Toxicity Classifier for 15 Languages (2025)
This is an instance of [bert-base-multilingual-cased](https://huggingface.co/google-bert/bert-base-multilingual-cased) fine-tuned for binary toxicity classification on our updated (2025) dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).

The model now covers 15 languages from various language families:
| Language | Code | F1 Score | 
|---|---|---|
| English | en | 0.9035 | 
| Russian | ru | 0.9224 | 
| Ukrainian | uk | 0.9461 | 
| German | de | 0.5181 | 
| Spanish | es | 0.7291 | 
| Arabic | ar | 0.5139 | 
| Amharic | am | 0.6316 | 
| Hindi | hi | 0.7268 | 
| Chinese | zh | 0.6703 | 
| Italian | it | 0.6485 | 
| French | fr | 0.9125 | 
| Hinglish | hin | 0.6850 | 
| Hebrew | he | 0.8686 | 
| Japanese | ja | 0.8644 | 
| Tatar | tt | 0.6170 | 
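If you want to inspect or re-evaluate on the underlying data, the dataset can be loaded with the Hugging Face `datasets` library. A minimal sketch; the split name and column names below are assumptions for illustration, so check the dataset card for the actual configuration:

```python
from datasets import load_dataset

# Load the 2025 multilingual toxicity dataset (repo id taken from this card).
# NOTE: the split name "en" and the column names are assumptions;
# see the dataset card for the actual split/field names.
dataset = load_dataset("textdetox/multilingual_toxicity_dataset", split="en")
print(dataset[0])  # e.g. a text field plus a binary toxicity label
```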
## How to use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')

# Tokenize the input text
inputs = tokenizer("You are amazing!", return_tensors="pt")

# idx 0 for neutral, idx 1 for toxic
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
```
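Since the classifier is multilingual, you can also score a batch of texts in different languages in a single forward pass. A minimal sketch reusing the `tokenizer` and `model` from above (the example sentences are illustrative):

```python
import torch

texts = [
    "You are amazing!",            # en
    "Ты просто ужасен.",           # ru ("You are just awful.")
    "Das ist völlig in Ordnung.",  # de ("That is completely fine.")
]

# Pad to the longest sequence so the batch stacks into one tensor
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

for text, p in zip(texts, probs):
    print(f"{text!r}: P(toxic) = {p[1].item():.3f}")
```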
## Citation

The model was prepared for the TextDetox 2025 shared task evaluation.

Citation: TBD.