---
library_name: transformers
tags:
- spam
- text-classification
---

Spam classifier model: ModernBERT fine-tuned for binary text classification, trained with `max_length = 1024`. Training ran for 5 epochs, and the best checkpoint was selected at training step 600 (epoch ≈ 2.27, i.e. partway through the third epoch).

The model was trained on the `sharecreative/spam_classifier_february_2025` dataset:

```
DatasetDict({
    train: Dataset({
        features: ['corpus_id', 'text', 'labels', 'split'],
        num_rows: 8432
    })
    test: Dataset({
        features: ['corpus_id', 'text', 'labels', 'split'],
        num_rows: 942
    })
    validation: Dataset({
        features: ['corpus_id', 'text', 'labels', 'split'],
        num_rows: 924
    })
})
```

Metrics at the selected checkpoint (step 600):

```
{
    "epoch": 2.2727272727272725,
    "eval_eval/0_f1-score": 0.9234567901234568,
    "eval_eval/0_precision": 0.9396984924623115,
    "eval_eval/0_recall": 0.9077669902912622,
    "eval_eval/0_support": 412.0,
    "eval_eval/1_f1-score": 0.9422718808193669,
    "eval_eval/1_precision": 0.9301470588235294,
    "eval_eval/1_recall": 0.9547169811320755,
    "eval_eval/1_support": 530.0,
    "eval_eval/accuracy": 0.9341825902335457,
    "eval_eval/macro avg_f1-score": 0.9328643354714119,
    "eval_eval/macro avg_precision": 0.9349227756429205,
    "eval_eval/macro avg_recall": 0.9312419857116688,
    "eval_eval/macro avg_support": 942.0,
    "eval_eval/mps_allocated_gb": 1.822583552,
    "eval_eval/mps_reserved_gb": 143.440986112,
    "eval_eval/weighted avg_f1-score": 0.9340427753345314,
    "eval_eval/weighted avg_precision": 0.934324543599727,
    "eval_eval/weighted avg_recall": 0.9341825902335457,
    "eval_eval/weighted avg_support": 942.0,
    "eval_loss": 0.23617032170295715,
    "eval_runtime": 42.6005,
    "eval_samples_per_second": 22.112,
    "eval_steps_per_second": 1.385,
    "step": 600
}
```

The key metrics land between roughly 92% and 94%: accuracy 93.4%, macro-averaged F1 93.3%, and per-class F1 of 92.3% (label 0) and 94.2% (label 1). The class supports (412 + 530 = 942) match the size of the test split, so these figures appear to be computed on the test split despite the `eval_` prefix.
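As a cross-check of the metrics above, the per-class precision, recall and support are enough to recover the implied confusion counts. The metric key names match the output of scikit-learn's `classification_report(output_dict=True)`, so this sketch assumes its standard definitions: recall = TP / support, precision = TP / predicted count, macro average = unweighted mean over classes, weighted average = support-weighted mean. The recovered counts are derived from the reported numbers rather than logged by the training run, and the helper names (`recover_counts`, `f1`) are hypothetical.

```python
import math

# Per-class values copied from the metrics block (labels 0 and 1).
REPORTED = {
    0: {"precision": 0.9396984924623115, "recall": 0.9077669902912622, "support": 412},
    1: {"precision": 0.9301470588235294, "recall": 0.9547169811320755, "support": 530},
}


def recover_counts(metrics):
    """Hypothetical helper: rebuild TP and predicted counts from precision, recall, support."""
    counts = {}
    for label, m in metrics.items():
        tp = round(m["recall"] * m["support"])  # recall = TP / support
        predicted = round(tp / m["precision"])  # precision = TP / predicted count
        counts[label] = {"tp": tp, "predicted": predicted, "support": m["support"]}
    return counts


def f1(c):
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (predicted + support)
    return 2 * c["tp"] / (c["predicted"] + c["support"])


counts = recover_counts(REPORTED)
total = sum(c["support"] for c in counts.values())
# Examples of each true label that were assigned the other label.
misclassified = {label: c["support"] - c["tp"] for label, c in counts.items()}

accuracy = sum(c["tp"] for c in counts.values()) / total
macro_f1 = sum(f1(c) for c in counts.values()) / len(counts)
weighted_f1 = sum(f1(c) * c["support"] for c in counts.values()) / total

print("counts:", counts)
print("misclassified per true label:", misclassified)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

Under these definitions the reported numbers reconcile: 374 of 412 label-0 examples and 506 of 530 label-1 examples are classified correctly (38 and 24 errors respectively), giving 880 / 942 ≈ 93.4% accuracy, and the recomputed macro and weighted F1 match the logged values.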
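The selected checkpoint is reported at step 600 with a logged epoch of 2.2727..., partway through the third of 5 epochs. The arithmetic below checks how those numbers fit the 8,432-row train split. It assumes the logged epoch equals the global step divided by optimizer steps per epoch, one optimizer step per batch (no gradient accumulation, single device), and that the final partial batch is kept (`dataloader_drop_last=False`). The card does not state the batch size, so the value this search finds is an inference, not a documented hyperparameter.

```python
import math

TRAIN_ROWS = 8432                   # train split size from the DatasetDict above
BEST_STEP = 600                     # step of the selected checkpoint
LOGGED_EPOCH = 2.2727272727272725   # "epoch" field in the metrics block
NUM_EPOCHS = 5                      # total epochs stated in the card

# Assumption: logged epoch = global_step / steps_per_epoch.
steps_per_epoch = round(BEST_STEP / LOGGED_EPOCH)

# Assumption: one optimizer step per batch, last partial batch kept,
# so steps_per_epoch = ceil(TRAIN_ROWS / batch_size). Search plausible sizes.
candidates = [b for b in range(1, 257) if math.ceil(TRAIN_ROWS / b) == steps_per_epoch]

total_steps = steps_per_epoch * NUM_EPOCHS
current_epoch = BEST_STEP // steps_per_epoch + 1  # 1-based epoch containing step 600

print(f"steps/epoch={steps_per_epoch} batch-size candidates={candidates}")
print(f"total steps={total_steps} best step falls in epoch {current_epoch}")
```

Under those assumptions there are 264 optimizer steps per epoch, the only matching batch size in the searched range is 32, and 5 epochs come to 1,320 steps, so step 600 falls in the third epoch. Gradient accumulation or multi-device training would change the implied per-device batch size.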