# 🧠 BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.

---

## ✨ Model Highlights

- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
- ⚡ Binary classification: Fake Job Posting vs. Real Job Posting
- 💾 Lightweight and optimized for both CPU and GPU inference

---

## 🧠 Intended Uses

- Automated detection of fraudulent job postings
- Job board moderation and quality control
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads

---
## 🚫 Limitations

- Trained primarily on English-language job postings
- May underperform on postings from under-represented industries or regions
- Not optimized for job descriptions longer than 128 tokens (one chunking workaround is sketched below)
- Not suitable for multilingual or multimedia job posting content
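For postings longer than 128 tokens, one common workaround is to score overlapping token windows and pool the results. This is a minimal sketch, not part of the published model: it reuses the `tokenizer`, `model`, and `device` objects defined in the Usage section below, and the window and stride sizes are illustrative assumptions.

```python
def predict_long_posting(text, max_length=128, stride=32):
    # Split the posting into overlapping 128-token windows
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping field, not a model input
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        probs = model(**enc).logits.softmax(dim=-1)
    # Max-pool over windows: flag the posting if any window looks fake
    fake_prob = probs[:, 1].max().item()
    return "Fake Job" if fake_prob >= 0.5 else "Real Job"
```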
---
## 🏋️‍♂️ Training Details

| Field          | Value                       |
| -------------- | --------------------------- |
| **Base Model** | `bert-base-uncased`         |
| **Dataset**    | Custom labeled job postings |
| **Framework**  | PyTorch with Transformers   |
| **Epochs**     | 3                           |
| **Batch Size** | 16                          |
| **Max Length** | 128 tokens                  |
| **Optimizer**  | AdamW                       |
| **Loss**       | CrossEntropyLoss            |
| **Device**     | CUDA-enabled GPU            |
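The training script isn't included here; the sketch below reconstructs a plain PyTorch loop consistent with the table above (AdamW, CrossEntropyLoss, 3 epochs, batch size 16, max length 128). The learning rate and the inline placeholder examples are assumptions, not released details.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2).to(device)

# Placeholder examples standing in for the custom labeled dataset (1 = fake, 0 = real)
texts = [
    "Earn $5000/week from home, no experience needed, apply with your bank details!",
    "Backend Engineer (Python/Django), 4+ years of experience, hybrid in Austin, TX.",
]
labels = [1, 0]

enc = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumed value
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for input_ids, attention_mask, batch_labels in loader:
        optimizer.zero_grad()
        logits = model(
            input_ids=input_ids.to(device),
            attention_mask=attention_mask.to(device),
        ).logits
        loss = loss_fn(logits, batch_labels.to(device))
        loss.backward()
        optimizer.step()
```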
---
## 📊 Evaluation Metrics

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 0.97  |
| Precision | 0.81  |
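Only accuracy and precision are reported. To compute comparable metrics on your own held-out split, here is a minimal sketch using scikit-learn and the `predict_with_bert` helper from the Usage section below; the test examples are placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score

# Placeholder held-out examples (1 = fake, 0 = real)
test_texts = [
    "Make money fast! Send a registration fee to get started today.",
    "Data Analyst, SQL and Tableau required, full-time, NYC office.",
]
test_labels = [1, 0]

preds = [1 if predict_with_bert(t) == "Fake Job" else 0 for t in test_texts]
print("Accuracy :", accuracy_score(test_labels, preds))
print("Precision:", precision_score(test_labels, preds))
```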
---
## 🚀 Usage

```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

# Move the model to GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_with_bert(text):
    # Tokenize and truncate to the 128-token limit used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    # Send the inputs to the same device as the model (CPU or CUDA)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
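To surface a confidence score alongside the label, you can softmax the logits; this small extension is not part of the published example.

```python
def predict_with_confidence(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
    label = "Fake Job" if probs.argmax().item() == 1 else "Real Job"
    return label, round(probs.max().item(), 4)

print(predict_with_confidence("Work from home and earn $300/day, no interview required!"))
```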
## 🗂 Repository Structure

```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model weights in safetensors format
└── README.md            # Model card
```
---

## 🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.