# 🧠 BERT-Spam-Job-Posting-Detection-Model

A BERT-based binary classifier fine-tuned to detect whether a job posting is **fake** or **real**. Ideal for job portals, recruitment platforms, and fraud detection in job advertisements.

---

## ✨ Model Highlights

- 📌 Based on [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)
- 🔍 Fine-tuned on a custom dataset of job postings labeled as fake or real
- ⚡ Binary classification: Fake Job Posting vs. Real Job Posting
- 💾 Lightweight and optimized for both CPU and GPU inference

---

## 🧠 Intended Uses

- Automated detection of fraudulent job postings
- Job board moderation and quality control
- Enhancing recruitment platform security
- Improving user trust in job marketplaces
- Regulatory compliance monitoring for job ads

---
## 🚫 Limitations

- Trained primarily on English-language job postings
- May underperform on postings from under-represented industries or regions
- Not optimized for job descriptions longer than 128 tokens (one chunking workaround is sketched below)
- Not suitable for multilingual or multimedia job posting content
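For postings longer than 128 tokens, one common workaround is to score overlapping token windows and pool the results. This is a minimal sketch, not part of the published model: it reuses the `tokenizer`, `model`, and `device` objects defined in the Usage section below, and the window and stride sizes are illustrative assumptions.

```python
def predict_long_posting(text, max_length=128, stride=32):
    # Split the posting into overlapping 128-token windows
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping field, not a model input
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        probs = model(**enc).logits.softmax(dim=-1)
    # Max-pool over windows: flag the posting if any window looks fake
    fake_prob = probs[:, 1].max().item()
    return "Fake Job" if fake_prob >= 0.5 else "Real Job"
```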
---
## 🏋️‍♂️ Training Details

| Field          | Value                       |
| -------------- | --------------------------- |
| **Base Model** | `bert-base-uncased`         |
| **Dataset**    | Custom labeled job postings |
| **Framework**  | PyTorch with Transformers   |
| **Epochs**     | 3                           |
| **Batch Size** | 16                          |
| **Max Length** | 128 tokens                  |
| **Optimizer**  | AdamW                       |
| **Loss**       | CrossEntropyLoss            |
| **Device**     | CUDA-enabled GPU            |
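The training script isn't included here; the sketch below reconstructs a plain PyTorch loop consistent with the table above (AdamW, CrossEntropyLoss, 3 epochs, batch size 16, max length 128). The learning rate and the inline placeholder examples are assumptions, not released details.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2).to(device)

# Placeholder examples standing in for the custom labeled dataset (1 = fake, 0 = real)
texts = [
    "Earn $5000/week from home, no experience needed, apply with your bank details!",
    "Backend Engineer (Python/Django), 4+ years of experience, hybrid in Austin, TX.",
]
labels = [1, 0]

enc = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumed value
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for input_ids, attention_mask, batch_labels in loader:
        optimizer.zero_grad()
        logits = model(
            input_ids=input_ids.to(device),
            attention_mask=attention_mask.to(device),
        ).logits
        loss = loss_fn(logits, batch_labels.to(device))
        loss.backward()
        optimizer.step()
```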
---
## 📊 Evaluation Metrics

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 0.97  |
| Precision | 0.81  |
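Only accuracy and precision are reported. To compute comparable metrics on your own held-out split, here is a minimal sketch using scikit-learn and the `predict_with_bert` helper from the Usage section below; the test examples are placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score

# Placeholder held-out examples (1 = fake, 0 = real)
test_texts = [
    "Make money fast! Send a registration fee to get started today.",
    "Data Analyst, SQL and Tableau required, full-time, NYC office.",
]
test_labels = [1, 0]

preds = [1 if predict_with_bert(t) == "Fake Job" else 0 for t in test_texts]
print("Accuracy :", accuracy_score(test_labels, preds))
print("Precision:", precision_score(test_labels, preds))
```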
---
## 🚀 Usage

```python
from transformers import BertTokenizerFast, BertForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "AventIQ-AI/BERT-Spam-Job-Posting-Detection-Model"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

# Move the model to GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_with_bert(text):
    # Tokenize and truncate to the 128-token limit used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    # Send the inputs to the same device as the model (CPU or CUDA)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()
    return "Fake Job" if predicted_class_id == 1 else "Real Job"

# Example
print(predict_with_bert("Hiring remote data entry clerk for a large online project. Apply now."))
print(predict_with_bert("Looking for a Software Engineer with 5+ years of experience in Python."))
```
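To surface a confidence score alongside the label, you can softmax the logits; this small extension is not part of the published example.

```python
def predict_with_confidence(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
    label = "Fake Job" if probs.argmax().item() == 1 else "Real Job"
    return label, round(probs.max().item(), 4)

print(predict_with_confidence("Work from home and earn $300/day, no interview required!"))
```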
## 🗂 Repository Structure

```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model weights in safetensors format
└── README.md            # Model card
```
---

## 🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue.