Update README.md
README.md
CHANGED
@@ -9,6 +9,11 @@ language:
 
 This repository hosts a lightweight `scikit-learn`-based MLP classifier trained to distinguish cybersecurity-related content from other text, using sentence-transformer embeddings. It supports English and German input texts.
 
+## 📊 Training Data
+
+The model was trained on a multilingual dataset of cybersecurity and non-cybersecurity news articles. The dataset is publicly available on Zenodo:
+🔗 [https://zenodo.org/records/16417939](https://zenodo.org/records/16417939)
+
 ## 📦 Model Details
 
 - **Architecture**: `MLPClassifier` with hidden layers `(128, 64)`
@@ -52,7 +57,7 @@ X_train_emb = embedder.encode(X_train.tolist(), convert_to_numpy=True, show_prog
 X_test_emb = embedder.encode(X_test.tolist(), convert_to_numpy=True, show_progress_bar=True)
 
 # Load the trained classifier
-model_path = hf_hub_download(repo_id="selfconstruct3d/
+model_path = hf_hub_download(repo_id="selfconstruct3d/cybersec_classifier", filename="cybersec_classifier.pkl")
 model = joblib.load(model_path)
 
 # Predict
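The diff context stops at the `# Predict` comment, so the prediction code itself is not visible here. As a rough illustration only, the sketch below shows how an `MLPClassifier` with hidden layers `(128, 64)` is typically applied to an embedding matrix. Everything beyond that architecture is an assumption: the random 16-dimensional vectors stand in for real sentence-transformer embeddings, the classifier is fitted locally instead of being downloaded from the Hub, and the label convention (1 for cybersecurity, 0 for other) is hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
dim = 16  # assumed toy dimension; real sentence embeddings are much larger

# Synthetic stand-ins for embedded texts (NOT real data):
# class 1 is centred at +1, class 0 at -1, so they are easy to separate.
X_train_emb = np.vstack([
    rng.normal(loc=1.0, size=(50, dim)),
    rng.normal(loc=-1.0, size=(50, dim)),
])
y_train = np.array([1] * 50 + [0] * 50)  # hypothetical labels: 1 = cybersecurity

# Same hidden-layer layout as the README states; other settings are assumptions.
model = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
model.fit(X_train_emb, y_train)

# Stand-in for X_test_emb from the README snippet.
X_test_emb = np.vstack([
    rng.normal(loc=1.0, size=(5, dim)),
    rng.normal(loc=-1.0, size=(5, dim)),
])

# Predict: hard labels and per-class probabilities.
y_pred = model.predict(X_test_emb)
y_proba = model.predict_proba(X_test_emb)
```

With the published model, `model` would instead come from `joblib.load(model_path)` as in the hunk above, and `X_test_emb` from the `embedder.encode(...)` call. The columns of `predict_proba` follow the order of `model.classes_`.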