selfconstruct3d/AttackGroup-MPNET

Feature Extraction

selfconstruct3d committed on Mar 8 · Commit dfd5c40 · verified · 1 Parent(s): a354664

Update README.md

Files changed (1)

README.md +44 -14

@@ -53,33 +53,63 @@ Always verify predictions with cybersecurity analysts before using in critical d
 ## How to Get Started with the Model
 ```python
-from transformers import AutoTokenizer, MPNetModel
 import torch
-model_name = "mpnet_classification_finetuned_v2"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = MPNetModel.from_pretrained(model_name)
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-model.to(device)
-# Example inference
-sentence = "APT38 has used phishing emails with malicious links to distribute malware."
-inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding="max_length", max_length=128).to(device)
-with torch.no_grad():
-    outputs = model(**inputs)
-    cls_embedding = outputs.last_hidden_state[:, 0, :]
-    predicted_class = classifier_model.classifier(cls_embedding).argmax(dim=1).cpu().item()
 print(f"Predicted GroupID: {predicted_class}")
 ```
 ## Training Details
 ### Training Data
-The training dataset comprises balanced textual descriptions of various cybersecurity threat groups' TTPs, augmented through synonym replacement to increase diversity.
 ### Training Procedure

 ## How to Get Started with the Model
 ```python
 import torch
+import json
+from huggingface_hub import hf_hub_download
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Load the fine-tuned MPNet classifier
+classifier_model = AutoModelForSequenceClassification.from_pretrained("selfconstruct3d/AttackGroup-MPNET").to(device)
+# Load the matching tokenizer
+tokenizer = AutoTokenizer.from_pretrained("selfconstruct3d/AttackGroup-MPNET")
+# Download the mapping from predicted class index to group ID
+label_to_groupid_file = hf_hub_download(
+    repo_id="selfconstruct3d/AttackGroup-MPNET",
+    filename="label_to_groupid.json"
+)
+with open(label_to_groupid_file, "r") as f:
+    label_to_groupid = json.load(f)
+def predict_group(sentence):
+    classifier_model.eval()
+    encoding = tokenizer(
+        sentence,
+        truncation=True,
+        padding="max_length",
+        max_length=128,
+        return_tensors="pt"
+    )
+    input_ids = encoding["input_ids"].to(device)
+    attention_mask = encoding["attention_mask"].to(device)
+    with torch.no_grad():
+        outputs = classifier_model(input_ids=input_ids, attention_mask=attention_mask)
+        logits = outputs.logits
+        predicted_label = torch.argmax(logits, dim=1).cpu().item()
+    predicted_groupid = label_to_groupid[str(predicted_label)]
+    return predicted_groupid
+# Example usage:
+sentence = "APT38 has used phishing emails with malicious links to distribute malware."
+predicted_class = predict_group(sentence)
 print(f"Predicted GroupID: {predicted_class}")
 ```
+Predicted GroupID: G0001
 ## Training Details
 ### Training Data
+To be announced...
 ### Training Procedure
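
A note on the updated `predict_group` snippet in this commit: it returns only the single arg-max group, which gives an analyst no sense of how close the runner-up classes were. The sketch below is one possible way to turn a row of logits into the top-k group IDs with softmax probabilities. It is not part of the committed README. The helper name `top_k_groups`, the parameter `k`, and the example logits and mapping values are all hypothetical. The only detail taken from the diff is that `label_to_groupid` is keyed by the class index as a string.

```python
import math


def top_k_groups(logits, label_to_groupid, k=3):
    """Return the k most likely (group_id, probability) pairs for one input.

    `logits` is a plain list of floats, one per class.
    `label_to_groupid` maps the class index as a string to a group ID,
    matching how the README snippet indexes it with str(predicted_label).
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(label_to_groupid[str(i)], probs[i]) for i in ranked[:k]]


# Hypothetical logits and mapping, for illustration only.
example_logits = [0.2, 2.5, -1.0, 1.1]
example_mapping = {"0": "G0001", "1": "G0002", "2": "G0003", "3": "G0004"}

for group_id, prob in top_k_groups(example_logits, example_mapping, k=2):
    print(f"{group_id}: {prob:.3f}")
```

With the model from the diff, the input would presumably be `outputs.logits[0].tolist()` and the mapping would be the `label_to_groupid` dictionary loaded from `label_to_groupid.json`.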