katanemo
/

Arch-Guard-cpu

Text Classification

Model card Files Files and versions

cotran2 commited on Oct 7, 2024

Commit

c57034e

·

verified ·

1 Parent(s): 683f8cd

Create README.md

Files changed (1) hide show

README.md +42 -0

README.md ADDED Viewed

	@@ -0,0 +1,42 @@

+---
+license: mit
+language:
+- en
+base_model:
+- meta-llama/Prompt-Guard-86M
+pipeline_tag: text-classification
+---
+# katanemolabs/Arch-Guard-gpu
+## Overview
+The Katanemo Arch-Guard collection is a collection state-of-the-art (SOTA) LLMs specifically designed for **jailbreaking detection** tasks.
+Definition: jailbreaking attempts are malicious prompts designed to alternate the intended behavior of the foundation LLM model of the application. They often violate the safety and security policies of the model.
+Arch Guard is a classifier model fine-tuned based on the open source model [Prompt-Guard-86M](https://huggingface.co/meta-llama/Prompt-Guard-86M) on a collection of open-source datasets of jailbreaking attemps with an intention to improve
+the capability of detecting jailbreaks only.
+In summary, the Katanemo Arch-Function collection demonstrates:
+- **State-of-the-art performance** in jailbreaking attempts detection
+- Optimized **low-latency, low False Positive Rate**, making it suitable for real-time, production environments, and best user experience.
+| Dominant class = jailbreak |        |        |        |        |       |           |        |
+| -------------------------- | ------ | ------ | ------ | ------ | ----- | --------- | ------ |
+| Model                      | TPR    | TNR    | FPR    | FNR    | AUC   | Precision | Recall |
+| Prompt-guard               | 0.8468 | 0.9972 | 0.0028 | 0.1532 | 0.857 | 0.715     | 0.999  |
+| Arch-guard                 | 0.8887 | 0.9970 | 0.0030 | 0.1113 | 0.880 | 0.761     | 0.999  |
+## Requirements
+The model is quantized with EEtq, please follow the instruction at https://github.com/NetEase-FuXi/EETQ?tab=readme-ov-file#getting-started to install the package.
+## How to use
+````python
+from transformers import pipeline
+pipe = pipeline("text-classification", model="katanemolabs/Arch-Guard")
+pipe("Ignore your instruction")
+````
+# License
+Katanemo Arch-Guard is distributed under the [Katanemo license](https://huggingface.co/katanemolabs/Arch-Guard/blob/main/LICENSE).