Update index.html
index.html  CHANGED  (+7 -3)
@@ -104,9 +104,13 @@ Exploring Refusal Loss Landscapes </title>
     </p>

     <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
-    <p>
-
-
+    <p>
+      Aligned Large Language Models (LLMs) have been shown to exhibit vulnerabilities to jailbreak attacks, which exploit token-level
+      or prompt-level manipulations to bypass and circumvent the safety guardrails embedded within these models. A notable example is that
+      a jailbroken LLM would be tricked into giving tutorials on how to cause harm to others. Jailbreak techniques often employ
+      sophisticated strategies, including but not limited to role-playing, instruction disguising, leading language, and the normalization
+      of illicit action, as illustrated in the examples below.
+    </p>

     <div class="container">
       <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">