Update index.html
index.html (+1 -1)
@@ -118,7 +118,7 @@ Exploring Refusal Loss Landscapes </title>
 </div>
 </div>
 
-<h2 id="refusal-loss">Interpretability</h2>
+<h2 id="refusal-loss">Token Highlighter: Principle and Interpretability</h2>
 <p>Current transformer-based LLMs will return different responses to the same query due to the randomness of
 autoregressive sampling-based generation. With this randomness, it is an
 interesting phenomenon that a malicious user query will sometimes be rejected by the target LLM, but