margins
src/about.py  +3 -7
CHANGED
@@ -21,9 +21,7 @@ across multiple categories and test scenarios.
LLM_BENCHMARKS_TEXT = """
GuardBench checks how well models handle safety challenges — from misinformation and self-harm to sexual content and corruption.
-
Models are tested with regular and adversarial prompts to see if they can avoid saying harmful things.
-
We track how accurate they are, how often they make mistakes, and how fast they respond.
"""

@@ -32,11 +30,9 @@ EVALUATION_QUEUE_TEXT = """
To add your model to the GuardBench leaderboard:

-Run your evaluation using the GuardBench framework at https://github.com/whitecircle-ai/guard-bench
-
-
-
-Once validated, your model will appear on the leaderboard.
+1. Run your evaluation using the GuardBench framework at https://github.com/whitecircle-ai/guard-bench
+2. Upload your run results in .jsonl format using this form.
+3. Once validated, your model will appear on the leaderboard.

### ✉️✨ Ready? Upload your results below!
"""
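Note on step 2 of the new submission instructions: the diff only states that run results are uploaded as .jsonl (one JSON object per line); it does not show the schema GuardBench emits. Below is a minimal sketch of writing such a file, where the field names (`prompt_id`, `category`, `verdict`) are purely illustrative assumptions, not the framework's documented format.

```python
import json

# Hypothetical per-prompt results; the real GuardBench output schema may differ.
results = [
    {"prompt_id": "misinfo-001", "category": "misinformation", "verdict": "safe"},
    {"prompt_id": "selfharm-002", "category": "self_harm", "verdict": "unsafe"},
]

# .jsonl means exactly one JSON object per line, newline-separated.
with open("guardbench_results.jsonl", "w", encoding="utf-8") as f:
    for record in results:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because each line is an independent JSON object, the upload form can validate records one at a time without parsing the whole file.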