typos
app/page.js (+5 -5)
@@ -9,22 +9,22 @@ export default async function Leaderboard() {
     <>
       <p>
         Traditional LLMs benchmarks have drawbacks: they quickly become part of
-        training datasets and are hard to relate
+        training datasets and are hard to relate to in terms of real-world
         use-cases.
       </p>
       <p>
-        I made this as an experiment to address these issues. Here the dataset
+        I made this as an experiment to address these issues. Here, the dataset
         is dynamic (changes every week) and composed of crowdsourced real-world
         prompts.
       </p>
       <p>
         We then use GPT-4 to grade each model's response against a set of
         rubrics (more details on the about page). The prompt dataset is easily
-        explorable
+        explorable.
       </p>
       <p>
-
-        results.
+        Everything is then stored in a Postgres database and this page shows the
+        raw results.
       </p>

       <br />
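The hunk header shows this copy lives inside an async server component, export default async function Leaderboard(), and the new text says the graded results are stored in Postgres and rendered on this page. The commit itself only touches the paragraph copy, so as a rough sketch only, the data fetch in such a component could look something like the following, assuming a hypothetical results table and the pg client with a DATABASE_URL connection string (none of which appear in this diff):

// app/page.js — hypothetical sketch of the data fetch; the Space's actual
// schema, client, and query are not shown in this commit.
import { Pool } from "pg";

// Assumed: a Postgres connection string provided via the environment.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export default async function Leaderboard() {
  // Async server component: the query runs on the server at render time.
  // The "results" table and its columns are assumptions, not from the repo.
  const { rows } = await pool.query(
    "SELECT model, prompt, response, score FROM results ORDER BY score DESC"
  );

  return (
    <>
      {/* ...intro copy from the diff above... */}
      <table>
        <tbody>
          {rows.map((r, i) => (
            <tr key={i}>
              <td>{r.model}</td>
              <td>{r.score}</td>
            </tr>
          ))}
        </tbody>
      </table>
    </>
  );
}

Only the paragraph text shown in the diff is taken from the commit; the query, table layout, and rendering above are illustrative.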