Updated leaderboard description
src/about.py  +1 -1
@@ -101,7 +101,7 @@ TITLE = """<h1 align="center" id="space-title">🚀 EVALITA-LLM Leaderboard 🚀
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-Evalita-LLM is a benchmark designed to evaluate Large Language Models (LLMs) on Italian tasks. The distinguishing features of Evalita-LLM are the following: (i) **all tasks are native Italian**, avoiding translation issues and potential cultural biases; (ii) the benchmark includes **generative** tasks, enabling more natural interaction with LLMs; (iii) **all tasks are evaluated against multiple prompts**, thereby mitigating model sensitivity to specific prompts and allowing a fairer evaluation.
+Evalita-LLM is a benchmark designed to evaluate Large Language Models (LLMs) on Italian tasks. The distinguishing features of Evalita-LLM are the following: (i) **all tasks are native Italian**, avoiding translation issues and potential cultural biases; (ii) the benchmark includes **generative** tasks, enabling more natural interaction with LLMs; (iii) **all tasks are evaluated against multiple prompts**, thereby mitigating model sensitivity to specific prompts and allowing a fairer evaluation. To provide a comprehensive evaluation, Evalita-LLM leverages the **Combined Performance Score (CPS)**, which combines both the peak and average performance of models across multiple prompts.
 
 **<small>Multiple-choice tasks:</small>** <small>📊TE (Textual Entailment), 😃SA (Sentiment Analysis), ⚠️HS (Hate Speech Detection), 🏥AT (Admission Test), 🔤WIC (Word in Context), ❓FAQ (Frequently Asked Questions)</small><br>
 **<small>Generative tasks:</small>** <small>🔄LS (Lexical Substitution), 📝SU (Summarization), 🏷️NER (Named Entity Recognition), 🔗REL (Relation Extraction)</small>
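The added sentence introduces the **Combined Performance Score (CPS)**. As a rough illustration of the idea only (the exact CPS formula is defined by the Evalita-LLM authors; the function name and the weighting below are assumptions made for illustration), a score combining peak and average per-prompt performance might look like this:

```python
# Hypothetical sketch only: the real CPS formula is defined in the
# Evalita-LLM paper. This just illustrates combining the peak and the
# average of a model's scores across multiple prompts.
def combined_performance_sketch(prompt_scores: list[float]) -> float:
    """Combine peak and average accuracy across prompts (assumed weighting)."""
    best = max(prompt_scores)                      # peak over all prompts
    avg = sum(prompt_scores) / len(prompt_scores)  # average over all prompts
    # Assumed combination: reward a high peak, penalize prompt sensitivity
    # (a large gap between best and average lowers the score).
    return (1 - (best - avg)) * best

# Example: one model's accuracy on the same task under five prompts.
print(round(combined_performance_sketch([0.71, 0.64, 0.69, 0.55, 0.67]), 3))
```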