Update app.py
app.py CHANGED
@@ -171,7 +171,21 @@ with tab1:
     # st.markdown('<div class="title">Leaderboard</div>', unsafe_allow_html=True)
     st.markdown('<div class="tab-content">', unsafe_allow_html=True)

-    st.markdown('
+    st.markdown('Metrics Explanation')
+    st.markdown('''
+        <div class="metric">
+        <br/>
+        <p style="font-size:16px;">
+        <strong>Factual Precision</strong> measures the ratio of supported units to all units, averaged over model responses. <strong>Hallucination Score</strong> measures the degree of incorrect or inconclusive content units in a model response, with details provided in the paper. We also report the average number of unsupported units (<strong>Avg. Unsupported</strong>), the average number of units labelled as undecided (<strong>Avg. Undecided</strong>), the average response length in tokens, and the average number of verifiable units in the model responses.
+        </p>
+        <p style="font-size:16px;">
+        🔒 for closed LLMs; 🔑 for open-weights LLMs; 🚨 for newly added models
+        </p>
+        </div>
+        ''',
+        unsafe_allow_html=True
+    )
+
     st.markdown('@Farima populate here')

     st.markdown("""
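For reference, the paragraph added above defines Factual Precision as the ratio of supported units to all units, averaged over model responses. A minimal sketch of that computation, assuming per-unit labels of "supported" / "unsupported" / "undecided" (the function name and label strings are illustrative, not taken from this repository):

def factual_precision(responses):
    """Average, over responses, of (# supported units) / (# units).

    responses: list of responses, each a list of per-unit labels such as
    "supported", "unsupported", or "undecided".
    """
    per_response = [
        sum(label == "supported" for label in units) / len(units)
        for units in responses
        if units  # skip responses with no verifiable units
    ]
    return sum(per_response) / len(per_response) if per_response else 0.0


# Example: one fully supported response and one with mixed labels.
print(factual_precision([
    ["supported", "supported"],                 # 2/2 = 1.0
    ["supported", "undecided", "unsupported"],  # 1/3 ≈ 0.33
]))  # ≈ 0.67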