Commit ee82c0f · Update README.md
Parent(s): 0d196a8

README.md CHANGED
@@ -5,11 +5,14 @@ datasets:
 tags:
 - evaluate
 - metric
-description:
+description: >-
+  This metric evaluates the F1 score of relation-extraction predictions
+  against the input references.
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 pinned: false
+license: apache-2.0
 ---
 
 # Metric Card for relation_extraction evaluation
@@ -31,16 +34,14 @@ This metric takes 2 inputs, prediction and references(ground truth). Both of the
 ... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... ]
 ... ]
-
 >>> predictions = [
 ... [
 ... {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... ]
 ... ]
-
-
->>> evaluation_scores
->>> print(evaluation_scores)
+>>> evaluation_scores = module.compute(predictions=predictions, references=references)
+>>> print(evaluation_scores)
 {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
 ```
 
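The doctest in the hunk above calls `module.compute(...)` on nested lists of relation dicts but never shows how `module` is obtained. A minimal usage sketch, assuming the metric is loaded as an evaluate-compatible Space; the repo id below is a placeholder, not the Space's actual path:

```python
import evaluate

# Placeholder repo id; substitute the actual path of this metric Space.
module = evaluate.load("<namespace>/relation_extraction")

# One inner list of relation dicts per example; every relation carries the
# five string fields head, head_type, type, tail, tail_type.
references = [[
    {"head": "tinadaviespigments", "head_type": "brand", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]
predictions = [[
    {"head": "tinadaviespigments", "head_type": "brand", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]

evaluation_scores = module.compute(predictions=predictions, references=references)
print(evaluation_scores)  # per-type dicts plus an 'ALL' entry with tp/fp/fn, p, r, f1
```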
@@ -126,10 +127,17 @@ Example with two or more prediction and reference:
 ```
 
 ## Limitations and Bias
-This metric has
+This metric applies a strict filter: if any field of a predicted relation (head, head_type, type, tail, or tail_type) does not exactly match a reference relation, the prediction is counted as a false positive and the unmatched reference as a false negative.
 
 ## Citation
-
-
+```bibtex
+@misc{taille2020stop,
+  author = {Taillé, Bruno and Guigue, Vincent and Scoutheeten, Geoffrey and Gallinari, Patrick},
+  title  = {Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!},
+  year   = {2020},
+  url    = {https://arxiv.org/abs/2009.10684}
+}
+```
 ## Further References
-
+This evaluation metric implementation is based on
+*https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py*
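The strict filter described in the new Limitations text, together with the tp/fp/fn counts in the example output, can be sketched as follows. This is an illustrative reconstruction, not the Space's actual code: it assumes each relation is reduced to an exact (head, head_type, type, tail, tail_type) tuple before set comparison, in the spirit of the referenced sincere evaluation script.

```python
from typing import Dict, List

Relation = Dict[str, str]

def strict_micro_scores(predictions: List[List[Relation]],
                        references: List[List[Relation]]) -> Dict[str, float]:
    """Micro precision/recall/F1 under strict matching: a predicted relation is
    a true positive only if all five fields equal a reference relation exactly."""
    keys = ("head", "head_type", "type", "tail", "tail_type")
    tp = fp = fn = 0
    for pred_rels, ref_rels in zip(predictions, references):
        pred_set = {tuple(rel[k] for k in keys) for rel in pred_rels}
        ref_set = {tuple(rel[k] for k in keys) for rel in ref_rels}
        tp += len(pred_set & ref_set)   # exact matches
        fp += len(pred_set - ref_set)   # predicted, but not in the references
        fn += len(ref_set - pred_set)   # in the references, but never predicted
    p = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    r = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "p": p, "r": r, "f1": f1}
```

For the README example, where one predicted relation matches a reference exactly and the other does not, this gives tp=1, fp=1, fn=1 and p = r = f1 = 50.0, matching the printed 'sell' and 'ALL' entries; the Macro_* values in 'ALL' equal the micro values there because only one relation type ('sell') occurs.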