variants
README.md
CHANGED
@@ -62,15 +62,16 @@ the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/me
 This value can be interpreted as the ability to identify speakers only with timbral cues.
 Tests between two utterances leading to a cosine similarity above the threshold should be considered as similar in terms of timbral cues.
 
+A discussion about this interpretation can be
+found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
+
 The table below provides the EER and threshold of the different [variants](#variants) of this model.
 
 | Variant name| EER (%) | threshold |
 | --- | --- | --- |
 | W-TBR | 1.68 | 0.472 |
 | WTA128 | 2.29 | 0.472 |
-
-A discussion about this interpretation can be
-found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
+| WTA64 | 2.88 | 0.446 |
 
 Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).
 
@@ -114,6 +115,7 @@ The table below provides a short description of the variants and their performan
 | --- | --- | --- | --- |
 | W-TBR | main | baseline, description in paper | 128 |
 | WTA128 | wta128 | enriched training dataset, more conversions | 128 |
+| WTA64 | wta64 | enriched training dataset, more conversions | 64 |
 
 
 # License
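The README in this change describes a decision rule: two utterances whose embeddings have a cosine similarity above the variant's threshold are treated as similar in timbre. It also tabulates an EER and a threshold per variant. The following is a minimal, hypothetical Python sketch of both ideas. It assumes embeddings are plain lists of floats, and that each tabulated threshold is the cosine-similarity cut-off at the EER operating point. The helper names (`THRESHOLDS`, `similar_timbre`, `equal_error_rate`) are illustrative only and are not part of the model's API.

```python
import math
from typing import Dict, Sequence, Tuple

# Thresholds copied from the EER table above.
# Assumption: each is the cosine-similarity cut-off at the EER operating point.
THRESHOLDS: Dict[str, float] = {"W-TBR": 0.472, "WTA128": 0.472, "WTA64": 0.446}

# Embedding sizes copied from the variants table above.
EMBEDDING_DIMS: Dict[str, int] = {"W-TBR": 128, "WTA128": 128, "WTA64": 64}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / norm


def similar_timbre(a: Sequence[float], b: Sequence[float], variant: str) -> bool:
    """Hypothetical helper: True if the pair scores above the variant's threshold."""
    expected = EMBEDDING_DIMS[variant]
    if len(a) != expected or len(b) != expected:
        raise ValueError(f"{variant} embeddings should have {expected} dimensions")
    return cosine_similarity(a, b) > THRESHOLDS[variant]


def equal_error_rate(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Approximate (EER, threshold) by sweeping every observed score as a cut-off.

    labels[i] is True for a same-speaker (target) trial. A trial is accepted
    when its score is strictly above the cut-off, matching similar_timbre.
    """
    targets = [s for s, is_target in zip(scores, labels) if is_target]
    nontargets = [s for s, is_target in zip(scores, labels) if not is_target]
    if not targets or not nontargets:
        raise ValueError("need both target and non-target trials")
    best_gap, best_eer, best_t = float("inf"), 0.0, 0.0
    for t in sorted(set(scores)):
        far = sum(s > t for s in nontargets) / len(nontargets)  # false acceptances
        frr = sum(s <= t for s in targets) / len(targets)       # false rejections
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the cut-off where FAR and FRR are closest; report their mean.
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t
```

The strict `>` mirrors "above the threshold" in the README. Whether a score exactly equal to the threshold counts as a match is not specified there. Because the README notes that the EER depends on how long audio is cropped, an estimate from this sweep on other data should not be expected to reproduce the tabulated values exactly.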