Orange/Speaker-wavLM-tbr

🇪🇺 Region: EU


ggmbr committed on Sep 8

Commit b8d2608 · 1 parent: ce51710

variants

Files changed (1)

README.md +5 -3

README.md CHANGED

@@ -62,15 +62,16 @@ the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/me
 This value can be interpreted as the ability to identify speakers only with timbral cues.
 Tests between two utterances leading to a cosine similarity above the threshold should be considered as similar in terms of timbral cues.
+A discussion about this interpretation can be
+found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
 The table below provides the EER and threshold of the different [variants](#variants) of this model.
 | Variant name| EER (%) | threshold |
 | --- | --- | --- |
 | W-TBR   | 1.68 | 0.472 |
 | WTA128 | 2.29 | 0.472 |
-A discussion about this interpretation can be
-found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
+| WTA64 | 2.88 | 0.446 |
 Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).
@@ -114,6 +115,7 @@ The table below provides a short description of the variants and their performan
 | --- | --- | --- | --- |
 | W-TBR | main    | baseline, description in paper | 128 |
 | WTA128 | wta128 | enriched training dataset, more conversions | 128  |
+| WTA64 | wta64 | enriched training dataset, more conversions | 64  |
 # License
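The README's decision rule (a cosine similarity above the variant's threshold means two utterances are similar in terms of timbral cues) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the model's own API: it assumes the two speaker embeddings have already been extracted as lists of floats, and the helper names `cosine_similarity`, `timbre_match` and `VARIANT_THRESHOLDS` are hypothetical. The threshold values are copied from the EER table in this commit.

```python
import math
from typing import Sequence

# Thresholds copied from the EER table in this README diff, keyed by variant name.
# The dict and helper names below are hypothetical, for illustration only.
VARIANT_THRESHOLDS = {
    "W-TBR": 0.472,
    "WTA128": 0.472,
    "WTA64": 0.446,
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def timbre_match(a: Sequence[float], b: Sequence[float], variant: str = "W-TBR") -> bool:
    """True when the pair scores strictly above the threshold of the given variant.

    Assumes `a` and `b` were both produced by that same variant.
    """
    return cosine_similarity(a, b) > VARIANT_THRESHOLDS[variant]
```

Because WTA64 uses a lower threshold (0.446) than W-TBR and WTA128 (0.472), a pair scoring about 0.46 would count as a timbral match under WTA64 but not under the other two, so each score should be compared against the threshold of the variant that produced the embeddings.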