ggmbr committed on
Commit
b8d2608
·
1 Parent(s): ce51710
Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -62,15 +62,16 @@ the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/me
  This value can be interpreted as the ability to identify speakers only with timbral cues.
  Tests between two utterances leading to a cosine similarity above the threshold should be considered as similar in terms of timbral cues.

+ A discussion about this interpretation can be
+ found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
+
  The table below provides the EER and threshold of the different [variants](#variants) of this model.

  | Variant name| EER (%) | threshold |
  | --- | --- | --- |
  | W-TBR | 1.68 | 0.472 |
  | WTA128 | 2.29 | 0.472 |
-
- A discussion about this interpretation can be
- found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
+ | WTA64 | 2.88 | 0.446 |

  Please note that the EER value can vary a little depending on the max_size defined to reduce long audios (max 30 seconds in our case).

@@ -114,6 +115,7 @@ The table below provides a short description of the variants and their performan
  | --- | --- | --- | --- |
  | W-TBR | main | baseline, description in paper | 128 |
  | WTA128 | wta128 | enriched training dataset, more conversions | 128 |
+ | WTA64 | wta64 | enriched training dataset, more conversions | 64 |


  # License
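
The thresholds in the EER table of this diff, including the new WTA64 row, are meant to be compared against the cosine similarity of two utterance embeddings. The following is a minimal sketch of that decision rule, not the model's own API: the names `THRESHOLDS`, `cosine_similarity` and `similar_timbre` are hypothetical, the embeddings are assumed to be already extracted as plain lists of floats, and the toy vectors are 2-dimensional rather than the 128 or 64 dimensions listed for the variants.

```python
import math

# Thresholds copied from the EER table in the README diff above.
THRESHOLDS = {"W-TBR": 0.472, "WTA128": 0.472, "WTA64": 0.446}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def similar_timbre(emb_a, emb_b, variant="W-TBR"):
    """Return True when the similarity is above the variant's threshold."""
    return cosine_similarity(emb_a, emb_b) > THRESHOLDS[variant]
```

Because the threshold differs per variant, the same pair of embeddings can fall on different sides of the decision; in practice the two embeddings should come from the same variant as the threshold they are compared against.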