Commit 4aec8d8
Parent(s): 2dfe872

Update README.md

README.md CHANGED
@@ -3,38 +3,53 @@ language: "en"
thumbnail:
tags:
- embeddings
- Commands
- Keywords
- Keyword Spotting
- pytorch
- xvectors
- TDNN
- Command Recognition
license: "apache-2.0"
datasets:
- google speech commands
metrics:
- Accuracy
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# Command Recognition with xvector embeddings on Google Speech Commands

This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands).
The dataset primarily provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:

- 'yes'
- 'no'
- 'up'
- 'down'
- 'left'
- 'right'
- 'on'
- 'off'
- 'stop'
- 'go'
- 'unknown'
- 'silence'

For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:

| Release | Accuracy (%) |
|:-------------:|:--------------:|
| 06-02-21 | 98.14 |

## Pipeline description
This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
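Statistical pooling maps the variable-length sequence of frame-level TDNN features to a single fixed-size utterance embedding by concatenating each feature dimension's mean and standard deviation over time. A minimal pure-Python sketch of the idea (an illustration only, not SpeechBrain's implementation; the function name and shapes are made up):

```python
import math

def statistics_pooling(frames):
    """Collapse a T x D list of frame vectors into one fixed-size
    vector: the per-dimension means followed by the per-dimension
    standard deviations (length 2*D, independent of T)."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds

# Three 2-dimensional frames -> one 4-dimensional pooled vector
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
print(pooled)  # [3.0, 2.0, ~1.633, 0.0]
```

Because the pooled vector's size does not depend on the number of frames, the classifier on top can operate on utterances of any duration.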

## Install SpeechBrain

@@ -47,21 +62,23 @@ pip install speechbrain
Please notice that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Perform Command Recognition

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier
classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
print(text_lab)
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
print(text_lab)
```
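The `classify_file` call returns the class probabilities, the winning score, its index, and the decoded text label. How the last three relate can be sketched with a toy decoder (a hypothetical illustration only; the `decode` helper and the probability values are made up, not SpeechBrain code):

```python
# Hypothetical sketch of how (score, index, text_lab) relate to the
# per-class probabilities; the 12 labels match the keywords listed above.
labels = ['yes', 'no', 'up', 'down', 'left', 'right',
          'on', 'off', 'stop', 'go', 'unknown', 'silence']

def decode(out_prob):
    """Return the best class's score, index, and text label."""
    index = max(range(len(out_prob)), key=lambda i: out_prob[i])
    return out_prob[index], index, labels[index]

# Made-up probability vector peaking at 'stop' (index 8)
fake_prob = [0.01] * 12
fake_prob[8] = 0.89
score, index, text_lab = decode(fake_prob)
print(text_lab)  # stop
```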

### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Training
The model was trained with SpeechBrain (b7ff9dc4).
To train it from scratch, follow these steps:
1. Clone SpeechBrain:

@@ -76,11 +93,11 @@ pip install -e .

3. Run Training:
```
cd recipes/Google-speech-commands
python train.py hparams/xvect.yaml --data_folder=your_data_folder
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

@@ -100,6 +117,21 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
}
```

#### Referencing Google Speech Commands
```
@article{speechcommands,
  author = { {Warden}, P.},
  title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1804.03209},
  primaryClass = "cs.CL",
  keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
  year = 2018,
  month = apr,
  url = {https://arxiv.org/abs/1804.03209},
}
```

#### Referencing SpeechBrain

@@ -110,7 +142,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
}
```