Commit 4aec8d8
Parent(s): 2dfe872

Update README.md

README.md CHANGED
@@ -3,38 +3,53 @@ language: "en"
thumbnail:
tags:
- embeddings
- Commands
- Keywords
- Keyword Spotting
- pytorch
- xvectors
- TDNN
- Command Recognition
license: "apache-2.0"
datasets:
- google speech commands
metrics:
- Accuracy
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# Command Recognition with xvector embeddings on Google Speech Commands

This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands).
The dataset primarily provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:

- 'yes'
- 'no'
- 'up'
- 'down'
- 'left'
- 'right'
- 'on'
- 'off'
- 'stop'
- 'go'
- 'unknown'
- 'silence'

For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:

| Release | Accuracy (%) |
|:-------------:|:--------------:|
| 06-02-21 | 98.14 |

## Pipeline description
This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
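Statistical pooling maps the variable-length sequence of frame-level TDNN features to a single fixed-size utterance embedding by concatenating each feature dimension's mean and standard deviation over time. A minimal pure-Python sketch of the idea (an illustration only, not SpeechBrain's implementation; the function name and shapes are made up):

```python
import math

def statistics_pooling(frames):
    """Collapse a T x D list of frame vectors into one fixed-size
    vector: the per-dimension means followed by the per-dimension
    standard deviations (length 2*D, independent of T)."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds

# Three 2-dimensional frames -> one 4-dimensional pooled vector
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
print(pooled)  # [3.0, 2.0, ~1.633, 0.0]
```

Because the pooled vector's size does not depend on the number of frames, the classifier on top can operate on utterances of any duration.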

## Install SpeechBrain

@@ -47,21 +62,23 @@ pip install speechbrain
Please notice that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Perform Command Recognition

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier
classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
print(text_lab)
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
print(text_lab)
```
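The `classify_file` call returns the class probabilities, the winning score, its index, and the decoded text label. How the last three relate can be sketched with a toy decoder (a hypothetical illustration only; the `decode` helper and the probability values are made up, not SpeechBrain code):

```python
# Hypothetical sketch of how (score, index, text_lab) relate to the
# per-class probabilities; the 12 labels match the keywords listed above.
labels = ['yes', 'no', 'up', 'down', 'left', 'right',
          'on', 'off', 'stop', 'go', 'unknown', 'silence']

def decode(out_prob):
    """Return the best class's score, index, and text label."""
    index = max(range(len(out_prob)), key=lambda i: out_prob[i])
    return out_prob[index], index, labels[index]

# Made-up probability vector peaking at 'stop' (index 8)
fake_prob = [0.01] * 12
fake_prob[8] = 0.89
score, index, text_lab = decode(fake_prob)
print(text_lab)  # stop
```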

### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Training
The model was trained with SpeechBrain (b7ff9dc4).
To train it from scratch, follow these steps:
1. Clone SpeechBrain:

@@ -76,11 +93,11 @@ pip install -e .

3. Run Training:
```
cd recipes/Google-speech-commands
python train.py hparams/xvect.yaml --data_folder=your_data_folder
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

@@ -100,6 +117,21 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
}
```

#### Referencing Google Speech Commands
```
@article{speechcommands,
  author = { {Warden}, P.},
  title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1804.03209},
  primaryClass = "cs.CL",
  keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
  year = 2018,
  month = apr,
  url = {https://arxiv.org/abs/1804.03209},
}
```

#### Referencing SpeechBrain

@@ -110,7 +142,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
}
```