add about

src/texts.py (+34 −2) CHANGED
@@ -1,11 +1,43 @@
 LLM_BENCHMARKS_TEXT = f"""
-
-
+# About
+
+As many recent Text-to-Speech (TTS) models have shown, synthetic audio can be close to real human speech.
+However, traditional evaluation methods for TTS systems need an update to keep pace with these new developments.
+Our TTSDS benchmark assesses the quality of synthetic speech by considering factors like prosody, speaker identity, and intelligibility.
+By comparing these factors with both real speech and noise datasets, we can better understand how synthetic speech stacks up.
+
+## More information
 More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).
 
 ## Reproducibility
 To reproduce our results, check out our repository [here](https://github.com/ttsds/ttsds).
 
+## Credits
+
+This benchmark is inspired by [TTS Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena), which instead focuses on the subjective evaluation of TTS models.
+Our benchmark would not be possible without the many open-source TTS models on Hugging Face and GitHub.
+Additionally, our benchmark uses the following datasets:
+- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
+- [LibriTTS](https://www.openslr.org/60/)
+- [VCTK](https://datashare.ed.ac.uk/handle/10283/2950)
+- [Common Voice](https://commonvoice.mozilla.org/)
+- [ESC-50](https://github.com/karolpiczak/ESC-50)
+
+And the following metrics/representations/tools:
+- [Wav2Vec2](https://arxiv.org/abs/2006.11477)
+- [Hubert](https://arxiv.org/abs/2106.07447)
+- [WavLM](https://arxiv.org/abs/2110.13900)
+- [PESQ](https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Speech_Quality)
+- [VoiceFixer](https://arxiv.org/abs/2204.05841)
+- [WADA SNR](https://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf)
+- [Whisper](https://arxiv.org/abs/2212.04356)
+- [Masked Prosody Model](https://huggingface.co/cdminix/masked_prosody_model)
+- [PyWorld](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder)
+- [WeSpeaker](https://arxiv.org/abs/2210.17016)
+- [D-Vector](https://github.com/yistLin/dvector)
+
+Authors: Christoph Minixhofer, Ondřej Klejch, and Peter Bell
+of the University of Edinburgh.
 """
 
 EVALUATION_QUEUE_TEXT = """
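The "comparing factors with both real speech and noise datasets" idea from the About text can be sketched in a few lines. This is a toy illustration only, not the actual TTSDS implementation from the linked repository: it assumes a hypothetical 1-D feature per factor and uses SciPy's `wasserstein_distance` to score how much closer the synthetic feature distribution is to real speech than to noise.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def distribution_score(synthetic, reference, noise):
    """Toy distribution score in [0, 100]: near 100 when the synthetic
    feature distribution matches real speech, near 0 when it is
    closer to the noise (distractor) distribution."""
    d_real = wasserstein_distance(synthetic, reference)
    d_noise = wasserstein_distance(synthetic, noise)
    # Normalize so the score reflects relative closeness to real speech.
    return 100.0 * d_noise / (d_real + d_noise)

# Hypothetical 1-D "features" drawn from Gaussians stand in for real
# prosody / speaker / intelligibility representations.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)    # features of real speech
noise = rng.normal(10.0, 1.0, 1000)  # features of a noise dataset
synth = rng.normal(0.2, 1.0, 1000)   # features of a good TTS system
print(round(distribution_score(synth, real, noise), 1))  # close to 100
```

In the real benchmark each factor (prosody, speaker identity, intelligibility, ...) would use its own feature extractor and the per-factor scores would be aggregated; this sketch only conveys the normalization idea.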