Framework of NATSpeech
NATSpeech is a simple framework for Non-Autoregressive Text-to-Speech.
Directory Structure
- egs: configuration files, which will be loaded by utils/commons/hparams.py
- data_gen: data binarization codes
- modules: modules and models
- tasks: the training and inference logic
- utils: commonly used utils
- data: data
  - raw: raw data
  - processed: data after preprocessing
  - binary: binary data
- checkpoints: model checkpoints, TensorBoard logs and generated results for all experiments
How to Add New Tasks and Run?
We show the basic steps of adding a new task/model and running the code, using the LJSpeech dataset as an example.
Add the model
Add your model to modules.
Add the task
Task classes are used to manage the training and inference procedures.
A new task (e.g., tasks.tts.fs.FastSpeechTask) should inherit the base task class (tasks.tts.speech_base.TTSBaseTask).
You must implement these methods:
- build_tts_model, which builds the model for your task.
- run_model, which defines how the model is called in training and inference.
You can override test_step and save_valid_result to change the validation/testing logic or to add more plots to TensorBoard.
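The subclassing pattern above can be sketched with stand-in stubs. The real TTSBaseTask lives in tasks/tts/speech_base.py and depends on the full framework; everything below except the class and method names from the text is illustrative:

```python
# Illustrative stubs only -- the real TTSBaseTask (tasks/tts/speech_base.py)
# is a full training task class; this sketch only mirrors the two methods
# the text says you must implement.
class TTSBaseTask:
    def build_tts_model(self):
        raise NotImplementedError

    def run_model(self, sample, infer=False):
        raise NotImplementedError


class FastSpeechTask(TTSBaseTask):
    def build_tts_model(self):
        # Build the model for this task (a real implementation would
        # construct an nn.Module from `modules`).
        self.model = {"name": "FastSpeech"}  # placeholder for a real model

    def run_model(self, sample, infer=False):
        # Define how the model is called in training vs. inference.
        if infer:
            return {"mel_out": sample.get("txt_tokens")}
        return {"loss": 0.0}
```

The split lets the trainer drive any task uniformly: it calls build_tts_model once, then run_model per batch.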
Add a new config file
Add a new config file in egs/datasets/audio/lj/YOUR_TASK.yaml. For example:
base_config: ./base_text2mel.yaml
task_cls: tasks.tts.fs.FastSpeechTask
# model configs
hidden_size: 256
dropout: 0.1
# some more configs .....
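The base_config key pulls in a parent config, with the child's keys taking precedence. A minimal sketch of that merge (the real loader is utils/commons/hparams.py, whose exact behavior may differ; the key values below are made up):

```python
def merge_configs(base: dict, child: dict) -> dict:
    # Child keys override base keys; a simplified, non-recursive sketch
    # of what a base_config mechanism typically does.
    merged = dict(base)
    merged.update(child)
    return merged

# Stand-ins for the parsed .yaml files (values are illustrative).
base_text2mel = {"hidden_size": 192, "dropout": 0.0, "lr": 2e-4}
your_task = {"task_cls": "tasks.tts.fs.FastSpeechTask",
             "hidden_size": 256, "dropout": 0.1}

hparams = merge_configs(base_text2mel, your_task)
```

This way a task config only needs to state what it changes, inheriting everything else from the base.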
If you use a new dataset YOUR_DATASET, you should also add a YOUR_DATASET_Processor in egs/datasets/audio/YOUR_DATASET/preprocess.py, inheriting data_gen.tts.base_preprocess.BasePreprocessor, which loads the meta information of the dataset.
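A processor's job is mainly to yield per-utterance metadata. A hypothetical sketch, not the actual BasePreprocessor interface (the method name, field layout, and paths below are all assumptions):

```python
import csv
import io


class MyDatasetProcessor:
    # Hypothetical stand-in for a BasePreprocessor subclass.
    def meta_data(self, metadata_text: str):
        # Yield (item_name, wav_path, text) for every utterance,
        # assuming a '|'-separated metadata file.
        for row in csv.reader(io.StringIO(metadata_text), delimiter="|"):
            item_name, text = row[0], row[1]
            yield item_name, f"data/raw/wavs/{item_name}.wav", text
```

The binarizer can then consume this iterator without knowing anything dataset-specific.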
Preprocess and binarize the dataset
python data_gen/tts/runs/align_and_binarize.py --config egs/datasets/audio/lj/base_text2mel.yaml
Training
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config YOUR_CONFIG --exp_name YOUR_EXP_NAME --reset
You can launch TensorBoard via:
tensorboard --logdir checkpoints/EXP_NAME
Inference (Testing)
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/datasets/audio/lj/YOUR_TASK.yaml --exp_name YOUR_EXP_NAME --reset --infer
Design Philosophy
Random-Access Binarized Dataset
To address the I/O bottleneck of reading many small files, we design an IndexedDataset class (utils/commons/indexed_datasets.py) that supports fast random access to binarized data.
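The idea is to pack all samples into one binary buffer plus an offset index, so any item is a single seek instead of a small-file open. A minimal self-contained sketch (the real implementation in utils/commons/indexed_datasets.py uses on-disk files and a different format):

```python
import pickle


class TinyIndexedDataset:
    """Packs pickled items into one buffer with an offset index,
    so random reads are O(1) slices instead of many file opens."""

    def __init__(self):
        self._buf = bytearray()
        self._offsets = [0]  # offsets[i]..offsets[i+1] bounds item i

    def add(self, item):
        self._buf += pickle.dumps(item)
        self._offsets.append(len(self._buf))

    def __getitem__(self, i):
        start, end = self._offsets[i], self._offsets[i + 1]
        return pickle.loads(bytes(self._buf[start:end]))

    def __len__(self):
        return len(self._offsets) - 1
```

Because lookup only touches the offset table and one slice, a DataLoader can fetch items in any order without per-item filesystem cost.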
Global Config
We introduce a global config hparams, which is loaded from a .yaml config file and can be accessed anywhere. However, we do not recommend using it in general-purpose modules.
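A simplified stand-in for how such a global config is typically used (the real module is utils/commons/hparams.py; the classes and key names below are examples):

```python
# Simplified stand-in for utils/commons/hparams.py: a module-level dict
# populated once from the parsed config, then read anywhere afterwards.
hparams = {}


def set_hparams(config: dict):
    hparams.clear()
    hparams.update(config)


class Encoder:
    # A general-purpose module is better parameterized explicitly,
    # per the recommendation above...
    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size


class FastSpeechModel:
    # ...while task-specific code may read the global config directly.
    def __init__(self):
        self.encoder = Encoder(hidden_size=hparams["hidden_size"])
```

Keeping general modules config-free makes them reusable outside this framework, while task code stays concise.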
BaseTrainer Framework
Our base trainer and base task classes are inspired by PyTorch Lightning and provide a commonly used training/inference code structure. Our framework supports multi-process GPU training without any changes to the subclass code.
Checkpoint Saving
All checkpoints and TensorBoard logs are saved in checkpoints/EXP_NAME, where EXP_NAME is set in the running
command: python tasks/run.py .... --exp_name EXP_NAME. You can run tensorboard --logdir checkpoints/EXP_NAME to open
TensorBoard and check the training loss curves, etc.