PyTorch implementation of "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention", based partially on the following projects:
- https://github.com/Kyubyong/dc_tts (audio preprocessing)
- https://github.com/r9y9/deepvoice3_pytorch (data loader and sampler)
Online Text-To-Speech Demo
The following notebooks are executable on https://colab.research.google.com:
For audio samples and pretrained models, visit the above notebook links.
Training/Synthesizing English Text-To-Speech
The English TTS uses the LJ-Speech dataset.
- Download the dataset:
python dl_and_preprop_dataset.py --dataset=ljspeech
- Train the Text2Mel model:
python train-text2mel.py --dataset=ljspeech
- Train the SSRN model:
python train-ssrn.py --dataset=ljspeech
- Synthesize sentences:
python synthesize.py --dataset=ljspeech
- The WAV files are saved in the samples folder.
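Once synthesis finishes, a quick sanity check is to inspect each generated WAV in the samples folder. This is an illustrative sketch using only the Python standard library (wav_info is a hypothetical helper, not part of this repo):

```python
import glob
import wave

def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
    return rate, frames / rate

# Print sample rate and length of every synthesized clip.
for path in sorted(glob.glob("samples/*.wav")):
    rate, duration = wav_info(path)
    print(f"{path}: {rate} Hz, {duration:.2f} s")
```

If a clip reports an unexpectedly short duration, the attention may have collapsed during synthesis for that sentence.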
Training/Synthesizing Mongolian Text-To-Speech
The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.
- Download the dataset:
python dl_and_preprop_dataset.py --dataset=mbspeech
- Train the Text2Mel model:
python train-text2mel.py --dataset=mbspeech
- Train the SSRN model:
python train-ssrn.py --dataset=mbspeech
- Synthesize sentences:
python synthesize.py --dataset=mbspeech
- The WAV files are saved in the samples folder.
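The synthesized clips land in the samples folder one file per sentence; to listen to them back-to-back, one option is to join them into a single WAV. This is a sketch (concat_wavs is a hypothetical helper, not part of this repo) that assumes all clips share the same sample rate and format, which holds when they come from one synthesize.py run:

```python
import glob
import wave

def concat_wavs(paths, out_path):
    """Concatenate WAV clips with identical format into one file."""
    # Copy channel count, sample width and rate from the first clip.
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as w:
                out.writeframes(w.readframes(w.getnframes()))

clips = sorted(glob.glob("samples/*.wav"))
if clips:  # skip if nothing has been synthesized yet
    concat_wavs(clips, "all_samples.wav")
```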