  • Stars: 144
  • Rank: 254,016 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 3 years ago
  • Updated about 3 years ago

Repository Details

The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

  1. English: LJSpeech
  2. Mandarin: DataBaker (标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
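
Before moving on to training, it can help to confirm that pre-processing produced something. The sketch below is a minimal check, not part of the repository. It assumes that preprocess.py writes its output under <save_dir>/tfrecords/, which is inferred only from the --data_dir values used in the training commands; the helper names are hypothetical.

```python
from pathlib import Path


def tfrecords_dir(save_dir):
    """Directory later passed to train.py as --data_dir (inferred layout, not confirmed)."""
    return Path(save_dir) / "tfrecords"


def preprocessing_looks_done(save_dir):
    """Return True if <save_dir>/tfrecords exists and holds at least one entry."""
    target = tfrecords_dir(save_dir)
    return target.is_dir() and any(target.iterdir())


# Check the two save directories used in the commands above.
for save_dir in ("./ljspeech", "./databaker"):
    status = "ready" if preprocessing_looks_done(save_dir) else "missing or empty"
    print(f"{tfrecords_dir(save_dir)}: {status}")
```

The check only looks for a non-empty directory. It does not validate the contents of the records.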

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
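
The two training commands differ only in the dataset name and the lj/db directory prefix. Both set CUDA_VISIBLE_DEVICES to select a GPU and TF_FORCE_GPU_ALLOW_GROWTH=true so that TensorFlow allocates GPU memory on demand instead of reserving it all up front. A minimal sketch of a launcher that rebuilds the same command is shown here. The helper names are hypothetical, and it assumes train.py sits in the current working directory.

```python
import os
import shlex

# Directory prefixes as they appear in the training commands above.
PREFIX = {"ljspeech": "lj", "databaker": "db"}


def train_command(dataset):
    """Return the argv list for train.py, mirroring the commands above."""
    prefix = PREFIX[dataset]
    return [
        "python", "train.py",
        "--dataset", dataset,
        "--log_dir", f"./{prefix}-log_dir",
        "--test_dir", f"./{prefix}-test_dir",
        "--data_dir", f"./{dataset}/tfrecords/",
        "--model_dir", f"./{prefix}-model_dir",
    ]


def gpu_env(gpu_id=0, base_env=None):
    """Copy the environment, pin one GPU, and enable on-demand GPU memory growth."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


print(shlex.join(train_command("ljspeech")))
# To launch for real (assumption: run from the repository root):
# import subprocess
# subprocess.run(train_command("ljspeech"), env=gpu_env(0), check=True)
```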

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
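
In the inference commands, the step number 2000 appears in both --test_dir and --ckpt_path. To evaluate a different checkpoint, both would presumably change together. The sketch below parameterizes the command on the step. The helper name is hypothetical, the convention of naming the test directory after the step is read off the examples rather than required by the scripts, and which checkpoint steps exist depends on your own training run.

```python
import shlex

# Directory prefixes as they appear in the inference commands above.
PREFIX = {"ljspeech": "lj", "databaker": "db"}


def inference_command(dataset, step, batch_size=16):
    """Return the argv list for inference.py at a given checkpoint step."""
    prefix = PREFIX[dataset]
    return [
        "python", "inference.py",
        "--dataset", dataset,
        "--test_dir", f"./{prefix}-test-{step}",
        "--data_dir", f"./{dataset}/tfrecords/",
        "--batch_size", str(batch_size),
        "--write_wavs", "true",
        "--draw_alignments", "true",
        "--ckpt_path", f"./{prefix}-model_dir/ckpt-{step}",
    ]


# Step 2000 is the value used in the examples above.
print(shlex.join(inference_command("ljspeech", 2000)))
```

As in the original commands, set CUDA_VISIBLE_DEVICES and TF_FORCE_GPU_ALLOW_GROWTH in the environment when actually running the printed command.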

Reference

  1. XuezheMax/flowseq
  2. keithito/tacotron
