thuhcsi/VAENAR-TTS

Stars: 144
Rank: 254,016 (Top 6 %)
Language: Python
License: MIT License
Created over 3 years ago
Updated about 3 years ago


The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

English: LJSpeech
Mandarin: DataBaker (标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
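Before running the LJSpeech pre-processing command above, it can help to confirm that `--data_dir` really points at the extracted corpus. The following is a minimal sketch, not part of this repo: `check_ljspeech_dir` is a hypothetical helper, and it assumes the standard LJSpeech-1.1 release layout (a `metadata.csv` file next to a `wavs/` directory). What `preprocess.py` itself requires may differ.

```python
from pathlib import Path


def check_ljspeech_dir(data_dir):
    """Return a list of problems found in an extracted LJSpeech-1.1 directory.

    Hypothetical helper: assumes the standard release layout, i.e. a
    metadata.csv file next to a wavs/ directory. An empty list means the
    basic layout looks right.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    if not (root / "metadata.csv").is_file():
        problems.append("metadata.csv is missing")

    wav_dir = root / "wavs"
    if not wav_dir.is_dir():
        problems.append("wavs/ directory is missing")
    elif not any(wav_dir.glob("*.wav")):
        problems.append("wavs/ contains no .wav files")
    return problems
```

Calling `check_ljspeech_dir("/path/to/extracted/LJSpeech-1.1")` and printing the returned list gives a quick pre-flight check. The empty `CUDA_VISIBLE_DEVICES=` in the commands above hides all GPUs from the process, so pre-processing runs on the CPU.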

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
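The two training commands differ only in the dataset name and the `lj`/`db` directory prefix, so they can be generated from a small table. The sketch below is one way to do that from Python. `PREFIXES` and `build_train_command` are hypothetical names introduced here; the flags and directory names are copied from the commands above. `CUDA_VISIBLE_DEVICES=0` exposes only the first GPU, and `TF_FORCE_GPU_ALLOW_GROWTH=true` asks TensorFlow to allocate GPU memory on demand instead of reserving most of it at startup.

```python
import os

# Short prefixes used for the output directories in the commands above.
PREFIXES = {"ljspeech": "lj", "databaker": "db"}


def build_train_command(dataset, gpu="0"):
    """Return (env, argv) for a training run of the given dataset.

    Hypothetical helper: it only mirrors the README commands and does not
    start anything by itself.
    """
    prefix = PREFIXES[dataset]  # raises KeyError for an unknown dataset

    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu          # expose only this GPU
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"  # grow GPU memory on demand

    argv = [
        "python", "train.py",
        "--dataset", dataset,
        "--log_dir", f"./{prefix}-log_dir",
        "--test_dir", f"./{prefix}-test_dir",
        "--data_dir", f"./{dataset}/tfrecords/",
        "--model_dir", f"./{prefix}-model_dir",
    ]
    return env, argv
```

To launch a run from the repository root, pass the result to the standard library: `env, argv = build_train_command("ljspeech")` followed by `subprocess.run(argv, env=env, check=True)`.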

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
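The `--ckpt_path` values above name one specific checkpoint (`ckpt-2000`). To synthesize with the most recent checkpoint instead, its prefix can be looked up in the model directory. This is a minimal sketch under an assumption the README does not state: that `train.py` saves checkpoints in TensorFlow's usual form, where each `ckpt-N` prefix has a matching `ckpt-N.index` file. `latest_checkpoint_prefix` is a hypothetical helper, not part of this repo.

```python
import re
from pathlib import Path

# Matches index files such as "ckpt-2000.index" (assumed naming scheme).
CKPT_INDEX = re.compile(r"^(ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(model_dir):
    """Return the path prefix of the highest-numbered checkpoint, or None."""
    best_step, best_name = -1, None
    for entry in Path(model_dir).iterdir():
        match = CKPT_INDEX.match(entry.name)
        if match and int(match.group(2)) > best_step:
            # Compare step numbers as integers so ckpt-10000 beats ckpt-2000.
            best_step, best_name = int(match.group(2)), match.group(1)
    if best_name is None:
        return None
    return str(Path(model_dir) / best_name)
```

The returned string can be passed as `--ckpt_path`. If TensorFlow is available in the environment, `tf.train.latest_checkpoint(model_dir)` is an alternative that reads the `checkpoint` state file written alongside the checkpoints.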
