  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created almost 4 years ago
  • Updated over 3 years ago

Repository Details

AdaSpeech: Adaptive Text to Speech for Custom Voice [WIP]

Unofficial PyTorch implementation of AdaSpeech.

Note:

  • I am not considering the multi-speaker use case; I am focusing only on the single-speaker case.
  • I will use only the utterance-level encoder and the phoneme-level encoder, not conditional layer normalization (which is the soul of the AdaSpeech paper). This definitely restricts the adaptive nature of AdaSpeech, but my focus is on improving the acoustic generalization of FastSpeech 2 rather than on adaptation.

Citations

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Requirements:

All code is written in Python 3.6.2.

  • Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command: nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize function, which is not available in earlier versions of PyTorch.
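To make the role of torch.bucketize more concrete, here is a minimal pure-Python sketch of the same idea: mapping each value to the index of the bin it falls into, given sorted bin boundaries. The function name `bucketize` and the sample numbers are illustrative assumptions and are not taken from this repo; the sketch follows what I understand to be the default (`right=False`) behaviour of torch.bucketize, where a value equal to a boundary goes into the lower bin.

```python
from bisect import bisect_left


def bucketize(values, boundaries):
    """Return, for each value, the index of the bin it falls into.

    `boundaries` must be sorted in ascending order. A value equal to a
    boundary is assigned to the lower bin (assumed to match the default
    behaviour of torch.bucketize).
    """
    return [bisect_left(boundaries, v) for v in values]


# Illustrative boundaries only; real ones would come from the dataset.
boundaries = [100.0, 200.0, 300.0]
indices = bucketize([50.0, 100.0, 150.0, 350.0], boundaries)
print(indices)  # [0, 0, 1, 3]
```

With three boundaries there are four possible bin indices (0 to 3), so anything indexed by these bin indices would need one more entry than there are boundaries.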

  • Install the other requirements:
pip install -r requirements.txt
  • To use TensorBoard, install tensorboard version 1.14.0 separately, along with a supported TensorFlow version (1.14.0).

For preprocessing:

The filelists folder contains LJSpeech dataset files already processed with MFA (Montreal Forced Aligner), so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the remaining preprocessing, run the following command:

python nvidia_preprocessing.py -d path_of_wavs

To find the min and max of F0 and energy:

python compute_statistics.py

Update the following values in hparams.py with the min and max of F0 and energy:

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
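As a rough illustration of the statistics step above, the sketch below scans per-utterance F0 and energy values and reports the global minimum and maximum in the form expected by hparams.py. It is not the code in compute_statistics.py: the function `global_min_max`, the `skip_zeros` option (used here on the assumption that unvoiced frames are stored as F0 = 0) and the sample numbers are all hypothetical.

```python
def global_min_max(per_utterance_values, skip_zeros=False):
    """Return (min, max) over all values in all utterances.

    per_utterance_values: iterable of sequences of floats, one per utterance.
    skip_zeros: if True, ignore zeros (assumption: unvoiced frames have F0 = 0).
    """
    lowest, highest = float("inf"), float("-inf")
    for values in per_utterance_values:
        for v in values:
            if skip_zeros and v == 0:
                continue
            lowest = min(lowest, v)
            highest = max(highest, v)
    if lowest > highest:
        raise ValueError("no usable values found")
    return lowest, highest


# Made-up values for two utterances; real ones would come from preprocessing.
f0_contours = [[0.0, 110.5, 220.0], [95.2, 0.0, 310.7]]
energies = [[0.01, 0.8], [0.3, 1.9]]

p_min, p_max = global_min_max(f0_contours, skip_zeros=True)
e_min, e_max = global_min_max(energies)

# Print in the same shape as the hparams.py entries.
print(f"p_min = {p_min}")
print(f"p_max = {p_max}")
print(f"e_min = {e_min}")
print(f"e_max = {e_max}")
```

Whether zeros should be skipped depends on how the preprocessing script represents unvoiced frames, so check its output before copying any values into hparams.py.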

For training:

python train_fastspeech.py --outdir etc -c configs/default.yaml -n "name"

Note

  • For a more complete, end-to-end voice cloning or text-to-speech (TTS) toolbox, please visit Deepsync Technologies.
