  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated 5 months ago

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Modified VocGAN


This repo implements a modified version of [VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network](https://arxiv.org/abs/2007.15256) in PyTorch; for the original VocGAN, check out the `baseline` branch. I slightly modified VocGAN's generator and used Full-Band MelGAN's discriminator instead of VocGAN's discriminator. In my experiments, MelGAN's discriminator trains very fast and is powerful enough to train the generator to produce high-fidelity voice, whereas VocGAN's hierarchically-nested JCU discriminator is quite large and slows training considerably.

Tested on Python 3.6

pip install -r requirements.txt

Prepare Dataset

  • Download a dataset for training. Any set of wav files with a 22050 Hz sample rate will work (e.g. LJSpeech, which was used in the paper).
  • Preprocess: python preprocess.py -c config/default.yaml -d [data's root path]
  • Edit configuration yaml file
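
Since the instructions above ask for 22050 Hz wav files, it can help to check a dataset before running preprocess.py. The following is a minimal sketch, not part of this repo: the function name `find_wrong_sample_rate` is hypothetical, and it relies on Python's standard `wave` module, which only reads uncompressed PCM wav files.

```python
import wave
from pathlib import Path

EXPECTED_SR = 22050  # sample rate named in the dataset instructions


def find_wrong_sample_rate(root, expected_sr=EXPECTED_SR):
    """Return (path, sample_rate) for every .wav under root whose rate differs.

    Hypothetical helper for illustration; it is not provided by this repo.
    """
    mismatches = []
    for path in sorted(Path(root).rglob("*.wav")):
        # The header is enough to read the sample rate; no audio is decoded.
        with wave.open(str(path), "rb") as wav:
            sample_rate = wav.getframerate()
        if sample_rate != expected_sr:
            mismatches.append((path, sample_rate))
    return mismatches
```

An empty result means every PCM wav file under the given root already matches the expected rate; any file listed would need resampling first.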

Train & Tensorboard

  • python trainer.py -c [config yaml file] -n [name of the run]

    • cp config/default.yaml config/config.yaml and then edit config.yaml
    • Write the root paths of the train/validation files on the 2nd/3rd lines.
  • tensorboard --logdir logs/
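
The copy-then-edit step above can also be done from Python. This is only a sketch under stated assumptions: `copy_config` is a hypothetical helper, not something this repo ships, and it does not assume any key names in the YAML file. It copies the default config and returns the 2nd and 3rd lines so you can see which entries to fill in with your train/validation paths.

```python
import shutil
from pathlib import Path


def copy_config(src, dst):
    """Copy the default config to dst and return its 2nd and 3rd lines.

    Hypothetical helper for illustration; it refuses to overwrite an
    existing file so an already edited config is never lost.
    """
    src, dst = Path(src), Path(dst)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite it")
    shutil.copyfile(src, dst)
    lines = dst.read_text(encoding="utf-8").splitlines()
    return lines[1:3]  # the lines that hold the train/validation root paths


if __name__ == "__main__":
    default = Path("config/default.yaml")
    target = Path("config/config.yaml")
    if default.exists() and not target.exists():
        for line in copy_config(default, target):
            print(line)
```

After running it from the repo root, edit the printed lines in config/config.yaml by hand and pass that file to trainer.py with -c.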

Notes

  1. This repo implements a modified VocGAN for faster training; for a true VocGAN implementation, please check out the baseline branch. In my testing, I was able to generate high-fidelity audio in real time with the modified VocGAN.
  2. The training cost of baseline VocGAN's discriminator is too high (2.8 sec/it on a P100 with batch size 16) compared to the generator (7.2 it/sec on a P100 with batch size 16), so it is infeasible for me to train this model for a long time.
  3. Maybe we can optimize baseline VocGAN's discriminator by downsampling the audio at the preprocessing stage instead of the training stage (currently I use torchaudio.transforms.Resample as a layer to downsample the audio); this might speed up overall discriminator training.
  4. I trained the baseline model for 300 epochs (with batch size 16) on LJSpeech, and the quality of the generated audio is similar to MelGAN's at the same epoch on the same dataset. The authors recommend training the model for 3000 epochs, which is not feasible at the current training speed (2.80 sec/it).
  5. I am open to any suggestions and modifications to this repo.
  6. For a more complete, end-to-end voice cloning or text-to-speech (TTS) toolbox 🤖, please visit Deepsync Technologies.
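
To put the speeds in notes 2 and 4 in perspective, here is a rough back-of-the-envelope estimate. It takes the quoted figures at face value and assumes, purely for illustration, that all 13,100 LJSpeech clips are used for training with one clip per batch slot; the actual train split in this repo may be smaller.

```python
import math

SEC_PER_IT_BASELINE = 2.8    # baseline discriminator speed quoted in note 2
IT_PER_SEC_GENERATOR = 7.2   # generator speed quoted in note 2
BATCH_SIZE = 16              # batch size quoted in notes 2 and 4
EPOCHS = 3000                # epoch count recommended in note 4
NUM_CLIPS = 13_100           # assumption: every LJSpeech clip is a training item


def training_days(num_clips, batch_size, sec_per_it, epochs):
    """Estimate wall-clock training time in days at a fixed seconds-per-iteration."""
    its_per_epoch = math.ceil(num_clips / batch_size)
    return its_per_epoch * sec_per_it * epochs / 86_400  # 86,400 seconds per day


# Ratio of seconds per iteration: 2.8 / (1 / 7.2) = 2.8 * 7.2
slowdown = SEC_PER_IT_BASELINE * IT_PER_SEC_GENERATOR
days = training_days(NUM_CLIPS, BATCH_SIZE, SEC_PER_IT_BASELINE, EPOCHS)
print(f"slowdown: {slowdown:.1f}x, {EPOCHS} epochs: {days:.0f} days")
```

Under these assumptions, an iteration at 2.8 sec/it is about 20 times slower than one at 7.2 it/sec, and 3000 epochs would take roughly 80 days on a single P100.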

Inference

  • python inference.py -p [checkpoint path] -i [input mel path]

Pretrained models

Two pretrained models are provided. Both were trained using the modified VocGAN structure.

Audio Samples

Using pretrained models, we can reconstruct audio samples. Visit here to listen.

Results

[WIP]
