• Stars: 154
  • Rank: 242,095 (Top 5%)
  • Language: Python
  • License: MIT License
  • Created: over 3 years ago
  • Updated: over 3 years ago

Repository Details

Includes Basis-MelGAN, MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.

Fast (GAN Based Neural) Vocoder

Todo

  • Support Basis-MelGAN
  • Add more demos
  • Add pretrained models
  • Support NHV

Description

Includes Basis-MelGAN (paper: https://arxiv.org/pdf/2106.13419.pdf), MelGAN, HifiGAN and Multiband-HifiGAN; Neural Homomorphic Vocoder (NHV) may be added in the future. Developed on the BiaoBei dataset; modify conf and hparams.py to fit your own dataset and model.

Demo

RTF

  • Platform: MacBook Pro M1
  • HiFiGAN (large): NaN
  • HiFiGAN (light, baseline): 0.2424
  • MultiBand-HiFiGAN (large): 0.4956
  • MultiBand-HiFiGAN (light): 0.1591
  • Basis-MelGAN: 0.0498
  • Relative RTF, HiFiGAN (light) : MultiBand-HiFiGAN (large) : MultiBand-HiFiGAN (light) : Basis-MelGAN = 50 : 102 : 33 : 10 (see the measurement sketch below)
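
RTF (real-time factor) is the ratio of synthesis time to the duration of the generated audio; values below 1.0 mean faster than real time. A minimal sketch of how such a number could be measured, assuming a callable vocoder and a 22050 Hz sample rate (both assumptions, not taken from this repo):

import time

def measure_rtf(vocoder, mel, sample_rate=22050):
    # vocoder(mel) is assumed to return a 1-D array of audio samples
    start = time.perf_counter()
    wav = vocoder(mel)
    elapsed = time.perf_counter() - start
    duration = len(wav) / sample_rate
    return elapsed / duration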

Usage (of Basis-MelGAN)

1. Abstract

Recent studies have shown that neural vocoders based on generative adversarial networks (GANs) can generate high-quality audio. While GAN-based neural vocoders have been shown to be computationally much more efficient than those based on autoregressive prediction, real-time generation of the highest-quality audio on CPU is still a very challenging task. One major part of the computation of all GAN-based neural vocoders comes from the stacked upsampling layers, which are designed to match the temporal resolution of the output waveform. Meanwhile, the computational complexity of the upsampling networks is closely correlated with the number of samples generated for each window. To reduce the computation of the upsampling layers, we propose a new GAN-based neural vocoder called Basis-MelGAN, in which the raw audio samples are decomposed into a learned basis and their associated weights. As the prediction targets of Basis-MelGAN are the weight values associated with each learned basis vector instead of the raw audio samples, the upsampling layers in Basis-MelGAN can be designed with much simpler networks. Compared with other GAN-based neural vocoders, the proposed Basis-MelGAN produces comparably high-quality audio while significantly reducing computational complexity, from HiFi-GAN V1's 17.74 GFLOPs to 7.95 GFLOPs.
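
The core idea can be illustrated with a small sketch: the model predicts per-frame weights over a learned basis, and the waveform is recovered by overlap-adding the weighted basis vectors. The sizes, hop length, and variable names below are illustrative assumptions, not the repo's actual hyperparameters or code:

import torch

# Illustrative sizes (assumptions, not the repo's actual configuration).
num_bases, win_length, hop_length = 256, 512, 128
num_frames = 100

basis = torch.randn(num_bases, win_length)    # learned basis vectors
weights = torch.randn(num_frames, num_bases)  # per-frame weights predicted by the model

# Each frame is a weighted sum of basis vectors; frames are overlap-added
# at the hop interval to reconstruct the raw waveform.
frames = weights @ basis                      # [num_frames, win_length]
wav = torch.zeros(hop_length * (num_frames - 1) + win_length)
for i, frame in enumerate(frames):
    start = i * hop_length
    wav[start:start + win_length] += frame

Because the network only predicts a small set of weights per frame rather than every raw sample, the upsampling stack can be made much lighter, which is where the GFLOPs reduction quoted in the abstract comes from.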

2. Prepare data

  • Refer to xcmyz/ConvTasNet4BasisMelGAN to prepare the dataset for Basis-MelGAN
  • Move ConvTasNet4BasisMelGAN/Basis-MelGAN-dataset to FastVocoder
  • Write the paths of the wav files to a text file (see the sketch after this list), for example: cd dataset && python3 basismelgan.py
  • Run bash preprocess.sh dataset/basismelgan.txt Basis-MelGAN-dataset/processed dataset/audio dataset/mel
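
If you are not using the provided dataset script, the wav path file is simply a plain-text list of audio file paths. A minimal sketch of generating such a list (the directory name and the one-path-per-line format are assumptions about what dataset/basismelgan.py produces):

from pathlib import Path

# Collect all wav files under an assumed directory and write one path per line.
wav_dir = Path("Basis-MelGAN-dataset/wavs")  # assumed location of the audio
paths = sorted(str(p) for p in wav_dir.rglob("*.wav"))
Path("dataset/basismelgan.txt").write_text("\n".join(paths) + "\n")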

3. Train

  • command:
bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    /path/to/configuration/file \
    <whether to use scheduler> \
    <whether to use mixed precision training>
  • for example:
bash train.sh \
    0 \
    dataset/audio/train \
    dataset/audio/valid \
    dataset/mel/train \
    dataset/mel/valid \
    basis-melgan \
    conf/basis-melgan/light.yaml \
    0 0

4. Train from checkpoint

  • command:
bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    /path/to/configuration/file \
    <whether to use scheduler> \
    <whether to use mixed precision training> \
    /path/to/checkpoint \
    <step of checkpoint>

5. Synthesize

  • command:
bash synthesize.sh \
    /path/to/checkpoint \
    /path/to/mel \
    /path/for/saving/wav \
    <model name> \
    /path/to/configuration/file

Usage (of MelGAN, HifiGAN and Multiband-HifiGAN)

1. Prepare data

  • Write the paths of the wav files to a text file, for example: cd dataset && python3 biaobei.py
  • bash preprocess.sh <wav path file> <path to save processed data> dataset/audio dataset/mel
  • for example: bash preprocess.sh dataset/BZNSYP.txt processed dataset/audio dataset/mel

2. Train

  • command:
bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    /path/to/configuration/file \
    <whether to use scheduler> \
    <whether to use mixed precision training>
  • for example:
bash train.sh \
    0 \
    dataset/audio/train \
    dataset/audio/valid \
    dataset/mel/train \
    dataset/mel/valid \
    hifigan \
    conf/hifigan/light.yaml \
    0 0

3. Train from checkpoint

  • command:
bash train.sh \
    <GPU ids> \
    /path/to/audio/train \
    /path/to/audio/valid \
    /path/to/mel/train \
    /path/to/mel/valid \
    <model name> \
    /path/to/configuration/file \
    <whether to use scheduler> \
    <whether to use mixed precision training> \
    /path/to/checkpoint \
    <step of checkpoint>

4. Synthesize

  • command:
bash synthesize.sh \
    /path/to/checkpoint \
    /path/to/mel \
    /path/for/saving/wav \
    <model name> \
    /path/to/configuration/file

More Repositories

  1. FastSpeech: The Implementation of FastSpeech based on pytorch. (Python, 856 stars)
  2. Transformer-TTS: TTS model based on Transformer. (Python, 57 stars)
  3. FastSpeech2: The Implementation of FastSpeech2 Based on Pytorch. (Python, 52 stars)
  4. CLONE (20 stars)
  5. ConvTasNet4BasisMelGAN: This repo contains conv-tasnet for basis-melgan. If you want the code of basis-melgan, please refer to FastVocoder. (Python, 19 stars)
  6. Tacotron2-Pytorch: Follows NVIDIA's implementation, simplifies it and supports data parallelism. (Python, 13 stars)
  7. Lifelong-Learning-Tacotron2: MultiSpeaker Tacotron2 using LifeLong Learning. (Python, 13 stars)
  8. Hackathon-EnglishLearning: Voice Scoring System. (JavaScript, 8 stars)
  9. tacotron2.xcmyz: New version of tacotron2 (old version: https://github.com/xcmyz/Tacotron2-Pytorch). (Python, 8 stars)
  10. LM-Tacotron2: Tacotron2 combined with a Language Model (BERT). (Python, 7 stars)
  11. SpeakerVerification: Speaker Verification (GE2E Loss). (Python, 7 stars)
  12. Gobang-AI: A C++ Implementation of Gobang AI. (C++, 6 stars)
  13. Forced-Alignment: Using montreal-forced-aligner. (Python, 2 stars)
  14. bert-race: BERT/ALBERT based model for the RACE dataset; supports multi-worker, multi-GPU, FP16 and CPU binding. (Python, 2 stars)
  15. Calculator: A Calculator implemented in Python. (Python, 1 star)
  16. FaceDetection (Python, 1 star)
  17. VAE-Tacotron: A Pytorch Implementation of Tacotron Combined with VAE. (Python, 1 star)
  18. xcmyz (1 star)
  19. Polynomial-Calculator: A polynomial calculator with a graphical interface, implemented in Python. (Python, 1 star)
  20. AVX-programming: CPU acceleration using AVX (Advanced Vector Extensions). (1 star)
  21. ExpressionTransformation: Prefix expression, infix expression, postfix expression. (Python, 1 star)