Repository Details

MelGAN vocoder (compatible with NVIDIA/tacotron2)

MelGAN

Unofficial PyTorch implementation of MelGAN vocoder

Key Features

  • MelGAN is lighter, faster, and better at generalizing to unseen speakers than WaveGlow.
  • This repository uses the same mel-spectrogram function as NVIDIA/tacotron2, so it can directly convert the output of NVIDIA's tacotron2 into raw audio.
  • A model pretrained on LJSpeech-1.1 is available via PyTorch Hub.

Prerequisites

Tested on Python 3.6

pip install -r requirements.txt

Prepare Dataset

  • Download a dataset for training. Any set of wav files with a sample rate of 22050 Hz will work (e.g. LJSpeech, which was used in the paper).
  • Preprocess: python preprocess.py -c config/default.yaml -d [data's root path]
  • Edit the configuration yaml file.
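Since the dataset must be sampled at 22050 Hz, it can help to scan the data directory before preprocessing. The following is a minimal sketch using only the Python standard library; the function name `find_wrong_sample_rates` is hypothetical and not part of this repository. Note that the `wave` module only reads PCM wav files, so other encodings would raise `wave.Error`.

```python
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 22050  # rate stated in the dataset instructions


def find_wrong_sample_rates(root, expected=EXPECTED_SAMPLE_RATE):
    """Return (path, rate) pairs for every .wav under root whose rate differs."""
    mismatches = []
    # rglob walks the directory tree recursively.
    for wav_path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected:
            mismatches.append((wav_path, rate))
    return mismatches
```

Calling `find_wrong_sample_rates("[data's root path]")` returns an empty list when every file already matches; any file it reports would need resampling first.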

Train & Tensorboard

  • python trainer.py -c [config yaml file] -n [name of the run]
    • cp config/default.yaml config/config.yaml and then edit config.yaml
    • Write the root paths of the train and validation files on the 2nd and 3rd lines of config.yaml, respectively.
    • Each path should contain pairs of *.wav files and their corresponding (preprocessed) *.mel files.
    • The data loader collects the files under each path recursively.
  • tensorboard --logdir logs/
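Because training expects every *.wav to have a matching *.mel, a quick check before launching a run can save time. The sketch below is an illustration, not code from this repository: `find_unpaired_wavs` is a hypothetical helper, and it assumes the mel file sits next to the wav and shares its base name with a `.mel` suffix.

```python
from pathlib import Path


def find_unpaired_wavs(root):
    """Return every .wav under root that has no sibling file with a .mel suffix."""
    unpaired = []
    # Walk the tree recursively, mirroring how the data loader is described.
    for wav_path in sorted(Path(root).rglob("*.wav")):
        # Assumption: the mel file sits beside the wav and shares its stem.
        if not wav_path.with_suffix(".mel").exists():
            unpaired.append(wav_path)
    return unpaired
```

If the returned list is non-empty, rerunning the preprocessing step on that directory is the likely fix.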

Pretrained model

Try with Google Colab: TODO

import torch
vocoder = torch.hub.load('seungwonpark/melgan', 'melgan')
vocoder.eval()
mel = torch.randn(1, 80, 234) # use your own mel-spectrogram here

if torch.cuda.is_available():
    vocoder = vocoder.cuda()
    mel = mel.cuda()

with torch.no_grad():
    audio = vocoder.inference(mel)
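To listen to the result, the samples have to be written to a wav file. Here is a minimal standard-library sketch; `write_wav` and its `scale` parameter are hypothetical names introduced for illustration. Whether `vocoder.inference` returns floats or 16-bit integer values is not stated here, so the helper takes plain Python numbers and a scale factor: keep the default for floats in [-1, 1], or pass `scale=1.0` if the values are already in the 16-bit integer range.

```python
import struct
import wave

SAMPLE_RATE = 22050  # sample rate stated for the training data


def write_wav(path, samples, sample_rate=SAMPLE_RATE, scale=32767.0):
    """Write mono samples to a 16-bit PCM wav file after multiplying by scale."""
    pcm = []
    for sample in samples:
        value = int(round(float(sample) * scale))
        # Clamp to the valid 16-bit range to avoid overflow when packing.
        pcm.append(max(-32768, min(32767, value)))
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

With the tensor from the snippet above, a call might look like `write_wav("out.wav", audio.squeeze().cpu().tolist())`, adjusting `scale` to match the value range you observe.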

Inference

  • python inference.py -p [checkpoint path] -i [input mel path]
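As a sanity check on the generated audio, its expected length can be estimated from the number of mel frames. This sketch assumes a hop length of 256 samples, which is the default in NVIDIA/tacotron2; check the configuration yaml file for the value actually used. The function name is hypothetical, and the real output may be slightly padded or trimmed.

```python
def estimate_duration_seconds(num_frames, hop_length=256, sample_rate=22050):
    """Approximate audio length for a mel-spectrogram with num_frames frames."""
    # Assumption: each mel frame corresponds to one hop of audio samples.
    num_samples = num_frames * hop_length
    return num_samples / sample_rate
```

Under these assumptions, a mel-spectrogram with 234 frames corresponds to 59904 samples, or roughly 2.7 seconds of audio.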

Results

See audio samples at http://swpark.me/melgan/. The model was trained on a V100 GPU for 14 days using LJSpeech-1.1.

Implementation Authors

License

BSD 3-Clause License.

Useful resources

More Repositories

  1. RandWireNN (Python, 683 stars): Implementation of "Exploring Randomly Wired Neural Networks for Image Recognition"
  2. ghudegy-chain (Python, 89 stars): Nth commit must have commit hash with N leading zeros - 진짜 구데기컵 2018 ("Real Gudegi Cup 2018")
  3. awesome-tts-samples (58 stars): Awesome list of TTS papers with audio samples
  4. awesome-model-cards (11 stars): Resources related to the model cards for ML
  5. istft-pytorch (Python, 9 stars): Two different PyTorch implementations of Inverse-STFT for discussion at https://github.com/keunwoochoi/torchaudio-contrib/issues/27
  6. can-google-ocr-this (7 stars): Will these images/videos eventually show up in the google search results?
  7. LearningToProtect (Python, 6 stars): Implementation of "Learning to Protect Communications with Adversarial Neural Cryptography" in PyTorch
  8. PS-latex-template (TeX, 5 stars): LaTeX template for PS description
  9. tex-lecture (4 stars): Lecture - TeX: maximizing the efficiency of document work. (2018.11.05 @ SNU)
  10. norazo-lotto (PHP, 3 stars): Comparing the lotto numbers that appear in "니 팔자야"
  11. lipsum-seminar (Jupyter Notebook, 3 stars): Lecture Notes of Lorem Ipsum Seminar (2017 Summer)
  12. tikzNN (TeX, 3 stars): Neural Net related illustrations using TikZ
  13. dotfiles (Shell, 2 stars): dotfiles that started out as a patchwork of borrowed pieces
  14. HR-Diagram (Python, 2 stars): Software for drawing H-R Diagram
  15. SNU_physics_board_rss (Python, 2 stars): SNU Physics Board RSS feed
  16. mediapipe_arch_vis (HTML, 2 stars): Collection of naive visualization of tflite models from MediaPipe
  17. SunSpotTracker (Python, 2 stars): Crawl images of the Sun from SDO HMII and track sunspots from them.
  18. voxceleb_tools (Shell, 1 star)
  19. seungwonpark.github.io (SCSS, 1 star): Seung-won Park's homepage
  20. userscripts (JavaScript, 1 star): Personal collection of userscripts (I use Tampermonkey)
  21. tikz-gallery (1 star): My small TikZ gallery