  • Stars: 111
  • Rank: 314,510 (Top 7%)
  • Language: Python
  • License: MIT License
  • Created: almost 6 years ago
  • Updated: about 5 years ago


Repository Details

A Pytorch implementation for the ZeroSpeech 2019 challenge.

ZeroSpeech 2019: TTS without T - Pytorch

Quick Start

Setup

  • Clone this repo: git clone git@github.com:andi611/ZeroSpeech-TTS-without-T.git
  • CD into this repo: cd ZeroSpeech-TTS-without-T

Installing dependencies

  1. Install Python 3.

  2. Install the latest version of Pytorch according to your platform. For better performance, install with GPU support (CUDA) if viable. This code works with Pytorch 0.4 and later.
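
     For example, a minimal pip-based setup might look like the following. The exact package versions and the optional TensorBoard install are assumptions, since the repository does not pin its dependencies:

     pip3 install torch        # the repo targets Pytorch >= 0.4; pin an older version if needed for your platform
     pip3 install tensorboard  # optional, only used for the monitoring step under Training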

Prepare data

  1. Download the ZeroSpeech dataset.

    • The English dataset:
    wget https://download.zerospeech.com/2019/english.tgz
    tar xvfz english.tgz -C data
    rm -f english.tgz
    
    • The Surprise dataset:
    wget https://download.zerospeech.com/2019/surprise.zip
    # Go to https://download.zerospeech.com  and accept the licence agreement 
    # to get the password protecting the archive
    unzip surprise.zip -d data
    rm -f surprise.zip
    
  2. After unpacking the datasets into ~/ZeroSpeech-TTS-without-T/data, the data tree should look like this:

     |- ZeroSpeech-TTS-without-T
        |- data
           |- english
              |- train
                 |- unit
                 |- voice
              |- test
           |- surprise
              |- train
                 |- unit
                 |- voice
              |- test
    
  3. Preprocess the dataset and sample model-ready index files (an optional layout check is sketched right after this step):

    python3 main.py --preprocess --remake
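
    Before preprocessing, an optional sanity check (a sketch, not part of the repository) can confirm that the unpacked data matches the tree in step 2:

    # check_data_tree.py -- optional sketch, not part of this repository
    from pathlib import Path

    root = Path("data")
    expected = [
        "english/train/unit", "english/train/voice", "english/test",
        "surprise/train/unit", "surprise/train/voice", "surprise/test",
    ]
    for rel in expected:
        path = root / rel
        print(("ok      " if path.is_dir() else "MISSING ") + str(path))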
    

Usage

Training

  1. Train ASR-TTS autoencoder model for discrete linguistic units discovery:

    python3 main.py --train_ae
    

    Tunable hyperparameters can be found in hps/zerospeech.json. You can adjust these parameters and settings by editing the file; the default hyperparameters are recommended for this project. (A small sketch for inspecting the file from Python follows this list.)

  2. Train TTS patcher for voice conversion performance boosting:

    python3 main.py --train_p --load_model --load_train_model_name=model.pth-ae-400000
    
  3. Train TTS patcher with target guided adversarial training:

    python3 main.py --train_tgat --load_model --load_train_model_name=model.pth-ae-400000
    
  4. Monitor with Tensorboard (OPTIONAL)

    tensorboard --logdir='path to log dir'
    or
    python3 -m tensorboard.main --logdir='path to log dir'
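
    A small sketch for inspecting or adjusting hps/zerospeech.json from Python. This assumes the file is a flat JSON object, which is not confirmed by the repository; seg_len and enc_size are the names referenced in the Trained-Models section below:

    # tweak_hps.py -- sketch only, not part of this repository
    import json

    with open("hps/zerospeech.json") as f:
        hps = json.load(f)
    print(hps)              # inspect the current hyperparameters

    hps["seg_len"] = 128    # example values matching a 128-multi-1024 checkpoint
    hps["enc_size"] = 1024
    with open("hps/zerospeech.json", "w") as f:
        json.dump(hps, f, indent=2)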
    

Testing

  1. Test on a single speech file:

    python3 main.py --test_single --load_test_model_name=model.pth-ae-200000
    
  2. Test on 'synthesis.txt' and generate resynthesized audio files:

    python3 main.py --test --load_test_model_name=model.pth-ae-200000
    
  3. Test on all the testing speech under test/ and generate encoding files:

    python3 main.py --test_encode --load_test_model_name=model.pth-ae-200000
    
  4. Add --enc_only if testing with ASR-TTS autoencoder only:

    python3 main.py --test_single --load_test_model_name=model.pth-ae-200000 --enc_only
    python3 main.py --test --load_test_model_name=model.pth-ae-200000 --enc_only
    python3 main.py --test_encode --load_test_model_name=model.pth-ae-200000 --enc_only
    

Switching between datasets

  1. Simply use --dataset=surprise to switch to the alternative dataset; all paths are handled automatically if the data tree is organized as suggested above. For example:
    python3 main.py --train_ae --dataset=surprise
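
    The same flag applies at test time; for example, with an illustrative checkpoint name (substitute whatever model you actually trained or downloaded for the surprise set):

    python3 main.py --test_encode --dataset=surprise --load_test_model_name=model.pth-ae-400000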
    

Trained-Models

  1. We provide trained models as ckpt files. Download link: bit.ly/ZeroSpeech2019-Liu
  2. Reload model for training:
    --load_train_model_name=model.pth-ae-400000-128-multi-1024-english
    
    (--ckpt_dir=./ckpt_english or --ckpt_dir=./ckpt_surprise by default).
  3. Two ways to load a model for testing:
    --load_test_model_name=model.pth-ae-400000-128-multi-1024-english (by name)
    --ckpt_pth=ckpt/model.pth-ae-400000-128-multi-1024-english (direct path)
    
  4. Note that hps/zerospeech.json must be set according to the model you are loading. If a 128-multi-1024 model is being loaded, seg_len and enc_size should be set to 128 and 1024, respectively. If an ae model is being loaded, the argument --enc_only must be used when running main.py (see item 4 in the Testing section).
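
    Putting items 2-4 together, one possible invocation for evaluating the released English autoencoder checkpoint (assuming it sits in the default ./ckpt_english directory and hps/zerospeech.json has seg_len=128 and enc_size=1024) is:

    python3 main.py --test --load_test_model_name=model.pth-ae-400000-128-multi-1024-english --enc_only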

Notes

  • This code includes all the settings and methods we tested for this challenge; some of them did not succeed, but we did not remove them from the code. However, the instructions above and the default settings correspond to the method we proposed, and by running them one can easily reproduce our results.
  • TODO: upload pre-trained models

Citation

@article{Liu_2019,
   title={Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion},
   url={http://dx.doi.org/10.21437/interspeech.2019-2048},
   DOI={10.21437/interspeech.2019-2048},
   journal={Interspeech 2019},
   publisher={ISCA},
   author={Liu, Andy T. and Hsu, Po-chun and Lee, Hung-Yi},
   year={2019},
   month={Sep}
}

More Repositories

  1. Mockingjay-Speech-Representation: Official Implementation of Mockingjay in Pytorch (Python, 52 stars)
  2. Conditional-SeqGAN-Tensorflow: Conditional Sequence Generative Adversarial Network trained with policy gradient, implementation in Tensorflow (Python, 48 stars)
  3. Apriori-and-Eclat-Frequent-Itemset-Mining: Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions, implementation in Python (Python, 46 stars)
  4. TTS-Tacotron-Pytorch: Pytorch implementation of Tacotron, a speech synthesis end-to-end generative TTS model (Python, 29 stars)
  5. Hidden-Markov-Model-Digital-Signal-Processing: Discrete Hidden Markov Model (HMM) implementation in C++ (C++, 25 stars)
  6. CS-Tacotron-Pytorch: Pytorch implementation of CS-Tacotron, a code-switching speech synthesis end-to-end generative TTS model (Python, 23 stars)
  7. DQN-Deep-Q-Network-Atari-Breakout-Tensorflow: Training a vision-based agent with the Deep Q Learning Network (DQN) in Atari's Breakout environment, implementation in Tensorflow (Python, 17 stars)
  8. Conditional-SpecGAN-Tensorflow: Text-to-Speech Synthesis by Generating Spectrograms using Generative Adversarial Network (Python, 10 stars)
  9. Stock-Buy-Sell-Dynamic-Programming-FinTech: Find the best time to buy and sell stock with a transaction fee using Dynamic Programming, implementation in Python (Python, 8 stars)
  10. Pacman-With-AI-Python: Implementations of artificial intelligence agents that play Pac-Man (Python, 4 stars)
  11. AC-Actor-Critic-SlitherIO-Tensorflow: Training a vision-based agent with the Actor Critic model in an online environment, implementation in Tensorflow (Python, 4 stars)
  12. Dynamic-Programming-Algorithm: A program that generates the Bézier curve for a given set of points based on dynamic programming, implementation in C++ (C++, 2 stars)
  13. Character-Based-Language-Model: A character-based language model built with the SRILM toolkit, plus a Viterbi-based decoding process of the language model implemented in C++ (C++, 2 stars)
  14. Naive-Bayes-and-Decision-Tree-Classifiers: Naive Bayes and Decision Tree Classifiers implemented with Scikit-Learn and Graphviz visualization (datasets: News, Mushroom, Income) (Python, 2 stars)
  15. andi611: andi611/andi611 is a ✨special✨ repository for my GitHub profile (1 star)
  16. Intersection-Manager-Algorithm: A 4-way intersection manager that minimizes the average waiting time of each car at a crossroad, implementation in C++ (C++, 1 star)
  17. LibSVM-Classification: Performing classification tasks with the LibSVM toolkit on four different datasets: Iris, News, Abalone, and Income (Java, 1 star)
  18. HTK-Toolkit-Digital-Signal-Processing: Digital Signal Processing: HMM training and testing with the HTK toolkit (C, 1 star)
  19. Ohlc-Extraction-FinTech: Compute the OHLC (open, high, low, close) prices of TAIEX futures (台指期) within a given date based on minute-based trading records, implementation in Python (Python, 1 star)
  20. Kaldi-LibriSpeech-fMLLR: Kaldi recipes on the LibriSpeech corpora to extract fMLLR features (Shell, 1 star)