• Stars: 524
  • Rank: 83,943 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created over 6 years ago
  • Updated about 1 year ago


Voice Converter Using CycleGAN and Non-Parallel Data

Voice Converter CycleGAN

Lei Mao

University of Chicago

Introduction

Cycle-consistent adversarial networks (CycleGAN) have been widely used for image-to-image translation, and it turns out they can also be used for voice conversion. This is an implementation of CycleGAN for converting human speech. The model uses a 1D gated convolutional neural network (Gated CNN) as the generator and a 2D Gated CNN as the discriminator. It takes Mel-cepstral coefficients (MCEPs), which represent the spectral envelope, as input for voice conversion.
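The gating mechanism of the Gated CNN (Dauphin et al.) can be illustrated with a single 1D gated convolution. One convolution produces features and a second produces gates that pass through a sigmoid. The two outputs are then multiplied elementwise, which is a gated linear unit.

The minimal NumPy sketch below makes some assumptions for illustration: the `(time, channels)` layout, "valid" padding, and the function names. The repository's generator is built in TensorFlow 1.8 in `module.py` and may differ in details such as padding.

```python
import numpy as np


def conv1d_valid(x, w, b):
    """Plain 1D convolution (cross-correlation) with 'valid' padding.

    x: (time, in_channels), w: (kernel, in_channels, out_channels), b: (out_channels,)
    returns: (time - kernel + 1, out_channels)
    """
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, w.shape[2]))
    for t in range(steps):
        # Contract the (kernel, in_channels) window against the whole filter bank.
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_conv1d(x, w_lin, b_lin, w_gate, b_gate):
    """Gated linear unit: the linear path is scaled by a sigmoid gate in (0, 1)."""
    return conv1d_valid(x, w_lin, b_lin) * sigmoid(conv1d_valid(x, w_gate, b_gate))


rng = np.random.default_rng(0)
frames = rng.standard_normal((128, 24))  # e.g. 128 frames of 24 MCEPs (assumed sizes)
w_lin, w_gate = rng.standard_normal((2, 5, 24, 64)) * 0.1  # kernel size 5, 64 outputs
y = gated_conv1d(frames, w_lin, np.zeros(64), w_gate, np.zeros(64))
print(y.shape)  # (124, 64)
```

The gate lets each layer decide, per time step and channel, how much of the linear feature to pass on. This is the property that makes gated convolutions attractive for sequential data such as speech.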

Dependencies

  • Python 3.5
  • Numpy 1.14
  • TensorFlow 1.8
  • ProgressBar2 3.37.1
  • LibROSA 0.6
  • FFmpeg 4.0
  • PyWorld

Files

.
├── convert.py
├── demo
├── download.py
├── figures
├── LICENSE.md
├── model.py
├── module.py
├── preprocess.py
├── README.md
├── train_log
├── train.py
└── utils.py

Usage

Docker Container

Build the Docker container image using the following command.

$ docker build --rm -t tensorflow-cyclegan-vc:1.0 -f Dockerfile .

Start the Docker container for CycleGAN-VC using the following command.

$ nvidia-docker run -it --rm -v $(pwd):/mnt tensorflow-cyclegan-vc:1.0

Because the model was implemented with TensorFlow 1.8, the programs may print warnings about deprecated functions when they run.

Download Dataset

Download and unzip the VCC2016 dataset into the designated directories.

$ python download.py --help
usage: download.py [-h] [--download_dir DOWNLOAD_DIR] [--data_dir DATA_DIR]
                   [--datasets DATASETS]

Download CycleGAN voice conversion datasets.

optional arguments:
  -h, --help            show this help message and exit
  --download_dir DOWNLOAD_DIR
                        Download directory for zipped data
  --data_dir DATA_DIR   Data directory for unzipped data
  --datasets DATASETS   Datasets available: vcc2016

For example, to download the dataset to the download directory and extract it to the data directory:

$ python download.py --download_dir ./download --data_dir ./data --datasets vcc2016

Train Model

Good conversion quality requires at least 1000 training epochs, which can take a very long time even on an NVIDIA GTX TITAN X graphics card.

$ python train.py --help
usage: train.py [-h] [--train_A_dir TRAIN_A_DIR] [--train_B_dir TRAIN_B_DIR]
                [--model_dir MODEL_DIR] [--model_name MODEL_NAME]
                [--random_seed RANDOM_SEED]
                [--validation_A_dir VALIDATION_A_DIR]
                [--validation_B_dir VALIDATION_B_DIR]
                [--output_dir OUTPUT_DIR]
                [--tensorboard_log_dir TENSORBOARD_LOG_DIR]

Train CycleGAN model for datasets.

optional arguments:
  -h, --help            show this help message and exit
  --train_A_dir TRAIN_A_DIR
                        Directory for A.
  --train_B_dir TRAIN_B_DIR
                        Directory for B.
  --model_dir MODEL_DIR
                        Directory for saving models.
  --model_name MODEL_NAME
                        File name for saving model.
  --random_seed RANDOM_SEED
                        Random seed for model training.
  --validation_A_dir VALIDATION_A_DIR
                        Convert validation A after each training epoch. If set
                        none, no conversion would be done during the training.
  --validation_B_dir VALIDATION_B_DIR
                        Convert validation B after each training epoch. If set
                        none, no conversion would be done during the training.
  --output_dir OUTPUT_DIR
                        Output directory for converted validation voices.
  --tensorboard_log_dir TENSORBOARD_LOG_DIR
                        TensorBoard log directory.

For example, to train a CycleGAN model for voice conversion between SF1 and TM1:

$ python train.py --train_A_dir ./data/vcc2016_training/SF1 --train_B_dir ./data/vcc2016_training/TM1 --model_dir ./model/sf1_tm1 --model_name sf1_tm1.ckpt --random_seed 0 --validation_A_dir ./data/evaluation_all/SF1 --validation_B_dir ./data/evaluation_all/TM1 --output_dir ./validation_output --tensorboard_log_dir ./log

With validation_A_dir, validation_B_dir, and output_dir set, the validation voices are converted after each epoch, so the conversion quality can be monitored by ear.
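During training, CycleGAN-VC (Kaneko and Kameoka) optimizes three generator-side terms:

  • an adversarial loss;
  • a cycle-consistency loss, since A→B→A should reproduce the input;
  • an identity-mapping loss, which discourages unnecessary changes.

The NumPy sketch below combines them for one step. Several choices are assumptions based on the CycleGAN-VC paper and common CycleGAN practice, not values read from this repository's `model.py`: the least-squares adversarial form, the L1 distances, and the weights lambda_cycle = 10 and lambda_identity = 5. The generators and discriminators are stand-in callables.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two feature arrays."""
    return np.mean(np.abs(a - b))


def generator_loss(real_A, real_B, G_A2B, G_B2A, D_A, D_B,
                   lambda_cycle=10.0, lambda_identity=5.0):
    """Total generator loss for one CycleGAN step (sketch; LSGAN form assumed)."""
    fake_B = G_A2B(real_A)
    fake_A = G_B2A(real_B)

    # Least-squares adversarial terms: the generators want D(fake) -> 1.
    adv = np.mean((D_B(fake_B) - 1.0) ** 2) + np.mean((D_A(fake_A) - 1.0) ** 2)

    # Cycle consistency: A -> B -> A and B -> A -> B should reconstruct the input.
    cycle = l1(G_B2A(fake_B), real_A) + l1(G_A2B(fake_A), real_B)

    # Identity mapping: a sample already in the target domain should pass unchanged.
    identity = l1(G_B2A(real_A), real_A) + l1(G_A2B(real_B), real_B)

    return adv + lambda_cycle * cycle + lambda_identity * identity


# Placeholder features, e.g. 24 MCEPs x 128 frames (assumed shape).
real_A = np.zeros((24, 128))
real_B = real_A + 1.0
# Stand-ins: generators shift by +/-1, discriminators always output 1.
loss = generator_loss(real_A, real_B,
                      G_A2B=lambda x: x + 1.0, G_B2A=lambda x: x - 1.0,
                      D_A=lambda x: np.ones(1), D_B=lambda x: np.ones(1))
print(float(loss))  # 10.0: only the identity term is non-zero (2 x lambda_identity)
```

In the example, the stand-in generators map A to B perfectly, so the adversarial and cycle terms vanish. The identity term still penalizes them, because they also shift samples that are already in the target domain. The CycleGAN-VC paper reportedly uses the identity term only early in training (the first 10^4 iterations); with this function, that corresponds to passing `lambda_identity=0.0` afterwards.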

Voice Conversion

Convert voices using pre-trained models.

$ python convert.py --help
usage: convert.py [-h] [--model_dir MODEL_DIR] [--model_name MODEL_NAME]
                  [--data_dir DATA_DIR]
                  [--conversion_direction CONVERSION_DIRECTION]
                  [--output_dir OUTPUT_DIR]

Convert voices using pre-trained CycleGAN model.

optional arguments:
  -h, --help            show this help message and exit
  --model_dir MODEL_DIR
                        Directory for the pre-trained model.
  --model_name MODEL_NAME
                        Filename for the pre-trained model.
  --data_dir DATA_DIR   Directory for the voices for conversion.
  --conversion_direction CONVERSION_DIRECTION
                        Conversion direction for CycleGAN. A2B or B2A. The
                        first object in the model file name is A, and the
                        second object in the model file name is B.
  --output_dir OUTPUT_DIR
                        Directory for the converted voices.

To convert voices, put WAV files of speech into data_dir and run the following command in the terminal; the converted speech will be saved to output_dir:

$ python convert.py --model_dir ./model/sf1_tm1 --model_name sf1_tm1.ckpt --data_dir ./data/evaluation_all/SF1 --conversion_direction A2B --output_dir ./converted_voices
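CycleGAN converts only the spectral features (MCEPs), so the fundamental frequency (F0) is handled outside the network. The reference list credits Liu et al. for that step. A widely used choice is the log-Gaussian normalized transformation, sketched below in NumPy.

The sketch assumes two things:

  • Per-speaker mean and standard deviation of log F0 are computed over the voiced frames of the training data.
  • Unvoiced frames carry F0 = 0, as in PyWorld's output.

The function names are hypothetical, and this is not necessarily the exact code in `convert.py`.

```python
import numpy as np


def log_f0_statistics(f0_tracks):
    """Mean and standard deviation of log F0 over the voiced frames (F0 > 0) of one speaker."""
    voiced = np.concatenate([np.asarray(f0)[np.asarray(f0) > 0] for f0 in f0_tracks])
    log_f0 = np.log(voiced)
    return log_f0.mean(), log_f0.std()


def convert_f0(f0, mean_src, std_src, mean_tgt, std_tgt):
    """Log-Gaussian normalized F0 transformation; unvoiced frames (F0 = 0) stay 0."""
    f0 = np.asarray(f0, dtype=np.float64)
    converted = np.zeros_like(f0)
    voiced = f0 > 0
    # Standardize log F0 with source statistics, then rescale with target statistics.
    converted[voiced] = np.exp(
        (np.log(f0[voiced]) - mean_src) / std_src * std_tgt + mean_tgt)
    return converted


src_mean, src_std = log_f0_statistics([np.array([0.0, 200.0, 220.0, 0.0, 180.0])])
tgt_mean, tgt_std = np.log(110.0), src_std  # hypothetical lower-pitched target speaker
print(convert_f0([0.0, 200.0, 0.0], src_mean, src_std, tgt_mean, tgt_std))
```

Working in the log domain matches how pitch is perceived (ratios rather than differences). The mask keeps unvoiced frames unvoiced instead of taking the log of zero.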

The convention for conversion_direction is that the first object in the model filename is A, and the second object in the model filename is B. In this case, SF1 = A and TM1 = B.
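The naming convention can be expressed as a small helper. This is a hypothetical illustration, not part of `convert.py`. It assumes model names follow the `<A>_<B>.ckpt` pattern used in the example (`sf1_tm1.ckpt`), and returns the source and target speakers for a given direction.

```python
import os


def speakers_for_direction(model_name, conversion_direction):
    """Map a model file name like 'sf1_tm1.ckpt' and 'A2B'/'B2A' to (source, target)."""
    stem, _ = os.path.splitext(model_name)  # 'sf1_tm1.ckpt' -> 'sf1_tm1'
    speaker_a, speaker_b = stem.upper().split("_")  # assumes exactly one underscore
    if conversion_direction == "A2B":
        return speaker_a, speaker_b
    if conversion_direction == "B2A":
        return speaker_b, speaker_a
    raise ValueError("conversion_direction must be 'A2B' or 'B2A'")


print(speakers_for_direction("sf1_tm1.ckpt", "A2B"))  # ('SF1', 'TM1')
```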

Demo

VCC2016 SF1 and TF2 Conversion

The demo directory contains voice conversions between the SF1 and TF2 validation data, produced with the pre-trained model.

200001_SF1.wav and 200001_TF2.wav are real recordings of the same utterance spoken by SF1 and TF2, respectively.

200001_SF1toTF2.wav and 200001_TF2toSF1.wav are the voices converted by the pre-trained model.

200001_SF1toTF2_author.wav is the converted voice from the NTT website, included for comparison with our model.

The conversion quality is extremely good, and to my ear the converted speech sounds real.

The pre-trained SF1-TF2 conversion model and the conversions of all validation samples can be downloaded from Google Drive.

Reference

  • Takuhiro Kaneko, Hirokazu Kameoka. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. 2017. (Voice Conversion CycleGAN)
  • Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. 2016. (Pixel Shuffler)
  • Yann Dauphin, Angela Fan, Michael Auli, David Grangier. Language Modeling with Gated Convolutional Networks. 2017. (Gated CNN)
  • Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino. Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. 2017. (1D Gated CNN)
  • Kun Liu, Jianping Zhang, Yonghong Yan. High Quality Voice Conversion through Phoneme-based Linear Mapping Functions with STRAIGHT for Mandarin. 2007. (Fundamental Frequency Transformation)
  • PyWorld and SPTK Comparison
  • Gated CNN TensorFlow
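The pixel shuffler credited to Shi et al. upsamples by moving channel blocks into the spatial axis, which for speech features is the time axis. Below is a minimal NumPy sketch of a 1D variant. It assumes a `(time, channels)` layout and a channel count divisible by the upscale factor `r`. The channel ordering used by this repository's `module.py` may differ.

```python
import numpy as np


def pixel_shuffle_1d(x, r):
    """Rearrange (time, channels) -> (time * r, channels // r).

    Output frame t * r + j takes channels [j * c, (j + 1) * c) of input frame t,
    where c = channels // r. For row-major (C-order) arrays this is a plain reshape.
    """
    time, channels = x.shape
    if channels % r != 0:
        raise ValueError("channels must be divisible by the upscale factor r")
    return x.reshape(time * r, channels // r)


x = np.arange(12).reshape(2, 6)  # 2 frames with 6 channels each
print(pixel_shuffle_1d(x, 2).tolist())  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
```

Doubling the frame count this way lets a generator upsample along time with an ordinary convolution followed by a cheap rearrangement, instead of a transposed convolution.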

To-Do List

  • Parallelize data preprocessing
  • Evaluation metrics
  • Hyperparameter tuning
  • Train more conversion models
  • Argparse
