crank
Non-parallel voice conversion based on vector-quantized variational autoencoder with adversarial learning
Setup
- Install Python dependency
$ git clone https://github.com/k2kobayashi/crank.git
$ cd crank/tools
$ make
- install dependency for mosnet
$ sudo apt install ffmpeg # mosnet dependency
Recipes
- English
- VCC2020
- VCC2018 (Thanks to @unilight)
- Japanese
- jsv_ver1
Conversion samples
You can access several converted audio samples of VCC 2018 dataset in the URL.
Run VCC2020 recipe
crank has prepared recipe for Voice Conversion Challenge 2020. In crank recipe, there are 7 stages to implement non-parallel voice conversion.
- stage 0
- download dataset
- stage 1
- initialization
- generate scp files and figures to be determine speaker-dependent parameters
- initialization
- stage 2
- feature extraction
- extract mlfb and mcep features
- feature extraction
- stage 3
- training
- stage 4
- reconstuction
- generate reconstructed feature for fine-tuning of neural vocoder
- reconstuction
- stage 5
- evaluation
- convert evaluation waveform
- evaluation
- stage 6
- synthesis
- synthesis waveform by pre-trained ParallelWaveGAN
- synthesis waveform by GriffinLim
- synthesis
- stage 7
- objective evalution
- mel-cepstrum distortion
- mosnet
- objective evalution
Put dataset to downloads
Note that dataset is only released for the participants (2020/05/26).
$ cd egs/vaevc/vcc2020v1
$ mkdir downloads && cd downloads
$ mv <path_to_zip>/vcc2020_{training,evaluation}.zip downloads
$ unzip vcc2020_training.zip
$ unzip vcc2020_evaluation.zip
Run feature extraction and model training
Because the challenge defines its training and evaluation set, we have initially put configuration files. So, you need to run from 2nd stage.
$ ./run.sh --n_jobs 10 --stage 2 --stop_stage 5
where the n_jobs
indicates the number of CPU cores used in the training.
Configuration
Configurations are defined in conf/mlfb_vqvae.yml
.
Followings are explanation of representative parameters.
- feature
When you create your own recipe, be carefull to set parameters for feature extraction such as fs
, fftl
, hop_size
, framems
, shiftms
, and mcep_alpha
. These parameters depend on sampling frequency.
- feat_type
You can choose feat_type
either mlfb
or mcep
.
If you choose mlfb
, the converted waveforms are generated by either GllifinLim vocoder or ParallelWaveGAN vocoder.
If you choose mcep
, the converted waveforms are generated by world vocoder (i.e., excitation generation and MLSA filtering).
- trainer_type
We support training with vqvae
, lsgan
, cyclegan
, and stargan
using same generator network.
vqvae
: default vqvae settinglsgan
: vqvae with adversarial learningcyclegan
: vqvae with adevesarial learning and cyclic constraintsstargan
: vqvae with adevesarial learning similar to cyclegan
Create your recipe
Copy recipe template
Please copy template directory to start creation of your recipe.
$ cp -r egs/vaevc/template egs/vaevc/<new_recipe>
$ cd egs/vaevc/<new_recipe>
Put .wav files
You need to put wav files appropriate directory.
You can choose either modifying download.sh
or putting wav files.
In either case, the wav files should be located in each speaker like following
<new_recipe>/downloads/wav/{spkr1, spkr2, ..., spkr3}/*.wav
.
If you modify downaload.sh
,
$ vim local/download.sh
If you put wav files,
$ mkdir downloads
$ mv <path_to_your_wav_directory> downloads/wav
$ touch downloads/.done
Run initialization
The initialization process generates kaldi-like scp files.
$ ./run.sh --stage 0 --stop_stage 1
Then you modify speaker-dependent parameters in conf/spkr.yml
using generated figures.
Page 20~22 in slide help you how to set these parameters.
Run feature extraction, train, reconstruction, and evaluation
After preparing configuration, you run it.
$ ./run.sh --stage 2 --stop_stage 7
Citation
Please cite this paper when you use crank.
K. Kobayashi, W-C. Huang, Y-C. Wu, P.L. Tobing, T. Hayashi, T. Toda,
"crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder",
Proc. ICASSP, 2021. (accepted)
Achknowledgements
Thank you @kan-bayashi for lots of contributions and encouragement helps.
Who we are
-
Kazuhiro Kobayashi @k2kobayashi [maintainer, design and development]
-
Wen-Chin Huang @unilight [maintainer, design and development]
-
Tomoki Toda [advisor]