StarGAN-Voice-Conversion-2
A pytorch implementation based on: StarGAN-VC2: https://arxiv.org/pdf/1907.12279.pdf.
- Uses source and target domain codes in D but not G as I found better quality output
- Doesnt make use of PS in G.
Installation
Tested on Python version 3.6.2 in a linux VM environment
Recommended to use a linux environment - not tested for mac or windows OS
Python
- Create a new environment using Anaconda
conda create -n stargan-vc python=3.6.2
- Install conda dependencies
conda install pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.1 -c pytorch
conda install pillow=5.4.1
conda install -c conda-forge librosa=0.6.1
conda install -c conda-forge tqdm=4.43.0
- Intall dependencies not available through conda using pip
pip install pyworld=0.2.8
pip install mcd=0.4
NB: For mac users who cannot install pyworld see: https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder
Libraries
- Install binaries
Usage
Download Datasets
Example with VCTK:
mkdir ../data/VCTK-Data
wget https://datashare.is.ed.ac.uk/bitstream/handle/10283/2651/VCTK-Corpus.zip?sequence=2&isAllowed=y
unzip VCTK-Corpus.zip -d ../data/VCTK-Data
If the downloaded VCTK is in tar.gz, run this:
tar -xzvf VCTK-Corpus.tar.gz -C ../data/VCTK-Data
Preprocessing data
We will use Mel-Cepstral coefficients(MCEPs) here.
Example script for VCTK data which we can resample to 22.05kHz. The VCTK dataset is not split into train and test wavs, so we perform a data split.
# VCTK-Data
python preprocess.py --perform_data_split y \
--resample_rate 22050 \
--origin_wavpath ../data/VCTK-Data/VCTK-Corpus/wav48 \
--target_wavpath ../data/VCTK-Data/VCTK-Corpus/wav22 \
--mc_dir_train ../data/VCTK-Data/mc/train \
--mc_dir_test ../data/VCTK-Data/mc/test \
--speakers p229 p232 p236 p243
Example Script for VCC2018 data which is already seperated into train and test wav folders and is already at 22.05kHz.
# VCC2018-Data
python preprocess.py --perform_data_splt n \
--target_wav_path_train ../data/VCC2018-Data/VCC2018-Corpus/wav22_train \
--target_wav_path_eval ../data/VCC2018-Data/VCC2018-Corpus/wav22_eval \
--mc_dir_train ../data/VCC2018-Data/mc/train \
--mc_dir_test ../data/VCC2018-Data/mc/test \
--speakers VCC2SF1 VCC2SF2 VCC2SM1 VCC2SM2
Training
Example script:
# example with VCTK
python main.py --train_data_dir ../data/VCTK-Data/mc/train \
--test_data_dir ../data/VCTK-Data/mc/test \
--wav_dir ../data/VCTK-Data/VCTK-Corpus/wav22 \
--model_save_dir ./models/experiment_name \
--sample_dir ./samples/experiment_name \
--speakers p229 p232 p236 p243
If you encounter an error such as:
ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found
You may need to export export LD_LIBRARY_PATH: (See Stack Overflow)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/<PATH>/<TO>/<YOUR>/.conda/envs/<ENV>/lib/
Conversion
For example: restore model at step 90000 and specify the speakers
# example with VCTK
python convert.py --resume_model 90000 \
--speakers p229 p232 p236 p243 \
--train_data_dir ../data/VCTK-Data/mc/train/ \
--test_data_dir ../data/VCTK-Data/mc/test/ \
--wav_dir ../data/VCTK-Data/VCTK-Corpus/wav22 \
--model_save_dir ./models/experiment_name \
--convert_dir ./converted/experiment_name
TODO:
- Include converted samples
- Include s-t loss like original paper (NB: not exactly the same, see top of this README)