StarGAN-VC
This is a PyTorch implementation of the paper: StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.
Converted voice examples are in the samples and results_2019-06-10 directories.
Dependencies
- Python 3.6+
- pytorch 1.0
- librosa
- pyworld
- tensorboardX
- scikit-learn
Usage
Download dataset
Download the VCC 2016 dataset to the current directory:
python download.py
The downloaded zip files are extracted to ./data/vcc2016_training and ./data/evaluation_all.
- training set: In the paper, the authors choose four speakers from ./data/vcc2016_training, so we move the corresponding folders (e.g. SF1, SF2, TM1, TM2) to ./data/speakers.
- testing set: In the paper, the authors choose four speakers from ./data/evaluation_all, so we move the corresponding folders (e.g. SF1, SF2, TM1, TM2) to ./data/speakers_test.
The data directory now looks like this:
data
├── speakers (training set)
│   ├── SF1
│   ├── SF2
│   ├── TM1
│   └── TM2
├── speakers_test (testing set)
│   ├── SF1
│   ├── SF2
│   ├── TM1
│   └── TM2
├── vcc2016_training (vcc 2016 training set)
│   └── ...
└── evaluation_all (vcc 2016 evaluation set, we use it as testing set)
    └── ...
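The speaker selection described above can also be scripted. The following is a minimal sketch, not part of this repo: `select_speakers` is a hypothetical helper, and it copies the folders instead of moving them so that the extracted dataset stays intact.

```python
import shutil
from pathlib import Path

# The four speakers named in this README; change the list to pick others.
SPEAKERS = ["SF1", "SF2", "TM1", "TM2"]


def select_speakers(src_root, dst_root, speakers=SPEAKERS):
    """Copy each speaker folder from src_root into dst_root.

    Returns the names that were copied. Speakers missing from src_root,
    or already present in dst_root, are skipped.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in speakers:
        src = src_root / name
        dst = dst_root / name
        if not src.is_dir() or dst.exists():
            continue
        shutil.copytree(src, dst)
        copied.append(name)
    return copied


# Example usage, with the paths from this README:
# select_speakers("./data/vcc2016_training", "./data/speakers")
# select_speakers("./data/evaluation_all", "./data/speakers_test")
```

Skipping folders that already exist makes the helper safe to run twice.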
Preprocess
Extract features (mcep, f0, ap) from each speech clip. The features are stored as npy files. We also calculate the statistical characteristics for each speaker.
python preprocess.py
This process may take several minutes.
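This README does not say which per-speaker statistics are computed. A common choice in voice-conversion pipelines is the mean and standard deviation of log-F0 over voiced frames, which can later be used to map the source pitch into the target speaker's range. The sketch below assumes that choice. The function names are hypothetical and are not taken from preprocess.py.

```python
import math
from statistics import mean, pstdev


def logf0_statistics(f0_frames):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0)."""
    log_f0 = [math.log(f) for f in f0_frames if f > 0]
    if not log_f0:
        raise ValueError("no voiced frames")
    return mean(log_f0), pstdev(log_f0)


def convert_f0(f0_frames, src_stats, trg_stats):
    """Map F0 from source to target statistics in the log domain.

    Unvoiced frames (F0 <= 0) are kept at 0.
    """
    src_mean, src_std = src_stats
    trg_mean, trg_std = trg_stats
    converted = []
    for f in f0_frames:
        if f <= 0:
            converted.append(0.0)
            continue
        # Normalize with the source statistics; guard against a zero std.
        z = (math.log(f) - src_mean) / src_std if src_std > 0 else 0.0
        converted.append(math.exp(z * trg_std + trg_mean))
    return converted
```

In practice the F0 contour would come from a vocoder analysis step such as the one pyworld provides; plain lists are used here only to keep the sketch dependency-free.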
Train
python main.py
Convert
python main.py --mode test --test_iters 200000 --src_speaker TM1 --trg_speaker "['TM1','SF1']"
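Note that --trg_speaker is passed as a quoted string that looks like a Python list. How main.py parses it is not shown in this README. One safe way to turn such a string into a list is `ast.literal_eval`, sketched below. `parse_speaker_list` and `build_parser` are hypothetical helpers, and the defaults are assumptions.

```python
import argparse
import ast


def parse_speaker_list(text):
    """Parse a string such as "['TM1','SF1']" into a list of speaker names."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a list literal: {text!r}") from exc
    if not isinstance(value, (list, tuple)) or not all(isinstance(v, str) for v in value):
        raise argparse.ArgumentTypeError(f"expected a list of strings: {text!r}")
    return list(value)


def build_parser():
    # Flag names follow the command above; default values are assumed.
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", default="train", choices=["train", "test"])
    parser.add_argument("--test_iters", type=int, default=200000)
    parser.add_argument("--src_speaker", type=str, default=None)
    parser.add_argument("--trg_speaker", type=parse_speaker_list, default=None)
    return parser
```

Unlike `eval`, `ast.literal_eval` only accepts literals, so a malformed or malicious argument is rejected instead of executed.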
Network structure
Note: Our implementation follows the original paper's network structure, while the pytorch StarGAN-VC code uses StarGAN's network. Both can generate good audio quality.
Reference
Update 2019/06/10
The former implementation used the network structure of the original paper. To achieve better conversion results, this update makes the following modifications:
- Fix the problem of the classifier not being trained
- Update the loss function
- Change the discriminator activation function to tanh
If you find this repo useful, please star it!
Your encouragement is my biggest motivation!