WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN
Pritish Chandna, Merlijn Blaauw, Jordi Bonada, Emilia Gómez
Music Technology Group, Universitat Pompeu Fabra, Barcelona
This repository contains the source code for multi-voice singing voice synthesis.
Installation
To install, clone the repository and run:
pip install -r requirements.txt
The main code is in the main.py file.
Training and inference
To use WGANSing, download the model weights and place them in the log_dir directory defined in config.py.
The NUS-48E dataset can be downloaded from here. Once downloaded, set wav_dir_nus in config.py to the directory that contains the dataset.
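As a rough illustration of the two settings mentioned above, the sketch below shows how log_dir and wav_dir_nus might be set and checked. Both paths are placeholders chosen for this example, not the repository defaults, and the existence check is an addition for illustration that is not part of config.py.

```python
# Sketch of the two config.py entries this README mentions.
# Both paths are placeholders (assumptions), not the repository defaults.
import os

log_dir = os.path.join(".", "log")                     # downloaded model weights go here
wav_dir_nus = os.path.join(".", "nus-smc-corpus_48")   # root folder of the NUS-48E download

# Optional sanity check before training or synthesis.
for name, path in (("log_dir", log_dir), ("wav_dir_nus", wav_dir_nus)):
    status = "found" if os.path.isdir(path) else "missing"
    print(f"{name}: {path} ({status})")
```

If either directory is reported as missing, adjust the corresponding path before running prep_data_nus.py or main.py.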
To prepare the data, run prep_data_nus.py.
Once set up, you can run the following commands. To train the model:
python main.py -t
To synthesize a .lab file, use:
python main.py -e filename alternate_singer_name
If no alternate singer is given then the original singer will be used for synthesis. A list of valid singer names will be displayed if an invalid singer is entered.
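The fallback behaviour described above can be pictured with the minimal sketch below. It is not the code in main.py: the function name resolve_singer and the VALID_SINGERS list are hypothetical and only illustrate the rule "use the alternate singer if it is valid, otherwise keep the original or report the valid names". The singer codes shown are examples; the real list comes from the dataset.

```python
# Hypothetical sketch of the singer-selection rule; not taken from main.py.
# VALID_SINGERS is an illustrative subset; the real names come from the dataset.
VALID_SINGERS = ["ADIZ", "JLEE", "JTAN", "KENN"]


def resolve_singer(original, alternate=None, valid_singers=VALID_SINGERS):
    """Return the singer to synthesize with.

    Falls back to the original singer when no alternate is given and
    raises ValueError listing the valid names when the alternate is unknown.
    """
    if alternate is None:
        return original
    if alternate not in valid_singers:
        raise ValueError(
            "Invalid singer {!r}. Valid singers: {}".format(
                alternate, ", ".join(valid_singers)
            )
        )
    return alternate


print(resolve_singer("JLEE"))          # no alternate given: original singer is kept
print(resolve_singer("JLEE", "KENN"))  # valid alternate: alternate singer is used
```

Raising an exception is a choice made for this sketch; the actual program displays the list of valid names instead.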
You will also be prompted on whether plots should be displayed; press y or Y to view them.
Acknowledgments
The TITANX used for this research was donated by the NVIDIA Corporation. This work is partially supported by the Towards Richer Online Music Public-domain Archives (TROMPA) (H2020 770376) European project.
[1] Duan, Zhiyan, et al. "The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech." 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. IEEE, 2013.
[2] Blaauw, Merlijn, and Jordi Bonada. "A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs." Applied Sciences 7.12 (2017): 1313.
[3] Blaauw, Merlijn, et al. "Data efficient voice cloning for neural singing synthesis." 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.