2021-11-06: I have just updated the code structure to make it easier to understand. It may contain bugs at the moment; I will do some test training later.
2021-11-01: I will update the code and make it easier to use later.
VoiceFixer
VoiceFixer is a framework for general speech restoration. We aim at the restoration of severely degraded speech and historical speech.
Materials
- Arxiv preprint: https://arxiv.org/abs/2109.13731
- The demo page contains comparisons between single-task speech restoration, general speech restoration, and VoiceFixer.
- We wrote a pip package for voicefixer (see the usage sketch after this list).
- The datasets we use in this repo: training and testing datasets
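If you only need inference, the pip package can be used directly without this training repo. Below is a minimal sketch, assuming the VoiceFixer class and its restore() interface exposed by the pip package; the file paths are placeholders, and the package's own README should be checked for the exact signature.
# minimal usage sketch of the voicefixer pip package (interface assumed, paths are placeholders)
from voicefixer import VoiceFixer

vf = VoiceFixer()
vf.restore(input="degraded.wav",   # placeholder: path to the degraded input audio
           output="restored.wav",  # placeholder: path for the restored output audio
           cuda=False,             # set to True for GPU inference
           mode=0)                 # restoration mode offered by the package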
Usage
Environment (do this first)
# Download dataset and prepare running environment
git clone https://github.com/haoheliu/voicefixer_main.git
cd voicefixer_main
source init.sh
VoiceFixer for general speech restoration
Here we take VF_UNet (VoiceFixer with a UNet analysis module) as an example.
- Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json # you can modify the configuration file to personalize your training
You can check out the logs directory for checkpoints, logging, and validation results. A sketch for deriving a custom configuration file from the base one is shown below.
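If you want to personalize training without touching the base configuration, one option is to derive a new configuration file from it. The sketch below only copies the base file and prints its keys; config/my_experiment.json is a hypothetical name, and the actual fields to edit should be taken from the base file itself rather than assumed here.
# sketch: derive a custom configuration from the base one (new file name is hypothetical)
import json

with open("config/vctk_base_voicefixer_unet.json") as f:
    cfg = json.load(f)

print(sorted(cfg.keys()))  # inspect which options the base configuration exposes

# ... edit the fields you need on cfg here ...

with open("config/my_experiment.json", "w") as f:
    json.dump(cfg, f, indent=2)

# then train with:
#   python3 train_gsr_voicefixer.py -c config/my_experiment.json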
- Evaluation
Evaluation runs automatically and generates .csv files for the testsets.
For example, to evaluate on all testsets (the default):
python3 eval_gsr_voicefixer.py \
--config <path-to-the-config-file> \
--ckpt <path-to-the-checkpoint>
For example, to evaluate only on the general speech restoration (GSR) testset:
python3 eval_gsr_voicefixer.py \
--config <path-to-the-config-file> \
--ckpt <path-to-the-checkpoint> \
--testset general_speech_restoration \
--description general_speech_restoration_eval
You can pass the following testsets to --testset:
- base: all testsets
- clip: testset with clipped speech (clipping thresholds of 0.1, 0.25, and 0.5)
- reverb: testset with reverberant speech
- general_speech_restoration: testset with speech containing all kinds of random distortions
- enhancement: testset with noisy speech
- speech_super_resolution: testset with low-resolution speech at sampling rates of 2 kHz, 4 kHz, 8 kHz, 16 kHz, and 24 kHz
If you would like to evaluate on only a small portion of the data, e.g. 10 utterances, you can pass the number to the --limit_numbers argument:
python3 eval_gsr_voicefixer.py \
--config <path-to-the-config-file> \
--ckpt <path-to-the-checkpoint> \
--limit_numbers 10
Evaluation results will be presented in the exp_results folder.
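For a quick look at the generated results, the .csv files under exp_results can be loaded with pandas. The sketch below makes no assumption about the file names or column layout; it simply globs the folder and prints whatever it finds.
# sketch: peek at the evaluation .csv files under exp_results (names/columns not assumed)
from pathlib import Path
import pandas as pd

for csv_path in sorted(Path("exp_results").rglob("*.csv")):
    df = pd.read_csv(csv_path)
    print(csv_path, df.shape)
    print(df.head())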
ResUNet for general speech restoration
- Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json
You can check out the logs directory for checkpoints, logging, and validation results.
- Evaluation (similar to voicefixer evaluation)
python3 eval_ssr_unet.py \
--config <path-to-the-config-file> \
--ckpt <path-to-the-checkpoint> \
--limit_numbers <int-test-only-on-a-few-utterances> \
--testset <the-testset-you-want-to-use> \
--description <describe-this-test>
ResUNet for single task speech restoration
- Training
- Denoising
# pass in a configuration file to the training script
python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_denoising.json
- Dereverberation
# pass in a configuration file to the training script
python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_dereverberation.json
- Super Resolution
# pass in a configuration file to the training script
python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_super_resolution.json
- Declipping
# pass in a configuration file to the training script
python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_declipping.json
You can check out the logs directory for checkpoints, logging, and validation results.
- Evaluation (similar to voicefixer evaluation)
python3 eval_ssr_unet.py \
--config <path-to-the-config-file> \
--ckpt <path-to-the-checkpoint> \
--limit_numbers <int-test-only-on-a-few-utterances> \
--testset <the-testset-you-want-to-use> \
--description <describe-this-test>
Citation
@misc{liu2021voicefixer,
title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},
author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},
year={2021},
eprint={2109.13731},
archivePrefix={arXiv},
primaryClass={cs.SD}
}