
General Speech Restoration


2021-11-06: I have just updated the code structure to make it easier to understand. It may contain bugs for now; I will run some test training later.

2021-11-01: I will update the code to make it easier to use later.

VoiceFixer

VoiceFixer is a framework for general speech restoration. We aim at the restoration of severely degraded speech and historical speech.

Materials

Usage

Environment (Do this at first)

# Download dataset and prepare running environment
git clone https://github.com/haoheliu/voicefixer_main.git
cd voicefixer_main
source init.sh 

VoiceFixer for general speech restoration

Here we take VF_UNet (VoiceFixer with UNet as the analysis module) as an example.

  • Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json # you can modify the configuration file to personalize your training
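Personalizing a run usually means editing the JSON configuration before launching the training script. The key names below ("batch_size", "learning_rate", "analysis_module") are hypothetical illustrations, not the actual schema of vctk_base_voicefixer_unet.json; a minimal sketch of tweaking a config copy programmatically:

```python
import json
import os
import tempfile

# Start from a (hypothetical) configuration dict, tweak a field, and save
# a personal copy. Check config/vctk_base_voicefixer_unet.json for the
# real key names used by the training script.
config = {
    "batch_size": 16,
    "learning_rate": 3e-4,
    "analysis_module": "unet",
}
config["batch_size"] = 8  # e.g. shrink the batch for a smaller GPU

out_path = os.path.join(tempfile.mkdtemp(), "my_config.json")
with open(out_path, "w") as f:
    json.dump(config, f, indent=2)
print(out_path)
```

The new file can then be passed to the script with `-c`.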

You can check the logs directory for checkpoints, logging output, and validation results.

  • Evaluation

Evaluation runs automatically and generates a .csv file for each testset.

For example, to evaluate on all testsets (the default):

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> 

For example, to evaluate only on the GSR testset:

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --testset  general_speech_restoration \
                    --description  general_speech_restoration_eval

You can pass the following testsets to --testset:

  • base: all testsets
  • clip: testset with speech clipped at thresholds of 0.1, 0.25, and 0.5
  • reverb: testset with reverberant speech
  • general_speech_restoration: testset with speech containing all kinds of random distortions
  • enhancement: testset with noisy speech
  • speech_super_resolution: testset with low-resolution speech at sampling rates of 2 kHz, 4 kHz, 8 kHz, 16 kHz, and 24 kHz
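The valid --testset names can be enforced up front with argparse, which rejects anything outside the list. This is a sketch mirroring the names above, not the script's actual argument parsing:

```python
import argparse

# Restrict --testset to the known testset names; argparse raises an
# error for any other value.
TESTSETS = [
    "base", "clip", "reverb",
    "general_speech_restoration", "enhancement", "speech_super_resolution",
]

parser = argparse.ArgumentParser()
parser.add_argument("--testset", choices=TESTSETS, default="base")
parser.add_argument("--limit_numbers", type=int, default=None)

args = parser.parse_args(["--testset", "general_speech_restoration"])
print(args.testset)  # → general_speech_restoration
```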

If you would like to evaluate on only a small portion of the data, e.g. 10 utterances, you can pass the number to the --limit_numbers argument.

python3 eval_gsr_voicefixer.py  \
                    --config  <path-to-the-config-file> \
                    --ckpt  <path-to-the-checkpoint> \
                    --limit_numbers 10 

Evaluation results are written to the exp_results folder.
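Since each run writes a .csv into exp_results, results can be summarized with the standard csv module. The column names here ("utterance", "pesq") are hypothetical; inspect the generated files for the real header:

```python
import csv
import io

# Average a (hypothetical) metric column from an evaluation csv.
# io.StringIO stands in for an actual file under exp_results/.
sample = io.StringIO("utterance,pesq\nutt1,2.8\nutt2,3.4\n")
rows = list(csv.DictReader(sample))
mean_pesq = sum(float(r["pesq"]) for r in rows) / len(rows)
print(round(mean_pesq, 2))  # → 3.1
```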

ResUNet for general speech restoration

  • Training
# pass in a configuration file to the training script
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json

You can check the logs directory for checkpoints, logging output, and validation results.

  • Evaluation (similar to the VoiceFixer evaluation)
    python3 eval_ssr_unet.py  \
                        --config  <path-to-the-config-file> \
                        --ckpt  <path-to-the-checkpoint> \
                        --limit_numbers <int-test-only-on-a-few-utterances> \
                        --testset  <the-testset-you-want-to-use> \
                        --description  <describe-this-test>

ResUNet for single task speech restoration

  • Training

    • Denoising
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_denoising.json
    • Dereverberation
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_dereverberation.json
    • Super Resolution
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_super_resolution.json
    • Declipping
    # pass in a configuration file to the training script
    python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_declipping.json
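The four single-task runs differ only in the config file name, so they can be driven from one loop. A sketch that only builds the command lines (launching them with subprocess would actually start training):

```python
# Build the four single-task training command lines from a task list.
# The config paths follow the naming pattern shown above.
TASKS = ["denoising", "dereverberation", "super_resolution", "declipping"]

commands = [
    ["python3", "train_ssr_unet.py", "-c",
     f"config/vctk_base_ssr_unet_{task}.json"]
    for task in TASKS
]

for cmd in commands:
    print(" ".join(cmd))
```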

You can check the logs directory for checkpoints, logging output, and validation results.

  • Evaluation (similar to the VoiceFixer evaluation)
    python3 eval_ssr_unet.py  \
                        --config  <path-to-the-config-file> \
                        --ckpt  <path-to-the-checkpoint> \
                        --limit_numbers <int-test-only-on-a-few-utterances> \
                        --testset  <the-testset-you-want-to-use> \
                        --description  <describe-this-test>

Citation

 @misc{liu2021voicefixer,   
     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   
     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  
     year={2021},  
     eprint={2109.13731},  
     archivePrefix={arXiv},  
     primaryClass={cs.SD}  
 }

