
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation


StoRM inference process on a spectrogram. A predictive model is first used to obtain an estimate of the clean speech, which may still contain some distortions and residual noise. The diffusion-based generative model then uses this estimate as the initialization of its reverse process, which learns to generate clean speech in an iterative fashion starting from the corrupted signal xT.
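The two-stage idea above can be sketched in a few lines. This is a toy illustration only: the predictive stage, the noise schedule, and the score function below are simple stand-ins of our own, not the learned networks or the SDE sampler used in this repository.

```python
# Toy sketch of stochastic regeneration: a predictive estimate D(y)
# initializes a reverse diffusion process that iteratively refines it.
import numpy as np

def predictive_stage(y, strength=0.8):
    """Stand-in for the predictive model D: a crude attenuating denoiser."""
    return strength * y

def toy_score(x, mean, sigma):
    """Score of a Gaussian N(mean, sigma^2), replacing the learned score network."""
    return (mean - x) / (sigma ** 2)

def storm_inference(y, n_steps=30, sigma_max=0.5, sigma_min=0.01, seed=0):
    rng = np.random.default_rng(seed)
    d_y = predictive_stage(y)                            # first-stage clean estimate
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)
    x = d_y + sigmas[0] * rng.standard_normal(y.shape)   # x_T centered on D(y)
    for i in range(n_steps - 1):
        dt = sigmas[i] - sigmas[i + 1]
        # Euler step of a simplified reverse process, drifting toward d_y
        x = x + sigmas[i] * dt * toy_score(x, d_y, sigmas[i])
    return x

clean = np.sin(np.linspace(0, 8 * np.pi, 256))
noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(256)
enhanced = storm_inference(noisy)
print(enhanced.shape)
```

The key point the sketch mirrors is that the reverse process starts from the predictive estimate rather than from pure noise, which is what shortens the sampling trajectory.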

This repository contains the official PyTorch implementation for the StoRM paper [1].

Audio examples and supplementary materials are available on our project page.

Installation

  • Create a new virtual environment with Python 3.8 (we have not tested other Python versions, but they may work).
  • Install the package dependencies via pip install -r requirements.txt.
  • Your logs will be stored as local TensorBoard logs. Run tensorboard --logdir logs/ to see them.

Pretrained checkpoints

  • We provide pretrained checkpoints for the models trained on TIMIT+Chime3 (enhancement), WSJ0+Chime3 (enhancement), Voicebank/DEMAND (enhancement) and WSJ0+Reverb (dereverberation), as in the original paper [1]. We also included the checkpoints for WSJ0+Wind as in [3]. All checkpoints can be downloaded here.

Usage

  • For resuming training, you can use the --resume_from_checkpoint option of train.py.
  • For evaluating these checkpoints, use the --ckpt option of enhancement.py (see section Evaluation below).

Training

Training is done by executing train.py. A minimal running example with default settings (as in our paper [2]) can be run with

python train.py --format <your_format> --base_dir <your_base_dir> --gpus 0,

where

  • your_base_dir should be a path to a folder containing subdirectories train/ and valid/ (optionally test/ as well). The subdirectory structure depends on your_format:
    • your_format=wsj0: Each subdirectory must itself have two subdirectories clean/ and noisy/, with the same filenames present in both.
    • You can add your own formats corresponding to your data structure.
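A minimal check of the wsj0 layout described above can save a failed training run. The helper below is ours, not part of the repository; it only assumes the documented structure (clean/ and noisy/ subdirectories with identical filenames in each split).

```python
# Verify that base_dir follows the wsj0 layout: <split>/clean and
# <split>/noisy must exist and contain the same filenames.
import os

def check_wsj0_layout(base_dir, splits=("train", "valid")):
    problems = []
    for split in splits:
        clean = os.path.join(base_dir, split, "clean")
        noisy = os.path.join(base_dir, split, "noisy")
        if not (os.path.isdir(clean) and os.path.isdir(noisy)):
            problems.append(f"{split}: missing clean/ or noisy/ subdirectory")
            continue
        c, n = set(os.listdir(clean)), set(os.listdir(noisy))
        if c != n:
            problems.append(f"{split}: filename mismatch {sorted(c ^ n)}")
    return problems  # empty list means the layout looks fine
```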

To see all available training options, run python train.py --help. These include options for the backbone DNN, the SDE parameters, and the usual PyTorch Lightning Trainer parameters such as max_epochs, limit_train_batches, and so on.

Note:

  • The paper [1] uses a lighter configuration of the NCSN++ backbone with 27.8M parameters, which is selected with --backbone ncsnpp (the default). By contrast, the paper [2] uses --backbone ncsnpp-large, the baseline NCSN++ with 65M parameters.

Evaluation

To evaluate on a test set, run

python enhancement.py --test_dir <your_test_dir> --enhanced_dir <your_enhanced_dir> --ckpt <path_to_model_checkpoint>

to generate the enhanced .wav files. The --ckpt parameter of enhancement.py should be the path to a trained model checkpoint, as stored by the logger in logs/.
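Once the enhanced .wav files are generated, you will typically score them against the clean references. As one hedged example (our own, not part of this repository), scale-invariant SDR can be computed with plain NumPy:

```python
# Scale-invariant SDR (SI-SDR) in dB between a clean reference and an
# enhanced estimate, following the usual zero-mean, projected-target form.
import numpy as np

def si_sdr(reference, estimate, eps=1e-8):
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((target @ target + eps) / (noise @ noise + eps))

t = np.linspace(0, 1, 16000)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(t.size)
print(round(si_sdr(clean, noisy), 1))
```

In practice you would load each clean/enhanced pair (e.g. with soundfile) and average SI-SDR over the test set.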

Data Creation

  • In preprocessing/, you will find the data generation script used to create all the datasets used in the paper. A minimal example is:
    cd preprocessing;
    python3 create_data.py --task <your_task> --speech <your_speech_format> --noise <your_noise_data>

Please check the script for other options.
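When generating several task/speech/noise combinations, it can be convenient to assemble the create_data.py invocation programmatically. The helper below is our own sketch and assumes only the three flags shown above:

```python
# Build the create_data.py command line shown above for a given
# task / speech format / noise dataset combination.
import shlex

def build_create_data_cmd(task, speech, noise):
    return ["python3", "create_data.py",
            "--task", task, "--speech", speech, "--noise", noise]

cmd = build_create_data_cmd("enhancement", "wsj0", "chime3")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, cwd="preprocessing") to launch the generation.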

  • For the wind-noise generation scripts and the non-linear mixing technique presented in [3], we refer the reader to [4] and suggest asking the authors for their wind-noise generator code. We only provide the script that formats the commands passed to that generator, together with the non-linear mixing method. We are not responsible for distribution of the code by [4].

Citations / References

We kindly ask you to cite our papers in your publication when using any of our research or code:

@article{lemercier2023storm,
  author={Lemercier, Jean-Marie and Richter, Julius and Welker, Simon and Gerkmann, Timo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation}, 
  year={2023},
  volume={31},
  number={},
  pages={2724-2737},
  doi={10.1109/TASLP.2023.3294692}}
@inproceedings{lemercier2023wind,
  author={Lemercier, Jean-Marie and Thiemann, Joachim and Konig, Raphael and Gerkmann, Timo},
  booktitle={VDE 15th ITG conference on Speech Communication}, 
  title={Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model}, 
  year={2023}}

[1] Jean-Marie Lemercier, Julius Richter, Simon Welker, and Timo Gerkmann. "StoRM: A Stochastic Regeneration Model for Speech Enhancement And Dereverberation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724-2737, 2023.

[2] Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay and Timo Gerkmann. "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351-2364, 2023.

[3] Jean-Marie Lemercier, Joachim Thiemann, Raphael Konig and Timo Gerkmann. "Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model", ITG Speech Communication, Aachen, Germany, 2023.

[4] D. Mirabilii et al. "Simulating wind noise with airflow speed-dependent characteristics", Int. Workshop on Acoustic Signal Enhancement, Aachen, Germany, 2022.
