speech enhancement / speech separation / sound source localization

Awesome Speech Enhancement

This repository summarizes papers, code, and tools for single-/multi-channel speech enhancement and speech separation. Pull requests are welcome.

Speech_Enhancement

[Figure: Speech Enhancement Tree]

Magnitude spectrogram

spectral masking

  • 2014, On Training Targets for Supervised Speech Separation, Wang. [Paper]
  • 2018, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, Valin. [Paper] [RNNoise] [RNNoise16k]
  • 2020, A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech, Valin. [Paper] [PercepNet]
  • 2020, Online Monaural Speech Enhancement using Delayed Subband LSTM, Li. [Paper]
  • 2020, FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement, Hao. [Paper] [FullSubNet]
  • 2020, Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement, Xia. [Paper] [NSNet]
  • 2020, RNNoise-like fixed-point model deployed on Microcontroller using NNoM inference framework [example] [NNoM]
  • 2021, RNNoise-Ex: Hybrid Speech Enhancement System based on RNN and Spectral Features. [Paper] [RNNoise-Ex]
  • Other IRM-based SE repositories: [IRM-SE-LSTM] [nn-irm] [rnn-se] [DL4SE]

spectral mapping

  • 2014, An Experimental Study on Speech Enhancement Based on Deep Neural Networks, Xu. [Paper]

  • 2014, A Regression Approach to Speech Enhancement Based on Deep Neural Networks, Xu. [Paper] [sednn] [DNN-SE-Xu] [DNN-SE-Li]

  • Other DNN magnitude spectrum mapping-based SE repositories: [SE toolkit] [TensorFlow-SE] [UNetSE]

  • 2015, Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, Weninger. [Paper]

  • 2016, A Fully Convolutional Neural Network for Speech Enhancement, Park. [Paper] [CNN4SE]

  • 2017, Long short-term memory for speaker generalization in supervised speech separation, Chen. [Paper]

  • 2018, A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement, Tan. [Paper] [CRN-Tan]

  • 2018, Convolutional-Recurrent Neural Networks for Speech Enhancement, Zhao. [Paper] [CRN-Hao]

Complex domain

  • 2017, Complex spectrogram enhancement by convolutional neural network with multi-metrics learning, Fu. [Paper]

  • 2017, Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising, Williamson. [Paper]

  • 2019, PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network, Yin. [Paper] [PHASEN]

  • 2019, Phase-aware Speech Enhancement with Deep Complex U-Net, Choi. [Paper] [DC-UNet]

  • 2020, Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement, Tan. [Paper] [GCRN]

  • 2020, DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement, Hu. [Paper] [DCCRN]

  • 2020, T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement, Kim. [Paper]

  • 2020, Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net, Choi. [Paper]

  • 2021, DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement, Le. [Paper] [DPCRN]

  • 2021, Real-time denoising and dereverberation with tiny recurrent u-net, Choi. [Paper]

  • 2021, DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement, Lv. [Paper]

  • 2022, FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement, Chen. [Paper] [FullSubNet+]

  • 2022, Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement, Yu. [Paper]

Time domain

  • 2018, Improved Speech Enhancement with the Wave-U-Net, Macartney. [Paper] [WaveUNet]
  • 2019, A New Framework for CNN-Based Speech Enhancement in the Time Domain, Pandey. [Paper]
  • 2019, TCNN: Temporal Convolutional Neural Network for Real-time Speech Enhancement in the Time Domain, Pandey. [Paper]
  • 2020, Real Time Speech Enhancement in the Waveform Domain, Defossez. [Paper] [facebookDenoiser]
  • 2020, Monaural speech enhancement through deep wave-U-net, Guimarães. [Paper] [SEWUNet]
  • 2020, Speech Enhancement Using Dilated Wave-U-Net: an Experimental Analysis, Ali. [Paper]
  • 2020, Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in the Time Domain, Pandey. [Paper] [DDAEC]
  • 2021, Dense CNN With Self-Attention for Time-Domain Speech Enhancement, Pandey. [Paper]
  • 2021, Dual-path Self-Attention RNN for Real-Time Speech Enhancement, Pandey. [Paper]
  • 2022, Speech Denoising in the Waveform Domain with Self-Attention, Kong. [Paper]

Generative Model

GAN

  • 2017, SEGAN: Speech Enhancement Generative Adversarial Network, Pascual. [Paper] [SEGAN]
  • 2019, SERGAN: Speech enhancement using relativistic generative adversarial networks with gradient penalty, Deepak Baby. [Paper] [SERGAN]
  • 2019, MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement, Fu. [Paper] [MetricGAN]
  • 2021, MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement, Fu. [Paper] [MetricGAN+]
  • 2020, HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks, Su. [Paper] [HifiGAN]
  • 2022, CMGAN: Conformer-Based Metric GAN for Monaural Speech Enhancement, Abdulatif, Cao & Yang. [Paper] [CMGAN]

Flow

  • 2021, A Flow-based Neural Network for Time Domain Speech Enhancement, Strauss & Edler. [Paper]

VAE

  • 2018, A variance modeling framework based on variational autoencoders for speech enhancement, Leglaive. [Paper] [mlsp]
  • 2020, Speech Enhancement with Stochastic Temporal Convolutional Networks, Richter. [Paper] [STCN-NMF]

Diffusion Model

  • 2022, Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain, Welker. [Paper] [SGMSE]
  • 2022, StoRM: A Stochastic Regeneration Model for Speech Enhancement And Dereverberation, Lemercier. [Paper] [StoRM]
  • 2022, Conditional Diffusion Probabilistic Model for Speech Enhancement, Lu. [Paper] [CDiffuSE]
  • 2023, Speech Enhancement and Dereverberation with Diffusion-Based Generative Models, Richter. [Paper] [SGMSE]

Hybrid SE

  • 2019, Deep Xi as a Front-End for Robust Automatic Speech Recognition, Nicolson. [Paper] [DeepXi]
  • 2019, Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep-Learning-Based Speech Enhancement, Li. [Paper] [SE-MLC]
  • 2020, Deep Residual-Dense Lattice Network for Speech Enhancement, Nikzad. [Paper] [RDL-SE]
  • 2020, DeepMMSE: A Deep Learning Approach to MMSE-based Noise Power Spectral Density Estimation, Zhang. [Paper]
  • 2020, Speech Enhancement Using a DNN-Augmented Colored-Noise Kalman Filter, Yu. [Paper] [DNN-Kalman]

Decoupling-style

  • 2020, A Recursive Network with Dynamic Attention for Monaural Speech Enhancement, Li. [Paper] [DARCN]
  • 2020, Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise, Hao. [Paper]
  • 2020, A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement, Du. [Paper]
  • 2020, Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression, Westhausen. [Paper] [DTLN]
  • 2020, Listening to Sounds of Silence for Speech Denoising, Xu. [Paper] [LSS]
  • 2021, ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network, Li. [Paper]
  • 2022, Glance and Gaze: A Collaborative Learning Framework for Single-channel Speech Enhancement, Li. [Paper]
  • 2022, HGCN: harmonic gated compensation network for speech enhancement, Wang. [Paper]
  • 2022, Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation, Fu. [Paper] [Uformer]
  • 2022, DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio, Schröter. [Paper] [DeepFilterNet]
  • 2021, Multi-Task Audio Source Separation, Zhang. [Paper] [Code]

Data collection

Loss

Challenge

Other repositories

  • Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement [Link]
  • nanahou's awesome speech enhancement [Link]

Dereverberation

Traditional method

Hybrid method

NN-based Derev

Speech Separation (single channel)

  • Speech separation tutorial, in the style of the awesome series [Link]

NN-based separation

  • 2015, Deep-Clustering: Discriminative embeddings for segmentation and separation, Hershey and Chen. [Paper] [Code] [Code] [Code]
  • 2016, DANet: Deep Attractor Network (DANet) for single-channel speech separation, Chen. [Paper] [Code]
  • 2017, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, Yu. [Paper] [Code]
  • 2018, LSTM_PIT_Speech_Separation [Code]
  • 2018, TasNet: time-domain audio separation network for real-time, single-channel speech separation, Luo. [Paper] [Code]
  • 2019, Conv-TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation, Luo. [Paper] [Code]
  • 2019, Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Luo. [Paper] [Code1] [Code2]
  • 2019, TAC: end-to-end microphone permutation and number invariant multi-channel speech separation, Luo. [Paper] [Code]
  • 2020, Continuous Speech Separation with Conformer, Chen. [Paper] [Code]
  • 2020, Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation, Chen. [Paper] [Code]
  • 2020, Wavesplit: End-to-End Speech Separation by Speaker Clustering, Zeghidour. [Paper]
  • 2021, Attention is All You Need in Speech Separation, Subakan. [Paper] [Code]
  • 2021, Ultra Fast Speech Separation Model with Teacher Student Learning, Chen. [Paper]
  • sound separation (Google) [Code]
  • sound separation: Deep learning based speech source separation using PyTorch [Code]
  • music-source-separation [Code]
  • Singing-Voice-Separation [Code]
  • Comparison-of-Blind-Source-Separation-techniques [Code]

BSS/ICA method

  • FastICA [Code]
  • A localisation- and precedence-based binaural separation algorithm [Download]
  • Convolutive Transfer Function Invariant SDR [Code]

Array Signal Processing

  • MASP: Microphone Array Speech Processing [Code]
  • BeamformingSpeechEnhancer [Code]
  • TSENet [Code]
  • steernet [Code]
  • DNN_Localization_And_Separation [Code]
  • nn-gev: Neural network supported GEV beamformer CHiME3 [Code]
  • chime4-nn-mask: Implementation of NN based mask estimator in PyTorch (reuses some code from nn-gev) [Code]
  • beamformit_matlab: A MATLAB implementation of CHiME4 baseline Beamformit [Code]
  • pb_chime5: Speech enhancement system for the CHiME-5 dinner party scenario [Code]
  • beamformit: Microphone array algorithms [Code]
  • Beamforming-for-speech-enhancement [Code]
  • deepBeam [Code]
  • NN_MASK [Code]
  • Cone-of-Silence [Code]

Tools

  • APS: A workspace for single/multi-channel speech recognition & enhancement & separation. [Code]
  • AKtools: the open software toolbox for signal acquisition, processing, and inspection in acoustics [SVN Code] (username: aktools; password: ak)
  • espnet [Code]
  • asteroid: The PyTorch-based audio source separation toolkit for researchers [PDF] [Code]
  • pytorch_complex [Code]
  • ONSSEN: An Open-source Speech Separation and Enhancement Library [Code]
  • separation_data_preparation [Code]
  • MatlabToolbox [Code]
  • athena-signal [Code] (https://github.com/athena-team/athena-signal)
  • python_speech_features [Code]
  • speechFeatures [Code]
  • sap-voicebox [Code]
  • Calculate-SNR-SDR [Code]
  • RIR-Generator [Code]
  • Signal-Generator (for moving sources or a moving array) [Code]
  • Python library for Room Impulse Response (RIR) simulation with GPU acceleration [Code]
  • ROOMSIM: binaural image source simulation [Code]
  • binaural-image-source-model [Code]
  • PESQ [Code]
  • SETK: Speech Enhancement Tools integrated with Kaldi [Code]
  • pb_chime5: Speech enhancement system for the CHiME-5 dinner party scenario [Code]

Books

  • P. C. Loizou: Speech Enhancement: Theory and Practice
  • J. Benesty, Y. Huang: Adaptive Signal Processing: Applications to Real-World Problems
  • S. Haykin: Adaptive Filter Theory
  • Eberhard Hansler, Gerhard Schmidt: Single-Channel Acoustic Echo Cancellation and Topics in Acoustic Echo and Noise Control
  • J. Benesty, S. Makino, J. Chen: Speech Enhancement
  • J. Benesty, M. M. Sondhi, Y. Huang: Handbook Of Speech Processing
  • Ivan J. Tashev: Sound Capture and Processing: Practical Approaches
  • I. Cohen, J. Benesty, S. Gannot: Speech Processing in Modern Communication
  • E. Vincent, T. Virtanen, S. Gannot: Audio Source Separation and Speech Enhancement
  • J. Benesty et al.: A Perspective on Stereophonic Acoustic Echo Cancellation
  • J. Benesty et al.: Advances in Network and Acoustic Echo Cancellation
  • T. F. Quatieri: Discrete-time speech signal processing: principles and practice
  • Song Zhiyong (宋知用): Applications of MATLAB in Speech Signal Analysis and Synthesis (MATLAB在语音信号分析与合成中的应用)
  • Harry L. Van Trees: Optimum Array Processing
  • Wang Yongliang (王永良): Spatial Spectrum Estimation Theory and Algorithms (空间谱估计理论与算法)
  • Yan Shefeng (鄢社锋): Optimal Array Signal Processing (优化阵列信号处理)
  • Zhang Xiaofei (张小飞): Array Signal Processing and Its MATLAB Implementation (阵列信号处理及matlab实现)
  • Zhao Yongjun (赵拥军): Theory and Methods of Direction-of-Arrival Estimation for Wideband Array Signals (宽带阵列信号波达方向估计理论与方法)
  • The-guidebook-of-speech-enhancement

Resources

  • Speech Signal Processing Course(ZH) [Link]
  • Speech Algorithms(ZH) [Link]
  • Speech Resources [Link]
  • Sound capture and speech enhancement for speech-enabled devices [Link]
  • CCF Technical Group on Speech Dialogue and Audition: Frontier Symposium on Speech Dialogue and Audition (ZH) [Link]

Sound Source Localization

  • binauralLocalization [Code]
  • robotaudition_examples: Some Robot Audition simplified examples (sound source localization and separation), coded in Octave/Matlab [Code]
  • WSCM-MUSIC [Code]
  • doa-tools [Code]
  • Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks [Code] [PDF]
  • messl: Model-based EM Source Separation and Localization [Code]
  • messlJsalt15: MESSL wrappers etc. for JSALT 2015, including CHiME3 [Code]
  • fast_sound_source_localization_using_TLSSC: Fast Sound Source Localization Using Two-Level Search Space Clustering [Code]
  • Binaural-Auditory-Localization-System [Code]
  • Binaural_Localization: ITD-based localization of sound sources in complex acoustic environments [Code]
  • Dual_Channel_Beamformer_and_Postfilter [Code]
  • Microphone sound source localization [Code]
  • RTF-based-LCMV-GSC [Code]
  • DOA [Code]

Sound Event Detection

  • sed_eval - Evaluation toolbox for Sound Event Detection [Code]
  • Benchmark for sound event localization task of DCASE 2019 challenge [Code]
  • sed-crnn: DCASE 2017 real-life sound event detection winning method [Code]
  • seld-net [Code]