Audio-WestlakeU/FullSubNet

Language: Python
License: MIT License
Created almost 4 years ago
Updated over 1 year ago

PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

FullSubNet

Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
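
The fusion idea can be illustrated with a minimal NumPy sketch. It is not the repository's code. The function names `unfold_subbands` and `fuse`, the reflect padding at the band edges, and the neighbor count are assumptions chosen for illustration. Each frequency bin is gathered together with its neighboring bins (the sub-band view), and the full-band model's output for that bin is appended, so that a sub-band model could then process every bin independently.

```python
import numpy as np


def unfold_subbands(mag: np.ndarray, num_neighbors: int) -> np.ndarray:
    """Gather each frequency bin with its +/- num_neighbors neighboring bins.

    mag: magnitude spectrogram of shape (F, T).
    Returns shape (F, 2 * num_neighbors + 1, T); out[f] holds bins
    f - num_neighbors .. f + num_neighbors.
    Band edges use reflect padding (an assumption of this sketch).
    """
    num_freqs, _ = mag.shape
    padded = np.pad(mag, ((num_neighbors, num_neighbors), (0, 0)), mode="reflect")
    width = 2 * num_neighbors + 1
    return np.stack([padded[f:f + width] for f in range(num_freqs)], axis=0)


def fuse(subbands: np.ndarray, fullband_out: np.ndarray) -> np.ndarray:
    """Append the full-band model's per-bin output to each sub-band unit.

    subbands: (F, 2N + 1, T); fullband_out: (F, T) -> returns (F, 2N + 2, T).
    """
    return np.concatenate([subbands, fullband_out[:, None, :]], axis=1)


# Usage with a random stand-in spectrogram (257 bins, 10 frames).
mag = np.abs(np.random.default_rng(0).standard_normal((257, 10)))
sub = unfold_subbands(mag, num_neighbors=15)  # 15 is only an illustrative value
fused = fuse(sub, mag)  # mag stands in for a real full-band model's output
print(sub.shape)    # (257, 31, 10)
print(fused.shape)  # (257, 32, 10)
```

Because every bin yields an input of the same shape, one sub-band network with shared weights can be applied to all frequencies, which is what makes a per-bin model practical.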

Guides

The documentation is hosted on Read the Docs. Check the documentation for how to train and test models.

- Improved FullSubNet: further reduces computational cost and supports high-sampling-rate audio, e.g., 48 kHz and 24 kHz.
  - ❇️ Model Architecture
- 📰 FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement, ICASSP 2021
  - 📸 Demo (Audio Clips)
  - 🎏 Model Checkpoints
  - ❇️ Model Architecture
- 📰 Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement
  - ❇️ Model Architecture
  - 📸 Demo (Audio Clips)
- cIRM-based full-band baseline model (described in the original FullSubNet paper)
  - ❇️ Model Architecture
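
The baseline listed above is cIRM-based, i.e., it predicts a complex ideal ratio mask. As a hedged illustration of what that target is (not the repository's implementation), the NumPy sketch below computes the mask M = S / Y from a noisy STFT Y and a clean STFT S, applies it by complex multiplication, and shows a tanh-style range compression of the kind often used with cIRM targets. All function names and the constants K = 10 and C = 0.1 are assumptions for illustration.

```python
import numpy as np


def complex_ideal_ratio_mask(noisy, clean, eps=1e-8):
    """Complex ideal ratio mask M = clean / noisy, computed element-wise.

    noisy, clean: complex STFTs with the same shape, e.g. (F, T).
    eps guards against division by zero in silent bins.
    """
    yr, yi = noisy.real, noisy.imag
    sr, si = clean.real, clean.imag
    power = yr ** 2 + yi ** 2 + eps
    mask_real = (yr * sr + yi * si) / power
    mask_imag = (yr * si - yi * sr) / power
    return mask_real + 1j * mask_imag


def apply_mask(noisy, mask):
    """Enhanced STFT: element-wise complex multiplication of mask and noisy STFT."""
    return mask * noisy


# Hypothetical range compression; K and C are illustrative values.
K, C = 10.0, 0.1


def compress(x):
    """Map unbounded mask components into (-K, K): K * (1 - e^{-Cx}) / (1 + e^{-Cx})."""
    return K * np.tanh(C * x / 2.0)


def decompress(y, eps=1e-6):
    """Inverse of compress, clipped so the logarithm stays finite."""
    y = np.clip(y, -K + eps, K - eps)
    return -np.log((K - y) / (K + y)) / C


# Usage with small hand-picked STFT values.
noisy = np.array([[1 + 2j, -0.5 + 1j], [3 - 1j, 0.2 + 0.1j]])
clean = np.array([[0.5 - 1j, 2 + 0j], [1 + 1j, -0.3 + 0.4j]])
mask = complex_ideal_ratio_mask(noisy, clean)
enhanced = apply_mask(noisy, mask)
print(np.allclose(enhanced, clean))  # True
```

In a setup like this, a network would be trained to predict the compressed real and imaginary parts; at inference they would be decompressed and applied to the noisy STFT before the inverse STFT.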

Citation

If you use this code for your research, please consider citing:

@INPROCEEDINGS{hao2020fullsubnet,
    author={Hao, Xiang and Su, Xiangdong and Horaud, Radu and Li, Xiaofei},
    booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    title={Fullsubnet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement},
    year={2021},
    pages={6633-6637},
    doi={10.1109/ICASSP39728.2021.9414177}
}

License

This repository is released under the MIT License.

NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

McNet

The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023

audiossl

A library for easier audio self-supervised training and downstream-task evaluation

FN-SSL

The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization

ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

FS-EEND

The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024]

RealMAN

A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

RVAE-EM

Official PyTorch implementation of "RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function" [ICASSP 2024]

pytorch_lightning_template_for_beginners

A PyTorch template for beginners, based on pytorch_lightning

Narrowband_DeepFiltering

UMA-ASR

This repository is the official implementation of "Unimodal Aggregation for CTC-based Speech Recognition".

RCT

This repo contains the official implementation of RCT.

OnlineSSL_DPRTF_EG

LSTM-noisePSD

bss_ctf_lasso

Microphone-Array-Generalization-for-Multichannel-Narrowband-Deep-Speech-Enhancement-

Audio-WestlakeU.github.io

The Audio and Signal Information Processing Lab at Westlake University focuses on speech processing algorithms.

DP_RTF_SSL

SMIF_online_dereverb

ATST-RCT

ATST-RCT model for DCASE 2022 task4.

RTF_InterFrameSpecSub

RS_noisePSD