An efficient encoder-decoder architecture with top-down attention for speech separation
This repository is the official implementation of "An efficient encoder-decoder architecture with top-down attention for speech separation" (ICLR 2023).
@inproceedings{tdanet2023iclr,
title={An efficient encoder-decoder architecture with top-down attention for speech separation},
author={Li, Kai and Yang, Runxuan and Hu, Xiaolin},
booktitle={ICLR},
year={2023}
}
News
🔥 October, 2023: We have released the pre-trained models of our TDANet.
🚀 July, 2023: We are pleased to announce an update to our model training framework! The new framework is highly versatile and can flexibly handle training and testing for a variety of speech separation models.
Datasets
The LRS2 dataset contains thousands of video clips collected from BBC programs. LRS2 includes substantial noise and reverberation, making it more challenging and closer to real-world conditions than the WSJ0 and LibriSpeech corpora.
LRS2-2Mix is created from the LRS2 corpus; its training, validation and test sets contain 20000, 5000 and 3000 utterances, respectively. For each mixture, two utterances from different speakers and different scenes (16 kHz sample rate) were randomly selected from LRS2 and mixed at a signal-to-noise ratio sampled uniformly between -5 dB and 5 dB. Each mixture is 2 seconds long.
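The mixing procedure above can be sketched as follows. This is a minimal illustration under the stated settings (16 kHz, 2-second clips, SNR uniform in [-5, 5] dB), not the repository's actual data-generation script; `mix_at_snr` is a hypothetical helper.

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so that s1 is snr_db louder than s2, then sum the two signals."""
    p1 = np.mean(s1 ** 2)                     # power of the first source
    p2 = np.mean(s2 ** 2)                     # power of the second source
    target_p2 = p1 / (10 ** (snr_db / 10))    # power s2 must have for the target SNR
    s2_scaled = s2 * np.sqrt(target_p2 / p2)
    return s1 + s2_scaled, s1, s2_scaled

rng = np.random.default_rng(0)
sr = 16000
# Stand-ins for two LRS2 utterances from different speakers (2 s at 16 kHz)
s1 = rng.standard_normal(2 * sr)
s2 = rng.standard_normal(2 * sr)
snr = rng.uniform(-5.0, 5.0)                  # SNR sampled in [-5, 5] dB
mix, t1, t2 = mix_at_snr(s1, s2, snr)
```

The returned `t1` and `t2` are the (scaled) clean references that a separation model is trained to recover from `mix`.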
Dataset Download Link: Google Drive
Training and evaluation
Training
python DataPreProcess/process_librimix.py --in_dir=xxxx --out_dir=DataPreProcess/Libri2Mix
python audio_train.py --conf_dir=configs/tdanet.yml
Evaluation
python audio_test.py --conf_dir=Experiments/checkpoint/TDANet/conf.yml
Inference with Pretrained Model
import os
import torch
import look2hear.models

os.environ['CUDA_VISIBLE_DEVICES'] = "0"  # select GPU 0

# Load the pre-trained TDANet checkpoint
model = look2hear.models.BaseModel.from_pretrain("JusperLee/TDANetBest-2ms-LRS2").cuda()

# Dummy input: (batch, channel, samples) = 1 second of 16 kHz audio
test_data = torch.randn(1, 1, 16000).cuda()
out = model(test_data)  # separated sources
print(out.shape)
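Separation quality in this field is commonly reported as SI-SNR improvement. A minimal scale-invariant SNR metric (the standard definition, not code taken from this repository) can be written as:

```python
import torch

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    # Remove the DC offset so constant shifts do not affect the metric
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference: the "target" component
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj  # everything the projection does not explain
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps))
```

Higher is better; SI-SNRi subtracts the SI-SNR of the unprocessed mixture from the SI-SNR of the model output.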
Results
Our model achieves the following performance on the LRS2-2Mix dataset: