Dual-path-RNN-Pytorch
A PyTorch implementation of "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation".
If you have any questions, please open an issue.
If you find this project helpful, please consider giving it a star.
Demo Pages: Results of pure speech separation model
Plan
- 2020-02-01: Read the paper "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation". Two write-ups interpreting the paper are available: a Zhihu article, "Reading Notes: Dual-path RNN for Speech Separation", and a blog article with the same title. Questions and discussion are welcome.
- 2020-02-02: Completed the data preprocessing and dataset code. Dataset code: /data_loader/Dataset.py
- 2020-02-03: Completed the Conv-TasNet framework (updated /model/model.py, Trainer_Tasnet.py, Train_Tasnet.py).
- 2020-02-07: Completed the training code (updated /model/model_rnn.py). Test parameters and some details are still being adjusted.
- 2020-02-08: Fixed a bug in the code.
- 2020-02-11: Completed the testing code.
Dataset
We use the WSJ0 dataset for training, validation, and testing. The data download link and the audio mixing code for WSJ0 are given below.
Training
Training for Conv-TasNet model
- First, generate the scp file with the following command. Each line of the scp file contains a filename and its path.
python create_scp.py
- Then you can modify the training and model parameters through "config/Conv_Tasnet/train.yml".
cd config/Conv_Tasnet
vim train.yml
- Then use the following command in the root directory to train the model.
python train_Tasnet.py --opt config/Conv_Tasnet/train.yml
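To make the scp step above more concrete, here is a minimal sketch of how such a file could be produced. It is not the repository's create_scp.py: the function name `write_scp`, the `sep` parameter, and the choice of a single space as separator (Kaldi-style) are assumptions made for illustration.

```python
import os


def write_scp(wav_dir, scp_path, sep=" "):
    """Write one "<filename><sep><path>" line per .wav file found under wav_dir.

    Hypothetical helper: the separator and the layout are assumptions,
    not the exact format used by the repository's create_scp.py.
    """
    entries = []
    for root, _dirs, files in os.walk(wav_dir):
        for name in files:
            if name.lower().endswith(".wav"):
                entries.append((name, os.path.join(root, name)))
    entries.sort()  # deterministic order regardless of filesystem listing
    with open(scp_path, "w", encoding="utf-8") as f:
        for name, path in entries:
            f.write(f"{name}{sep}{path}\n")
    return entries
```

Calling `write_scp("path/to/wavs", "tr_mix.scp")` would list every .wav file under the directory; both paths here are placeholders.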
Training for Dual Path RNN model
- First, generate the scp file with the following command. Each line of the scp file contains a filename and its path.
python create_scp.py
- Then you can modify the training and model parameters through "config/Dual_RNN/train.yml".
cd config/Dual_RNN
vim train.yml
- Then use the following command in the root directory to train the model.
python train_rnn.py --opt config/Dual_RNN/train.yml
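Before starting a long training run, it can help to check that the scp file from the first step parses as expected. The sketch below reads such a file into a dictionary. The function name `read_scp` and the whitespace-separated "filename path" layout are assumptions; the repository's own data loader may parse the file differently.

```python
def read_scp(scp_path, sep=None):
    """Read an scp file into a {filename: path} dictionary.

    Hypothetical helper. With sep=None each line is split on the first run
    of whitespace, so paths that contain spaces are kept intact.
    """
    table = {}
    with open(scp_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(sep, 1)
            if len(parts) != 2:
                raise ValueError(f"{scp_path}:{line_no}: expected 'filename path'")
            name, path = parts
            if name in table:
                raise ValueError(f"{scp_path}:{line_no}: duplicate key {name!r}")
            table[name] = path.strip()
    return table
```

Raising on malformed or duplicate lines is a design choice of this sketch: a mismatched mixture/source list is easier to debug at load time than halfway through an epoch.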
Inference
Conv-TasNet
Before running, edit the default parameters in test_tasnet.py, such as the test files and the model to test.
For multiple audio files
python test_tasnet.py
For a single audio file
python test_tasnet_wav.py
Dual-Path-RNN
Before running, edit the default parameters in test_dualrnn.py, such as the test files and the model to test.
For multiple audio files
python test_dualrnn.py
For a single audio file
python test_dualrnn_wav.py
Pretrained Models
Conv-TasNet
Dual-Path-RNN
Result
Conv-TasNet
Final result: 15.8690, which is about 0.57 higher than the 15.3 reported in the paper.
Dual-Path-RNN
Final result: 18.98, which is 0.18 higher than the 18.8 reported in the paper.
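This README does not name the metric behind these numbers. The cited Dual-path RNN paper reports scale-invariant signal-to-noise ratio improvement (SI-SNRi) in dB, so, assuming the figures above are the same quantity, the following pure-Python sketch shows how it is commonly computed. The function names and the `eps` value are illustrative and are not taken from the repository's evaluation code.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    n = len(target)
    if len(estimate) != n or n == 0:
        raise ValueError("estimate and target must be non-empty and equal length")
    # Remove the mean of each signal.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the "clean" component.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    e_noise = [a - s for a, s in zip(e, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(signal_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate does not change the score, which is the property the name refers to.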
Reference
- Luo Y, Chen Z, Yoshioka T. Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation[J]. arXiv preprint arXiv:1910.06379, 2019.
- Conv-TasNet code and Dual-RNN code