Deep Learning For Monaural Source Separation
Demo webpage: https://sites.google.com/site/deeplearningsourceseparation/
Experiments
MIR-1K experiment (singing voice separation)
- Training code: codes/mir1k/train_mir1k_demo.m
- Demo
  - Download a trained model: http://www.ifp.illinois.edu/~huang146/DNN_separation/model_400.mat
  - Put the model at codes/mir1k/demo and go to that folder
  - Run: codes/mir1k/demo/run_test_single_model.m
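A minimal MATLAB sketch of the three demo steps above, assuming MATLAB is started in the repository root and that run_test_single_model.m is meant to be executed from its own folder with model_400.mat placed beside it:

```matlab
% Download the trained MIR-1K model into the demo folder and run the demo script.
url = 'http://www.ifp.illinois.edu/~huang146/DNN_separation/model_400.mat';
demoDir = fullfile('codes', 'mir1k', 'demo');       % relative to the repository root
websave(fullfile(demoDir, 'model_400.mat'), url);   % websave requires R2014b or later
cd(demoDir);                                        % run the script from its own folder
run('run_test_single_model.m');
```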
TIMIT experiment (speech separation)
- Training code: codes/timit/train_timit_demo.m and codes/timit/train_timit_demo_mini_clip.m
- Demo
  - Download a trained model: http://www.ifp.illinois.edu/~huang146/DNN_separation/timit_model_70.mat
  - Put the model at codes/timit/demo and go to that folder
  - Run: codes/timit/demo/run_test_single_model.m
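The TIMIT demo follows the same pattern as the MIR-1K demo. A sketch that skips the download when the model file is already in place, again assuming the script reads the model from its own folder:

```matlab
% Fetch the TIMIT model only if it is not already present, then run the demo.
url = 'http://www.ifp.illinois.edu/~huang146/DNN_separation/timit_model_70.mat';
demoDir = fullfile('codes', 'timit', 'demo');
modelFile = fullfile(demoDir, 'timit_model_70.mat');
if ~exist(modelFile, 'file')
    websave(modelFile, url);
end
cd(demoDir);
run('run_test_single_model.m');
```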
TSP experiment (speech separation)
- Training code: codes/TSP/train_TSP_demo_mini_clip.m
- Demo
  - Download a trained model: http://www.ifp.illinois.edu/~huang146/DNN_separation/TSP_model_RNN1_win1_h300_l2_r0_64ms_1000000_softabs_linearout_RELU_logmel_trn0_c1e-10_c0.001_bsz100000_miter10_bf50_c0_d0_7650.mat
  - Put the model at codes/TSP/demo and go to that folder
  - Run the demo code at codes/TSP/demo/run_test_single_model.m
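The TSP model has a long file name, so a sketch can derive the local name from the URL instead of retyping it (again assuming the demo script loads the model from its own folder):

```matlab
% Derive the model file name from the URL, download it into codes/TSP/demo, and run the demo.
url = ['http://www.ifp.illinois.edu/~huang146/DNN_separation/' ...
       'TSP_model_RNN1_win1_h300_l2_r0_64ms_1000000_softabs_linearout_RELU_logmel_' ...
       'trn0_c1e-10_c0.001_bsz100000_miter10_bf50_c0_d0_7650.mat'];
[~, name, ext] = fileparts(url);                    % keep the original file name
demoDir = fullfile('codes', 'TSP', 'demo');
websave(fullfile(demoDir, [name ext]), url);
cd(demoDir);
run('run_test_single_model.m');
```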
Denoising experiment
- Put the original TIMIT speaker folders FCJF0, FDAW0, FDML0, FECD0, FETB0, FJSP0, FKFB0, FMEM0, FSAH0, FSJK1, FSMA0, FTBR0, FVFB0, and FVMH0 under codes/denoising/Data/timit/ (see the sketch at the end of this section).
- Training code: codes/denoising/train_denoising_demo.m
- Demo
  - Download a trained model: http://www.ifp.illinois.edu/~huang146/DNN_separation/denoising_model_870.mat
  - Put the model at codes/denoising/demo and go to that folder
  - Run the demo code at codes/denoising/demo/run_test_single_model.m
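A sketch of the data-preparation step above, copying the listed TIMIT speaker folders into the expected location. The source path timitRoot is hypothetical; point it at the directory of your own TIMIT copy that contains these speaker folders. The demo itself (download the model into codes/denoising/demo and run run_test_single_model.m) follows the same pattern as the earlier experiments.

```matlab
% Copy the required TIMIT speaker folders into codes/denoising/Data/timit/.
timitRoot = '/path/to/TIMIT/speakers';   % hypothetical: folder of your TIMIT copy holding these speakers
speakers = {'FCJF0','FDAW0','FDML0','FECD0','FETB0','FJSP0','FKFB0', ...
            'FMEM0','FSAH0','FSJK1','FSMA0','FTBR0','FVFB0','FVMH0'};
dstRoot = fullfile('codes', 'denoising', 'Data', 'timit');
for k = 1:numel(speakers)
    copyfile(fullfile(timitRoot, speakers{k}), fullfile(dstRoot, speakers{k}));
end
```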
Dependencies
- The package is based on a modified version of rnn-speech-denoising.
- The software depends on Mark Schmidt's minFunc package for convex optimization.
- Additionally, we include Mark Hasegawa-Johnson's HTK write and read functions, which are used to handle the MFCC files.
- We use HTK (HCopy) for computing features (MFCC, logmel).
- We use signal processing functions from labrosa.
- We use the BSS Eval toolbox, versions 2.0 and 3.0, for evaluation.
- We use the MIR-1K dataset for the singing voice separation task.
- We use the TSP dataset for the speech separation task.
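Before running the training or demo scripts, the repository code and the MATLAB dependencies above need to be on the path. A minimal sketch, assuming MATLAB is started in the repository root; the tools folder name is hypothetical and should point at wherever you unpacked minFunc, the BSS Eval toolbox, and the HTK read/write functions. HTK's HCopy is an external binary, not MATLAB code, so it must be installed separately and be callable from the system shell.

```matlab
% Put the repository code and the third-party MATLAB toolboxes on the path.
repoRoot = pwd;                                   % assumes MATLAB was started in the repository root
addpath(genpath(fullfile(repoRoot, 'codes')));    % experiment scripts and bundled helpers
addpath(genpath(fullfile(repoRoot, 'tools')));    % hypothetical folder holding minFunc, BSS Eval, etc.
```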
Work on your own data
- To try the code on your own data, follow the mir1k or TSP settings: put your data into codes/mir1k/Wavfile or codes/TSP/Data/ accordingly (see the sketch after this list).
- Look at the unit test parameters in codes/mir1k/train_mir1k_demo.m and codes/TSP/train_TSP_demo_mini_clip.m (with minibatch LBFGS and gradient clipping).
- Tune the parameters on the development set and check the results.
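A sketch of the data-placement step for the mir1k setting; srcDir is hypothetical, and your files should follow the same format and naming conventions as the MIR-1K wav files already in codes/mir1k/Wavfile.

```matlab
% Copy your own wav files into the folder the mir1k scripts read from.
srcDir = '/path/to/your/wavs';                    % hypothetical: where your recordings live
dstDir = fullfile('codes', 'mir1k', 'Wavfile');
files = dir(fullfile(srcDir, '*.wav'));
for k = 1:numel(files)
    copyfile(fullfile(srcDir, files(k).name), fullfile(dstDir, files(k).name));
end
```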
References
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136–2147, Dec. 2015.
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Singing-Voice Separation from Monaural Recordings Using Deep Recurrent Neural Networks," in International Society for Music Information Retrieval Conference (ISMIR), 2014.
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Deep Learning for Monaural Speech Separation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
Notes
- The code has been tested with MATLAB R2015a.