# DNN-based source separation

A PyTorch implementation of DNN-based source separation.
## New information

- v0.7.2
  - Updated Jupyter notebooks.
## Model

### Modules

| Module | Reference | Done |
|---|---|---|
| Depthwise-separable convolution | Xception: Deep Learning with Depthwise Separable Convolutions | |
| Gated Linear Units (GLU) | Language Modeling with Gated Convolutional Networks | |
| Sigmoid Linear Units (SiLU) | Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning | β |
| Feature-wise Linear Modulation (FiLM) | FiLM: Visual Reasoning with a General Conditioning Layer | |
| Point-wise Convolutional Modulation (PoCM) | LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation | |
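To make the conditioning modules above concrete, here is a minimal, self-contained sketch of FiLM in PyTorch. This is an illustration only, not this repository's implementation:

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scales and shifts feature maps
    using parameters predicted from a conditioning vector."""
    def __init__(self, num_features: int, condition_dim: int):
        super().__init__()
        self.gamma = nn.Linear(condition_dim, num_features)  # predicts per-channel scale
        self.beta = nn.Linear(condition_dim, num_features)   # predicts per-channel shift

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features, time), condition: (batch, condition_dim)
        gamma = self.gamma(condition).unsqueeze(-1)  # (batch, num_features, 1)
        beta = self.beta(condition).unsqueeze(-1)
        return gamma * x + beta
```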
### Methods related to training

| Method | Reference | Done |
|---|---|---|
| Permutation invariant training (PIT) | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | |
| One-and-rest PIT | Recursive Speech Separation for Unknown Number of Speakers | |
| Probabilistic PIT | Probabilistic Permutation Invariant Training for Speech Separation | |
| Sinkhorn PIT | Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm | |
| Combination Loss | All for One and One for All: Improving Music Separation by Bridging Networks | |
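As background for the PIT variants above: permutation invariant training evaluates the loss under every assignment of estimated sources to reference sources and backpropagates only through the best one. A minimal MSE-based sketch (an illustration, not this repository's implementation; the variants above differ in how they search or relax this permutation):

```python
import itertools

import torch

def pit_mse_loss(estimates: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant MSE loss.

    estimates, targets: (batch, n_sources, time)
    """
    n_sources = estimates.size(1)
    losses = []
    for perm in itertools.permutations(range(n_sources)):
        permuted = estimates[:, list(perm)]                          # reorder estimated sources
        losses.append(((permuted - targets) ** 2).mean(dim=(1, 2)))  # per-sample MSE
    losses = torch.stack(losses, dim=1)  # (batch, n_permutations)
    best, _ = losses.min(dim=1)          # keep the best assignment per sample
    return best.mean()
```

Note that exhaustive enumeration costs O(n_sources!), which is why methods such as Sinkhorn PIT exist for larger numbers of speakers.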
## Example

### LibriSpeech example using Conv-TasNet

You can check other tutorials in `<REPOSITORY_ROOT>/egs/tutorials/`.
#### 0. Preparation

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare_librispeech.sh \
--librispeech_root <LIBRISPEECH_ROOT> \
--n_sources <#SPEAKERS>
```
#### 1. Training

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh \
--exp_dir <OUTPUT_DIR>
```

If you want to resume training, specify the checkpoint with `--continue_from`:

```sh
. ./train.sh \
--exp_dir <OUTPUT_DIR> \
--continue_from <MODEL_PATH>
```
#### 2. Evaluation

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh \
--exp_dir <OUTPUT_DIR>
```
#### 3. Demo

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh
```
## Pretrained Models

You need `gdown` to download pretrained models:

```sh
pip install gdown
```
You can load a pretrained model as follows:

```python
from models.conv_tasnet import ConvTasNet

model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")
```
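Once loaded, the model maps a mixture waveform to an estimate of the target source. The snippet below is only a hedged sketch: the expected input shape and channel count depend on the pretrained model, so check `PRETRAINED.md` for specifics.

```python
import torch

model.eval()
with torch.no_grad():
    # Placeholder input: 4 seconds of audio at 44.1 kHz.
    # The (batch, channels, time) layout and mono channel count are assumptions.
    mixture = torch.randn(1, 1, 4 * 44100)
    estimate = model(mixture)  # estimated "vocals" waveform
```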
See `PRETRAINED.md` or `egs/tutorials/hub/pretrained.ipynb` for details.
### Time Domain Wrappers for Time-Frequency Domain Models

See `egs/tutorials/hub/time-domain_wrapper.ipynb`.
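The idea of such a wrapper is to make a time-frequency (mask-based) model look like a waveform-to-waveform model by surrounding it with STFT analysis and iSTFT synthesis. A minimal sketch, assuming a model that predicts a magnitude mask (an illustration, not the notebook's actual class):

```python
import torch
import torch.nn as nn

class TimeDomainWrapper(nn.Module):
    """Wraps a magnitude-mask model so it maps waveform -> waveform."""
    def __init__(self, tf_model: nn.Module, n_fft: int = 2048, hop_length: int = 512):
        super().__init__()
        self.tf_model = tf_model
        self.n_fft, self.hop_length = n_fft, hop_length
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, time)
        spec = torch.stft(waveform, self.n_fft, self.hop_length,
                          window=self.window, return_complex=True)
        mask = self.tf_model(spec.abs())  # model predicts a magnitude mask
        masked = mask * spec              # apply mask, keep the mixture phase
        return torch.istft(masked, self.n_fft, self.hop_length,
                           window=self.window, length=waveform.size(-1))
```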
### Speech Separation by Pretrained Models

See `egs/tutorials/hub/speech-separation.ipynb`.
### Music Source Separation by Pretrained Models

See `egs/tutorials/hub/music-source-separation.ipynb`.
If you want to separate your own music file, see the following notebooks (a rough sketch of the shared pattern follows the list):

- MMDenseLSTM: `egs/tutorials/mm-dense-lstm/separate_music.ipynb`
- Conv-TasNet: `egs/tutorials/conv-tasnet/separate_music.ipynb`
- UMX: `egs/tutorials/umx/separate_music.ipynb`
- X-UMX: `egs/tutorials/x-umx/separate_music.ipynb`
- D3Net: `egs/tutorials/d3net/separate_music.ipynb`
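All of those notebooks follow roughly the same pattern: load audio, resample to the model's rate, run a pretrained model, and save the result. The sketch below illustrates this with torchaudio; the input shape expected by the model and the file names are assumptions, so follow the notebooks for the real pipeline.

```python
import torch
import torchaudio
from models.conv_tasnet import ConvTasNet

model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")
model.eval()

waveform, sample_rate = torchaudio.load("song.wav")  # (channels, time)
if sample_rate != 44100:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 44100)

with torch.no_grad():
    # Assumed (batch, channels, time) input layout; see the notebooks for the actual one.
    vocals = model(waveform.unsqueeze(0))

torchaudio.save("vocals.wav", vocals.squeeze(0), 44100)
```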