
Deep clustering for single-channel speech separation

Implementation of "Deep Clustering: Discriminative Embeddings for Segmentation and Separation"
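The deep-clustering objective trains a network to emit a unit-norm embedding for each time-frequency bin, so that bins dominated by the same speaker cluster together. The loss compares the embedding affinity matrix with the ideal one built from one-hot speaker labels. Below is a minimal NumPy sketch of that loss (the repository itself uses PyTorch; function and variable names here are illustrative, not from the repo):

```python
import numpy as np

def dc_loss(V, Y):
    """Deep-clustering loss ||V V^T - Y Y^T||_F^2 for one utterance.

    V: (N, D) embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot assignment of each bin to its dominant speaker.

    Uses the expanded form ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2,
    which avoids building the N x N affinity matrices explicitly.
    """
    return (np.sum(np.dot(V.T, V) ** 2)
            - 2.0 * np.sum(np.dot(V.T, Y) ** 2)
            + np.sum(np.dot(Y.T, Y) ** 2))
```

The expanded form keeps the cost at O(N D^2) instead of O(N^2), which matters since N is the number of T-F bins in an utterance.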

Requirements

see requirements.txt

Usage

  1. Configure the experiment in a .yaml file, for example: conf/train.yaml

  2. Training:

    python ./train_dcnet.py --config conf/train.yaml --num-epoches 20 > train.log 2>&1 &
  3. Inference:

    python ./separate.py --num-spks 2 $mdl_dir/train.yaml $mdl_dir/final.pkl egs.scp
    

Experiments

    Config    Epoch  FM     FF    MM    FF/MM  AVG
    config-1  25     11.42  6.85  7.88  7.36   9.54

Q & A

  1. What is the format of the .scp file?

    The format of the wav.scp file follows the definition in the Kaldi toolkit: each line contains a key-value pair, where the key is a unique string indexing an audio file and the value is the path to that file. For example:

    mix-utt-00001 /home/data/train/mix-utt-00001.wav
    ...
    mix-utt-XXXXX /home/data/train/mix-utt-XXXXX.wav
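Since each line is just a key followed by a path, loading a wav.scp reduces to splitting on the first whitespace. A small sketch (the helper name and file contents are illustrative; the repo ships its own Kaldi I/O code):

```python
def read_scp(path):
    """Read a Kaldi-style .scp file into a {key: value} dict.

    Splits each non-empty line on the first run of whitespace,
    so paths containing further spaces stay intact in the value.
    """
    utt2wav = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, value = line.split(maxsplit=1)
            utt2wav[key] = value
    return utt2wav
```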
    
  2. How do I prepare the training dataset?

    The original paper uses MATLAB scripts from create-speaker-mixtures.zip to simulate the two- and three-speaker datasets. You can also use your own data sources (e.g., LibriSpeech, TIMIT) to create mixtures, keeping the clean sources alongside the mixtures, since they are needed to compute the training targets.
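If you build your own mixtures, the core step is scaling one source relative to the other to hit a target signal-to-noise ratio before summing. A hedged sketch of that step on raw sample arrays (loading/saving wav files, resampling, and length matching are left out; names are illustrative):

```python
import numpy as np

def mix_at_snr(src1, src2, snr_db):
    """Mix two equal-length signals at a given SNR (src1 relative to src2).

    Rescales src2 so that 10*log10(power(src1)/power(src2)) == snr_db,
    then returns (mixture, src1, rescaled src2) so the clean sources
    can be kept alongside the mixture.
    """
    p1 = np.mean(src1 ** 2)
    p2 = np.mean(src2 ** 2)
    scale = np.sqrt(p1 / (p2 * 10.0 ** (snr_db / 10.0)))
    src2 = scale * src2
    return src1 + src2, src1, src2
```

In practice you would draw snr_db from a small range (the WSJ0-2mix recipe uses 0-5 dB) and write out mixture and sources under matching keys for the .scp files.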

Reference

  1. Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]//Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016: 31-35.
  2. Isik Y, Roux J L, Chen Z, et al. Single-channel multi-speaker separation using deep clustering[J]. arXiv preprint arXiv:1607.02173, 2016.
