# Deep clustering for single-channel speech separation

Implementation of "Deep Clustering: Discriminative Embeddings for Segmentation and Separation".
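The model is trained with the deep clustering objective from the paper: the network embeddings `V` and the ideal-mask labels `Y` should induce the same time-frequency affinity matrix, i.e. the loss is `||VV^T - YY^T||_F^2`. Below is a minimal PyTorch sketch of that objective (an illustration only, not this repo's actual loss code):

```python
import torch

def deep_clustering_loss(embed, targets):
    """Deep clustering objective ||VV^T - YY^T||_F^2.

    Computed in the low-rank form
        ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2
    so the N x N affinity matrices (N = number of T-F bins) are never
    materialized.

    embed:   N x D embedding matrix V (typically unit-normalized per bin)
    targets: N x S one-hot ideal-mask matrix Y (S = number of speakers)
    """
    vTv = torch.matmul(embed.t(), embed)      # D x D
    vTy = torch.matmul(embed.t(), targets)    # D x S
    yTy = torch.matmul(targets.t(), targets)  # S x S
    return vTv.pow(2).sum() - 2 * vTy.pow(2).sum() + yTy.pow(2).sum()
```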
## Requirements

See `requirements.txt`.
## Usage

- Configure experiments in `.yaml` files, for example `conf/train.yaml`.

- Training:

  ```bash
  python ./train_dcnet.py --config conf/train.yaml --num-epoches 20 > train.log 2>&1 &
  ```

- Inference:

  ```bash
  python ./separate.py --num-spks 2 $mdl_dir/train.yaml $mdl_dir/final.pkl egs.scp
  ```
## Experiments

Configuration | Epochs | FM | FF | MM | FF/MM | AVG
---|---|---|---|---|---|---
config-1 | 25 | 11.42 | 6.85 | 7.88 | 7.36 | 9.54
## Q & A

- **What is the format of the `.scp` file?**

  The `wav.scp` file follows the Kaldi toolkit convention: each line contains a key-value pair, where the key is a unique string that indexes an audio file and the value is the path of that file (see the parsing sketch after this list). For example:

  ```
  mix-utt-00001 /home/data/train/mix-utt-00001.wav
  ...
  mix-utt-XXXXX /home/data/train/mix-utt-XXXXX.wav
  ```
- **How do I prepare the training dataset?**

  The original paper uses MATLAB scripts from create-speaker-mixtures.zip to simulate two- and three-speaker datasets. You can also use your own data sources (e.g., LibriSpeech, TIMIT) to create mixtures, keeping the clean sources alongside them as training targets (see the mixing sketch after this list).
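For illustration, here is a minimal Python sketch that parses such a `.scp` file into a key-to-path mapping (`parse_scp` is a hypothetical helper, not necessarily how this repo reads `.scp` files):

```python
from collections import OrderedDict

def parse_scp(scp_path):
    """Parse a Kaldi-style .scp file into an ordered {key: path} dict.

    Each non-empty line holds a unique key and the path of the
    corresponding audio file, separated by whitespace.
    """
    table = OrderedDict()
    with open(scp_path, "r") as fd:
        for raw in fd:
            line = raw.strip()
            if not line:
                continue
            key, path = line.split(maxsplit=1)
            if key in table:
                raise ValueError("duplicated key: {}".format(key))
            table[key] = path
    return table
```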
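Likewise, a minimal Python sketch of creating one two-speaker mixture while keeping the clean sources (`mix_pair`, its 0 dB default, and the power-based scaling are illustrative assumptions using the `soundfile` package, not the create-speaker-mixtures.zip recipe):

```python
import numpy as np
import soundfile as sf

def mix_pair(src1_path, src2_path, mix_path, snr_db=0.0):
    """Mix two clean utterances at a given SNR and write the mixture.

    Both sources are truncated to the shorter length and the second one
    is rescaled so that 10 * log10(P1 / P2) == snr_db. The (truncated,
    rescaled) clean sources are returned so they can be saved as
    training targets alongside the mixture.
    """
    s1, sr1 = sf.read(src1_path)
    s2, sr2 = sf.read(src2_path)
    assert sr1 == sr2, "sample rates must match"
    n = min(len(s1), len(s2))
    s1, s2 = s1[:n], s2[:n]
    # scale s2 so the pair reaches the requested SNR relative to s1
    p1, p2 = np.mean(s1 ** 2), np.mean(s2 ** 2)
    s2 = s2 * np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    sf.write(mix_path, s1 + s2, sr1)
    return s1, s2
```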
## References

- Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation // 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016: 31-35.
- Isik Y, Le Roux J, Chen Z, et al. Single-channel multi-speaker separation using deep clustering. arXiv preprint arXiv:1607.02173, 2016.