• Stars: 191
• Rank: 202,877 (Top 4%)
• Language: Python
• License: Other
• Created: almost 9 years ago
• Updated: over 6 years ago

Repository Details

Neural network based GEV beamformer

Introduction

This repository contains the code to replicate the results for the 3rd CHiME Challenge (CHiME-3) using an NN-GEV beamformer.

Install

This code requires Python 3 to run (although most parts should be compatible with Python 2.7). Install the necessary modules:

pip install chainer
pip install tqdm
pip install scipy
pip install scikit-learn
pip install librosa
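
If you prefer a single command, the same packages can be installed in one call (equivalent to the lines above, assuming ``pip`` belongs to your Python 3 environment):

```
pip install chainer tqdm scipy scikit-learn librosa
```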

Usage

  1. Extract the speech and noise images for the SimData using the modified MATLAB script in CHiME3/tools/simulation.

  2. Start the training for the BLSTM model using the GPU with id 0 and the data directory data:

    python train.py --chime_dir=../chime/data --gpu 0 data BLSTM
    

    This will first create the training data (i.e., the binary mask targets) and then run the training with early stopping. Instead of BLSTM, it is also possible to specify FW to train a simple feed-forward model (a combined sketch for the FW variant follows this list).

  3. Start the beamforming:

```
beamform.sh ../chime/data data/export_BLSTM data/BLSTM_model/best.nnet BLSTM
```

This will apply the beamformer to every utterance of the CHiME database and store the resulting audio files in ``data/export_BLSTM``. The model ``data/BLSTM_model/best.nnet`` is used to generate the masks.
  4. Start the Kaldi baseline using the exported data.
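
For reference, steps 2 and 3 can be run with the feed-forward variant as in the sketch below. The export directory ``data/export_FW`` and the model path ``data/FW_model/best.nnet`` are assumptions that simply mirror the BLSTM naming above; check the paths that train.py actually writes on your system.

```
# Step 2 with the feed-forward mask estimator instead of the BLSTM:
python train.py --chime_dir=../chime/data --gpu 0 data FW

# Step 3 with the resulting model (export directory and model path are
# assumptions mirroring the BLSTM example above):
beamform.sh ../chime/data data/export_FW data/FW_model/best.nnet FW
```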

If you want to use the beamformer with a different database, take a look at beamform.py and chime_data and modify them accordingly.
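
The core operation to port is mask-based GEV (max-SNR) beamforming. The sketch below illustrates the idea with plain NumPy/SciPy; it is not the repository's API, all function and variable names are illustrative assumptions, and the postfilter (blind analytic normalization) used in the original system is omitted.

```
# Minimal sketch of mask-based GEV (max-SNR) beamforming.
# Not the repository's API; names, shapes and the omitted BAN postfilter
# are simplifications for illustration only.
import numpy as np
from scipy.linalg import eigh


def masked_psd(stft, mask, eps=1e-10):
    """Mask-weighted power spectral density matrices.

    stft: complex STFT, shape (channels, frames, bins)
    mask: real-valued mask, shape (frames, bins)
    returns: PSD matrices, shape (bins, channels, channels)
    """
    # Phi(f) = sum_t mask(t, f) x(t, f) x(t, f)^H / sum_t mask(t, f)
    psd = np.einsum('ctf,dtf->fcd', stft * mask[np.newaxis], stft.conj())
    norm = np.maximum(mask.sum(axis=0), eps)
    return psd / norm[:, np.newaxis, np.newaxis]


def gev_beamform(stft, speech_mask, noise_mask, eps=1e-10):
    """Apply a GEV beamformer given speech and noise masks.

    Returns the enhanced single-channel STFT, shape (frames, bins).
    """
    phi_xx = masked_psd(stft, speech_mask)   # speech PSD matrices
    phi_nn = masked_psd(stft, noise_mask)    # noise PSD matrices
    channels, frames, bins = stft.shape
    out = np.zeros((frames, bins), dtype=stft.dtype)
    for f in range(bins):
        # The principal generalized eigenvector of (Phi_XX, Phi_NN)
        # maximizes the output SNR.
        _, vecs = eigh(phi_xx[f], phi_nn[f] + eps * np.eye(channels))
        w = vecs[:, -1]                      # eigenvector of the largest eigenvalue
        out[:, f] = np.einsum('c,ct->t', w.conj(), stft[:, :, f])
    return out
```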

Results

With the new baseline, you should get the following results:

```
local/chime4_calc_wers.sh exp/tri3b_tr05_multi_noisy new_baseline exp/tri3b_tr05_multi_noisy/graph_tgpr_5k
compute dt05 WER for each location
-------------------
best overall dt05 WER 9.77% (language model weight = 12)
-------------------
dt05_simu WER: 9.81% (Average), 8.95% (BUS), 11.28% (CAFE), 8.55% (PEDESTRIAN), 10.44% (STREET)
-------------------
dt05_real WER: 9.73% (Average), 11.67% (BUS), 9.37% (CAFE), 8.41% (PEDESTRIAN), 9.47% (STREET)
-------------------
et05_simu WER: 10.67% (Average), 8.85% (BUS), 11.34% (CAFE), 11.02% (PEDESTRIAN), 11.47% (STREET)
-------------------
et05_real WER: 14.00% (Average), 19.01% (BUS), 13.37% (CAFE), 12.37% (PEDESTRIAN), 11.24% (STREET)
-------------------


./local/chime4_calc_wers_smbr.sh exp/tri4a_dnn_tr05_multi_noisy_smbr_i1lats new_baseline exp/tri4a_dnn_tr05_multi_noisy/graph_tgpr_5k
compute dt05 WER for each location
-------------------
best overall dt05 WER 5.87% (language model weight = 9) (Number of iterations = 4)
-------------------
dt05_simu WER: 5.62% (Average), 5.24% (BUS), 6.58% (CAFE), 4.91% (PEDESTRIAN), 5.77% (STREET)
-------------------
dt05_real WER: 6.11% (Average), 7.66% (BUS), 5.83% (CAFE), 5.10% (PEDESTRIAN), 5.87% (STREET)
-------------------
et05_simu WER: 7.26% (Average), 6.74% (BUS), 7.70% (CAFE), 7.38% (PEDESTRIAN), 7.23% (STREET)
-------------------
et05_real WER: 9.48% (Average), 14.06% (BUS), 8.22% (CAFE), 7.81% (PEDESTRIAN), 7.84% (STREET)
-------------------


local/chime4_calc_wers.sh exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore new_baseline_rnnlm_5k_h300_w0.5_n100 exp/tri4a_dnn_tr05_multi_noisy_smbr_lmrescore/graph_tgpr_5k
compute dt05 WER for each location
-------------------
best overall dt05 WER 4.02% (language model weight = 11)
-------------------
dt05_simu WER: 3.97% (Average), 3.66% (BUS), 4.65% (CAFE), 3.38% (PEDESTRIAN), 4.19% (STREET)
-------------------
dt05_real WER: 4.07% (Average), 5.34% (BUS), 3.61% (CAFE), 3.35% (PEDESTRIAN), 4.00% (STREET)
-------------------
et05_simu WER: 4.51% (Average), 4.09% (BUS), 4.61% (CAFE), 4.46% (PEDESTRIAN), 4.86% (STREET)
-------------------
et05_real WER: 6.46% (Average), 9.87% (BUS), 5.47% (CAFE), 5.14% (PEDESTRIAN), 5.34% (STREET)
-------------------
```

Citation

If you use this code for your experiments, please consider citing the following paper:

@inproceedings{Hey2016,
  title     = {Neural Network Based Spectral Mask Estimation for Acoustic Beamforming},
  author    = {J. Heymann and L. Drude and R. Haeb-Umbach},
  booktitle = {Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2016},
  date      = {2016-03-20},
}

More Repositories

1. nara_wpe (Python, 473 stars): Different implementations of "Weighted Prediction Error" for speech dereverberation
2. pb_bss (Python, 264 stars): Collection of EM algorithms for blind source separation of audio signals
3. pb_chime5 (Python, 108 stars): Speech enhancement system for the CHiME-5 dinner party scenario
4. sms_wsj (Python, 101 stars): SMS-WSJ: Spatialized Multi-Speaker Wall Street Journal database for multi-channel source separation and recognition
5. meeteval (Python, 72 stars): MeetEval, a meeting transcription evaluation toolkit
6. padertorch (Python, 71 stars): A collection of common functionality to simplify the design, training and evaluation of machine learning models based on PyTorch, with an emphasis on speech processing
7. pb_sed (Python, 68 stars): Paderborn Sound Event Detection
8. ci_sdr (Python, 51 stars)
9. mms_msg (Python, 42 stars): Multipurpose Multi Speaker Mixture Signal Generator
10. paderbox (Python, 37 stars): A collection of utilities for audio and speech processing
11. graph_pit (Python, 32 stars)
12. sed_scores_eval (Python, 26 stars)
13. lazy_dataset (Python, 17 stars): Process large datasets as if they were iterables
14. LatticeWordSegmentation (C++, 17 stars): Software to apply unsupervised word segmentation on lattices or text sequences using a nested hierarchical Pitman-Yor language model
15. paderwasn (Python, 13 stars): A collection of methods for acoustic signal processing in wireless acoustic sensor networks (WASNs)
16. nhpylm (C++, 13 stars): Python bindings for a C++ based implementation of the Nested Hierarchical Pitman-Yor Language Model
17. sins (Python, 8 stars)
18. python_crashkurs (Jupyter Notebook, 7 stars)
19. oaf (Jupyter Notebook, 7 stars): Jupyter notebooks for the lecture "Optimal and adaptive filters"
20. mnist (Makefile, 6 stars)
21. dlp_mpi (Python, 5 stars)
22. nachrichtentechnik (Jupyter Notebook, 4 stars): Jupyter notebooks for the lecture "Nachrichtentechnik" (communications engineering), with explanations in German
23. libriwasn (Python, 3 stars): Tools and scripts for the LibriWASN data set from Zenodo
24. ham_radio (Python, 3 stars)
25. speaker_reassignment (Python, 3 stars): Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
26. upb_audio_tagging_2019 (Python, 2 stars): UPB system for the Kaggle competition "Freesound Audio Tagging 2019"
27. asnsig (Python, 2 stars): ASNSIG, a Signal Generator for Ad-Hoc Acoustic Sensor Networks in Smart Home Environments
28. 2019_ad_xidian (HTML, 1 star)