
Speech enhancement system for the CHiME-5 dinner party scenario

pb_chime5: Front-End Processing for the CHiME-5 Dinner Party Scenario [pdf]

This repository contains all components of the CHiME-5 front-end presented by Paderborn University at the CHiME-5 workshop [PB2018CHiME5]. Using the baseline backend provided by the challenge organizers on data enhanced with this multi-array front-end (with the default parameters, which differ slightly from those in the original paper), a WER of 60.89 % was achieved on the development set. In combination with an acoustic model presented by RWTH Aachen [Kitza2018], this multi-array front-end achieved the third best result in the challenge, with 54.56 % on the development and 55.30 % on the evaluation set.

A later cooperation with Hitachi [Kanda2019] led to a WER of 39.94 % on the development and 41.64 % on the evaluation set, using the multi-array front-end presented in this repository.

The best single-system WERs with this enhancement are 41.6 % on the development and 43.2 % on the evaluation set, as reported in [Zorila2019].

The front-end consists of WPE, a spatial mixture model that uses time annotations (guided source separation, GSS), beamforming and masking:

(System Overview)

The core code is located in the file pb_chime5/core.py. An example script to run the enhancement is in pb_chime5/scripts/run.py; it can be executed with python -m pb_chime5.scripts.run with session_id=dev wpe=True wpe_tabs=2.

Challenge website: http://spandh.dcs.shef.ac.uk/chime_challenge/

Workshop website: http://spandh.dcs.shef.ac.uk/chime_workshop/

If you are using this code please cite the following paper (pdf, poster):

@inproceedings{PB2018CHiME5,
  author    = {Boeddeker, Christoph and Heitkaemper, Jens and Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold},
  title     = {{Front-End Processing for the CHiME-5 Dinner Party Scenario}},
  year      = {2018},
  booktitle = {CHiME5 Workshop},
}

Related work:

The RWTH/UPB System Combination for the CHiME 2018 Workshop (pdf)

@inproceedings{Kitza2018,
  author    = {Kitza, Markus and Michel, Wilfried and Boeddeker, Christoph and Heitkaemper, Jens and Menne, Tobias and Schl{\"u}ter, Ralf and Ney, Hermann and Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and others},
  title     = {The RWTH/UPB system combination for the CHiME 2018 workshop},
  year      = {2018},
  booktitle = {CHiME-5 Workshop},
}

Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR (pdf, slides)

@inproceedings{Kanda2019,
  author    = {Kanda, Naoyuki and Boeddeker, Christoph and Heitkaemper, Jens and Fujita, Yusuke and Horiguchi, Shota and Nagamatsu, Kenji and Haeb-Umbach, Reinhold},
  title     = {{Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR}},
  year      = {2019},
  booktitle = {Interspeech},
}

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription (pdf)

@inproceedings{Zorila2019,
  title     = {An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription},
  author    = {Zoril\u{a}, C\u{a}t\u{a}lin and Boeddeker, Christoph and Doddipatla, Rama and Haeb-Umbach, Reinhold},
  year      = {2019},
  booktitle = {2019 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)},
}

Towards a speaker diarization system for the CHiME 2020 dinner party transcription (pdf, slides, video)

@inproceedings{Boeddeker2018CHiME6,
  author    = {Boeddeker, Christoph and Cord-Landwehr, Tobias and Heitkaemper, Jens and Zoril\u{a}, C\u{a}t\u{a}lin and Hayakawa, Daichi and Li, Mohan and Liu, Min and Doddipatla, Rama and Haeb-Umbach, Reinhold},
  title     = {{Towards a speaker diarization system for the CHiME 2020 dinner party transcription}},
  year      = {2020},
  booktitle = {CHiME-6 Workshop},
}

Installation

This package does not work on Windows.

Clone the repo with submodules

$ git clone https://github.com/fgnt/pb_chime5.git
$ cd pb_chime5
$ # Download submodule dependencies  # https://stackoverflow.com/a/3796947/5766934
$ git submodule init  
$ git submodule update

Set the environment variable CHIME5_DIR to point the repository to your CHiME-5 data:

$ export CHIME5_DIR=/path/to/chime5/data/CHiME5
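If you want to verify the variable before starting a long run, a small helper like the following can fail early with a clear message. This is an illustrative sketch, not part of pb_chime5; the expectation that the CHiME-5 folder contains an audio subdirectory is an assumption about the data layout:

```python
import os
from pathlib import Path

def get_chime5_dir(env=os.environ):
    """Read CHIME5_DIR and raise early if it does not look like a
    CHiME-5 data folder (checking for an 'audio' subdirectory is an
    assumption about the dataset layout)."""
    chime5_dir = Path(env.get("CHIME5_DIR", ""))
    if not (chime5_dir / "audio").is_dir():
        raise FileNotFoundError(
            f"CHIME5_DIR={str(chime5_dir)!r} does not contain an 'audio' folder"
        )
    return chime5_dir
```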

Install this package and pb_bss

$ pip install --user -e pb_bss/
$ pip install --user -e .

Create the database description file

$ make cache/chime5.json

It is assumed that the folder sacred in this repository is the simulation folder. If you want to change the simulation directory, add a symlink to the folder where you want to store the simulation results: ln -s /path/to/sim/dir sacred

Start a test run with

$ python -m pb_chime5.scripts.run test_run with session_id=dev

Start a simulation with 9 MPI workers (1 scheduler and 8 actual workers):

$ mpiexec -np 9 python -m pb_chime5.scripts.run with session_id=dev

You can replace mpiexec -np 9 with your HPC command to start an MPI program. It scales up very well and has been tested with 600 distributed cores.

CHiME-6 Track2: RTTM files

In Track 2 of CHiME-6 it is not allowed to use the human annotations for utterance starts and ends; instead, they must be estimated. The challenge uses RTTM files as the format (for a description see https://github.com/nryant/dscore#rttm). Here is an example line from such a file:

SPEAKER S09 1 65.58 1.75 <NA> <NA> P25 <NA> <NA>
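The ten whitespace-separated fields follow the RTTM convention (type, file/session id, channel, onset in seconds, duration in seconds, two placeholders, speaker, and two more placeholders). A minimal parser, written here only to illustrate the format and not part of pb_chime5:

```python
from collections import namedtuple

# Field names follow the RTTM description linked above.
RTTMEntry = namedtuple(
    "RTTMEntry",
    "type file_id channel onset duration ortho stype speaker conf slat",
)

def parse_rttm_line(line):
    """Split one RTTM line into its ten fields; onset and duration
    are converted to seconds as floats, everything else stays a string."""
    fields = line.split()
    assert len(fields) == 10, fields
    fields[3] = float(fields[3])  # onset in seconds
    fields[4] = float(fields[4])  # duration in seconds
    return RTTMEntry(*fields)

entry = parse_rttm_line("SPEAKER S09 1 65.58 1.75 <NA> <NA> P25 <NA> <NA>")
print(entry.speaker, entry.file_id)  # → P25 S09
```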

Once you have an estimate for the utterance starts and ends, you can enhance the data with the following command:

python -m pb_chime5.scripts.kaldi_run_rttm with \
  storage_dir=path/to/save/enhanced/data \
  chime6_dir='/net/fastdb/chime6/CHiME6' \
  database_rttm="https://raw.githubusercontent.com/nateanl/chime6_rttm/master/dev_rttm" \
  activity_rttm="https://raw.githubusercontent.com/nateanl/chime6_rttm/master/dev_rttm" \
  session_id=dev \
  job_id=1 \
  number_of_jobs=1 \
  context_samples=160000 \
  bss_iterations=5 \
  multiarray='outer_array_mics'
  • storage_dir: Path where the enhanced data is stored (<storage_dir>/audio/<dataset>/*.wav).
  • chime6_dir: Path to the CHiME-6 folder.
  • session_id: Dataset/session to enhance, e.g. dev, eval, train, S02, ...
  • database_rttm: Must contain the utterance starts and ends for the selected session_id. These starts and ends are used to write the audio files that can be used for ASR.
  • activity_rttm: Defaults to database_rttm. May contain more silence than database_rttm, e.g. activity_rttm has word starts and ends, while database_rttm has sentence starts and ends.
  • job_id and number_of_jobs: Control the subset you want to compute. This option is intended for kaldi (e.g. run.pl). Alternatively, you can prepend mpiexec -np $number_of_jobs to the call to parallelize the enhancement; this should be slightly faster than kaldi.
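The job_id/number_of_jobs split can be pictured as a round-robin partition of the session's utterances across jobs. This is a sketch of the idea with a hypothetical helper, not the exact partitioning code used in pb_chime5:

```python
def job_subset(utterance_ids, job_id, number_of_jobs):
    """Select the utterances that job `job_id` (1-based, as in kaldi's
    run.pl) should enhance. A round-robin slice keeps the jobs balanced
    even when utterance lengths drift over a session."""
    assert 1 <= job_id <= number_of_jobs
    return utterance_ids[job_id - 1::number_of_jobs]

ids = [f"utt_{i:03d}" for i in range(10)]
print(job_subset(ids, job_id=1, number_of_jobs=4))
# → ['utt_000', 'utt_004', 'utt_008']
```

Running all jobs from 1 to number_of_jobs covers every utterance exactly once, which is what makes the option safe to use with run.pl.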

FAQ

Q: I ran mpiexec -np 9 python -m pb_chime5.scripts.run with session_id=dev wpe=True wpe_tabs=2 and it generated 9 folders and the estimated duration is around 100 h. Is this right?

A: It is likely that your mpi4py installation does not work. Execute the following command and check if the output is correct:

$ mpiexec -np 3 python -c 'from mpi4py import MPI; print("My worker rank:", MPI.COMM_WORLD.rank, "Total workers:", MPI.COMM_WORLD.size)'
My worker rank: 2 Total workers: 3
My worker rank: 0 Total workers: 3
My worker rank: 1 Total workers: 3

Q: I want to use my own source activity detector. Can you give me a hint where to start?

A: At the end of pb_chime5/activity_alignment.py there is some code that shows how to generate finetuned time annotations from kaldi worn alignments. Change worn_ali_path to point to your kaldi worn alignments, and it will generate files (cache/word_non_sil_alignment/S??.pkl) with finetuned oracle time annotations. To use them for enhancement, set activity_type to path and activity_path to the path of the finetuned time annotations, e.g. python -m pb_chime5.scripts.run with activity_type=path activity_path=cache/word_non_sil_alignment.
