
Speaker diarization scripts, based on AaltoASR

Speaker Diarization scripts README

This README describes the various scripts available for manual segmentation of media files (for annotation or other purposes), for speaker diarization, and for converting between the file formats of several related tools.

The scripts are written in either Python 2 or Perl; interpreters for both should be readily available.

Please send any questions/suggestions to [email protected]

Quick Start Using Docker

A pre-built Docker container can be used to run the scripts.

docker pull blabbertabber/aalto-speech-diarizer

In the following example, we use the container to diarize a meeting.wav file:

docker run -it blabbertabber/aalto-speech-diarizer bash
cd /speaker-diarization
curl -k -OL https://nono.io/meeting.wav  # sample .wav; substitute yours
./spk-diarization2.py meeting.wav        # substitute your .wav filename
cat stdout                               # browse output

Installation instructions

Most of these scripts depend on the aku tools that are part of the AaltoASR package. Compile AaltoASR for your platform first, following its build instructions.

In this speaker-diarization directory:

  • Add a symlink to the folder AaltoASR/
  • Add a symlink to the folder AaltoASR/build
  • Add a symlink to AaltoASR/build/aku/feacat
  • Make sure the ffmpeg executable is on your PATH, or add a symlink to it too.

For example, if you have cloned and built AaltoASR into ../AaltoASR (relative to speaker-diarization), the following commands create the required symlinks:

speaker-diarization$ ln -s ../AaltoASR ./
speaker-diarization$ ln -s ../AaltoASR/build ./
speaker-diarization$ ln -s ../AaltoASR/build/aku/feacat ./

You probably want to use spk-diarization2.py, since it calls the "2" versions of some scripts, while spk-diarization.py uses an old, MATLAB-based VAD that is hard to configure and is deprecated.

mseg.py

Script to help perform manual segmentation of a media file; it accepts any media file type supported by mplayer. Its only dependency is a Python mplayer wrapper, which can be installed locally by executing:

$ pip install --user mplayer.py

After that, run it with:

$ ./mseg.py /path/to/mediafile -o outputfile

The output file is optional. To continue a previously saved segmentation session, pass it with -i:

$ ./mseg.py /path/to/mediafile -o outputfile -i inputfile

Once in the program, the controls are:

  • Quit: esc or q
  • Pause: p
  • Mark position: space
  • Manually edit mark: e
  • Add manual mark: a
  • Remove mark: r
  • Faster speed: Up
  • Slower speed: Down
  • Rewind: Left
  • Fast Forward: Right
  • Scroll down marks: pgDwn
  • Scroll up marks: pgUp

The media file starts paused, so press the p key to start playback.

mseg2elan.py

Script to convert from mseg output to Elan file format.

Usage:

$ ./mseg2elan.py msoutputfile -o outputfile

If outputfile is not specified, the output is sent to stdout. Once the file is open in Elan, segments can easily be fine-tuned by switching to segmentation mode (Options->Segmentation Mode).
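For readers unfamiliar with the Elan format, the Python 3 sketch below writes a minimal .eaf document from a list of (start, end, label) segments in seconds. It is not mseg2elan.py's code: the input list stands in for parsed mseg output (whose format is not described here), and the helper name segments_to_eaf, the tier name and the header values are hypothetical. Elan may expect further header fields in real files.

```python
import xml.etree.ElementTree as ET


def segments_to_eaf(segments, media_url, tier_id="segments"):
    """Build a minimal ELAN EAF document from (start_s, end_s, label) tuples.

    Hypothetical helper: tier_id and the header values are illustrative only.
    """
    root = ET.Element("ANNOTATION_DOCUMENT", {
        "AUTHOR": "", "DATE": "1970-01-01T00:00:00+00:00",
        "FORMAT": "3.0", "VERSION": "3.0",
    })
    header = ET.SubElement(root, "HEADER", {"MEDIA_FILE": "", "TIME_UNITS": "milliseconds"})
    ET.SubElement(header, "MEDIA_DESCRIPTOR", {"MEDIA_URL": media_url, "MIME_TYPE": "audio/x-wav"})
    time_order = ET.SubElement(root, "TIME_ORDER")
    tier = ET.SubElement(root, "TIER", {"TIER_ID": tier_id, "LINGUISTIC_TYPE_REF": "default-lt"})

    for i, (start, end, label) in enumerate(segments):
        ts1, ts2 = "ts%d" % (2 * i + 1), "ts%d" % (2 * i + 2)
        # EAF time slots hold integer milliseconds
        ET.SubElement(time_order, "TIME_SLOT", {"TIME_SLOT_ID": ts1, "TIME_VALUE": str(int(round(start * 1000)))})
        ET.SubElement(time_order, "TIME_SLOT", {"TIME_SLOT_ID": ts2, "TIME_VALUE": str(int(round(end * 1000)))})
        annotation = ET.SubElement(tier, "ANNOTATION")
        alignable = ET.SubElement(annotation, "ALIGNABLE_ANNOTATION", {
            "ANNOTATION_ID": "a%d" % (i + 1), "TIME_SLOT_REF1": ts1, "TIME_SLOT_REF2": ts2,
        })
        ET.SubElement(alignable, "ANNOTATION_VALUE").text = label

    # The tier above refers to this linguistic type by ID
    ET.SubElement(root, "LINGUISTIC_TYPE", {
        "LINGUISTIC_TYPE_ID": "default-lt", "TIME_ALIGNABLE": "true", "GRAPHIC_REFERENCES": "false",
    })
    return ET.tostring(root, encoding="unicode")


print(segments_to_eaf([(0.0, 1.5, "spk1"), (1.5, 3.2, "spk2")], "file:///tmp/meeting.wav"))
```

Each segment gets its own pair of time slots, which keeps the sketch simple; adjacent segments could instead share a slot.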

aku2elan.py

Script to convert from AKU recipes to Elan file format.

Usage:

$ ./aku2elan.py recipe -o outputfile

If outputfile is not specified, the output is sent to stdout. Once the file is open in Elan, segments can easily be fine-tuned by switching to segmentation mode (Options->Segmentation Mode).
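AKU recipes are plain-text files with one segment per line, written as whitespace-separated key=value fields. The Python 3 sketch below parses such lines into dictionaries, the first step any recipe-to-Elan conversion needs. The field names in the example (audio, start-time, end-time, speaker) follow the AaltoASR recipe convention as we understand it; treat them, and the helper names parse_recipe_line and parse_recipe, as assumptions.

```python
def parse_recipe_line(line):
    """Split one recipe line into a dict of key=value fields (hypothetical helper)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    return fields


def parse_recipe(lines):
    """Parse recipe lines, skipping blank ones, and convert times to float seconds."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = parse_recipe_line(line)
        for key in ("start-time", "end-time"):
            if key in fields:
                fields[key] = float(fields[key])
        segments.append(fields)
    return segments


recipe = [
    "audio=/data/meeting.wav start-time=0.00 end-time=4.25 speaker=S1",
    "audio=/data/meeting.wav start-time=4.25 end-time=9.80 speaker=S2",
]
for seg in parse_recipe(recipe):
    print(seg["speaker"], seg["start-time"], seg["end-time"])
```

Splitting on whitespace means paths containing spaces are not supported by this sketch.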

elan2aku.py

Script to convert from Elan file format to AKU recipes.

Usage:

$ ./elan2aku.py elanoutputfile -o akurecipe

If akurecipe is not specified, the output is sent to stdout.
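The reverse direction can be sketched by reading an EAF file's time slots and aligned annotations. This Python 3 sketch emits one recipe-style line per annotation and, as an assumption, uses the tier ID as the speaker label. The helper name eaf_to_recipe_lines and the recipe field names are hypothetical; the real elan2aku.py may map tiers and fields differently.

```python
import xml.etree.ElementTree as ET


def eaf_to_recipe_lines(eaf_xml, audio_path):
    """Convert aligned EAF annotations to recipe-style lines sorted by start time."""
    root = ET.fromstring(eaf_xml)
    # Map time slot IDs to seconds (EAF time values are in milliseconds)
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE")) / 1000.0
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None  # unaligned slots carry no value
    }
    entries = []
    for tier in root.iter("TIER"):
        speaker = tier.get("TIER_ID")  # assumption: one tier per speaker
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations attached to unaligned slots
            entries.append((start, "audio=%s start-time=%.2f end-time=%.2f speaker=%s"
                            % (audio_path, start, end, speaker)))
    # Interleave tiers chronologically
    entries.sort(key=lambda entry: entry[0])
    return [line for _, line in entries]
```

Sorting across tiers produces a single chronological recipe, which is what the speaker-oriented scripts in this repository appear to consume.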

mseg_to_textgrid.pl

Script to convert from mseg output to the Praat TextGrid file format.

Usage:

$ perl mseg_to_textgrid.pl msfile > outputfile

The script writes to stdout; redirect it as shown above to save the result to a file.
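For reference, a Praat TextGrid in the long text format stores each tier as a list of contiguous intervals. The Python 3 sketch below turns a list of boundary marks (in seconds) into a single IntervalTier. The tier name "marks", the empty interval labels and the helper name marks_to_textgrid are illustrative assumptions, not the output of mseg_to_textgrid.pl.

```python
def marks_to_textgrid(marks, duration, tier_name="marks"):
    """Write boundary times as one Praat IntervalTier (long text format)."""
    # Intervals must tile [0, duration] without gaps, so add both ends;
    # set() drops duplicate marks that would create zero-length intervals.
    bounds = [0.0] + sorted(set(m for m in marks if 0.0 < m < duration)) + [duration]
    intervals = list(zip(bounds[:-1], bounds[1:]))
    out = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        "xmax = %s" % duration,
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        '        name = "%s"' % tier_name,
        "        xmin = 0",
        "        xmax = %s" % duration,
        "        intervals: size = %d" % len(intervals),
    ]
    for i, (start, end) in enumerate(intervals, 1):
        out += [
            "        intervals [%d]:" % i,
            "            xmin = %s" % start,
            "            xmax = %s" % end,
            '            text = ""',
        ]
    return "\n".join(out) + "\n"


print(marks_to_textgrid([1.5, 3.0], 4.0))
```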

voice-detection2.py

Creates an AKU recipe from the generate_exp.py output (.exp files).

For full help, use:

$ ./voice-detection2.py -h
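Voice activity detection ultimately produces speech segments for the recipe. As a minimal Python 3 sketch, assuming per-frame speech/non-speech decisions at a hypothetical 10 ms frame step (the .exp format itself is not described here), consecutive speech frames can be merged into segments, with short pauses bridged and very short segments dropped. The thresholds, the helper name frames_to_segments and the recipe field names in the usage line are illustrative assumptions.

```python
def frames_to_segments(is_speech, frame_step=0.01, min_gap=0.3, min_len=0.2):
    """Merge per-frame booleans into (start, end) speech segments in seconds."""
    segments = []
    start = None
    for i, speech in enumerate(is_speech):
        if speech and start is None:
            start = i  # a speech run begins
        elif not speech and start is not None:
            segments.append([start * frame_step, i * frame_step])
            start = None
    if start is not None:  # close a run that reaches the last frame
        segments.append([start * frame_step, len(is_speech) * frame_step])

    merged = []
    for seg in segments:
        # Bridge short pauses by extending the previous segment
        if merged and seg[0] - merged[-1][1] < min_gap:
            merged[-1][1] = seg[1]
        else:
            merged.append(seg)
    # Drop segments too short to be useful speech
    return [(s, e) for s, e in merged if e - s >= min_len]


frames = [False] * 10 + [True] * 50 + [False] * 10 + [True] * 40 + [False] * 100 + [True] * 5
for start, end in frames_to_segments(frames):
    print("audio=meeting.wav start-time=%.2f end-time=%.2f" % (start, end))
```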

vad-performance.py

Rates the performance of a Voice Activity Detection recipe in AKU format, such as those created with voice-detection.py. To measure the performance, another recipe with ground truth should be provided.

For full help, use:

$ ./vad-performance.py -h
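One common way to score a VAD recipe against ground truth is to compare speech/non-speech at a fixed time resolution and report missed speech and false alarms. The Python 3 sketch below does this for lists of (start, end) intervals at a hypothetical 10 ms resolution. These metric definitions are a generic choice and may differ from what vad-performance.py reports; the helper names are hypothetical.

```python
def to_frames(segments, n_frames, step):
    """Rasterise (start, end) intervals into a per-frame boolean list."""
    frames = [False] * n_frames
    for start, end in segments:
        for i in range(int(round(start / step)), min(int(round(end / step)), n_frames)):
            frames[i] = True
    return frames


def vad_error_rates(reference, hypothesis, duration, step=0.01):
    """Return (missed speech rate, false alarm rate) as fractions.

    Missed speech is relative to reference speech time; false alarms are
    relative to reference non-speech time.
    """
    n = int(round(duration / step))
    ref = to_frames(reference, n, step)
    hyp = to_frames(hypothesis, n, step)
    speech = sum(ref)
    nonspeech = n - speech
    missed = sum(r and not h for r, h in zip(ref, hyp))
    false_alarm = sum(h and not r for r, h in zip(ref, hyp))
    return (missed / speech if speech else 0.0,
            false_alarm / nonspeech if nonspeech else 0.0)


print(vad_error_rates([(0.0, 1.0), (2.0, 3.0)], [(0.0, 0.5), (2.0, 3.5)], duration=4.0))
```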

spk-change-detection.py

Performs speaker turn segmentation over audio using a distance measure (GLR, KL2 or BIC) with either a sliding or a growing window. It requires an input recipe file in AKU format pointing to the audio files, preferably with speech/non-speech turns already processed, and a features file for each wav file to process, in the format output by the feacat program of the AKU suite.

For full help, use:

$ ./spk-change-detection.py -h
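To illustrate one of the distance measures named above, the sketch below computes the classic ΔBIC between two adjacent feature windows, each modelled by a single full-covariance Gaussian, and scans one window for the best split point; a positive value suggests a speaker change. It uses Python 3 with NumPy. The penalty weight lam, the margin and the helper names delta_bic and find_change are assumptions, and spk-change-detection.py's exact formulation and windowing may differ.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """ΔBIC for modelling frames x (N1×d) and y (N2×d) separately vs. jointly."""
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet_cov(a):
        # Maximum-likelihood covariance (bias=True); atleast_2d handles d == 1
        _, logdet = np.linalg.slogdet(np.atleast_2d(np.cov(a, rowvar=False, bias=True)))
        return logdet

    gain = 0.5 * (n * logdet_cov(z) - len(x) * logdet_cov(x) - len(y) * logdet_cov(y))
    # Extra parameters of the two-Gaussian hypothesis: one mean plus one covariance
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def find_change(frames, margin=20, lam=1.0):
    """Return the split index with the largest positive ΔBIC, or None."""
    best_t, best_score = None, 0.0
    # margin keeps enough frames on each side to estimate a covariance
    for t in range(margin, len(frames) - margin):
        score = delta_bic(frames[:t], frames[t:], lam)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 2))  # synthetic "speaker A" features
c = rng.normal(5.0, 1.0, size=(200, 2))  # synthetic "speaker B" features
print(find_change(np.vstack([a, c])))  # a boundary close to frame 200 is expected
```

In practice lam is tuned on development data: larger values produce fewer, more confident change points.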

spk-change-performance.py

Rates the performance of a speaker turn segmentation recipe in AKU format, such as those created with spk-change-detection.py. To measure the performance, another recipe with ground truth should be provided.

For full help, use:

$ ./spk-change-performance.py -h
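A typical way to rate speaker turn segmentation is to match hypothesised change points to reference ones within a tolerance and report precision and recall. The Python 3 sketch below uses a greedy one-to-one matching with a hypothetical ±0.5 s collar; spk-change-performance.py may use a different tolerance or metric, and change_point_scores is a hypothetical name.

```python
def change_point_scores(reference, hypothesis, collar=0.5):
    """Return (precision, recall) for change points given in seconds."""
    unmatched = sorted(reference)
    hits = 0
    for h in sorted(hypothesis):
        # Pick the closest still-unmatched reference point within the collar
        best = None
        for r in unmatched:
            if abs(r - h) <= collar and (best is None or abs(r - h) < abs(best - h)):
                best = r
        if best is not None:
            unmatched.remove(best)  # each reference point can be matched once
            hits += 1
    precision = hits / len(hypothesis) if hypothesis else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


print(change_point_scores([5.0, 12.0, 20.0], [5.3, 11.2, 19.8, 30.0]))
```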

spk-clustering.py

Performs speaker turn clustering over audio. It requires a speaker segmentation recipe in AKU format, such as those created with spk-change-detection.py, and a features file for each wav file to process, in the format output by the feacat program of the AKU suite.

For full help, use:

$ ./spk-clustering.py -h
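Speaker clustering is often done agglomeratively: start with one cluster per segment and repeatedly merge the closest pair until the closest pair is too far apart. The Python 3/NumPy sketch below uses the symmetric KL divergence (KL2) between diagonal Gaussians as the distance, with a hypothetical stopping threshold of 1.0. This is a common textbook choice assumed here rather than taken from spk-clustering.py, and the segments are synthetic (N×d) feature arrays standing in for feacat output.

```python
import numpy as np


def kl2(x, y, eps=1e-6):
    """Symmetric KL divergence between diagonal Gaussians fitted to x and y (N×d)."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0) + eps, y.var(axis=0) + eps  # eps avoids division by zero
    diff2 = (mx - my) ** 2
    return float(0.5 * np.sum(vx / vy + vy / vx - 2.0 + diff2 * (1.0 / vx + 1.0 / vy)))


def cluster_segments(segments, threshold=1.0):
    """Merge the closest pair until every distance exceeds threshold; return labels."""
    data = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(segments))]
    while len(data) > 1:
        pairs = [(kl2(data[a], data[b]), a, b)
                 for a in range(len(data)) for b in range(a + 1, len(data))]
        dist, a, b = min(pairs)
        if dist > threshold:
            break
        # Pool the frames of the merged clusters; statistics are refitted next pass
        data[a] = np.vstack([data[a], data.pop(b)])
        members[a] += members.pop(b)
    labels = [0] * len(segments)
    for label, group in enumerate(members):
        for i in group:
            labels[i] = label
    return labels


rng = np.random.default_rng(1)
segs = [rng.normal(0.0, 1.0, size=(150, 2)) for _ in range(3)]  # synthetic speaker A
segs += [rng.normal(6.0, 1.0, size=(150, 2)) for _ in range(2)]  # synthetic speaker B
print(cluster_segments(segs))
```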

spk-time.py

Calculates per-speaker speaking time from a speaker-tagged recipe in AKU format.

For full help, use:

$ ./spk-time.py -h
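Per-speaker speaking time is the sum of segment durations grouped by speaker label. The Python 3 sketch below assumes the recipe has already been parsed into (speaker, start, end) tuples in seconds; the tuple layout and the name speaking_time are illustrative, and spk-time.py may, for example, handle overlapping segments differently.

```python
from collections import defaultdict


def speaking_time(segments):
    """Sum (end - start) per speaker; returns {speaker: seconds}."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += max(0.0, end - start)  # ignore malformed negative spans
    return dict(totals)


segments = [("S1", 0.0, 4.25), ("S2", 4.25, 9.75), ("S1", 9.75, 12.0)]
for speaker, seconds in sorted(speaking_time(segments).items()):
    print("%s\t%.2f s" % (speaker, seconds))
```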

spk-diarization2.py

Performs full speaker diarization over a media file. If the media is not a wav file, it tries to convert it to wav using ffmpeg. It then calls generate_exp.py, voice-detection.py, spk-change-detection.py and spk-clustering.py in succession.

For full help, use:

$ ./spk-diarization2.py -h

Notes:

  • Paths for the other scripts and features must be provided.
  • Since this script is a convenience wrapper for the other scripts of the family, it exposes only some of their settings and uses defaults for the rest. If you want to tune them, edit this script directly.
  • Some scripts have a "2" version (for example, voice-detection2.py). Prefer that version where it exists.

Contributors

Brendan Cunnie (@saintbrendan, [email protected]) and Brian Cunnie (@cunnie, [email protected]) contributed the Dockerfile. Tran Tu (@tran2, [email protected]) added ffmpeg to it to support non-wav files.
