  • Stars
    858
  • Rank 53,134 (Top 2 %)
  • Language
    Python
  • License
    Other
  • Created over 8 years ago
  • Updated about 3 years ago

forced-alignment-tools

A collection of links and notes on forced alignment tools

Did I miss an aligner? Please open an issue or directly fork-commit-pullrequest.

Definition of Forced Alignment

Given an audio file containing speech, and the corresponding transcript, computing a forced alignment is the process of determining, for each fragment of the transcript, the time interval (in the audio file) containing the spoken text of the fragment.

A text fragment can have arbitrary granularity:

  • a paragraph,
  • a sentence,
  • a portion of a sentence (i.e., a group of words),
  • a word, or
  • a phoneme (i.e., a single sound).

For example, given this text file and this audio file, a forced alignment at verse level could be the following:

1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
...
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]
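
The listing above can be loaded into a simple data structure for further processing. The following is a minimal Python sketch, assuming the alignment is stored one fragment per line in exactly the `text => [begin, end]` layout shown above; the names `Fragment`, `to_seconds`, and `parse_alignment` are hypothetical and do not come from any of the tools listed in this document.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    begin: float  # seconds from the start of the audio file
    end: float    # seconds from the start of the audio file


# Assumed layout: "<text> => [HH:MM:SS.mmm, HH:MM:SS.mmm]"
LINE_RE = re.compile(
    r"^(?P<text>.*?)\s*=>\s*\[(?P<begin>[\d:.]+),\s*(?P<end>[\d:.]+)\]\s*$"
)


def to_seconds(stamp):
    """Convert an HH:MM:SS.mmm time stamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_alignment(lines):
    """Return a list of Fragment objects, skipping lines that do not match."""
    fragments = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. the "..." elision or blank lines
        fragments.append(
            Fragment(
                text=match["text"],
                begin=to_seconds(match["begin"]),
                end=to_seconds(match["end"]),
            )
        )
    return fragments


example = [
    "1                                                     => [00:00:00.000, 00:00:02.640]",
    "From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]",
    "...",
]
fragments = parse_alignment(example)
```

Storing times as seconds makes it easy to check that consecutive fragments are contiguous, as they are in the example above, where each interval begins where the previous one ends.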

Typical applications of forced alignment include Audio-eBooks, closed captioning, and automating the creation of training data for automated speech recognition systems.

Programs and Libraries

The following matrix lists open source programs and libraries for computing forced alignments that have actually been verified to install and run (although the installation procedure for some of them is fairly complex).

All tools, except aeneas, are based on speech recognition algorithms; all tools, except aeneas and gentle, are maintained by research groups or individuals in academia.
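
The matrix lists DTW (dynamic time warping) as the algorithm behind aeneas. As a rough illustration of the general idea only, here is a textbook DTW sketch in Python that aligns two short sequences of numbers. It is not the aeneas implementation: the function name `dtw_path` is hypothetical, and the inputs are plain numbers chosen to keep the example small, whereas a real aligner would presumably compare sequences of audio feature vectors.

```python
def dtw_path(a, b):
    """Return (total_cost, path) for the cheapest monotonic alignment of a and b.

    path is a list of (i, j) index pairs meaning a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance = abs(a[i - 1] - b[j - 1])
            cost[i][j] = distance + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    # Walk back from the end to recover which elements were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# The second sequence repeats its first value, so it is "slower" at the start.
total, path = dtw_path([0, 1, 2], [0, 0, 1, 2])
```

Here `path` maps the first element of the first sequence to the first two elements of the second, which is the kind of time stretching that lets a warping-based aligner map one timeline onto another.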

Most tools are based on HTK, which is not free for commercial use, although a commercial license can be purchased from the University of Cambridge.

You can also download the raw data file in JSON format.

| Name | Algorithm | Supported Language(s) | Interface | Code Language(s) | License | Documentation | Mailing List/Forum | Active | Notes |
|---|---|---|---|---|---|---|---|---|---|
| aeneas | DTW | 30+ | CLI, LIB, Web | Python, C | AGPL | Y | Y | Y | Not based on ASR |
| CMU Sphinx | HMM (own), RNN | 11 | CLI, LIB | C, Java, Python | MIT-like | Y | Y | Y | |
| DARLA | HMM (HTK) | English | Web | ? | ? | Y | N | N? | Based on Prosodylab-Aligner or YouTube ASR |
| FAVE-align | HMM (HTK) | English | CLI, (Web) | Python | GPL | Y | Y | Y | Acoustic models from P2FA; GitHub code updated more frequently than Web |
| Gentle | HMM (Kaldi) | English | CLI, Web | Python | MIT | N | N | Y | Based on Kaldi |
| Julius | HMM (own) | English, Japanese | CLI, LIB | C | MIT-like | Y | Y | N? | |
| Kaldi | HMM (own), DNN, RNN | English | CLI, LIB | C++ | Apache | Y | Y | Y | CUDA support |
| kaldi-dnn-ali-gop | HMM (Kaldi), DNN (Kaldi nnet3) | English | CLI, LIB | Shell Script, C++, Python | GPL | N | N | Y | Works with other languages given Kaldi acoustic models |
| LaBB-CAT | HMM (HTK) | English | Web | Java | GPL | Y | Y | Y | |
| MAUS | HMM (HTK) | 21 | CLI, Web | C | All rights reserved | README | Y | Y | |
| Montreal Forced Aligner | HMM (Kaldi) | English | CLI | Python | MIT | Y | N | Y | Can train other languages |
| Penn Forced Aligner (P2FA) | HMM (HTK) | English | CLI, Web | Python | ? | README, Tutorial | N | N? | |
| Prosodylab-Aligner | HMM (HTK) | English | CLI | Python | MIT | README, Tutorial | N | Y | Can train other languages |
| SailAlign | HMM (HTK) | English, Greek, Spanish | CLI | Perl | GPL | README | N | N? | |
| SPPAS | HMM (Julius) | 12+ | CLI, GUI | Python | GPL | Y | Y | Y | Can train other languages, several plugins |
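
Since HTK is not free for commercial purposes, it can be handy to filter the matrix for tools that do not depend on it. The sketch below hand-copies a few rows of the matrix into Python dictionaries. The field names (`name`, `algorithm`, `license`) and the helper `without_htk` are assumptions made for this example and are not taken from the raw JSON data file, whose actual schema may differ.

```python
# A handful of rows copied from the matrix; field names are assumed for illustration.
TOOLS = [
    {"name": "aeneas", "algorithm": "DTW", "license": "AGPL"},
    {"name": "FAVE-align", "algorithm": "HMM (HTK)", "license": "GPL"},
    {"name": "Gentle", "algorithm": "HMM (Kaldi)", "license": "MIT"},
    {"name": "Montreal Forced Aligner", "algorithm": "HMM (Kaldi)", "license": "MIT"},
    {"name": "Prosodylab-Aligner", "algorithm": "HMM (HTK)", "license": "MIT"},
    {"name": "SPPAS", "algorithm": "HMM (Julius)", "license": "GPL"},
]


def without_htk(tools):
    """Return the names of tools whose algorithm column does not mention HTK."""
    return [tool["name"] for tool in tools if "HTK" not in tool["algorithm"]]


htk_free = without_htk(TOOLS)
```

Note that a tool's own license (for example MIT for Prosodylab-Aligner) says nothing about the license of the toolkit it is built on, which is why the filter looks at the algorithm column rather than the license column.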

Additional Pointers
