  • Language: C++
  • License: Apache License 2.0
  • Created about 8 years ago
  • Updated almost 4 years ago

py-kaldi-asr

Some simple wrappers around kaldi-asr intended to make using kaldi's online nnet3-chain decoders as convenient as possible. Kaldi's online GMM decoders are also supported.

The target audience is developers who would like to use kaldi-asr as-is for speech recognition in their applications on GNU/Linux operating systems.

Constructive comments, patches and pull-requests are very welcome.

Getting Started

We recommend using pre-trained models from the zamia-speech project to get started. There you will also find a tutorial, complete with links to pre-built binary packages, to get you up and running with free and open source speech recognition in a matter of minutes:

Zamia Speech Tutorial

Example Code

Simple wav file decoding:

from kaldiasr.nnet3 import KaldiNNet3OnlineModel, KaldiNNet3OnlineDecoder

MODELDIR    = 'data/models/kaldi-generic-en-tdnn_sp-latest'
WAVFILE     = 'data/dw961.wav'

# load the model, then create a decoder for it
kaldi_model = KaldiNNet3OnlineModel(MODELDIR)
decoder     = KaldiNNet3OnlineDecoder(kaldi_model)

# decode_wav_file() returns a truthy value on success
if decoder.decode_wav_file(WAVFILE):

    # s is the decoded text, l its likelihood
    s, l = decoder.get_decoded_string()

    print()
    print("*****************************************************************")
    print("**", WAVFILE)
    print("**", s)
    print("** %s likelihood:" % MODELDIR, l)
    print("*****************************************************************")
    print()

else:

    print("***ERROR: decoding of %s failed." % WAVFILE)

Please check the examples directory for more example code.
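
When several recordings need to be transcribed, the same two decoder calls can be wrapped in a small loop. The following is only a sketch: `transcribe_files` is a hypothetical helper, not part of py-kaldi-asr, and it assumes that one decoder object can be reused for several files. It relies solely on the `decode_wav_file` and `get_decoded_string` methods shown above.

```python
def transcribe_files(decoder, wav_paths):
    """Decode each WAV file in turn.

    Returns a dict mapping each path to (text, likelihood),
    or to None if decoding of that file failed.

    `decoder` is any object offering decode_wav_file() and
    get_decoded_string(), e.g. a KaldiNNet3OnlineDecoder.
    Hypothetical helper, not part of py-kaldi-asr.
    """
    results = {}
    for path in wav_paths:
        if decoder.decode_wav_file(path):
            text, likelihood = decoder.get_decoded_string()
            results[path] = (text, likelihood)
        else:
            results[path] = None
    return results


def demo(model_dir, wav_paths):
    """Load a model once and print the transcript of every file."""
    # imported here so the helper above stays usable without kaldiasr
    from kaldiasr.nnet3 import KaldiNNet3OnlineModel, KaldiNNet3OnlineDecoder

    decoder = KaldiNNet3OnlineDecoder(KaldiNNet3OnlineModel(model_dir))
    for path, result in transcribe_files(decoder, wav_paths).items():
        if result is None:
            print("***ERROR: decoding of %s failed." % path)
        else:
            print("%s: %s (likelihood: %s)" % (path, result[0], result[1]))
```

Calling `demo('data/models/kaldi-generic-en-tdnn_sp-latest', ['data/dw961.wav'])` would then decode the same file as the example above, provided the model and the WAV file exist at those paths.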

Requirements

Setup Notes

Source

At the time of this writing kaldi-asr does not seem to have an official way to install it on a system.

So, for now, we rely on pkg-config to provide LIBS and CFLAGS for compilation. Create a file called kaldi-asr.pc somewhere in your PKG_CONFIG_PATH that provides this information. Here is what such a file could look like (details depend on your OS environment):

kaldi_root=/opt/kaldi

Name: kaldi-asr
Description: kaldi-asr speech recognition toolkit
Version: 5.2
Requires: atlas
Libs: -L${kaldi_root}/tools/openfst/lib -L${kaldi_root}/src/lib -lkaldi-decoder -lkaldi-lat -lkaldi-fstext -lkaldi-hmm -lkaldi-feat -lkaldi-transform -lkaldi-gmm -lkaldi-tree -lkaldi-util -lkaldi-matrix -lkaldi-base -lkaldi-nnet3 -lkaldi-online2 -lkaldi-cudamatrix -lkaldi-ivector -lfst
Cflags: -I${kaldi_root}/src  -I${kaldi_root}/tools/openfst/include

Make sure kaldi_root points to wherever your Kaldi checkout lives in your filesystem.
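
Before building, it can be useful to confirm that pkg-config actually finds the new file. The check below is a sketch rather than part of py-kaldi-asr: `pkg_config_flags` is a hypothetical helper that runs the standard `pkg-config --cflags --libs` command and returns `None` when pkg-config or the package cannot be found.

```python
import subprocess


def pkg_config_flags(package="kaldi-asr"):
    """Return the flags pkg-config reports for `package` as a list,
    or None if they cannot be determined.

    Hypothetical helper, not part of py-kaldi-asr.
    """
    try:
        proc = subprocess.run(
            ["pkg-config", "--cflags", "--libs", package],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # pkg-config itself is not installed
        return None
    if proc.returncode != 0:
        # the .pc file was not found on PKG_CONFIG_PATH, or a package
        # named in its Requires: line (here: atlas) is missing
        return None
    return proc.stdout.split()


flags = pkg_config_flags()
if flags is None:
    print("pkg-config could not resolve kaldi-asr")
else:
    print("kaldi-asr flags:", " ".join(flags))
```

If the helper returns `None`, check that the directory containing kaldi-asr.pc is listed in PKG_CONFIG_PATH and that the package named in its Requires line can be resolved as well.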

ATLAS

You may need to install ATLAS headers even if you didn't need them to compile Kaldi.

$ sudo apt install libatlas-dev

License

My own code is Apache licensed unless otherwise noted in the script's copyright headers.

Some scripts and files are based on the works of others; in those cases it is my intention to keep the original license intact. Please check the copyright headers inside those files for more information.

Author

Guenter Bartsch
Kaldi 5.1 adaptation contributed by mariasmo https://github.com/mariasmo
Kaldi GMM model support contributed by David Zurow https://github.com/daanzu
Python > 3.5 support contributed by Jakob Kruse https://github.com/jakob1111996
