

pyctcdecode

A fast and feature-rich CTC beam search decoder for speech recognition written in Python. It provides n-gram (kenlm) language model support similar to PaddlePaddle's decoder, while adding new features such as byte-pair encoding and real-time decoding to support models like Nvidia's Conformer-CTC or Facebook's Wav2Vec2.

pip install pyctcdecode

Main Features:

  • 🔥 hotword boosting
  • 🤖 handling of BPE vocabulary
  • 👥 multi-LM support for 2+ models
  • 🕒 stateful LM for real-time decoding
  • native frame index annotation of words
  • 💨 fast runtime, comparable to C++ implementation
  • 🐍 easy-to-modify Python code

Quick Start:

from pyctcdecode import build_ctcdecoder

# specify alphabet labels as they appear in logits
labels = [
    " ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]

# prepare decoder and decode logits via shallow fusion
decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="/my/dir/kenlm_model.arpa",  # either .arpa or .bin file
    alpha=0.5,  # tuned on a val set
    beta=1.0,  # tuned on a val set
)
text = decoder.decode(logits)
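For intuition about what beam search improves on, here is a minimal pure-Python sketch of greedy CTC decoding: take the argmax token per frame, collapse repeats, and drop blanks. This is an illustration, not pyctcdecode's implementation; it assumes the blank symbol occupies the last logit column, a common CTC convention.

```python
import numpy as np

def greedy_ctc_decode(logits, labels, blank_id=None):
    """Simplest CTC decode: argmax per frame, collapse repeats, drop blanks."""
    if blank_id is None:
        blank_id = logits.shape[1] - 1  # assume blank is the last column
    best = np.argmax(logits, axis=1)  # most likely token at each frame
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank_id:  # collapse repeats, skip blanks
            out.append(labels[idx])
        prev = idx
    return "".join(out)

# toy example: labels "a", "b", and blank; five frames of pseudo-probabilities
labels = ["a", "b", ""]
logits = np.array([
    [0.8, 0.1, 0.1],  # a
    [0.8, 0.1, 0.1],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank
    [0.8, 0.1, 0.1],  # a (new emission after blank)
    [0.1, 0.8, 0.1],  # b
])
print(greedy_ctc_decode(logits, labels))  # -> "aab"
```

Beam search with a language model explores many such frame-level paths at once instead of committing to the single best token per frame.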

If the vocabulary is BPE-based, pyctcdecode will recognize that automatically and handle token merging for you.

(Note: the LM itself has no notion of this and is still word-based.)

labels = ["<unk>", "▁bug", "s", "▁bunny"]

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path,  # path to your .arpa or .bin LM file, as above
)
text = decoder.decode(logits)
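For illustration, the word-boundary merging performed on BPE output can be sketched in a few lines. Here "▁" marks the start of a new word (the SentencePiece convention); this is a simplification, not the library's actual merging logic.

```python
def merge_bpe_tokens(tokens):
    """Join BPE pieces into words; '▁' marks a word boundary (SentencePiece style)."""
    words = []
    for tok in tokens:
        if tok.startswith("▁") or not words:
            words.append(tok.lstrip("▁"))  # start a new word
        else:
            words[-1] += tok               # continue the current word
    return " ".join(words)

print(merge_bpe_tokens(["▁bug", "s", "▁bunny"]))  # -> "bugs bunny"
```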

Improve domain specificity by adding important contextual words ("hotwords") during inference:

hotwords = ["looney tunes", "anthropomorphic"]
text = decoder.decode(
    logits,
    hotwords=hotwords,
    hotword_weight=10.0,
)
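Conceptually, hotword boosting adds a bonus to the score of beams whose text matches a hotword. A toy sketch of such a bonus function (purely illustrative, not pyctcdecode's actual scorer):

```python
def hotword_bonus(partial_text, hotwords, weight=10.0):
    """Toy score bonus: reward beams whose last word is a prefix of a hotword.
    Illustration only -- not pyctcdecode's real hotword scoring."""
    words = partial_text.split()
    last_word = words[-1] if words else ""
    if not last_word:
        return 0.0
    for hw in hotwords:
        for hw_word in hw.split():
            if hw_word.startswith(last_word):
                return weight
    return 0.0

print(hotword_bonus("looney", ["looney tunes"]))  # -> 10.0
print(hotword_bonus("hello", ["looney tunes"]))   # -> 0.0
```

During beam search, beams that spell out a hotword accumulate these bonuses and are therefore less likely to be pruned.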

(Note: pyctcdecode contains several free hyperparameters that can strongly influence error rate and wall time. Default values for these parameters were (merely) chosen in order to yield good performance for one particular use case. For best results, especially when working with languages other than English, users are encouraged to perform a hyperparameter optimization study on their own data.)
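Such a study can be as simple as a grid search over alpha and beta. The sketch below assumes a hypothetical `decode_fn` that stands in for building a decoder with the given weights and decoding a validation set; the WER computation is a standard word-level edit distance.

```python
import itertools

def wer(ref, hyp):
    """Word error rate via word-level edit distance."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(r), 1)

def tune(decode_fn, refs, alphas, betas):
    """Return the (alpha, beta) pair minimizing mean WER on a validation set."""
    best = None
    for a, b in itertools.product(alphas, betas):
        hyps = decode_fn(a, b)  # hypothetical: rebuild decoder, decode val set
        score = sum(wer(r, h) for r, h in zip(refs, hyps)) / len(refs)
        if best is None or score < best[0]:
            best = (score, a, b)
    return best[1], best[2]

# hypothetical usage with a fake decoder standing in for build_ctcdecoder + decode
refs = ["bugs bunny"]
best = tune(lambda a, b: refs if (a, b) == (0.5, 1.0) else ["bugs funny"],
            refs, alphas=[0.3, 0.5], betas=[0.5, 1.0])
print(best)  # -> (0.5, 1.0)
```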

Batch support via multiprocessing:

import multiprocessing

with multiprocessing.get_context("fork").Pool() as pool:
    text_list = decoder.decode_batch(pool, logits_list)

Use pyctcdecode for a pretrained Conformer-CTC model:

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name='stt_en_conformer_ctc_small'
)
logits = asr_model.transcribe(["my_file.wav"], logprobs=True)[0]

decoder = build_ctcdecoder(asr_model.decoder.vocabulary)
decoder.decode(logits)

The tutorials folder contains many well-documented notebook examples showing how to run speech recognition with pretrained models from Nvidia's NeMo and Hugging Face/Facebook's Wav2Vec2.

For more details on how to use all of pyctcdecode's features, have a look at our main tutorial.

Why pyctcdecode?

In scientific computing, there's often a tension between a language's performance and its ease of use for prototyping and experimentation. Although C++ is the conventional choice for CTC decoders, we decided to try building one in Python. This choice allowed us to easily implement experimental features, while keeping runtime competitive through optimizations like caching and beam pruning. We compared pyctcdecode to an industry-standard C++ decoder at various beam widths, measuring the trade-off of word error rate against runtime. For beam widths of 10 or greater, pyctcdecode yields strictly superior performance, with lower error rates in less time; the benchmark code is available in the repository.

The use of Python allows us to easily implement features like hotword support with only a few lines of code.

pyctcdecode can return either a single transcript, or the full results of the beam search algorithm. The latter provides the language model state to enable real-time inference as well as word-based logit indices (frames) to enable word-based timing and confidence score calculations natively through the decoding process.
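Converting those per-word frame indices into timestamps is simple arithmetic once you know the model's frame stride. A sketch assuming (word, (start_frame, end_frame)) pairs like those in the full beam results, and a hypothetical 20 ms stride (the stride is model-dependent):

```python
def word_timings(word_frames, frame_stride_s=0.02):
    """Map (word, (start_frame, end_frame)) pairs to (word, start_s, end_s).
    frame_stride_s is model-dependent; 20 ms is assumed here for illustration."""
    return [
        (word, round(start * frame_stride_s, 3), round(end * frame_stride_s, 3))
        for word, (start, end) in word_frames
    ]

frames = [("bugs", (10, 35)), ("bunny", (40, 70))]
print(word_timings(frames))  # -> [('bugs', 0.2, 0.7), ('bunny', 0.8, 1.4)]
```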

Additional features such as BPE vocabulary, as well as examples of pyctcdecode as part of a full speech recognition pipeline, can be found in the tutorials section.

License:

Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright 2021-present Kensho Technologies, LLC.
