• Stars
    star
    1,117
  • Rank 41,475 (Top 0.9 %)
  • Language Cuda
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

FSA/FST algorithms, differentiable, with PyTorch compatibility.

k2

The vision of k2 is to be able to seamlessly integrate Finite State Automaton (FSA) and Finite State Transducer (FST) algorithms into autograd-based machine learning toolkits like PyTorch and TensorFlow. For speech recognition applications, this should make it easy to interpolate and combine various training objectives such as cross-entropy, CTC and MMI and to jointly optimize a speech recognition system with multiple decoding passes including lattice rescoring and confidence estimation. We hope k2 will have many other applications as well.

One of the key algorithms that we have implemented is pruned composition of a generic FSA with a "dense" FSA (i.e. one that corresponds to log-probs of symbols at the output of a neural network). This can be used as a fast implementation of decoding for ASR, and for CTC and LF-MMI training. This won't give a direct advantage in terms of Word Error Rate when compared with existing technology; but the point is to do this in a much more general and extensible framework to allow further development of ASR technology.

Implementation

A few key points on our implementation strategy.

Most of the code is in C++ and CUDA. We implement a templated class Ragged, which is quite like TensorFlow's RaggedTensor (actually we came up with the design independently, and were later told that TensorFlow was using the same ideas). Despite a close similarity at the level of data structures, the design is quite different from TensorFlow and PyTorch. Most of the time we don't use composition of simple operations, but rely on C++11 lambdas defined directly in the C++ implementations of algorithms. The code in these lambdas operate directly on data pointers and, if the backend is CUDA, they can run in parallel for each element of a tensor. (The C++ and CUDA code is mixed together and the CUDA kernels get instantiated via templates).

It is difficult to adequately describe what we are doing with these Ragged objects without going in detail through the code. The algorithms look very different from the way you would code them on CPU because of the need to avoid sequential processing. We are using coding patterns that make the most expensive parts of the computations "embarrassingly parallelizable"; the only somewhat nontrivial CUDA operations are generally reduction-type operations such as exclusive-prefix-sum, for which we use NVidia's cub library. Our design is not too specific to the NVidia hardware and the bulk of the code we write is fairly normal-looking C++; the nontrivial CUDA programming is mostly done via the cub library, parts of which we wrap with our own convenient interface.

The Finite State Automaton object is then implemented as a Ragged tensor templated on a specific data type (a struct representing an arc in the automaton).

Autograd

If you look at the code as it exists now, you won't find any references to autograd. The design is quite different to TensorFlow and PyTorch (which is why we didn't simply extend one of those toolkits). Instead of making autograd come from the bottom up (by making individual operations differentiable) we are implementing it from the top down, which is much more efficient in this case (and will tend to have better roundoff properties).

An example: suppose we are finding the best path of an FSA, and we need derivatives. We implement this by keeping track of, for each arc in the output best-path, which input arc it corresponds to. (For more complex algorithms an arc in the output might correspond to a sum of probabilities of a list of input arcs). We can make this compatible with PyTorch/TensorFlow autograd at the Python level, by, for example, defining a Function class in PyTorch that remembers this relationship between the arcs and does the appropriate (sparse) operations to propagate back the derivatives w.r.t. the weights.

Current state of the code

We have wrapped all the C++ code to Python with pybind11 and have finished the integration with PyTorch.

We are currently writing speech recognition recipes using k2, which are hosted in a separate repository. Please see https://github.com/k2-fsa/icefall.

Plans after initial release

We are currently trying to make k2 ready for production use (see the branch v2.0-pre).

Quick start

Want to try it out without installing anything? We have setup a Google Colab. You can find more Colab notebooks using k2 in speech recognition at https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html.

More Repositories

1

sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
C++
3,307
star
2

sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
C++
1,001
star
3

icefall

Python
906
star
4

sherpa

Speech-to-text server framework with next-gen Kaldi
C++
539
star
5

libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Python
166
star
6

snowfall

Moved to https://github.com/k2-fsa/icefall
Python
143
star
7

fast_rnnt

A torch implementation of a recursion which turns out to be useful for RNN-T.
Python
136
star
8

text_search

Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
Python
53
star
9

kaldifst

Python wrapper for OpenFST and its extensions from Kaldi. Also support reading/writing ark/scp files
C++
41
star
10

multi_quantization

Python
40
star
11

next-gen-kaldi-wechat

32
star
12

kaldi-decoder

Decoders from Kaldi using OpenFst
C++
18
star
13

colab

Colab notebooks for Next-gen Kaldi
Jupyter Notebook
14
star
14

analyze_diagnostics

Scripts for analyzing the output of icefall's "diagnostics.py" code (--print-diagnostics=True option)
Perl
5
star
15

k2-fsa-www

Source for next-gen Kaldi home page.
JavaScript
4
star
16

sherpa-torch-cpp-makefile-example

C++
4
star
17

divide_lm

Python
4
star
18

sherpa-onnx-go

sherpa-onnx Go package for speech recognition without network access, supporting Linux, macOS, Windows
Go
2
star
19

sherpa-onnx-go-windows

sherpa-onnx Go package for Windows
C
1
star
20

sherpa-ncnn-go-linux

sherpa-ncnn Go package for Linux
C
1
star