Introduction
icefall contains ASR recipes for various datasets using https://github.com/k2-fsa/k2.
You can use https://github.com/k2-fsa/sherpa to deploy models trained with icefall.
You can try pre-trained models from within your browser, without downloading or installing anything, by visiting https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition. See https://k2-fsa.github.io/icefall/huggingface/spaces.html for more details.
Installation
Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.
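After installation, a quick way to confirm that the core dependencies are importable is a short Python check. This is only a minimal sanity check, not part of the official installation steps:

```python
# Minimal post-installation sanity check: import the core dependencies
# used by icefall and report their versions/locations.
import torch
import k2
import lhotse

print(f"torch  {torch.__version__}")
print(f"k2     loaded from {k2.__file__}")
print(f"lhotse {lhotse.__version__}")
```

If any of these imports fail, revisit the installation documentation above.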
Recipes
Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.
We provide the following recipes:
- yesno
- LibriSpeech
- GigaSpeech
- Aishell
- Aishell2
- Aishell4
- TIMIT
- TED-LIUM3
- Aidatatang_200zh
- WenetSpeech
- Alimeeting
- TAL_CSASR
yesno
This is the simplest ASR recipe in icefall and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
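The bracketed numbers break the result down: 1 error among 240 reference words (0 insertions, 1 deletion, 0 substitutions), i.e. WER = (insertions + deletions + substitutions) / reference words = 1/240 ≈ 0.42%. A minimal sketch of that computation, for illustration only (icefall's actual scoring code is more elaborate):

```python
def word_error_rate(ref: list, hyp: list) -> float:
    """Levenshtein distance over words: WER = (ins + del + sub) / len(ref)."""
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("yes no yes".split(), "yes yes".split()))  # one deletion -> 1/3
```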
We provide a Colab notebook for this recipe.
LibriSpeech
Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.
We provide 5 models for this recipe:
- conformer CTC model
- TDNN LSTM CTC model
- Transducer: Conformer encoder + LSTM decoder
- Transducer: Conformer encoder + Embedding decoder
- Transducer: Zipformer encoder + Embedding decoder
Conformer CTC Model
The best WER we currently have is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |
We provide a Colab notebook to run a pre-trained conformer CTC model.
TDNN LSTM CTC Model
The WER for this model is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.
Transducer: Conformer encoder + LSTM decoder
Using Conformer as encoder and LSTM as decoder.
The best WER with greedy search is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 3.07       | 7.51       |
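Greedy search decodes the transducer frame by frame, always taking the most probable symbol and advancing the prediction network only on non-blank emissions. Below is a much-simplified single-utterance sketch of the idea; `decoder` and `joiner` are stand-in callables, not icefall's actual interfaces, and the LSTM decoder state is glossed over:

```python
import torch

@torch.no_grad()
def greedy_search(encoder_out, decoder, joiner, blank_id=0, max_sym_per_frame=1):
    """Frame-synchronous greedy RNN-T decoding for one utterance.

    encoder_out: (T, C) acoustic embeddings.
    decoder, joiner: stand-in callables; `decoder` maps the last emitted
    token to an embedding, `joiner` combines both streams into logits.
    """
    hyp = []
    last_token = torch.tensor([blank_id])
    for t in range(encoder_out.size(0)):
        for _ in range(max_sym_per_frame):
            logits = joiner(encoder_out[t], decoder(last_token))
            token = int(logits.argmax())
            if token == blank_id:
                break  # blank means: nothing more at this frame
            hyp.append(token)
            last_token = torch.tensor([token])  # advance the decoder input
    return hyp
```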
We provide a Colab notebook to run a pre-trained RNN-T conformer model.
Transducer: Conformer encoder + Embedding decoder
Using Conformer as encoder. The decoder consists of 1 embedding layer and 1 convolutional layer.
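In sketch form, such a stateless decoder replaces the recurrent prediction network with an embedding lookup over a small fixed context of previous symbols followed by a 1-D convolution. The module below is a minimal illustration under assumed dimensions, not icefall's exact implementation:

```python
import torch
import torch.nn as nn

class StatelessDecoder(nn.Module):
    """Prediction network with no recurrence: an embedding of the
    previous tokens followed by a 1-D convolution over a small,
    fixed context window."""

    def __init__(self, vocab_size: int, embed_dim: int = 512, context: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Convolution over the last `context` tokens only, so the decoder
        # sees a bounded history instead of unbounded recurrent state.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context,
                              groups=embed_dim, bias=False)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, context) previously emitted token ids
        emb = self.embedding(y).permute(0, 2, 1)  # (N, embed_dim, context)
        out = self.conv(emb).permute(0, 2, 1)     # (N, 1, embed_dim)
        return out

decoder = StatelessDecoder(vocab_size=500)
print(decoder(torch.tensor([[3, 7]])).shape)  # torch.Size([1, 1, 512])
```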
The best WER using modified beam search with beam size 4 is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56       | 6.27       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model.
k2 pruned RNN-T
| Encoder         | Params | test-clean | test-other |
|-----------------|--------|------------|------------|
| zipformer       | 65.5M  | 2.21       | 4.91       |
| zipformer-small | 23.2M  | 2.46       | 5.83       |
| zipformer-large | 148.4M | 2.11       | 4.77       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
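The pruned RNN-T loss comes from k2 and is computed in two passes: a cheap "simple" loss first, whose gradients bound the (t, u) region of the lattice that matters, and then the full joiner evaluated only inside those pruned ranges. The sketch below shows that pattern; shapes are simplified, the projection layers used by the real recipes are omitted, and `joiner` is a placeholder (see the pruned transducer recipes for the actual training code):

```python
import k2

# Assumptions (not icefall's real code): encoder_out (N, T, V) and
# decoder_out (N, U+1, V) are already projected to vocabulary size V;
# y_padded is (N, U) label ids; boundary is (N, 4) rows of [0, 0, U, T].
def pruned_rnnt_loss(encoder_out, decoder_out, y_padded, boundary,
                     joiner, blank_id=0, prune_range=5):
    # Pass 1: cheap "simple" loss whose gradients indicate which
    # (t, u) cells of the full lattice actually matter.
    simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
        lm=decoder_out, am=encoder_out, symbols=y_padded,
        termination_symbol=blank_id, boundary=boundary,
        reduction="sum", return_grad=True,
    )
    # Turn those gradients into per-frame pruning bounds of width prune_range.
    ranges = k2.get_rnnt_prune_ranges(
        px_grad=px_grad, py_grad=py_grad,
        boundary=boundary, s_range=prune_range,
    )
    # Pass 2: run the full joiner only inside the pruned region.
    am_pruned, lm_pruned = k2.do_rnnt_pruning(
        am=encoder_out, lm=decoder_out, ranges=ranges,
    )
    logits = joiner(am_pruned, lm_pruned)  # placeholder joiner network
    pruned_loss = k2.rnnt_loss_pruned(
        logits=logits, symbols=y_padded, ranges=ranges,
        termination_symbol=blank_id, boundary=boundary, reduction="sum",
    )
    return simple_loss, pruned_loss
```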
k2 pruned RNN-T + GigaSpeech
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.78       | 4.08       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
k2 pruned RNN-T + GigaSpeech + CommonVoice
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.90       | 3.98       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
GigaSpeech
We provide two models for this recipe: a Conformer CTC model and a pruned stateless RNN-T (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Conformer CTC
|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.51 | 10.73 |
| fast beam search     | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |
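For comparison with greedy search, "modified beam search" keeps all hypotheses time-synchronous: each hypothesis advances exactly one frame at a time and may emit at most one symbol per frame. Below is a much-simplified single-utterance sketch, again with stand-in `decoder`/`joiner` callables rather than icefall's actual interfaces:

```python
import torch

@torch.no_grad()
def modified_beam_search(encoder_out, decoder, joiner, blank_id=0, beam=4):
    """Time-synchronous beam search: at most one emitted symbol per frame."""
    hyps = [([blank_id], 0.0)]  # (token history, log-probability)
    for t in range(encoder_out.size(0)):
        candidates = []
        for tokens, score in hyps:
            dec_out = decoder(torch.tensor([tokens[-1]]))
            logp = torch.log_softmax(joiner(encoder_out[t], dec_out), dim=-1)
            # Staying on blank keeps the hypothesis unchanged this frame.
            candidates.append((tokens, score + logp[blank_id].item()))
            # Otherwise extend by exactly one non-blank symbol.
            top = logp.topk(beam + 1)
            for p, idx in zip(top.values.tolist(), top.indices.tolist()):
                if idx != blank_id:
                    candidates.append((tokens + [idx], score + p))
        # Keep only the `beam` best hypotheses before the next frame.
        hyps = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam]
    return hyps[0][0][1:]  # strip the initial blank "history" token
```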
Aishell
We provide three models for this recipe: a conformer CTC model, a TDNN LSTM CTC model, and a Transducer Stateless model.
Conformer CTC Model
The best CER we currently have is:
|     | test |
|-----|------|
| CER | 4.26 |
TDNN LSTM CTC Model
The CER for this model is:
|     | test  |
|-----|-------|
| CER | 10.16 |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.
Transducer Stateless Model
The best CER we currently have is:
|     | test |
|-----|------|
| CER | 4.38 |
We provide a Colab notebook to run a pre-trained Transducer Stateless model.
Aishell2
We provide one model for this recipe: Transducer Stateless Model.
Transducer Stateless Model
The best WER we currently have is:
|     | dev-ios | test-ios |
|-----|---------|----------|
| WER | 5.32    | 5.56     |
Aishell4
We provide one model for this recipe: a pruned stateless RNN-T (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
The best CER we currently have is:
|     | test  |
|-----|-------|
| CER | 29.08 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
TIMIT
We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.
TDNN LSTM CTC Model
The best PER we currently have is:
|     | TEST   |
|-----|--------|
| PER | 19.71% |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.
TDNN LiGRU CTC Model
The PER for this model is:
|     | TEST   |
|-----|--------|
| PER | 17.66% |
We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model.
TED-LIUM3
We provide two models for this recipe: a Transducer Stateless model (Conformer encoder + Embedding decoder) and a Pruned Transducer Stateless model (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Transducer Stateless: Conformer encoder + Embedding decoder
The best WER using modified beam search with beam size 4 is:
|     | dev  | test |
|-----|------|------|
| WER | 6.91 | 6.33 |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
We provide a Colab notebook to run a pre-trained Transducer Stateless model.
Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
The best WER using modified beam search with beam size 4 is:
|     | dev  | test |
|-----|------|------|
| WER | 6.77 | 6.14 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
Aidatatang_200zh
We provide one model for this recipe: a pruned stateless RNN-T (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
|                      | Dev  | Test |
|----------------------|------|------|
| greedy search        | 5.53 | 6.59 |
| fast beam search     | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
WenetSpeech
We provide two models for this recipe: Pruned stateless RNN-T_2 (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss) and Pruned stateless RNN-T_5 (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)
|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 7.80 | 8.75     | 13.49        |
| modified beam search | 7.76 | 8.71     | 13.41        |
| fast beam search     | 7.94 | 8.74     | 13.80        |
Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
Streaming:
|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 8.78 | 10.12    | 16.16        |
| modified beam search | 8.53 | 9.95     | 15.81        |
| fast beam search     | 9.01 | 10.47    | 16.28        |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model.
Alimeeting
We provide one model for this recipe: a pruned stateless RNN-T (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)
|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy search        | 31.77 | 34.66    |
| fast beam search     | 31.39 | 33.02    |
| modified beam search | 30.38 | 34.25    |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
TAL_CSASR
We provide one model for this recipe: a pruned stateless RNN-T (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
The best results for Chinese CER (%) and English WER (%), respectively (zh: Chinese, en: English), are:
| decoding method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| modified beam search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |
| fast beam search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
Deployment with C++
Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.
Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.
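The export side of that flow is to compile the trained model with TorchScript so that C++ can load it without Python. A minimal sketch of the idea (the model and file name here are placeholders; each recipe ships its own export script):

```python
import torch

# Placeholder model: in practice you would build the recipe's model
# and load a trained checkpoint before scripting it.
model = torch.nn.Sequential(torch.nn.Linear(80, 512), torch.nn.ReLU())
model.eval()

scripted = torch.jit.script(model)  # compile to TorchScript
scripted.save("cpu_jit.pt")         # loadable from C++ via torch::jit::load
```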
We also provide a Colab notebook showing how to run a torch-scripted model in k2 with C++.