  • Stars: 324
  • Rank: 129,708 (Top 3%)
  • Language: Python
  • License: Apache License 2.0
  • Created: almost 5 years ago
  • Updated: 3 months ago

Repository Details

A CRF-based ASR Toolkit

CAT: CRF-based ASR Toolkit

CAT provides a complete workflow for CRF-based data-efficient end-to-end speech recognition.

Overview

CAT aims to combine the advantages of the hybrid and the E2E ASR approaches to achieve data efficiency, by carefully weighing the pros and cons of modularity versus a unified neural network, and of separate versus joint optimization. CAT advocates globally normalized modeling and discriminative training in the framework of Conditional Random Fields (CRFs), currently with a state topology inspired by Connectionist Temporal Classification (CTC).
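
As a rough sketch of what global normalization means here, the contrast between CTC and CTC-CRF can be written as follows. The notation is ours rather than this README's, and it follows our reading of the CTC-CRF paper cited below: x is the acoustic input, π a frame-level path of length T, B the CTC mapping that merges repeats and removes blanks, l the label sequence, and p_LM a label-level language model (assumed here to be an n-gram model).

```latex
\begin{align}
  % CTC: the path posterior is a product of per-frame (locally normalized) softmax outputs
  p_{\mathrm{CTC}}(\pi \mid x) &= \prod_{t=1}^{T} p(\pi_t \mid x) \\
  % CTC-CRF: the path posterior is normalized once, over all paths
  p_{\mathrm{CRF}}(\pi \mid x) &= \frac{\exp\big(\phi(\pi, x)\big)}{\sum_{\pi'} \exp\big(\phi(\pi', x)\big)},
  \qquad
  \phi(\pi, x) = \sum_{t=1}^{T} \log p(\pi_t \mid x) + \log p_{\mathrm{LM}}\big(B(\pi)\big) \\
  % In both cases the label-sequence posterior sums over all paths that map to l
  p(l \mid x) &= \sum_{\pi \in B^{-1}(l)} p(\pi \mid x)
\end{align}
```

Under this reading, the numerator of the CRF posterior is computed over the paths of the reference transcript, while the denominator sums over all paths weighted by the language model, which is why FST-related tooling appears in the dependencies.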

Features

  1. CAT contains a full-fledged CUDA/C/C++ implementation of the CTC-CRF loss function with PyTorch bindings.

  2. One-stop CTC/CTC-CRF/RNN-T/LM training & inference. See the templates.

  3. Flexible configuration with JSON. Check the guideline for configuration.

  4. Scalable and extensible. CAT is easy to scale to training on tens of thousands of hours of speech data, and easy to extend with new models and tasks.
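
To illustrate the general idea of JSON-driven configuration mentioned in the list above, here is a minimal sketch that overlays a user-supplied JSON document on a set of defaults. The key names (`model`, `train`, `hidden_size`, and so on) and the helper functions are hypothetical and are not CAT's actual schema; consult the configuration guideline for the real options.

```python
import json

# Hypothetical defaults: these key names are illustrative only,
# they are NOT taken from CAT's configuration schema.
DEFAULTS = {
    "model": {"type": "ctc", "hidden_size": 256},
    "train": {"batch_size": 32, "lr": 1e-3},
}


def merge(defaults, overrides):
    """Recursively overlay `overrides` on `defaults`, returning a new dict."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(text):
    """Parse a JSON string and fill in any missing settings from DEFAULTS."""
    return merge(DEFAULTS, json.loads(text))


# The user only states what differs from the defaults.
config = load_config('{"train": {"lr": 0.0005}}')
print(config["train"])
```

The nested merge lets an experiment file stay short, since it only needs to list the settings that deviate from the defaults.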

Installation

  1. Dependencies

    • A CUDA-compatible device, with the NVIDIA driver installed and the CUDA libraries available.

    • PyTorch: >=1.9.0 is required. See the installation guide from PyTorch.

    • Kaldi [optional, but recommended]: used for speech data preparation and some FST-related operations. Most basic functions work without it; it is required only for CTC-CRF training.

      Instead of Kaldi, you can use torchaudio for feature extraction. See data.sh for how to prepare data with torchaudio.

  2. Clone and install CAT

    git clone https://github.com/thu-spmi/CAT.git && cd CAT
    # Show the installation help message
    ./install.sh -h
    # Install with the default configuration (uncomment to run)
    #./install.sh
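
Before running the installer, it may help to confirm that the environment meets the dependencies listed above. The following is a small sketch, not part of CAT: the helper names `parse_version` and `meets_minimum` are our own, and the only requirement it encodes is the PyTorch >=1.9.0 constraint stated in this README.

```python
import re

# Minimum PyTorch version stated in the dependency list above.
MIN_TORCH = (1, 9, 0)


def parse_version(version):
    """Extract the leading numeric components, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=MIN_TORCH):
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("PyTorch", torch.__version__, "OK:", meets_minimum(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
```

Local build suffixes such as `+cu117` are ignored by the comparison, so a CUDA-specific wheel is judged only on its numeric version.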

Getting started

To get started, follow the tutorial in TEMPLATE.

ASR results

Dataset             Evaluation sets           Performance
AISHELL-1           dev / test                3.93 / 4.22
Commonvoice German  test                      9.8
Librispeech         test-clean / test-other   1.94 / 4.39
Switchboard         switchboard / callhome    6.9 / 14.5
THCHS30             test                      6.01
Wenetspeech         test-net / test-meeting   9.32 / 14.66
WSJ                 eval92 / dev93            2.77 / 5.68
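
The table does not state its metric. Assuming the figures are error rates in percent (word error rate for the English and German sets, character error rate for the Mandarin sets, as is conventional for these benchmarks), the following sketch shows how such a number is typically computed from reference and hypothesis transcripts. The function names are ours, and this is not CAT's scoring code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:  # character level, ignoring spaces
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


# One deleted word out of six reference words.
wer = error_rate(["the cat sat on the mat"], ["the cat sat on mat"])
print(f"WER: {wer:.2f}%")
```

Edits are summed over the whole evaluation set before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.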

Citation

@inproceedings{xiang2019crf,
  title={CRF-based single-stage acoustic modeling with CTC topology},
  author={Xiang, Hongyu and Ou, Zhijian},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5676--5680},
  year={2019},
  organization={IEEE}
}

@inproceedings{an2020cat,
  title={CAT: A CTC-CRF based ASR toolkit bridging the hybrid and the end-to-end approaches towards data efficiency and low latency},
  author={An, Keyu and Xiang, Hongyu and Ou, Zhijian},
  booktitle={INTERSPEECH},
  pages={566--570},
  year={2020}
}
