  • Stars: 128
  • Rank: 281,044 (Top 6%)
  • Language: Python
  • Created about 6 years ago
  • Updated over 3 years ago

Repository Details

Mongolian speech recognition with PyTorch

An online demo trained with a Mongolian proprietary dataset (WER 8%): https://chimege.mn/.

In this repo, the following papers are implemented:

This repo is partially based on:

Training

  1. Install PyTorch>=1.3 with conda
  2. Install remaining dependencies: pip install -r requirements.txt
  3. Download the Mongolian Bible dataset: cd datasets && python dl_mbspeech.py
  4. Precompute the mel spectrograms: python preprop_dataset.py --dataset mbspeech
  5. Train: python train.py --model crnn --max-epochs 50 --dataset mbspeech --lr-warmup-steps 100
    • TensorBoard logs are saved in the logdir folder
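
The --lr-warmup-steps 100 flag in step 5 suggests that the learning rate is ramped up over the first 100 optimizer steps. The exact schedule used by train.py is not shown here, so the following is only a minimal sketch of a linear warmup; the function name warmup_lr and the base learning rate are hypothetical and chosen for illustration.

```python
def warmup_lr(step, base_lr, warmup_steps=100):
    """Return the learning rate for a 0-based optimizer step.

    Assumption: a linear ramp from base_lr / warmup_steps up to base_lr.
    The real schedule in train.py may differ.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


if __name__ == "__main__":
    # Print the learning rate at a few steps for a hypothetical base_lr.
    for step in (0, 49, 99, 500):
        print(step, warmup_lr(step, base_lr=0.1))
```

A short warmup like this is commonly used to avoid large, unstable updates while the weights are still close to their random initialization.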

Results

During training, the ground-truth and recognized texts are logged to TensorBoard. Because the dataset contains only a single speaker, the predicted texts from the validation set should already be recognizable after a few epochs:

EXPECTED:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

PREDICTED:

аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун
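
To put a number on how close such a prediction is, the word error rate (WER, the metric quoted for the online demo) can be computed as the word-level edit distance divided by the number of reference words. The sketch below is a plain-Python illustration applied to the sentence pair above; the helper names edit_distance and wer are not taken from this repo.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


expected = "аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун"
predicted = "аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун"

if __name__ == "__main__":
    # 3 of the 13 reference words are substituted in the prediction.
    print(f"WER: {wer(expected, predicted):.2%}")
```

Passing the raw strings instead of word lists to edit_distance gives the character-level distance, which is often more informative for near-miss spellings like the ones above.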

For fun, you can also generate audio with a Mongolian TTS and try to recognize it. The following commands generate audio with the TTS of the Mongolian National University and run speech recognition on the generated audio:

# generate audio for 'Миний төрсөн нутаг Монголын сайхан орон'
wget -O test.wav "http://172.104.34.197/nlp-web-demo/tts?voice=1&text=Миний төрсөн нутаг Монголын сайхан орон."
# speech recognition on that TTS generated audio
python transcribe.py --checkpoint=logdir/mbspeech_crnn_sgd_wd1e-05/epoch-0050.pth --model=crnn test.wav
# will output: 'миний төрсөн нут мөнголын сайхан оөрулн'
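
Without a language model, the output above is what the acoustic model produces on its own. Assuming the model is trained with CTC (suggested by the use of ctcdecode below, but not stated in this section), the simplest way to turn per-frame predictions into text is greedy decoding: take the best label per frame, collapse repeats, and drop blanks. The sketch below illustrates that idea on a hand-written frame sequence; it is not the code used in transcribe.py, and the label table and blank index are hypothetical.

```python
def ctc_greedy_decode(frame_ids, labels, blank=0):
    """Collapse repeated frame labels, then remove CTC blanks.

    frame_ids: best label index per audio frame (argmax of the model output).
    labels:    index -> character table; the blank position is an assumption.
    """
    chars = []
    prev = blank
    for idx in frame_ids:
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    labels = ["_", "н", "у", "т", "а", "г"]  # toy alphabet, "_" is the blank
    frames = [0, 1, 1, 0, 2, 2, 3, 0, 4, 4, 0, 5]
    print(ctc_greedy_decode(frames, labels))  # prints: нутаг
```

A blank between two identical labels is what allows doubled letters to survive the collapsing step.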

It is also possible to use a KenLM binary model. First download it from tugstugi/mongolian-nlp. After that, install parlance/ctcdecode. Now you can transcribe with the language model:

python transcribe.py --checkpoint=path/to/checkpoint --lm=mn_5gram.binary --alpha=0.3 test.wav
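
The --alpha option weights the language model against the acoustic model. As a rough sketch of the idea, and not of the actual ctcdecode internals, a beam search decoder typically ranks each candidate by its acoustic log-probability plus alpha times its language model log-probability, optionally with a per-word bonus beta. The log-probabilities below are made-up toy values, and fused_score and rescore are hypothetical names.

```python
def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.3, beta=0.0):
    """Combined score: acoustic model + weighted LM + optional word bonus."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


def rescore(candidates, alpha=0.3, beta=0.0):
    """Return the candidate with the highest fused score."""
    return max(
        candidates,
        key=lambda c: fused_score(c["acoustic_logp"], c["lm_logp"],
                                  len(c["text"].split()), alpha, beta),
    )


# Toy values for illustration only: the acoustic model slightly prefers the
# misspelled text, while the LM strongly prefers the well-formed sentence.
candidates = [
    {"text": "миний төрсөн нут мөнголын сайхан оөрулн",
     "acoustic_logp": -4.0, "lm_logp": -30.0},
    {"text": "миний төрсөн нутаг монголын сайхан орон",
     "acoustic_logp": -5.0, "lm_logp": -12.0},
]

if __name__ == "__main__":
    print(rescore(candidates, alpha=0.0)["text"])  # acoustic model only
    print(rescore(candidates, alpha=0.3)["text"])  # with LM weight 0.3
```

With alpha set to 0 the language model is ignored; larger values let it override the acoustic model more often, so the value is usually tuned on a validation set.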

Contribute

If you are Mongolian and want to help us, please record your voice on Common Voice.