A collection of Audio and Speech pre-trained models.

Audio and Speech Pre-trained Models

What is a pre-trained model?

A pre-trained model is a model that someone else has already trained to solve a similar problem. Instead of building and training a model from scratch, you can take a model trained on a related problem and use it as a starting point, typically by fine-tuning it on your own data. Because it was trained on different data, a pre-trained model may not be 100% accurate in your application without further tuning.
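
The idea of using a pre-trained model as a starting point can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not taken from any model listed here: the "pre-trained" weights in `PRETRAINED_W`, the four data points, and the helper names are all made up. The pre-trained layer is kept frozen and only a small new output layer is trained on the new task.

```python
# Toy illustration only: the "pre-trained" weights and the data are made up.

PRETRAINED_W = [[0.5, -0.2], [0.1, 0.8]]  # hypothetical frozen weights


def pretrained_features(x):
    """Frozen feature extractor: a fixed linear layer followed by ReLU."""
    return [max(0.0, row[0] * x[0] + row[1] * x[1]) for row in PRETRAINED_W]


def predict(head, x):
    """Apply the trainable linear head (w0, w1, bias) to the frozen features."""
    f = pretrained_features(x)
    return head[0] * f[0] + head[1] * f[1] + head[2]


def mean_squared_error(head, data):
    """Average squared prediction error over (input, target) pairs."""
    return sum((predict(head, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(data, lr=0.2, epochs=500):
    """Train only the new head with SGD; PRETRAINED_W is never updated."""
    head = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x) + [1.0]  # append 1.0 for the bias term
            err = predict(head, x) - y
            head = [h - lr * err * fj for h, fj in zip(head, f)]
    return head


inputs = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
# Toy targets for the "new" task, generated from the frozen features.
data = [(x, 2.0 * pretrained_features(x)[0] - pretrained_features(x)[1] + 0.5)
        for x in inputs]

loss_before = mean_squared_error([0.0, 0.0, 0.0], data)
head = fine_tune(data)
loss_after = mean_squared_error(head, data)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.6f}")
```

With a real framework the pattern is the same: load the published weights, freeze most of the network, and train a small task-specific layer on your own data.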

Other Pre-trained Models

Framework

Model visualization

You can see visualizations of each model's network architecture by using Netron.
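
Netron can also be launched from Python. The snippet below is a minimal sketch, assuming the `netron` package has been installed with `pip install netron`; the helper name `open_in_netron` and the file name `model.onnx` are hypothetical and stand in for whichever model file you downloaded.

```python
from pathlib import Path


def open_in_netron(model_path):
    """Open a saved model file in Netron's browser-based viewer."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    import netron  # imported lazily so the path check works without the package
    netron.start(str(path))  # serves the model locally for viewing in a browser


# Example call (hypothetical file name):
# open_in_netron("model.onnx")
```

Checking the path first gives a clear error before the viewer starts if the file name is wrong.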

TensorFlow

| Model Name | Description | Framework |
|---|---|---|
| Wavenet | A TensorFlow implementation of the WaveNet generative neural network architecture for audio generation. | TensorFlow |
| Lip Reading | Cross Audio-Visual Recognition using 3D Architectures in TensorFlow. | TensorFlow |
| MusicGenreClassification | Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University. | TensorFlow |
| Audioset | Models and supporting code for use with AudioSet. | TensorFlow |
| DeepSpeech | Automatic speech recognition. | TensorFlow |

Keras

| Model Name | Description | Framework |
|---|---|---|
| Ultrasound nerve segmentation | This tutorial shows how to use the Keras library to build a deep neural network for ultrasound image nerve segmentation. | Keras |

PyTorch

| Model Name | Description | Framework |
|---|---|---|
| espnet | End-to-End Speech Processing Toolkit (espnet.github.io/espnet). | PyTorch |
| TTS | Deep learning for Text2Speech. | PyTorch |
| Neural Sequence labeling model | Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. | PyTorch |
| waveglow | A Flow-based Generative Network for Speech Synthesis. | PyTorch |
| deepvoice3_pytorch | PyTorch implementation of convolutional networks-based text-to-speech synthesis models. | PyTorch |
| deepspeech2 | Implementation of DeepSpeech2 using Baidu Warp-CTC. Creates a network based on the DeepSpeech2 architecture, trained with the CTC loss function. | PyTorch |
| loop | A method to generate speech across multiple speakers. | PyTorch |
| audio | Simple audio I/O for PyTorch. | PyTorch |
| speech | PyTorch ASR implementation. | PyTorch |
| samplernn-pytorch | PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. | PyTorch |
| torch_waveglow | A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis. | PyTorch |

MXNet

| Model Name | Description | Framework |
|---|---|---|
| deepspeech | This example, based on Baidu's DeepSpeech2, helps you build Speech-To-Text (STT) models at scale using MXNet. | MXNet |
| mxnet-audio | Implementation of music genre classification, audio-to-vec, song recommender, and music search in MXNet. | MXNet |

Caffe

| Model Name | Description | Framework |
|---|---|---|
| Speech Recognition | Speech recognition with the Caffe deep learning framework. | Caffe |

Contributions

Your contributions are always welcome! Please have a look at contributing.md.

License

MIT License