Audio and Speech Pre-trained Models
What is a pre-trained model?
A pre-trained model is a model that someone else has already trained to solve a similar problem. Instead of building a model from scratch, you can use a model trained on a related problem as a starting point. A pre-trained model may not be 100% accurate in your application, so it usually needs to be evaluated, and often fine-tuned, on your own data.
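To make the "starting point" idea concrete, here is a minimal sketch in plain Python. It is not taken from any of the models listed here. The weights in `PRETRAINED_WEIGHTS`, the toy task, and all function names are hypothetical stand-ins chosen for illustration. The sketch keeps a "pre-trained" feature extractor frozen and trains only a small new output layer on data from the target application.

```python
import math
import random

# Hypothetical "pre-trained" weights: a stand-in for a feature extractor that
# someone else trained on a related task. In practice these would be loaded
# from a checkpoint file shipped with the model.
PRETRAINED_WEIGHTS = [[0.9, -0.4], [0.2, 0.8]]


def extract_features(x):
    """Frozen feature extractor: a fixed linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def predict(head, x):
    """New task-specific head: logistic regression on the frozen features."""
    feats = extract_features(x)
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], feats))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(data, epochs=100, lr=0.1):
    """Train only the head; the pre-trained weights are never updated."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            error = predict(head, x) - y  # gradient of the log loss w.r.t. z
            head["weights"] = [w - lr * error * f
                               for w, f in zip(head["weights"], feats)]
            head["bias"] -= lr * error
    return head


def accuracy(head, data):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((predict(head, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


# Toy target task (an assumption for this sketch): the label is 1 when the
# first input is larger than the second.
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
dataset = [((a, b), 1 if a > b else 0) for a, b in points]

untrained_head = {"weights": [0.0, 0.0], "bias": 0.0}
tuned_head = fine_tune(dataset)
print(f"accuracy before fine-tuning: {accuracy(untrained_head, dataset):.2f}")
print(f"accuracy after fine-tuning:  {accuracy(tuned_head, dataset):.2f}")
```

Real frameworks follow the same pattern at a larger scale: load published weights, keep most layers fixed, and train a small number of new parameters on your own data.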
Model visualization
You can visualize each model's network architecture with Netron.
TensorFlow

| Model Name | Description | Framework |
| --- | --- | --- |
| Wavenet | A TensorFlow implementation of the WaveNet generative neural network architecture for audio generation. | TensorFlow |
| Lip Reading | Cross Audio-Visual Recognition using 3D Architectures in TensorFlow. | TensorFlow |
| MusicGenreClassification | Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University. | TensorFlow |
| Audioset | Models and supporting code for use with AudioSet. | TensorFlow |
| DeepSpeech | Automatic speech recognition. | TensorFlow |
Keras

| Model Name | Description | Framework |
| --- | --- | --- |
| Ultrasound nerve segmentation | This tutorial shows how to use the Keras library to build a deep neural network for ultrasound image nerve segmentation. | Keras |
PyTorch

| Model Name | Description | Framework |
| --- | --- | --- |
| espnet | End-to-End Speech Processing Toolkit (espnet.github.io/espnet). | PyTorch |
| TTS | Deep learning for text-to-speech. | PyTorch |
| Neural Sequence labeling model | Sequence labeling models are popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging, and word segmentation. | PyTorch |
| waveglow | A Flow-based Generative Network for Speech Synthesis. | PyTorch |
| deepvoice3_pytorch | PyTorch implementation of convolutional network-based text-to-speech synthesis models. | PyTorch |
| deepspeech2 | Implementation of DeepSpeech2 using Baidu Warp-CTC. Creates a network based on the DeepSpeech2 architecture, trained with the CTC loss function. | PyTorch |
| loop | A method to generate speech across multiple speakers. | PyTorch |
| audio | Simple audio I/O for PyTorch. | PyTorch |
| speech | PyTorch ASR implementation. | PyTorch |
| samplernn-pytorch | PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. | PyTorch |
| torch_waveglow | A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis. | PyTorch |
MXNet

| Model Name | Description | Framework |
| --- | --- | --- |
| deepspeech | This example, based on Baidu's DeepSpeech2, helps you build speech-to-text (STT) models at scale. | MXNet |
| mxnet-audio | Implementation of music genre classification, audio-to-vec, song recommender, and music search in MXNet. | MXNet |
Caffe

| Model Name | Description | Framework |
| --- | --- | --- |
| Speech Recognition | Speech recognition with the Caffe deep learning framework. | Caffe |
Contributions
Your contributions are always welcome!
Please have a look at contributing.md.
License
MIT License