Tensorflow Speech Recognition
Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.
Replaces caffe-speech-recognition; see that project for some background.
DeepSpeech
Update: Mozilla released DeepSpeech. They achieve good error rates. Free Speech is in good hands; go there if you are an end user. For now, this project is maintained only for educational purposes.
Ultimate goal
Create a decent standalone speech recognition system for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB and 21GB on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...
Sample spectrogram: Karen uttering 'zero' at 160 words per minute.
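For readers new to spectrograms, here is a minimal sketch of how one can be computed with a short-time Fourier transform. It uses only NumPy and a synthetic 1 kHz tone instead of a real recording; the function name `spectrogram`, the frame length, and the hop size are illustrative assumptions, not code or settings from this repository.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (illustrative helper)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one with the window.
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: frame_len // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for a recording: one second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)                   # rows are time frames, columns are frequency bins
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256     # bin index times the bin width in Hz
```

Each row of `spec` is one time slice; plotting the array with time on one axis and frequency on the other gives an image like the sample above.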
Installation
clone code
git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git
pyaudio
Requirements: portaudio from http://www.portaudio.com/
git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
source ~/.bashrc
install pyaudio
pip install pyaudio
Getting started
Toy examples:
./number_classifier_tflearn.py
./speaker_classifier_tflearn.py
Some less trivial architectures:
./densenet_layer.py
Later:
./train.sh
./record.py
Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.
Fun tasks for newcomers
- Watch the video: https://www.youtube.com/watch?v=u9FPqkuoEJ8
- Understand and correct the corresponding code: lstm-tflearn.py
- Data augmentation: create on-the-fly modulations of the data: increase the speech frequency, add background noise, alter the pitch, etc.
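As a possible starting point for the data augmentation task, here is a minimal NumPy sketch that adds background noise and changes the playback speed of a waveform. The helper names (`add_noise`, `change_speed`, `augment`) and the parameter ranges are hypothetical and not part of this repository. Note that naive resampling changes speed and pitch together; shifting the pitch independently would need a different technique.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Mix in white noise at the given signal-to-noise ratio in dB (hypothetical helper)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def change_speed(wave, factor):
    """Resample by linear interpolation; factor > 1 plays faster and raises the pitch."""
    n_out = int(round(len(wave) / factor))
    new_positions = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_positions, np.arange(len(wave)), wave)

def augment(wave, rng):
    """Draw a random speed factor and noise level for each call (assumed ranges)."""
    factor = rng.uniform(0.9, 1.1)
    snr_db = rng.uniform(10, 30)
    return add_noise(change_speed(wave, factor), snr_db, rng)

# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440 * t)

augmented = augment(wave, rng)
```

Calling `augment` inside the batch generator would give the network a slightly different version of each sample every epoch, which is what "on-the-fly" refers to.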
Extensions
Extensions to current tensorflow which are probably needed:
- WarpCTC on the GPU (see issue)
- Incremental collaborative snapshots ('P2P learning') !
- Modular graphs/models + persistence
Even though this project is far from finished, we hope it gives you some starting points.
Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to [email protected]