Repository Details

This is an end-to-end speech recognition neural network built with Keras. It was my final project for the Artificial Intelligence Nanodegree at Udacity.

Project Overview

In this notebook, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline!

We begin by investigating the LibriSpeech dataset that will be used to train and evaluate your models. Your algorithm will first convert any raw audio to feature representations that are commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers that are often used for deep learning-based approaches to ASR, you will engage in your own investigations by creating and testing your own state-of-the-art models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.

Project Instructions

Getting Started

  1. Clone the repository, and navigate to the downloaded folder.
git clone https://github.com/udacity/AIND-VUI-Capstone.git
cd AIND-VUI-Capstone
  2. Create (and activate) a new environment with Python 3.5 and the numpy package.

    • Linux or Mac:
    conda create --name aind-vui python=3.5 numpy
    source activate aind-vui
    
    • Windows:
    conda create --name aind-vui python=3.5 numpy scipy
    activate aind-vui
    
  3. Install TensorFlow.

    • Option 1: To install TensorFlow with GPU support, follow the guide to install the necessary NVIDIA software on your system. If you are using the Udacity AMI, you can skip this step and only need to install the tensorflow-gpu package:
    pip install tensorflow-gpu==1.1.0
    
    • Option 2: To install TensorFlow with CPU support only:
    pip install tensorflow==1.1.0
    
  4. Install a few pip packages.

pip install -r requirements.txt
  5. Switch Keras backend to TensorFlow.

    • Linux or Mac:
    KERAS_BACKEND=tensorflow python -c "from keras import backend"
    
    • Windows:
    set KERAS_BACKEND=tensorflow
    python -c "from keras import backend"
    
  6. Obtain the libav package.

    • Linux: sudo apt-get install libav-tools
    • Mac: brew install libav
    • Windows: Browse to the Libav website
      • Scroll down to "Windows Nightly and Release Builds" and click on the appropriate link for your system (32-bit or 64-bit).
      • Click nightly-gpl.
      • Download most recent archive file.
      • Extract the file. Move the usr directory to your C: drive.
      • Go back to your terminal window from above.
    rename C:\usr avconv
    set PATH=C:\avconv\bin;%PATH%
    
  7. Obtain the appropriate subsets of the LibriSpeech dataset, and convert all flac files to wav format.

    • Linux or Mac:
    wget http://www.openslr.org/resources/12/dev-clean.tar.gz
    tar -xzvf dev-clean.tar.gz
    wget http://www.openslr.org/resources/12/test-clean.tar.gz
    tar -xzvf test-clean.tar.gz
    mv flac_to_wav.sh LibriSpeech
    cd LibriSpeech
    ./flac_to_wav.sh
    
    • Windows: Download dev-clean.tar.gz and test-clean.tar.gz (the same two URLs used in the Linux or Mac commands above) via your browser and save them in the AIND-VUI-Capstone directory. Extract them with an application that handles tar and gz archives, such as 7-Zip or WinZip, then convert the files from your terminal window.
    move flac_to_wav.sh LibriSpeech
    cd LibriSpeech
    powershell ./flac_to_wav.sh
    
  8. Create JSON files corresponding to the train and validation datasets.

cd ..
python create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json
python create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json
  9. Create an IPython kernel for the aind-vui environment. Open the notebook.
python -m ipykernel install --user --name aind-vui --display-name "aind-vui"
jupyter notebook vui_notebook.ipynb
  10. Before running code, change the kernel to match the aind-vui environment by using the drop-down menu. Then, follow the instructions in the notebook.

NOTE: While some code has already been implemented to get you started, you will need to implement additional functionality to successfully answer all of the questions included in the notebook. Unless requested, do not modify code that has already been included.
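
As a quick sanity check on the JSON description files created during Getting Started, the sketch below loads one of them and summarizes it. It assumes, as in the ba-dls-deepspeech script this project borrows, that the file holds one JSON object per line with key (path to the wav file), duration (seconds) and text (transcript) fields. The helper names load_corpus and summarize are hypothetical and not part of the repository.

```python
import json


def load_corpus(path):
    """Read a description file that stores one JSON object per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                entries.append(json.loads(line))
    return entries


def summarize(entries):
    """Return (utterance count, total hours, longest utterance in seconds).

    Assumes every entry has a numeric "duration" field (an assumption
    about the file format, see the paragraph above).
    """
    durations = [float(e["duration"]) for e in entries]
    total_hours = sum(durations) / 3600.0
    longest = max(durations) if durations else 0.0
    return len(entries), total_hours, longest


# Example usage, once train_corpus.json exists:
#   n, hours, longest = summarize(load_corpus("train_corpus.json"))
#   print("{} utterances, {:.2f} h, longest {:.1f} s".format(n, hours, longest))
```

If the counts look far too small, the flac-to-wav conversion probably did not run over every directory.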

Amazon Web Services

If you do not have access to a local GPU, you could use Amazon Web Services to launch an EC2 GPU instance. Please refer to the Udacity instructions for setting up a GPU instance for this project.

Evaluation

Your project will be reviewed by a Udacity reviewer against the project rubric below. Review this rubric thoroughly, and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.

Project Submission

When you are ready to submit your project, collect the following files and compress them into a single archive for upload:

  • The vui_notebook.ipynb file with fully functional code, all code cells executed and displaying output, and all questions answered.
  • An HTML or PDF export of the project notebook with the name report.html or report.pdf.
  • The sample_models.py file with all model architectures that were trained in the project Jupyter notebook.
  • The results/ folder containing all HDF5 and pickle files corresponding to trained models.

Alternatively, your submission could consist of the GitHub link to your repository.
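
One way to collect the files listed above is a short script such as the following sketch. It only checks that the items exist and zips them. The function names and the archive name submission.zip are hypothetical choices, and the script does not verify that the notebook cells were actually executed.

```python
import os
import zipfile

REQUIRED_FILES = ["vui_notebook.ipynb", "sample_models.py"]
REPORT_CHOICES = ["report.html", "report.pdf"]  # at least one must exist
RESULTS_DIR = "results"


def missing_items(project_dir):
    """List the required submission items that are absent from project_dir."""
    missing = [name for name in REQUIRED_FILES
               if not os.path.isfile(os.path.join(project_dir, name))]
    if not any(os.path.isfile(os.path.join(project_dir, r))
               for r in REPORT_CHOICES):
        missing.append("report.html or report.pdf")
    if not os.path.isdir(os.path.join(project_dir, RESULTS_DIR)):
        missing.append(RESULTS_DIR + "/")
    return missing


def build_archive(project_dir, archive_path):
    """Zip the submission files; raise if a required item is missing."""
    missing = missing_items(project_dir)
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in REQUIRED_FILES + REPORT_CHOICES:
            full = os.path.join(project_dir, name)
            if os.path.isfile(full):
                zf.write(full, arcname=name)
        # Add everything under results/ with paths relative to the project.
        for root, _dirs, files in os.walk(os.path.join(project_dir, RESULTS_DIR)):
            for fname in files:
                full = os.path.join(root, fname)
                zf.write(full, arcname=os.path.relpath(full, project_dir))
    return archive_path


# Example usage from the parent of the project directory:
#   build_archive("AIND-VUI-Capstone", "submission.zip")
```

Writing the archive outside the project directory keeps it from being picked up if the script is run a second time.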

Project Rubric

Files Submitted

Criteria | Meets Specifications
Submission Files | The submission includes all required files.

STEP 2: Model 0: RNN

Criteria | Meets Specifications
Trained Model 0 | The submission trained the model for at least 20 epochs, and none of the loss values in model_0.pickle are undefined. The trained weights for the model specified in simple_rnn_model are stored in model_0.h5.

STEP 2: Model 1: RNN + TimeDistributed Dense

Criteria | Meets Specifications
Completed rnn_model Module | The submission includes a sample_models.py file with a completed rnn_model module containing the correct architecture.
Trained Model 1 | The submission trained the model for at least 20 epochs, and none of the loss values in model_1.pickle are undefined. The trained weights for the model specified in rnn_model are stored in model_1.h5.

STEP 2: Model 2: CNN + RNN + TimeDistributed Dense

Criteria | Meets Specifications
Completed cnn_rnn_model Module | The submission includes a sample_models.py file with a completed cnn_rnn_model module containing the correct architecture.
Trained Model 2 | The submission trained the model for at least 20 epochs, and none of the loss values in model_2.pickle are undefined. The trained weights for the model specified in cnn_rnn_model are stored in model_2.h5.

STEP 2: Model 3: Deeper RNN + TimeDistributed Dense

Criteria | Meets Specifications
Completed deep_rnn_model Module | The submission includes a sample_models.py file with a completed deep_rnn_model module containing the correct architecture.
Trained Model 3 | The submission trained the model for at least 20 epochs, and none of the loss values in model_3.pickle are undefined. The trained weights for the model specified in deep_rnn_model are stored in model_3.h5.

STEP 2: Model 4: Bidirectional RNN + TimeDistributed Dense

Criteria | Meets Specifications
Completed bidirectional_rnn_model Module | The submission includes a sample_models.py file with a completed bidirectional_rnn_model module containing the correct architecture.
Trained Model 4 | The submission trained the model for at least 20 epochs, and none of the loss values in model_4.pickle are undefined. The trained weights for the model specified in bidirectional_rnn_model are stored in model_4.h5.

STEP 2: Compare the Models

Criteria | Meets Specifications
Question 1 | The submission includes a detailed analysis of why different models might perform better than others.

STEP 2: Final Model

Criteria | Meets Specifications
Completed final_model Module | The submission includes a sample_models.py file with a completed final_model module containing a final architecture that is not identical to any of the previous architectures.
Trained Final Model | The submission trained the model for at least 20 epochs, and none of the loss values in model_end.pickle are undefined. The trained weights for the model specified in final_model are stored in model_end.h5.
Question 2 | The submission includes a detailed description of how the final model architecture was designed.
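
The "at least 20 epochs" and "no undefined loss values" criteria can be self-checked before submitting. The sketch below assumes each pickle file holds a Keras-style history dictionary that maps metric names such as loss and val_loss to one value per epoch. That layout is an assumption about how the notebook saves its results, and check_history and check_pickle are hypothetical helper names.

```python
import math
import pickle


def check_history(history, min_epochs=20):
    """Return a list of problems found in a history dict (empty if none)."""
    problems = []
    if not history:
        problems.append("history is empty")
    for key, values in sorted(history.items()):
        if len(values) < min_epochs:
            problems.append("'{}' has {} epochs, need {}".format(
                key, len(values), min_epochs))
        # NaN and infinity are treated as "undefined" loss values.
        if any(math.isnan(v) or math.isinf(v) for v in values):
            problems.append("'{}' contains undefined values".format(key))
    return problems


def check_pickle(path, min_epochs=20):
    """Load a pickled history dict from path and check it."""
    with open(path, "rb") as f:
        return check_history(pickle.load(f), min_epochs)


# Example usage with the file names from the rubric:
#   for name in ["model_0", "model_1", "model_2", "model_3", "model_4", "model_end"]:
#       print(name, check_pickle("results/" + name + ".pickle") or "OK")
```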

Suggestions to Make your Project Stand Out!

(1) Add a Language Model to the Decoder

The performance of the decoding step can be greatly enhanced by incorporating a language model. Build your own language model from scratch, or leverage a repository or toolkit that you find online to improve your predictions.
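
To make the idea concrete, here is a minimal sketch of a from-scratch language model: a word-level bigram model with add-one smoothing, used to rescore candidate transcriptions. It is not the notebook's decoder. The candidate list, the acoustic scores and the lm_weight parameter are hypothetical, and a real system would train on far more text.

```python
import math
from collections import Counter


class BigramLM:
    """Word-level bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word precedes another
        self.bigram_counts = Counter()
        for sentence in sentences:
            words = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.context_counts.update(words[:-1])
            self.bigram_counts.update(zip(words[:-1], words[1:]))
        # Distinct next-words seen in training, plus one slot for unseen words.
        self.vocab_size = len({w for _, w in self.bigram_counts}) + 1

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the smoothed model."""
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, word in zip(words[:-1], words[1:]):
            num = self.bigram_counts[(prev, word)] + 1
            den = self.context_counts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def rescore(candidates, lm, lm_weight=0.5):
    """Pick the best text from (text, acoustic_log_score) pairs.

    The combined score is acoustic_log_score + lm_weight * LM log-probability.
    """
    return max(candidates,
               key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))[0]
```

In this sketch lm_weight controls how much the language model can override the acoustic score; setting it to 0 falls back to the acoustic ranking alone.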

(2) Train on Bigger Data

In the project, you used some of the smaller downloads from the LibriSpeech corpus. Try training your model on a larger dataset: instead of using dev-clean.tar.gz, download one of the larger training sets from the same website.

(3) Try out Different Audio Features

In this project, you had the choice to use either spectrogram or MFCC features. Take the time to test the performance of both of these features. For a special challenge, train a network that uses raw audio waveforms!
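
For intuition about what a spectrogram feature contains, the sketch below computes a magnitude spectrogram from raw samples using only the standard library. It is illustrative rather than the feature code used by the notebook: the frame length, hop size and Hann window are assumed values, the transform is a naive DFT, and a practical implementation would use an FFT from numpy or scipy. MFCC features, which add a mel filterbank and a cosine transform on top of this, are not shown.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..n//2 (naive O(n^2) transform)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum per overlapping frame (time-major)."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append(dft_magnitudes([s * w for s, w in zip(chunk, window)]))
    return frames


# Example: a pure tone that completes 8 cycles per 64-sample frame
# puts its energy around frequency bin 8 of every frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(tone)
```

Each row of spec is the kind of per-time-step feature vector a network consumes, whereas a raw-waveform model would be fed the samples directly.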

Special Thanks

We have borrowed the create_desc_json.py and flac_to_wav.sh files from the ba-dls-deepspeech repository, along with some functions used to generate spectrograms.
