deep-listening
Deep learning experiments for audio classification
A full write-up, including technical explanations, design decisions, and a summary of the results achieved, can be found in the associated Project Report.
This project consists of several Jupyter notebooks that implement deep learning audio classifiers.
1-us8k-ffn-extract-explore.ipynb
- this notebook contains code to extract features from, and visualise, audio files in the UrbanSound8K data set
- the feature extraction process uses audio analysis functions from the librosa library to reduce each recording to 193 data points (see the sketch after this list)
- because the audio information is highly abstracted (we can no longer process successive frames with a receptive field), these features are intended to be fed into a feed-forward neural network (FFN)
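The exact recipe lives in the notebook, but as a rough sketch, 193 summary features per clip could be assembled with librosa like this, assuming the common combination of MFCCs (40), chroma (12), mel bands (128), spectral contrast (7) and tonnetz (6), each averaged over time:

```python
import numpy as np
import librosa

def extract_features(file_path):
    X, sr = librosa.load(file_path)    # audio samples and sample rate
    stft = np.abs(librosa.stft(X))     # magnitude spectrogram

    # Each feature matrix is averaged over time, collapsing the clip to one vector
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sr, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sr).T, axis=0)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr).T, axis=0)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sr).T, axis=0)

    return np.hstack([mfccs, chroma, mel, contrast, tonnetz])  # 40+12+128+7+6 = 193
```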
2-us8k-ffn-train-predict.ipynb
- this notebook contains the code to load the previously extracted features and feed them into a 3-layer FFN, implemented using TensorFlow and Keras (a sketch follows this list)
- also included is code to evaluate model performance and to generate predictions from individual samples, demonstrating how a trained model would be used to identify the nature of live recordings
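As an illustration only (the hidden-layer sizes here are placeholders, not necessarily the notebook's actual values), a 3-layer FFN over the 193 features might look like:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(280, activation='relu', input_shape=(193,)),  # hidden layer 1
    Dense(300, activation='relu'),                      # hidden layer 2
    Dense(10, activation='softmax'),                    # one output per UrbanSound8K class
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Predicting a single sample: reshape the 193-feature vector into a batch of one
# probs = model.predict(features.reshape(1, 193))
```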
3-us8k-cnn-extract-train.ipynb
- this notebook extracts audio features suitable for input into a classic 2-layer Convolutional Neural Network (CNN)
- much more of the audio data is preserved in this approach; as the saved numpy feature data is over 2GB it isn't included in this repository, but you can use the code in this notebook to extract it from the original UrbanSound8K data set (a sketch of the idea follows this list)
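As a sketch of what patch-style extraction can look like (the band and frame counts here are illustrative, not necessarily the notebook's settings), fixed-size log-scaled mel-spectrogram windows could be cut from each clip like this:

```python
import numpy as np
import librosa

bands, frames = 60, 41            # mel bands x time frames per patch
window_size = 512 * (frames - 1)  # samples per patch at librosa's default hop of 512

def extract_patches(file_path):
    X, sr = librosa.load(file_path)
    patches = []
    for start in range(0, len(X) - window_size, window_size // 2):  # 50% overlap
        clip = X[start:start + window_size]
        mel = librosa.feature.melspectrogram(y=clip, sr=sr, n_mels=bands)
        patches.append(librosa.power_to_db(mel))   # log-scale the energies
    return np.array(patches)      # shape: (n_patches, bands, frames)
```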
4-us8k-cnn-salamon.ipynb
- this notebook implements an alternative CNN, similar to the one described by Salamon and Bello (sketched below)
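For reference, a Keras sketch in the spirit of the Salamon & Bello architecture (24/48/48 filters with asymmetric pooling, following their paper; the input shape here is an assumption, so expect the notebook to differ in detail) might be:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(24, (5, 5), activation='relu', input_shape=(128, 128, 1)),
    MaxPooling2D(pool_size=(4, 2)),
    Conv2D(48, (5, 5), activation='relu'),
    MaxPooling2D(pool_size=(4, 2)),
    Conv2D(48, (5, 5), activation='relu'),   # no pooling after the final conv
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```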
5-ffbird-cnn.ipynb
- this notebook uses the Salamon and Bello CNN to process the FreeField1010 data set of field recordings, with the goal of recognising the presence of birdsong (the binary adaptation is sketched after this list)
- the data set is not part of this repository, so if you want to run this code you'll need to download the data yourself (see instructions in the notebook)
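Since birdsong detection is a binary task, the main architectural change is the output layer; assuming the `model` sketched for notebook 4 above, the adaptation could be as small as:

```python
from tensorflow.keras.layers import Dense

model.pop()                                # drop the 10-way softmax head
model.add(Dense(1, activation='sigmoid'))  # probability that birdsong is present
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```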
7-us8k-rnn-extract-train.ipynb
- this notebook uses a Recurrent Neural Network (RNN) to classify Mel-frequency cepstral coefficient (MFCC) features (a sketch follows)
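A minimal sketch of such a recurrent classifier, assuming the MFCC frames are kept as a time series rather than averaged (the sequence length, coefficient count and layer sizes below are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps, n_mfcc = 41, 20          # illustrative sequence length and MFCC count
model = Sequential([
    LSTM(128, return_sequences=True, input_shape=(n_steps, n_mfcc)),
    LSTM(128),                    # final state summarises the whole clip
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```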
Do get in touch if you have any questions (me @ jaroncollis . com)