  • Stars: 1,162
  • Rank: 40,165 (Top 0.8 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 7 years ago
  • Updated almost 2 years ago

Repository Details

The neural network model can detect five different emotions in male and female speech from audio recordings. (Deep Learning, NLP, Python)

Speech Emotion Analyzer

  • The idea behind this project was to build a machine learning model that could detect emotions from the conversations we have with each other all the time. Nowadays, personalization is needed in everything we experience every day.

  • So why not have an emotion detector that gauges your emotions and, in the future, recommends different things based on your mood? Multiple industries could use this to offer different services: a marketing company could suggest products based on your emotions, and the automotive industry could detect a person's emotions and adjust the speed of autonomous cars as required to avoid collisions.

Analyzing audio signals

© Fabien_Ringeval_PhD_Thesis.

Datasets:

Made use of two different datasets:

  1. RAVDESS. This dataset includes around 1500 audio files from 24 different actors (12 male and 12 female), who recorded short clips in 8 different emotions: 1 = neutral, 2 = calm, 3 = happy, 4 = sad, 5 = angry, 6 = fearful, 7 = disgust, 8 = surprised.
    Each audio file is named in such a way that the emotion code starts at the 7th character of the file name.

  2. SAVEE. This dataset contains around 500 audio files recorded by 4 different male actors. The first two characters of the file name correspond to the emotion that the recording portrays.
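
The naming conventions above can be turned into labels with a few lines of Python. The following is a minimal sketch, not the project's own code. It assumes RAVDESS names of the form `03-01-06-01-02-01-12.wav` (hyphen-separated fields, emotion in the third field, actor number in the last field, odd actor numbers male and even ones female) and SAVEE names such as `sa03.wav` or `DC_a01.wav` (an emotion prefix followed by a sentence number). The helper names `ravdess_label` and `savee_label` are hypothetical.

```python
from pathlib import Path

# RAVDESS emotion codes, as listed in the dataset description above.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

# SAVEE emotion prefixes (assumed naming: letters, then a sentence number).
SAVEE_EMOTIONS = {
    "a": "angry", "d": "disgust", "f": "fearful", "h": "happy",
    "n": "neutral", "sa": "sad", "su": "surprised",
}


def ravdess_label(filename: str) -> str:
    """Build a 'gender_emotion' label from a RAVDESS file name."""
    fields = Path(filename).stem.split("-")
    emotion = RAVDESS_EMOTIONS[fields[2]]  # third field = characters 7-8
    # Assumption: odd actor numbers are male, even actor numbers are female.
    gender = "male" if int(fields[6]) % 2 == 1 else "female"
    return f"{gender}_{emotion}"


def savee_label(filename: str) -> str:
    """Build a 'male_emotion' label from a SAVEE file name."""
    stem = Path(filename).stem.split("_")[-1]  # drop an optional speaker prefix
    prefix = stem.rstrip("0123456789")         # drop the sentence number
    return f"male_{SAVEE_EMOTIONS[prefix]}"


print(ravdess_label("03-01-06-01-02-01-12.wav"))
print(savee_label("sa03.wav"))
```

Labels for emotions outside the ten classes used by the model (for example neutral or disgust) would still need to be filtered out before training.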

Audio files:

We inspected sample audio files by plotting their waveform and spectrogram.
Waveform

Spectrogram

Feature Extraction

The next step involves extracting features from the audio files, which will help our model learn to distinguish between them. For feature extraction we use the LibROSA library in Python, which is one of the libraries used for audio analysis.

  • There are some things to note here. While extracting the features, all the audio files were limited to 3 seconds so that each yields an equal number of features.
  • The sampling rate of each file is doubled, keeping the sampling frequency constant, to get more features, which helps classify the audio files when the dataset is small.
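
The effect of these two choices can be sketched as follows. This is an illustration under stated assumptions, not the project's actual extraction code: it assumes that "doubling" means loading at 44,100 Hz instead of LibROSA's default of 22,050 Hz, that the features are MFCCs (the text does not name the feature type), and that LibROSA's default hop length of 512 samples is used. The function names `expected_frames` and `extract_features` are hypothetical.

```python
def expected_frames(duration_s: float, sample_rate: int, hop_length: int = 512) -> int:
    """Number of analysis frames for a clip of fixed duration.

    With centered framing (LibROSA's default), a signal of n samples
    yields 1 + n // hop_length frames.
    """
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


def extract_features(path: str, duration_s: float = 3.0, sample_rate: int = 44100):
    """Load a clip, force it to a fixed length, and return one value per frame.

    Assumption: MFCCs averaged over the coefficient axis; the original
    project may use a different feature or aggregation.
    """
    import numpy as np
    import librosa  # imported lazily so the helper above works without it

    y, sr = librosa.load(path, sr=sample_rate, duration=duration_s)
    # Zero-pad clips shorter than duration_s so every file has the same length.
    y = librosa.util.fix_length(y, size=int(duration_s * sr))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, frames)
    return np.mean(mfcc, axis=0)                        # shape: (frames,)


# Doubling the sampling rate roughly doubles the number of frames per clip.
print(expected_frames(3.0, 22050), expected_frames(3.0, 44100))
```

Because every clip is cut or padded to the same duration, every file produces a feature vector of the same length, which is what a fixed-size model input requires.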

The extracted features look as follows:



These are arrays of values with labels appended to them.

Building Models

Since the project is a classification problem, a Convolutional Neural Network seemed the obvious choice. We also built Multilayer Perceptron and Long Short-Term Memory models, but they underperformed, with accuracies too low to predict the right emotions reliably.

Building and tuning a model is a very time-consuming process. The idea is to always start small, without adding too many layers just for the sake of making the model complex. After experimenting with different layers, the best model reached a validation accuracy of a little more than 70% on the test data.


Predictions

After tuning the model, we tested it by predicting the emotions for the test data. For a model with the given accuracy, this is a sample of the actual vs. predicted values.
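
One way to summarize such an actual vs. predicted comparison is to score the gender part and the emotion part of each label separately. The sketch below assumes labels of the form `gender_emotion`. The function names and the four sample labels are hypothetical and only illustrate the calculation; they are not results from the project.

```python
def accuracy(actual, predicted):
    """Fraction of positions where the two label sequences agree."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def split_accuracies(actual, predicted):
    """Accuracy on the full label, on the gender part, and on the emotion part."""
    def genders(labels):
        return [label.split("_")[0] for label in labels]

    def emotions(labels):
        return [label.split("_")[1] for label in labels]

    return {
        "overall": accuracy(actual, predicted),
        "gender": accuracy(genders(actual), genders(predicted)),
        "emotion": accuracy(emotions(actual), emotions(predicted)),
    }


# Hypothetical toy data, for illustration only.
actual = ["male_angry", "female_sad", "male_calm", "female_happy"]
predicted = ["male_angry", "female_calm", "male_calm", "female_happy"]
print(split_accuracies(actual, predicted))
```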


Testing out with live voices

To test our model on voices completely different from those in our training and test data, we recorded our own voices with different emotions and predicted the outcomes. You can see the results below. The audio contained a male voice saying "This coffee sucks" in an angry tone.



As the image above shows, the model predicted both the male voice and the emotion accurately.

NOTE: If you are using the model directly and want to decode its output, which ranges from 0 to 9, the following list will help you.

0 - female_angry
1 - female_calm
2 - female_fearful
3 - female_happy
4 - female_sad
5 - male_angry
6 - male_calm
7 - male_fearful
8 - male_happy
9 - male_sad
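
The list above can be applied in code as follows. This is a minimal sketch that assumes the model returns one score per class, in the order given by the list; the function name `decode_prediction` and the sample scores are hypothetical.

```python
# Class labels in the order given by the list above (index 0 to 9).
LABELS = [
    "female_angry", "female_calm", "female_fearful", "female_happy", "female_sad",
    "male_angry", "male_calm", "male_fearful", "male_happy", "male_sad",
]


def decode_prediction(scores):
    """Return (class index, gender, emotion) for the highest-scoring class."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    index = max(range(len(scores)), key=lambda i: scores[i])
    gender, emotion = LABELS[index].split("_")
    return index, gender, emotion


# Hypothetical scores for a single clip; index 5 has the highest value.
scores = [0.01, 0.02, 0.02, 0.05, 0.10, 0.60, 0.05, 0.05, 0.05, 0.05]
print(decode_prediction(scores))
```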

Conclusion

Building the model was a challenging task, as it involved a lot of trial and error, tuning, and so on. The model is very well trained to distinguish between male and female voices, which it does with 100% accuracy. The model was tuned to detect emotions with more than 70% accuracy. Accuracy can be increased by including more audio files for training.
