  • Stars: 200
  • Rank: 195,325 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created about 6 years ago
  • Updated over 1 year ago


Repository Details

Voice based gender recognition using Mel-frequency cepstrum coefficients (MFCC) and Gaussian mixture models (GMM)


Voice-based-gender-recognition

Voice based gender recognition using:

  • The Free ST American English Corpus dataset (SLR45)
  • Mel-frequency cepstrum coefficients (MFCC)
  • Gaussian mixture models (GMM)

Dataset

The Free ST American English Corpus dataset (SLR45) can be found on OpenSLR (SLR45). It is a free American English corpus by Surfingtech, containing utterances from 10 speakers (5 females and 5 males). Each speaker has about 350 utterances.

Theory

Voice features extraction

Mel-Frequency Cepstrum Coefficients (MFCC) are used here, since they are among the most widely used features in speaker verification and generally perform well there. MFCCs are commonly derived as follows:

  1. Take the Fourier transform of (a windowed excerpt of) a signal.
  2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
  3. Take the logs of the powers at each of the mel frequencies.
  4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
  5. The MFCCs are the amplitudes of the resulting spectrum.
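
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used in FeaturesExtractor.py: the function names (`mel_filterbank`, `mfcc`) and all parameter defaults (16 kHz sampling rate, 25 ms frames, 10 ms step, 26 filters, 13 coefficients) are assumptions chosen for the example, and common refinements such as pre-emphasis and liftering are left out.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular overlapping filters, evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, frame_step=160,
         n_fft=512, n_filters=26, n_ceps=13):
    signal = np.asarray(signal, dtype=float)
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Step 1: Fourier transform of each windowed frame, turned into a power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Step 2: map the powers onto the mel scale with triangular filters.
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Step 3: take the log of each mel energy (a small floor avoids log(0)).
    log_mel = np.log(np.maximum(mel_energies, 1e-10))
    # Step 4: unnormalized DCT-II across the filter axis.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    # Step 5: the MFCCs are the amplitudes of that transform.
    return log_mel @ basis.T

# One second of a synthetic 220 Hz tone stands in for a real .wav file.
tone = np.sin(2 * np.pi * 220.0 * np.arange(16000) / 16000.0)
features = mfcc(tone)
print(features.shape)  # (98, 13): one 13-dimensional vector per frame
```

In practice a tested library would normally be preferred over a hand-written version; the sketch only makes the order of the operations concrete.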

Gaussian Mixture Model

According to D. Reynolds in "Gaussian Mixture Models": A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.
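
To make the "weighted sum of Gaussian component densities" and the EM iteration concrete, here is a small NumPy sketch of a diagonal-covariance GMM. It is an illustration under stated assumptions, not the training code in ModelsTrainer.py: the function names, the random initialization, the fixed iteration count, and the variance floor are all choices made for this example.

```python
import numpy as np

def log_gaussians(X, means, variances):
    """Log density of every sample under every diagonal-covariance component."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm(X, n_components=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    weights = np.full(n_components, 1.0 / n_components)
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        log_joint = np.log(weights) + log_gaussians(X, means, variances)
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(X, weights, means, variances):
    """Average per-sample log-likelihood under the weighted sum of Gaussians."""
    log_joint = np.log(weights) + log_gaussians(X, means, variances)
    return float(np.mean(logsumexp(log_joint, axis=1)))

# Synthetic two-cluster data stands in for MFCC frames.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-5.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (200, 2))])
weights, means, variances = fit_gmm(data, n_components=2)
print(avg_log_likelihood(data, weights, means, variances))
```

The average log-likelihood returned by the last function is the kind of score a GMM-based classifier can compare across models.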

Workflow graph


  • For a more detailed explanation, please refer to this blog that I have written.


Dependencies

The modules/libraries required by the scripts are listed in requirements.txt and can be installed as follows:

pip install -r requirements.txt

Code & scripts

  • Run.py: This is the main script; it runs the whole cycle (data management > model training > gender identification).
  • DataManager.py: This script is responsible for extracting and structuring the data.
  • ModelsTrainer.py: This script is responsible for training the Gaussian Mixture Models (GMM).
  • GenderIdentifier.py: This script is responsible for testing the system by identifying the genders of the testing set.
  • FeaturesExtractor.py: This script is responsible for extracting the MFCC features from the .wav files.
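
The identification step can be pictured as scoring the same feature frames under one model per gender and keeping the label with the higher log-likelihood. The sketch below assumes that decision rule; it is not taken from GenderIdentifier.py. The helper names (`gaussian_log_score`, `identify_gender`) are hypothetical, and a single 1-D Gaussian with made-up parameters stands in for each trained GMM.

```python
import math
from functools import partial

def gaussian_log_score(frames, mean, std):
    """Toy stand-in for a trained model: summed log-density of 1-D frames."""
    return sum(
        -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))
        for x in frames
    )

def identify_gender(frames, scorers):
    """Return the label whose model gives the frames the highest log-likelihood."""
    scores = {label: score(frames) for label, score in scorers.items()}
    return max(scores, key=scores.get), scores

# Hypothetical models; real ones would be GMMs trained on MFCC frames.
scorers = {
    "female": partial(gaussian_log_score, mean=1.0, std=1.0),
    "male": partial(gaussian_log_score, mean=-1.0, std=1.0),
}

label, scores = identify_gender([0.8, 1.2, 0.9], scorers)
print(label)  # female
```

Keeping the scorers in a dictionary means further classes could be added without changing the decision function.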

Results and discussion

  • The system achieves 95% accuracy in gender detection.
  • The code can be further optimized using multi-threading, multi-processing and acceleration libraries.
  • The accuracy can be further improved using GMM normalization, also known as a UBM-GMM system.
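
In a UBM-GMM system, a universal background model (UBM) is first trained on data from all speakers, and each class model is then derived from it by MAP adaptation. The sketch below shows mean-only MAP adaptation with a relevance factor, as it is commonly described in the speaker recognition literature. It is an assumption about how such an extension could look, not part of this repository: the function name `map_adapt_means` is hypothetical and the relevance factor of 16 is just a frequently used default.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Shift the UBM means towards the data X (diagonal covariances assumed)."""
    # Responsibility of each UBM component for each frame.
    diff = X[:, None, :] - means[None, :, :]
    log_joint = np.log(weights) - 0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    resp /= resp.sum(axis=1, keepdims=True)
    # Soft count and data-driven mean estimate per component.
    n_k = resp.sum(axis=0)
    e_k = (resp.T @ X) / np.maximum(n_k, 1e-10)[:, None]
    # Components that saw more data move further away from the UBM mean.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1.0 - alpha) * means

# Toy 1-D UBM with two components; the adaptation data sits near the second one.
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[-2.0], [2.0]])
ubm_variances = np.array([[1.0], [1.0]])
adaptation_data = np.full((160, 1), 3.0)

adapted = map_adapt_means(adaptation_data, ubm_weights, ubm_means, ubm_variances)
print(adapted)
```

Only the component that explains the adaptation data moves noticeably; the other one stays close to its UBM value, which is what keeps the adapted models comparable to each other.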
