  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 6 years ago
  • Updated over 5 years ago


CNN 1D vs 2D audio classification

audio_classification

Description of the approach: https://medium.com/@CVxTz/audio-classification-a-convolutional-neural-network-approach-b0a4fce8f6c

Requirements: Keras, TensorFlow, NumPy, librosa

Audio Classification: A Convolutional Neural Network Approach

Audio classification can be used for audio scene understanding, which in turn is important if an artificial agent is to understand and better interact with its environment.
This is the motivation for this blog post, in which I present two different convolution-based approaches to audio classification.

We base our experiments on the dataset available at https://www.kaggle.com/c/freesound-audio-tagging, a dataset of annotated audio segments of varying length, each belonging to one of 41 classes such as “Acoustic_guitar”, “Applause”, “Bark” …
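Because the clips have different lengths and batched training needs equal-length inputs, the waveforms must be brought to a common size before they reach the network. The write-up does not say how the repository does this. The sketch below assumes one common choice: long clips are randomly cropped and short clips are zero-padded. The helper name `pad_or_crop`, the 16 kHz sample rate and the 2-second target length are hypothetical.

```python
from typing import Optional

import numpy as np


def pad_or_crop(wave: np.ndarray, target_len: int,
                rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Hypothetical helper: return exactly target_len samples.

    Longer clips are cropped at a random offset (or at the start when rng is
    None); shorter clips are zero-padded at the end.
    """
    if len(wave) > target_len:
        max_offset = len(wave) - target_len
        offset = int(rng.integers(0, max_offset + 1)) if rng is not None else 0
        return wave[offset:offset + target_len]
    return np.pad(wave, (0, target_len - len(wave)), mode="constant")


# Usage, assuming 16 kHz audio and 2-second training windows.
rng = np.random.default_rng(0)
clip = rng.normal(0.0, 1.0, 45_000)  # a ~2.8 s stand-in clip
fixed = pad_or_crop(clip, 2 * 16_000, rng)
```

Random crops also act as light data augmentation, because each epoch sees a different window of a long clip.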

First Approach: Raw audio wave and 1D convolutions

The most straightforward approach is to feed the raw waveform to a cascade of 1D convolutions that finally produces the class probabilities.
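The repository builds its model in Keras, and the sketch below does not reproduce that model. It is a framework-free NumPy forward pass with random, untrained weights that shows how a cascade of 1D convolutions turns a raw waveform into class probabilities. The number of layers, kernel size 9, pooling factor 4, channel widths and the final global max pooling are illustrative assumptions, not the repository's architecture.

```python
import numpy as np


def conv1d(x, w, b):
    """Valid 1D convolution (cross-correlation, as in deep-learning libraries).

    x: (length, in_ch), w: (kernel, in_ch, out_ch), b: (out_ch,)
    returns (length - kernel + 1, out_ch)
    """
    k = w.shape[0]
    out_len = x.shape[0] - k + 1
    windows = np.stack([x[i:i + k] for i in range(out_len)])  # (out_len, k, in_ch)
    return np.einsum("tkc,kco->to", windows, w) + b


def max_pool1d(x, size):
    """Non-overlapping max pooling over time; drops an incomplete last window."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, x.shape[1]).max(axis=1)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def raw_wave_forward(wave, conv_params, dense_w, dense_b, pool=4):
    """[conv -> ReLU -> max-pool] x N -> global max pool -> dense -> softmax."""
    h = wave[:, None]  # (samples, 1 channel)
    for w, b in conv_params:
        h = max_pool1d(np.maximum(conv1d(h, w, b), 0.0), pool)
    pooled = h.max(axis=0)  # global max pooling over the remaining time steps
    return softmax(pooled @ dense_w + dense_b)


# Usage with random weights: a 1 s clip at an assumed 16 kHz, 41 classes.
rng = np.random.default_rng(0)
n_classes = 41
channels = [1, 16, 32, 32]
conv_params = [(rng.normal(0, 0.1, (9, c_in, c_out)), np.zeros(c_out))
               for c_in, c_out in zip(channels[:-1], channels[1:])]
dense_w = rng.normal(0, 0.1, (channels[-1], n_classes))
dense_b = np.zeros(n_classes)
probs = raw_wave_forward(rng.normal(0, 1, 16_000), conv_params, dense_w, dense_b)
```

Each conv and pool stage shortens the time axis (16000 → 3998 → 997 → 247 steps here), so later filters see progressively longer stretches of audio.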

Second Approach : Log-Mel spectrogram

A more advanced approach to audio classification is to use a Mel spectrogram instead of the raw audio wave.

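The closely related mel-frequency cepstrum “is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.” (source: https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). The log-Mel spectrogram is that log power spectrum on the mel scale, taken before the cosine transform is applied.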

A Mel spectrogram transforms the raw input sequence into a 2D feature map in which one dimension represents time, the other represents frequency, and the values represent amplitude.
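The transform can be written directly in NumPy. The sketch below frames the signal with a Hann window, takes the power spectrum with `np.fft.rfft`, projects it onto triangular mel filters and takes the log. It is an illustration, not the repository's preprocessing (which uses librosa). It assumes the HTK mel formula, unnormalized filters, and example values `n_fft=1024`, `hop=512`, `n_mels=64`. librosa's defaults (Slaney-style mel scale and filter normalization) differ, so values will not match `librosa.feature.melspectrogram` exactly.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK-style mel scale


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Unnormalized triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # centre frequency of each FFT bin
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wave, sr, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Return (n_mels, n_frames): rows are mel bands, columns are time frames.

    Assumes len(wave) >= n_fft.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + eps).T                          # eps avoids log(0)


# Usage: a 1 s, 1 kHz tone at 16 kHz gives a 64 x 30 time-frequency "image".
sr = 16_000
t = np.arange(sr) / sr
S = log_mel_spectrogram(np.sin(2 * np.pi * 1000 * t), sr)
```

The resulting 2D array can be fed to 2D convolutions just like a single-channel image.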

Results (Mean Average Precision @ 3):

1D: 0.754

2D: 0.849

Average of the two models' predictions: 0.883
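Mean Average Precision @ 3 rewards a correct label ranked first with 1, second with 1/2, third with 1/3, and anything lower with 0, then averages over clips. The sketch below computes it with a hypothetical `map_at_3` helper, assuming one ground-truth label per clip and a matrix of per-class scores. The toy numbers are made up for illustration.

```python
import numpy as np


def map_at_3(y_true, probs):
    """Mean Average Precision @ 3 for single-label data.

    y_true: (n_samples,) integer class ids; probs: (n_samples, n_classes) scores.
    """
    top3 = np.argsort(-probs, axis=1)[:, :3]            # best-scoring classes first
    hits = top3 == np.asarray(y_true)[:, None]          # (n_samples, 3) booleans
    scores = (hits / np.arange(1, 4)).sum(axis=1)       # at most one hit per row
    return float(scores.mean())


# Toy example: true class ranked 1st, 2nd, and not in the top 3.
probs = np.array([[0.7, 0.2, 0.1, 0.0],
                  [0.1, 0.6, 0.3, 0.0],
                  [0.2, 0.1, 0.3, 0.4]])
y_true = [0, 2, 1]
score = map_at_3(y_true, probs)  # (1 + 1/2 + 0) / 3 = 0.5
```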

The 2D Mel model outperforms the 1D raw-wave model, but averaging the predictions of the two outperforms each individual model by a clear margin (0.883 vs. 0.849 for the best single model). This is probably because the two models learn different representations and make different kinds of mistakes, so averaging their predictions lets each model partly correct the other's errors.
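The averaging step itself is small. The sketch below assumes an unweighted arithmetic mean of the two models' class-probability matrices, followed by picking the three best labels per clip. The helper names and the probability values are hypothetical; the label names come from the dataset's class list.

```python
import numpy as np


def average_predictions(*prob_matrices):
    """Unweighted mean of per-model probability matrices of equal shape."""
    return np.stack(prob_matrices).mean(axis=0)  # (n_models, n, c) -> (n, c)


def top_k_labels(probs, labels, k=3):
    """Return the k highest-scoring label names per sample, best first."""
    idx = np.argsort(-probs, axis=1)[:, :k]
    return [[labels[j] for j in row] for row in idx]


labels = ["Acoustic_guitar", "Applause", "Bark"]
p_1d = np.array([[0.5, 0.4, 0.1]])  # raw-wave model leans towards the wrong class
p_2d = np.array([[0.1, 0.8, 0.1]])  # Mel model is confident in "Applause"
avg = average_predictions(p_1d, p_2d)
ranked = top_k_labels(avg, labels)
```

Here the 1D model alone would rank "Acoustic_guitar" first, but the averaged scores put "Applause" on top, which illustrates how one model can outvote the other's mistake.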

Code to reproduce the results is available at: https://github.com/cvxtz/audio_classification
