  • Stars: 614
  • Rank: 73,061 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created: over 8 years ago
  • Updated: over 6 years ago

Repository Details

Music auto-tagging models and trained weights in keras/theano

Music Auto-Tagger

Music auto-tagger using keras

WARNING! Alternatives available

Because MusicTaggerCNN and MusicTaggerCRNN are based on an old (and slightly incorrect) implementation of Batch Normalization in old Keras (thank god it worked anyway), they are quite tricky to fix.

Keras Versions

  • use keras == 1.0.6 for MusicTaggerCNN.
  • use 1.0.6 < keras <= 1.2 for MusicTaggerCRNN.
  • use 1.1 <= keras <= 1.2 for compact_cnn (a version-check sketch follows this list).
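
For example, a hedged sanity check before loading a model; the bounds are copied from the list above (shown here for MusicTaggerCRNN, adjust for the others):

# A sketch: assert the installed Keras version matches the model
# you intend to load (bounds for MusicTaggerCRNN).
import keras
from distutils.version import LooseVersion as V

v = V(keras.__version__)
assert V('1.0.6') < v <= V('1.2'), "MusicTaggerCRNN needs 1.0.6 < keras <= 1.2"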

The prerequisite -- READ IT!

  • You need keras to run the example scripts.
    • To use your own audio file, you need librosa.
  • The input data shape is (None, channel, height, width), i.e. following the theano convention. If you're using tensorflow as your backend, check ~/.keras/keras.json to make sure image_dim_ordering is set to th (see the check after this list), i.e.
"image_dim_ordering": "th",
  • To use compact_cnn, you need to install Kapre.
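
A minimal check of the setting above (a sketch; valid for the Keras 1.x API, where the ordering is read via keras.backend):

# Confirm the theano dimension ordering is active before building the models.
from keras import backend as K

assert K.image_dim_ordering() == 'th', \
    'set "image_dim_ordering": "th" in ~/.keras/keras.json'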

Files (1)

For MusicTaggerCNN and MusicTaggerCRNN.

Files (2)

For compact_cnn.

Structures

Left: compact_cnn CNN, music_tagger_cnn. Right: music_tagger_crnn. (structure diagram)

MusicTaggerCNN

  • 5-layer 2D Convolutions
  • num_parameter: 865,950
  • AUC score of 0.8654
  • WARNING: with keras > 1.0.6, this model does not work properly. Please use MusicTaggerCRNN until it is updated! (FYI: a deeper ConvNet with 3M parameters showed 0.8595 AUC.)

MusicTaggerCRNN

  • 4-layer 2D Convolutions + 2 GRU
  • num_parameter: 396,786
  • AUC score: 0.8662

How was it trained?

The models were trained to predict these 50 tags:

['rock', 'pop', 'alternative', 'indie', 'electronic', 'female vocalists', 
'dance', '00s', 'alternative rock', 'jazz', 'beautiful', 'metal', 
'chillout', 'male vocalists', 'classic rock', 'soul', 'indie rock',
'Mellow', 'electronica', '80s', 'folk', '90s', 'chill', 'instrumental',
'punk', 'oldies', 'blues', 'hard rock', 'ambient', 'acoustic', 'experimental',
'female vocalist', 'guitar', 'Hip-Hop', '70s', 'party', 'country', 'easy listening',
'sexy', 'catchy', 'funk', 'electro' ,'heavy metal', 'Progressive rock',
'60s', 'rnb', 'indie pop', 'sad', 'House', 'happy']

Which is the better predictor?

  • UPDATE: for the most efficient computation, use compact_cnn. Otherwise, read below.
  • Training: MusicTaggerCNN is faster than MusicTaggerCRNN (in wall-clock time).
  • Prediction: they are more or less the same.
  • Memory usage: MusicTaggerCRNN has a smaller number of trainable parameters. You can even decrease the number of feature maps, and MusicTaggerCRNN still works quite well in that case; i.e., the current setting is a little bit rich (or redundant). With MusicTaggerCNN, you will see the performance decrease if you reduce the parameters.

Therefore, if you just want to use the pre-trained weights, use MusicTaggerCNN. If you want to train it yourself, it's up to you; in general I would use MusicTaggerCRNN after downsizing it to around 0.2M parameters (then the training time would be similar to MusicTaggerCNN's). To reduce the size, change the number of feature maps in the convolution layers.

Which is the better feature extractor?

By setting include_top=False, you can get a 256-dim (MusicTaggerCNN) or 32-dim (MusicTaggerCRNN) feature representation.
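
For example (a sketch, assuming the constructor and 'msd' weights shipped with this repo; melgrams stands for a preprocessed array of shape (n_songs, 1, 96, 1366) in theano ordering):

# Hedged feature-extraction sketch with MusicTaggerCRNN.
from music_tagger_crnn import MusicTaggerCRNN

model = MusicTaggerCRNN(weights='msd', include_top=False)
features = model.predict(melgrams)  # -> (n_songs, 32) feature vectors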

In general, I would recommend MusicTaggerCRNN and the 32-dim features: for predicting 50 tags, 256 features sounds a bit too large. I haven't looked into the 256-dim features, only the 32-dim ones. I thought about using PCA to reduce the dimension further, but ended up not applying it because mean(abs(recovered - original) / original) is .12 (dim: 32->16) and .05 (dim: 32->24), which doesn't seem good enough.

Probably the 256-dim features are redundant (in which case you can reduce them effectively with PCA), or they just include more information than the 32-dim ones (e.g., features at different hierarchical levels). If the dimension size doesn't matter, it's worth choosing the 256-dim ones.
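
The reconstruction check above can be reproduced roughly like this (a sketch; feats is assumed to be the (n_songs, 32) feature matrix from the previous section, and scikit-learn is an assumption, not a repo dependency):

# Hedged PCA reconstruction-error check.
import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=16).fit(feats)                     # try 16 or 24
recovered = pca.inverse_transform(pca.transform(feats))
err = np.mean(np.abs(recovered - feats) / np.abs(feats))  # ~.12 for 32->16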

Usage

$ python example_tagging.py
$ python example_feat_extract.py
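
Roughly, the tagging example boils down to the following (a hedged sketch; the melgram parameters (12 kHz, 96 mel bins, 29.12 s -> 1366 frames) and the module name are assumptions based on this README, so check your checkout):

# Hedged sketch of tagging one file with MusicTaggerCRNN and librosa.
import numpy as np
import librosa
from music_tagger_crnn import MusicTaggerCRNN

SR, N_FFT, HOP, N_MELS, DUR = 12000, 512, 256, 96, 29.12

def compute_melgram(path):
    y, _ = librosa.load(path, sr=SR, duration=DUR)
    n = int(SR * DUR)
    y = np.pad(y, (0, max(0, n - len(y))))[:n]              # pad/trim to 29.12 s
    S = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=N_FFT,
                                       hop_length=HOP, n_mels=N_MELS)
    return librosa.power_to_db(S)[np.newaxis, np.newaxis]   # (1, 1, 96, 1366)

model = MusicTaggerCRNN(weights='msd')
probs = model.predict(compute_melgram('data/bensound-cute.mp3'))[0]
top5 = np.argsort(probs)[::-1][:5]  # indices into the 50-tag list above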

Result

theano, MusicTaggerCRNN

data/bensound-cute.mp3
[('jazz', '0.444'), ('instrumental', '0.151'), ('folk', '0.103'), ('Hip-Hop', '0.103'), ('ambient', '0.077')]
[('guitar', '0.068'), ('rock', '0.058'), ('acoustic', '0.054'), ('experimental', '0.051'), ('electronic', '0.042')]

data/bensound-actionable.mp3
[('jazz', '0.416'), ('instrumental', '0.181'), ('Hip-Hop', '0.085'), ('folk', '0.085'), ('rock', '0.081')]
[('ambient', '0.068'), ('guitar', '0.062'), ('Progressive rock', '0.048'), ('experimental', '0.046'), ('acoustic', '0.046')]

data/bensound-dubstep.mp3
[('Hip-Hop', '0.245'), ('rock', '0.183'), ('alternative', '0.081'), ('electronic', '0.076'), ('alternative rock', '0.053')]
[('metal', '0.051'), ('indie', '0.028'), ('instrumental', '0.027'), ('electronica', '0.024'), ('hard rock', '0.023')]

data/bensound-thejazzpiano.mp3
[('jazz', '0.299'), ('instrumental', '0.174'), ('electronic', '0.089'), ('ambient', '0.061'), ('chillout', '0.052')]
[('rock', '0.044'), ('guitar', '0.044'), ('funk', '0.033'), ('chill', '0.032'), ('Progressive rock', '0.029')]

And...

Reproduce the experiment

  • A repo of the split settings, for an experimental setup identical to the two papers.
  • Audio files: find someone around you who happens to have the preview clips, or crawl the files yourself. I would recommend crawling your colleagues...

Credits

More Repositories

  1. kapre (Python, 922 stars): Keras Audio Preprocessors
  2. transfer_learning_music (Jupyter Notebook, 255 stars): Transfer learning for music classification and regression tasks
  3. dl4mir (Jupyter Notebook, 236 stars): Deep learning for MIR
  4. torchaudio-contrib (Python, 169 stars): A test bed for updates and new features | pytorch/audio
  5. lstm_real_book (Python, 130 stars): LSTM source code to generate jazz chord progressions
  6. DrummerNet (TeX, 123 stars): Supplementary material of "Deep Unsupervised Drum Transcription", ISMIR 2019
  7. LSTMetallica (Python, 107 stars): LSTM to generate drum tracks based on Metallica's MIDI drum tracks
  8. ismir-2019-posters (76 stars)
  9. residual_block_keras (Python, 72 stars): Residual network block in Keras
  10. magnatagatune-list (64 stars): List of automatic music tagging research articles that are evaluated against the MagnaTagATune Dataset
  11. keras_STFT_layer (Jupyter Notebook, 63 stars): Do STFT in Keras
  12. keras_callbacks_example (Python, 56 stars): Keras callback example
  13. MSD_split_for_tagging (Python, 52 stars)
  14. Auralisation (Python, 42 stars): Auralisation of learned features in CNN (for audio)
  15. awesome-audio-study-materials-for-korean (39 stars)
  16. music4all_contrib (Jupyter Notebook, 32 stars)
  17. data-science-handbook (Jupyter Notebook, 18 stars): Data Science Handbook
  18. perceptual_weighting (Python, 17 stars): Loudness compensation for time-frequency representation
  19. ismir2016-ldb-audio-captioning-model-keras (Python, 15 stars): Audio captioning RNN model in Keras
  20. keras_cropping_layer (Python, 13 stars): Keras cropping layer implementation
  21. icassp_2017 (12 stars)
  22. tokenizer-vs-tokenizer (11 stars)
  23. UrbanSound8K-preprocessing (Jupyter Notebook, 11 stars)
  24. frequency-aware-conv2d-layer-pytorch (Python, 9 stars)
  25. awesome-conscious-AIs (8 stars)
  26. machine_learning_eng2kor (4 stars): Machine learning eng2kor word dictionary
  27. openmic-2018-tfrecord (Python, 3 stars)
  28. FMA_convnet_features (3 stars): FMA convnet features
  29. magnatagatune (C++, 3 stars): yeah
  30. DLR (Python, 2 stars)
  31. MSD-to-MB-mapping (1 star): Million Song Dataset to MusicBrainz (AcousticBrainz) mapping files
  32. compact_cnn (1 star): a landing page for compact cnn
  33. embedding (C++, 1 star)