  • Stars: 907
  • Rank: 48,297 (top 1.0 %)
  • Language: Python
  • License: MIT License
  • Created: over 7 years ago
  • Updated: 6 months ago


Repository Details

kapre: Keras Audio Preprocessors

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, mel-spectrogram, and others on GPU in real time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and more consistent.
  • Your code and model have fewer dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D TensorFlow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error-prone - Kapre layers are tested against Librosa (stft, decibel, etc.), which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs beyond the default tf.signal implementation, such as:
    • A perfectly invertible STFT and InverseSTFT pair (see the round-trip sketch after this list)
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning
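
As a quick illustration of the invertible pair, here is a minimal round-trip sketch. It assumes a recent Kapre (0.3+), where STFT and InverseSTFT are importable from kapre; for exact reconstruction including the signal edges, see kapre.composed.get_perfectly_reconstructing_stft_istft, which handles the required padding.

import numpy as np
from tensorflow.keras.models import Sequential
from kapre import STFT, InverseSTFT

# STFT followed by InverseSTFT with matching parameters reconstructs the
# waveform up to edge effects.
n_fft, hop_length = 2048, 512
model = Sequential([
    STFT(n_fft=n_fft, hop_length=hop_length, input_shape=(44100, 1)),
    InverseSTFT(n_fft=n_fft, hop_length=hop_length),
])
x = np.random.uniform(-1, 1, (1, 44100, 1)).astype('float32')
x_hat = model.predict(x)  # approximately x, away from the edges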

Workflow with Kapre

  1. Preprocess your audio dataset: resample the audio to the target sampling rate and store the audio signals (waveforms). A minimal resampling sketch follows this list.
  2. In your ML model, add a Kapre layer, e.g., kapre.time_frequency.STFT(), as the first layer of the model.
  3. The data loader simply loads audio signals and feeds them into the model.
  4. In your hyperparameter search, include DSP parameters like n_fft to boost performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!
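
For step 1, a minimal preprocessing sketch using librosa (librosa is not a Kapre dependency, and the file paths here are hypothetical):

import librosa
import numpy as np

# Resample once, offline, and store the raw waveform as float32; the model
# itself then needs no DSP code at load time.
sr = 44100  # the one number to remember at deployment time
src, _ = librosa.load('audio/example.wav', sr=sr, mono=False)
# librosa returns (channels, time) for multichannel audio; transpose to
# (time, channels) for channels_last models.
np.save('dataset/example.npy', src.T.astype(np.float32))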

Installation

pip install kapre
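
For reproducibility, you can pin a specific release (the version below just illustrates the syntax; check PyPI for the one you want):

pip install kapre==0.3.7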

API Documentation

Please refer to the Kapre API documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# An example input: a 6-channel (!), 1-second audio signal sampled at 44.1 kHz.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6), matching the channels_last input shape
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!
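
As the comments above note, the STFT/Magnitude/MagnitudeToDecibel stack can be collapsed into a single composed layer. A sketch of the mel-spectrogram variant, reusing the imports and variables from the example above (parameter names follow kapre.composed; check the API docs for the full list and exact defaults):

melgram_layer = get_melspectrogram_layer(input_shape=input_shape,
                                         n_fft=2048,
                                         sample_rate=sr,
                                         n_mels=128,
                                         return_decibel=True)  # folds MagnitudeToDecibel in
model_mel = Sequential()
model_mel.add(melgram_layer)
# ... then add Conv2D etc. as above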

TFLite compatibility

The STFT layer is not TFLite compatible (due to tf.signal.stft). To create a TFLite-compatible model, first train using the normal Kapre layers, then create a new model in which STFT and Magnitude are replaced with STFTTflite and MagnitudeTflite. TFLite-compatible layers are restricted to a batch size of 1, which prevents their use during training.

# assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite
model_tflite = Sequential()

model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())  
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())

# load the trained weights into the tflite compatible model.
model_tflite.set_weights(model.get_weights())
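
From here, conversion uses the standard TensorFlow Lite converter; this is plain tf.lite, not a Kapre API.

import tensorflow as tf

# Convert the tflite-compatible model and write the flatbuffer to disk.
converter = tf.lite.TFLiteConverter.from_keras_model(model_tflite)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)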

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}

More Repositories

  1. music-auto_tagging-keras - Music auto-tagging models and trained weights in keras/theano (Python, 614 stars)
  2. transfer_learning_music - Transfer learning for music classification and regression tasks (Jupyter Notebook, 253 stars)
  3. dl4mir - Deep learning for MIR (Jupyter Notebook, 234 stars)
  4. torchaudio-contrib - A test bed for updates and new features | pytorch/audio (Python, 167 stars)
  5. lstm_real_book - LSTM source code to generate jazz chord progressions (Python, 128 stars)
  6. DrummerNet - Supplementary material of "Deep Unsupervised Drum Transcription", ISMIR 2019 (TeX, 111 stars)
  7. LSTMetallica - LSTM to generate drum tracks based on Metallica's MIDI drum tracks (Python, 105 stars)
  8. ismir-2019-posters (76 stars)
  9. residual_block_keras - Residual network block in Keras (Python, 72 stars)
  10. keras_STFT_layer - Do STFT in Keras (Jupyter Notebook, 63 stars)
  11. magnatagatune-list - List of automatic music tagging research articles that are evaluated against the MagnaTagATune dataset (62 stars)
  12. keras_callbacks_example - Keras callback example (Python, 56 stars)
  13. MSD_split_for_tagging (Python, 51 stars)
  14. awesome-audio-study-materials-for-korean (39 stars)
  15. Auralisation - Auralisation of learned features in CNNs (for audio) (Python, 39 stars)
  16. music4all_contrib (Jupyter Notebook, 31 stars)
  17. data-science-handbook - Data Science Handbook (Jupyter Notebook, 18 stars)
  18. perceptual_weighting - Loudness compensation for time-frequency representation (Python, 16 stars)
  19. ismir2016-ldb-audio-captioning-model-keras - Audio captioning RNN model in Keras (Python, 15 stars)
  20. keras_cropping_layer - Keras cropping layer implementation (Python, 13 stars)
  21. icassp_2017 (12 stars)
  22. UrbanSound8K-preprocessing (Jupyter Notebook, 11 stars)
  23. frequency-aware-conv2d-layer-pytorch (Python, 10 stars)
  24. awesome-conscious-AIs (8 stars)
  25. tokenizer-vs-tokenizer (7 stars)
  26. machine_learning_eng2kor - Machine learning English-to-Korean word dictionary (4 stars)
  27. openmic-2018-tfrecord (Python, 3 stars)
  28. FMA_convnet_features - FMA convnet features (3 stars)
  29. magnatagatune - yeah (C++, 3 stars)
  30. DLR (Python, 2 stars)
  31. MSD-to-MB-mapping - Million Song Dataset to MusicBrainz (AcousticBrainz) mapping files (1 star)
  32. embedding (C++, 1 star)
  33. compact_cnn - a landing page for compact cnn (1 star)