Benchmark popular audio i/o packages

Python Audio-Loading Benchmark

The aim of this repository is to evaluate the loading performance of various audio I/O packages interfaced from Python.

This is relevant for machine learning models, which today often process raw (time-domain) audio and assemble batches on the fly. It is therefore important to load the audio as fast as possible. At the same time, a library should ideally support a variety of uncompressed and compressed audio formats and be capable of loading only chunks of audio (seeking). The latter is especially important for models that cannot easily work with samples of variable length (convnets).
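To illustrate what loading only a chunk means in practice, here is a minimal sketch that seeks into a WAV file and decodes a fixed-length excerpt. It uses Python's standard-library `wave` module, which is not one of the benchmarked libraries, so that it runs without extra dependencies. The helper names `write_ramp_wav` and `read_excerpt`, the file name and the 8 kHz sample rate are assumptions made for this example.

```python
import array
import os
import tempfile
import wave


def write_ramp_wav(path, num_frames, sample_rate=8000):
    """Write a mono 16-bit PCM WAV whose samples count upward (0, 1, 2, ...)."""
    samples = array.array("h", (i % 32768 for i in range(num_frames)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())


def read_excerpt(path, start_frame, num_frames):
    """Seek to start_frame and decode only num_frames frames."""
    with wave.open(path, "rb") as wav:
        wav.setpos(start_frame)  # jump without decoding the preceding audio
        raw = wav.readframes(num_frames)
    excerpt = array.array("h")
    excerpt.frombytes(raw)
    return excerpt


# Demo: take a 256-frame excerpt starting at frame 1000 of a one-second file.
with tempfile.TemporaryDirectory() as tmp_dir:
    wav_path = os.path.join(tmp_dir, "ramp.wav")  # hypothetical file name
    write_ramp_wav(wav_path, num_frames=8000)
    excerpt = read_excerpt(wav_path, start_frame=1000, num_frames=256)

print(len(excerpt), list(excerpt[:3]))
```

Every excerpt has the same length regardless of the file's duration, which is what fixed-size model inputs need.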

Tested Libraries

| Library | Version | Short-Name/Code | Out Type | Supported codecs | Excerpts/Seeking |
|---|---|---|---|---|---|
| scipy.io.wavfile | 1.7.1 | scipy | Numpy | PCM (only 16 bit) | ❌ |
| scipy.io.wavfile memmap | 1.7.1 | scipy_mmap | Numpy | PCM (only 16 bit) | ✅ |
| soundfile (libsndfile) | 0.10.0 | soundfile | Numpy | PCM, Ogg, Flac | ✅ |
| pydub | 0.25.1 | pydub | Python Array | PCM, MP3, OGG or other FFMPEG/libav supported codec | ❌ |
| aubio | 0.4.9 | aubio | Numpy Array | PCM, MP3, OGG or other avconv supported codec | ✅ |
| audioread (FFMPEG) | 2.1.9 | ar_ffmpeg | Numpy Array | all of FFMPEG | ❌ |
| librosa | 0.8.1 | librosa | Numpy Array | relies on audioread | ✅ |
| tensorflow tf.io.audio.decode_wav | 2.6.0 | tf_decode_wav | Tensorflow Tensor | PCM (only 16 bit) | ❌ |
| tensorflow-io from_audio | 0.20.0 | tfio_fromaudio | Tensorflow Tensor | PCM, Ogg, Flac | ✅ |
| torchaudio (sox_io) | 0.9.0 | torchaudio | PyTorch Tensor | all codecs supported by Sox | ✅ |
| torchaudio (soundfile) | 0.9.0 | torchaudio | PyTorch Tensor | all codecs supported by Soundfile | ✅ |
| soxbindings | 0.9.0 | soxbindings | Numpy Tensor | all codecs supported by Soundfile | ✅ |
| stempeg | 0.2.3 | stempeg | Numpy Tensor | all codecs supported by FFMPEG | ✅ |

Not included

Results

The benchmark loads a number of (single-channel) audio files of different lengths (between 1 and 151 seconds) and measures the time until the audio is converted to a tensor. Depending on the target tensor type (either numpy, pytorch or tensorflow), a different number of libraries was compared. For example, when a library's output type is numpy and the target tensor type is tensorflow, the loading time includes the cast operation to the target tensor. Furthermore, multiprocessing was disabled for data loaders, so especially for deep learning applications the loading speed doesn't necessarily represent the batch loading speed.
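The measurement idea described above can be sketched as a small timing loop. This is not the repository's benchmark code. It is a minimal illustration, assuming a loader is any callable that takes a path and returns the decoded samples. The names `load_wav` and `time_loader`, the file names and the use of the standard-library `wave` module are assumptions for this example. A real loader would also include the conversion to the target tensor type inside the timed call.

```python
import array
import os
import tempfile
import time
import wave


def write_silence(path, seconds, sample_rate=8000):
    """Write a mono 16-bit PCM WAV containing silence."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(2 * seconds * sample_rate))


def load_wav(path):
    """Example loader: decode the whole file into an array of int16 samples."""
    with wave.open(path, "rb") as wav:
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    return samples


def time_loader(loader, paths, repeats=3):
    """Return the mean wall-clock time in seconds per loaded file."""
    timings = []
    for _ in range(repeats):
        for path in paths:
            start = time.perf_counter()
            loader(path)  # decoding and conversion both count
            timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Demo: time the example loader on two short files, single process only.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for seconds in (1, 2):
        path = os.path.join(tmp_dir, f"silence_{seconds}s.wav")
        write_silence(path, seconds)
        paths.append(path)
    mean_seconds = time_loader(load_wav, paths)

print(f"mean loading time: {mean_seconds:.6f} s")
```

Because everything runs in a single process, the number reflects per-file latency, not the throughput of a multi-worker data loader.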

All results shown below depict loading time **in seconds**.

Load to Numpy Tensor

Load to PyTorch Tensor

Load to Tensorflow Tensor

Getting metadata information

In addition to loading the file, one might also be interested in extracting metadata. To benchmark this, we requested the sampling rate, channels, samples, and duration for every file. Each value is requested in a separate, consecutive call, which means a library is not allowed to open the file once and extract all metadata together. Note that we excluded pydub from the metadata benchmark results because it was significantly slower than the other tools.
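The "consecutive calls" rule can be sketched as follows: every metadata field gets its own function, and each function opens the file again. The example uses the standard-library `wave` module, not one of the benchmarked libraries. The function names and the test file are assumptions made for illustration.

```python
import os
import tempfile
import wave


def get_sampling_rate(path):
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def get_channels(path):
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def get_samples(path):
    with wave.open(path, "rb") as wav:
        return wav.getnframes()


def get_duration(path):
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo: a two-second mono file at 8 kHz, queried with four separate calls.
with tempfile.TemporaryDirectory() as tmp_dir:
    wav_path = os.path.join(tmp_dir, "example.wav")  # hypothetical file name
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(bytes(2 * 16000))  # 16000 frames of 16-bit silence

    metadata = {
        "sampling_rate": get_sampling_rate(wav_path),
        "channels": get_channels(wav_path),
        "samples": get_samples(wav_path),
        "duration": get_duration(wav_path),
    }

print(metadata)
```

Timing the four calls together therefore charges the cost of opening and parsing the header four times per file.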

Running the Benchmark

Generate sample data

To test the loading speed, we generate random (noise) audio data of different durations and encode it as PCM 16-bit WAV, MP3 CBR, or MP4. The data is generated by a shell script. To generate the data in the folder AUDIO, run

generate_audio.sh
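For readers who only want to see what the WAV part of this step produces, the following sketch writes noise files of different durations as PCM 16-bit WAV using only the Python standard library. It is not a port of generate_audio.sh and does not cover the MP3 or MP4 encodings. The function name `write_noise_wav`, the file naming scheme, the 8 kHz sample rate and the temporary output folder are assumptions chosen to keep the example small and fast.

```python
import array
import os
import random
import tempfile
import wave


def write_noise_wav(path, duration_seconds, sample_rate=8000, seed=0):
    """Write uniform white noise as a mono 16-bit PCM WAV; return frame count."""
    rng = random.Random(seed)  # fixed seed so the files are reproducible
    num_frames = int(duration_seconds * sample_rate)
    samples = array.array(
        "h", (rng.randint(-32768, 32767) for _ in range(num_frames))
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
    return num_frames


# Demo: one file per duration, then read the frame counts back from disk.
written = {}
with tempfile.TemporaryDirectory() as out_dir:  # stand-in for the AUDIO folder
    for seconds in (1, 2):
        name = f"noise_{seconds}s.wav"  # hypothetical naming scheme
        write_noise_wav(os.path.join(out_dir, name), seconds)
        with wave.open(os.path.join(out_dir, name), "rb") as wav:
            written[name] = wav.getnframes()

print(written)
```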

Setting up using Docker

Build the Docker image using

docker build -t audio_benchmark .

It installs all the package requirements for all audio libraries. Afterwards, mount the data directory into the docker container and run run.sh inside the container, e.g.:

docker run -v /home/user/repos/python_audio_loading_benchmark/:/app \
    -it audio_benchmark:latest /bin/bash run.sh

Setting up in a virtual environment

Create a virtual environment and install the necessary dependencies with

virtualenv --python=/usr/bin/python3 _env
source _env/bin/activate
pip install -r requirements.txt
pip install git+https://github.com/pytorch/audio.git

Benchmarking

Run the benchmark with

bash run.sh

and plot the result with

python plot.py

This generates PNG files in the results folder.

Authors

@faroit, @hagenw

Contribution

We encourage interested users to contribute to this repository in the issue section and via pull requests. Particularly interesting are notifications of new tools and new versions of existing packages. Since benchmarks are subjective, I (@faroit) will rerun the benchmark on our server.
