Audio transformations library for PyTorch

PyTorch Audio Augmentations

An audio data augmentation library for PyTorch that operates on audio in the time domain. The focus of this repository is to:

  • Provide many audio transformations in an easy Python interface.
  • Maintain high test coverage.
  • Easily control stochastic (sequential) audio transformations.
  • Make every audio transformation differentiable with PyTorch's nn.Module.
  • Optimise audio transformations for CPU and GPU.

It supports stochastic transformations of the kind often used in self-supervised and semi-supervised learning methods. One can apply a single stochastic augmentation, or create as many stochastically transformed versions of an audio example as needed, from a single interface.

This package follows the conventions set out by torchvision and torchaudio, with audio defined as a tensor of shape [channel, time], or as a batched representation of shape [batch, channel, time]. Each individual augmentation can be initialized on its own, or be wrapped in a RandomApply interface, which applies the augmentation with probability p.
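
As a rough illustration of the RandomApply behaviour described above, the following dependency-free sketch uses nested Python lists in place of [channel, time] tensors. RandomApplySketch and polarity_inversion are hypothetical stand-ins written for this example; they are not the library's implementation.

```python
import random


def polarity_inversion(waveform):
    """Flip the sign of every sample; waveform is indexed as [channel][time]."""
    return [[-x for x in channel] for channel in waveform]


class RandomApplySketch:
    """Hypothetical stand-in: run all wrapped transforms together with probability p."""

    def __init__(self, transforms, p=0.5, rng=None):
        self.transforms = transforms
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, waveform):
        # rng.random() lies in [0, 1), so p=1.0 always applies and p=0.0 never does.
        if self.rng.random() >= self.p:
            return waveform  # skipped: the input passes through unchanged
        for transform in self.transforms:
            waveform = transform(waveform)
        return waveform


waveform = [[0.1, -0.2, 0.3]]  # one channel, three samples
always = RandomApplySketch([polarity_inversion], p=1.0)
never = RandomApplySketch([polarity_inversion], p=0.0)

inverted = always(waveform)
unchanged = never(waveform)
```

With p=1.0 the wrapped transform always runs and with p=0.0 it never does; values in between apply the whole wrapped list as a unit on a random fraction of calls.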

Usage

We can define a single or several audio augmentations, which are applied sequentially to an audio waveform.

import torchaudio
from audio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.001, max_snr=0.005)], p=0.3),
    RandomApply([Gain()], p=0.2),
    HighLowPass(sample_rate=sr), # this augmentation will always be applied in this augmentation chain!
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can also define a stochastic augmentation over multiple transformations. The following will apply both polarity inversion and white noise with a probability of 80%, gain with a probability of 20%, and both delay and reverb with a probability of 50%:

transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion(), Noise(min_snr=0.001, max_snr=0.005)], p=0.8),
    RandomApply([Gain()], p=0.2),
    RandomApply([Delay(sample_rate=sr), Reverb(sample_rate=sr)], p=0.5)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
# transformed_audio.shape == [num_channels, num_samples]

audio, sr = torchaudio.load("tests/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
# transformed_audio.shape == [4, num_channels, num_samples]
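
The difference between the two output shapes can be sketched without the library. ComposeSketch, ComposeManySketch, random_crop, random_gain and shape below are hypothetical stand-ins, and nested lists stand in for tensors, so this only mirrors the shapes shown above and should not be read as the actual implementation.

```python
import random

rng = random.Random(0)  # seeded only to make the sketch repeatable


def random_crop(n_samples):
    """Return a transform that keeps a random window of n_samples per channel."""
    def crop(waveform):
        start = rng.randint(0, len(waveform[0]) - n_samples)
        return [channel[start:start + n_samples] for channel in waveform]
    return crop


def random_gain(waveform):
    """Scale all samples by one random factor between 0.5 and 1.0."""
    factor = rng.uniform(0.5, 1.0)
    return [[factor * x for x in channel] for channel in waveform]


class ComposeSketch:
    """Apply the transforms one after another and return a single result."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, waveform):
        for transform in self.transforms:
            waveform = transform(waveform)
        return waveform


class ComposeManySketch:
    """Run the same chain several times and collect the independent results."""

    def __init__(self, transforms, num_augmented_samples):
        self.chain = ComposeSketch(transforms)
        self.num_augmented_samples = num_augmented_samples

    def __call__(self, waveform):
        return [self.chain(waveform) for _ in range(self.num_augmented_samples)]


def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Two channels of 100 samples each.
audio = [[0.01 * i for i in range(100)], [-0.01 * i for i in range(100)]]
transforms = [random_crop(n_samples=40), random_gain]

single = ComposeSketch(transforms)(audio)
many = ComposeManySketch(transforms, num_augmented_samples=4)(audio)
print(shape(single))  # [num_channels, num_samples]
print(shape(many))    # [num_augmented_samples, num_channels, num_samples]
```

Because every call to the chain draws fresh random numbers, each of the collected versions is cropped and scaled independently.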

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to torchaudio dataloaders that accept transform=.
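
A minimal sketch of that pattern follows, assuming a dataset that takes a transform argument. WaveformDataset and invert are hypothetical names for this example. The class implements only __len__ and __getitem__, the two methods a map-style dataset needs; in real code it would typically subclass torch.utils.data.Dataset and receive a Compose or ComposeMany instance as transform.

```python
class WaveformDataset:
    """Hypothetical map-style dataset that applies an optional transform per item."""

    def __init__(self, waveforms, labels, transform=None):
        self.waveforms = waveforms
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, index):
        waveform = self.waveforms[index]
        if self.transform is not None:
            # The transform runs each time an item is fetched, so stochastic
            # augmentations can differ from one epoch to the next.
            waveform = self.transform(waveform)
        return waveform, self.labels[index]


def invert(waveform):
    """Placeholder transform: flip the sign of every sample."""
    return [[-x for x in channel] for channel in waveform]


waveforms = [[[0.5, -0.5]], [[0.25, 0.75]]]  # two mono clips, two samples each
labels = ["classical", "jazz"]

dataset = WaveformDataset(waveforms, labels, transform=invert)
first_waveform, first_label = dataset[0]
```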

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}
