  • Language: Python
  • License: Apache License 2.0
  • Created about 4 years ago
  • Updated 3 months ago

auraloss

A collection of audio-focused loss functions in PyTorch.

[PDF]

Setup

pip install auraloss

To use MelSTFTLoss() or FIRFilter(), install the optional extras, which pull in librosa and scipy.

pip install auraloss[all]

Usage

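import torch
import auraloss

# multi-resolution STFT loss with the default settings
mrstft = auraloss.freq.MultiResolutionSTFTLoss()

# random stand-ins for audio, shaped (batch, channels, samples)
pred = torch.rand(8, 1, 44100)
target = torch.rand(8, 1, 44100)

loss = mrstft(pred, target)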

NEW: Perceptual weighting with mel-scaled spectrograms.

bs = 8
chs = 1
seq_len = 131072
sample_rate = 44100

# some audio you want to compare
target = torch.rand(bs, chs, seq_len)
pred = torch.rand(bs, chs, seq_len)

# define the loss function
loss_fn = auraloss.freq.MultiResolutionSTFTLoss(
    fft_sizes=[1024, 2048, 8192],
    hop_sizes=[256, 512, 2048],
    win_lengths=[1024, 2048, 8192],
    scale="mel",
    n_bins=128,
    sample_rate=sample_rate,
    perceptual_weighting=True,
)

# compute
loss = loss_fn(pred, target)
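
To get a feel for what the three resolutions above trade off, the following sketch computes the frame count and linear bin spacing for each fft_sizes/hop_sizes pair. It assumes a centered STFT, which is the default behaviour of torch.stft. The helper name stft_resolution is hypothetical and not part of auraloss. With scale="mel", the linear bins would presumably then be pooled into n_bins mel bands.

```python
def stft_resolution(seq_len, sample_rate, fft_size, hop_size):
    """Frame count and bin spacing for one STFT resolution.

    Assumption: the STFT is centered (the signal is padded by
    fft_size // 2 on each side), so the frame count depends only
    on the hop size.
    """
    n_frames = 1 + seq_len // hop_size
    n_bins = fft_size // 2 + 1                 # one-sided spectrum
    bin_hz = sample_rate / fft_size            # spacing between linear bins
    hop_ms = 1000.0 * hop_size / sample_rate   # time step between frames
    return n_frames, n_bins, bin_hz, hop_ms


seq_len, sample_rate = 131072, 44100
for fft_size, hop_size in zip([1024, 2048, 8192], [256, 512, 2048]):
    n_frames, n_bins, bin_hz, hop_ms = stft_resolution(
        seq_len, sample_rate, fft_size, hop_size
    )
    print(
        f"fft={fft_size:5d} frames={n_frames:4d} bins={n_bins:5d} "
        f"bin spacing={bin_hz:6.2f} Hz hop={hop_ms:6.2f} ms"
    )
```

Small FFT sizes give fine time steps but coarse frequency bins, and large FFT sizes give the opposite, which is the motivation for combining several resolutions in one loss.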

Citation

If you use this code in your work, please consider citing us.

@inproceedings{steinmetz2020auraloss,
    title={auraloss: {A}udio focused loss functions in {PyTorch}},
    author={Steinmetz, Christian J. and Reiss, Joshua D.},
    booktitle={Digital Music Research Network One-day Workshop (DMRN+15)},
    year={2020}
}

Loss functions

We categorize the loss functions as either time-domain or frequency-domain approaches. Additionally, we include perceptual transforms.
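
As a rough illustration of the time-domain family, here is a plain-Python sketch of two of the metrics listed below, following their usual definitions. The function names esr and si_sdr_db are hypothetical. The library's classes operate on batched tensors and may differ in details such as mean removal, epsilon handling, sign, and reduction.

```python
import math


def esr(pred, target, eps=1e-8):
    """Error-to-signal ratio: error energy divided by target energy."""
    error_energy = sum((t - p) ** 2 for p, t in zip(pred, target))
    target_energy = sum(t ** 2 for t in target)
    return error_energy / (target_energy + eps)


def si_sdr_db(pred, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Project the prediction onto the target to find the optimal gain.
    dot = sum(p * t for p, t in zip(pred, target))
    target_energy = sum(t ** 2 for t in target)
    alpha = dot / (target_energy + eps)
    scaled = [alpha * t for t in target]
    signal = sum(s ** 2 for s in scaled)
    noise = sum((p - s) ** 2 for p, s in zip(pred, scaled))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


# Toy signals: a sine target and a prediction with an additive disturbance.
target = [math.sin(0.1 * n) for n in range(100)]
noisy = [t + 0.1 * math.cos(0.37 * n) for n, t in enumerate(target)]

print("ESR:", esr(noisy, target))
print("SI-SDR (dB):", si_sdr_db(noisy, target))
# Rescaling the prediction leaves SI-SDR essentially unchanged.
print("SI-SDR, 2x gain (dB):", si_sdr_db([2.0 * x for x in noisy], target))
```

To use a ratio such as SI-SDR as a training loss, it would typically be negated so that lower values are better.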

Loss function                                        Interface                                 Reference

Time domain
Error-to-signal ratio (ESR)                          auraloss.time.ESRLoss()                   Wright & Välimäki, 2019
DC error (DC)                                        auraloss.time.DCLoss()                    Wright & Välimäki, 2019
Log hyperbolic cosine (Log-cosh)                     auraloss.time.LogCoshLoss()               Chen et al., 2019
Signal-to-noise ratio (SNR)                          auraloss.time.SNRLoss()
Scale-invariant signal-to-distortion ratio (SI-SDR)  auraloss.time.SISDRLoss()                 Le Roux et al., 2018
Scale-dependent signal-to-distortion ratio (SD-SDR)  auraloss.time.SDSDRLoss()                 Le Roux et al., 2018

Frequency domain
Aggregate STFT                                       auraloss.freq.STFTLoss()                  Arik et al., 2018
Aggregate Mel-scaled STFT                            auraloss.freq.MelSTFTLoss(sample_rate)
Multi-resolution STFT                                auraloss.freq.MultiResolutionSTFTLoss()   Yamamoto et al., 2019*
Random-resolution STFT                               auraloss.freq.RandomResolutionSTFTLoss()  Steinmetz & Reiss, 2020
Sum and difference STFT loss                         auraloss.freq.SumAndDifferenceSTFTLoss()  Steinmetz et al., 2020

Perceptual transforms
Sum and difference signal transform                  auraloss.perceptual.SumAndDifference()
FIR pre-emphasis filters                             auraloss.perceptual.FIRFilter()           Wright & Välimäki, 2019

* Wang et al., 2019 also propose a multi-resolution spectral loss (that Engel et al., 2020 follow), but they do not include both the log magnitude (L1 distance) and spectral convergence terms, introduced in Arik et al., 2018, and then extended for the multi-resolution case in Yamamoto et al., 2019.

Examples

Currently we include one example, which uses a set of the loss functions to train a TCN that models an analog dynamic range compressor. For details, see examples/compressor. We provide pre-trained models, evaluation scripts to compute the metrics in the paper, and scripts to retrain the models.

The STFTLoss class also supports more advanced configurations. For example, you can compute both linear and log-scaled STFT errors, as in Engel et al., 2020. In this case the spectral convergence term is left out by setting w_sc=0.0.

stft_loss = auraloss.freq.STFTLoss(
    w_log_mag=1.0, 
    w_lin_mag=1.0, 
    w_sc=0.0,
)
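
To make the role of the three weights concrete, the sketch below computes the spectral convergence, log-magnitude, and linear-magnitude terms on small magnitude spectrograms and combines them as a weighted sum. It follows the usual definitions (spectral convergence as a Frobenius-norm ratio, the other two as mean L1 distances). The function names stft_terms and weighted_stft_loss are hypothetical, and the library's actual implementation may differ in epsilon handling, reduction, and any additional terms.

```python
import math


def stft_terms(pred_mag, target_mag, eps=1e-8):
    """Return (spectral convergence, log-magnitude L1, linear-magnitude L1).

    pred_mag and target_mag are lists of frames, each a list of
    non-negative bin magnitudes with matching shapes.
    """
    p = [v for frame in pred_mag for v in frame]
    t = [v for frame in target_mag for v in frame]
    # Spectral convergence: Frobenius norm of the error relative to the target.
    err_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))
    tgt_norm = math.sqrt(sum(b ** 2 for b in t))
    sc = err_norm / (tgt_norm + eps)
    # Mean absolute difference of log magnitudes and of linear magnitudes.
    log_mag = sum(abs(math.log(a + eps) - math.log(b + eps))
                  for a, b in zip(p, t)) / len(t)
    lin_mag = sum(abs(a - b) for a, b in zip(p, t)) / len(t)
    return sc, log_mag, lin_mag


def weighted_stft_loss(pred_mag, target_mag, w_sc, w_log_mag, w_lin_mag):
    """Weighted sum of the three terms (this sketch's own combination rule)."""
    sc, log_mag, lin_mag = stft_terms(pred_mag, target_mag)
    return w_sc * sc + w_log_mag * log_mag + w_lin_mag * lin_mag


# Two frames with three bins each; the prediction is twice too loud.
target_mag = [[1.0, 0.5, 0.25], [0.8, 0.4, 0.2]]
pred_mag = [[2.0 * v for v in frame] for frame in target_mag]

# Same weights as the configuration above: no spectral convergence term.
print(weighted_stft_loss(pred_mag, target_mag,
                         w_sc=0.0, w_log_mag=1.0, w_lin_mag=1.0))
```

The log-magnitude term reacts to relative errors (a factor of two costs the same in quiet and loud bins), while the linear term is dominated by the high-energy bins, which is one reason to combine them.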

There is also a Mel-scaled STFT loss, which has some special requirements: you must set the sample rate and specify the device on which the tensors being compared reside.

sample_rate = 44100
melstft_loss = auraloss.freq.MelSTFTLoss(sample_rate, device="cuda")

You can also easily build a multi-resolution Mel-scaled STFT loss with 64 bins. Make sure you pass the device on which the tensors you are comparing reside.

loss_fn = auraloss.freq.MultiResolutionSTFTLoss(
    scale="mel", 
    n_bins=64,
    sample_rate=sample_rate,
    device="cuda"
)

If you are computing a loss on stereo audio, you may want to consider the sum and difference (mid/side) loss. The example below uses this loss function with perceptual weighting and mel scaling for further perceptual relevance.

target = torch.rand(8, 2, 44100)
pred = torch.rand(8, 2, 44100)

loss_fn = auraloss.freq.SumAndDifferenceSTFTLoss(
    fft_sizes=[1024, 2048, 8192],
    hop_sizes=[256, 512, 2048],
    win_lengths=[1024, 2048, 8192],
    perceptual_weighting=True,
    sample_rate=44100,
    scale="mel",
    n_bins=128,
)

loss = loss_fn(pred, target)
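
For intuition about what a sum and difference (mid/side) representation captures, here is a minimal plain-Python sketch of the transform on a single stereo pair. The function name sum_and_difference is hypothetical, and the unscaled form (no division by 2 or by the square root of 2) is an assumption of this sketch; conventions vary.

```python
def sum_and_difference(left, right):
    """Convert left/right channels to sum (mid) and difference (side) signals."""
    sum_sig = [l + r for l, r in zip(left, right)]
    diff_sig = [l - r for l, r in zip(left, right)]
    return sum_sig, diff_sig


# A centered (mono-compatible) source has identical channels,
# so all of its energy lands in the sum signal.
centered_sum, centered_diff = sum_and_difference([0.5, -0.2, 0.1], [0.5, -0.2, 0.1])
print(centered_sum, centered_diff)

# A hard-panned source appears equally in the sum and the difference.
panned_sum, panned_diff = sum_and_difference([0.5, -0.2, 0.1], [0.0, 0.0, 0.0])
print(panned_sum, panned_diff)
```

Comparing prediction and target in this representation penalizes errors in the stereo image directly, rather than treating the left and right channels independently.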

Development

Run tests locally with pytest.

python -m pytest
