

A test bed for updates and new features | pytorch/audio

SUNSETTING torchaudio-contrib

We made good progress in this repo and contributed the results to the original repo, pytorch/audio, which we are satisfied with. We have happily stopped working here :) Please visit pytorch/audio for any issues or requests!

(The original content is kept below.)

torchaudio-contrib

Goal: To propose audio processing PyTorch code with nice, easy-to-use APIs and functionality.

👐 This should be seen as a community-based proposal and the basis for a discussion within the PyTorch audio user community. Everyone is welcome to join and discuss.

Our motivation is:

  • API design: clear, readable names for classes/functions/arguments, plus sensible default values and shapes.
  • Fast processing on GPU
  • Methodology: both layers and functionals
    • Layers (nn.Module) for reusability and ease of use
    • and an identical implementation as functionals
  • Simple installation
  • Multi-channel support

Contribution

Making things quicker and open! We're a -contrib repo, so it's easy to get in but hard to graduate.

  1. Open a new issue for a potential PR.
  2. Until it's in good shape:
    1. Make a PR that follows the current conventions and includes unit tests.
    2. Review and merge.
  3. Based on it, make a PR to pytorch/audio.

Discussion on how to contribute - #37

Current issues/future work

  • Better module/sub-module hierarchy
  • Complex number support
  • More time-frequency representations
  • Signal processing modules, e.g., vocoder
  • Augmentation

API suggestions

Notes

  • Audio signals can be multi-channel
  • STFT: short-time Fourier transform, outputting a complex-valued representation
  • Spectrogram: magnitude of the STFT
  • Melspectrogram: mel filterbank applied to a spectrogram

Shapes

  • audio signals: (batch, channel, time)
    • E.g., STFT input shape
    • Based on torch.stft input shape
  • 2D representations: (batch, channel, freq, time)
    • E.g., STFT output shape
    • Channel-first, following torch convention.
    • Then, (freq, time), following torch.stft

Overview

STFT

class STFT(fft_len=2048, hop_len=None, frame_len=None, window=None, pad=0, pad_mode="reflect", **kwargs)
def stft(signal, fft_len, hop_len, window, pad=0, pad_mode="reflect", **kwargs)
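A minimal sketch of how the STFT layer and its functional could share one implementation while following the shape convention above ((batch, channel, time) in, (batch, channel, freq, time) out). Assumptions, not taken from the original code: the defaults hop_len = fft_len // 4 and frame_len = fft_len are inferred from the printed repr further down (2048/512/2048), the Hann window and center=False are choices made here so that pad/pad_mode are the only padding, native complex tensors are returned, and **kwargs is dropped.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def stft(signal, fft_len, hop_len, window, pad=0, pad_mode="reflect"):
    """(batch, channel, time) -> complex (batch, channel, freq, time)."""
    if pad > 0:
        # Explicit padding on the time axis; F.pad's reflect mode accepts 3D input.
        signal = F.pad(signal, (pad, pad), mode=pad_mode)
    batch, channel, time = signal.shape
    # torch.stft expects (batch, time), so fold channels into the batch axis.
    spec = torch.stft(
        signal.reshape(batch * channel, time),
        n_fft=fft_len,
        hop_length=hop_len,
        win_length=window.shape[0],
        window=window,
        center=False,
        return_complex=True,
    )
    # Unfold back to (batch, channel, freq, time), with freq = fft_len // 2 + 1.
    return spec.reshape(batch, channel, spec.shape[-2], spec.shape[-1])


class STFT(nn.Module):
    """Layer that stores the parameters and delegates to the functional."""

    def __init__(self, fft_len=2048, hop_len=None, frame_len=None, window=None,
                 pad=0, pad_mode="reflect"):
        super().__init__()
        self.fft_len = fft_len
        # Assumed defaults, matching the repr STFT(fft_len=2048, hop_len=512, frame_len=2048).
        self.hop_len = hop_len if hop_len is not None else fft_len // 4
        self.frame_len = frame_len if frame_len is not None else fft_len
        if window is None:
            window = torch.hann_window(self.frame_len)
        # A buffer moves with .to(device) together with the module.
        self.register_buffer("window", window)
        self.pad = pad
        self.pad_mode = pad_mode

    def forward(self, signal):
        return stft(signal, self.fft_len, self.hop_len, self.window,
                    self.pad, self.pad_mode)
```

Because the layer only forwards to the functional, both paths give identical results, which is the "layer and functional" pairing listed in the motivation.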

MelFilterbank

class MelFilterbank(num_bands=128, sample_rate=16000, min_freq=0.0, max_freq=None, num_bins=1025, htk=False)
def create_mel_filter(num_bands, sample_rate, min_freq, max_freq, num_bins, to_hertz, from_hertz)
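A rough sketch of create_mel_filter as a bank of triangular filters. It assumes to_hertz maps mel to Hz and from_hertz maps Hz to mel, that the result is laid out as (num_bins, num_bands), and it only provides the HTK mel formula (i.e. the htk=True case); the helper names hertz_to_mel_htk and mel_to_hertz_htk are hypothetical, and the non-HTK variant is not sketched.

```python
import torch


def hertz_to_mel_htk(freq):
    # HTK mel scale: 2595 * log10(1 + f / 700).
    return 2595.0 * torch.log10(1.0 + freq / 700.0)


def mel_to_hertz_htk(mel):
    # Inverse of the HTK mel scale.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def create_mel_filter(num_bands, sample_rate, min_freq, max_freq, num_bins,
                      to_hertz, from_hertz):
    """Triangular mel filterbank of shape (num_bins, num_bands)."""
    # Frequency of each STFT bin, from 0 Hz up to Nyquist.
    bin_freqs = torch.linspace(0.0, sample_rate / 2.0, num_bins)
    # num_bands + 2 points equally spaced on the mel scale give band edges and centers.
    mel_min = from_hertz(torch.tensor(float(min_freq))).item()
    mel_max = from_hertz(torch.tensor(float(max_freq))).item()
    hz_points = to_hertz(torch.linspace(mel_min, mel_max, num_bands + 2))
    lower, center, upper = hz_points[:-2], hz_points[1:-1], hz_points[2:]
    # Evaluate the rising and falling slope of every triangle at every bin.
    f = bin_freqs.unsqueeze(1)  # (num_bins, 1), broadcasts against (num_bands,)
    rising = (f - lower) / (center - lower)
    falling = (upper - f) / (upper - center)
    return torch.clamp(torch.minimum(rising, falling), min=0.0)
```

With the defaults above (num_bins=1025, i.e. fft_len=2048), each column is one mel band that can be multiplied against the frequency axis of a spectrogram.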

Spectrogram

def Spectrogram(fft_len=2048, hop_len=None, frame_len=None, window=None, pad=0, pad_mode="reflect", power=1., **kwargs)

Creates an nn.Sequential:

Sequential(
  (0): STFT(fft_len=2048, hop_len=512, frame_len=2048)
  (1): ComplexNorm(power=1.0)
)
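The ComplexNorm stage is what turns the complex STFT into a real spectrogram. A minimal sketch, assuming the STFT output is a native complex tensor (the original code predates complex support in PyTorch and may have used a trailing real/imaginary dimension instead):

```python
import torch
import torch.nn as nn


class ComplexNorm(nn.Module):
    """Magnitude of a complex tensor raised to `power`."""

    def __init__(self, power=1.0):
        super().__init__()
        self.power = power

    def extra_repr(self):
        return f"power={self.power}"

    def forward(self, stft_complex):
        # |X| ** power keeps the (batch, channel, freq, time) shape but is real-valued.
        return stft_complex.abs().pow(self.power)
```

power=1.0 gives a magnitude spectrogram and power=2.0 a power spectrogram; an STFT layer followed by this module inside nn.Sequential reproduces the structure printed above.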

Melspectrogram

def Melspectrogram(num_bands=128, sample_rate=16000, min_freq=0.0, max_freq=None, num_bins=None, htk=False, mel_filterbank=None, **kwargs)

Creates an nn.Sequential:

Sequential(
  (0): STFT(fft_len=2048, hop_len=512, frame_len=2048)
  (1): ComplexNorm(power=2.0)
  (2): ApplyFilterbank()
)
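The last stage projects the frequency axis onto the mel bands. A sketch of what ApplyFilterbank could do, assuming the filterbank is stored as a (freq, bands) matrix and the input is the (batch, channel, freq, time) power spectrogram produced by ComplexNorm(power=2.0):

```python
import torch
import torch.nn as nn


class ApplyFilterbank(nn.Module):
    """(batch, channel, freq, time) -> (batch, channel, bands, time)."""

    def __init__(self, filterbank):
        super().__init__()
        # filterbank: (freq, bands), e.g. a mel filter matrix.
        self.register_buffer("filterbank", filterbank)

    def forward(self, spec):
        # Move freq last, multiply by (freq, bands), then put bands back where freq was.
        return torch.matmul(spec.transpose(-2, -1), self.filterbank).transpose(-2, -1)
```

Using matmul keeps the operation batched over the batch and channel axes, so multi-channel input needs no special handling.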

AmplitudeToDb/amplitude_to_db

class AmplitudeToDb(ref=1.0, amin=1e-7)
def amplitude_to_db(x, ref=1.0, amin=1e-7)

Argument names and the default value of ref follow librosa. The default value of amin, however, follows Keras's float32 epsilon (1e-7), which seems sensible.
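A minimal layer/functional pair for the conversion, assuming the amplitude convention 20 * log10(max(x, amin) / ref). This is a simplification of librosa's amplitude_to_db (which also clamps ref and offers a top_db cut-off), not the repository's actual code:

```python
import torch
import torch.nn as nn


def amplitude_to_db(x, ref=1.0, amin=1e-7):
    """Amplitude -> decibels: 20 * log10(max(x, amin) / ref)."""
    # Clamping at amin keeps log10 finite for silent (zero) bins.
    return 20.0 * torch.log10(torch.clamp(x, min=amin) / ref)


class AmplitudeToDb(nn.Module):
    """Layer wrapper that calls the functional with stored arguments."""

    def __init__(self, ref=1.0, amin=1e-7):
        super().__init__()
        self.ref = ref
        self.amin = amin

    def forward(self, x):
        return amplitude_to_db(x, self.ref, self.amin)
```

With amin=1e-7, a zero input maps to -140 dB instead of -inf.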

DbToAmplitude/db_to_amplitude

class DbToAmplitude(ref=1.0)
def db_to_amplitude(x, ref=1.0)
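A sketch of the inverse, assuming the same 20 * log10 amplitude convention as above; values that were clamped by amin during the forward conversion cannot be recovered:

```python
import torch
import torch.nn as nn


def db_to_amplitude(x, ref=1.0):
    """Decibels -> amplitude: ref * 10 ** (x / 20)."""
    return ref * torch.pow(10.0, x / 20.0)


class DbToAmplitude(nn.Module):
    """Layer wrapper around db_to_amplitude."""

    def __init__(self, ref=1.0):
        super().__init__()
        self.ref = ref

    def forward(self, x):
        return db_to_amplitude(x, self.ref)
```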

MuLawEncoding/mu_law_encoding

class MuLawEncoding(n_quantize=256)
def mu_law_encoding(x, n_quantize=256)
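A sketch of standard mu-law companding followed by quantization, assuming the input waveform lies in [-1, 1] and mu = n_quantize - 1 (the usual WaveNet-style setup; the original implementation may differ in rounding details):

```python
import math

import torch


def mu_law_encoding(x, n_quantize=256):
    """Map a waveform in [-1, 1] to integer codes in [0, n_quantize - 1]."""
    mu = n_quantize - 1.0
    # Logarithmic companding gives small amplitudes more resolution.
    x_mu = torch.sign(x) * torch.log1p(mu * torch.abs(x)) / math.log1p(mu)
    # Rescale [-1, 1] to [0, mu] and round to the nearest integer code.
    return ((x_mu + 1.0) / 2.0 * mu + 0.5).long()
```

A MuLawEncoding layer would store n_quantize and call this function in forward, in the same way as the layers above.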

MuLawDecoding/mu_law_decoding

class MuLawDecoding(n_quantize=256)
def mu_law_decoding(x_mu, n_quantize=256)
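The decoding direction, sketched under the same assumptions (mu = n_quantize - 1, codes in [0, mu]): undo the integer rescaling, then invert the logarithmic companding.

```python
import torch


def mu_law_decoding(x_mu, n_quantize=256):
    """Map integer codes in [0, n_quantize - 1] back to a waveform in [-1, 1]."""
    mu = n_quantize - 1.0
    # Undo the integer rescaling: [0, mu] -> [-1, 1].
    y = x_mu.float() / mu * 2.0 - 1.0
    # Invert the companding: sign(y) * ((1 + mu) ** |y| - 1) / mu.
    return torch.sign(y) * (torch.pow(1.0 + mu, torch.abs(y)) - 1.0) / mu
```

Because of quantization, decoding an encoded signal only approximately recovers the input, with the largest errors at high amplitudes.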

A Big Issue - Remove SoX Dependency

We propose to remove the SoX dependency because:

  • Many audio ML tasks don’t require the functionality included in SoX (filtering, cutting, effects).
  • Many issues in torchaudio are related to installation problems with SoX. While this could be simplified by a conda build or a wheel, the repo will remain difficult to maintain.
  • SoX doesn’t support MP4 containers, which makes it unusable for multi-stream audio.
  • Loading speed is good with torchaudio, but, e.g., for wav, it’s not faster than other libraries (including the cast to a torch tensor), as in the graph below. See more detailed benchmarks here.

Proposal

Introduce I/O backends and move the functions that depend on _torch_sox to a backend_sox.py, whose dependency is optional to install. We could then introduce more backends, such as scipy.io or pysoundfile. Each backend imports its (optional) library within the backend file and implements a minimum spec such as:

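import _torch_sox  # SoX bindings; only this backend needs them

def load(*args, **kwargs):
    """Return (audio, rate)."""
    ...

def save(*args, **kwargs):
    """Write audio to a file."""
    ...

def info(*args, **kwargs):
    """Return metadata without reading the full file."""
    ...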
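To make the spec concrete, here is a sketch of a wav-only backend with the same load/save/info trio. It swaps in Python's standard-library wave module instead of scipy.io or soundfile so it has no optional dependency, handles only 16-bit PCM, and its parameter names (filepath, audio, rate) and metadata keys are assumptions, not an agreed API. Audio is returned as a (channel, time) float tensor, matching the shape convention above without the batch axis.

```python
import wave

import numpy as np
import torch


def load(filepath):
    """Read 16-bit PCM wav; return (audio, rate), audio as (channel, time) float32."""
    with wave.open(filepath, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = f.getframerate()
        channels = f.getnchannels()
        frames = f.readframes(f.getnframes())
    # Samples are interleaved as (time, channel); transpose to (channel, time).
    data = np.frombuffer(frames, dtype="<i2").reshape(-1, channels).T
    audio = torch.from_numpy(data.astype(np.float32) / 32768.0)
    return audio, rate


def save(filepath, audio, rate):
    """Write a (channel, time) float tensor in [-1, 1] as 16-bit PCM wav."""
    scaled = torch.clamp(audio * 32768.0, -32768.0, 32767.0).round().to(torch.int16)
    # Interleave channels: (channel, time) -> (time, channel), little-endian int16.
    interleaved = scaled.T.contiguous().numpy().astype("<i2")
    with wave.open(filepath, "wb") as f:
        f.setnchannels(audio.shape[0])
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(interleaved.tobytes())


def info(filepath):
    """Return metadata from the header only, without reading sample data."""
    with wave.open(filepath, "rb") as f:
        return {
            "rate": f.getframerate(),
            "channels": f.getnchannels(),
            "frames": f.getnframes(),
            "sample_width": f.getsampwidth(),
        }
```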

Backend proposals

  • scipy.io or soundfile as default for wav files
  • aubio or audioread for mp3 and mp4
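One way the proposed backends could be chosen per file is a small extension registry with lazy imports, so a missing optional package only causes an error when a file actually needs it. The mapping below (soundfile for wav, audioread for mp3/mp4) is just one of the options listed above, and the names DEFAULT_BACKENDS, select_backend and load_backend are hypothetical:

```python
import importlib
import importlib.util
import os

# Hypothetical mapping from file extension to the optional package that reads it.
DEFAULT_BACKENDS = {
    ".wav": "soundfile",
    ".mp3": "audioread",
    ".mp4": "audioread",
}


def select_backend(filepath, registry=DEFAULT_BACKENDS):
    """Return the backend package name registered for the file's extension."""
    ext = os.path.splitext(filepath)[1].lower()
    if ext not in registry:
        raise ValueError(f"no backend registered for {ext!r} files")
    return registry[ext]


def load_backend(filepath, registry=DEFAULT_BACKENDS):
    """Import the backend lazily; fail with a clear message if it is not installed."""
    name = select_backend(filepath, registry)
    # find_spec checks availability without importing the package.
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"reading {filepath!r} needs the optional package {name!r}")
    return importlib.import_module(name)
```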

Installation

pip install -e .

Importing

import torchaudio_contrib

Authors

Keunwoo Choi, Faro Stöter, Kiran Sanjeevan, Jan Schlüter
