  • Stars
    1,008
  • Rank 45,589 (Top 0.9 %)
  • Language
    Python
  • License
    MIT License
  • Created about 5 years ago
  • Updated 9 months ago

Repository Details

Audio processing using a PyTorch 1D convolutional network

nnAudio

nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on the fly during neural network training, and the Fourier kernels (or CQT kernels) can be trained. Kapre follows a similar concept: it also uses 1D convolutional neural networks to extract spectrograms, but it is based on Keras.

Other GPU audio processing tools include torchaudio and tf.signal. However, they do not use the neural network approach, so their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install on Windows because of its sox dependency. nnAudio is more portable across operating systems because it relies mostly on PyTorch convolutional neural networks. The name nnAudio comes from torch.nn.
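To make the "spectrogram as a 1D convolution" idea concrete, here is a minimal NumPy sketch. It is not nnAudio's implementation: the function names fourier_kernels and conv_stft, the Hann window, and the parameter values are assumptions chosen for illustration. It only shows that sliding fixed cosine and sine kernels over the waveform with a stride (which is what a strided 1D convolution layer does) gives the same magnitudes as a windowed FFT.

```python
import numpy as np

def fourier_kernels(n_fft):
    """Real and imaginary DFT kernels, each of shape (n_fft // 2 + 1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]  # frequency bin index
    n = np.arange(n_fft)[None, :]           # sample index inside one frame
    angle = 2 * np.pi * k * n / n_fft
    return np.cos(angle), -np.sin(angle)

def conv_stft(signal, n_fft=64, hop_length=16):
    """Magnitude spectrogram from strided dot products with the kernels."""
    k_real, k_imag = fourier_kernels(n_fft)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Each row is one frame, i.e. one position of the sliding kernel.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft]
        for i in range(n_frames)
    ]) * window
    real = frames @ k_real.T  # correlation with the cosine kernels
    imag = frames @ k_imag.T  # correlation with the (negated) sine kernels
    return np.sqrt(real ** 2 + imag ** 2).T  # shape: (freq_bins, frames)

# Hypothetical test signal: a 125 Hz sine sampled at 1000 Hz for one second.
sr = 1000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 125 * t)
spec = conv_stft(signal)
print(spec.shape)
```

In a neural network setting the two kernel matrices would be the weights of a 1D convolution layer with stride hop_length, which is what allows them to receive gradients and be trained alongside the rest of the model.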

Installation

pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation

or

pip install nnAudio==0.3.1

Documentation

https://kinwaicheuk.github.io/nnAudio/index.html

Comparison with other libraries

The comparison table covers nnAudio, torch.stft, kapre, torchaudio, tf.signal, torch-stft and librosa. It rates each library on the features below as fully supported, in development (only available in the dev version), or not supported:

  • Trainable
  • Differentiable
  • Linear frequency STFT
  • Logarithmic frequency STFT
  • Inverse STFT
  • Griffin-Lim
  • Mel
  • MFCC
  • CQT
  • VQT
  • Gammatone
  • CFP [1]
  • GPU support

[1] Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music

News & Changelog

To view the full changelog, please go to CHANGELOG.md

version 0.3.1 (24 Dec 2021):

  1. Added VQT feature #113

version 0.3.0 (19 Nov 2021):

  1. Changed module naming. nnAudio.Spectrogram will be replaced by nnAudio.features in future releases. Currently, the various spectrogram types are accessible through both modules.

How to cite nnAudio

The paper for nnAudio is available on IEEE Access:

K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.

BibTeX

@ARTICLE{9174990,
  author={K. W. {Cheuk} and H. {Anderson} and K. {Agres} and D. {Herremans}},
  journal={IEEE Access},
  title={nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks},
  year={2020},
  volume={8},
  number={},
  pages={161981-162003},
  doi={10.1109/ACCESS.2020.3019084}}

Call for Contributions

nnAudio is a fast-growing package. With the increasing number of feature requests, we welcome anyone who is familiar with digital signal processing and neural networks to contribute to nnAudio. The current list of pending features includes:

  1. Invertible Constant Q Transform (CQT)

(Quick tip for unit tests: cd into the Installation folder, then run pytest. You need at least 1931 MiB of GPU memory to pass all the unit tests.)

Alternatively, you may also contribute by:

  1. Making a better demonstration code or tutorial

Dependencies

Numpy >= 1.14.5

Scipy >= 1.2.0

PyTorch >= 1.6.0 (Griffin-Lim only available after 1.6.0)

Python >= 3.6

librosa = 0.7.0 (In principle nnAudio depends on librosa, but it only needs a single function, mel, from librosa.filters. To save users the trouble of installing librosa for this one function, I copied the code corresponding to mel into nnAudio, so nnAudio runs without librosa installed.)

Other similar libraries

Kapre

torch-stft
