Getting Started in 'ML-Audio'

Suggestions for students.

About

Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment, so this page is intended to serve as a series of suggestions for those who may find themselves "on their own" in their interest in this area. It was started by @drscotthawley and Ryan Miller, but is intended to serve and evolve with the community.

This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)

Active Practitioners to Follow

Many of us learn about (and contribute to) news of new developments, papers, conferences, grants, and networking opportunities via Twitter.

Audio ML Twitter list by Fabian-Robert Stöter (@faroit). <-- Follow these people!

Quick Quotes

Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"

Best Practices

"Tips for Publishing Research Code" courtesy of Papers with Code

General Reference Information

Machine Learning Glossary - A reference resource for common ML math topics, definitions, concepts, etc.
Notes on Music Information Retrieval

Online Training (ML+audio Specific)

Valerio Velardo's "Deep Learning for Audio"
Jordi Pons' "Deep neural networks for music" teaching materials

Online Training (More General, Courses)

Rebecca Fiebrink's Machine Learning for Musicians and Artists on Kadenze -- No actual audio DSP, but great for concepts, interactive and fun (no math!)
Advanced Digital Signal Processing series taught by Dr.-Ing. Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
Andrew Ng's ML Course on Coursera (Good all-around ML course)
Fast.ai (Can get you up and running fast)
Neural Network Programming - Deep Learning with PyTorch. Learn how to code an image predictor neural network in PyTorch; provides practical NN fundamentals.
Foundations of Machine Learning taught by David Rosenberg

Tutorials

Andrew Trask's "Anyone Can Learn To Code an LSTM-RNN in Python"
Machine Learning & Deep Learning Fundamentals (Good high level intro to ML concepts and how neural networks operate)

Talks (at conferences)

Talks we found helpful/inspiring (and are hopefully still relevant). TODO: add more recent talks!

Paris Smaragdis at SANE 2015: "NMF? Neural Nets? It’s all the same..."
Ron Weiss at SANE 2015: "Training neural network acoustic models on waveforms"
Jordi Pons at DLBCN 2018: "Training neural audio classifiers with few data"
Sander Dieleman at ISMIR 2019: "Generating Music in the Waveform Domain"

Key Papers / Codes

(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )

Keunwoo Choi et al., "Automatic tagging using deep convolutional neural networks" (ISMIR 2016 Best Paper)
SampleRNN
WaveNet
WaveRNN, i.e. "Efficient Neural Audio Synthesis"
GANSynth
Wave-U-Net

Demos

(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)

Chris Donahue's WaveGAN Demo
Scott Hawley's SignalTrain Demo
Neil Zeghidour and David Grangier's Wavesplit
David Samuel, Aditya Ganeshan, and Jason Naradowsky's Meta-TasNet

Packages & Libraries

awesome-python-scientific-audio Curated list of python software and packages related to scientific research in audio
Librosa Great package for various kinds of audio analysis and manipulation
Audiomentations, data augmentation for audio
tf.signal: signal processing for TensorFlow
fastai_audio (and fastai2_audio), audio libraries for the Fast.ai library/MOOC. Fast.ai is primarily for image, text & tabular data processing; these are efforts to add audio. (Work in progress.)
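If you want to see what audio data augmentation does before reaching for a library, here is a minimal NumPy sketch of three common waveform augmentations: random gain, additive noise at a random signal-to-noise ratio, and a time shift. It is our own illustration of the idea, not Audiomentations' API; the function name `augment` and all parameter defaults are assumptions.

```python
import numpy as np

def augment(y, sr, rng, max_gain_db=6.0, snr_db_range=(10.0, 40.0), max_shift_s=0.1):
    """Randomly perturb a mono waveform: gain change, additive white noise, circular time shift.

    Hypothetical helper for illustration; defaults are arbitrary assumptions.
    """
    # 1. Random gain in [-max_gain_db, +max_gain_db] dB
    gain_db = rng.uniform(-max_gain_db, max_gain_db)
    y = y * 10.0 ** (gain_db / 20.0)
    # 2. Additive Gaussian noise at a random signal-to-noise ratio (in dB)
    snr_db = rng.uniform(*snr_db_range)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    y = y + rng.normal(scale=np.sqrt(noise_power), size=y.shape)
    # 3. Circular time shift by up to max_shift_s seconds in either direction
    max_shift = int(max_shift_s * sr)
    y = np.roll(y, rng.integers(-max_shift, max_shift + 1))
    return y

rng = np.random.default_rng(42)
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)          # 1 s, 440 Hz test tone
noisy_versions = [augment(clean, sr, rng) for _ in range(8)]  # 8 new training examples from one clip
```

Note that `np.roll` wraps audio from the end around to the start; for some tasks you may prefer padding with silence instead. The sketch also assumes the label is unchanged by these perturbations, which is reasonable for many classification tasks but not automatically true for regression targets.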

Tools / GUIs / Gists

Jesse Engel's gist to plot "rainbowgrams"

Books

Neural Networks and Deep Learning online book. This is where drscotthawley first started reading.
Open-Source Tools & Data for Music Source Separation by Ethan Manilow, Prem Seetharaman, and Justin Salamon (2020). An online, interactive book with Python examples!
List of Books Recommended by ML expert Juergen Schmidhuber for students entering his lab. (Probably pretty demanding material.)

Python:

learnpython.org
Python notebooks for fundamentals of music processing

Signal Processing Topics

Advanced Digital Signal Processing series taught by Dr.-Ing. Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
An Interactive Introduction to Fourier Transforms by Jez Swanson. (so good!)
Yuge Shi's "Gaussian Processes, Not Quite for Dummies" (GPs get used for much more than signal processing, but are also promising there; feel free to suggest a different category for this content)

Statistics / Math Topics

Gradient Descent
Principal Component Analysis: "PCA From Scratch" by @drscotthawley
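For a quick taste of what "from scratch" means here, below is a minimal NumPy sketch of PCA via the eigendecomposition of the covariance matrix. It is our own illustration, not code from the linked post; the function name `pca` and the toy data are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X (n_samples x n_features) onto the top principal components."""
    X_centered = X - X.mean(axis=0)                  # PCA assumes zero-mean features
    cov = np.cov(X_centered, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # symmetric matrix: eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1]                # re-sort by variance, largest first
    components = eigvecs[:, order[:n_components]]    # columns are unit-length principal directions
    explained = eigvals[order[:n_components]]        # variance along each kept direction
    return X_centered @ components, components, explained

rng = np.random.default_rng(0)
# Toy data: 500 points stretched along the diagonal direction (1, 1), plus a little noise
t = rng.normal(size=500)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(500, 2))
Z, components, explained = pca(X, n_components=1)
# Eigenvector signs are arbitrary, so compare absolute values: expect roughly (0.71, 0.71)
print(np.abs(components[:, 0]), explained)
```

Running `np.linalg.svd` on the centered data is a common, numerically friendlier alternative to forming the covariance matrix explicitly.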

Datasets (raw audio)

One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:

NSynth Musical Instruments
GTZAN Genre Collection (Note critique by Bob Sturm)
Fraunhofer IDMT Guitar/Bass Effects
Urban Sound Dataset
FreeSound Annotator (formerly FreeSound Datasets)
FSD50K dataset (from FreeSound)
AudioSet
Birdvox-Full-Night
SignalTrain LA2A
Kaggle Heartbeat Sounds
Electric Guitars by Renato Profeta ("Guitars AI") of Fraunhofer IDMT
Search for other audio datasets at Kaggle (list)
A collated list of MIR datasets can be found here (it is the source for audiocontentanalysis.org), but only some are raw audio
Another list of "audio datasets" by Christopher Dossman
...your dataset here...

DIY Audio Dataset-Making:

(Inspired by Nathan Sepulveda)

Searchable resources:

FreeSound: https://freesound.org/
Internet Archive audio: https://archive.org/details/audio
https://search.audioburst.com/ - speech only; you're searching transcripts.
https://www.mp3juices.cc/ - searches YouTube, lets you download MP3 by pressing a button for each one.
https://sounds.com/ from Native Instruments, but it won't be free!
https://www.findsounds.com/ meh.

Scrapers

https://github.com/carlthome/audio-scraper: "Scrape audio from YouTube and SoundCloud with a simple command-line interface", e.g. audio-scraper "acoustic guitar". It's 5 years old, but it still works in 2021!

Other DIY Audio Dataset Tricks

Depending on your application, you might be able to get away with using samples produced by virtual instruments (e.g., rendered from MIDI).
If you don't have a lot of labels or targets, you can still pretrain your representations & weights using autoregressive predictions (even on audio from other domains) -- this amounts to doing your own transfer learning even without a pretrained model. (fast.ai's text language-model system ULMFiT used this strategy.)
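To make the MIDI idea concrete without any extra software, here is a toy stand-in for a virtual instrument: an additive synth that renders a decaying harmonic tone for each MIDI note number m at the equal-tempered pitch f = 440 · 2^((m − 69)/12) Hz, keeping the note number as the label. A real setup would more likely render MIDI through a sampler or soundfont synthesizer; the 16 kHz sample rate, the function names, and the timbre here are all assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate; use whatever your model expects

def midi_to_hz(note):
    """Equal-tempered pitch: MIDI note 69 is A4 = 440 Hz, 12 notes per octave."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def render_note(note, duration=1.0, n_harmonics=4, decay=3.0, sr=SAMPLE_RATE):
    """Hypothetical toy 'virtual instrument': a decaying harmonic tone for one MIDI note."""
    t = np.arange(int(duration * sr)) / sr
    f0 = midi_to_hz(note)
    y = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        if k * f0 < sr / 2:                          # skip harmonics above Nyquist (avoids aliasing)
            y += np.sin(2 * np.pi * k * f0 * t) / k  # 1/k amplitude rolloff per harmonic
    y *= np.exp(-decay * t)                          # exponential decay envelope
    return y / np.max(np.abs(y))                     # peak-normalize to [-1, 1]

# A tiny labeled dataset: (waveform, midi_note) pairs covering one octave from middle C
dataset = [(render_note(n), n) for n in range(60, 72)]
```

Because you generate the audio yourself, labels like pitch come for free and are exact, which is the main appeal of synthetic data; the open question is how well a model trained on it transfers to real recordings.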

Cleaning Audio Datasets?

With images, you can quickly look at many of them almost at once. With audio, you have to listen to each one. But take a cue from fast.ai's Jeremy Howard:

"It's easier to clean a dataset once you've trained a model."

So we can train the model and then look for samples with high loss or low confidence: those are the ones to check first.

You could even start with someone else's pretrained model and look for anomalies when running inference on your data: similar inputs should yield similar outputs, so if they don't...?
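Here is a minimal sketch of the "listen to the worst offenders first" idea for a classifier, assuming you already have the model's predicted class probabilities for each clip. The filenames, probabilities, and function names are hypothetical; the point is simply to sort clips by per-sample cross-entropy loss.

```python
import numpy as np

def cross_entropy_per_sample(probs, labels, eps=1e-12):
    """Per-sample cross-entropy loss from predicted class probabilities (n_samples x n_classes)."""
    p_true = probs[np.arange(len(labels)), labels]  # probability the model gave the labeled class
    return -np.log(np.clip(p_true, eps, 1.0))       # clip avoids log(0)

def files_to_audit(filenames, probs, labels, k=10):
    """Return the k (filename, loss) pairs with the highest loss: listen to these first."""
    losses = cross_entropy_per_sample(np.asarray(probs), np.asarray(labels))
    worst = np.argsort(losses)[::-1][:k]            # indices sorted by descending loss
    return [(filenames[i], float(losses[i])) for i in worst]

# Hypothetical model outputs for 4 clips and 3 classes
filenames = ["clip_a.wav", "clip_b.wav", "clip_c.wav", "clip_d.wav"]
probs = [[0.90, 0.05, 0.05],
         [0.10, 0.80, 0.10],
         [0.70, 0.20, 0.10],   # labeled class 2, but the model is fairly sure it's class 0
         [0.30, 0.30, 0.40]]
labels = [0, 1, 2, 2]
for name, loss in files_to_audit(filenames, probs, labels, k=2):
    print(name, round(loss, 3))
# clip_c.wav 2.303
# clip_d.wav 0.916
```

High loss can mean a mislabeled clip, a genuinely hard example, or a model weakness, so treat the ranking as a listening queue rather than an automatic delete list.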

Length of audio?

You might be able to find short samples of exactly what you need, but it's also common to have the desired audio be just a part of a much longer clip. How to segment it and keep just what you want? You could use other people's models, e.g. for detecting speech or guitars:

Delete what you don't want: audio you get off YouTube usually needs to be segmented to be useful, with the stuff you don't want cut out. If you're looking for musical audio, you could use a speech detector (there are lots of them available) and then delete or ignore all the speech.
What if all you want is the guitar solo, not the whole song? Someone else's pretrained model for detecting guitars could help you.
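Once a detector gives you per-frame probabilities, the remaining bookkeeping is turning them into time ranges to keep. Below is a minimal sketch, assuming a hypothetical speech detector that outputs one P(speech) value per fixed-hop frame; the function name, threshold, hop size, and minimum segment length are illustrative choices, not recommendations.

```python
import numpy as np

def keep_segments(frame_probs, hop_seconds, threshold=0.5, min_seconds=1.0):
    """Turn per-frame detector probabilities into (start, end) times, in seconds, to keep.

    frame_probs: one probability per analysis frame, e.g. P(speech) from a hypothetical
    pretrained speech detector. Frames with prob >= threshold are dropped.
    """
    keep = np.asarray(frame_probs) < threshold  # True where we want the audio
    segments, start = [], None
    for i, k in enumerate(keep):
        if k and start is None:
            start = i                           # a keep-run begins
        elif not k and start is not None:
            segments.append((start, i))         # a keep-run ends (end index exclusive)
            start = None
    if start is not None:
        segments.append((start, len(keep)))     # run extends to the end of the clip
    # Convert frame indices to seconds and drop runs too short to be useful
    return [(s * hop_seconds, e * hop_seconds) for s, e in segments
            if (e - s) * hop_seconds >= min_seconds]

# Hypothetical P(speech) per 0.5 s frame: talking intro, music, talking outro
probs = [0.9, 0.95, 0.1, 0.05, 0.2, 0.1, 0.0, 0.8, 0.9]
print(keep_segments(probs, hop_seconds=0.5))  # [(1.0, 3.5)]
```

To actually cut the audio, slice the waveform with sample indices, e.g. `y[int(start * sr):int(end * sr)]` for each `(start, end)` pair. The same function covers the guitar-solo case if you pass `1 - p` for each frame's P(guitar), so that frames likely to contain guitar are the ones kept.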

Are we classifying or regressing?

Standards are often a lot higher for regression systems: e.g., phase errors or time-alignment issues probably won't matter to a classifier, but they might for a regression model, depending on the goal. What about clipping, distortion, ...? This will depend on what you're trying to do.
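A small NumPy illustration of why phase can matter more for regression: the two signals below have identical magnitude spectra (one partial is phase-shifted by 90°), so a classifier that only sees magnitude features can't tell them apart, yet a waveform-domain mean-squared-error loss between them is large. The tones and sample rate are arbitrary choices, and classifiers that use phase or raw waveforms would behave differently.

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr  # 1 second, so 440 Hz and 880 Hz land exactly on FFT bins
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
# Same partials, but the 880 Hz component is shifted by 90 degrees
y = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t + np.pi / 2)

mag_x = np.abs(np.fft.rfft(x))
mag_y = np.abs(np.fft.rfft(y))

# What a magnitude-spectrum classifier "sees": essentially no difference
spectral_diff = np.max(np.abs(mag_x - mag_y)) / np.max(mag_x)
# What a waveform regression loss "sees": a large error (mean(x**2) itself is 0.625)
waveform_mse = np.mean((x - y) ** 2)

print(f"relative magnitude difference: {spectral_diff:.1e}")
print(f"waveform MSE: {waveform_mse:.3f}")  # 0.250
```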

"Major" ML-Audio Research/Development Groups

Universities:

(or, "Where should I apply for grad school?")

QMUL (London)
UPF (Barcelona)
CCRMA (Stanford, San Francisco Bay Area)
IRCAM (Paris)
NYU (New York)

Industry:

("Where can I get an internship/job?")

Google Magenta
Google Perception (speech publications)
Adobe
Spotify
Increasingly, everywhere. ;-)

Conferences

("Which conference(s) should I go to?" -- asked by a student on the day this doc began)

Audio-Specific

Long list of music-technology-specific conferences: https://conferences.smcnetwork.org/ (referenced from https://github.com/MTG/conferences)

Audio Engineering Society (AES)
ASA
Digital Audio Effects (DAFx)
ICASSP
ISMIR
SANE
Web Audio Conference (WAC)
SMC
LVA/ICA
Audio Mostly
WIMP
DCASE
CSMC
MuMe
ICMC
CMMR
IBAC
MLSP
Interspeech
FMA

General ML

ICLR
ICML
NeurIPS
IJCNN

Journals

("Where can I get published?")

IEEE TASLP
JAES
CMJ
JNMR
TISMIR
JASA
EURASIP Journal on Audio Speech and Music Processing

In addition, in machine learning specifically, conference papers tend to be peer-reviewed and to "count" as journal publications.

Competitions / Benchmarks

Some are yearly, some may be defunct but still interesting.

MIREX
SiSEC (Signal Separation Evaluation Campaign)
Kaggle Heartbeat Sounds

Contributors

Ryan Miller, RJ Skerry-Ryan, Dave Moffat, Jesse Engel, Iver Jordal

If you want your name listed here, you may. ;-)

drscotthawley/ml-audio-start
