Getting Started in 'ML-Audio'
Suggestions for students.
About
Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment, so this page is intended to serve as a series of suggestions for those who may find themselves "on their own" in their interest in this area. It was started by @drscotthawley and Ryan Miller, but is intended to serve and evolve with the community.
- This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)
Active Practitioners to Follow
Many of us learn about and contribute to news of new developments, papers, conferences, grants, and networking opportunities via Twitter.
- Audio ML Twitter list by Fabian-Robert Stöter (@faroit). <-- Follow these people!
Quick Quotes
- Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"
Best Practices
"Tips for Publishing Research Code" courtesy of Papers with Code
General Reference Information
- Machine Learning Glossary - A reference resource for common ML math topics, definitions, concepts, etc.
- Notes on Music Information Retrieval
Online Training (ML+audio Specific)
- Valerio Velardo's "Deep Learning for Audio"
- Jordi Pons' "Deep neural networks for music" teaching materials
Online Training (More General, Courses)
- Rebecca Fiebrink's Machine Learning for Musicians and Artists on Kadenze -- No actual audio DSP, but great for concepts, interactive and fun (no math!)
- Advanced Digital Signal Processing series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
- Andrew Ng's ML Course on Coursera (Good all-around ML course)
- Fast.ai (Can get you up and running fast)
- Neural Network Programming - Deep Learning with PyTorch. Learn how to code an image predictor neural network in PyTorch. Provides practical NN fundamentals
- Foundations of Machine Learning taught by David Rosenberg
Tutorials
- Andrew Trask's "Anyone Can Learn To Code an LSTM-RNN in Python"
- Machine Learning & Deep Learning Fundamentals (Good high-level intro to ML concepts and how neural networks operate)
Talks (at conferences)
Talks we found helpful/inspiring (and are hopefully still relevant). TODO: add more recent talks!
- Paris Smaragdis at SANE 2015: "NMF? Neural Nets? It's all the same..."
- Ron Weiss at SANE 2015: "Training neural network acoustic models on waveforms"
- Jordi Pons at DLBCN 2018: "Training neural audio classifiers with few data"
- Sander Dieleman at ISMIR 2019: "Generating Music in the Waveform Domain"
Key Papers / Codes
(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )
- Keunwoo Choi et al, "Automatic tagging using deep convolutional neural networks" (ISMIR 2016 Best Paper)
- SampleRNN
- WaveNet
- WaveRNN, i.e. "Efficient Neural Audio Synthesis"
- GANSynth
- Wave-U-Net
Demos
(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)
- Chris Donahue's WaveGAN Demo
- Scott Hawley's SignalTrain Demo
- Neil Zeghidour and David Grangier's Wavesplit
- David Samuel, Aditya Ganeshan, and Jason Naradowsky's Meta-TasNet
Packages & Libraries
- awesome-python-scientific-audio: Curated list of Python software and packages related to scientific research in audio
- Librosa: Great package for various kinds of audio analysis and manipulation
- Audiomentations, data augmentation for audio
- tf.signal: signal processing for TensorFlow
- fastai_audio (and fastai2_audio), audio libraries for the Fast.ai library/MOOC. Fast.ai is primarily for image, text & tabular data processing, but there are efforts to add audio. (Work in progress.)
Tools / GUIs / Gists
- Jesse Engel's gist to plot "rainbowgrams"
Books
- Neural Networks and Deep Learning online book. How drscotthawley first started reading.
- Open-Source Tools & Data for Music Source Separation by Ethan Manilow, Prem Seetharaman, and Justin Salamon (2020). An online, interactive book with Python examples!
- List of Books Recommended by ML expert Juergen Schmidhuber for students entering his lab. (Probably pretty demanding material.)
Computer-Related Topics
Python:
- learnpython.org
- Python notebooks for fundamentals of music processing
Signal Processing Topics
- Advanced Digital Signal Processing series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
- An Interactive Introduction to Fourier Transforms by Jez Swanson. (so good!)
- Yuge Shi's "Gaussian Processes, Not Quite for Dummies" (GPs get used for much more than signal processing, but are also promising there; feel free to suggest a different category for this content)
Statistics / Math Topics
- Gradient Descent
- Principal Component Analysis: "PCA From Scratch" by @drscotthawley
Datasets (raw audio)
One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:
- NSynth Musical Instruments
- GTZAN Genre Collection (Note critique by Bob Sturm)
- Fraunhofer IDMT Guitar/Bass Effects
- Urban Sound Dataset
- FreeSound Annotator (formerly FreeSound Datasets)
- FSD50K dataset (from FreeSound)
- AudioSet
- Birdvox-Full-Night
- SignalTrain LA2A
- Kaggle Heartbeat Sounds
- Electric Guitars by Renato Profeta ("Guitars AI") of Fraunhofer IDMT
- Search for other audio datasets at Kaggle (list)
- A collated list of MIR datasets can be found here, which is the source for audiocontentanalysis.org, but only some are raw audio
- Another list of "audio datasets" by Christopher Dossman
- ...your dataset here...
DIY Audio Dataset-Making:
(Inspired by Nathan Sepulveda)
Searchable resources:
- FreeSound: https://freesound.org/
- Internet Archive audio: https://archive.org/details/audio
- https://search.audioburst.com/ - speech only. You're searching transcripts.
- https://www.mp3juices.cc/ - searches YouTube, lets you download MP3 by pressing a button for each one.
- https://sounds.com/ from Native Instruments, but it won't be free!
Scrapers
- https://github.com/carlthome/audio-scraper: "Scrape audio from YouTube and SoundCloud with a simple command-line interface", e.g. audio-scraper "acoustic guitar". It's 5 years old, but it still works in 2021!
Other DIY Audio Dataset Tricks
- Depending on your application, you might be able to get away with using samples produced by virtual instruments (i.e. MIDI).
- If you don't have a lot of labels or targets, you can still pretrain your representations & weights using autoregressive predictions (even for different audio domains) -- this amounts to doing your own Transfer Learning even without a pretrained model. (This strategy was used by FastAI's text language model system "ULMFiT")
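As a rough illustration of the autoregressive idea, unlabeled audio can be sliced into (context, next sample) pairs so that the signal supplies its own targets. This is a minimal sketch in plain Python, not a recipe from any particular library: the function name `make_autoregressive_pairs` and the parameters `context_len` and `hop` are made up for this example, and the sine wave stands in for real recordings.

```python
import math

def make_autoregressive_pairs(signal, context_len, hop=1):
    """Slice an unlabeled signal into (context, next_sample) training pairs."""
    pairs = []
    # Stop early enough that a target sample always exists after the context.
    for start in range(0, len(signal) - context_len, hop):
        context = signal[start:start + context_len]
        target = signal[start + context_len]
        pairs.append((context, target))
    return pairs

# Toy "unlabeled audio": 20 ms of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(160)]

pairs = make_autoregressive_pairs(signal, context_len=32, hop=16)
print(len(pairs), "pairs; each context has", len(pairs[0][0]), "samples")
```

A model pretrained to predict each target from its context can then be fine-tuned on the smaller labeled set. In practice you would likely do the slicing with NumPy or a framework data loader, and predict frames or spectrogram columns instead of single samples.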
Cleaning Audio Datasets?
With images, you can quickly look at many of them almost at once. With audio, you have to listen to each one. But take a cue from fast.ai's Jeremy Howard:
"It's easier to clean a dataset once you've trained a model."
So we can train the model, and then look for samples with high loss or low confidence: those are the ones to check first.
You could even start with someone else's pretrained model and look for anomalies when running inference on your data, i.e. similar inputs should yield similar outputs, so if they don't...?
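The loss-ranking idea can be sketched in a few lines. The example below assumes you already have per-clip class probabilities from some trained classifier; the filenames, probabilities, and labels are invented for illustration, and `rank_suspects` is a hypothetical helper, not part of fast.ai or any other library.

```python
import math

def cross_entropy(probs, label):
    """Per-sample loss: negative log-probability assigned to the given label."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probs[label], eps))

def rank_suspects(filenames, predictions, labels):
    """Return (filename, loss) pairs sorted from highest loss to lowest."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    return sorted(zip(filenames, losses), key=lambda item: item[1], reverse=True)

# Made-up model outputs for three clips and two classes (0 = speech, 1 = music).
filenames = ["clip_a.wav", "clip_b.wav", "clip_c.wav"]
predictions = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
labels = [0, 1, 1]  # clip_c is labeled "music" but the model is confident it is speech

ranked = rank_suspects(filenames, predictions, labels)
for name, loss in ranked:
    print(f"{name}: loss = {loss:.3f}")
```

Listening to the clips at the top of this list first is usually a better use of time than auditioning the dataset in order. A high loss does not prove the label is wrong, only that the clip deserves a listen.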
Length of audio?
You might be able to find short samples of exactly what you need, but it's also common to have the desired audio be just a part of a much longer clip. How to segment it and keep just what you want? You could use other people's models, e.g. for detecting speech or guitars:
- Delete what you don't want: Audio you might get off YouTube needs to be segmented in order to make it useful -- the stuff you don't want needs to be cut out. If you're looking for musical audio, you could use a speech detector (there are lots of them available) and then delete or ignore all the speech.
- What if all you want is the guitar solo, not the whole song? Someone else's pretrained model for detecting guitars could help you.
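One way to turn a detector's output into cut points is sketched below. It assumes some pretrained detector has already given you one score per fixed-length frame; the scores, the 0.5-second frame length, and the helper name `frames_to_segments` are all assumptions made for this example.

```python
def frames_to_segments(scores, frame_sec, threshold=0.5, min_sec=0.0):
    """Turn per-frame detector scores into (start_sec, end_sec) spans at or above threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a wanted region begins
        elif score < threshold and start is not None:
            segments.append((start, i))    # the region ended at the previous frame
            start = None
    if start is not None:                  # region runs to the end of the clip
        segments.append((start, len(scores)))
    # Convert frame indices to seconds and drop spans that are too short.
    spans = [(s * frame_sec, e * frame_sec) for s, e in segments]
    return [(s, e) for s, e in spans if e - s >= min_sec]

# Made-up "guitar present" scores, one per 0.5-second frame.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.6, 0.2, 0.9, 0.9]
keep = frames_to_segments(scores, frame_sec=0.5, min_sec=1.0)
print(keep)
```

The returned spans can be passed to whatever tool you use to cut the audio. For the "delete the speech" case, feed in speech-detector scores and keep the frames that fall below the threshold instead, for example by passing `1 - score` for each frame.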
Are we classifying or regressing?
Standards are a lot higher for regression systems, e.g. phase errors / time alignment issues probably won't matter to a classifier, but might for a regression model, depending on the goal. What about clipping, distortion,...? This will depend on what you're trying to do.
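For the time-alignment concern, a quick sanity check is to estimate the delay between each input/target pair before training a regression model. The sketch below uses a brute-force cross-correlation in plain Python on a toy signal; it only searches non-negative delays of the target, and `estimate_lag` is a hypothetical helper written for this example. For real audio you would likely use an FFT-based correlation from NumPy or SciPy instead.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the shift (in samples) of `delayed` relative to `reference`
    that maximizes their cross-correlation, searched over 0..max_lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[n] with delayed[n + lag] over the overlapping part.
        overlap = min(len(reference), len(delayed) - lag)
        score = sum(reference[n] * delayed[n + lag] for n in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy example: the "target" is the input delayed by 3 samples.
dry = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
wet = [0.0, 0.0, 0.0] + dry[:-3]

lag = estimate_lag(dry, wet, max_lag=5)
aligned_wet = wet[lag:]  # drop the leading samples so both signals line up
print("estimated lag:", lag)
```

If the estimated lag varies from file to file, the dataset probably needs re-alignment before a sample-accurate regression target makes sense; a classifier would usually tolerate the same offsets.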
"Major" ML-Audio Research/Development Groups
Universities:
(or, "Where should I apply for grad school?")
- QMUL (London)
- UPF (Barcelona)
- CCRMA (Stanford, California)
- IRCAM (Paris)
- NYU (New York)
Industry:
("Where can I get an internship/job?")
- Google Magenta
- Google Perception (speech publications)
- Adobe
- Spotify
- Increasingly, everywhere. ;-)
Conferences
("Which conference(s) should I go to?" -- asked by student on the day this doc began)
Audio-Specific
Long list of Music Technology specific conferences: https://conferences.smcnetwork.org/ - which is referenced from here: https://github.com/MTG/conferences
- Audio Engineering Society (AES)
- ASA
- Digital Audio Effects (DAFx)
- ICASSP
- ISMIR
- SANE
- Web Audio Conference (WAC)
- SMC
- LVA/ICA
- Audio Mostly
- WIMP
- DCASE
- CSMC
- MuMe
- ICMC
- CMMR
- IBAC
- MLSP
- Interspeech
- FMA
General ML
- ICLR
- ICML
- NeurIPS
- IJCNN
Journals
("Where can I get published?")
In addition, in machine learning specifically, the tendency is for conference papers to be peer-reviewed and to "count" as journal publications.
Competitions / Benchmarks
Some are yearly, some may be defunct but still interesting.
- MIREX
- SiSEC (Signal Separation Evaluation Campaign)
- Kaggle Heartbeat Sounds
Contributors
Ryan Miller, RJ Skerry-Ryan, Dave Moffat, Jesse Engel, Iver Jordal
If you want your name listed here, you may. ;-)