Suggestions for those interested in developing audio applications of machine learning

Getting Started in 'ML-Audio'

Suggestions for students.

About

Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment, so this page is intended to serve as a series of suggestions for those who may find themselves "on their own" in their interest in this area. It was started by @drscotthawley and Ryan Miller, but is intended to serve and evolve with the community.

  • This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)

Active Practitioners to Follow

Many of us learn about and contribute to news of new developments, papers, conferences, grants, and networking opportunities via Twitter.

Quick Quotes

  • Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"

Best Practices

"Tips for Publishing Research Code" courtesy of Papers with Code

General Reference Information

Online Training (ML+audio Specific)

Online Training (More General, Courses)

Tutorials

Talks (at conferences)

Talks we found helpful/inspiring (and are hopefully still relevant). TODO: add more recent talks!

Key Papers / Codes

(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )

Demos

(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)

Packages & Libraries

Tools / GUIs / Gists

Books

Python:

Signal Processing Topics

Statistics / Math Topics

Datasets (raw audio)

One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:

DIY Audio Dataset-Making:

(Inspired by Nathan Sepulveda)

Searchable resources:

Scrapers

  • https://github.com/carlthome/audio-scraper: "Scrape audio from YouTube and SoundCloud with a simple command-line interface", e.g. audio-scraper "acoustic guitar". It's 5 years old, but it still works in 2021!

Other DIY Audio Dataset Tricks

  • Depending on your application, you might be able to get away with using samples produced by virtual instruments (e.g. driven by MIDI).
  • If you don't have a lot of labels or targets, you can still pretrain your representations & weights using autoregressive predictions (even for different audio domains) -- this amounts to doing your own Transfer Learning even without a pretrained model. (This strategy was used by fast.ai's text language model system "ULMFiT".)
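To make the autoregressive idea concrete, here is a minimal sketch of turning unlabeled audio into (context, next-sample) training pairs. The function name `make_autoregressive_pairs` and its parameters are hypothetical, chosen only for illustration, and plain Python lists stand in for what would normally be NumPy arrays or PyTorch tensors.

```python
def make_autoregressive_pairs(samples, context_len, hop=1):
    """Slice a 1-D sequence into (context, next_sample) training pairs.

    Hypothetical helper for illustration: no labels are needed, because
    the target is simply the sample that follows each context window.
    """
    pairs = []
    for start in range(0, len(samples) - context_len, hop):
        context = samples[start:start + context_len]
        target = samples[start + context_len]
        pairs.append((context, target))
    return pairs


# Toy "waveform"; real audio would have thousands of samples per second.
waveform = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
pairs = make_autoregressive_pairs(waveform, context_len=3)
for context, target in pairs:
    print(context, "->", target)
```

A model pretrained to predict `target` from `context` could then have its weights reused for the labeled task, which is the transfer-learning step described above.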

Cleaning Audio Datasets?

With images, you can quickly look at many of them almost at once. With audio, you have to listen to each one. But take a cue from fast.ai's Jeremy Howard:

"It's easier to clean a dataset once you've trained a model."

So we can train a model and then look for samples with high loss or low confidence: those are the ones to check first.
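As a rough sketch of that workflow, the snippet below ranks files by per-sample cross-entropy so the most suspicious ones can be auditioned first. It assumes you already have per-class probabilities from some trained classifier; the file names, probabilities, and the helper `rank_suspects` are made up for illustration.

```python
import math


def per_sample_cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy loss for one sample: -log(probability of the true class)."""
    return -math.log(max(probs[label], eps))


def rank_suspects(filenames, predicted_probs, labels, top_k=10):
    """Return the top_k (loss, filename) pairs, highest loss first."""
    scored = [
        (per_sample_cross_entropy(p, y), name)
        for name, p, y in zip(filenames, predicted_probs, labels)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical model outputs for a two-class problem (0 = kick, 1 = snare).
files = ["kick_01.wav", "snare_07.wav", "kick_02.wav"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.05, 0.95]]
labels = [0, 1, 0]

suspects = rank_suspects(files, probs, labels, top_k=2)
for loss, name in suspects:
    print(f"{name}: loss={loss:.3f}")
```

Here `kick_02.wav` comes out on top because the model is confident it is a snare, which suggests either a mislabeled file or a genuinely hard example; both are worth a listen.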

You could even start with someone else's pretrained model and look for anomalies when running inference on your data: similar inputs should yield similar outputs, so if they don't, that is worth a closer look.

Length of audio?

You might be able to find short samples of exactly what you need, but it's also common to have the desired audio be just a part of a much longer clip. How to segment it and keep just what you want? You could use other people's models, e.g. for detecting speech or guitars:

  • Delete what you don't want: Audio you might get off YouTube needs to be segmented in order to make it useful -- the stuff you don't want needs to be cut out. If you're looking for musical audio, you could use a speech detector (there are lots of them available) and then delete or ignore all the speech.
  • What if all you want is the guitar solo, not the whole song? Someone else's pretrained model for detecting guitars could help you.
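The "delete what you don't want" step might look like the sketch below. It assumes some detector (not shown) has already produced one boolean per fixed-length frame; the frame flags, the helper names `mask_to_segments` and `cut_audio`, and the toy sample rate are all hypothetical.

```python
def mask_to_segments(keep_mask, frame_seconds):
    """Convert a per-frame keep/discard mask into (start, end) times in seconds."""
    segments = []
    start = None
    for i, keep in enumerate(keep_mask):
        if keep and start is None:
            start = i  # a run of wanted frames begins
        elif not keep and start is not None:
            segments.append((start * frame_seconds, i * frame_seconds))
            start = None
    if start is not None:  # the run extends to the end of the clip
        segments.append((start * frame_seconds, len(keep_mask) * frame_seconds))
    return segments


def cut_audio(samples, sample_rate, segments):
    """Concatenate only the samples that fall inside the given segments."""
    kept = []
    for t0, t1 in segments:
        kept.extend(samples[int(round(t0 * sample_rate)):int(round(t1 * sample_rate))])
    return kept


# Hypothetical output of a speech detector, one flag per 0.5 s frame.
speech_detected = [True, True, False, False, False, True, False, False]
keep_mask = [not s for s in speech_detected]  # we want the non-speech parts

segments = mask_to_segments(keep_mask, frame_seconds=0.5)
print(segments)

# Toy signal at 4 samples per second, so the slicing is easy to follow.
music_only = cut_audio(list(range(16)), sample_rate=4, segments=segments)
print(music_only)
```

For the guitar-solo case the same code applies with the mask used directly, i.e. keep the frames where the (hypothetical) guitar detector fires rather than inverting it.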

Are we classifying or regressing?

Standards are a lot higher for regression systems, e.g. phase errors / time alignment issues probably won't matter to a classifier, but might for a regression model, depending on the goal. What about clipping, distortion,...? This will depend on what you're trying to do.
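For the time-alignment concern in particular, one common check is to estimate the lag between input and target by brute-force cross-correlation before training a regression model. The sketch below is a pure-Python illustration with made-up signals and hypothetical helper names; on real audio you would likely use an FFT-based correlation from a numerical library instead.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the lag in [-max_lag, max_lag] that maximizes cross-correlation.

    A positive result means `delayed` lags behind `reference` by that
    many samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += x * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(delayed, lag):
    """Undo the estimated lag by shifting, zero-padding to keep the length."""
    if lag >= 0:
        return delayed[lag:] + [0.0] * lag
    return [0.0] * (-lag) + delayed[:lag]


# Toy example: the target is the input delayed by two samples.
reference = [0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0]

lag = estimate_lag(reference, target, max_lag=4)
aligned = align(target, lag)
print("estimated lag:", lag)
print("aligned target:", aligned)
```

A classifier would probably tolerate the two-sample offset, whereas a sample-wise regression loss between `reference` and the unaligned `target` would be dominated by it.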

"Major" ML-Audio Research/Development Groups

Universities:

(or, "Where should I apply for grad school?")

  • QMUL (London)
  • UPF (Barcelona)
  • CCRMA (Stanford, California)
  • IRCAM (Paris)
  • NYU (New York)

Industry:

("Where can I get an internship/job?")

Conferences

("Which conference(s) should I go to?" -- asked by a student on the day this doc began)

Audio-Specific

A long list of music-technology-specific conferences is at https://conferences.smcnetwork.org/, which is referenced from https://github.com/MTG/conferences

  • Audio Engineering Society (AES)
  • ASA
  • Digital Audio Effects (DAFx)
  • ICASSP
  • ISMIR
  • SANE
  • Web Audio Conference (WAC)
  • SMC
  • LVA/ICA
  • Audio Mostly
  • WIMP
  • DCASE
  • CSMC
  • MuMe
  • ICMC
  • CMMR
  • IBAC
  • MLSP
  • Interspeech
  • FMA

General ML

  • ICLR
  • ICML
  • NeurIPS
  • IJCNN

Journals

("Where can I get published?")

In addition, in machine learning specifically, the tendency is for conference papers to be peer-reviewed and to "count" as journal publications.

Competitions / Benchmarks

Some are yearly, some may be defunct but still interesting.

Contributors

Ryan Miller, RJ Skerry-Ryan, Dave Moffat, Jesse Engel, Iver Jordal

If you want your name listed here, you may. ;-)
