
Transcribe music into lead sheets!

Sheet Sage transcribes your favorite pop song into a lead sheet containing the melody and chords!

Quickstart: Transcribe a song

First, ensure you are running Linux and have Docker installed. Then, run this one-time setup command, which will download a ~4GB Docker container and ~100MB of data to a cache directory (~/.sheetsage by default).

ROOT=https://raw.githubusercontent.com/chrisdonahue/sheetsage/main; wget $ROOT/prepare.sh && wget $ROOT/sheetsage.sh && chmod +x *.sh && ./prepare.sh

Once this setup completes, transcribing a song is as simple as running:

./sheetsage.sh https://www.youtube.com/watch?v=fHI8X4OXluQ

This will create a directory output/<UUID> containing a PDF with the lead sheet (along with the corresponding LilyPond file), and a MIDI file containing audio-aligned melody and harmony.

You can also run Sheet Sage on a local file:

./sheetsage.sh my_song.mp3

Improving results

Hopefully, the above command will produce reasonable results for most Western music (especially pop) without additional configuration. If the results are unsatisfactory, there are a few things you can try.

Transcribing a shorter segment

Sheet Sage was trained on short (~24 second) segments of music. If you only want to transcribe a segment of that duration or less, you will likely get better results by specifying the starting timestamp of that segment. For example, try:

./sheetsage.sh -s 17 --legacy_behavior <YOUR_SONG>

This will configure Sheet Sage to transcribe a 24s segment from the detected downbeat closest to 17s. Hopefully, the results will be improved over that same segment within the context of the full song.
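
The snapping behavior described above can be illustrated with a short sketch (this is not Sheet Sage's actual implementation; the downbeat times below are made up for illustration):

```python
# Illustrative sketch of how a start hint like `-s 17` could be snapped
# to the nearest detected downbeat.

def snap_to_downbeat(hint_seconds, downbeat_times):
    """Return the detected downbeat time closest to the user's hint."""
    return min(downbeat_times, key=lambda t: abs(t - hint_seconds))

# Hypothetical downbeat times from a beat tracker, in seconds.
downbeats = [0.5, 4.6, 8.7, 12.8, 16.9, 21.0]
start = snap_to_downbeat(17, downbeats)  # 16.9: the downbeat closest to 17s
```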

Fixing tempo issues

Sheet Sage runs beat detection on your input so that it can output a reasonable sheet music score. Occasionally, the beat detection system will detect the tempo as twice or half as fast as the actual tempo, leading to quarter notes rendering as half notes or eighth notes. To fix this, first estimate the tempo (no need to be super precise) and then pass this estimate to Sheet Sage:

./sheetsage.sh --beats_per_minute_hint 120 <YOUR_SONG>
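
To see why a rough estimate suffices, consider how a hint could be used to resolve tempo "octave" errors: pick whichever of the half, detected, or doubled tempo lies closest to the hint. (This is an illustrative sketch, not Sheet Sage's actual code.)

```python
# Resolve a half/double tempo detection error using a rough BPM hint.
def resolve_tempo(detected_bpm, hint_bpm):
    candidates = [detected_bpm / 2, detected_bpm, detected_bpm * 2]
    return min(candidates, key=lambda bpm: abs(bpm - hint_bpm))

resolve_tempo(240, 120)  # half-time error corrected to 120.0
resolve_tempo(118, 120)  # detected tempo already close to the hint; kept at 118
```

Because only the nearest candidate wins, a hint within roughly 50% of the true tempo is enough; no precise measurement is needed.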

Fixing downbeat issues

In order to draw bar lines in the output, Sheet Sage must detect not only beats but also downbeats. Thanks to madmom, beat detection is quite good, but downbeat detection can be brittle, leading to poor transcriptions. If Sheet Sage is detecting the wrong beats as downbeats for your song, you can override the automatic detection by forcing it to interpret your input segment timestamps as downbeats, which should improve transcription quality:

./sheetsage.sh -s 17.0 -e 40.0 --segment_hints_are_downbeats <YOUR_SONG>

If the downbeats are still incorrect, try nudging your timestamps until the correct downbeat is selected.

Using Jukebox for general quality improvements (GPU required)

Our paper demonstrates that using features from OpenAI Jukebox can improve transcription performance. To enable this feature, first ensure you have a GPU with at least 12GB of memory and that CUDA is installed (nvidia-smi should list your GPU). Next, run ./prepare.sh -j, which will download ~10GB of additional files (mostly the Jukebox model) to ~/.sheetsage. Then, run:

./sheetsage.sh -j <YOUR_SONG>

Note that this will likely take several minutes to complete; consider transcribing a shorter segment when using Jukebox (see above).

HookTheory dataset

Sheet Sage was trained on a new dataset of 50 hours of aligned melody and harmony annotations derived from Hooktheory's TheoryTab DB, which we release alongside this system under a CC BY-NC-SA 3.0 license. The dataset is available for download as a simple, MIR-friendly JSON file (20MB; no audio is included), and a standalone IPython notebook demonstrating how to explore the dataset is also provided.

The dataset is a simple JSON object where each annotation is keyed by its Hooktheory ID. We pre-split the data (see the split field) into train, validation, and test subsets in an 8:1:1 ratio, stratified by artist name. The tags field contains various high-level tags; for training melody transcription models, we recommend filtering down to annotations that contain the AUDIO_AVAILABLE and MELODY tags, and filtering out annotations that contain the TEMPO_CHANGES tag. In the alignment field, we include both the original user-specified alignment from Hooktheory and our refined alignment (see our paper for details); your system may use either (or neither!) during training.
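
The recommended filtering can be sketched in a few lines of Python. The field names below come from the description above, but the IDs, tag combinations, and split values in this toy dict are made up for illustration:

```python
# Toy annotation dict shaped like the released JSON (keys are Hooktheory IDs).
hooktheory = {
    "abc123": {"split": "TRAIN", "tags": ["AUDIO_AVAILABLE", "MELODY"]},
    "def456": {"split": "TRAIN", "tags": ["AUDIO_AVAILABLE", "MELODY", "TEMPO_CHANGES"]},
    "ghi789": {"split": "VALID", "tags": ["AUDIO_AVAILABLE", "MELODY"]},
}

def keep(ann):
    """Keep annotations with audio and melody, but no tempo changes."""
    tags = set(ann["tags"])
    return {"AUDIO_AVAILABLE", "MELODY"} <= tags and "TEMPO_CHANGES" not in tags

train = {k: v for k, v in hooktheory.items()
         if v["split"] == "TRAIN" and keep(v)}
# -> only "abc123" survives: "def456" has TEMPO_CHANGES, "ghi789" is not TRAIN
```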

To simplify evaluation and decouple the melody transcription task from our internal format, we also release the test set annotations as standard audio-aligned MIDI files using our refined alignments: Hooktheory_Test_MIDI.tar.gz. To evaluate a new melody transcription system, use only the information available in Hooktheory_Test_Segments.json as input (audio identifier and, optionally, segment boundaries) and have your system output a MIDI file (named using the dataset key) to a directory. Then, run python -m sheetsage.eval Hooktheory_Test_MIDI.tar.gz <MY_OUTPUT_DIR> --allow_abstain to evaluate your system. Along with the evaluation metrics, we recommend reporting the percentage of annotations you were able to evaluate at the time of publication, and evaluating all models you are comparing on the same set of audio files.

Note that, as our primary focus of this work was on melody transcription, we do not provide any official recommendations for how to evaluate chord recognition models on this HookTheory data, though we strongly encourage followup work to establish standards for doing so!

A full list of the files we release as part of this dataset follows (bolded files constitute the "canonical" dataset released with our paper; non-bolded files are auxiliary):

Development and advanced usage

To get started with development, start by running:

git clone [email protected]:chrisdonahue/sheetsage.git --single-branch

Because Sheet Sage has a considerable number of (fairly brittle) dependencies, development through Docker is strongly recommended. To do so, navigate to the docker directory and run the ./run.sh script, which will launch a development container in the background (any changes to the library on the host will propagate to the container). Then run ./shell.sh to tunnel into the container.

Once in the container, try running python -m unittest discover -v to run the test cases. You may also need to run python -m sheetsage.assets to download all development assets.

Transcription with Jupyter

Especially when using Jukebox, it can be convenient to run transcription in a Jupyter notebook (to avoid setting up the models every time you transcribe a new song). To do so, navigate to the docker directory and run ./run.sh followed by ./notebook.sh. Open one of the displayed links (should include a token parameter) in your browser, and interact with the Inference.ipynb file.

Licensing considerations

While all of the code in this repository is released under a permissive MIT license, the "Sheet Sage" system as a whole contains numerous additional licensing considerations that especially affect commercial use.

Because they are trained on user contributions to HookTheory, the transcription models underneath the hood of Sheet Sage (specifically, all of the files referenced in sheetsage.json which are downloaded by the prepare.sh script) are released under CC BY-NC-SA 3.0. Moreover, Sheet Sage makes use of madmom, Jukebox, and Melisma, which all have additional licensing terms affecting commercial use.

Please ensure your use of Sheet Sage complies with all licensing terms.

Attribution

If you use this code or dataset in your research, please cite us via the following BibTeX:

@inproceedings{donahue2022melody,
  title={Melody transcription via generative pre-training},
  author={Donahue, Chris and Thickstun, John and Liang, Percy},
  booktitle={ISMIR},
  year={2022}
}
