
Generate 8-bit chiptunes with deep learning

LakhNES: Generate 8-bit music with machine learning

LakhNES (paper, music examples) is a deep neural network capable of generating music that can be played by the audio synthesis chip on the Nintendo Entertainment System (NES). It was trained on music composed for the NES by humans. Our model takes advantage of transfer learning: we pre-train on the heterogeneous Lakh MIDI dataset before fine-tuning on the NES Music Database target domain.

Using this codebase

Generating new chiptunes

This codebase primarily serves to generate new musical material using the pre-trained LakhNES model. LakhNES outputs sequences of musical events, which must be separately synthesized into 8-bit audio. The steps required are as follows:

  1. Set up your model environment
  2. Set up your audio synthesis environment
  3. Download a pre-trained checkpoint
  4. Generate and listen to chiptunes

Evaluating pre-trained checkpoints

This codebase also allows you to evaluate pre-trained models to reproduce the paper results. The steps required for this use case are as follows:

  1. Set up your model environment
  2. Download the pre-trained checkpoints
  3. Run the eval script

Training new checkpoints

With this codebase you can also train a new model (though the documentation for this is still being improved):

  1. Set up your model environment
  2. Download the data
  3. Train a new model

Model environment

The model environment requires Python 3 and PyTorch. Development used PyTorch 1.0.1.post2, but newer versions will hopefully continue to work (see the Reproduce paper results section below for a sanity check).

We recommend using virtualenv as you will need a separate environment to perform audio synthesis.

cd LakhNES
virtualenv -p python3 --no-site-packages LakhNES-model
source LakhNES-model/bin/activate
pip install torch==1.0.1.post2 torchvision==0.2.2.post3
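
If you are running a newer PyTorch, a quick import check can confirm the environment is usable. This is only a minimal sketch that verifies the install; the eval script in the Reproduce paper results section below is the real end-to-end sanity check:

# Quick check that the model environment is usable.
# This only verifies the PyTorch install; see "Reproduce paper results"
# below for an end-to-end check against the paper numbers.
import torch

print(torch.__version__)          # development used 1.0.1.post2
print(torch.cuda.is_available())  # True if a GPU is visible

x = torch.randn(2, 3)             # tiny tensor op to exercise the runtime
print(x.softmax(dim=-1).sum(dim=-1))  # each row sums to ~1.0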

Synthesis environment

LakhNES requires the Python package nesmdb to synthesize chiptune audio. Unfortunately, nesmdb does not support Python 3 (which the rest of this codebase depends on).

We strongly recommend using virtualenv to install nesmdb and run it as a local RPC server. To do this, run the following commands from this repository:

cd LakhNES
virtualenv -p python2.7 --no-site-packages LakhNES-synth
source LakhNES-synth/bin/activate
pip install nesmdb
pip install pretty_midi
python data/synth_server.py 1337

This will expose an RPC server on port 1337 with two methods: tx1_to_wav and tx2_to_wav. Both take a TX1/TX2 input file path, a WAV output file path, and optionally a MIDI downsampling rate. A lower rate speeds up synthesis but distorts the rhythms; if no rate is specified, no downsampling occurs.
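
The data/synth_client.py script used below wraps these calls, but you can also invoke the server from your own Python 3 code. A minimal sketch, assuming the server speaks XML-RPC on localhost (check data/synth_server.py for the exact protocol; the file paths here are illustrative):

# Call the synthesis server directly from Python 3.
# Assumes an XML-RPC server on localhost:1337 (see data/synth_server.py);
# the file paths are just examples.
import xmlrpc.client

synth = xmlrpc.client.ServerProxy('http://localhost:1337')

# Args: TX1 input path, WAV output path, optional MIDI downsampling rate.
# Omitting the rate means no downsampling.
synth.tx1_to_wav('generated/0.tx1.txt', 'generated/0.tx1.wav', 48)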

(Optional) Test your synthesis environment on human-composed music

If you wish to test your synthesis environment on human-composed music, you first need to download the data. Then, if you have both your model and synthesis environments ready, you can synthesize a chiptune from Kirby's Adventure:

source LakhNES-model/bin/activate
python data/synth_client.py data/nesmdb_tx1/train/191_Kirby_sAdventure_02_03PlainsLevel.tx1.txt plains_tx1.wav 48
aplay plains_tx1.wav
python data/synth_client.py data/nesmdb_tx2/train/191_Kirby_sAdventure_02_03PlainsLevel.tx2.txt plains_tx2.wav 48
aplay plains_tx2.wav

Download checkpoints

Here we provide all of the Transformer-XL checkpoints used for the results in our paper. We recommend the LakhNES checkpoint, which was pre-trained on Lakh MIDI for 400k batches before fine-tuning on NES-MDB. However, the others can also produce interesting results (in particular NESAug).

  • (147 MB) (Recommended) Download LakhNES (400k steps Lakh pre-training)
  • (147 MB) Download Lakh200k (200k steps Lakh pre-training)
  • (147 MB) Download Lakh100k (100k steps Lakh pre-training)
  • (147 MB) Download NESAug (No Lakh pre-training but uses data augmentation)
  • (147 MB) Download NES (No Lakh pre-training or data augmentation)
  • (147 MB) Download Lakh400kPretrainOnly (LakhNES model without NES-MDB fine-tuning)

Generate new chiptunes

To generate new chiptunes, first set up your model environment, download a checkpoint, and start your synthesis server. Then, run the following:

source LakhNES-model/bin/activate
python generate.py \
	<MODEL_DIR> \
	--out_dir ./generated \
	--num 1
python data/synth_client.py ./generated/0.tx1.txt ./generated/0.tx1.wav
aplay ./generated/0.tx1.wav

We've also included the IPython notebooks we used to create the continuations of human-composed chiptunes (continuations.ipynb) and rhythm accompaniment examples (accompany_rhythm.ipynb) as heard on our examples page.

Download data


Figure: To adapt music data to the Transformer architecture, we process MIDI files (top) into an event-based representation akin to language (bottom). Each event is musically meaningful, such as a note starting or time advancing.

LakhNES is first trained on Lakh MIDI and then fine-tuned on NES-MDB. The MIDI files from these datasets are first converted into lists of musical events to adapt them to the Transformer architecture.

The NES-MDB dataset has been preprocessed into two event-based formats: TX1 and TX2. The TX1 format contains only composition information: the notes and their timings. The TX2 format additionally contains expressive information: dynamics and timbre.
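
The TX1 and TX2 files are plain text listings of event tokens, so they are easy to inspect. A minimal sketch that tallies the event types in the Kirby's Adventure example used above (grouping tokens by their prefix is an assumption made here just for a quick overview):

# Tally event types in a TX1 file (whitespace-separated event tokens).
from collections import Counter

path = 'data/nesmdb_tx1/train/191_Kirby_sAdventure_02_03PlainsLevel.tx1.txt'
with open(path) as f:
    events = f.read().split()

# Group tokens by the part before the first underscore,
# assuming tokens are of the form FAMILY_DETAIL.
counts = Counter(e.split('_')[0] for e in events)
print(len(events), 'events')
print(counts.most_common())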

You can get the data in TX1 (used in our paper) and TX2 (not used in our paper) formats here:

Other instructions in this README assume that you have moved (at least one of) these bundles to the LakhNES/data folder and extracted them there with tar xvfz.

Reproduce paper results

If you download all of the above checkpoints and extract them (tar xvfz) under LakhNES/model/pretrained, you can reproduce the exact numbers from our paper (Table 2 and Figure 3):

source LakhNES-model/bin/activate
cd model
./reproduce_paper_eval.sh

This should take a few minutes and yield validation PPLs of [4.099, 3.175, 2.911, 2.817, 2.800] and test PPLs of [3.501, 2.741, 2.545, 2.472, 2.460] in order.

Train LakhNES

I (Chris) admit it: my patch of the official Transformer-XL codebase (which lives under the model subdirectory) is among the ugliest code I've ever written. Instructions on how to use it are forthcoming, though the adventurous among you are welcome to try before then. For now, I have focused on making the pretrained checkpoints easy to use, and I hope that suffices.

One component of our training pipeline, the code that adapts Lakh MIDI to NES MIDI for transfer learning, is somewhat more polished. It can be found at LakhNES/data/adapt_lakh_to_nes.py.

User study

Information about how to use the code for our Amazon Mechanical Turk user study (under LakhNES/userstudy) is forthcoming.

Attribution

If you use this work in your research, please cite us via the following BibTeX:

@inproceedings{donahue2019lakhnes,
  title={LakhNES: Improving multi-instrumental music generation with cross-domain pre-training},
  author={Donahue, Chris and Mao, Huanru Henry and Li, Yiting Ethan and Cottrell, Garrison W. and McAuley, Julian},
  booktitle={ISMIR},
  year={2019}
}
