Multitrack Music Transformer
This repository contains the official implementation of "Multitrack Music Transformer" (ICASSP 2023).
Multitrack Music Transformer
Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley and Taylor Berg-Kirkpatrick
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
[homepage] [paper] [code] [reviews]
Prerequisites
We recommend using Conda. You can create the environment with the following command.
conda env create -f environment.yml
Preprocessing
Preprocessed Datasets
The preprocessed datasets can be found here. You can use gdown to download them via command line as follows.
gdown --id 1owWu-Ne8wDoBYCFiF9z11fruJo62m_uK --folder
Extract the files to data/{DATASET_KEY}/processed/json and data/{DATASET_KEY}/processed/notes, where DATASET_KEY is sod, lmd, lmd_full or snd.
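After extraction, you can sanity-check the download by loading one of the processed files with MusPy, the library this repository uses for its symbolic music files. A minimal sketch, assuming a JSON file under data/sod/processed/json; the filename below is hypothetical.

```python
import muspy

# Load one preprocessed MusPy JSON file (the filename is hypothetical).
music = muspy.load("data/sod/processed/json/example.json")

print(f"Resolution: {music.resolution} time steps per quarter note")
print(f"Number of tracks: {len(music.tracks)}")
for track in music.tracks:
    print(f"  program={track.program:3d}  notes={len(track.notes)}")
```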
Preprocessing Scripts
You can skip this section if you download the preprocessed datasets.
Step 1 -- Download the datasets
Please download the Symbolic Orchestral Database (SOD). You can download it via the command line as follows.
wget https://qsdfo.github.io/LOP/database/SOD.zip
We also support the following two datasets:
- Lakh MIDI Dataset (lmd_full):
  wget http://hog.ee.columbia.edu/craffel/lmd/lmd_full.tar.gz
- SymphonyNet Dataset (snd):
  gdown "https://drive.google.com/u/0/uc?id=1j9Pvtzaq8k_QIPs8e2ikvCR-BusPluTb&export=download"
Step 2 -- Prepare the name list
Get a list of filenames for each dataset.
find data/sod/SOD -type f \( -name "*.mid" -o -name "*.xml" \) | cut -c 14- > data/sod/original-names.txt
Note: Adjust the character offset in the cut command for other datasets so that it strips the correct directory prefix.
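If you prefer not to adjust the offset by hand, the same name list can be produced with a short Python sketch (illustrative only, not a script in this repository; shown for SOD):

```python
from pathlib import Path

# Collect MIDI and MusicXML paths relative to the dataset root, mirroring
# the `find ... | cut` pipeline above.
root = Path("data/sod/SOD")
names = sorted(
    str(path.relative_to(root))
    for path in root.rglob("*")
    if path.is_file() and path.suffix in {".mid", ".xml"}
)
Path("data/sod/original-names.txt").write_text("\n".join(names) + "\n")
```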
Step 3 -- Convert the data
Convert the MIDI and MusicXML files into MusPy files for processing.
python convert_sod.py
Note: You may enable multiprocessing with the -j option. For example, python convert_sod.py -j 10 uses 10 parallel jobs.
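The conversion relies on MusPy's readers: roughly, each source file is read and re-saved in MusPy's JSON format. A minimal sketch of that idea; the paths and the target resolution of 12 are assumptions here, not necessarily the script's actual settings.

```python
import muspy

# Read a source MIDI or MusicXML file (format inferred from the extension).
music = muspy.read("data/sod/SOD/example.mid")  # hypothetical path

# Normalize the time resolution (12 time steps per quarter note is an
# assumption, not necessarily what convert_sod.py uses).
music.adjust_resolution(12)

# Save as a MusPy JSON file for the later preprocessing steps.
music.save("data/sod/processed/json/example.json")
```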
Step 4 -- Extract the note list
Extract a list of notes from the MusPy JSON files.
python extract.py -d sod
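Conceptually, the note list is a flat, time-ordered table of the notes in each MusPy file. The sketch below shows the general idea with generic (beat, position, pitch, duration, program) tuples; the exact fields and encoding used by extract.py may differ.

```python
import muspy

music = muspy.load("data/sod/processed/json/example.json")  # hypothetical path
resolution = music.resolution

notes = []
for track in music.tracks:
    for note in track.notes:
        notes.append((
            note.time // resolution,   # beat index
            note.time % resolution,    # position within the beat
            note.pitch,                # MIDI pitch (0-127)
            note.duration,             # duration in time steps
            track.program,             # MIDI program number
        ))

# Sort by onset so the list reads in temporal order.
notes.sort()
print(f"Extracted {len(notes)} notes")
```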
Step 5 -- Split training/validation/test sets
Split the processed data into training, validation and test sets.
python split.py -d sod
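Conceptually, the split assigns each filename in the name list to one of the three subsets. A hedged sketch of such a split follows; the actual ratios, random seed, and output filenames of split.py may differ.

```python
import random
from pathlib import Path

# Hypothetical 8:1:1 random split of the name list.
names = Path("data/sod/original-names.txt").read_text().splitlines()
random.seed(0)
random.shuffle(names)

n_valid = n_test = len(names) // 10
splits = {
    "test": names[:n_test],
    "valid": names[n_test:n_test + n_valid],
    "train": names[n_test + n_valid:],
}
for subset, subset_names in splits.items():
    # Output filenames are hypothetical.
    Path(f"data/sod/{subset}-names.txt").write_text("\n".join(subset_names) + "\n")
```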
Training
Pretrained Models
The pretrained models can be found here. You can use gdown to download all of them via the command line as follows.
gdown --id 1HoKfghXOmiqi028oc_Wv0m2IlLdcJglQ --folder
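If you prefer to script the download, gdown also exposes a Python API; a sketch using the same folder ID as above (the output directory is hypothetical):

```python
import gdown

# Download the Google Drive folder containing the pretrained checkpoints.
gdown.download_folder(
    "https://drive.google.com/drive/folders/1HoKfghXOmiqi028oc_Wv0m2IlLdcJglQ",
    output="exp",  # hypothetical output directory
    quiet=False,
)
```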
Training Scripts
Train a Multitrack Music Transformer model with one of the following positional embedding settings (see the model sketch after the list).
- Absolute positional embedding (APE):
  python mmt/train.py -d sod -o exp/sod/ape -g 0
- Relative positional embedding (RPE):
  python mmt/train.py -d sod -o exp/sod/rpe --no-abs_pos_emb --rel_pos_emb -g 0
- No positional embedding (NPE):
  python mmt/train.py -d sod -o exp/sod/npe --no-abs_pos_emb --no-rel_pos_emb -g 0
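The three options toggle how positions are encoded in the decoder. As a rough illustration of what APE, RPE and NPE mean in terms of the underlying x-transformers library: the sketch below is a generic single-token-stream decoder, not the repository's exact multitrack model class, and all sizes are hypothetical.

```python
from x_transformers import TransformerWrapper, Decoder

def build_decoder(abs_pos_emb: bool, rel_pos_emb: bool) -> TransformerWrapper:
    """Toy decoder showing how the positional embedding flags map onto
    x-transformers options (sizes are hypothetical)."""
    return TransformerWrapper(
        num_tokens=512,
        max_seq_len=1024,
        use_abs_pos_emb=abs_pos_emb,   # APE on/off
        attn_layers=Decoder(
            dim=256,
            depth=4,
            heads=4,
            rel_pos_bias=rel_pos_emb,  # RPE on/off
        ),
    )

ape = build_decoder(abs_pos_emb=True, rel_pos_emb=False)   # like exp/sod/ape
rpe = build_decoder(abs_pos_emb=False, rel_pos_emb=True)   # like exp/sod/rpe
npe = build_decoder(abs_pos_emb=False, rel_pos_emb=False)  # like exp/sod/npe
```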
Generation (Inference)
Generate new samples using a trained model.
python mmt/generate.py -d sod -o exp/sod/ape -g 0
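Autoregressive sampling from such a decoder can be sketched with x-transformers' AutoregressiveWrapper; again a toy single-token-stream example, not the repository's exact multi-field decoding.

```python
import torch
from x_transformers import TransformerWrapper, Decoder, AutoregressiveWrapper

# A toy decoder standing in for a trained checkpoint (hypothetical sizes);
# in practice one would load the weights produced by the training step.
model = TransformerWrapper(
    num_tokens=512,
    max_seq_len=1024,
    attn_layers=Decoder(dim=256, depth=4, heads=4),
)
generator = AutoregressiveWrapper(model)

# Sample 64 new tokens after a start token (token id 0 is a placeholder).
start_tokens = torch.zeros(1, 1, dtype=torch.long)
sample = generator.generate(start_tokens, 64, temperature=1.0)
print(sample.shape)  # torch.Size([1, 64])
```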
Evaluation
Evaluate the trained model using objective evaluation metrics.
python mmt/evaluate.py -d sod -o exp/sod/ape -ns 100 -g 0
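Similar objective metrics can be computed directly with MusPy on a generated piece; a sketch assuming a generated sample saved as a MusPy JSON file (the path is hypothetical, and the exact metric set and settings in evaluate.py may differ):

```python
import muspy

# Load a generated sample (hypothetical path) and compute a few of
# MusPy's objective metrics.
music = muspy.load("exp/sod/ape/samples/example.json")

print("pitch class entropy:", muspy.pitch_class_entropy(music))
print("scale consistency  :", muspy.scale_consistency(music))
# Assumes 4/4 time, i.e. four quarter notes per measure.
print("groove consistency :", muspy.groove_consistency(music, measure_resolution=4 * music.resolution))
```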
Acknowledgment
The code is based largely on the x-transformers library developed by lucidrains.
Citation
Please cite the following paper if you use the code provided in this repository.
Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley and Taylor Berg-Kirkpatrick, "Multitrack Music Transformer," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
@inproceedings{dong2023mmt,
author = {Hao-Wen Dong and Ke Chen and Shlomo Dubnov and Julian McAuley and Taylor Berg-Kirkpatrick},
title = {Multitrack Music Transformer},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year = 2023,
}