musescore-dataset
🚨 The dataset has been left unmaintained since Sep 30, 2021.
Help is appreciated, if you are willing to take the risk of becoming a victim of personal harassment.
The unofficial dataset of all music sheets and users on musescore.com, dedicated to big data analytics / data science / machine learning.
All data is collected by iterating through musescore.com's public API.
The jsonl files are in the Newline-delimited JSON (JSON Lines) format.
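Because every line of a JSON Lines file is one complete JSON record, the files can be inspected with ordinary line-oriented tools without loading them fully into memory. The following is a minimal sketch, assuming the file has already been downloaded; the function names `jsonl_count` and `jsonl_peek` are made up for this illustration and are not part of the dataset.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a JSON Lines file.
# Each non-empty line is one JSON record, so counting records is counting lines.

# Print the number of non-empty lines (records) in the file given as $1.
jsonl_count() {
    grep -c . "$1"
}

# Print the first N records of the file $1 (N is $2, default 1), one per line.
jsonl_peek() {
    head -n "${2:-1}" "$1"
}
```

For example, `jsonl_count score.jsonl` prints the number of records and `jsonl_peek user.jsonl 3` prints the first three. If jq happens to be installed, piping the output of `jsonl_peek` into `jq .` pretty-prints the records.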
Only need the sheet files to learn music? Try musescore-downloader.
View/Query in Google BigQuery
User Data
Updated manually.
Last Updated: Nov 9, 2020
https://musescore-dataset.xmader.com/user.jsonl
Music Sheet Metadata
Last Updated: Sep 30, 2021
https://musescore-dataset.xmader.com/score.jsonl
mscz files
All Last Updated: Sep 30, 2021
https://musescore-dataset.xmader.com/mscz-files.csv
# The CSV file itself is on IPFS
# ipns://QmSdXtvzC8v8iTTZuj5cVmiugnzbR1QATYRcGix4bBsioP
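# Extract the 46-character CID from the reference, then fetch the CSV through a gateway
cid=$(curl -s https://musescore-dataset.xmader.com/csv-ipfs-ref | grep -o '\w\{46\}' | head -n 1)
wget -O mscz-files.csv "https://ipfs.io/ipfs/${cid}/mscz-files.csv"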
This is a csv file, which contains score id (id) and the corresponding IPFS reference (ref) to each mscz file.
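To find the reference for a single score without downloading everything, the CSV can be searched by id. This is a minimal sketch, assuming the file has one header row followed by `id,ref` rows (the layout that the download loops in this README rely on when they skip the first line); `lookup_ref` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the IPFS reference for one score id.
# $1 is the path to mscz-files.csv, $2 is the score id to look up.
# Assumes a header row followed by "id,ref" rows.
lookup_ref() {
    # NR > 1 skips the header; stop at the first match.
    # The exit status is 0 if the id was found and 1 otherwise.
    awk -F, -v id="$2" '
        NR > 1 && $1 == id { print $2; found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

Usage, with a placeholder id: `lookup_ref mscz-files.csv 123456`. The printed value is whatever the ref column holds for that row.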
All files are available on IPFS.
NO ONE CAN TAKE IT DOWN NOW!
Bulk Download
We (LibreScore team) don't condone mass downloads using regular methods.
USE AT YOUR OWN RISK
See https://discord.com/channels/774491656643674122/777457743983411221/1032054445422420039
(You must join the LibreScore Community Discord first to see the message.)
Download mscz files via IPFS HTTP Gateways
Using wget
#!/bin/bash
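while IFS=, read -r id ref
do
    if [ -f "$id.mscz" ]; then
        echo "$id.mscz exists."
    else
        echo "$id.mscz does not exist."
        # wget -O creates the output file even if the download fails, so remove it on failure
        wget -nv --read-timeout=3 "https://ipfs.io$ref" -O "$id.mscz" || rm -f "$id.mscz"
    fi
done < <(sed '1d' mscz-files.csv)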
Using curl
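#!/bin/bash
while IFS=, read -r id ref
do
    if [ -f "$id.mscz" ]; then
        echo "$id.mscz exists."
    else
        echo "$id.mscz does not exist."
        # -f fails on HTTP errors, -m 3 caps the whole transfer at 3 seconds;
        # remove any partial file so the next run retries it
        curl -\# -f "https://ipfs.io$ref" -o "$id.mscz" -m 3 || rm -f "$id.mscz"
    fi
done < <(sed '1d' mscz-files.csv)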
Or using local IPFS daemon
#!/bin/bash
# Install IPFS https://docs.ipfs.io/how-to/command-line-quick-start/#install-ipfs
ipfs daemon --init &
while IFS=, read -r id ref
do
    ipfs get "$ref" -o "$id.mscz"
done < <(sed '1d' mscz-files.csv)
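Before starting or resuming a long run, it can be useful to see which files are still missing and which URL each would be fetched from. The sketch below is a dry run that makes no network requests. It assumes the same CSV layout as the loops above (a header row, then `id,ref` rows, where ref is a path that can be appended to a gateway origin). The name `pending_downloads` and its arguments are hypothetical. Unlike the `-f` check in the loops above, it treats zero-byte files as missing, since an interrupted download can leave an empty file behind.

```shell
#!/bin/bash
# Hypothetical dry-run helper: list the downloads that are still outstanding.
# $1: path to mscz-files.csv
# $2: directory holding downloaded files (default: current directory)
# $3: gateway origin (default: https://ipfs.io)
# Prints one "id<TAB>url" line per file that is missing or empty.
pending_downloads() {
    sed '1d' "$1" | while IFS=, read -r id ref; do
        # Skip blank lines.
        [ -n "$id" ] || continue
        # Skip files that already exist and are non-empty.
        [ -s "${2:-.}/$id.mscz" ] && continue
        printf '%s\t%s%s\n' "$id" "${3:-https://ipfs.io}" "$ref"
    done
}
```

For example, `pending_downloads mscz-files.csv . | wc -l` counts the files still to be fetched into the current directory. Passing a third argument switches the gateway, for instance a local gateway, assumed here to listen on `http://127.0.0.1:8080`.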
Help hosting files
You could help musescore-dataset become more accessible by:
- Hosting (ipfs pin) those mscz files on your own IPFS nodes
  #!/bin/bash
  while IFS=, read -r id ref
  do
      ipfs pin add -r --progress "$ref"
  done < <(sed '1d' mscz-files.csv)
or,
- Asking a public IPFS gateway to periodically fetch and cache file requests
  #!/bin/bash
  # run in a cron job
  while IFS=, read -r id ref
  do
      echo "fetching $id.mscz"
      curl -\# -f "https://ipfs.io$ref" -o "$id.mscz" -m 0.5
      rm -f "$id.mscz"
  done < <(sed '1d' mscz-files.csv | shuf)
Contact me if you have any questions.
The purpose of the project is to make the data of musescore.com accessible to anyone in need, and to bring a clean, high-quality music dataset to the world of computer science. It is not meant for individuals who only want to keep the data for no purpose.