GiantMIDI-Piano

GiantMIDI-Piano [1] is a classical piano MIDI dataset that contains 10,855 MIDI files of 2,786 composers. A curated subset, obtained by constraining composer surnames, contains 7,236 MIDI files of 1,787 composers. The MIDI files in GiantMIDI-Piano are transcribed from live recordings with a high-resolution piano transcription system [2].
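The surname constraint can be illustrated with a simple substring test. This is a minimal sketch, not the selection code used to build the dataset: `in_curated_subset` is a hypothetical helper, and the case and accent folding in `normalize` are assumptions.

```python
"""Hypothetical sketch of the curated-subset rule: keep a recording only if
the composer's surname appears in its YouTube title."""
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents (an assumption, e.g. "Dvořák" -> "dvorak")."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def in_curated_subset(surname: str, youtube_title: str) -> bool:
    """Return True if the normalized surname occurs anywhere in the normalized title."""
    return normalize(surname) in normalize(youtube_title)


if __name__ == "__main__":
    print(in_curated_subset("Chopin", "Frédéric Chopin - Nocturne Op. 9 No. 2"))  # True
    print(in_curated_subset("Aaron", "Piano Course, Lesson 1"))  # False
```

Plain substring matching can produce false positives (for example "Bach" inside "Offenbach"); a word-boundary match would be stricter.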

Here is the demo of GiantMIDI-Piano: https://www.youtube.com/watch?v=5U-WL0QvKCg

The transcribed MIDI files of GiantMIDI-Piano can be previewed in the midis_preview directory.

Download GiantMIDI-Piano

Method 1 (suggested)

Follow disclaimer.md to agree to the disclaimer and download a stable version of GiantMIDI-Piano (193 MB). The download also includes the curated subset, which keeps only recordings whose YouTube titles contain the composer's surname.

Method 2

Users can obtain GiantMIDI-Piano by downloading all audio recordings and transcribing them into MIDI files, following the remaining steps in this repository. The transcription takes ~200 hours on a single GPU.

Install requirements

Install PyTorch (>=1.4) following https://pytorch.org/.

pip install -r requirements.txt

Download audio recordings

Download the audio recordings from YouTube using the following scripts. Approximately 10,855 recordings can be downloaded; some recordings may no longer be available.

WORKSPACE="./workspace"
mkdir -p $WORKSPACE
cp "resources/full_music_pieces_youtube_similarity_pianosoloprob.csv" $WORKSPACE/"full_music_pieces_youtube_similarity_pianosoloprob.csv"

# Download all mp3s. To speed this up, split the download into index ranges and run them in parallel, e.g.:
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=0 --end_index=30000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=30000 --end_index=60000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=60000 --end_index=90000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=90000 --end_index=120000
python3 dataset.py download_youtube_piano_solo --workspace=$WORKSPACE --begin_index=120000 --end_index=150000
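The commands above cover indices 0 to 150,000 in steps of 30,000. Generating the ranges programmatically avoids gaps or overlaps caused by a mistyped bound. A minimal sketch: `make_shards` and `download_commands` are hypothetical helpers, not part of this repository, and the sketch assumes `end_index` is exclusive, as the back-to-back ranges suggest.

```python
"""Hypothetical helpers: build contiguous [begin, end) shards and the matching commands."""
from __future__ import annotations


def make_shards(total: int, shard_size: int) -> list[tuple[int, int]]:
    """Split range(0, total) into contiguous shards of at most shard_size indices."""
    return [(b, min(b + shard_size, total)) for b in range(0, total, shard_size)]


def download_commands(workspace: str, total: int = 150000, shard_size: int = 30000) -> list[str]:
    """One download command per shard, using the flags shown above."""
    return [
        f"python3 dataset.py download_youtube_piano_solo --workspace={workspace} "
        f"--begin_index={b} --end_index={e}"
        for b, e in make_shards(total, shard_size)
    ]


if __name__ == "__main__":
    for cmd in download_commands("./workspace"):
        print(cmd)
```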

The downloaded mp3 files look like:

mp3s_piano_solo (10,855 files)
├── Aaron, Michael, Piano Course, V8WvKK-1b2c.mp3
├── Aarons, Alfred E., Brother Bill, Giet2Krl6Ww.mp3
└── ...
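The file names appear to follow a "Surname, First name, Title, YouTube ID" layout. The sketch below splits a name into those fields. The layout is inferred from the examples above rather than documented, and `parse_piece_filename` is a hypothetical helper. Because a title may itself contain ", ", only the first two fields and the last one are split off.

```python
"""Hypothetical helper: parse GiantMIDI-Piano-style file names (assumed layout)."""
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class PieceInfo(NamedTuple):
    surname: str
    first_name: str
    title: str
    youtube_id: str


def parse_piece_filename(filename: str) -> PieceInfo:
    """Split "Surname, First name, Title, YouTubeID.ext" into its fields."""
    stem = Path(filename).stem  # drop ".mp3" / ".mid"
    parts = stem.split(", ")
    if len(parts) < 4:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    title = ", ".join(parts[2:-1])  # titles may contain ", " themselves
    return PieceInfo(parts[0], parts[1], title, parts[-1])


if __name__ == "__main__":
    print(parse_piece_filename("Aarons, Alfred E., Brother Bill, Giet2Krl6Ww.mp3"))
```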

Transcribe audios to MIDI files

# Transcribe all mp3s to MIDI files. To speed this up, split the transcription into index ranges and run them in parallel, e.g.:
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=0 --end_index=30000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=30000 --end_index=60000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=60000 --end_index=90000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=90000 --end_index=120000
python3 audios_to_midis.py transcribe_piano --workspace=$WORKSPACE --mp3s_dir=$WORKSPACE"/mp3s_piano_solo" --midis_dir=$WORKSPACE"/midis" --begin_index=120000 --end_index=150000
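To spread the shards over several GPUs, one option is to pin each shard to a device through the standard `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch, assuming the transcription script uses whichever GPU is visible to it; `transcribe_jobs` and `run_all` are hypothetical helpers, and the flags mirror the commands above (with `--begin_index` assumed to be the full argument name).

```python
"""Hypothetical helpers: pin each transcription shard to one GPU (sketch only)."""
from __future__ import annotations

import os
import subprocess


def transcribe_jobs(
    workspace: str, shards: list[tuple[int, int]], num_gpus: int
) -> list[tuple[dict[str, str], list[str]]]:
    """Build (environment, argv) pairs, assigning shards to GPUs round-robin."""
    jobs = []
    for i, (begin, end) in enumerate(shards):
        # Each process sees exactly one GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(i % num_gpus))
        argv = [
            "python3", "audios_to_midis.py", "transcribe_piano",
            f"--workspace={workspace}",
            f"--mp3s_dir={workspace}/mp3s_piano_solo",
            f"--midis_dir={workspace}/midis",
            f"--begin_index={begin}",
            f"--end_index={end}",
        ]
        jobs.append((env, argv))
    return jobs


def run_all(jobs: list[tuple[dict[str, str], list[str]]]) -> list[int]:
    """Launch every job in parallel and return their exit codes."""
    procs = [subprocess.Popen(argv, env=env) for env, argv in jobs]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    shards = [(b, b + 30000) for b in range(0, 150000, 30000)]
    print(run_all(transcribe_jobs("./workspace", shards, num_gpus=2)))
```

`run_all` starts every shard at once, so with more shards than GPUs several processes share a device; use fewer, larger shards if GPU memory runs out.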

The transcribed MIDI files look like:

midis (10,855 files)
├── Aaron, Michael, Piano Course, V8WvKK-1b2c.mid
├── Abel, Frederic, Lola Polka, SLNJF0uiqRw.mid
└── ...
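Because each MIDI file keeps the stem of its source mp3, an interrupted run can be resumed by listing the mp3s that have no matching MIDI file yet. A minimal sketch; `pending_mp3s` is a hypothetical helper that assumes the one-to-one naming shown above.

```python
"""Hypothetical helper: find mp3s that have not been transcribed yet."""
from __future__ import annotations

from pathlib import Path


def pending_mp3s(mp3s_dir: str | Path, midis_dir: str | Path) -> list[Path]:
    """Return mp3 paths whose stem has no matching .mid file, sorted by name."""
    done = {p.stem for p in Path(midis_dir).glob("*.mid")}
    return sorted(p for p in Path(mp3s_dir).glob("*.mp3") if p.stem not in done)


if __name__ == "__main__":
    todo = pending_mp3s("./workspace/mp3s_piano_solo", "./workspace/midis")
    print(f"{len(todo)} recordings left to transcribe")
```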

The transcription of all audio recordings may take around 10 days on a single GPU card.
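The two estimates in this README, about 200 hours in Method 2 and about 10 days here, are roughly consistent: 200 hours is about 8.3 days, or roughly 66 s of GPU time per recording. The sketch below does that arithmetic. It takes the ~200-hour figure as given and assumes throughput scales linearly with the number of GPUs, which ignores I/O and other bottlenecks.

```python
"""Back-of-the-envelope transcription time (assumes linear scaling across GPUs)."""


def seconds_per_recording(total_hours: float = 200.0, num_recordings: int = 10_855) -> float:
    """Average GPU time per recording implied by the total."""
    return total_hours * 3600 / num_recordings


def wall_clock_days(total_hours: float = 200.0, num_gpus: int = 1) -> float:
    """Ideal wall-clock days when shards are spread evenly over num_gpus."""
    return total_hours / num_gpus / 24


if __name__ == "__main__":
    print(f"{seconds_per_recording():.1f} s per recording")      # 66.3 s per recording
    print(f"{wall_clock_days():.1f} days on 1 GPU")              # 8.3 days on 1 GPU
    print(f"{wall_clock_days(num_gpus=4):.1f} days on 4 GPUs")   # 2.1 days on 4 GPUs
```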

Details of the scripts can be found in the scripts directory.

Analyze the statistics of GiantMIDI-Piano

All statistics and figures in [1] can be reproduced by:

./scripts/3_statistics.sh
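For a quick look without running the full script, a per-composer file count can be derived from the file names alone. This is a minimal sketch under the assumed "Surname, First name, Title, YouTube ID" naming, not the logic of `3_statistics.sh`; `count_pieces_per_composer` is a hypothetical helper.

```python
"""Hypothetical helper: count MIDI files per composer from file names."""
from __future__ import annotations

from collections import Counter
from pathlib import Path
from typing import Iterable


def count_pieces_per_composer(filenames: Iterable[str | Path]) -> Counter:
    """Count files per "Surname, First name", skipping names with another layout."""
    counts: Counter = Counter()
    for name in filenames:
        parts = Path(name).stem.split(", ")
        if len(parts) < 4:
            continue  # not "Surname, First name, Title, YouTubeID"
        counts[f"{parts[0]}, {parts[1]}"] += 1
    return counts


if __name__ == "__main__":
    counts = count_pieces_per_composer(Path("./workspace/midis").glob("*.mid"))
    for composer, n in counts.most_common(10):
        print(f"{n:5d}  {composer}")
```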

FAQ

If the message "Too many requests! Sleep for 3600 s" appears while downloading, YouTube has limited the number of videos your IP address can download. Either 1) wait until YouTube unblocks your IP (anywhere from a day to a few weeks), or 2) download from another machine with a different IP address.
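The message suggests the download script already pauses for an hour when it is rate-limited. If you drive downloads from your own code, the same wait-and-retry pattern can be sketched as below. `TooManyRequests` and `with_rate_limit_retry` are hypothetical names, and the 3600 s default simply mirrors the message above. Retrying cannot lift a block that lasts weeks, so the two options above still apply.

```python
"""Hypothetical retry wrapper for rate-limited downloads (sketch only)."""
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical error a custom downloader raises when YouTube rate-limits."""


def with_rate_limit_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    sleep_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping sleep_s after each rate-limit error, up to max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TooManyRequests:
            if attempt == max_attempts:
                raise  # give up: wait longer or switch to another IP
            sleep(sleep_s)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the wrapper testable without actually waiting an hour.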

Contact

Qiuqiang Kong, [email protected]

Cite

[1] Qiuqiang Kong, Bochen Li, Jitong Chen, and Yuxuan Wang. "GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music." arXiv preprint arXiv:2010.07061 (2020). https://arxiv.org/pdf/2010.07061

License

CC BY 4.0
