  • Stars: 330
  • Rank: 127,657 (top 3%)
  • Language: JavaScript
  • License: MIT License
  • Created: over 1 year ago
  • Updated: 3 months ago

Repository Details

free website for client-side music demixing with Demucs + WebAssembly

free-music-demixer

A free static website for client-side music demixing (aka music source separation) with two AI models: Open-Unmix (with UMX-L weights) and the Demucs v4 hybrid transformer model. It runs on GitHub Pages and Cloudflare.

umx.cpp: a transliteration of the original PyTorch model's Python code to C++ with Eigen3, compiled to WebAssembly with Emscripten. The UMX-L weights are quantized (mostly uint8, with uint16 for the last 4 layers), saved in the ggml binary file format, and then gzipped. This reduces the UMX-L weights from 425 MB to 45 MB while achieving similar performance (verified empirically using BSS metrics).
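To illustrate why quantization plus gzip shrinks the weights so much, here is a minimal sketch of uint8 quantization in plain Python. It assumes a simple min/max affine scheme and float32 source weights (a 4x reduction before gzip); the scheme actually used by the conversion script is not described here, and the helper names `quantize_u8` and `dequantize_u8` and the random stand-in data are hypothetical.

```python
import gzip
import random
import struct


def quantize_u8(weights):
    """Affine-quantize floats to uint8 codes; returns (codes, minimum, scale)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for a constant tensor
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale


def dequantize_u8(codes, lo, scale):
    """Map uint8 codes back to approximate float values."""
    return [lo + c * scale for c in codes]


random.seed(0)
# Stand-in data only: real weights would come from the PyTorch checkpoint.
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]

raw_f32 = struct.pack(f"<{len(weights)}f", *weights)  # 4 bytes per value
codes, lo, scale = quantize_u8(weights)               # 1 byte per value
restored = dequantize_u8(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
gz = gzip.compress(codes)                              # what `gzip -k` adds on top

print("float32 bytes:", len(raw_f32))
print("uint8 bytes:  ", len(codes))
print("gzipped bytes:", len(gz))
print("max abs error:", max_err, "(half a quantization step is", scale / 2, ")")
```

The rounding error per weight is bounded by half a quantization step, which is why output quality still has to be checked empirically, as the BSS metrics in this README do.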

demucs.cpp: similar to the above, but without quantization. The Demucs v4 (4-source) weights are ~80 MB, stored as float16. Anything smaller degrades the quality of the network, and compression only brings the size down to ~70 MB, which is not worth the extra loading time.
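As a rough illustration of the float16 trade-off: at two bytes per value, ~80 MB corresponds to roughly 40 million parameters. The sketch below uses Python's `struct` format character `e` (IEEE 754 half precision) to show the size halving relative to float32 and the rounding error it introduces. The sample values and helper names are made up for illustration and are not taken from the Demucs weights.

```python
import struct


def to_f16_bytes(values):
    """Pack floats as little-endian IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)


def from_f16_bytes(blob):
    """Unpack little-endian half-precision values back to Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


# Hypothetical weight values, chosen only to show exact and inexact cases.
weights = [0.5, -0.125, 0.1, 3.14159, 1e-3]

blob16 = to_f16_bytes(weights)
blob32 = struct.pack(f"<{len(weights)}f", *weights)
restored = from_f16_bytes(blob16)

# Relative rounding error of the float16 round trip.
rel_err = max(abs(a - b) / abs(a) for a, b in zip(weights, restored))

print("float32 bytes:", len(blob32))
print("float16 bytes:", len(blob16))
print("max relative error:", rel_err)
```

For values in the normal float16 range the relative error stays below 2^-11, which is small next to the error of an 8-bit quantization step.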

Roadmap

  • Implement threaded demucs + 6-source (piano, guitar) variants

Dev instructions

Clone the repo with submodules:

git clone --recurse-submodules https://github.com/sevagh/free-music-demixer

To generate a weights file with Python, first create a Python venv, then:

python -m pip install -r ./scripts/requirements.txt
python ./scripts/convert-umx-pth-to-ggml.py --model=umxl ./ggml-umxl
gzip -k ./ggml-umxl/ggml-model-umxl-u8.bin

Build for WebAssembly with Emscripten using emcmake:

mkdir -p build-wasm && cd build-wasm && emcmake cmake .. && make

The wav-file-encoder project has been vendored in; I manually compiled the TypeScript file to JavaScript with these commands:

npm install typescript
npx tsc --module es6 ../vendor/wav-file-encoder/src/WavFileEncoder.ts

Demucs v4

The segmented design causes fewer memory issues (the largest track tested is ~7 minutes, 'Georgia Wonder - Siren').
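The memory benefit of segmented inference comes from the model only ever seeing one fixed-size chunk at a time, so peak memory depends on the segment length rather than the track length. A minimal sketch of the idea follows, assuming 60 s segments with 15 s of overlap and triangular blending weights. These numbers and the function names `plan_segments` and `overlap_add` are illustrative assumptions, not the values or code used by demucs.cpp.

```python
def plan_segments(n_samples, segment, overlap):
    """Return (start, end) sample ranges that cover the track with overlap."""
    if not 0 <= overlap < segment:
        raise ValueError("overlap must be smaller than the segment length")
    stride = segment - overlap
    spans = []
    start = 0
    while True:
        end = min(start + segment, n_samples)
        spans.append((start, end))
        if end == n_samples:
            break
        start += stride
    return spans


def overlap_add(chunks, spans, n_samples):
    """Blend processed chunks with triangular weights, then normalise."""
    out = [0.0] * n_samples
    norm = [0.0] * n_samples
    for chunk, (start, end) in zip(chunks, spans):
        length = end - start
        for i in range(length):
            w = min(i + 1, length - i)  # triangular weight, never zero
            out[start + i] += w * chunk[i]
            norm[start + i] += w
    return [o / n for o, n in zip(out, norm)]


SAMPLE_RATE = 44_100                  # MUSDB18-HQ audio is sampled at 44.1 kHz
track_len = 7 * 60 * SAMPLE_RATE      # a ~7 minute track, as mentioned above
spans = plan_segments(track_len, segment=60 * SAMPLE_RATE, overlap=15 * SAMPLE_RATE)
print("segments needed:", len(spans))

# Tiny round trip with an identity "model" to check the blending logic.
signal = [float(i % 7) for i in range(25)]
small_spans = plan_segments(len(signal), segment=10, overlap=4)
chunks = [signal[s:e] for s, e in small_spans]
rebuilt = overlap_add(chunks, small_spans, len(signal))
```

With an identity model the blended output reproduces the input, which is a cheap sanity check before plugging in real inference.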

'Georgia Wonder - Siren' (takes ~41 minutes):

vocals          ==> SDR:   7.261  SIR:  13.550  ISR:  13.158  SAR:   6.763
drums           ==> SDR:  10.629  SIR:  17.819  ISR:  17.373  SAR:  10.829
bass            ==> SDR:  10.593  SIR:  19.696  ISR:  12.244  SAR:  10.007
other           ==> SDR:   6.324  SIR:   9.005  ISR:  13.223  SAR:   6.067

'Zeno - Signs' (takes ~20 minutes):

vocals          ==> SDR:   8.326  SIR:  18.257  ISR:  15.927  SAR:   8.311
drums           ==> SDR:  10.041  SIR:  18.413  ISR:  17.054  SAR:  10.692
bass            ==> SDR:   3.893  SIR:  12.221  ISR:   7.076  SAR:   3.237
other           ==> SDR:   7.432  SIR:  11.422  ISR:  14.161  SAR:   8.201
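For readers unfamiliar with the columns above, SDR (signal-to-distortion ratio) is an energy ratio in decibels between the true stem and the estimation error, so higher is better. The sketch below computes only the plain, whole-signal definition on a synthetic sine wave. It is an assumption-laden simplification: BSS evaluation toolkits compute SDR, SIR, ISR, and SAR with additional projection and windowing steps, so this will not reproduce the numbers in the tables.

```python
import math


def sdr_db(reference, estimate):
    """Plain SDR in dB: reference energy over error energy.

    Assumes a non-silent reference of the same length as the estimate.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / error)


# Synthetic example: a 440 Hz sine, and an estimate with a 10% amplitude error.
reference = [math.sin(2 * math.pi * 440 * n / 44_100) for n in range(4410)]
estimate = [0.9 * r for r in reference]

score = sdr_db(reference, estimate)
print("SDR (dB):", score)
```

A 10% amplitude error leaves 1% of the energy as distortion, which is where the 20 dB figure for this toy case comes from.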

Open-Unmix (UMX-L)

MUSDB18-HQ test track 'Zeno - Signs':

'Zeno - Signs', fully segmented (60 s) inference + Wiener filtering + streaming LSTM:

vocals          ==> SDR:   6.830  SIR:  16.421  ISR:  14.044  SAR:   7.104
drums           ==> SDR:   7.425  SIR:  14.570  ISR:  12.062  SAR:   8.905
bass            ==> SDR:   2.462  SIR:   4.859  ISR:   5.346  SAR:   3.566
other           ==> SDR:   6.197  SIR:   9.437  ISR:  12.519  SAR:   7.627

'Zeno - Signs', unsegmented inference (crashes with large tracks) with streaming LSTM + Wiener filtering:

vocals          ==> SDR:   6.846  SIR:  16.382  ISR:  13.897  SAR:   7.024
drums           ==> SDR:   7.679  SIR:  14.462  ISR:  12.606  SAR:   9.001
bass            ==> SDR:   2.386  SIR:   4.504  ISR:   5.802  SAR:   3.731
other           ==> SDR:   6.020  SIR:   9.854  ISR:  11.963  SAR:   7.472
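The two tables above are easier to compare once reduced to a single number. The following sketch parses the result lines exactly as printed and averages the SDR over the four stems. The parsing helpers are hypothetical conveniences and not part of the repository.

```python
import re

LINE = re.compile(r"^(\w+)\s+==>\s+(.*)$")
PAIR = re.compile(r"(\w+):\s+(-?\d+\.\d+)")


def parse_scores(text):
    """Turn 'stem ==> SDR: x SIR: y ...' lines into {stem: {metric: value}}."""
    scores = {}
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            scores[m.group(1)] = {k: float(v) for k, v in PAIR.findall(m.group(2))}
    return scores


def mean_sdr(scores):
    """Average SDR across all parsed stems."""
    return sum(s["SDR"] for s in scores.values()) / len(scores)


segmented = parse_scores("""
vocals          ==> SDR:   6.830  SIR:  16.421  ISR:  14.044  SAR:   7.104
drums           ==> SDR:   7.425  SIR:  14.570  ISR:  12.062  SAR:   8.905
bass            ==> SDR:   2.462  SIR:   4.859  ISR:   5.346  SAR:   3.566
other           ==> SDR:   6.197  SIR:   9.437  ISR:  12.519  SAR:   7.627
""")

unsegmented = parse_scores("""
vocals          ==> SDR:   6.846  SIR:  16.382  ISR:  13.897  SAR:   7.024
drums           ==> SDR:   7.679  SIR:  14.462  ISR:  12.606  SAR:   9.001
bass            ==> SDR:   2.386  SIR:   4.504  ISR:   5.802  SAR:   3.731
other           ==> SDR:   6.020  SIR:   9.854  ISR:  11.963  SAR:   7.472
""")

print("segmented mean SDR:  ", mean_sdr(segmented))
print("unsegmented mean SDR:", mean_sdr(unsegmented))
```

The two means differ by less than 0.01 dB, which suggests that on this track segmentation buys the memory savings at essentially no cost in quality.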

Previous release results on 'Zeno - Signs' (no streaming LSTM, no Wiener filtering):

vocals          ==> SDR:   6.550  SIR:  14.583  ISR:  13.820  SAR:   6.974
drums           ==> SDR:   6.538  SIR:  11.209  ISR:  11.163  SAR:   8.317
bass            ==> SDR:   1.646  SIR:   0.931  ISR:   5.261  SAR:   2.944
other           ==> SDR:   5.190  SIR:   6.623  ISR:  10.221  SAR:   8.599

  • Streaming UMX LSTM module for longer tracks with Demucs overlapping segment inference

Testing 'Georgia Wonder - Siren' (largest MUSDB track) for memory usage with 60s segments:

vocals          ==> SDR:   5.858  SIR:  10.880  ISR:  14.336  SAR:   6.187
drums           ==> SDR:   7.654  SIR:  14.933  ISR:  11.459  SAR:   8.466
bass            ==> SDR:   7.256  SIR:  12.007  ISR:  10.743  SAR:   6.757
other           ==> SDR:   4.699  SIR:   7.452  ISR:   9.142  SAR:   4.298

vs. PyTorch inference (with Wiener filtering):

vocals          ==> SDR:   5.899  SIR:  10.766  ISR:  14.348  SAR:   6.187
drums           ==> SDR:   7.939  SIR:  14.676  ISR:  12.485  SAR:   8.383
bass            ==> SDR:   7.576  SIR:  12.712  ISR:  11.188  SAR:   6.951
other           ==> SDR:   4.624  SIR:   7.937  ISR:   8.845  SAR:   4.270
