Model for MDX23 music separation contest

MVSEP-MDX23-music-separation-model

Model for the Sound Demixing Challenge 2023: Music Demixing Track (MDX'23). The model separates music into 4 stems: "bass", "drums", "vocals" and "other". It won 3rd place in the challenge (Leaderboard C).

The model is based on the Demucs4 and MDX neural net architectures and uses some MDX weights from the Ultimate Vocal Remover project (thanks to Kimberley Jensen for the high-quality vocal models). Brought to you by MVSep.com.
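This README does not say how the Demucs4 and MDX outputs are merged. Purely as an illustration, assume the common approach of a weighted average of several models' estimates for the same stem, with the instrumental taken as the mixture minus the vocals. This is a sketch under those assumptions, not the repository's code: the function names and weights are hypothetical, and signals are plain Python lists of samples for simplicity.

```python
def blend_estimates(estimates, weights):
    """Weighted average of several models' estimates of the same stem.

    Hypothetical helper: assumes all estimates have the same length.
    """
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = float(sum(weights))
    length = len(estimates[0])
    return [sum(w * est[i] for est, w in zip(estimates, weights)) / total
            for i in range(length)]


def instrumental_from_vocals(mixture, vocals):
    """Residual instrumental: everything in the mix that is not vocals."""
    return [m - v for m, v in zip(mixture, vocals)]
```

For example, `blend_estimates([demucs_vocals, mdx_vocals], [1.0, 2.0])` would give the MDX estimate twice the influence of the Demucs one; the weights actually used by the released model are not stated here.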

Usage

    python inference.py --input_audio mixture1.wav mixture2.wav --output_folder ./results/

This command processes the files "mixture1.wav" and "mixture2.wav" and stores the results in the ./results/ folder in WAV format.
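To drive the same command from Python (for example, in a batch script), the argument list can be assembled and run with `subprocess`. This is a minimal sketch: `build_inference_command` and `run_inference` are hypothetical helpers, not part of the repository. The sketch assumes `inference.py` sits in the current working directory and uses only options documented in this README.

```python
import subprocess
import sys


def build_inference_command(input_files, output_folder, cpu=False,
                            only_vocals=False, chunk_size=None):
    """Assemble an inference.py command line from documented options.

    Hypothetical helper: it is not part of the repository.
    """
    cmd = [sys.executable, "inference.py",
           "--input_audio", *[str(p) for p in input_files],
           "--output_folder", str(output_folder)]
    if cpu:
        cmd.append("--cpu")            # slow, but needs no GPU
    if only_vocals:
        cmd.append("--only_vocals")    # vocals + instrumental only
    if chunk_size is not None:
        cmd += ["--chunk_size", str(chunk_size)]  # lower = less GPU memory
    return cmd


def run_inference(input_files, output_folder, **options):
    """Run inference.py; raises CalledProcessError on a non-zero exit code."""
    cmd = build_inference_command(input_files, output_folder, **options)
    subprocess.run(cmd, check=True)
```

Usage: `run_inference(["mixture1.wav", "mixture2.wav"], "./results/", only_vocals=True)`. Passing the argument list directly (rather than a shell string) avoids quoting problems with file names that contain spaces.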

All available options

  • --input_audio - path(s) to the input audio. Multiple files can be provided at once. Required.
  • --output_folder - folder for the output audio. Required.
  • --cpu - use the CPU instead of the GPU for processing. Can be very slow.
  • --overlap_large - overlap between chunks of the split audio for light models. Values closer to 1.0 are slower but give better quality. Default: 0.6.
  • --overlap_small - overlap between chunks of the split audio for heavy models. Values closer to 1.0 are slower but give better quality. Default: 0.5.
  • --single_onnx - use only a single ONNX model for vocals. Useful if you don't have enough GPU memory.
  • --chunk_size - chunk size for ONNX models. Set it lower to reduce GPU memory consumption. Default: 1000000.
  • --large_gpu - keep all models on the GPU for faster processing of multiple audio files. Requires at least 11 GB of free GPU memory.
  • --use_kim_model_1 - use the first version of the Kim model (as used in the contest).
  • --only_vocals - only create the vocals and instrumental stems, skipping bass, drums and other. Processing is faster.
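The two overlap options control how much consecutive chunks of the split audio share. The sketch below is a generic overlap-add scheme, not the repository's implementation. Chunks start `chunk_size * (1 - overlap)` samples apart, each chunk is passed through a stand-in `model` callable, and overlapping outputs are averaged. With overlap 0.5, most samples are processed about twice (in general about 1 / (1 - overlap) times), which is why values closer to 1.0 are slower. Signals are plain Python lists here. Uniform averaging is an assumption; real separators often weight each chunk with a window instead.

```python
def process_in_chunks(signal, chunk_size, overlap, model):
    """Generic overlap-add processing of a 1-D signal (illustrative sketch).

    `model` is a hypothetical callable that maps a chunk (list of samples)
    to an output chunk of the same length.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(chunk_size * (1.0 - overlap)))  # distance between chunk starts
    n = len(signal)
    out = [0.0] * n
    counts = [0] * n
    start = 0
    while True:
        end = min(start + chunk_size, n)
        processed = model(signal[start:end])
        # Accumulate the chunk's output and remember how often each sample was covered
        for i, value in enumerate(processed):
            out[start + i] += value
            counts[start + i] += 1
        if end == n:
            break
        start += hop
    # hop <= chunk_size, so every sample is covered at least once
    return [v / c for v, c in zip(out, counts)]
```

With an identity `model`, the output equals the input, which is a quick sanity check that the chunk boundaries and averaging are consistent.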

Notes

  • If you don't have enough GPU memory, you can use the CPU (--cpu), but it will be slow. You can also use a single ONNX model (--single_onnx), at the cost of slightly lower quality. Reducing the chunk size can help as well (--chunk_size 200000).
  • In the current revision, the code requires less GPU memory but processes multiple files more slowly. To use the old, faster method, pass --large_gpu. It requires more than 11 GB of GPU memory but runs faster.
  • There is a Google Colab version of this code.
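The Notes above can be condensed into a small decision helper. This is a hypothetical sketch: the function name and the `low_memory` switch (for example, set after an out-of-memory error) are assumptions. The 11 GB threshold comes from the --large_gpu description, and the other flag values come from the Notes.

```python
def memory_flags(free_gpu_gb=None, low_memory=False):
    """Pick inference.py flags for a given memory situation (hypothetical helper).

    free_gpu_gb: free GPU memory in GB, or None if no usable GPU is available.
    low_memory: True if a default run did not fit into GPU memory.
    """
    if free_gpu_gb is None:
        return ["--cpu"]  # slow, but needs no GPU memory
    if free_gpu_gb >= 11 and not low_memory:
        return ["--large_gpu"]  # keep all models on the GPU for speed
    if low_memory:
        # Trade a little quality for memory, as suggested in the Notes
        return ["--single_onnx", "--chunk_size", "200000"]
    return []  # default, memory-saving mode
```

The returned list can be appended to the `inference.py` command line. For example, `memory_flags(6, low_memory=True)` yields `["--single_onnx", "--chunk_size", "200000"]`.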

Quality comparison

Quality comparison with the best separation models, performed on the MultiSong dataset.

Algorithm    | SDR bass | SDR drums | SDR other | SDR vocals | SDR instrumental
MVSEP MDX23  | 12.5034  | 11.6870   | 6.5378    | 9.5138     | 15.8213
Demucs HT 4  | 12.1006  | 11.3037   | 5.7728    | 8.3555     | 13.9902
Demucs 3     | 10.6947  | 10.2744   | 5.3580    | 8.1335     | 14.4409
MDX B        | ---      | ---       | ---       | 8.5118     | 14.8192
  • Note: SDR is the signal-to-distortion ratio, in dB. Higher is better.
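For intuition about the numbers in the table, here is the simple global form of SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), where s is the reference stem and ŝ the estimate. This is a sketch: the exact evaluation code behind the table (per-song averaging, handling of stereo channels, the epsilon value) is not given here, and `eps` below is an assumption that avoids division by zero for a perfect estimate.

```python
import math


def sdr(reference, estimate, eps=1e-8):
    """Global signal-to-distortion ratio in dB (illustrative sketch).

    reference, estimate: equal-length sequences of samples.
    eps: small constant (assumed) that keeps the ratio finite.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))
```

Because the scale is logarithmic, halving the error amplitude raises SDR by about 6 dB. For example, `sdr([2.0], [1.5])` is roughly 6.02 dB higher than `sdr([2.0], [1.0])`.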

GUI

  • Script for the GUI (based on PyQt5): gui.py.
  • A standalone program for Windows can be downloaded here (~730 MB). Unzip the archive and double-click run.bat to start the program. On the first run, it downloads PyTorch with CUDA support (~2.8 GB) and some neural net models.
  • The program downloads all required neural net models from the internet on the first run.
  • The GUI supports drag and drop of multiple files.
  • A progress bar is available.

Web Interface

Running web-ui.py with Python starts the web interface locally on localhost (127.0.0.1). The port it runs on is shown in the terminal output.

  • Browser-based user interface.
  • The program downloads all required neural net models from the internet on the first run.
  • Supports drag and drop for audio upload (single file).

Changes

v1.0.1

  • GUI settings updated: all available options can now be controlled from the GUI.
  • Kim vocal model updated from version 1 to version 2. Version 1 is still available via the --use_kim_model_1 parameter.
  • Added the option to generate only the vocals/instrumental pair if you don't need the bass, drums and other stems. Use the --only_vocals parameter.
  • The standalone program was updated and is now smaller. The GUI downloads torch/CUDA on the first run.

Citation

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
