
DeepConvSep

Deep Convolutional Neural Networks for Musical Source Separation

This repository contains classes for data generation, preprocessing, and feature computation, useful for training neural networks with large datasets that do not fit into memory. It also contains classes to query samples of instrument sounds from the RWC instrument sound dataset.

In the 'examples' folder you can find use cases for the classes above, applied to music source separation. We provide code for feature computation (STFT) and for training convolutional neural networks for music source separation: singing voice separation with the iKala dataset; voice, bass, and drums separation with the DSD100 dataset; and bassoon, clarinet, saxophone, and violin separation with the Bach10 dataset. The latter is a good example of training a neural network with instrument samples from the RWC instrument sound dataset when the original score is available.

In the 'evaluation' folder you can find MATLAB code to evaluate the quality of the separation, based on BSS Eval.

For training neural networks we use Lasagne and Theano.

We provide code for separation using already trained models for different tasks.

Separate music into vocals, bass, drums, and accompaniment in examples/dsd100/separate_dsd.py:

python separate_dsd.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from this address

Singing voice source separation in examples/ikala/separate_ikala.py:

python separate_ikala.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from this address

Separate Bach chorales from the Bach10 dataset into bassoon, clarinet, saxophone, and violin in examples/bach10/separate_bach10.py:

python separate_bach10.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from this address

Score-informed separation of Bach chorales from the Bach10 dataset into bassoon, clarinet, saxophone, and violin in examples/bach10_scoreinformed/separate_bach10.py:

python separate_bach10.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from zenodo

The folder with the <inputfile> must contain the scores: 'bassoon_b.txt', 'clarinet_b.txt', 'saxophone_b.txt', 'violin_b.txt'. The score file has a note on each line with the format: note_onset_time,note_offset_time,note_name .
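
As a rough sketch, such a score file can be parsed as follows (a minimal example assuming the comma-separated format described above; this is not the repository's own parser):

#parse a score file with one note per line: note_onset_time,note_offset_time,note_name
def parse_score(path):
    notes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            onset, offset, name = line.split(',')
            notes.append((float(onset), float(offset), name.strip()))
    return notes

#e.g. notes = parse_score('bassoon_b.txt')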

Feature computation

Compute the features for a given set of audio signals by extending the "Transform" class in transform.py.

For instance, the TransformFFT class helps compute the STFT of an audio signal and saves the magnitude spectrogram as a binary file.

Examples

### 1. Computing the STFT of a matrix of signals "audio" and writing the STFT data in "path" (except the phase)
tt1 = transformFFT(frameSize=2048, hopSize=512, sampleRate=44100)
tt1.compute_transform(audio, out_path=path, phase=False)

### 2. Computing the STFT of a single signal "audio" and returning the magnitude and phase
tt1 = transformFFT(frameSize=2048, hopSize=512, sampleRate=44100)
mag, ph = tt1.compute_file(audio, phase=True)

### 3. Computing the inverse STFT using the magnitude and phase and returning the audio data
#we reuse the tt1 from 2.
audio = tt1.compute_inverse(mag, ph)
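
For reference, here is a minimal NumPy sketch of the underlying magnitude/phase STFT computation; this is a conceptual illustration only, not the transformFFT implementation, which also handles zero-padding, scaling, and writing the binary files:

import numpy as np

def stft(audio, frame_size=2048, hop_size=512):
    #Hann-windowed STFT; returns the magnitude and phase matrices
    window = np.hanning(frame_size)
    n_frames = 1 + (len(audio) - frame_size) // hop_size
    spec = np.array([np.fft.rfft(audio[i * hop_size:i * hop_size + frame_size] * window)
                     for i in range(n_frames)])
    return np.abs(spec), np.angle(spec)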

Data preprocessing

Load features which have been computed with transform.py and yield the batches necessary for training neural networks. These classes are useful when the data does not fit into memory: the batches are loaded in chunks.

Example

### Load binary training data from the out_path folder
train = LargeDataset(path_transform_in=out_path, batch_size=32, batch_memory=200, time_context=30, overlap=20, nprocs=7)
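
The exact batching logic lives in the LargeDataset class; as a hypothetical illustration of the chunked-loading idea (the file layout and names below are assumptions, not LargeDataset's actual format):

import numpy as np

def iterate_batches(feature_files, batch_size=32, time_context=30):
    #memory-map each spectrogram file and yield fixed-length excerpts,
    #so the full dataset never has to fit into RAM
    batch = []
    for path in feature_files:
        mag = np.load(path, mmap_mode='r')  #(frames, bins), assumed .npy layout
        for start in range(0, mag.shape[0] - time_context, time_context):
            batch.append(np.array(mag[start:start + time_context]))
            if len(batch) == batch_size:
                yield np.stack(batch)
                batch = []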

Audio sample querying using RWC database

The RWC instrument sound dataset contains samples of different instruments, played by various musicians in various styles and dynamics. You can obtain a sample for a given MIDI note, instrument, style, dynamics, and musician (1, 2, or 3) by using the classes in 'rwc.py'.

Example

### construct lists for the desired dynamics, styles, musician and instrument codes
allowed_styles = ['NO']
allowed_dynamics = ['F','M','P']
allowed_case = [1,2,3] #musician 1, 2 or 3
instrument_nums = [30,31,27,15] #bassoon,clarinet,saxophone,violin
instruments = []
for ins in range(len(instrument_nums)):
    #for each instrument construct an Instrument object
    instruments.append(rwc.Instrument(rwc_path, instrument_nums[ins], allowed_styles, allowed_case, allowed_dynamics))

#then, for a given instrument 'i' and midi note 'm' (from your list melNotes),
#dynamics 'd', style 's', musician 'n'
note = instruments[i].getNote(melNotes[m], d, s, n)
#get the audio vector for the note
audio = note.getAudio()

Data generation

The Bach10 experiments offer examples of data generation (or augmentation). Starting from the score or from existing pieces, we can augment the existing data or generate new data with some desired factors. For instance, given four factors time_shifts, intensity_shifts, style_shifts, and timbre_shifts, you can generate the possible combinations between them for a set of pieces and instruments (sources).

import itertools as it
import numpy as np

#create the product of these factors
cc = [(time_shifts[i], intensity_shifts[j], style_shifts[l], timbre_shifts[k]) for i in xrange(len(time_shifts)) for j in xrange(len(intensity_shifts)) for l in xrange(len(style_shifts)) for k in xrange(len(timbre_shifts))]

#create combinations for each of the instruments (sources)
if len(cc)<len(sources):
    combo1 = list(it.product(cc,repeat=len(sources)))
    combo = []
    for i in range(len(combo1)):
      c = np.array(combo1[i])
      #if (all(x == c[0,0] for x in c[:,0]) or all(x == c[0,1] for x in c[:,1])) \
      if (len(intensity_shifts)==1 and not(all(x == c[0,0] for x in c[:,0]))) \
        or (len(time_shifts)==1 and not(all(x == c[0,1] for x in c[:,1]))):
          combo.append(c)
    combo = np.array(combo)
else:
    combo = np.array(list(it.permutations(cc,len(sources))))
if len(combo)==0:
    combo = np.array([[[time_shifts[0],intensity_shifts[0],style_shifts[0],timbre_shifts[0]] for s in sources]])

#if there are too many combination, you can just randomly sample
if sample_size<len(combo):
    sampled_combo = combo[np.random.choice(len(combo),size=sample_size, replace=False)]
else:
    sampled_combo = combo

References

More details on the separation method can be found in the following article:

P. Chandna, M. Miron, J. Janer, and E. Gomez, "Monoaural audio source separation using deep convolutional neural networks," International Conference on Latent Variable Analysis and Signal Separation, 2017.

M. Miron, J. Janer, and E. Gomez, "Generating data to train convolutional neural networks for low latency classical music source separation," Sound and Music Computing Conference, 2017.

M. Miron, J. Janer, and E. Gomez, "Monaural score-informed source separation for classical music using convolutional neural networks," ISMIR Conference, 2017.

Dependencies

Python 2.7

climate, numpy, scipy, theano, lasagne (cPickle ships with the Python 2.7 standard library)

The dependencies can be installed with pip:

pip install numpy scipy climate theano
pip install https://github.com/Lasagne/Lasagne/archive/master.zip

Separating classical music mixtures with Bach10 dataset

We separate bassoon, clarinet, saxophone, and violin using the Bach10 dataset, which comprises 10 Bach chorales. Our approach consists of synthesizing the original scores with different timbres, dynamics, playing styles, and local timing deviations in order to train a more robust model for classical music separation.

We have three experiments:

-Oracle: train with the original pieces (obviously overfitting, hence the name "Oracle");

-Sibelius: train with the pieces synthesized with the Sibelius software;

-RWC: train with the pieces synthesized using the samples in the RWC instrument sound dataset.

The code for feature computation and training the network can be found in "examples/bach10" folder.

Score-informed separation of classical music mixtures with Bach10 dataset

We separate bassoon, clarinet, saxophone, and violin using the Bach10 dataset, which comprises 10 Bach chorales and the associated scores.

We generate training data with the approach mentioned above using the RWC database. Consequently, we train with the pieces synthesized using the samples in RWC instrument sound dataset.

The score is given in .txt files named after the instrument plus a suffix, e.g. 'bassoon_g.txt'. The format for a note in the text file is: onset, offset, midinotename, as in the following example: 6.1600,6.7000,F4# .
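
If you need to map these note names to MIDI numbers, a small hypothetical helper (assuming the accidental follows the octave digit, as in F4#, and the C4 = MIDI 60 convention) could look like this:

#hypothetical helper, not part of the repository
SEMITONES = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def note_to_midi(name):
    pitch = SEMITONES[name[0]]
    octave = int(name[1])
    sharp = 1 if name.endswith('#') else 0
    return (octave + 1) * 12 + pitch + sharp

#note_to_midi('F4#') returns 66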

The code for feature computation and training the network can be found in the "examples/bach10_scoreinformed" folder.

Separating Professionally Produced Music

We separate voice, bass, drums, and accompaniment using the DSD100 dataset, which comprises professionally produced music. For more details about the challenge, please refer to the SiSEC MUS challenge and the DSD100 dataset.

The code for feature computation and training the network can be found in "examples/dsd100" folder.

iKala - Singing voice separation

We separate voice and accompaniment using the iKala dataset. For more details about the challenge, please refer to MIREX Singing voice separation 2016 and iKala dataset.

The code for feature computation and training the network can be found in "examples/ikala" folder.

Training models

For the Bach10 dataset:

#compute features for the original dataset
python -m examples.bach10.compute_features_bach10 --db '/path/to/Bach10/'
#compute features for the synthetic dataset generated with Sibelius
python -m examples.bach10.compute_features_bach10sibelius --db '/path/to/Bach10Sibelius/'
#compute features for the dataset synthesized from RWC samples
python -m examples.bach10.compute_features_bach10rwc --db '/path/to/Bach10Sibelius/' --rwc '/path/to/rwc/'
### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.bach10.trainCNNrwc --db '/path/to/Bach10/' --dbs '/path/to/Bach10Sibelius/' --output '/output/path/'
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.bach10.trainCNNSibelius --db '/path/to/Bach10/' --dbs '/path/to/Bach10Sibelius/' --output '/output/path/'

For iKala:

python -m examples.ikala.compute_features --db '/path/to/iKala/'
### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.ikala.trainCNN --db '/path/to/iKala/'

For SiSEC MUS using the DSD100 dataset:

python -m examples.dsd100.compute_features --db '/path/to/DSD100/'
### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.dsd100.trainCNN --db '/path/to/DSD100/'

Evaluation

The metrics are computed with BSS Eval images v3.0, as described here.
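
As a reminder of the main idea behind these metrics, a simplified signal-to-distortion ratio can be sketched as below; the actual BSS Eval v3.0 implementation additionally decomposes the error into interference, noise, and artifact terms:

import numpy as np

def sdr(reference, estimate):
    #simplified SDR in dB: reference energy over residual energy
    #(expects NumPy arrays of the same length)
    residual = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))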

The evaluation scripts can be found in the subfolder "evaluation". The subfolder "script_cluster" contains scripts to run the evaluation script in parallel on a HPC cluster system.

For Bach10, you need to run the script Bach10_eval_only.m for each method in the 'base_estimates_directory' folder and for the 10 pieces. To evaluate the separation of the Bach10 Sibelius dataset, use the 'Bach10_eval_only_original.m' script. Be careful not to mix the estimation directories for the two datasets.

For iKala, you need to run the script evaluate_SS_iKala.m for each of the 252 files in the dataset. The script takes as parameters the id of the file, the path to the dataset, and the separation method, which needs to be a directory containing the separation results, stored in the 'output' folder.

for id=1:252
    evaluate_SS_iKala(id,'/homedtic/mmiron/data/iKala/','fft_1024');
end

For SiSEC-MUS/DSD100, use the scripts on the challenge web page.

If you have access to a HPC cluster, you can use the .sh scripts in the script_cluster folder which call the corresponding .m files.

Research reproducibility

For DSD100 and iKala, the framework was tested as part of a public evaluation campaign and the results were published online (see the sections above).

For Bach10, we provide the synthetic Bach10 Sibelius dataset and the Bach10 Separation SMC2017 dataset, containing the separation for each method as .wav files and the evaluation results as .mat files.

If you want to compute the features and re-train the models, check the 'examples/bach10' folder and the instructions above. Alternatively, you can download an already trained model and perform separation with 'separate_bach10.py'.

If you want to evaluate the methods in the Bach10 Separation SMC2017 dataset, you can use the scripts in the evaluation directory, explained above in the 'Evaluation' section.

If you want to replicate the plots in the SMC2017 paper, you need to have 'pandas' and 'seaborn' installed (pip install pandas seaborn) and then run the script in the plots subfolder:

python bach10_smc_stats.py --db 'path-to-results-dir'

where 'path-to-results-dir' is the path to the folder where you have stored the results for each method (e.g. if you downloaded the Bach10 Separation SMC2017 dataset, it would be the 'results' subfolder).

Acknowledgments

The Titan X GPU used for this research was donated by the NVIDIA Corporation.

License

Copyright (c) 2014-2017
Marius Miron <miron.marius at gmail dot com>,
Pritish Chandna <pc2752 at gmail dot com>,
Gerard Erruz, and Hector Martel
Music Technology Group, Universitat Pompeu Fabra, Barcelona <mtg.upf.edu>

This program is free software: you can redistribute it and/or modify
it under the terms of the Affero GPL license published by
the Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
Affero GPL license for more details.

You should have received a copy of the Affero GPL license
along with this program.  If not, see <http://www.gnu.org/licenses/>.
