A python package to parse and process the MUSDB18 dataset as part of the MUS task of the Signal Separation Evaluation Campaign (SISEC).
The dataset can be downloaded here.
Since MUSDB18 is encoded in the STEMS format, it relies on ffmpeg to read the multi-stream files. We provide a python wrapper called stempeg that makes it easy to parse the dataset and decode the stem tracks on the fly. Before you install musdb (which includes the stempeg requirement), it is therefore required to install ffmpeg. The installation differs among operating systems.
E.g. if you use Anaconda you can install ffmpeg on Windows/Mac/Linux using the following command:
conda install -c conda-forge ffmpeg
Alternatively you can install ffmpeg manually as follows:
- Mac: use homebrew:
brew install ffmpeg
- Ubuntu Linux:
sudo apt-get install ffmpeg
If you have trouble installing stempeg or ffmpeg, we also support parsing and processing the pre-decoded PCM/wav files. We provide docker-based scripts to decode the dataset to wav files.
If you want to use the decoded musdb dataset, set the is_wav
parameter when initialising the dataset.
musdb.DB(is_wav=True)
You can install the musdb
parsing package using pip:
pip install musdb
This package should integrate nicely with your existing python code, making it easy to participate in the SISEC MUS tasks. The core of this package is calling a user-provided function that separates the mixtures from the MUS into several estimated target sources.
- The function will take an MUS Track object which can be used from inside your algorithm.
- Participants can access:
  - Track.audio, representing the stereo mixture as an np.ndarray of shape=(num_sampl, 2)
  - Track.rate, the sample rate
  - Track.path, the absolute path of the mixture, which might be handy for processing with external applications, so that participants don't need to write out temporary wav files.
- The provided function needs to return a python dict consisting of the target name (key) and the estimated target as an audio array with the same shape as the mixture (value).
- It is the user's choice which target sources they want to provide for a given mixture. Supported targets are ['vocals', 'accompaniment', 'drums', 'bass', 'other'].
- Please make sure that the returned estimates have the same sample rate as the mixture track.
Here is an example for such a function separating the mixture into a vocals and accompaniment track:
def my_function(track):
    # get the audio mixture as a
    # numpy array of shape=(num_sampl, 2)
    track.audio
    # compute voc_array, acc_array
    # ...
    return {
        'vocals': voc_array,
        'accompaniment': acc_array
    }
Simply import the musdb package in your main python function:
import musdb
mus = musdb.DB(root_dir='path/to/musdb')
The root_dir
is the path to the musdb dataset folder. Instead of passing root_dir,
the path can also be set system-wide: just export MUSDB_PATH=/path/to/musdb
inside your terminal environment.
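The environment-variable fallback can be sketched in plain python. Note that resolve_root_dir is a hypothetical helper written here for illustration, not part of the musdb API:

```python
import os

def resolve_root_dir(root_dir=None):
    # hypothetical sketch of how a package like musdb might resolve
    # the dataset path: an explicit argument wins over MUSDB_PATH
    if root_dir is not None:
        return root_dir
    return os.environ.get('MUSDB_PATH')

os.environ['MUSDB_PATH'] = '/data/musdb'
assert resolve_root_dir() == '/data/musdb'
assert resolve_root_dir('/explicit/path') == '/explicit/path'
```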
Before processing the full MUS which might take very long, participants can test their separation function by running:
mus.test(my_function)
This test makes sure the user-provided output is compatible with the musdb framework. The function returns True
if the test succeeds.
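For intuition, here is a rough, self-contained sketch of the kind of checks such a compatibility test might perform. This is a hypothetical illustration; the real mus.test() implementation may check more or differently:

```python
import numpy as np

def validate_estimates(estimates, mixture_shape,
                       allowed=('vocals', 'accompaniment',
                                'drums', 'bass', 'other')):
    # hypothetical sketch of a compatibility check, not the real mus.test()
    if not isinstance(estimates, dict):
        return False  # must be a dict of target name -> audio array
    for name, audio in estimates.items():
        if name not in allowed:
            return False  # unknown target name
        if np.asarray(audio).shape != mixture_shape:
            return False  # estimate must match the mixture shape
    return True

mixture = np.zeros((44100, 2))
good = {'vocals': np.zeros((44100, 2)),
        'accompaniment': np.zeros((44100, 2))}
bad = {'speech': np.zeros((44100, 2))}
assert validate_estimates(good, mixture.shape)
assert not validate_estimates(bad, mixture.shape)
```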
To process all 150 MUS tracks and save the results to the folder estimates_dir
:
mus.run(my_function, estimates_dir="path/to/estimates")
Algorithms which make use of machine learning techniques can use the training subset and then apply the algorithm on the test data. That way it is possible to apply different user functions for both datasets.
mus.run(my_training_function, subsets="train")
mus.run(my_test_function, subsets="test")
If you want to access individual tracks, e.g. to specify a validation dataset, you can manually load the track list before running your separation function.
# load the training tracks
tracks = mus.load_mus_tracks(subsets=['train'])
for track in tracks:
print(track.name)
# use run with a subset of tracks
mus.run(my_validation_function, tracks=tracks[:10])
Instead of filtering the track list, musdb
also supports loading tracks by track name:
tracks = mus.load_mus_tracks(tracknames=["PR - Oh No", "Angels In Amplifiers - I'm Alright"])
For supervised learning you can use the provided reference sources by loading the track.targets
dictionary.
E.g. to access the vocal reference from a track:
track.targets['vocals'].audio
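For supervised training, the reference sources can be cut into aligned chunks together with the mixture. The following is an illustrative sketch on plain numpy arrays shaped like musdb audio; make_training_pairs is a hypothetical helper, not part of the musdb API:

```python
import numpy as np

def make_training_pairs(mixture, target, chunk=4096):
    # hypothetical helper: cut aligned fixed-length chunks from the
    # mixture and a target reference, e.g. track.targets['vocals'].audio
    n = min(len(mixture), len(target))
    starts = range(0, n - chunk + 1, chunk)
    return [(mixture[s:s + chunk], target[s:s + chunk]) for s in starts]

# stand-in arrays with the same layout as musdb audio: shape=(num_sampl, 2)
mix = np.random.randn(10000, 2)
voc = np.random.randn(10000, 2)
pairs = make_training_pairs(mix, voc)
assert len(pairs) == 2
assert pairs[0][0].shape == (4096, 2)
```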
To speed up the processing, run
can make use of multiple CPUs:
mus.run(my_function, parallel=True, cpus=4)
Note: We use the python builtin multiprocessing package, which sometimes is unable to parallelize the user-provided function due to a PicklingError.
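This limitation can be reproduced without musdb: module-level functions can be pickled and sent to worker processes, while lambdas and locally defined closures cannot. A standalone illustration, unrelated to the musdb API:

```python
import pickle

def top_level_function(track):
    # module-level functions can be pickled by reference, so
    # multiprocessing can hand them to worker processes
    return track

restored = pickle.loads(pickle.dumps(top_level_function))
assert restored("x") == "x"

# a lambda has no importable qualified name, so pickling it fails
try:
    pickle.dumps(lambda track: track)
    lambda_picklable = True
except Exception:
    lambda_picklable = False
assert not lambda_picklable
```

Defining your separation function at module level (as in the examples above) avoids the PicklingError.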
import musdb
def my_function(track):
    '''My fancy BSS algorithm'''
    # get the audio mixture as numpy array shape=(num_sampl, 2)
    track.audio
    # get the mixture path for external processing
    track.path
    # get the sample rate
    track.rate
    # return any number of targets
    estimates = {
        'vocals': vocals_array,
        'accompaniment': acc_array,
    }
    return estimates

# initiate musdb
mus = musdb.DB(root_dir="./Volumes/Data/musdb")

# verify if my_function works correctly
if mus.test(my_function):
    print("my_function is valid")

# this might take 3 days to finish
mus.run(my_function, estimates_dir="path/to/estimates")
Please check the examples of oracle separation methods. They show how oracle performance is computed, i.e. an upper bound for the quality of the separation.
Please refer to our Submission site.
This is not a bug. Since we adopted the STEMS format, we use AAC compression, where the residual noise of the mixture differs from the sum of the residual noises of the sources. This difference does not significantly affect separation performance.
LVA/ICA 2018 publication t.b.a