music-source-separation-training's Introduction

Music Source Separation Universal Training Code

Repository for training models for music source separation. It is based on the kuielab code for the SDX23 challenge. The main goal of this repository is to provide training code that is easy to modify for experiments. Brought to you by MVSep.com.

Models

The model is selected with the --model_type argument.

Available models for training:

  1. Note 1: segm_models supports many different encoders. Look here.
  2. Note 2: Thanks to @lucidrains for recreating the RoFormer models based on papers.
  3. Note 3: torchseg gives access to more than 800 encoders from the timm module. It is similar to segm_models.

How to: Train

To train a model you need to:

  1. Choose the model type with the --model_type option, including: mdx23c, htdemucs, segm_models, mel_band_roformer, bs_roformer.
  2. Choose the location of the model config with --config_path <config path>. Example configs are in the configs folder. Configs with the config_musdb18_ prefix are examples for the MUSDB18 dataset.
  3. If you have a checkpoint from the same model or from a similar model, you can use it with the option --start_check_point <weights path>.
  4. Choose where to store the training results with --results_path <results folder path>.

Training example

python train.py \
    --model_type mel_band_roformer \
    --config_path configs/config_mel_band_roformer_vocals.yaml \
    --start_check_point results/model.ckpt \
    --results_path results/ \
    --data_path 'datasets/dataset1' 'datasets/dataset2' \
    --valid_path datasets/musdb18hq/test \
    --num_workers 4 \
    --device_ids 0

All training parameters are here.
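
When training runs are launched from a script (for example, to sweep over several configs), the command above can be assembled as an argument list. The following is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical helper, and it uses only the flags shown in the training example. It assumes the command is executed from the repository root and that `python` resolves to the intended interpreter.

```python
import shlex
from typing import List, Optional, Sequence


def build_train_command(
    model_type: str,
    config_path: str,
    results_path: str,
    data_paths: Sequence[str],
    valid_path: str,
    start_check_point: Optional[str] = None,
    num_workers: int = 4,
    device_id: int = 0,
) -> List[str]:
    """Build the argument list for train.py (hypothetical helper, not in the repo)."""
    cmd = [
        "python", "train.py",
        "--model_type", model_type,
        "--config_path", config_path,
        "--results_path", results_path,
        # Several dataset folders may follow --data_path, as in the example above.
        "--data_path", *data_paths,
        "--valid_path", valid_path,
        "--num_workers", str(num_workers),
        "--device_ids", str(device_id),
    ]
    # The starting checkpoint is optional, so the flag is added only when given.
    if start_check_point is not None:
        cmd += ["--start_check_point", start_check_point]
    return cmd


if __name__ == "__main__":
    command = build_train_command(
        model_type="mel_band_roformer",
        config_path="configs/config_mel_band_roformer_vocals.yaml",
        results_path="results/",
        data_paths=["datasets/dataset1", "datasets/dataset2"],
        valid_path="datasets/musdb18hq/test",
        start_check_point="results/model.ckpt",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would start the run. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.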

How to: Inference

Inference example

python inference.py \
    --model_type mdx23c \
    --config_path configs/config_mdx23c_musdb18.yaml \
    --start_check_point results/last_mdx23c.ckpt \
    --input_folder input/wavs/ \
    --store_dir separation_results/

All inference parameters are here.

Useful notes

  • All batch sizes in the configs are tuned for a single NVIDIA A6000 48GB. If you have less memory, adjust training.batch_size and training.gradient_accumulation_steps in the model config accordingly.
  • It is usually better to start from existing weights, even if the shapes do not fully match. The code supports loading weights from models that are not exactly the same (but they must have the same architecture). Training will be much faster.
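
The two notes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's actual code: `scale_batch_settings` and `filter_matching_weights` are hypothetical helpers, the first assumes that memory use grows roughly linearly with batch size (a rule of thumb only), and the second shows one common way to reuse a checkpoint whose shapes only partly match, by keeping only the entries whose name and shape agree with the target model.

```python
import math
from typing import Any, Dict, List, Mapping, Tuple


def scale_batch_settings(
    batch_size: int,
    grad_accum_steps: int,
    reference_gb: float,
    available_gb: float,
) -> Tuple[int, int]:
    """Shrink batch_size for a smaller GPU and raise accumulation to compensate.

    Assumption: memory use is roughly proportional to batch size.
    """
    # Effective batch size is what the optimizer sees per update step.
    effective = batch_size * grad_accum_steps
    new_batch = max(1, int(batch_size * available_gb / reference_gb))
    # Round up so the effective batch size is never smaller than before.
    new_accum = math.ceil(effective / new_batch)
    return new_batch, new_accum


def filter_matching_weights(
    checkpoint: Mapping[str, Any],
    model_state: Mapping[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Keep checkpoint entries whose name and shape match the target model.

    Works with any objects exposing a .shape attribute (e.g. tensors or arrays).
    """
    kept: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, tensor in checkpoint.items():
        target = model_state.get(name)
        if target is not None and tuple(tensor.shape) == tuple(target.shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped


if __name__ == "__main__":
    # Hypothetical config values: batch_size=8 tuned for 48 GB, run on 24 GB.
    print(scale_batch_settings(8, 1, reference_gb=48, available_gb=24))
```

With PyTorch, the filtered dictionary could then be loaded with `model.load_state_dict(kept, strict=False)`, which leaves the skipped layers at their fresh initialization. The computed batch settings are only a starting point and may still need manual tuning.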

Code description

  • configs/config_*.yaml - configuration files for models
  • models/* - set of available models for training and inference
  • dataset.py - dataset which creates new samples for training
  • inference.py - processes a folder of music files and separates them
  • train.py - main training code
  • utils.py - common functions used by train/valid
  • valid.py - validation of model with metrics

Pre-trained models

If you have trained good models, please share them. You can post the config and model weights in this issue.

Vocal models

| Model Type | Instruments | Metrics (SDR) | Config | Checkpoint |
|---|---|---|---|---|
| MDX23C | vocals / other | SDR vocals: 10.17 | Config | Weights |
| HTDemucs4 (MVSep finetuned) | vocals / other | SDR vocals: 8.78 | Config | Weights |
| Segm Models (VitLarge23) | vocals / other | SDR vocals: 9.77 | Config | Weights |
| Swin Upernet | vocals / other | SDR vocals: 7.57 | Config | Weights |
| BS Roformer (viperx edition) | vocals / other | SDR vocals: 10.87 | Config | Weights |
| MelBand Roformer (viperx edition) | vocals / other | SDR vocals: 9.67 | Config | Weights |
| MelBand Roformer (KimberleyJensen edition) | vocals / other | SDR vocals: 10.98 | Config | Weights |

Note: Metrics measured on Multisong Dataset.
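
For readers unfamiliar with the metric, SDR (signal-to-distortion ratio) compares the energy of the reference stem with the energy of the separation error, in decibels. The sketch below uses the plain energy-ratio definition. This section does not state which exact variant was used for the tables (for example, whether values are aggregated per chunk or per song), so treat it as an illustration of the idea only. The function name `sdr` and the `eps` stabilizer are assumptions of this example.

```python
import math
from typing import Sequence


def sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-8) -> float:
    """Signal-to-distortion ratio in dB for two equal-length mono signals."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true stem.
    signal_energy = sum(r * r for r in reference)
    # Energy of the separation error (reference minus estimate).
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log of zero for silent or perfect inputs.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


if __name__ == "__main__":
    reference = [1.0, 0.0, -1.0, 0.0]
    estimate = [0.9, 0.0, -0.9, 0.0]  # 10% amplitude error on each non-zero sample
    print(f"SDR: {sdr(reference, estimate):.2f} dB")
```

Higher values mean the estimate is closer to the reference. Each 10 dB step corresponds to a tenfold reduction in error energy relative to the signal.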

Single stem models

| Model Type | Instruments | Metrics (SDR) | Config | Checkpoint |
|---|---|---|---|---|
| HTDemucs4 FT Drums | drums | SDR drums: 11.13 | Config | Weights |
| HTDemucs4 FT Bass | bass | SDR bass: 11.96 | Config | Weights |
| HTDemucs4 FT Other | other | SDR other: 5.85 | Config | Weights |
| HTDemucs4 FT Vocals (Official repository) | vocals | SDR vocals: 8.38 | Config | Weights |
| BS Roformer (viperx edition) | other | SDR other: 6.85 | Config | Weights |
| MelBand Roformer (aufr33 and viperx edition) | crowd | SDR crowd: 5.99 | Config | Weights |
| MelBand Roformer (anvuew edition) | dereverb | SDR dereverb: 7.56 | Config | Weights |
| BS Roformer (anvuew edition) | dereverb | SDR dereverb: 8.07 | Config | Weights |
| MelBand Roformer Denoise (by aufr33) | denoise | --- | Config | Weights |
| MelBand Roformer Denoise Aggressive (by aufr33) | denoise | --- | Config | Weights |

Note: All HTDemucs4 FT models output 4 stems, but only the target stem is high quality (all other stems are dummy).

Multi-stem models

| Model Type | Instruments | Metrics (SDR) | Config | Checkpoint |
|---|---|---|---|---|
| MDX23C * | bass / drums / vocals / other | MUSDB test avg: 7.15 (bass: 5.77, drums: 7.93, vocals: 9.23, other: 5.68); Multisong avg: 7.02 (bass: 8.40, drums: 7.73, vocals: 7.36, other: 4.57) | Config | Weights |
| BandIt Plus | speech / music / effects | DnR test avg: 11.50 (speech: 15.64, music: 9.18, effects: 9.69) | Config | Weights |
| HTDemucs4 | bass / drums / vocals / other | Multisong avg: 9.16 (bass: 11.76, drums: 10.88, vocals: 8.24, other: 5.74) | Config | Weights |
| HTDemucs4 (6 stems) | bass / drums / vocals / other / piano / guitar | Multisong (bass: 11.22, drums: 10.22, vocals: 8.05, other: ---, piano: ---, guitar: ---) | Config | Weights |
| Demucs3 mmi | bass / drums / vocals / other | Multisong avg: 8.88 (bass: 11.17, drums: 10.70, vocals: 8.22, other: 5.42) | Config | Weights |
| DrumSep htdemucs (by inagoy) | kick / snare / cymbals / toms | --- | Config | Weights |
| DrumSep mdx23c (by aufr33 and jarredou) | kick / snare / toms / hh / ride / crash | --- | Config | Weights |
| SCNet (by starrytong) * | bass / drums / vocals / other | Multisong avg: 8.87 (bass: 11.07, drums: 10.79, vocals: 8.27, other: 5.34) | Config | Weights |

* Note: The model was trained only on the MUSDB18HQ dataset (100 training songs).

Dataset types

Look here: Dataset types

Augmentations

Look here: Augmentations

Citation

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Contributors

zfturbo, fmac2000, dj-nuo, hunterhogan, marekkon5, suc-driverold, kitsunex07, lion-mod, anvuew
