

TONet

Introduction

The official implementation of "TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music", in ICASSP 2022

We propose TONet, a plug-and-play model that improves both tone and octave perception by leveraging a novel input representation and a novel network architecture. Any CFP-input-based model can be plugged into TONet, potentially improving its performance.

TONet Architecture

Main Results on Extraction Performance

Experiments verify the capability of TONet with various baseline backbone models. Our results show that tone-octave fusion with Tone-CFP significantly improves singing melody extraction performance across various datasets, with substantial gains in octave and tone accuracy.

Results

Getting Started

Download Datasets

After downloading the datasets, use the data-list .txt files in the data folder and compute the CFP features with feature_extraction.py.

Overwrite the Configuration

config.py contains all the settings you need to change or set.

Train and Evaluation

python main.py train

python main.py test

Produce the Estimation Diagram

Uncomment the write-prediction lines in tonet.py.

Estimation

Model Checkpoints

We provide the best TO-FTANet checkpoints at this link. More checkpoints will be uploaded.

Citing

@inproceedings{tonet-ke2022,
  author = {Ke Chen and Shuai Yu and Cheng-i Wang and Wei Li and Taylor Berg-Kirkpatrick and Shlomo Dubnov},
  title = {TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music},
  booktitle = {{ICASSP} 2022}
}


tonet's Issues

soundfile.LibsndfileError: Error opening 'data/wav/tammy_1_07.wav': System error.

CFP process in data/wav/tammy_1_07.wav ... (It may take some times)
Traceback (most recent call last):
File "feature_extraction.py", line 269, in
W, _, _ = cfp_process(wavpath, sr=8000)
File "feature_extraction.py", line 216, in cfp_process
y, sr = load_audio(fpath, sr=sr)
File "feature_extraction.py", line 159, in load_audio
x, fs = sf.read(filepath)
File "/home/zhangjunwei/miniconda3/envs/TONet/lib/python3.8/site-packages/soundfile.py", line 285, in read
with SoundFile(file, 'r', samplerate, channels,
File "/home/zhangjunwei/miniconda3/envs/TONet/lib/python3.8/site-packages/soundfile.py", line 658, in init
self._file = self._open(file, mode_int, closefd)
File "/home/zhangjunwei/miniconda3/envs/TONet/lib/python3.8/site-packages/soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'data/wav/tammy_1_07.wav': System error.

Hello, I encountered the error above while running the CFP extraction in the first step. Is this a data error or a problem with my system?
Can someone help me?
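libsndfile's "System error" is generic. One common cause (an assumption here, not confirmed in this thread) is that the relative path data/wav/... does not resolve from the current working directory, for example because the script was launched from another folder or the .wav files were never copied into data/wav. A minimal pre-flight check using only the standard library might look like this; `find_missing_audio` is a hypothetical helper, not part of TONet.

```python
import os
from typing import Iterable, List


def find_missing_audio(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the paths (joined onto `root`) that are not readable files.

    Hypothetical helper: run it before feature extraction to catch a wrong
    working directory or missing .wav files early.
    """
    missing = []
    for p in paths:
        full = os.path.join(root, p)
        # os.access(..., os.R_OK) also catches permission problems
        if not (os.path.isfile(full) and os.access(full, os.R_OK)):
            missing.append(full)
    return missing


if __name__ == "__main__":
    for m in find_missing_audio(["data/wav/tammy_1_07.wav"]):
        print("missing or unreadable:", os.path.abspath(m))
```

If the file is reported as missing, run the script from the repository root or fix the paths in the data list.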

No f0ref in MIR-1K and MedleyDB datasets

I tried to read your code. In feature_extraction.py, line 262, f0path is given, but I did not find any .txt or f0ref file in the MIR-1K and MedleyDB datasets. I know this f0path is the label path containing time and frequency, but I don't know how this f0ref was obtained. Also, since MIR-1K and MedleyDB use different data formats, how should both datasets be pre-processed to extract features for training?
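The thread does not say how the reference files were produced. As an illustration only: MIR-1K's pitch labels are commonly described as per-frame MIDI semitone values with 0 for unvoiced frames (verify this against your copy of the dataset). If that holds, they can be turned into a two-column time/frequency array like the one f0path seems to expect. The 0.02 s hop and zero time offset are assumptions, and `semitones_to_time_freq` is a hypothetical helper.

```python
import numpy as np


def midi_to_hz(m) -> np.ndarray:
    """Convert MIDI note numbers to Hz (MIDI 69 = A4 = 440 Hz); 0 stays 0 (unvoiced)."""
    m = np.asarray(m, dtype=float)
    hz = 440.0 * 2.0 ** ((m - 69.0) / 12.0)
    return np.where(m > 0, hz, 0.0)


def semitones_to_time_freq(labels, hop: float = 0.02, offset: float = 0.0) -> np.ndarray:
    """Turn per-frame semitone labels into an (N, 2) array of [time_s, freq_hz].

    `hop` and `offset` are assumptions about the label frame grid.
    """
    labels = np.asarray(labels, dtype=float)
    times = offset + hop * np.arange(len(labels))
    return np.stack([times, midi_to_hz(labels)], axis=1)


if __name__ == "__main__":
    ref = semitones_to_time_freq([0, 69, 57])
    print(ref)
    # np.savetxt("some_ref.txt", ref, fmt="%.4f") would write one "time freq" row per frame
```

MedleyDB ships its own melody annotations in a different layout, so it would need a separate reader that produces the same two columns.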

A question about the octave

Hello, I see that config.py sets octave_class = 8 and that tonet.py computes the octave with octave_index = (gds // 60 + 2).long(). Could you explain the reason for the +2?
Thanks!!
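To see what the expression computes, here is a plain-Python stand-in for `(gds // 60 + 2).long()`. It assumes, without confirmation from the thread, that `gds` is a pitch-bin index on a grid with 60 bins per octave (5 bins per semitone), which would fit the 360-bin feature folder name `cfp_360_new`. Under that assumption, 360 bins span 6 octaves, so the +2 maps them to classes 2 through 7 and leaves classes 0 and 1 of the 8 unused by voiced bins; what, if anything, those classes represent has to be checked in tonet.py.

```python
def octave_index(gds: int, bins_per_octave: int = 60, offset: int = 2) -> int:
    """Plain-Python equivalent of (gds // 60 + 2).long() for one bin index."""
    return gds // bins_per_octave + offset


if __name__ == "__main__":
    # Assumed 360-bin grid: bin indices 0..359
    classes = sorted({octave_index(b) for b in range(360)})
    print(classes)  # [2, 3, 4, 5, 6, 7]
```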

The dataset ref file is missing

Hello, I want to reproduce your paper, but I can't download the MIR-1K dataset with its ref files from the official website, and I can't get the ref files for the MedleyDB dataset. Is there any way to obtain these missing files? Thank you.

tammy_1_07.npy

Traceback (most recent call last):
File "feature_extraction.py", line 270, in
np.save(magfile, W)
File "<array_function internals>", line 6, in save
File "/home/zhangjunwei/miniconda3/envs/TONet/lib/python3.7/site-packages/numpy/lib/npyio.py", line 525, in save
file_ctx = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: 'data/cfp_360_new/tammy_1_07.npy'

Hello, I ran into the error above when running feature_extraction.py. It seems the .npy file is missing from the data; I didn't find it in the MIR-1K dataset and don't know where it comes from. Can someone help me?
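The `FileNotFoundError` is raised while opening the output file for writing (`open(file, "wb")` inside `np.save`), so the missing piece is most likely the output folder data/cfp_360_new, not an input file: the .npy features are what feature_extraction.py produces. A minimal sketch, assuming you can edit the script around the `np.save(magfile, W)` call, is to create the parent directory first; `save_feature` is a hypothetical wrapper.

```python
import os

import numpy as np


def save_feature(magfile: str, W: np.ndarray) -> None:
    """Save a feature array, creating the parent directory if needed."""
    parent = os.path.dirname(magfile)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no-op if the folder already exists
    np.save(magfile, W)
```

Usage: replace `np.save(magfile, W)` with `save_feature(magfile, W)`, or simply run `mkdir -p data/cfp_360_new` before starting the script.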

Frequency spectrum

Hello, Chen Ke!
I have a question I would like to ask you. Your work has helped me a lot.
Regarding the characteristics of CFP, I would really like to inspect the CFP spectrum, but I don't know how to plot it, so I would like to ask for your advice.
Could you please send me the code you use to plot the CFP spectrum? If you can, I would really appreciate it. I look forward to hearing from you!
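A dependency-light way to look at a saved CFP feature is to normalize it to 8-bit grey levels and write a binary PGM image, which most image viewers can open. This sketch assumes the array saved by `np.save(magfile, W)` is either 2-D (frequency bins × frames) or 3-D with the channel axis first; that layout is an assumption, so print `W.shape` first. `cfp_to_gray` and `write_pgm` are hypothetical helpers. With matplotlib installed, `plt.imshow(spec, origin="lower", aspect="auto")` on the same array is the more usual choice.

```python
import numpy as np


def cfp_to_gray(W: np.ndarray, channel: int = 0) -> np.ndarray:
    """Return a uint8 image (low frequencies at the bottom) from a CFP array."""
    spec = W[channel] if W.ndim == 3 else W  # assumed layout: (C, F, T) or (F, T)
    spec = np.log1p(np.abs(spec))  # compress the dynamic range
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    # Image row 0 is drawn at the top, so flip to put bin 0 at the bottom
    return np.flipud((scaled * 255).astype(np.uint8))


def write_pgm(path: str, img: np.ndarray) -> None:
    """Write a 2-D uint8 array as a binary PGM (P5) file."""
    h, w = img.shape
    with open(path, "wb") as f:
        f.write(f"P5\n{w} {h}\n255\n".encode("ascii"))
        f.write(np.ascontiguousarray(img).tobytes())


if __name__ == "__main__":
    W = np.load("data/cfp_360_new/tammy_1_07.npy")
    print("feature shape:", W.shape)
    write_pgm("tammy_1_07_cfp.pgm", cfp_to_gray(W))
```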

Hey, Chen

When testing the model I get this error: TypeError: test_step() missing 1 required positional argument: 'dataset_idx', and I can't fix it. The problem is in model/tonet.py, line 454. Could you please help me fix it? Thank you very much.
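A likely explanation: PyTorch Lightning 1.x passes the dataloader index to `test_step` only when more than one test dataloader is in use, so with a single test set the required third parameter is never filled. A common workaround is to give that parameter a default value. The sketch below uses no Lightning at all: `ToyTONet` is a hypothetical stand-in for the model class, and `run_test_step` is a simplified imitation of the trainer's call, not Lightning's actual code.

```python
from typing import Any, Dict


class ToyTONet:
    """Hypothetical stand-in for the model class in model/tonet.py."""

    def test_step(self, batch: Any, batch_idx: int, dataset_idx: int = 0) -> Dict[str, int]:
        # The default lets the same method serve one or several test dataloaders.
        return {"batch_idx": batch_idx, "dataset_idx": dataset_idx}


def run_test_step(model, batch: Any, batch_idx: int, num_loaders: int, loader_idx: int = 0):
    """Simplified imitation of the trainer: the loader index is only
    passed when more than one test dataloader exists."""
    args = [batch, batch_idx]
    if num_loaders > 1:
        args.append(loader_idx)
    return model.test_step(*args)
```

Without the `= 0` default, the single-loader call `run_test_step(model, batch, 5, num_loaders=1)` fails with the same TypeError reported above.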

Problem with multi-GPU training

Hi there,

When I train the model on multiple GPUs by setting gpus=2 in pl.Trainer(), it throws an error:
TypeError: cannot pickle 'module' object.
How can I solve this problem? Thanks!

    ...
    trainer = pl.Trainer(
        deterministic = True,
        gpus = 2, # <---------
        checkpoint_callback = False,
        max_epochs = config.max_epoch,
        auto_lr_find = True,
        sync_batchnorm=True,
        # check_val_every_n_epoch = 1,
        val_check_interval = 0.25,
    )
    ...

python 3.8.3
torch '1.7.1+cu110'
Ubuntu 18.04.5 LTS

Global seed set to 19961206
Data List: data/test_adc.txt
Song Size: 12
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:01<00:00,  9.73it/s]
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and  functionality may change in the future. (function operator())
/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:849: UserWarning: You requested multiple GPUs but did not specify a backend, e.g. `Trainer(strategy="dp"|"ddp"|"ddp2")`. Setting `strategy="ddp_spawn"` for you.
  rank_zero_warn(
/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:147: LightningDeprecationWarning: Setting `Trainer(checkpoint_callback=False)` is deprecated in v1.5 and will be removed in v1.7. Please consider using `Trainer(enable_checkpointing=False)`.
  rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
Traceback (most recent call last):
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data4/chengfang/.vscode-server/extensions/ms-python.python-2022.0.1814523869/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/data4/chengfang/.vscode-server/extensions/ms-python.python-2022.0.1814523869/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/data4/chengfang/.vscode-server/extensions/ms-python.python-2022.0.1814523869/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data4/chengfang/project/melodyExtraction/TONet/main.py", line 163, in <module>
    train()
  File "/data4/chengfang/project/melodyExtraction/TONet/main.py", line 97, in train
    trainer.fit(model, train_dataloader, test_dataloaders)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 173, in start_training
    self.spawn(self.new_process, trainer, self.mp_queue, return_result=False)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 201, in spawn
    mp.spawn(self._wrapped_function, args=(function, args, kwargs, return_queue), nprocs=self.num_processes)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 148, in start_processes
    process.start()
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'module' object
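With `ddp_spawn`, the trainer and model are pickled to send them to each worker process (see `reduction.dump(process_obj, fp)` in the traceback), and Python module objects cannot be pickled. A likely but unconfirmed cause is that the imported `config` module, or another module, is stored as an attribute somewhere in the model or its data. The standard-library sketch below reproduces the error and shows one workaround: copying the module's plain settings into a picklable `SimpleNamespace`. The module is built on the fly as a stand-in for `import config`, and `module_to_namespace` is a hypothetical helper. Alternatively, following the warning in the log, choosing `Trainer(strategy="ddp")` should avoid pickling the model, because that strategy re-launches the script in each process instead.

```python
import pickle
import types


def module_to_namespace(mod: types.ModuleType) -> types.SimpleNamespace:
    """Copy a config module's public, non-module, non-callable attributes
    into a picklable namespace."""
    values = {
        k: v
        for k, v in vars(mod).items()
        if not k.startswith("_")
        and not isinstance(v, types.ModuleType)
        and not callable(v)
    }
    return types.SimpleNamespace(**values)


if __name__ == "__main__":
    # Stand-in for `import config` (config.py in the repo)
    config = types.ModuleType("config")
    config.octave_class = 8
    try:
        pickle.dumps(config)
    except TypeError as err:
        print("pickling the module failed:", err)
    cfg = module_to_namespace(config)
    print(pickle.loads(pickle.dumps(cfg)).octave_class)
```

Code that reads `config.octave_class` keeps working if it receives `cfg` instead of the module.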

Doubt about harmonics

The paper states: "Due to the local receptive field of CNN, most existing methods [7, 8, 9, 10] cannot capture harmonic relationships well." I lack background knowledge here: if more CNN layers are stacked, is it possible to capture harmonic relationships?
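Stacking more layers does widen the receptive field, but slowly. A back-of-the-envelope sketch, assuming a log-frequency axis with 60 bins per octave (an assumption, not a value stated in this thread) and stride-1 convolutions without pooling or dilation: the h-th harmonic sits about 60·log2(h) bins above the fundamental, while n stacked convolutions of width k see 1 + n(k − 1) bins.

```python
import math


def receptive_field(kernel: int, layers: int) -> int:
    """Receptive field (in bins) of `layers` stacked stride-1 convs of width `kernel`."""
    return 1 + layers * (kernel - 1)


def harmonic_offset_bins(h: int, bins_per_octave: int = 60) -> int:
    """Distance in bins between f0 and its h-th harmonic on a log-frequency axis."""
    return round(bins_per_octave * math.log2(h))


def layers_needed(h: int, kernel: int = 3, bins_per_octave: int = 60) -> int:
    """Fewest layers whose receptive field covers both f0 and the h-th harmonic."""
    offset = harmonic_offset_bins(h, bins_per_octave)
    return math.ceil(offset / (kernel - 1))


if __name__ == "__main__":
    for h in (2, 3, 4):
        print(h, harmonic_offset_bins(h), layers_needed(h))
```

This ignores pooling, strides, and dilation, which grow the receptive field faster; it only illustrates why stacks of small kernels need dozens of layers before a fundamental and its upper harmonics fall inside one receptive field.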
