
dnn-based_source_separation's Issues

`hidden_channels` parameter in LSTM

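Currently, `hidden_channels` is used as follows in dprnn.py:

import torch.nn as nn

num_features, hidden_channels = 64, 128  # example values
causal = False

if causal:
    # Unidirectional: all hidden_channels units in the single direction
    lstm = nn.LSTM(num_features, hidden_channels, bidirectional=False)
else:
    # Bidirectional: hidden_channels//2 units per direction, so the concatenated
    # output has hidden_channels features only when hidden_channels is even
    lstm = nn.LSTM(num_features, hidden_channels//2, bidirectional=True)

This configuration may be confusing: in the bidirectional case, `hidden_channels` is the total output size rather than the LSTM's per-direction hidden size.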
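One way to make the convention explicit is to derive the per-direction LSTM size and the output size from a single rule. The plain-Python sketch below does that; the helper name `lstm_sizes` is hypothetical and not part of dprnn.py.

```python
def lstm_sizes(hidden_channels, causal):
    """Return (per_direction_hidden, num_directions, output_features) under the
    dprnn.py convention that the LSTM output should have hidden_channels features."""
    if causal:
        per_direction, num_directions = hidden_channels, 1
    else:
        # Integer division: an odd hidden_channels silently loses one feature
        per_direction, num_directions = hidden_channels // 2, 2
    # A bidirectional LSTM concatenates forward and backward outputs on the feature axis
    return per_direction, num_directions, per_direction * num_directions


print(lstm_sizes(128, causal=False))  # (64, 2, 128)
print(lstm_sizes(129, causal=False))  # (64, 2, 128): one feature short
print(lstm_sizes(128, causal=True))   # (128, 1, 128)
```

The odd-size case is one concrete way the current convention can surprise users; raising a `ValueError` for odd `hidden_channels` in the bidirectional case would be one option.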

Linear encoder.

Currently, the TasNet encoder requires a nonlinear function (`enc_nonlinear`). However, the paper uses a linear encoder.
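A minimal sketch of how the encoder's nonlinearity could be made optional, so that `enc_nonlinear=None` yields a purely linear encoder. It works on a 1-D signal in plain Python; the function name `encode`, the `basis` argument, and accepting `None` are assumptions for illustration, not the repository's API.

```python
def relu(x):
    return max(x, 0.0)


def encode(signal, basis, stride, enc_nonlinear=None):
    """Frame `signal` with hop `stride` and project each frame onto every
    basis vector (a strided 1-D convolution without padding).
    If enc_nonlinear is None, the encoder is purely linear."""
    kernel_size = len(basis[0])
    num_frames = (len(signal) - kernel_size) // stride + 1
    features = []
    for n in range(num_frames):
        frame = signal[n * stride:n * stride + kernel_size]
        # Linear step: inner product of the frame with each basis vector
        row = [sum(w * x for w, x in zip(b, frame)) for b in basis]
        if enc_nonlinear is not None:
            row = [enc_nonlinear(v) for v in row]
        features.append(row)
    return features  # shape: (num_frames, num_basis)


signal = [1.0, -2.0, 3.0, -4.0]
basis = [[1.0, 0.0], [0.0, 1.0]]
print(encode(signal, basis, stride=2))                      # [[1.0, -2.0], [3.0, -4.0]]
print(encode(signal, basis, stride=2, enc_nonlinear=relu))  # [[1.0, 0.0], [3.0, 0.0]]
```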

DPRNN-TasNet architecture

Questions:

  • Does DPRNN-TasNet require the bottleneck convolution?
  • The `separable` and `dilated` options are unnecessary.

Missing SPEAKERS.TXT for the example

When running the "0. Preparation" step of the example in Readme.md, I get an error:
FileNotFoundError: [Errno 2] No such file or directory: '../../../dataset/SPEAKERS.TXT'
If I create an empty file, ./prepare_librispeech.sh runs without error, but ./train.sh then fails with ValueError: num_samples should be a positive integer value, but got num_samples=0
Could you advise on the required content and format of SPEAKERS.TXT?
Thank you.
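SPEAKERS.TXT is distributed with the LibriSpeech corpus, so a likely fix is to copy the original file into the dataset directory rather than creating an empty one. Assuming the preparation script expects that original file, its data lines have the form `ID | SEX | SUBSET | MINUTES | NAME`, and comment lines start with `;`. The parser below is a hypothetical sketch, not the repository's code; it shows the format and why an empty file yields no speakers, which would plausibly lead to an empty dataset and the `num_samples=0` error.

```python
def parse_speakers(text):
    """Parse LibriSpeech-style SPEAKERS.TXT content into
    {speaker_id: {"sex", "subset", "minutes", "name"}}.
    Lines starting with ';' are comments; fields are separated by '|'."""
    speakers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        # maxsplit=4 keeps any '|' inside the NAME field intact
        speaker_id, sex, subset, minutes, name = (f.strip() for f in line.split("|", 4))
        speakers[speaker_id] = {
            "sex": sex,
            "subset": subset,
            "minutes": float(minutes),
            "name": name,
        }
    return speakers


# Made-up entries for illustration only; use the file shipped with LibriSpeech
sample = """\
;ID  |SEX| SUBSET           |MINUTES| NAME
1001 | F | train-clean-100  | 25.00 | Speaker A
1002 | M | dev-clean        |  8.02 | Speaker B
"""
print(len(parse_speakers(sample)))  # 2
print(len(parse_speakers("")))      # 0: an empty file yields no speakers
```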

Training for ORPIT.

When training with ORPIT, the samples in one batch can have different numbers of ground-truth sources.
For example:

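import torch

T = 16000  # example signal length
sources_A = torch.randn(2, T)  # tensor with the shape (2, T)
sources_B = torch.randn(3, T)  # tensor with the shape (3, T)

They cannot be concatenated:

# in collate_fn
minibatch = torch.cat([sources_A.unsqueeze(dim=0), sources_B.unsqueeze(dim=0)], dim=0) # Error: sizes differ in dim 1 (2 vs. 3)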

How should such batches be handled with the ORPIT class in criterion/pit.py?
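A common workaround is to pad every sample in the batch to the largest number of sources with silent (all-zero) sources and return the true counts alongside, so that the loss can ignore the padded rows. The plain-Python sketch below illustrates the idea; `collate_variable_sources` is a hypothetical name, and whether `ORPIT` in criterion/pit.py can consume padded batches is not confirmed here.

```python
def collate_variable_sources(batch):
    """batch: list of (mixture, sources), where mixture is a list of T samples and
    sources is a list of n_i sources, each a list of T samples.
    Returns (mixtures, padded_sources, num_sources); padded_sources has
    shape (batch_size, max_n, T)."""
    max_n = max(len(sources) for _, sources in batch)
    mixtures, padded, num_sources = [], [], []
    for mixture, sources in batch:
        T = len(mixture)
        # Pad with silent sources so every sample has max_n rows
        silence = [[0.0] * T for _ in range(max_n - len(sources))]
        mixtures.append(mixture)
        padded.append([list(s) for s in sources] + silence)
        num_sources.append(len(sources))
    return mixtures, padded, num_sources


T = 4
sample_A = ([0.0] * T, [[1.0] * T, [2.0] * T])             # 2 sources
sample_B = ([0.0] * T, [[1.0] * T, [2.0] * T, [3.0] * T])  # 3 sources
mixtures, sources, num_sources = collate_variable_sources([sample_A, sample_B])
print([len(s) for s in sources])  # [3, 3]
print(num_sources)                # [2, 3]
```

In PyTorch, the padded nested lists could then be converted into one tensor (e.g. with `torch.tensor`), and `num_sources` would tell the criterion how many rows of each sample are real.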

Evaluation metrics

Source separation is evaluated with:

  • SDR (improvement)
  • SIR (improvement)
  • SAR

These metrics are computed with mir_eval.
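The "improvement" variants are the metric of the estimate minus the same metric computed with the unprocessed mixture used as the estimate. The sketch below shows that bookkeeping in plain Python. To stay self-contained it swaps in a simple SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), which is not the BSS Eval SDR computed by mir_eval (BSS Eval allows a distortion filter and also yields SIR and SAR); in practice the same subtraction would be applied to mir_eval's outputs.

```python
import math


def simple_sdr(reference, estimate):
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).
    Simplified stand-in, not mir_eval's BSS Eval SDR; undefined for a perfect estimate."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10 * math.log10(signal / error)


def improvement(metric, reference, estimate, mixture):
    """Metric gain of the estimate over using the mixture itself as the estimate."""
    return metric(reference, estimate) - metric(reference, mixture)


reference = [1.0, -1.0, 1.0, -1.0]
mixture = [2.0, 0.0, 2.0, 0.0]     # reference + interference [1, 1, 1, 1]
estimate = [1.1, -0.9, 1.1, -0.9]  # close to the reference
print(round(improvement(simple_sdr, reference, estimate, mixture), 2))  # 20.0
```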

Conv-TasNet Cumulative Layer Norm Bug?

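Shouldn't lines 78-92 be the following, according to the Conv-TasNet paper?

```python
import torch

def cumulative_layer_norm(input, gamma, beta, eps=1e-8):
    # input: (batch_size, C, T); gamma, beta: (1, C, 1)
    batch_size, C, T = input.size()

    step_sum = input.sum(dim=1) # -> (batch_size, T)
    step_pow_sum = (input**2).sum(dim=1) # -> (batch_size, T)
    cum_sum = torch.cumsum(step_sum, dim=1) # -> (batch_size, T)
    cum_pow_sum = torch.cumsum(step_pow_sum, dim=1) # -> (batch_size, T)

    cum_num = torch.arange(C, C*(T+1), C, dtype=input.dtype, device=input.device) # -> (T, ): [C, 2C, ..., TC]
    cum_mean = cum_sum / cum_num # -> (batch_size, T)
    cum_var = cum_pow_sum / cum_num - cum_mean**2 # E[x^2] - E[x]^2 over all frames up to t

    cum_mean = cum_mean.unsqueeze(dim=1) # -> (batch_size, 1, T)
    cum_var = cum_var.unsqueeze(dim=1) # -> (batch_size, 1, T)

    # eps inside the square root, as in the paper's cLN definition
    return (input - cum_mean) / torch.sqrt(cum_var + eps) * gamma + beta
```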
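To check a vectorized cumulative layer norm against the paper's definition, a brute-force reference can help: for each time step t, it takes the mean and variance over all channels and all frames up to t, then normalizes frame t. The plain-Python sketch below handles a single (C, T) example with gamma = 1 and beta = 0; the name `cumulative_layer_norm_reference` is hypothetical.

```python
import math


def cumulative_layer_norm_reference(x, eps=1e-8):
    """x: list of C channels, each a list of T values.
    Returns the cumulatively normalized (C, T) values (gamma=1, beta=0)."""
    C, T = len(x), len(x[0])
    out = [[0.0] * T for _ in range(C)]
    for t in range(T):
        # All values from channels 0..C-1 and frames 0..t
        seen = [x[c][tau] for c in range(C) for tau in range(t + 1)]
        mean = sum(seen) / len(seen)
        var = sum((v - mean) ** 2 for v in seen) / len(seen)
        for c in range(C):
            out[c][t] = (x[c][t] - mean) / math.sqrt(var + eps)
    return out


x = [[1.0, 3.0], [1.0, 5.0]]  # C=2, T=2
y = cumulative_layer_norm_reference(x)
print([[round(v, 4) for v in row] for row in y])  # [[0.0, 0.3015], [0.0, 1.5076]]
```

Comparing such a reference with the vectorized code on random inputs is a quick way to catch mistakes such as deriving the variance from `cum_sum` instead of from cumulative sums of squares.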

Join efforts?

Hi @tky823, nice repo!

We'd welcome most of this code in Asteroid if you'd like to contribute 😃 Would you?

Cheers,

Unstable training of DPRNN-TasNet

The network fails to train when:

  • Layer normalization is applied before the bottleneck convolution.
  • The batch size per GPU is small (e.g., 1).
    • The same phenomenon is observed in Conv-TasNet.

Omitting the bottleneck convolution seems to work better.

Separating a specific drum

Hi, great work! I am interested in separating a specific drum from tracks, and we could provide training data. Would you be interested in a gig? It could be anything from pointing out what needs to be done to adapt your code to train on our music files, to writing the code.

Thanks!

Problem in Google Colab

I'm getting an error when running cell 2 of every Google Colab notebook. How can I solve it? Thanks.

     |████████████████████████████████| 596 kB 8.8 MB/s
     |████████████████████████████████| 963 kB 49.4 MB/s
     |████████████████████████████████| 130 kB 74.6 MB/s
Download CrossNet-Open-Unmix. (Dataset: MUSDB18, sampling frequency 44.1kHz)
Access denied with the following error:

 	Cannot retrieve the public link of the file. You may need to change
	the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

	 https://drive.google.com/uc?id=1yQC00DFvHgs4U012Wzcg69lvRxw5K9Jj 

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/content/DNN-based_source_separation/src/utils/utils.py", line 43, in download_pretrained_model_from_google_drive
    with zipfile.ZipFile(zip_path) as f:
  File "/usr/lib/python3.7/zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/53e252ff-8063-4c95-87ac-01fdaff0341b.zip'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
[<ipython-input-3-b990f481835a>](https://localhost:8080/#) in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', 'cd "/content/DNN-based_source_separation/egs/tutorials/x-umx"\n\n# Build environment\npip install -r requirements.txt -q\n\n# Download pretrained model\nmodel_name="musdb18"\n\n. ./prepare.sh --model_name "${model_name}"')

2 frames
[/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py](https://localhost:8080/#) in check_returncode(self)
    137     if self.returncode:
    138       raise subprocess.CalledProcessError(
--> 139           returncode=self.returncode, cmd=self.args, output=self.output)
    140 
    141   def _repr_pretty_(self, p, cycle):  # pylint:disable=unused-argument

CalledProcessError: Command 'cd "/content/DNN-based_source_separation/egs/tutorials/x-umx"

# Build environment
pip install -r requirements.txt -q

# Download pretrained model
model_name="musdb18"

. ./prepare.sh --model_name "${model_name}"' returned non-zero exit status 1.


Just a question

Feel free to close this afterwards. Is there such a thing as targeted keyword separation? VoiceFilter, for example, performs targeted speech separation; I wondered whether something similar has been applied to known word(s) rather than to a specific voice spectrum. Do you know of anything, or could you point out possible approaches?

Finetuning

A fine-tuning script for source separation with an unknown number of sources is needed.

Model common usage

Hi tky823, thank you so much for your work; this is a great framework. I have a quick question about using the models. I see you provide Conv-TasNet and D3Net training on MUSDB, but not TasNet, and more specifically not DPRNNTasNet. Do these models also work on signals other than pure speech? And, in your opinion, which current model is best for music signals? Thank you very much.

PESQ error

PESQ sometimes raises a processing error. The mixture 443c020x_0.18686_447c0205_-0.18686_22go010j_0.wav may cause the error because the utterance part of the target 443c020x is too short.
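The file name appears to follow the wsj0-mix convention of alternating utterance IDs and SNRs in dB (here, three speakers). The plain-Python sketch below parses such names and skips mixtures whose PESQ computation fails. `pesq_fn` stands for whichever PESQ implementation is used; it, the helper names, and returning `None` on failure are all assumptions for illustration.

```python
import os


def parse_mixture_name(path):
    """Split a name like 'uttA_snrA_uttB_snrB_uttC_snrC.wav' into
    [(utterance_id, snr_db), ...]."""
    stem = os.path.splitext(os.path.basename(path))[0]
    fields = stem.split("_")
    return [(fields[i], float(fields[i + 1])) for i in range(0, len(fields), 2)]


def safe_pesq(pesq_fn, *args, **kwargs):
    """Call a PESQ implementation; return None instead of raising when it fails
    (e.g. when too little speech is detected in the reference)."""
    try:
        return pesq_fn(*args, **kwargs)
    except Exception as err:  # PESQ backends raise different error types
        print(f"PESQ skipped: {err}")
        return None


name = "443c020x_0.18686_447c0205_-0.18686_22go010j_0.wav"
print(parse_mixture_name(name))
# [('443c020x', 0.18686), ('447c0205', -0.18686), ('22go010j', 0.0)]
```

Logging which mixtures were skipped, rather than dropping them silently, keeps the reported average PESQ honest about how many files it covers.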
