tky823 / dnn-based_source_separation



A PyTorch implementation of DNN-based source separation.

Python 53.88% Shell 6.48% Jupyter Notebook 39.64%

source-separation speech-separation pytorch tasnet audio-separation conv-tasnet

dnn-based_source_separation's Issues

Bug of LSTM-TasNet

model.filterbank.GatedEncoder does NOT apply the ReLU and sigmoid operations.

DNN-based_source_separation/src/models/filterbank.py

Lines 341 to 350 in d8e6235

def forward(self, input):
    eps = self.eps

    norm = torch.norm(input, dim=2, keepdim=True)
    x = input / (norm + eps)

    x_U = self.conv1d_U(x)
    x_V = self.conv1d_V(x)

    output = x_U * x_V

    return output
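A minimal sketch of what the gated encoder could look like with the missing nonlinearities, following the LSTM-TasNet form ReLU(U x) * sigmoid(V x). The attribute names conv1d_U, conv1d_V and eps mirror the snippet quoted above; the constructor arguments (n_basis, kernel_size, stride, in_channels) are assumptions for illustration, not the repository's signature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedEncoder(nn.Module):
    """Sketch: output = ReLU(conv_U(x)) * sigmoid(conv_V(x)). Constructor args are hypothetical."""

    def __init__(self, n_basis, kernel_size, stride=None, in_channels=1, eps=1e-12):
        super().__init__()
        if stride is None:
            stride = kernel_size
        self.eps = eps
        self.conv1d_U = nn.Conv1d(in_channels, n_basis, kernel_size=kernel_size, stride=stride, bias=False)
        self.conv1d_V = nn.Conv1d(in_channels, n_basis, kernel_size=kernel_size, stride=stride, bias=False)

    def forward(self, input):
        eps = self.eps
        norm = torch.norm(input, dim=2, keepdim=True)  # L2 norm along the last (time) axis
        x = input / (norm + eps)
        x_U = F.relu(self.conv1d_U(x))          # nonnegative activation path
        x_V = torch.sigmoid(self.conv1d_V(x))   # gate in (0, 1)
        output = x_U * x_V
        return output


encoder = GatedEncoder(n_basis=64, kernel_size=16, stride=8)
print(encoder(torch.randn(2, 1, 160)).shape)  # torch.Size([2, 64, 19])
```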

`hidden_channels` parameter in LSTM

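Currently, hidden_channels is used as follows in dprnn.py:

import torch.nn as nn

if causal:
    lstm = nn.LSTM(num_features, hidden_channels, bidirectional=False)
else:
    lstm = nn.LSTM(num_features, hidden_channels//2, bidirectional=True)

This configuration may be confusing: hidden_channels is the size of the LSTM output (both directions combined), not the hidden_size passed to nn.LSTM, which is per direction.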
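A quick check of the shapes involved, with arbitrary sizes chosen for illustration. Because nn.LSTM concatenates both directions, hidden_channels // 2 per direction gives the same output width as the unidirectional case, as long as hidden_channels is even (an odd value would silently lose one channel).

```python
import torch
import torch.nn as nn

num_features, hidden_channels = 64, 128
x = torch.randn(50, 4, num_features)  # (seq_len, batch_size, num_features); batch_first=False by default

causal_lstm = nn.LSTM(num_features, hidden_channels, bidirectional=False)
noncausal_lstm = nn.LSTM(num_features, hidden_channels // 2, bidirectional=True)

y_causal, _ = causal_lstm(x)
y_noncausal, _ = noncausal_lstm(x)  # forward and backward outputs are concatenated

print(y_causal.shape)     # torch.Size([50, 4, 128])
print(y_noncausal.shape)  # torch.Size([50, 4, 128])
```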

Linear encoder.

Currently, the TasNet encoder requires a nonlinear function (enc_nonlinear). However, the paper uses a linear encoder.
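One way to support both variants is to make the nonlinearity optional and fall back to nn.Identity. This is a sketch only; the class name Encoder and its arguments are hypothetical and do not reflect the repository's actual encoder.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical TasNet-style encoder with an optional nonlinearity (None -> linear encoder)."""

    def __init__(self, n_basis, kernel_size, stride, enc_nonlinear=None):
        super().__init__()
        self.conv1d = nn.Conv1d(1, n_basis, kernel_size, stride=stride, bias=False)
        if enc_nonlinear is None:
            self.nonlinear = nn.Identity()  # purely linear encoder
        elif enc_nonlinear == "relu":
            self.nonlinear = nn.ReLU()
        else:
            raise ValueError(f"Unsupported enc_nonlinear: {enc_nonlinear}")

    def forward(self, input):  # input: (batch_size, 1, T)
        return self.nonlinear(self.conv1d(input))


linear_encoder = Encoder(16, 8, 4)
relu_encoder = Encoder(16, 8, 4, enc_nonlinear="relu")
```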

DPRNN-TasNet architecture

Questions:

Does DPRNN-TasNet require the bottleneck convolution?
The separable and dilated options are unnecessary.

missing SPEAKERS.TXT for example

When running step 0 (Preparation) of the example in Readme.md, I get an error:
FileNotFoundError: [Errno 2] No such file or directory: '../../../dataset/SPEAKERS.TXT'
If I just create an empty file, ./prepare_librispeech.sh runs without error, but then ./train.sh fails with ValueError: num_samples should be a positive integer value, but got num_samples=0
Could you give some advice on the required content and format of SPEAKERS.TXT?
Thank you.
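SPEAKERS.TXT is distributed with LibriSpeech itself, so an empty placeholder cannot work. In the LibriSpeech copy, lines starting with ";" are comments and each entry is a pipe-separated record with the fields ID, SEX, SUBSET, MINUTES and NAME. The parser below is only a sketch to show that layout; which fields the repository's preparation script actually reads is an assumption, and parse_speakers is a hypothetical helper.

```python
def parse_speakers(lines):
    """Parse LibriSpeech-style SPEAKERS.TXT lines into {speaker_id: info}."""
    speakers = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # skip blank lines and comment/header lines
            continue
        # Split at most four times so that any "|" inside the NAME field is kept.
        speaker_id, sex, subset, minutes, name = [field.strip() for field in line.split("|", 4)]
        speakers[speaker_id] = {"sex": sex, "subset": subset, "minutes": float(minutes), "name": name}
    return speakers


def load_speakers(path):
    with open(path, encoding="utf-8") as f:
        return parse_speakers(f)
```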

Training for ORPIT.

For ORPIT training, the samples in one batch can have different numbers of ground-truth sources, e.g.

sources_A # tensor with the shape (2, T)
sources_B # tensor with the shape (3, T)

We cannot concatenate them:

# in collate_fn
minibatch = torch.cat([sources_A.unsqueeze(dim=0), sources_B.unsqueeze(dim=0)], dim=0) # Error

How should they be handled with the ORPIT class in criterion/pit.py?
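One possible workaround, sketched under the assumption that the number of sources of every sample is known up front, is to build batches in which all samples share the same number of sources, so the default stacking works. batches_by_num_sources is a hypothetical helper, not part of the repository, and it does not change how ORPIT itself is computed.

```python
from collections import defaultdict


def batches_by_num_sources(num_sources, batch_size):
    """Group dataset indices so that every batch has a single number of sources.

    num_sources[i] is the number of ground-truth sources of sample i.
    """
    buckets = defaultdict(list)
    for index, n in enumerate(num_sources):
        buckets[n].append(index)

    batches = []
    for n in sorted(buckets):
        indices = buckets[n]
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


print(batches_by_num_sources([2, 3, 2, 3, 2], batch_size=2))  # [[0, 2], [4], [1, 3]]
```

The result can be passed as DataLoader(dataset, batch_sampler=batches), since batch_sampler accepts an iterable of index lists; the order is fixed, so shuffling per epoch would need a custom sampler.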

Evaluation metrics

Evaluate source separation with:

SDR (improvement)
SIR (improvement)
SAR

These can be computed with mir_eval.

Conv-TasNet Cumulative Layer Norm Bug?

Shouldn't lines 78-92 be

step_sum = input.sum(dim=1) # -> (batch_size, T)
cum_sum = torch.cumsum(step_sum, dim=1) # -> (batch_size, T)

cum_num = torch.arange(C, C*(T+1), C, dtype=torch.float) # -> (T, ): [C, 2C, ..., TC]
cum_mean = cum_sum / cum_num # (batch_size, T)
cum_var = (cum_sum - cum_mean)**2/cum_num

cum_mean = cum_mean.unsqueeze(dim=1)
cum_var = cum_var.unsqueeze(dim=1)

output = (input - cum_mean) / (torch.sqrt(cum_var) + eps) * self.gamma + self.beta

according to the Conv-TasNet paper?
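For reference, the Conv-TasNet paper defines cumulative layer normalization with the mean and variance over all channels and all frames up to step k, i.e. E[f_k] = sum(F[:, :k]) / (N k) and Var[f_k] = sum((F[:, :k] - E[f_k])^2) / (N k), then normalizes with sqrt(Var + eps). Note that the cum_var line in the snippet above does not compute this variance. The sketch below obtains it from cumulative sums of the values and of their squares (Var = E[x^2] - E[x]^2); it is a standalone function, not the repository's module.

```python
import torch


def cumulative_layer_norm(input, gamma, beta, eps=1e-8):
    """input: (batch_size, C, T); gamma, beta: (1, C, 1)."""
    batch_size, C, T = input.size()

    step_sum = input.sum(dim=1)              # (batch_size, T)
    step_pow_sum = (input ** 2).sum(dim=1)   # (batch_size, T)
    cum_sum = torch.cumsum(step_sum, dim=1)
    cum_pow_sum = torch.cumsum(step_pow_sum, dim=1)

    # Number of elements seen up to each step: [C, 2C, ..., TC]
    cum_num = torch.arange(1, T + 1, dtype=input.dtype, device=input.device) * C
    cum_mean = cum_sum / cum_num                                     # (batch_size, T)
    cum_var = (cum_pow_sum / cum_num - cum_mean ** 2).clamp(min=0)   # guard tiny negative values

    cum_mean = cum_mean.unsqueeze(dim=1)  # (batch_size, 1, T)
    cum_var = cum_var.unsqueeze(dim=1)

    return (input - cum_mean) / torch.sqrt(cum_var + eps) * gamma + beta
```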

Join efforts?

Hi @tky823, nice repo!

We'd welcome most of this code in Asteroid if you'd like to contribute 😃 Would you?

Cheers,

Implementation of DPTNet.

This is currently being implemented in feature/dptnet.

Pseudo inverse for decoder architecture.

Implement the decoder as the pseudo-inverse of the encoder, taking the window function and optimal analysis into account.
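A minimal sketch of the pseudo-inverse idea, assuming an encoder basis U of shape (N, L) with full column rank and non-overlapping frames (stride equal to kernel size) so that reconstruction is exact. Windowing and overlapping frames, which the issue also mentions, are deliberately left out; the sizes below are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_basis, kernel_size = 16, 8
stride = kernel_size  # non-overlapping frames keep this example exact

U = torch.randn(n_basis, kernel_size, dtype=torch.float64)  # encoder basis (N x L)
# Decoder basis D (N x L) chosen so that D^T U = I_L, i.e. D^T = pinv(U).
D = torch.linalg.pinv(U).T.contiguous()

x = torch.randn(1, 1, 64, dtype=torch.float64)                 # (batch_size, 1, T), T a multiple of stride
w = F.conv1d(x, U.unsqueeze(1), stride=stride)                 # (1, N, T / L)
x_hat = F.conv_transpose1d(w, D.unsqueeze(1), stride=stride)   # (1, 1, T)

print(torch.allclose(x, x_hat))  # True
```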

Unstable training of DPRNN-TasNet

Training does not converge when:

Layer normalization is applied before the bottleneck convolution.
The batch size per GPU is small (e.g. 1).
- The same phenomenon is observed in Conv-TasNet.

Omitting the bottleneck convolution seems to work better.
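To compare these configurations, the normalization and the bottleneck can be made independently optional. In this hypothetical front-end, nn.GroupNorm with a single group normalizes over channels and time with per-channel gain and bias (the same computation as a global layer norm), and the bottleneck is a 1x1 convolution; the class and argument names are assumptions.

```python
import torch
import torch.nn as nn


class SeparatorFrontEnd(nn.Module):
    """Hypothetical front-end: optional normalization followed by an optional 1x1 bottleneck."""

    def __init__(self, n_basis, bottleneck_channels=None, use_norm=True, eps=1e-8):
        super().__init__()
        # GroupNorm(1, N) normalizes over all channels and frames of each sample.
        self.norm = nn.GroupNorm(1, n_basis, eps=eps) if use_norm else nn.Identity()
        if bottleneck_channels is None:
            self.bottleneck = nn.Identity()  # no bottleneck convolution
        else:
            self.bottleneck = nn.Conv1d(n_basis, bottleneck_channels, kernel_size=1)

    def forward(self, input):  # input: (batch_size, n_basis, n_frames)
        return self.bottleneck(self.norm(input))


with_bottleneck = SeparatorFrontEnd(64, bottleneck_channels=32)
without_anything = SeparatorFrontEnd(64, bottleneck_channels=None, use_norm=False)
```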

Separating a specific drum

Hi, great work! I am interested in separating a specific drum from tracks, and we could provide training data. Would you be interested in a gig? It could be anything from helping point out what needs to be done to adapt your code to train on our music files, to writing the code.

Thanks!

The number of parameters is different.

In deep clustering, the number of trainable parameters in this implementation is different.
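A small helper for counting trainable parameters when comparing against a reported number. One common source of mismatch, offered here only as a possibility: PyTorch's nn.LSTM has two bias vectors per layer (bias_ih and bias_hh), whereas some other frameworks use a single bias, so LSTM-based models can differ by 4 x hidden_size per layer and direction.

```python
import torch.nn as nn


def count_trainable_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


print(count_trainable_parameters(nn.Linear(10, 5)))  # 55
print(count_trainable_parameters(nn.LSTM(10, 20)))   # 2560 (800 + 1600 + 80 + 80)
```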

Curriculum training for DANet

The DANet paper used curriculum training.

problem in google colab

I get an error when running cell 2 of every Google Colab notebook. How can I solve it? Thanks.

|████████████████████████████████| 596 kB 8.8 MB/s 
     |████████████████████████████████| 963 kB 49.4 MB/s 
     |████████████████████████████████| 130 kB 74.6 MB/s 
Download CrossNet-Open-Unmix. (Dataset: MUSDB18, sampling frequency 44.1kHz)
Access denied with the following error:

 	Cannot retrieve the public link of the file. You may need to change
	the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

	 https://drive.google.com/uc?id=1yQC00DFvHgs4U012Wzcg69lvRxw5K9Jj 

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/content/DNN-based_source_separation/src/utils/utils.py", line 43, in download_pretrained_model_from_google_drive
    with zipfile.ZipFile(zip_path) as f:
  File "/usr/lib/python3.7/zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/53e252ff-8063-4c95-87ac-01fdaff0341b.zip'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-3-b990f481835a> in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', 'cd "/content/DNN-based_source_separation/egs/tutorials/x-umx"\n\n# Build environment\npip install -r requirements.txt -q\n\n# Download pretrained model\nmodel_name="musdb18"\n\n. ./prepare.sh --model_name "${model_name}"')

2 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py in check_returncode(self)
    137     if self.returncode:
    138       raise subprocess.CalledProcessError(
--> 139           returncode=self.returncode, cmd=self.args, output=self.output)
    140 
    141   def _repr_pretty_(self, p, cycle):  # pylint:disable=unused-argument

CalledProcessError: Command 'cd "/content/DNN-based_source_separation/egs/tutorials/x-umx"

# Build environment
pip install -r requirements.txt -q

# Download pretrained model
model_name="musdb18"

. ./prepare.sh --model_name "${model_name}"' returned non-zero exit status 1.

Experiment configuration in DPRNN-TasNet.

In the paper:

The learning rate is decayed every two epochs.
Early stopping is applied if the best model is not updated for 10 consecutive epochs.
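Both settings map onto standard PyTorch pieces. In this sketch, the model, the Adam learning rate of 1e-3 and the decay factor gamma=0.98 are placeholder assumptions, and EarlyStopping is a hypothetical helper that counts epochs without a new best validation loss.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Multiply the learning rate by gamma every two epochs (gamma=0.98 is an assumed value).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.98)


class EarlyStopping:
    """Stop when the best validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.num_bad_epochs = 0

    def step(self, valid_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs >= self.patience
```

In a training loop, scheduler.step() would be called once per epoch after the optimizer updates, and the loop would break as soon as EarlyStopping.step(valid_loss) returns True.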

Prepare for the specific driver.py?

Just a question

Feel free to close this afterwards, but is there such a thing as targeted keyword separation? VoiceFilter performs targeted speech separation, and I wondered whether something similar has been applied to known word(s) rather than a specific speaker's voice spectrum. Do you know of anything, or could you point out possible approaches?

parse_options.sh

For usability, parse_options.sh is required.

Finetuning

A finetuning script for source separation with an unknown number of sources is required.

Implementation of D3Net

Reference: "D3Net: Densely connected multidilated DenseNet for music source separation"

K-means clustering

TODO:

batch operation of K-means clustering
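A minimal batched K-means sketch in PyTorch, running Lloyd iterations for every sample in the batch at once. The initialization (the first K points of each sample) is a simplifying assumption; random or k-means++ initialization would be more robust. Empty clusters keep their previous centroid.

```python
import torch
import torch.nn.functional as F


def batch_kmeans(x, num_clusters, num_iters=20):
    """x: (batch_size, num_points, dim). Returns assignments (B, N) and centroids (B, K, dim)."""
    centroids = x[:, :num_clusters, :].clone()  # naive init: first K points of each sample

    for _ in range(num_iters):
        dist = torch.cdist(x, centroids)                            # (B, N, K)
        assignments = torch.argmin(dist, dim=2)                     # (B, N)
        one_hot = F.one_hot(assignments, num_clusters).to(x.dtype)  # (B, N, K)
        counts = one_hot.sum(dim=1)                                 # (B, K)
        sums = torch.bmm(one_hot.transpose(1, 2), x)                # (B, K, dim)
        new_centroids = sums / counts.clamp(min=1).unsqueeze(-1)
        empty = (counts == 0).unsqueeze(-1)                         # keep old centroid for empty clusters
        centroids = torch.where(empty, centroids, new_centroids)

    assignments = torch.argmin(torch.cdist(x, centroids), dim=2)
    return assignments, centroids
```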

Model common usage

Hi tky823, thank you so much for your work; this is a great framework. I have a quick question about the usage of the models. I see that you provide Conv-TasNet and D3Net for training on MUSDB, but not TasNet and, more specifically, DPRNNTasNet. Do these models also work on signals other than pure speech? And, in your opinion, what is currently the best model for musical signals? Thank you very much.

PESQ error

PESQ sometimes raises a processing error. 443c020x_0.18686_447c0205_-0.18686_22go010j_0.wav may cause the error because the utterance part of the target 443c020x is too short.
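Until the root cause is handled, one pragmatic option is to score each file independently, skip the ones that raise, and report them. In this sketch, metric_fn stands in for whatever PESQ call is used; mean_ignoring_failures is a hypothetical helper and the exact exception PESQ raises is not assumed.

```python
import math


def mean_ignoring_failures(metric_fn, items):
    """items: iterable of (name, args). Returns (mean score, list of (name, error message))."""
    scores, failed = [], []
    for name, args in items:
        try:
            scores.append(metric_fn(*args))
        except Exception as error:  # e.g. the processing error PESQ raises on very short utterances
            failed.append((name, str(error)))
    mean = sum(scores) / len(scores) if scores else math.nan
    return mean, failed
```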

Use torch.nn.utils.rnn

In ORPIT training, torch.nn.utils.rnn may be useful.
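A sketch of how torch.nn.utils.rnn.pad_sequence could be used in a collate_fn: it pads along the first dimension, so source tensors of shape (n_sources, T) become a zero-filled (batch_size, max_n_sources, T) tensor, and the true counts are returned alongside so the loss can ignore the padding. The (mixture, sources) sample layout and the function name are assumptions, and all samples in a batch are assumed to have the same length T.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate_variable_sources(batch):
    """batch: list of (mixture, sources) with mixture (T,) and sources (n_sources, T)."""
    mixtures = torch.stack([mixture for mixture, _ in batch], dim=0)  # (batch_size, T)
    sources_list = [sources for _, sources in batch]
    num_sources = torch.tensor([sources.size(0) for sources in sources_list])
    # Pads along dim 0 (the source axis) with zeros -> (batch_size, max_n_sources, T)
    sources = pad_sequence(sources_list, batch_first=True)
    return mixtures, sources, num_sources
```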

