tky823 / DNN-based_source_separation
A PyTorch implementation of DNN-based source separation.
`model.filterbank.GatedEncoder` does NOT apply the `relu` and `sigmoid` operations.
DNN-based_source_separation/src/models/filterbank.py
Lines 341 to 350 in d8e6235
Currently, `hidden_channels` is used in `dprnn.py` like this:

```python
import torch.nn as nn

if causal:
    lstm = nn.LSTM(num_features, hidden_channels, bidirectional=False)
else:
    lstm = nn.LSTM(num_features, hidden_channels // 2, bidirectional=True)
```
This configuration may be confusing.
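For reference, the sizing rationale can be checked with a minimal sketch (shapes and values assumed): with `bidirectional=True`, the LSTM output width is doubled, so `hidden_channels // 2` keeps the output width equal in both branches.

```python
import torch
import torch.nn as nn

num_features, hidden_channels, T = 64, 128, 10
x = torch.randn(T, 1, num_features)  # (seq_len, batch, num_features)

uni = nn.LSTM(num_features, hidden_channels, bidirectional=False)
bi = nn.LSTM(num_features, hidden_channels // 2, bidirectional=True)

out_uni, _ = uni(x)
out_bi, _ = bi(x)
# Bidirectional output concatenates both directions, so both are
# (T, 1, hidden_channels) here.
print(out_uni.shape, out_bi.shape)  # torch.Size([10, 1, 128]) twice
```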
Currently, the encoder of TasNet requires a nonlinear function (`enc_nonlinear`). However, a linear encoder is used in the paper.
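A hypothetical sketch of how an optional nonlinearity could be exposed (the class, parameter names, and defaults are assumptions, not the repo's actual API):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """TasNet-style 1-D conv encoder with an optional nonlinearity.

    With enc_nonlinear=None the encoder is purely linear, as in the paper.
    """
    def __init__(self, kernel_size, num_basis, enc_nonlinear=None):
        super().__init__()
        self.conv = nn.Conv1d(1, num_basis, kernel_size,
                              stride=kernel_size // 2, bias=False)
        self.nonlinear = nn.ReLU() if enc_nonlinear == "relu" else nn.Identity()

    def forward(self, x):
        return self.nonlinear(self.conv(x))

encoder = Encoder(kernel_size=16, num_basis=64)  # linear, as in the paper
latent = encoder(torch.randn(4, 1, 16000))
print(latent.shape)  # (4, 64, T')
```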
Questions:
The `separable` and `dilated` options are unnecessary.

When running the "0. Preparation" section of the example in the README.md file, I get an error:
FileNotFoundError: [Errno 2] No such file or directory: '../../../dataset/SPEAKERS.TXT'
If I just create an empty file, `./prepare_librispeech.sh` runs without error, but then `./train.sh` fails with `ValueError: num_samples should be a positive integer value, but got num_samples=0`.
Please give some advice on the required content and format of SPEAKERS.TXT.
Thank you.
For the training of ORPIT, samples with different numbers of ground-truth sources are mixed in one batch, e.g.:

```python
sources_A  # tensor with shape (2, T)
sources_B  # tensor with shape (3, T)
```

We cannot concatenate them:

```python
# in collate_fn
minibatch = torch.cat([sources_A.unsqueeze(dim=0), sources_B.unsqueeze(dim=0)], dim=0)  # Error
```

How should they be handled using the `ORPIT` class in `criterion/pit.py`?
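One possible workaround, as a hedged sketch: zero-pad along the source axis with `torch.nn.utils.rnn.pad_sequence` and carry the true source counts separately (the `collate_fn` helper below is an assumption, not part of the repo):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_fn(batch):
    """Pad items with different numbers of sources to a common shape.

    Each item is a tensor of shape (n_sources, T); n_sources may differ
    between items. Returns the padded minibatch and the true counts.
    """
    n_sources = torch.tensor([sources.size(0) for sources in batch])
    # pad_sequence pads along dim 0 (here, the source axis) with zeros
    minibatch = pad_sequence(batch, batch_first=True)  # (batch, max_sources, T)
    return minibatch, n_sources

sources_A = torch.randn(2, 16000)
sources_B = torch.randn(3, 16000)
minibatch, n_sources = collate_fn([sources_A, sources_B])
print(minibatch.shape, n_sources)  # torch.Size([2, 3, 16000]) tensor([2, 3])
```

The loss can then mask out the padded source slots using `n_sources`.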
Evaluation of source separation: these metrics are realized by `mir_eval`.
Shouldn't lines 78-92 be

```python
step_sum = input.sum(dim=1)  # -> (batch_size, T)
step_pow_sum = (input**2).sum(dim=1)  # -> (batch_size, T)
cum_sum = torch.cumsum(step_sum, dim=1)  # -> (batch_size, T)
cum_pow_sum = torch.cumsum(step_pow_sum, dim=1)  # -> (batch_size, T)
cum_num = torch.arange(C, C*(T+1), C, dtype=torch.float)  # -> (T,): [C, 2C, ..., TC]
cum_mean = cum_sum / cum_num  # (batch_size, T)
cum_var = cum_pow_sum / cum_num - cum_mean**2  # E[x^2] - E[x]^2
cum_mean = cum_mean.unsqueeze(dim=1)
cum_var = cum_var.unsqueeze(dim=1)
output = (input - cum_mean) / torch.sqrt(cum_var + eps) * self.gamma + self.beta
```

according to the Conv-TasNet paper?
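As a sanity check, cumulative statistics computed via running sums (the `E[x^2] - E[x]^2` trick) can be verified against explicit per-step statistics; this is a standalone sketch with assumed shapes, not the repo's code:

```python
import torch

torch.manual_seed(0)
batch_size, C, T = 2, 4, 5
x = torch.randn(batch_size, C, T)

# Cumulative statistics over all entries up to each time step
cum_sum = torch.cumsum(x.sum(dim=1), dim=1)            # (batch_size, T)
cum_pow_sum = torch.cumsum((x**2).sum(dim=1), dim=1)   # (batch_size, T)
cum_num = torch.arange(C, C * (T + 1), C, dtype=x.dtype)  # [C, 2C, ..., TC]
cum_mean = cum_sum / cum_num
cum_var = cum_pow_sum / cum_num - cum_mean**2

# Reference: explicit mean/variance over entries up to step k
for k in range(T):
    ref = x[:, :, :k + 1].reshape(batch_size, -1)
    assert torch.allclose(cum_mean[:, k], ref.mean(dim=1), atol=1e-5)
    assert torch.allclose(cum_var[:, k], ref.var(dim=1, unbiased=False), atol=1e-5)
print("cumulative statistics match")
```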
Hi @tky823, nice repo!
We'd welcome most of this code in Asteroid if you'd like to contribute. Would you?
Cheers,
This is under implementation in `feature/dptnet`.
Implement decoder as a pseudo-inverse of an encoder considering the window function and optimal analysis.
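A minimal sketch of the per-frame idea (ignoring the window function and overlap-add; all names and shapes are assumptions): if the encoder basis has full column rank, its Moore-Penrose pseudo-inverse gives a decoder basis with exact per-frame reconstruction.

```python
import torch

torch.manual_seed(0)
kernel_size, num_basis = 16, 64
encoder_basis = torch.randn(num_basis, kernel_size)  # assumed encoder weight

# Pseudo-inverse acts as a left inverse of the (overcomplete) encoder basis
decoder_basis = torch.linalg.pinv(encoder_basis)     # (kernel_size, num_basis)

frame = torch.randn(kernel_size)
latent = encoder_basis @ frame           # analysis
reconstructed = decoder_basis @ latent   # synthesis via pseudo-inverse

print(torch.allclose(reconstructed, frame, atol=1e-4))  # True
```

Extending this to the full signal would still require accounting for the window and overlap-add, as the issue notes.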
The network wouldn't be trained when
Using no bottleneck convolution seems to perform better.
Hi, great work! I am interested in separating a specific drum from tracks. We could provide training data. Would you be interested in a gig (anything from pointing out what needs to be done to adapt your code to take our music files as training data, to writing the code)?
Thanks!
In deep clustering, the number of trainable parameters differs from this implementation.
In the DANet paper, curriculum training was used.
I'm getting an error when running cell 2 of all the Google Colab notebooks. How can I solve it? Thanks.
|████████████████████████████████| 596 kB 8.8 MB/s
|████████████████████████████████| 963 kB 49.4 MB/s
|████████████████████████████████| 130 kB 74.6 MB/s
Download CrossNet-Open-Unmix. (Dataset: MUSDB18, sampling frequency 44.1 kHz)
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=1yQC00DFvHgs4U012Wzcg69lvRxw5K9Jj
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/content/DNN-based_source_separation/src/utils/utils.py", line 43, in download_pretrained_model_from_google_drive
with zipfile.ZipFile(zip_path) as f:
File "/usr/lib/python3.7/zipfile.py", line 1240, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/53e252ff-8063-4c95-87ac-01fdaff0341b.zip'
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
[<ipython-input-3-b990f481835a>](https://localhost:8080/#) in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', 'cd "/content/DNN-based_source_separation/egs/tutorials/x-umx"\n\n# Build environment\npip install -r requirements.txt -q\n\n# Download pretrained model\nmodel_name="musdb18"\n\n. ./prepare.sh --model_name "${model_name}"')
2 frames
[/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py](https://localhost:8080/#) in check_returncode(self)
137 if self.returncode:
138 raise subprocess.CalledProcessError(
--> 139 returncode=self.returncode, cmd=self.args, output=self.output)
140
141 def _repr_pretty_(self, p, cycle): # pylint:disable=unused-argument
CalledProcessError: Command 'cd "/content/DNN-based_source_separation/egs/tutorials/x-umx"
# Build environment
pip install -r requirements.txt -q
# Download pretrained model
model_name="musdb18"
. ./prepare.sh --model_name "${model_name}"' returned non-zero exit status 1.
In this paper,
Prepare for the specific `driver.py`?
Close this off afterwards, but is there such a thing as targeted keyword separation? VoiceFilter does targeted speech separation; I wondered whether something similar has been applied to known word(s) rather than a specific voice spectrum. Do you know of anything, or could you point out possibilities?
For usability, `parse_options.sh` is required.
A fine-tuning script for source separation with an unknown number of sources is required.
TODO:
Hi tky823, thank you so much for your work; this is a great framework. I have just a quick question about the usage of the models. I see you provide Conv-TasNet & D3Net for training on MUSDB, but not TasNet and, more specifically, DPRNNTasNet. Do these models also work on signals other than pure speech? And, in your opinion, what is the best (current) model that covers musical signals? Thank you very much.
PESQ sometimes raises a processing error. `443c020x_0.18686_447c0205_-0.18686_22go010j_0.wav` may cause the error because the target `443c020x` has too short an utterance part.
In ORPIT training, torch.nn.utils.rnn may be useful.