Thanks for sharing the code! It's really great work on audio source separation!


The preprocessing results in less than 20000 mixtures in training set of wsj0-mix2.

etzinis commented on July 17, 2024

The preprocessing results in less than 20000 mixtures in training set of wsj0-mix2.

from two_step_mask_learning.

Comments (9)

etzinis commented on July 17, 2024

Yes, actually for the data augmentation experiments (where I am mixing the audio sources online) I needed all the files to have at least 4 seconds of audio, so I chopped them. In the WSJ case this is not needed. I will try to push a newer and cleaner version of this code once I am done with some deadlines.

Glad that you liked the work and I would be happy to answer other questions, if you have any! :)


etzinis commented on July 17, 2024

Section 3.2.1 refers to the WSJ case only, with no online mixing, so you can use all mixtures after zero-padding.


etzinis commented on July 17, 2024

Sorry for the confusion. I wanted to match the experiments from other works but also to create the 4-second online mixing procedure described in Section 3.2.2 of the paper.


fjiang9 commented on July 17, 2024

@etzinis Thank you so much for your kind reply! I am still running the speech separation experiment code. Looking forward to your new release : )


etzinis commented on July 17, 2024

As I said before, I will not be able to put out the new release until I am finished with some urgent things. I would suggest that you just create the WSJ dataset with the MATLAB script provided here: http://wham.whisper.ai/README.html and then use my script with wav_timelength=4s https://github.com/etzinis/two_step_mask_learning/blob/master/two_step_mask_learning/utils/preprocess_wsj0mix.py

I will leave this issue open in order to fix it in the new release.


etzinis commented on July 17, 2024

I have rechecked what you said, and indeed, when using the 'max' folder from the wsj2mix dataset, the following numbers of files are created:
Training: 19855
Testing: 2988
Validation: 4980

As you said, this happens because in lines 139-140 I discard the files with a duration shorter than 4 seconds. However, this amounts to neglecting about 0.7% of the training set and 0.4% of the testing and validation sets, which I consider negligible compared to the total size of the dataset. Moreover, zeros do not contribute to the SI-SDR loss, so either way I am just making my configuration a tiny bit harder than the initial setup. If you want to use the discarded files as well, you can just zero-pad in lines 139-140.
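The point about zeros and SI-SDR can be checked numerically. The following is a minimal sketch, not code from this repository: `si_sdr` is a hypothetical helper following the usual textbook definition, the 8 kHz sampling rate is an assumption, and the estimate is assumed to be exactly zero over the padded region (a real network may output small non-zero values there, in which case the score would change slightly).

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for 1-D signals (hypothetical helper)."""
    # Project the estimate onto the target to get the optimally scaled target.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    scaled_target = alpha * target
    noise = estimate - scaled_target
    return 10 * np.log10(
        (np.sum(scaled_target ** 2) + eps) / (np.sum(noise ** 2) + eps)
    )

rng = np.random.default_rng(0)
target = rng.standard_normal(24000)                    # 3 s at an assumed 8 kHz
estimate = target + 0.1 * rng.standard_normal(24000)   # noisy toy "separation"

pad = np.zeros(8000)                                   # 1 s of zeros, reaching 4 s
padded_target = np.concatenate([target, pad])
padded_estimate = np.concatenate([estimate, pad])

original_score = si_sdr(estimate, target)
padded_score = si_sdr(padded_estimate, padded_target)
print(original_score, padded_score)  # the two values agree up to rounding
```

Appending zeros to both signals leaves every inner product in the formula unchanged, which is why the two scores match under the stated assumption.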

I am closing this issue for now.


etzinis commented on July 17, 2024

I have now also added the padding code in the corresponding lines, so the output distribution of samples will be:
Training: 20000
Testing: 3000
Validation: 5000
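
For readers who want to see the difference between the two behaviours, here is a minimal sketch of discarding versus zero-padding clips shorter than 4 seconds. It is not the code from `preprocess_wsj0mix.py`: the function name `fix_length`, the 8 kHz sampling rate, and the choice to keep the first 4 seconds of longer clips are all assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 8000                 # assumed sampling rate in Hz
N_SAMPLES = 4 * SAMPLE_RATE        # 4-second target length

def fix_length(wav, n_samples, pad_short=True):
    """Return a clip of exactly n_samples, or None if the clip is discarded."""
    if len(wav) >= n_samples:
        return wav[:n_samples]                      # assumed: keep the first n_samples
    if not pad_short:
        return None                                 # drop clips that are too short
    return np.pad(wav, (0, n_samples - len(wav)))   # append trailing zeros

# Toy clips of 5 s, 3 s and 4 s standing in for real mixtures.
clips = [np.ones(sec * SAMPLE_RATE) for sec in (5, 3, 4)]

discarded = [
    c for c in (fix_length(w, N_SAMPLES, pad_short=False) for w in clips)
    if c is not None
]
padded = [fix_length(w, N_SAMPLES) for w in clips]

print(len(discarded), len(padded))  # prints: 2 3
```

With discarding, the 3-second clip is lost; with padding, every clip survives at the same fixed length, which is why the counts return to the full dataset sizes.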


etzinis commented on July 17, 2024

Thanks for noticing that @flyjiang92 🍺 😃


fjiang9 commented on July 17, 2024

@etzinis Thank you so much for your response and the code update! 👍

