Training Datamake about vad HOT 3 OPEN

jtkim-kaist commented on May 18, 2024

Training Datamake

from vad.

Comments (3)

jtkim-kaist commented on May 18, 2024

The VAD and speech enhancement (SE) datasets must be different. The reason is that SE assumes the incoming signal is always noisy speech, never noise only. We also tried using the VAD dataset (the first approach you mentioned) to train the SE model; as expected, training failed, because the noise-only segments make the problem harder to solve. For the VAD dataset, in contrast, the ratio between noise-only and noisy-speech segments should be almost equal in order to prevent class imbalance. These are the reasons why we use different methods to make the VAD and SE datasets.

To make the VAD dataset, just follow the way you mentioned: "As per your comment in one of the closed issues, you mentioned that you concatenate different sound effects to make one long sound wave containing noises and then pick a random speech utterance and add that speech utterance to noise files at various SNRs until the end of the long sound wave of noises." Additionally, YOU MUST VERIFY THE RATIO BETWEEN NOISE (labeled 0) AND SPEECH (labeled 1) SEGMENTS; ideally, 1 : 1 is best.

Both the FaNT tool and v_addnoise.m in voicebox (implemented in MATLAB) follow the ITU standard. In my experiments they did not show a significant difference, so I prefer voicebox because it is easier to use. Use whichever you want. make_train_noisy.m is ONLY for the speech enhancement toolkit.
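
The ratio check stressed in the reply above can be sketched in a few lines. This is only an illustration in Python, not part of the toolkit: the function name label_ratio and the assumption that the labels are stored as one 0/1 value per frame are hypothetical.

```python
def label_ratio(labels):
    """Count noise-only (0) and speech (1) labels and return their ratio.

    Assumes `labels` holds one 0/1 entry per frame (hypothetical layout).
    """
    speech = sum(1 for v in labels if v == 1)
    noise = len(labels) - speech
    # Guard against a dataset that contains no speech frames at all.
    ratio = noise / speech if speech else float("inf")
    return noise, speech, ratio


# Example: inspect a small label sequence before training.
noise_count, speech_count, ratio = label_ratio([0, 0, 1, 1, 1, 0])
print(f"noise={noise_count} speech={speech_count} noise/speech={ratio:.2f}")
```

If the ratio is far from 1, one option is to add or drop noise-only stretches until the two classes are roughly balanced.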

shezanmirzan commented on May 18, 2024

I have one more question, and I would be grateful if you could help me with it. Suppose I create a long noise file of 40 minutes, and the speech utterance is 1 minute long. When I use v_addnoise, it just gives me 1 minute of noisy speech, in which a noise interval is picked at random from the long noise file and added to the speech.

According to you, however, we need to add the same 1-minute file 40 times to the long noise file at different SNRs. How do we do that with v_addnoise? Is it even possible with v_addnoise, or should I try the FaNT tool?

In any case, many thanks for clearing up my earlier question.

jtkim-kaist commented on May 18, 2024

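The inputs to v_addnoise.m should be consecutive slices of the long noise file, each as long as the utterance it is paired with:
step 1
noise(1 : length(speech1)), speech1
step 2
noise(length(speech1)+1 : length(speech1)+length(speech2)), speech2
and so on, moving along the noise file by the length of each utterance.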
