jtkim-kaist commented on May 18, 2024
  1. The VAD and speech enhancement (SE) datasets must be different. The reason is that SE assumes the incoming signal is always noisy speech, never noise only. We also tried using the VAD dataset (the first approach you mentioned) to train the SE model, but, as expected, training failed: the noise-only segments make the problem much harder to solve. In contrast, the VAD dataset should contain almost equal amounts of noise-only and noisy-speech segments to prevent a class imbalance problem. That is why we build the VAD and SE datasets differently. To make the VAD dataset, just follow the approach you described: "As per your comment in one of the closed issue, you mentioned that you concatenate different sound effects to make one long sound wave containing noises and then pick a random speech utterance and add that speech utterance to noise files at various SNRs until the end of the long sound wave of noises." Additionally, YOU MUST VERIFY THE RATIO BETWEEN NOISE (labeled 0) AND SPEECH SEGMENTS (labeled 1); ideally it should be 1:1.

  2. Both the FaNT tool and v_addnoise.m in voicebox (a MATLAB toolbox) follow the ITU standard. In my experiments they showed no significant difference, so I prefer voicebox because it is easier to use. Use whichever you want. Note that make_train_noisy.m is ONLY for the speech enhancement toolkit.
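Verifying the 1:1 requirement from point 1 comes down to counting labels. A minimal Python sketch, assuming the labels are already available as a 0/1 array (per sample or per frame); `label_ratio`, `is_balanced`, and the 10% tolerance are illustrative choices, not part of the VAD toolkit.

```python
import numpy as np


def label_ratio(labels):
    """Count noise (0) and speech (1) labels; return (n_noise, n_speech, noise/speech)."""
    labels = np.asarray(labels)
    n_noise = int(np.count_nonzero(labels == 0))
    n_speech = int(np.count_nonzero(labels == 1))
    # Guard against a label array with no speech at all.
    ratio = n_noise / n_speech if n_speech else float("inf")
    return n_noise, n_speech, ratio


def is_balanced(labels, tol=0.1):
    """True if noise:speech is within `tol` of the ideal 1:1 (tol is an arbitrary choice)."""
    return abs(label_ratio(labels)[2] - 1.0) <= tol
```

For example, `label_ratio([0, 0, 1, 1, 1, 0])` returns `(3, 3, 1.0)`, while `is_balanced([0, 1, 1, 1])` is `False` because noise is outnumbered 1:3.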

from vad.

shezanmirzan commented on May 18, 2024

I have one more doubt, and I would be grateful if you could help me with it. Suppose I create a long file containing 40 minutes of noise, and the speech utterance is 1 minute long. v_addnoise only gives me 1 minute of noisy speech, in which a randomly picked interval of the long noise file is added to the speech.

According to you, however, we need to add the same 1-minute file 40 times to the long noise file at different SNRs. How do we do that using v_addnoise? Is it even possible with v_addnoise, or should I try the FaNT tool?

Anyway, many thanks for clearing up my earlier doubt.

jtkim-kaist commented on May 18, 2024

The inputs to v_addnoise.m should be:
step 1
noise(1:length(speech1)), speech1
step 2
noise(length(speech1)+1 : length(speech1)+length(speech2)), speech2
and so on, with each utterance taking the next unused segment of the long noise file.
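The step-by-step indexing above can be sketched in Python (NumPy) as a loop that walks through the long noise array one segment at a time, mixing a randomly chosen utterance at a randomly chosen SNR into each segment until the noise runs out. This is an illustration under stated assumptions, not the toolkit's own script: `build_vad_stream`, `mix_at_snr`, the `gap` parameter, and the SNR list are hypothetical; the mixing uses plain mean power rather than the ITU active speech level computed by v_addnoise.m and FaNT; and each utterance is assumed to be trimmed of leading and trailing silence so its whole span can be labeled as speech.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mean-power SNR equals snr_db.

    Simplification: plain mean power, NOT the ITU-T P.56 active speech level
    used by v_addnoise.m / FaNT.
    """
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def build_vad_stream(noise, utterances, snrs_db, gap, rng):
    """Mix random utterances into consecutive segments of one long noise array.

    Returns (noisy, labels) with per-sample labels: 0 = noise only, 1 = speech.
    `gap` (hypothetical) is the number of noise-only samples left between
    utterances; gap=0 gives the back-to-back step 1, step 2, ... indexing.
    """
    noisy = np.array(noise, dtype=float)  # stays noise-only where nothing is mixed in
    labels = np.zeros(len(noise), dtype=np.int8)
    pos = 0  # 0-based start of the next unused noise segment
    while True:
        speech = utterances[rng.integers(len(utterances))]  # random utterance
        end = pos + len(speech)
        if end > len(noise):  # stop once the picked utterance no longer fits
            break
        snr = rng.choice(snrs_db)  # random SNR for this utterance
        # MATLAB equivalent of this slice: noise(pos+1 : pos+length(speech))
        noisy[pos:end] = mix_at_snr(speech, noise[pos:end], snr)
        labels[pos:end] = 1  # assumes trimmed utterances (whole span is speech)
        pos = end + gap
    return noisy, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(16000 * 60)  # stand-in for the long noise file
    utts = [rng.standard_normal(16000 * 3) for _ in range(5)]  # stand-in utterances
    noisy, labels = build_vad_stream(noise, utts, [-5, 0, 5, 10], gap=16000 * 3, rng=rng)
    print("fraction of speech samples:", labels.mean())
```

Setting `gap` roughly equal to the typical utterance length is one way to keep the noise:speech label ratio near the 1:1 target; with `gap=0` almost every sample is labeled speech, so the ratio should be checked either way.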
