
MS-SNSD's Issues

Different audio lengths cause a broadcasting error

If the passed arguments (clean and noise) have different lengths, the mix fails with the following error:
ValueError: operands could not be broadcast together with shapes

This can be solved by tiling the shorter signal until its length matches the longer one, like this:

import numpy as np

def snr_mixer(clean, noise, snr):
    clean_len = len(clean)
    noise_len = len(noise)
    # Tile the shorter signal (plus a partial copy) so both end up the same length
    if clean_len < noise_len:
        rep_time = int(np.floor(noise_len / clean_len))
        left_len = noise_len - clean_len * rep_time
        clean = np.hstack((np.tile(clean, rep_time), clean[:left_len]))
        noise = np.array(noise)
    else:
        rep_time = int(np.floor(clean_len / noise_len))
        left_len = clean_len - noise_len * rep_time
        noise = np.hstack((np.tile(noise, rep_time), noise[:left_len]))
        clean = np.array(clean)

    # Normalize both signals to -25 dBFS
    rmsclean = (clean**2).mean()**0.5
    scalarclean = 10 ** (-25 / 20) / rmsclean
    clean = clean * scalarclean
    rmsclean = (clean**2).mean()**0.5

    rmsnoise = (noise**2).mean()**0.5
    scalarnoise = 10 ** (-25 / 20) / rmsnoise
    noise = noise * scalarnoise
    rmsnoise = (noise**2).mean()**0.5

    # Scale the noise to reach the requested SNR, then mix
    noisescalar = np.sqrt(rmsclean / (10**(snr/20)) / rmsnoise)
    noisenewlevel = noise * noisescalar
    noisyspeech = clean + noisenewlevel
    return clean, noisenewlevel, noisyspeech

clean, noisenewlevel, noisyspeech = snr_mixer(audio_org, noise_org, 2)

'noisescalar' derivation in clean speech and noise mix

Hi,

Thanks for sharing this open-source dataset. I am using this code to generate synthetic noisy datasets for speech processing. In practice, I observed that the generated data has only half the nominal SNR, which I verified in Audacity. After further inspection of 'audiolib.py', I think the 'noisescalar' derivation (line 68) is incorrect.

In 'audiolib.py', the original line is:
noisescalar = np.sqrt(rmsclean / (10**(snr/20)) / rmsnoise)

I think the square root should not be applied here, since the SNR is already expressed in terms of RMS values in the derivation. The noise scaling should instead be:
noisescalar = rmsclean / (10**(snr/20)) / rmsnoise

In my tests, the synthetic noisy data has the correct SNR level after this correction. Could you please fix it in the code?
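
To illustrate the point, here is a minimal check of my own (not code from the repository). When the two signals have similar RMS, the square-root version lands at roughly half the target SNR, while the corrected formula hits the target:

import numpy as np

def realized_snr_db(clean, scaled_noise):
    # Measure the SNR (in dB) actually present in a mix
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(rms(clean) / rms(scaled_noise))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
target_snr = 10  # dB

rmsclean = np.sqrt(np.mean(clean ** 2))
rmsnoise = np.sqrt(np.mean(noise ** 2))

scalar_sqrt = np.sqrt(rmsclean / (10 ** (target_snr / 20)) / rmsnoise)  # original
scalar_fixed = rmsclean / (10 ** (target_snr / 20)) / rmsnoise          # proposed fix

print(realized_snr_db(clean, noise * scalar_sqrt))   # ~5 dB, about half the target
print(realized_snr_db(clean, noise * scalar_fixed))  # ~10 dB, matches target_snr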

Same configuration as the reference paper

Hello,
We are looking for the configuration that was used for the 'A scalable noisy speech dataset and online subjective test framework' paper. We could not find all of the values, such as SNR levels, noise types, etc.
Maybe add it to the repo so that it's really easy to reproduce the same setup.
Thank you in advance!

Real time noise suppression

Excellent article on VentureBeat today:
https://venturebeat.com/2020/04/09/microsoft-teams-ai-machine-learning-real-time-noise-suppression-typing/

Funnily enough, I've used this dataset (which I assume is the one referred to in the article) to train noise suppression as well. I didn't have a requirement for real-time/streaming, so I used a bidirectional LSTM recurrent layer. I also trained against LibriSpeech (technically LibriTTS, as I wanted 24 kHz audio).

Examples

Sourced from national news broadcasts to show performance on data the model was NOT trained on. The audio files are compressed, as GitHub doesn't allow raw waveform uploads. I've provided the source files from the broadcast with the _noisy.wav suffix and the network's predicted output with the _clean.wav suffix.

Example 1

sequence 1585584_clean
sequence.1585584_.zip

Example 2

sequence 1597540_clean
sequence.1597540_.zip

Example 3

sequence 1046182_clean
sequence.1046182_.zip

Example 4

sequence 1597377_clean
sequence.1597377_.zip

Example 5

sequence 231_clean
sequence.231_.zip

Example 6

Not the best result, but it still did a decent job suppressing a noise sample it was never trained against.
00049 unknown and_despite_that_and_despite_40_million_18_trump_haters_including_people_that_worked_for_hillary_clinton_and_some_of_the_worst_human_beings_on_earth_they_got_nothing_clean
trump_helicopter.zip

Masking-based methods

Hello, if I want to use a mask-based approach for speech enhancement (e.g. IBM, IRM, etc.), how should I use this dataset?
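
For what it's worth, here is a rough sketch (my own, not part of the repository) of turning a matching clean/noise pair produced by the synthesizer into an ideal ratio mask training target. The file paths are hypothetical placeholders, and librosa is assumed for the STFT:

import numpy as np
import librosa

# Hypothetical paths; the synthesizer writes matching clean, noise and noisy files
clean, sr = librosa.load("CleanSpeech_training/clean_example.wav", sr=16000)
noise, _ = librosa.load("Noise_training/noise_example.wav", sr=16000)

n_fft, hop = 512, 128
S_clean = np.abs(librosa.stft(clean, n_fft=n_fft, hop_length=hop))
S_noise = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop))

# One common IRM definition: sqrt of the clean-to-total power ratio per T-F bin
irm = np.sqrt(S_clean ** 2 / (S_clean ** 2 + S_noise ** 2 + 1e-8))

# A model would then be trained to predict irm from features of the noisy mixture,
# and the predicted mask applied to the noisy STFT at inference time.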

Maybe something wrong in audiolib.py

For the function def snr_mixer(clean, noise, snr) in the audiolib.py file, I think there is something wrong.
First, line 66: the np.sqrt() call may be unnecessary.
Second, since clean and noise have already been normalized to -25 dBFS, noisescalar may not need to be calculated from rmsclean and rmsnoise.

Silence Removal Idea

Maybe a silence-removal option could be added, to make it possible to develop robust voice activity detection models. pyAudioAnalysis could be integrated for this purpose.
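
As a rough illustration of the idea (a sketch of my own, not an existing option in the repo), a simple frame-energy threshold can strip silent regions before mixing; pyAudioAnalysis's segmentation utilities could replace this with something more robust:

import numpy as np

def remove_silence(audio, frame_len=512, threshold_db=-40.0):
    # Drop frames whose RMS energy falls below threshold_db relative to full scale
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    keep = 20 * np.log10(rms + 1e-10) > threshold_db
    return frames[keep].reshape(-1)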

some bugs in audiolib

I think there is a bug in the audiolib file, line 66: noisescalar = np.sqrt(rmsclean / (10 ** (snr / 20)) / rmsnoise).
snr should be divided by 10, not 20.
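
For context, the 10-versus-20 question usually comes down to which ratio the dB value is taken over: 10*log10 of a power ratio and 20*log10 of an amplitude (RMS) ratio give the same number, as a quick check of my own shows:

import numpy as np

rms_clean, rms_noise = 0.05, 0.02
snr_from_power = 10 * np.log10(rms_clean ** 2 / rms_noise ** 2)  # power ratio
snr_from_rms = 20 * np.log10(rms_clean / rms_noise)              # amplitude (RMS) ratio
assert np.isclose(snr_from_power, snr_from_rms)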

noisyspeech_synthesizer.py fails to run with numpy 1.18.5

There seems to be a breaking issue when running noisyspeech_synthesizer.py on Google Colab.
Colab has numpy 1.18.5 at the time this issue was posted; this version is installed by default when connecting to a runtime.
Following the standard procedure and prerequisites for running this script produces the following error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/function_base.py", line 117, in linspace
    num = operator.index(num)
TypeError: 'float' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./MS-SNSD/noisyspeech_synthesizer.py", line 124, in <module>
    main(cfg._sections[args.cfg_str])
  File "./MS-SNSD/noisyspeech_synthesizer.py", line 47, in main
    SNR = np.linspace(snr_lower, snr_upper, total_snrlevels)
  File "<__array_function__ internals>", line 6, in linspace
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/function_base.py", line 121, in linspace
    .format(type(num)))
TypeError: object of type <class 'float'> cannot be safely interpreted as an integer.

However, when the numpy version is downgraded from 1.18.5 to 1.16.4, the script works perfectly fine as it is supposed to.
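
The traceback points at the np.linspace call on line 47, and newer numpy releases no longer accept a float for the num argument. Assuming total_snrlevels is parsed from the config as a float, a likely fix (a sketch, not a committed patch) is an explicit cast:

# noisyspeech_synthesizer.py, line 47
SNR = np.linspace(snr_lower, snr_upper, int(total_snrlevels))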

noisyspeech_synthesizer.py always slices from the start of the noise array

In noisyspeech_synthesizer.py, an array of audio samples is read from a noise file (line 78). On line 81, a slice of the noise array is taken from index 0 to len(clean) as:

noise = noise[0:len(clean)]

Because the slice always starts at index 0, when the clean speech arrays are all roughly the same length (~16000 samples, as in the speech commands case), the number of unique noise arrays we ever see equals the number of noise files.

Even if we have one noise file with 10 hours of audio, we may only ever make use of the first 1 second of this data.

It would be better to pick a random starting index within the noise array from which to take the slice. For example:

# choose a random offset so different parts of long noise files get used
start_idx = np.random.randint(low=0, high=len(noise) - len(clean))
noise = noise[start_idx : start_idx + len(clean)]
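
One caveat with this sketch: np.random.randint requires high > low, so the case where the noise clip is not longer than the clean clip needs a guard (the tiling fix from the broadcasting issue above covers making it long enough):

if len(noise) > len(clean):
    start_idx = np.random.randint(low=0, high=len(noise) - len(clean))
else:
    start_idx = 0  # noise is not longer than clean; fall back to the start (or tile first)
noise = noise[start_idx : start_idx + len(clean)]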
