vbelz / speech-enhancement


Deep learning for audio denoising

License: MIT License

Python 100.00%

cnn deep-learning speech unet


speech-enhancement's Issues

Hi, do you have a paper in arxiv?

The lack of documentation

Hello,
I really need your help.
When I run main.py, I get the error: [Errno 2] No such file or directory: './Train/sound/noisy_voice_long.wav'
I want to know what noisy_voice_long.wav, noise_long.wav, and voice.wav are, and how I can get them.
Please answer me.

Parser takes only first character from the filename and says “File not found”

I know it's very difficult to understand my issue, but I'll try my best to explain.
So I've cloned a repository from GitHub and am working on it.
When I run the program without any arguments, it works fine:
python main.py --audio_input_prediction works fine.
But when I try to pass my own file, it shows an error:
python main.py --audio_input_prediction myaudio.wav fails with "FileNotFoundError: [Errno 2] No such file or directory: '<File Path/m'"
Notice how it only takes the first character of my argument?
In the code, the default is defined like this:
(args.py) parser.add_argument('--audio_input_prediction', default=['default_audio.wav'], type=list), and it works fine.
So naturally, I tried adding '[]' around my file name:
python main.py --audio_input_prediction [myaudio.wav] fails too, with "FileNotFoundError: [Errno 2] No such file or directory: '<File Path/['"
Here again it took only the first character of the argument I provided, i.e. '['.
And the file IS there. There are no spelling mistakes in the file name either. Any help would be very much appreciated.

My conclusion is that the issue is in args.py, specifically in this line: `parser.add_argument('--audio_input_prediction', default=['noisy_voice_long_t2.wav'], type=list)`.
I even tried changing the type to 'str', but I still got the same error.
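
For anyone hitting the same thing, here is a minimal sketch of why only the first character survives. With `type=list`, argparse calls `list('myaudio.wav')`, which splits the string into single characters, so any code that takes the first element gets 'm'. The default works because argparse does not pass a non-string default through `type`. Switching to `type=str` gives a plain string, and indexing that still yields 'm', which may explain the "same error" (assuming the calling code indexes or iterates over the value). The `nargs='+'` variant below is one common fix, not necessarily the change the maintainer would choose; `build_parser` is a hypothetical helper used only for this illustration.

```python
import argparse


def build_parser(arg_type, nargs=None):
    """Build a parser with one option, mirroring the add_argument call quoted above."""
    parser = argparse.ArgumentParser()
    kwargs = {'default': ['default_audio.wav'], 'type': arg_type}
    if nargs is not None:
        kwargs['nargs'] = nargs
    parser.add_argument('--audio_input_prediction', **kwargs)
    return parser


cli = ['--audio_input_prediction', 'myaudio.wav']

# type=list: argparse calls list('myaudio.wav'), giving one entry per character.
as_list = build_parser(list).parse_args(cli).audio_input_prediction

# type=str without nargs: a plain string, so indexing [0] still yields 'm'.
as_str = build_parser(str).parse_args(cli).audio_input_prediction

# nargs='+' with type=str: a list of whole file names.
as_names = build_parser(str, nargs='+').parse_args(cli).audio_input_prediction

print(as_list[0], as_str[0], as_names[0])
```

With `nargs='+'`, several files can also be passed separated by spaces, without brackets.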

Thanks for this project. Here are some solutions to the problems I ran into, for reference only.

1. Windows, Anaconda3, versions of some packages:
librosa == 0.6.0
keras == 2.3.1
numba == 0.48.0
2. Some problems and solutions:
original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'

Solution: pip install h5py==2.10

audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError

Solution: conda install ffmpeg

Python 2 or 3 ?

Hi, should we use Python 2.x or Python 3.x to run it?
I ask because I am running into issues installing the requirements, and it may have to do with the Python version.

Thanks for your work!

General questions

Hi @vbelz,

First of all, thank you for your work. I have tried to denoise some audio and it worked well, but I have a few questions.

Quoted from README:

Specify how many frames you want to create as nb_samples in args.py (or pass it as argument from the terminal) I let nb_samples=50 by default for the demo but for production I would recommend having 40 000 or more.

1. What exactly is nb_samples?

2. Were the weights you provided trained with nb_samples=50?

3. Should I resample audio to 8 kHz for denoising, or is that done inside the network? Also, should I do it for training?

4. I want to tweak it to be a better denoiser for background noise rather than specific sounds. What are your thoughts on this? I have a dataset with clean samples and background noise samples. Will it work if I train it on that? Which hyperparameters should I use?

Thank you so much and sorry for bothering you!

Update requirements.txt with librosa version 0.6.x

Yesterday I used the model for prediction with librosa 0.8.0 in my conda environment.

In version 0.8.0, librosa.output.write_wav (line 55 in prediction_denoise.py, for example) is no longer available: it was deprecated in 0.7 and removed in 0.8.0 (cf. stackoverflow#63997969).

So librosa==0.6 has to be specified in requirements.txt.
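
An alternative to pinning the version is to stop calling `librosa.output.write_wav` altogether. The replacement usually suggested is `soundfile.write(path, data, samplerate)`. The sketch below instead uses only the standard library's `wave` module so that it runs without extra packages. `write_wav_16bit` is a hypothetical helper name, not something from the repository, and it assumes mono float samples in the range [-1, 1].

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    pcm = [int(round(s * 32767)) for s in clipped]
    frames = struct.pack('<%dh' % len(pcm), *pcm)  # little-endian int16
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


# Example: one second of a 440 Hz tone at 8000 Hz, written to a temporary folder.
out_path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
write_wav_16bit(out_path, tone, 8000)

# Read the header back to confirm what was written.
with wave.open(out_path, 'rb') as check:
    written_rate = check.getframerate()
    written_frames = check.getnframes()
```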

global scaling

Hey @vbelz,
First, thank you for sharing this project; it helps me a lot!
There is one thing I didn't understand: the global scaling of matrix_spec (and the inverse global scaling).
How did you choose the numbers for scaling, and why is the scaling different for X_in and X_ou?

How to train for a different audio sampling rate?

Hi vbelz,

I am wondering what changes need to be made in order to train for a different audio sampling rate, e.g. at 44100 Hz?

I assume both the model and some parameters in args.py need to be modified. Can you please share some insights on this?

Thanks,
Tony

[BUG]: Validation against test data

Training error

At line 60, shown below, you're validating against test data while training. Isn't it supposed to be train data?

history = generator_nn.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, shuffle=True, callbacks=[checkpoint], verbose=1, validation_data=(X_test, y_test))

Error (tensorflow)

I am getting the following error from the tensorflow module:

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Please help me get rid of this error.

Thank you.

Could you share your best model?

Excuse me, sir. Since I don't have the GPU resources, I'd like to ask for your best h5 model. Thanks.

Is this project unsupervised learning?

Hi @vbelz, I have a question.
Is this project unsupervised learning? There are 10 kinds of noise collected in this project. I originally thought I would see 10 models in the weight folder, but I only saw two models, model_best and model_unet.

Thanks.

Inference pipeline

Hello,

The model does a great job of removing the noise. However, I notice that the speech quality is degraded.

For testing, I changed X_denoise = m_amp_db_audio - inv_sca_X_pred[:,:,:,0] to X_denoise = m_amp_db_audio.

I was expecting to get the original audio file back. FYI, my input is a mono 16000 Hz WAV file. Can you please help me? I am guessing I need to change some parameters other than the sample rate.

Question on Error of Invalid Instruction (core dumped)

Hey @vbelz, I have a question:
While running python main.py --mode='data_creation' I get the error Invalid Instruction (core dumped).
I guess it is because of the tensorflow version (1.15.2), as my CPU does not support AVX.
It does not give an error if I use tensorflow version 1.5.

If I want to keep the same tensorflow version, 1.15.2, what could be an alternative?

MemoryError

Hello, I encountered an error: MemoryError: Unable to allocate array with shape (20000, 128, 128) and data, where I had already changed 40000 to 20000, but the issue persists. I would like to ask whether this is due to excessive training set data or to nb_samples being too large.
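
As a rough guide to how much memory that shape needs, here is a small arithmetic sketch. The element size is an assumption: the report does not show the dtype, so 8 bytes (float64, NumPy's default for `np.zeros`) is used, with 16 bytes (complex128) as a second case. `array_bytes` is a hypothetical helper name.

```python
import math


def array_bytes(shape, bytes_per_element):
    """Return the memory needed for a dense array of the given shape."""
    return math.prod(shape) * bytes_per_element


GIB = 2 ** 30

# Assumed dtypes: float64 = 8 bytes, complex128 = 16 bytes per element.
float64_bytes = array_bytes((20000, 128, 128), 8)
complex128_bytes = array_bytes((20000, 128, 128), 16)

print(f"float64:    {float64_bytes / GIB:.2f} GiB")
print(f"complex128: {complex128_bytes / GIB:.2f} GiB")

# With the original 40000 samples, every figure doubles.
print(f"float64, 40000 samples: {array_bytes((40000, 128, 128), 8) / GIB:.2f} GiB")
```

Under these assumptions a single array of this shape takes about 2.44 GiB, and several such arrays are likely to be alive at once (noisy input, target, and so on), so halving nb_samples may still not fit on a machine with limited RAM.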

Why extracted windows are slightly above 1 second?

First of all, thank you so much for this repository. I am doing some research in the speech domain, and this has been very helpful.

But I have a couple of questions about it.

Why are the extracted windows slightly above 1 second and not exactly 1 second?
Can this 1 second be increased to a longer duration? How would this affect training?

Thanks in advance.
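
One plausible reading is that the window length is chosen so that the spectrogram comes out exactly square. The arithmetic below is only a sketch: the parameter values (8000 Hz sample rate, 8064-sample windows, n_fft of 255, hop of 63) are assumptions that are not stated in this thread. They were picked because they reproduce the 128 x 128 shape seen in the MemoryError issue, so please check them against args.py. The frame count formula is the usual one for a centered STFT that pads n_fft // 2 samples on each side.

```python
def centered_stft_frames(n_samples, n_fft, hop_length):
    """Number of time frames for an STFT that pads n_fft // 2 samples on each side."""
    padded = n_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // hop_length


# Assumed values, not confirmed by the thread.
sample_rate = 8000
frame_length = 8064
n_fft = 255
hop_length_fft = 63

duration_seconds = frame_length / sample_rate
freq_bins = n_fft // 2 + 1
time_frames = centered_stft_frames(frame_length, n_fft, hop_length_fft)

# For comparison: a window of exactly one second under the same settings.
one_second_frames = centered_stft_frames(sample_rate, n_fft, hop_length_fft)

print(duration_seconds, freq_bins, time_frames, one_second_frames)
```

Under these assumptions, 8064 samples give 128 frequency bins by 128 time frames, while exactly 8000 samples give only 127 time frames. A longer window would produce a wider spectrogram, so the network input shape would have to change with it.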

Update requirements.txt and there is no json file outputed

Hello there!
So I had quite a journey installing all the dependencies and Python; it would be great if you updated the requirements.txt file! I am guessing that my problem is caused by the wrong version of tensorflow.
My problem is that after training, no .json file is output in any folder.
It would be great if you could help me 😃

Question about the inputs and the outputs of the model

Hey,
First of all, your code is great! It worked for me and it is very simple and clear 👍
One question: in your model you used Xin to be spectrogram(noisy_voice) and Xout to be spectrogram(noisy_voice) - spectrogram(voice). I didn't understand why you did the subtraction, so I tried setting Xout to spectrogram(voice), but then the loss underfit. Do you know why that happens?

Thanks again!
Olga :)
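
To make the setup in this question concrete, here is a toy sketch of the two steps as they are described in this thread: the training target is the noisy spectrogram minus the voice spectrogram, and the prediction step quoted in the Inference pipeline issue subtracts the model output from the noisy spectrogram. The function names and the tiny 2 x 2 "spectrograms" are hypothetical and only illustrate the arithmetic. The sketch does not explain why predicting the voice directly underfit.

```python
def subtract(a, b):
    """Element-wise a - b for two equally shaped 2D lists."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def make_training_pair(noisy_spec, voice_spec):
    """Input is the noisy spectrogram; target is the noise part (noisy - voice)."""
    return noisy_spec, subtract(noisy_spec, voice_spec)


def denoise(noisy_spec, predicted_noise):
    """Recover the voice estimate by removing the predicted noise from the input."""
    return subtract(noisy_spec, predicted_noise)


# Toy values standing in for dB magnitudes.
noisy = [[-10, -20], [-30, -5]]
voice = [[-15, -40], [-30, -25]]

x_in, x_out = make_training_pair(noisy, voice)

# If the model predicted the target perfectly, denoising returns the clean voice.
recovered = denoise(noisy, x_out)
print(x_out, recovered)
```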

a little confusion

Hey!
I am a bit confused about the computation. An input audio file of 112501 KB produces an output of 112486 KB. Could you explain the reason, and what operations are applied to the audio during prediction?

Thank you