breizhn / dtln

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.

License: MIT License

Python 100.00%

noise-reduction deep-learning audio real-time-audio audio-processing noise-suppression tensorflow dns-challenge dtln-model speech-denoising

dtln's Issues

How to run on Raspberry Pi?

Hi Nils, I'm trying to run your model on a Raspberry Pi, but it seems Miniconda doesn't support Python 3.7 there yet. Should I downgrade to Python 3.6 or install the packages manually? Thanks.

Problem running fixed-size batch processing with realtime_processing_tflite.py

The script gives different outputs for different batch sizes.
Input with batch size 1 gives a decent output,
but with batch size 10 the audio output is distorted.

Hi @breizhn, thanks for your work.
I found the B1 model (4 layers, STFT) in your paper. As I understand it, the B1 model has only one separation core, using an STFT analysis and synthesis basis and four LSTM layers (the second separation core is removed), and the training target is the negative SNR. Is that correct?

Retraining results are not as good as the provided pretrained models

Thanks for your wonderful work, @breizhn.
I used this project to retrain on the recently updated DNS-Challenge dataset. The denoised results of the retrained model are a little worse than those of the models provided in this project (both the 40h and the 500h model). I only set norm_stft=True.
Any advice on improving the performance of the retrained model?

Looking forward to your reply.

How to calculate those metrics in the DTLN paper?

Could you please share some code for calculating the metrics reported in the DTLN paper, such as PESQ, SI-SDR, and STOI?
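
The thread does not include the author's evaluation code. As a starting point, SI-SDR can be computed directly from its definition. The sketch below assumes time-aligned, equal-length 1-D signals; the function name `si_sdr` and the `eps` guard are illustrative choices, not taken from the DTLN repository. PESQ and STOI are more involved and are usually taken from third-party packages (the `pesq` and `pystoi` packages on PyPI are two options).

```python
import numpy as np


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB.

    reference: clean speech, 1-D array
    estimate:  enhanced (or noisy) signal, same length as reference
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Remove the mean so that a DC offset does not count as distortion.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to find the optimal scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SNR.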

Real-time-processing Tf_lite inference program ERROR

Hi, I loaded a noisy single-channel .wav file to run inference with real_time_processing_tf_lite.py
and got the following dimension mismatch error.

Traceback (most recent call last):
File "real_time_processing_tf_lite.py", line 70, in
interpreter_1.set_tensor(input_details_1[0]['index'], states_1)
File "/Users/cyberon/anaconda3/envs/dtln_py/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 399, in set_tensor
self._interpreter.SetTensor(tensor_index, value)
File "/Users/cyberon/anaconda3/envs/dtln_py/lib/python3.7/site-packages/tflite_runtime/interpreter_wrapper.py", line 148, in SetTensor
return _interpreter_wrapper.InterpreterWrapper_SetTensor(self, i, value)
ValueError: Cannot set tensor: Dimension mismatch. Got 4 but expected 3 for input 0.

Any inputs on this will be helpful!
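
The message says a rank-4 array was written to an input that expects rank 3, which would be consistent with the state tensor and the magnitude frame being assigned to each other's input slots. One way to make a script independent of input ordering, sketched here as an assumption rather than a confirmed fix, is to pair each array with the input whose rank matches. `match_inputs_by_rank` is a hypothetical helper; `input_details` stands for the list of dicts returned by the interpreter's `get_input_details()`.

```python
def match_inputs_by_rank(input_details, tensors):
    """Return {input_index: tensor}, pairing tensors and inputs by rank.

    input_details: list of dicts with at least 'index' and 'shape' keys
    tensors:       objects exposing a .shape attribute (e.g. NumPy arrays)
    """
    index_by_rank = {}
    for detail in input_details:
        rank = len(detail["shape"])
        if rank in index_by_rank:
            # Rank alone is ambiguous here; the caller must match by name.
            raise ValueError("two inputs share rank %d" % rank)
        index_by_rank[rank] = detail["index"]

    assignment = {}
    for tensor in tensors:
        rank = len(tensor.shape)
        if rank not in index_by_rank:
            raise ValueError("no input expects a rank-%d tensor" % rank)
        assignment[index_by_rank[rank]] = tensor
    return assignment
```

With the names from the traceback, the result could be applied as `for index, tensor in match_inputs_by_rank(input_details_1, [in_mag, states_1]).items(): interpreter_1.set_tensor(index, tensor)`, where `in_mag` is an assumed name for the magnitude input.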

Any tricks to improve DTLN performance on Raspberry?

I tested real_time_audio_device.py. When there are two human voices in the environment, I want to extract one of them from the mixed speech, but the performance is not good. Are there any tricks to improve this?

How is y_pred connected to the output of the generator implicitly in Keras?

Hi, thanks for sharing the code!

I am new to Keras. I am trying to train your model but am confused about how y_pred is generated.

The model input is yielded by the create_generator function:

        yield in_dat.astype('float32'), tar_dat.astype('float32')

then tf_data_set is created from the generator

self.tf_data_set = tf.data.Dataset.from_generator(
    self.create_generator,
    (tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([self.len_of_samples]),
                   tf.TensorShape([self.len_of_samples])),
    args=None
)

the data batches are generated by .batch op

generator_input = audio_generator(path_to_train_mix, 
                                  path_to_train_speech, 
                                  len_in_samples, 
                                  self.fs, train_flag=True)
dataset = generator_input.tf_data_set
dataset = dataset.batch(self.batchsize, drop_remainder=True).repeat()
# calculate number of training steps in one epoch
steps_train = generator_input.total_samples//self.batchsize
# create data generator for validation data
generator_val = audio_generator(path_to_val_mix,
                                path_to_val_speech, 
                                len_in_samples, self.fs)
dataset_val = generator_val.tf_data_set
dataset_val = dataset_val.batch(self.batchsize, drop_remainder=True).repeat()

Then model.fit is called to fit the model:

self.model.fit(
    x=dataset, 
    batch_size=None,
    steps_per_epoch=steps_train, 
    epochs=self.max_epochs,
    verbose=1,
    validation_data=dataset_val,
    validation_steps=steps_val, 
    callbacks=[checkpointer, reduce_lr, csv_logger, early_stopping],
    max_queue_size=50,
    workers=4,
    use_multiprocessing=True)

In terms of calculation of the loss

    loss = tf.squeeze(self.cost_function(y_pred,y_true))
    # calculate mean over batches
    loss = tf.reduce_mean(loss)

y_pred and y_true are passed to the cost_function.

In other code that I have read, y_pred and y_true are usually computed explicitly, for example:

y_pred = Model(x_train_data) # output of model
y_true = x_label

I understand that here y_pred is the model output when in_dat is fed to the model, and y_true is tar_dat. However, in your code I cannot find where this happens.

To summarize, my questions are:

How is y_pred connected to the output of the generator (in_dat)?
If I want to modify the output of the generator and the loss function (e.g. to add the original noise data and filenames), how do I distinguish which data are inputs to the model (such as in_dat) and which are not (such as tar_dat)?
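
For context, Keras treats each element of a dataset passed to `fit` as an `(inputs, targets)` tuple: the first item is fed to the model and the second is handed to the loss as `y_true`. The following is a deliberately simplified, framework-free sketch of that loop (no gradient step, scalar "tensors"); `fit_one_epoch` and its arguments are hypothetical names used only for illustration.

```python
def fit_one_epoch(dataset, model, loss_fn):
    """Mimic how fit() consumes (inputs, targets) tuples from a dataset."""
    losses = []
    for in_dat, tar_dat in dataset:
        # First tuple element: model input. The forward pass creates y_pred.
        y_pred = model(in_dat)
        # Second tuple element: target. Keras calls loss(y_true, y_pred).
        losses.append(loss_fn(tar_dat, y_pred))
    # The real fit() would also backpropagate and update weights here.
    return sum(losses) / len(losses)


def toy_generator():
    """Stand-in for create_generator: yields (in_dat, tar_dat) pairs."""
    for value in (1.0, 2.0, 3.0):
        yield value, 2.0 * value
```

Under this convention, anything the loss needs but the network should not see has to travel in the second tuple element, and the loss function then takes it apart; this is one common approach, not something confirmed in the thread.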

PortAudioError: Error opening Stream: Invalid sample rate [PaErrorCode -9997]

Hi, I'm using a Raspberry Pi 4 and have installed the dependencies, but I'm stuck on this error while trying to get it working. Can you please help? Thank you.

pi@raspberrypi:~/Downloads/DTLN-master $ python3 real_time_dtln_audio.py --list-devices
0 bcm2835 HDMI 1: - (hw:0,0), ALSA (0 in, 8 out)
1 bcm2835 Headphones: - (hw:1,0), ALSA (0 in, 8 out)
2 USB PnP Sound Device: Audio (hw:2,0), ALSA (1 in, 2 out)
3 sysdefault, ALSA (0 in, 128 out)
4 lavrate, ALSA (0 in, 128 out)
5 samplerate, ALSA (0 in, 128 out)
6 speexrate, ALSA (0 in, 128 out)
7 pulse, ALSA (32 in, 32 out)
8 upmix, ALSA (0 in, 8 out)
9 vdownmix, ALSA (0 in, 6 out)
10 dmix, ALSA (0 in, 2 out)
11 default, ALSA (32 in, 32 out)

pi@raspberrypi:~/Downloads/DTLN-master $ python3 real_time_dtln_audio.py -i 2 -o 2
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2719
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
PortAudioError: Error opening Stream: Invalid sample rate [PaErrorCode -9997]

My USB headset with microphone is at 2 USB PnP Sound Device: Audio (hw:2,0), ALSA (1 in, 2 out)

Thank you very much!

Model B1 from paper

Hi, is the B1 model from the paper (4 layers of LSTM with STFT) also implemented in this code? Maybe I'm missing something..

Onnx in Javascript

@breizhn I am experimenting with loading the ONNX model in JavaScript as shown below and I am getting an error. It looks like something on the input to lstm_4 is not correct. I checked that all the ops (Dropout, Activation, etc.) are supported by ONNX.js here: https://github.com/microsoft/onnxjs/blob/master/docs/operators.md

Any ideas why?

    async function loadDTLN_modelOnnx() {
        const model1 = new onnx.InferenceSession();
        const model2 = new onnx.InferenceSession();
        var output1 = await model1.loadModel("./model_1.onnx");
        var output2 = await model2.loadModel("./model_2.onnx");
    }

It throws an error:

graph.ts:313 Uncaught (in promise) Error: unrecognized input '' for node: lstm_4
at t.buildGraph (graph.ts:313)
at new t (graph.ts:139)
at Object.from (graph.ts:77)
at t.load (model.ts:25)
at session.ts:85
at t.event (instrument.ts:294)
at e.initialize (session.ts:81)
at e. (session.ts:63)
at inference-session-impl.ts:16
at Object.next (inference-session-impl.ts:16)

Training the model with no shift: loss doesn't improve

@breizhn Hi Nils,
I trained on your prepared sample set, changing only blk_shift to 512, i.e. no shift. I see no significant improvement after 40+ epochs. Is this expected? What loss and val_loss did you get in your training?

Epoch 00055: val_loss improved from -13.22399 to -13.22746, saving model to ./models_DTLN_model_512_norm/DTLN_model_512_norm.h5
3000/3000 [==============================] - 8885s 3s/step - loss: -13.2950 - val_loss: -13.2275
Epoch 56/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2961
Epoch 00056: val_loss did not improve from -13.22746
3000/3000 [==============================] - 8886s 3s/step - loss: -13.2961 - val_loss: -13.2204
Epoch 57/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2977
Epoch 00057: val_loss did not improve from -13.22746
3000/3000 [==============================] - 8894s 3s/step - loss: -13.2977 - val_loss: -13.2184
Epoch 58/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2971
Epoch 00058: val_loss improved from -13.22746 to -13.23236, saving model to ./models_DTLN_model_512_norm/DTLN_model_512_norm.h5
3000/3000 [==============================] - 8888s 3s/step - loss: -13.2971 - val_loss: -13.2324
Epoch 59/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2989
Epoch 00059: val_loss did not improve from -13.23236
3000/3000 [==============================] - 8890s 3s/step - loss: -13.2989 - val_loss: -13.2236
Epoch 60/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2993
Epoch 00060: val_loss did not improve from -13.23236
3000/3000 [==============================] - 8889s 3s/step - loss: -13.2993 - val_loss: -13.2229
Epoch 61/200
2698/3000 [=========================>....] - ETA: 13:08 - loss: -13.3098

question about LSTM states

I'm trying to run your model on an ARM platform with TVM or another inference engine, but only TF-lite seems to support stateful LSTMs. In my experiments, the model without states performs badly.
Is there any way to get good performance with a stateless LSTM?
That is:
model_1 = Model(inputs=mag, outputs=mask_1)
model_2 = Model(inputs=estimated_frame_1, outputs=decoded_frame)
I need your help.
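
One general pattern for engines without stateful layers is to make the recurrent state an explicit input and output of every call and to carry it between frames in the calling code, which is also what the `set_tensor(..., states_1)` line in the TF-lite traceback elsewhere on this page suggests. The toy example below only illustrates that pattern: `toy_cell` is a hypothetical one-number recurrence standing in for an LSTM, not the DTLN model.

```python
def toy_cell(frame, state, decay=0.9):
    """A minimal recurrent step: returns (output, new_state)."""
    new_state = decay * state + (1.0 - decay) * frame
    return new_state, new_state


def run_with_external_state(frames):
    """Carry the state across calls, as a streaming caller would."""
    state = 0.0
    outputs = []
    for frame in frames:
        output, state = toy_cell(frame, state)
        outputs.append(output)
    return outputs


def run_without_state(frames):
    """Reset the state on every call: each frame is seen without context."""
    return [toy_cell(frame, 0.0)[0] for frame in frames]
```

On a constant input, the first variant converges toward the input value while the second returns the same small value for every frame, which mirrors the degradation described above when states are dropped.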

Voice Flickering during RT Use

Hi,

Thank you for this great repo, the model and pre-trained weights.

I tried to use the model you provided (dtln_saved_model) for real-time denoising on my laptop.
It has successfully removed the background noise, but the resultant speech signal has a lot of flickering.

Please check the audio files at the links below.

Using Windows OS, CPU, TF 2.2. pyaudio for real time processing.

Can you suggest what is causing this and how to avoid it?

noisy - clyp.it/opnfnagd?token=1f91ac255bf94fce0dac66f2fe2cc36c
cleaned - clyp.it/55rwflcn?token=cec87a02aa62634e7ff5da4d9e43d5c2
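
One possible cause, which is an assumption and not confirmed in this thread, is that processing a block sometimes takes longer than the time until the next block arrives, so the audio callback drops samples. With a block shift of 128 samples at 16 kHz (assumed values) that budget is 8 ms. The sketch below counts how often a processing function exceeds a given budget; `count_overruns` and `process_block` are hypothetical names.

```python
import time


def count_overruns(process_block, blocks, budget_s):
    """Count blocks whose processing time exceeds budget_s seconds."""
    overruns = 0
    for block in blocks:
        start = time.perf_counter()
        process_block(block)
        elapsed = time.perf_counter() - start
        if elapsed > budget_s:
            overruns += 1
    return overruns
```

Calling it with the real per-block inference function and `budget_s=128 / 16000` gives a quick indication of whether the machine keeps up in real time.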

Error when running noisyspeech_synthesizer_multiprocessing.py

I cloned your DNS-Challenge fork and ran python noisyspeech_synthesizer_multiprocessing.py, but got the error below:

WARNING: Audio type not supported
Generating file #59032
WARNING: Audio type not supported
Generating file #59220
WARNING: Audio type not supported
Generating file #59408
WARNING: Audio type not supported
Generating file #59596
WARNING: Audio type not supported
Generating file #59784
WARNING: Audio type not supported
Generating file #59972
WARNING: Audio type not supported
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/root/.conda/envs/train_env/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/root/.conda/envs/train_env/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "noisyspeech_synthesizer_multiprocessing.py", line 156, in main_gen
gen_audio(True, params, filenum)
File "noisyspeech_synthesizer_multiprocessing.py", line 124, in gen_audio
build_audio(is_clean, params, filenum, audio_samples_length)
File "noisyspeech_synthesizer_multiprocessing.py", line 75, in build_audio
input_audio, fs_input = audioread(source_files[idx])
File "/data1/dtln/fork-dns-challenge/audiolib.py", line 46, in audioread
if len(audio.shape) == 1: # mono
AttributeError: 'NoneType' object has no attribute 'shape'
"""

Google research

https://github.com/google-research/google-research/tree/master/kws_streaming

In their models they have embedded the MFCC computation, so you just feed the 16 kHz audio stream in chunks and the streaming KWS works.

The FFT here is done in Python ops, and I wondered whether, for the TF-lite models, a look at what they did above could improve performance.
This is all beyond me, but after some testing the embedded MFCC seems approximately 2x faster than an external routine with Librosa.

I don't know if the above is of any use to you, @breizhn.

question about B3 model in paper

Hi @breizhn, thanks for your work. The parameters of the B3 model are as follows. Is this correct?

Model: "functional_1"

Layer (type)                     Output Shape          Param #   Connected to
input_1 (InputLayer)             [(None, None)]        0
lambda (Lambda)                  [(None, None, 257),   0         input_1[0][0]
tf_op_layer_AddV2 (TensorFlowOp  [(None, None, 257)]   0         lambda[0][0]
tf_op_layer_Log (TensorFlowOpLa  [(None, None, 257)]   0         tf_op_layer_AddV2[0][0]
instant_layer_normalization (In  (None, None, 257)     514       tf_op_layer_Log[0][0]
lstm (LSTM)                      (None, None, 156)     258336    instant_layer_normalization[0][0]
dropout (Dropout)                (None, None, 156)     0         lstm[0][0]
lstm_1 (LSTM)                    (None, None, 156)     195312    dropout[0][0]
dense (Dense)                    (None, None, 257)     40349     lstm_1[0][0]
activation (Activation)          (None, None, 257)     0         dense[0][0]
multiply (Multiply)              (None, None, 257)     0         lambda[0][0]
                                                                 activation[0][0]
lstm_2 (LSTM)                    (None, None, 156)     258336    multiply[0][0]
dropout_1 (Dropout)              (None, None, 156)     0         lstm_2[0][0]
lstm_3 (LSTM)                    (None, None, 156)     195312    dropout_1[0][0]
dense_1 (Dense)                  (None, None, 257)     40349     lstm_3[0][0]
activation_1 (Activation)        (None, None, 257)     0         dense_1[0][0]
multiply_1 (Multiply)            (None, None, 257)     0         multiply[0][0]
                                                                 activation_1[0][0]
lambda_1 (Lambda)                (None, None, 512)     0         multiply_1[0][0]
                                                                 lambda[0][1]
lambda_2 (Lambda)                (None, None)          0         lambda_1[0][0]

Total params: 988,508
Trainable params: 988,508
Non-trainable params: 0
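
The per-layer counts in this summary can be cross-checked with the standard parameter formulas for LSTM and dense layers. The short calculation below uses only the sizes visible in the summary (257 frequency bins, 156 units); the helper names are illustrative, and the check says nothing about whether the configuration matches the paper's B3 model.

```python
FREQ_BINS = 257  # width of the magnitude spectrum in the summary
UNITS = 156      # LSTM units per layer in the summary


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units


def separation_core_params():
    """Two stacked LSTMs followed by the dense mask layer."""
    return (lstm_params(FREQ_BINS, UNITS)
            + lstm_params(UNITS, UNITS)
            + dense_params(UNITS, FREQ_BINS))


# Layer normalization contributes a gain and a bias per frequency bin.
layer_norm_params = 2 * FREQ_BINS
total_params = layer_norm_params + 2 * separation_core_params()
```

The total evaluates to 988,508, so the summary is at least internally consistent.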

@shilsircar the latency needs to be less than 8ms for a 32ms block as @breizhn says in the Execution Times section of this repo's ReadMe.
Also, since you've got a working model, I have a couple of questions:

Which model did you use for conversion? .h5 or savedmodel?

.h5 norm model

Does tfjs have stateful LSTMs? Or did you handle states outside the model?

I didn't have to handle it outside. TFJS handles stateful LSTMs.

Trouble is TFJS team told me they don't have any immediate plans to implement conv1dwithbias and causal padding. I feel it's a bug since the last layer bias is false.
Issue open: tensorflow/tfjs#3578

I just found that in tfjs, conv1d doesn't have an option to set usebias to false. It only supports usebias - true.

Yup you can modify and still get reasonable results.

Originally posted by @hchintada in #4 (comment)

Unable to test the Pre-trained model with GPU

Thank you so much for providing this open-source solution; it is very helpful. I came across a problem that I thought would be best answered by the creators of DTLN. I tested the pre-trained models on a simple Linux machine on CPU and they work fine. I then moved to a Google Cloud Platform (GCP) instance with an Nvidia Tesla K80 GPU, which I also use for other processes such as speaker diarization, but there I am unable to test the pre-trained models. The run does not fail with an error; some GPU libraries just report errors, which I also saw in my earlier speaker diarization tests that use the GPU. Is there any chance that DTLN does not support testing pre-trained models on a GPU, or does not work with a GPU at all?

Your response will be highly appreciated. I'm attaching an output snippet below for reference. As you can see, the GPU is visible and the code executes successfully, but it does not produce any noise-suppressed audio files.

Thank you

Output:

2021-09-30 06:42:41.828956: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:41.829007: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-09-30 06:42:45.115012: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-30 06:42:47.497740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:42:47.498557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2021-09-30 06:42:47.498729: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.498885: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.499052: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.501305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-30 06:42:47.502306: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-30 06:42:47.502448: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.502617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.502790: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.502831: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-30 06:42:47.503317: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-30 06:42:47.504158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-30 06:42:47.504203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
Processing finished.
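
The log shows TensorFlow skipping GPU registration because several CUDA 11 libraries cannot be loaded, so this run fell back to the CPU. A quick way to see which of those libraries the dynamic loader can find, independently of TensorFlow, is to try loading them with `ctypes`. This is a diagnostic sketch only: the library names are copied from the log above, and `check_libraries` is a hypothetical helper.

```python
import ctypes

# Names reported as missing in the log above.
LIBRARIES_FROM_LOG = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcusolver.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def is_loadable(name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def check_libraries(names):
    """Map each library name to whether it could be loaded."""
    return {name: is_loadable(name) for name in names}


if __name__ == "__main__":
    for name, found in check_libraries(LIBRARIES_FROM_LOG).items():
        print(name, "found" if found else "MISSING")
```

Any library reported as missing here is not on the loader's search path (for example `LD_LIBRARY_PATH`), regardless of which driver is installed.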

run_training.py, shape problem

I'm trying to train a new model using run_training.py.
When I read my .wav files using scipy.io.wavfile.read I get a 2d array, something like the following.

[[ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 2 2], [ 2 2], [ 2 2], [ 2 2], [ 3 3], [ 5 5], [ 6 6], [ 6 6], [ 5 5], [ 3 3], [ 3 3], [ 2 3], [ 3 3], [ 2 2], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 2 2], [ 2 2], [ 0 0], [-2 -3], [-5 -5], [-5 -5], [-2 -2], [ 0 0], [ 2 2], [ 3 3], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 1 1], [ 2 2], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 2 2], [ 1 1], [ 0 0], [-1 -1], [-2 -2], [-2 -2], [-1 -1], [ 0 0], [ 1 1], [ 2 2], [ 2 2]]

When I run the script, I get the following error:

ValueError: generator yielded an element of shape (720000, 2) where an element of shape (720000,) was expected.

Full error message:

`
Traceback (most recent call last):
File "/home/qendrim/solaborate/repos/ML/NoiseCancellation/DTLN/run_training.py", line 51, in
modelTrainer.train_model(runName, path_to_train_mix, path_to_train_speech, path_to_val_mix, path_to_val_speech)
File "/home/qendrim/solaborate/repos/ML/NoiseCancellation/DTLN/DTLN_model.py", line 383, in train_model
use_multiprocessing=True)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
use_multiprocessing=use_multiprocessing)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
total_epochs=epochs)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
batch_outs = execution_function(iterator)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
distributed_function(input_fn))
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in call
result = self._call(*args, **kwds)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
return self._stateless_fn(*args, **kwds)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2363, in call
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
self.captured_inputs)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
ctx=ctx)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: generator yielded an element of shape (720000, 2) where an element of shape (720000,) was expected.
Traceback (most recent call last):

File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in call
ret = func(*args)

File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 825, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))

ValueError: generator yielded an element of shape (720000, 2) where an element of shape (720000,) was expected.

[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_10659]

Function call stack:
distributed_function
`

python: 3.7
tensorflow: 2.1
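
The shape (720000, 2) indicates two-channel files, while the generator is declared to yield one-dimensional arrays. A possible workaround, sketched here as an assumption about the data rather than a change to the repository, is to downmix to mono before training. Averaging the channels is one option; keeping a single channel is another. `to_mono` is a hypothetical helper.

```python
import numpy as np


def to_mono(audio):
    """Return a 1-D float32 signal from a mono or multi-channel array.

    audio: array of shape (samples,) or (samples, channels)
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    # Average over the channel axis; no level normalization is applied.
    return audio.mean(axis=1)
```

Converting the files once on disk avoids repeating the downmix in every epoch.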

Transfer Learning with DTLN model weights to remove the block shift

To do away with block processing at inference, I'm trying to use your pre-trained weights, and retrain the network, after replacing the stftLayer with fftLayer.

Using mag normalization as is.

I can set stateful=False for both separation kernels while training?

Train for around 20 epochs.

Does this idea make sense @breizhn? Or should I start from zero to effect this change?

Also, can you please provide guidance on how to use data augmentation/pre-processing to train the network only with 40 hours data?

The data augment method used for

Would this work with audio streaming?

Hi, first of all great work!

I'm wondering whether this method can be used on streamed audio, given the block shift that it uses.

Would it be possible? If so, should I make some modifications, for example not using the block shift?

Thanks!
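
Block shifting is itself a streaming scheme: each newly arrived chunk is pushed into a fixed-length input buffer, the whole buffer is processed, and the result is overlap-added into an output buffer from which one chunk is emitted. The sketch below illustrates only the buffering, with assumed sizes (512-sample blocks, 128-sample shift) and a stand-in `process_block` callable in place of the network; it is not the repository's implementation.

```python
import numpy as np

BLOCK_LEN = 512    # assumed block length in samples
BLOCK_SHIFT = 128  # assumed number of new samples per step


def stream_process(signal, process_block):
    """Process a signal chunk by chunk with shifted input/output buffers."""
    in_buffer = np.zeros(BLOCK_LEN, dtype=np.float32)
    out_buffer = np.zeros(BLOCK_LEN, dtype=np.float32)
    n_chunks = len(signal) // BLOCK_SHIFT
    output = np.zeros(n_chunks * BLOCK_SHIFT, dtype=np.float32)
    for k in range(n_chunks):
        chunk = signal[k * BLOCK_SHIFT:(k + 1) * BLOCK_SHIFT]
        # Shift the input buffer left and append the newest chunk.
        in_buffer[:-BLOCK_SHIFT] = in_buffer[BLOCK_SHIFT:]
        in_buffer[-BLOCK_SHIFT:] = chunk
        out_block = process_block(in_buffer)
        # Shift the output buffer left, clear the tail, then overlap-add.
        out_buffer[:-BLOCK_SHIFT] = out_buffer[BLOCK_SHIFT:]
        out_buffer[-BLOCK_SHIFT:] = 0.0
        out_buffer += out_block
        # The oldest chunk in the output buffer is now complete.
        output[k * BLOCK_SHIFT:(k + 1) * BLOCK_SHIFT] = out_buffer[:BLOCK_SHIFT]
    return output
```

With a pass-through `process_block` that scales each block by `BLOCK_SHIFT / BLOCK_LEN`, the four overlapping contributions sum back to the input, delayed by `BLOCK_LEN - BLOCK_SHIFT` samples.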

Modify to 32K sample rate

I tried to modify the DTLN model for a 32 kHz sample rate, but I always get an error.
''----------------------------------------------------------------------------------------------------------------
File "/Library/Python/3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1110, in fit
raise ValueError('Expect x to be a non-empty array or dataset.')
ValueError: Expect x to be a non-empty array or dataset.
''---------------------------------------------------------------------------------------------------------------

I changed the dataset to 32 kHz, set self.fs = 32000, and updated everything else related to fs.

Please help...

How to retrain for 10 ms audio

How can I retrain this model for 10 ms audio frames? :)
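
Whatever else a change of frame size or sample rate involves, the block length and block shift have to be re-derived in samples. The helper below does that conversion and also reports the number of one-sided FFT bins, which fixes the width of the magnitude input. The function names are hypothetical, and the 32 ms / 8 ms defaults in the usage note are assumptions based on the sizes mentioned elsewhere on this page.

```python
def ms_to_samples(duration_ms, fs):
    """Convert a duration in milliseconds to a whole number of samples."""
    samples = duration_ms * fs / 1000.0
    if samples != int(samples):
        raise ValueError("duration is not a whole number of samples")
    return int(samples)


def block_config(fs, block_ms, shift_ms):
    """Derive block length, block shift and one-sided FFT size in samples."""
    block_len = ms_to_samples(block_ms, fs)
    block_shift = ms_to_samples(shift_ms, fs)
    return {
        "block_len": block_len,
        "block_shift": block_shift,
        "fft_bins": block_len // 2 + 1,
    }
```

For example, `block_config(16000, 32, 8)` gives 512, 128 and 257, `block_config(32000, 32, 8)` gives 1024, 256 and 513, and a 10 ms frame at 16 kHz is 160 samples. Layer sizes and any pretrained weights tied to the old dimensions would need separate attention.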

some error after python run_training.py

Traceback (most recent call last):
File "run_training.py", line 56, in
path_to_val_mix, path_to_val_speech)
File "/data1/dtln/DTLN/DTLN_model.py", line 555, in train_model
self.fs, train_flag=True)
File "/data1/dtln/DTLN/DTLN_model.py", line 54, in init
self.count_samples()
File "/data1/dtln/DTLN/DTLN_model.py", line 68, in count_samples
info = WavInfoReader(os.path.join(self.path_to_input, file))
File "/root/.conda/envs/train_env/lib/python3.7/site-packages/wavinfo/wave_reader.py", line 52, in init
self.main_list = chunks.children
AttributeError: 'ChunkDescriptor' object has no attribute 'children'
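
The failure occurs inside the `wavinfo` package while the training code is counting samples. If only the frame count is needed, the standard-library `wave` module can provide it for plain PCM WAV files. This is a swapped-in technique offered as a workaround sketch, not the repository's code, and `count_frames` is a hypothetical helper; `wave` does not read every WAV variant, so it may not suit all datasets.

```python
import wave


def count_frames(path):
    """Return the number of sample frames in a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes()
```

A frame holds one sample per channel, so for mono files the frame count equals the sample count.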

Model fails with exception if given epoch more than 62

Hi,

I'm trying to run your model on Python 3.8.5 with TensorFlow 2.3.1.
It works fine with fewer epochs, but as soon as the epoch count reaches 62 or higher, the model fails with the exception given below.

2021-01-06 15:28:33.866558: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]

Have you faced such an error, or can you suggest a solution for it?

Thanks in advance!

Paper link is not working

Please update the paper link; the current one is not working.

Valid paper link should be https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2631.pdf

Training got error

Epoch 00095: val_loss did not improve from -16.76465
2021-01-21 00:24:52.466681: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]

About training a real-time model

Hello, in your training code you use 15 s of data that is then divided into frames, so the timestep dimension of the LSTM input is the number of frames. If I want to train a real-time processing model, should the input during training be single-frame data, like (batch, timestep=1, 512), or 15 s of data, like (batch, timestep=1873, 512)?

TensorflowJS conversion

I am trying to convert the savedmodel using tensorflowjs 2.X using the following command:

tensorflowjs_converter --control_flow_v2=False --input_format=tf_saved_model --saved_model_tags=serve --signature_name=serving_default --strip_debug_ops=False --weight_shard_size_bytes=4194304 C:\Users\ss\Documents\workspace\DTLN\DTLN-master\pretrained_model\DTLN_norm_500h_saved_model C:\Users\ss\Documents\workspace\DTLN\tfjs

I get the following two errors:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 497, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 0 of node StatefulPartitionedCall/model/lstm/AssignVariableOp was passed float from Func/StatefulPartitionedCall/input/_4:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 482, in convert_tf_saved_model
frozen_graph = _freeze_saved_model_v2(concrete_func, control_flow_v2)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 352, in _freeze_saved_model_v2
concrete_func, lower_control_flow=not control_flow_v2).graph
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 680, in convert_variables_to_constants_v2
return _construct_concrete_function(func, output_graph_def, converted_inputs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 406, in _construct_concrete_function
new_output_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 633, in function_from_graph_def
wrapped_import = wrap_function(_imports_graph_def, [])
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 611, in wrap_function
collections={}),
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 86, in call
return self.call_with_variable_creator_scope(self._fn)(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 92, in wrapped
return fn(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 631, in _imports_graph_def
importer.import_graph_def(graph_def, name="")
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Input 0 of node StatefulPartitionedCall/model/lstm/AssignVariableOp was passed float from Func/StatefulPartitionedCall/input/_4:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\wizard.py", line 606, in run
converter.convert(arguments)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\converter.py", line 681, in convert
control_flow_v2=args.control_flow_v2)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 485, in convert_tf_saved_model
output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 342, in _freeze_saved_model_v1
sess, g.as_graph_def(), output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 359, in convert_variables_to_constants
inference_graph = extract_sub_graph(input_graph_def, output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 205, in extract_sub_graph
_assert_nodes_are_present(name_to_node, dest_nodes)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 160, in _assert_nodes_are_present
assert d in name_to_node, "%s is not in graph" % d
AssertionError: Identity is not in graph

[Question] Preparation of dataset?

Hi Nils,

I trained your model on the speech & noise data provided by the DNS challenge, but the model you provide in the pretrained_model/ folder seems to perform better than the model I trained (especially for the with_reverb data). So I have some questions about how you prepared your data and trained your model.

In your paper you mentioned that the WHAMR corpus was also used. Did you use it for training, cross-validation, and testing, or only as a test set?
Did you use the DNS-Challenge script provided by Microsoft to create the training set (I think they did not consider room impulse responses in their script)? Or did you add any RIR to create your training set?
In the DNS-Challenge repo you forked, I think you just randomly split the noisy data into train and val, which means that same speaker(s) may appear in both train and val, right?
Compared with DTLN_norm_500h.h5, did you use norm_stft=false and other parameters unchanged to get the model.h5 in pretrained_model?

Convert model.h5 to tflite

I'm trying to convert the provided pretrained model to tflite:

converter = tf.lite.TocoConverter.from_keras_model_file("./model.h5")
tflite_model = converter.convert()
open(os.path.join(".", "model.tflite"), 'wb').write(tflite_model)

And I get the following error:

ValueError: No model found in config file.

Full error message:

Traceback (most recent call last):
File "/home/qendrim/solaborate/repos/solaborate/Solaborate.ML.KPP/src/s4_export_to_tflite.py", line 18, in <module>
converter = tf.lite.TocoConverter.from_keras_model_file("./model.h5")
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 1002, in from_keras_model_file
input_shapes, output_arrays)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 747, in from_keras_model_file
keras_model = _keras.models.load_model(model_file, custom_objects)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/python/keras/saving/save.py", line 146, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 209, in load_model_from_hdf5
raise ValueError('No model found in config file.')
ValueError: No model found in config file.

python: 3.7
tensorflow: 1.4

What can I do to convert this model to tflite?

ERROR: Could not find a version that satisfies the requirement tflite-runtime==2.1.0.post1

I just want to run the tflite model, so I executed
conda env create -f tflite_env.yml
but it ended with the error in the title.
My system is macOS Big Sur 11.6.
Could someone please help me?

Training a 48 kHz model: block_len, block_shift, encoder_size, numUnits?

Thanks for your wonderful job, @breizhn. For a 48 kHz or 44.1 kHz model, should I change block_len, block_shift, encoder_size, and numUnits?
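As a rough illustration of the arithmetic behind this question, the sketch below scales the block sizes so that they cover the same duration at a different sampling rate. The 32 ms block length and 8 ms shift are assumptions (they correspond to 512 and 128 samples at 16 kHz), `block_sizes` is a hypothetical helper, and nothing here says whether encoder_size or numUnits should change as well.

```python
def block_sizes(sample_rate_hz, block_len_ms=32.0, block_shift_ms=8.0):
    """Return (block_len, block_shift) in samples for the given durations.

    The default durations are assumptions, not values confirmed in this thread.
    """
    block_len = int(round(sample_rate_hz * block_len_ms / 1000.0))
    block_shift = int(round(sample_rate_hz * block_shift_ms / 1000.0))
    return block_len, block_shift


for fs in (16000, 44100, 48000):
    print(fs, block_sizes(fs))
```

With these assumed durations, 48 kHz gives 1536 and 384 samples. At 44.1 kHz the durations do not map to whole samples, so the sizes are rounded (1411 and 353) and one would probably pick convenient values by hand.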

Issue with tflite interpreter

Hi,
In the DTLN model, I made some changes to the separation kernel, which now looks like this:
`
x = keras.layers.Conv1D(B, 1, use_bias=False)(x)

y = keras.layers.Conv1D(N, 1, use_bias=False)(x)
y = keras.activations.relu(y, alpha=0.01)

y = InstantLayerNormalization()(y)
y = keras.layers.Conv1D(N, kernel_size=P, strides=1, padding="causal", dilation_rate=2**0, groups=N, use_bias=False)(y)

y = keras.activations.relu(y, alpha=0.01)
y = InstantLayerNormalization()(y)
v = keras.layers.Conv1D(B, 1, use_bias=False)(y)
z = y
x = x + v
`

When I try the tflite conversion and run the tflite interpreter, I get the following error:
"RuntimeError: tensorflow/lite/kernels/conv.cc:238 input->dims->data[3] != filter->dims->data[3] (32 != 1)Node number 1 (CONV_2D) failed to prepare"
I tried to debug the issue and found that the error appears if I keep x = x + v or use v as the input for the next layers (due to the dependency of the dilated convolution on v).

1. How can I resolve this error?
2. If this cannot be done directly with tflite, is there an alternative method for post-training quantization of a model like the code above?

Any help will be greatly appreciated. Thanks in advance!

Is there any code for training data preparation?

Microphone sampling rate

I was trying to get real_time_dtln_audio.py to work. Should I change my microphone input and speaker output to 16 kHz as well?

40h training early stopped between 80~90 epochs

Hi Nils,
Thank you for your great work! Recently I tried to train my model exactly as you described, in the environment created with train_env.yml; the only difference is that I use Ubuntu 20.04. However, my training always stops early between epochs 80 and 90, with val_loss between -16 and -17, no matter how I recreate the training dataset. Any suggestions for improving this?
Thanks,
Junmin Guo
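For context on what a val_loss of about -16 to -17 means: another issue on this page describes the training target as the negative SNR. A minimal sketch of such a loss is shown below, assuming the SNR is computed in dB from mean powers with a small constant in the denominator. The exact form in the training code may differ, and `negative_snr_db` and `eps` are hypothetical names.

```python
import math


def negative_snr_db(clean, estimate, eps=1e-7):
    """Negative SNR in dB between a clean signal and its estimate.

    Assumes the clean signal is not all zeros; eps guards against a
    zero error power (both are assumptions of this sketch).
    """
    n = len(clean)
    signal_power = sum(s * s for s in clean) / n
    error_power = sum((s - e) ** 2 for s, e in zip(clean, estimate)) / n
    return -10.0 * math.log10(signal_power / (error_power + eps))


clean = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(negative_snr_db(clean, estimate))
```

Under this definition, a loss near -16.8 would correspond to an SNR of roughly 16.8 dB between the clean target and the model output on the validation data.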

Low version tf model training

Hello, can I use TensorFlow version 1.15 to train the model? Thanks.

validation loss -16.83, but performance is not better than model.h5.

model.h5 : /pretrained_model/model.h5 (not normalized)

Comparing the performance of my_model.h5 and model.h5, model.h5 is better.
I looked into the reason.
The weight and bias values of "my_model.h5" are small compared to those of "model.h5".

figure1. weights result

figure2. signal result

tensorflow-gpu == 2.4.0
tensorflow==2.4.0
GPU : GeForce GTX 1050
The parameters are the same as in the code. The dataset was DNS-Challenge2020; I used the clean and noise data.
snr_lower : -5
snr_upper : 25
total_hours : 40
norm_stft : False

Training data was created with the code provided by DNS-Challenge2020.
I set the number of epochs to 200, but because patience is set to 10, training stopped at epoch 96.
validation loss : -16.83

Is there a way to increase the weight value?
Is it a dataset issue?
Any advice would be appreciated.
Thank you.
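To make the SNR range above concrete, here is a minimal sketch of mixing a clean signal with noise at a target SNR drawn between -5 and 25 dB. It is not the DNS-Challenge script, which does more than this; `mix_at_snr`, `mean_power`, and `measured_snr_db` are hypothetical helpers, and the signals are synthetic.

```python
import math
import random


def mean_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that clean/noise power matches snr_db, then add it."""
    target_noise_power = mean_power(clean) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [scale * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def measured_snr_db(clean, scaled_noise):
    """SNR in dB between the clean signal and the noise actually added."""
    return 10.0 * math.log10(mean_power(clean) / mean_power(scaled_noise))


random.seed(0)
clean = [math.sin(0.05 * n) for n in range(2000)]   # synthetic stand-in for speech
noise = [random.uniform(-1.0, 1.0) for _ in range(2000)]
snr_db = random.uniform(-5.0, 25.0)                 # range quoted in this issue
noisy, scaled_noise = mix_at_snr(clean, noise, snr_db)
print(abs(measured_snr_db(clean, scaled_noise) - snr_db) < 1e-6)
```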

nan when training

Hi Breizhn! Thanks for your work.
I have a question about training: when I change the variable len_samples to 5, NaN appears. Does len_samples have to be set longer?

Data augmentation method that makes the 40h training dataset better than 500h

Hi! I would like to know which data augmentation method you used for the 40h training dataset to achieve better performance than with the 500h training dataset.

Some questions about real-time denoising

Breizhn, I have a few questions about real-time denoising.

First, what does the 'quantization' argument in 'convert_weights_to_tf_lite.py' mean? If I want to get the TF-Lite model, I just pass the TF model (.h5) I trained as input to 'convert_weights_to_tf_lite.py', right? And if I set the 'quantization' argument to True, the denoising result should be better, right? Or did I misunderstand?

Also, when I run 'real_time_dtln_audio.py', I notice that the output is much worse than the result I get from 'run_evaluation.py' with the TF model. Is that because I used the TF-Lite model without quantization?

Last but not least, if I want to change the values of block_len_ms and block_shift_ms, I need to retrain the model with new values of batchsize, blockLen, and block_shift in DTLN_model(), right?

Thank you a lot.
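To show where the block length and block shift enter real-time processing, here is a minimal sketch of a shift-in / overlap-add loop in plain Python. It assumes the real-time script follows this general pattern. `process_stream` and `process_block` are hypothetical names, and the model is replaced by an identity function so the buffering itself is visible.

```python
def process_stream(samples, block_len, block_shift, process_block):
    """Feed samples in hops of block_shift and overlap-add the processed blocks."""
    in_buffer = [0.0] * block_len
    out_buffer = [0.0] * block_len
    output = []
    for idx in range(len(samples) // block_shift):
        chunk = samples[idx * block_shift:(idx + 1) * block_shift]
        # Drop the oldest block_shift samples and append the new ones.
        in_buffer = in_buffer[block_shift:] + list(chunk)
        out_block = process_block(in_buffer)  # stand-in for the model call
        # Shift the output buffer, pad with zeros, then add the new block.
        out_buffer = out_buffer[block_shift:] + [0.0] * block_shift
        out_buffer = [a + b for a, b in zip(out_buffer, out_block)]
        # The first block_shift samples are complete and can be emitted.
        output.extend(out_buffer[:block_shift])
    return output


identity = lambda block: list(block)
print(process_stream([1, 2, 3, 4, 5, 6, 7, 8], 4, 2, identity))
```

With the identity stand-in, each sample is summed block_len / block_shift times and delayed by block_len - block_shift samples. That suggests a model trained with one block length and shift cannot simply be run with another, because it has learned to work with that particular overlap.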

onnx quantization

Thanks for your code! If I want to quantize the ONNX model, could you give me some advice?

TFLite Android

Hi,
I want to use the tflite model in an Android project. When I load the model into Android Studio, it generates code like the following:

`
val model = Dtln.newInstance(context)

// Creates inputs for reference.
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 1, 512), DataType.FLOAT32)
inputFeature0.loadBuffer(byteBuffer)
val inputFeature1 = TensorBuffer.createFixedSize(intArrayOf(1, 2, 128, 2), DataType.FLOAT32)
inputFeature1.loadBuffer(byteBuffer)

// Runs model inference and gets result.
val outputs = model.process(inputFeature0, inputFeature1)
val outputFeature0 = outputs.outputFeature0AsTensorBuffer
val outputFeature1 = outputs.outputFeature1AsTensorBuffer

// Releases model resources if no longer used.
model.close()
`

My question is: what are inputFeature0 and inputFeature1 in this code? Should I read the wav file as a byte array and then reshape it? Or should I create a feature vector from the wav file? Can you help me with this?

Thanks
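As a small aid for sizing the two buffers, the sketch below computes how many float32 values and bytes each declared input shape holds. The shapes (1, 1, 512) and (1, 2, 128, 2) also appear as `estimated_block` and `states2` in another issue on this page, which hints at a 512-sample block plus a recurrent state tensor, but that reading is an inference, not something confirmed here. `tensor_num_bytes` and `pack_float32` are hypothetical helpers, and the little-endian byte order is an assumption.

```python
import struct

FLOAT32_BYTES = struct.calcsize("<f")  # size of one float32 value in bytes


def tensor_num_elements(shape):
    """Number of values in a tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def tensor_num_bytes(shape):
    """Bytes needed to hold a float32 tensor with the given shape."""
    return tensor_num_elements(shape) * FLOAT32_BYTES


def pack_float32(values):
    """Pack floats into a little-endian float32 byte string (byte order assumed)."""
    return struct.pack("<%df" % len(values), *values)


print(tensor_num_bytes((1, 1, 512)), tensor_num_bytes((1, 2, 128, 2)))
```

Both inputs hold 512 values, so each buffer is 2048 bytes. The generated snippet loads the same byteBuffer into both inputs, which is probably only a placeholder.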

Can't reproduce the provided ONNX files with the provided Python file

I converted the model to ONNX with convert_weights_to_onnx.py. The converted ONNX files are not the same as the ONNX files provided.

The B3 model in paper

Hi, thank you for your fantastic implementation of DTLN! I would like to know more details about model B3 in the paper. Does B3 have only one STFT & iFFT and 2*2 LSTM layers? What is the difference compared to B1?

the inference time of tflite_quant is larger than tflite

I used the provided model files (model_1.tflite, model_2.tflite, model_quant_1.tflite and model_quant_2.tflite) and the script "real_time_processing_tf_lite.py" to compare inference times.
My setup: Ubuntu 18.04, tf2.0.
The processing times are as follows:
TF-lite: 0.383403 ms; TF-lite quantized: 0.4470351 ms
It seems abnormal that the quantized TF-lite model is slower than the TF-lite model during inference. I found that the script requires tf2.3.0 when running the tflite model. Does that mean tf2.0 has some limitations with your script? Looking forward to your reply.
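When two timings are this close, it may help to measure them the same way with warm-up runs and many repetitions. The helper below is a generic sketch using only the standard library; `mean_call_time_ms` is a hypothetical name, and the lambda stands in for a call that runs one block through an interpreter.

```python
import time


def mean_call_time_ms(fn, n_runs=1000, n_warmup=50):
    """Average wall-clock time of fn() in milliseconds over n_runs calls."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / n_runs


# Stand-in workload; replace with the function that processes one block.
print(mean_call_time_ms(lambda: sum(range(1000)), n_runs=200, n_warmup=10))
```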

Real-time-processing Tf_lite inference program ERROR

Hi, even after changing the input_details as you did in the file, the code still gives an error:

    interpreter_2.set_tensor(input_details_2[0]['index'], estimated_block)#input_details_2[0]
  File "/home/purna/.local/lib/python3.6/site-packages/tflite_runtime/interpreter.py", line 399, in set_tensor
    self._interpreter.SetTensor(tensor_index, value)
  File "/home/purna/.local/lib/python3.6/site-packages/tflite_runtime/interpreter_wrapper.py", line 148, in SetTensor
    return _interpreter_wrapper.InterpreterWrapper_SetTensor(self, i, value)
ValueError: Cannot set tensor: Dimension mismatch. Got 512 but expected 257 for dimension 2 of input 0.

and input_details_2 is shown below, along with the shapes we want:

states2
(1, 2, 128, 2)
input_details_2[1]['shape']
[  1   2 128   2]
estimated_block
(1, 1, 512)
input_details_2[0]['shape']
[  1   1 257]

any help would be appreciated.
Thanks in advance
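The two sizes in the error message are related: a real FFT of a 512-sample block has 512 / 2 + 1 = 257 bins. One possible reading is that the interpreter receiving `estimated_block` was built from a model file that expects a magnitude spectrum rather than a time-domain block, but nothing in this thread confirms that. The sketch below shows the arithmetic and a shape check that can be run before calling set_tensor; `rfft_bins` and `check_input_shape` are hypothetical helpers.

```python
def rfft_bins(block_len):
    """Number of frequency bins in a real FFT of block_len samples."""
    return block_len // 2 + 1


def check_input_shape(name, actual_shape, expected_shape):
    """Raise a descriptive error if the data shape differs from the expected one."""
    actual = tuple(int(d) for d in actual_shape)
    expected = tuple(int(d) for d in expected_shape)
    if actual != expected:
        raise ValueError(
            "%s has shape %s but the model input expects %s"
            % (name, actual, expected)
        )


print(rfft_bins(512))
check_input_shape("states2", (1, 2, 128, 2), [1, 2, 128, 2])  # shapes match, no error
```

Calling `check_input_shape("estimated_block", (1, 1, 512), [1, 1, 257])` raises a ValueError that names the mismatching input, which can make it easier to see which model file an interpreter was created from.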