
cdiffuse's People

Contributors

neillu23


cdiffuse's Issues

Problems with execution

Dear authors,

Thank you very much for uploading the code for CDiffuSE! Unfortunately, I have some problems with the execution.

It would be great if you could address the following issues:

1.) In __main__.py there is no --vocoder argument for pretraining the DiffWave model. Should I use --pretrain instead?

2.) In train.sh the --pretrain argument is passed the path of the pretrained model, although it is defined as a store_false flag. Should I use --pretrain_path for passing the model path instead?

3.) In params.py the inference noise schedule for fast reverse sampling differs from the one described in the paper. Should it be inference_noise_schedule=[0.0001, 0.001, 0.01, 0.05, 0.2, 0.35]?

4.) In the predict function of inference.py the noisy signal y is not used. This function seems to implement the sampling algorithm of DiffuSE. Could it be that you accidentally uploaded the inference.py from your old repo?
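For reference on point 3, here is a minimal sketch of how I read the fast-schedule setup; the attribute names mirror params.py, but the training-schedule values are my assumptions, not necessarily the repo's:

```python
# Minimal sketch of the fast reverse-sampling schedule as I read it from the
# paper; the training betas below are assumed placeholders, not the repo's values.
import numpy as np

training_noise_schedule = np.linspace(1e-4, 0.035, 50)             # assumed betas
inference_noise_schedule = [0.0001, 0.001, 0.01, 0.05, 0.2, 0.35]  # from point 3

# Cumulative products used to map the six inference betas back onto training steps
alpha = 1 - training_noise_schedule
alpha_cum = np.cumprod(alpha)
```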

Thanks a lot in advance!

Best, Julius

The performance of CDiffuSE is unsatisfactory

I tried to train your model on my device and also used your pretrained model directly on the VoiceBank dataset, but both results differ from those in your paper. I can only match the "Unprocessed" results reported in the paper. Could you share some tricks for reproducing the results in your paper?

Some questions about inference.py

Hi, thank you for sharing your work! I have some questions about it.

  • As shown in Eq. (17) in the paper, delta_bar[n] should equal delta_cond[n] * delta[n] / delta[n-1]. However, the code computes `delta_bar[n] = delta_cond[n] * delta[n-1] / delta[n]`.
  • The code sets audio = noisy_audio (line 148), but according to Eq. (10), it should be audio = alpha_cum[-1]**0.5 * noisy_signal + delta[-1] * torch.randn_like(noisy_signal).
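To illustrate the second point, here is a hedged sketch of the initialization I expected from Eq. (10); the names alpha_cum_T and delta_T follow my reading of the paper, and the values and shapes are placeholder assumptions:

```python
# Hedged sketch of Eq. (10) as I read it: start the reverse process from a
# scaled noisy signal plus Gaussian noise, not from noisy_audio directly.
# alpha_cum_T and the waveform below are placeholder assumptions.
import torch

torch.manual_seed(0)
noisy_signal = torch.randn(1, 16000)    # stand-in for a loaded noisy waveform
alpha_cum_T = 0.05                      # assumed cumulative alpha at the last step
delta_T = (1 - alpha_cum_T) ** 0.5      # assumed noise level at the last step

audio = alpha_cum_T ** 0.5 * noisy_signal + delta_T * torch.randn_like(noisy_signal)
```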

Hope to obtain your answer, thank you!

PESQ Evaluation

Hi!

Thanks for your nice work! It helps me a lot. I have 2 questions:

  1. How do you compute the PESQ score? I found that there are some differences between the 'pypesq' and 'pesq' libraries.

  2. Do you have plans to open-source the pre-trained model?

Thanks again for your time, looking forward to your reply.

Best regards

A simple question

The original paper's training objective directly uses $\epsilon$ minus the noise estimated by the model; why does your implementation additionally divide by $\sqrt{1 - a_0}$?

Try to reproduce but some issues occur

I ran the command "./train.sh 0 se model_se".

The issue is:
"""""""""""""""""""""""""""""""""
Preprocessing: 0%| | 0/11572 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "src/cdiffuse/preprocess.py", line 140, in <module>
    main(parser.parse_args())
  File "src/cdiffuse/preprocess.py", line 120, in main
    list(tqdm(executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir)), desc='Preprocessing', total=len(filenames)))
  File "/home/tiger/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 476, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
"""""""""""""""""""""""""""""""""
How to solve this?
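To debug this, I tried running the transform serially so the underlying exception surfaces instead of being swallowed by BrokenProcessPool. A rough sketch, where the spec_transform signature (filename, dir, outdir) is my assumption based on the executor.map call in preprocess.py:

```python
# Hypothetical serial fallback to surface the real exception hidden by
# BrokenProcessPool; the spec_transform signature is assumed from the
# executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir))
# call in preprocess.py.
def run_serially(spec_transform, filenames, in_dir, out_dir):
    results = []
    for filename in filenames:
        # Any exception now propagates with a full traceback instead of
        # silently killing a worker process.
        results.append(spec_transform(filename, in_dir, out_dir))
    return results
```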

Although the "se_pre" mode can run with the dataset provided by your link, I MUST change sample_rate to 48000 in params.py, otherwise the code throws an error. Is this correct for reproducing your results?

Also, I have run for 12 hours with 4 GPUs and reached step 156600 in "se_pre" mode. How long (how many epochs) do we need to train your model?

A question about when to stop training

I am a novice in this area. When reading the code, I couldn't find where training finishes. Could you please tell me when I should stop training? Thank you very much for your help.

Some code is missing

I noticed there is a "snr_process" function in inference.py, and there may be a "scoring" file used in validation, but I can't find them in this repository.

Reproduce on Voicebank dataset

First of all, thank you for your great work!
I tried to reproduce on the VoiceBank dataset with your code but ran into some problems. I ran inference with the 100k checkpoint, but the result does not compare well with your sample files and still contains background noise.

The steps I took:

  • Preprocessing Voicebank dataset with flag se
  • Training without any modification
    And here is my loss figure:
    [loss figure attachment omitted]

Could you give some insight into what I was possibly doing wrong?

Resampling in training

Hello, thank you for making your code publicly available! Great work.

This is not a question, but I was confused about the implemented processing. That confusion is already resolved, but let me share what happened to me.

I know that the README says, "this implementation assumes a sample rate of 16 kHz". On the other hand, the original sampling rate of VoiceBank-DEMAND is 48 kHz, as you know. So we need to resample the audio signals to apply CDiffuSE to this dataset.

Indeed, your scripts resample audio signals in preprocessing and inference.

For preprocessing:

y, sr = librosa.load(filename, sr=16000)

For inference:

noisy_signal, _ = librosa.load(os.path.join(args.wav_path,spec.split("/")[-1].replace(".spec.npy","")),sr=16000)

But, audio signals are not resampled in training:

signal, _ = torchaudio.load(audio_filename)
noisy_signal, _ = torchaudio.load(noisy_filename)

To reproduce your experimental results, I read the README of this repository and used this script as it is. Then, 48 kHz audio signals are loaded in dataset.py and given to the diffusion model as they are during training. I checked an audio signal that is saved here:
writer.add_audio('feature/audio', features['audio'][0], step, sample_rate=self.params.sample_rate)

I confirmed that the signal sounds as if it is played slowly. This is because an audio signal of 48 kHz is saved as a 16 kHz signal.

It might be better either to resample audio signals in dataset.py as well, or to explicitly note that users need to resample audio signals themselves in advance. Either would be clearer, at least to me.

Sorry for the long post. Best regards.
