neillu23 / cdiffuse
Conditional Diffusion Probabilistic Model for Speech Enhancement
License: Apache License 2.0
Dear authors,
Thank you very much for uploading the code for CDiffuSE! Unfortunately, I have run into some problems with the execution.
It would be great if you could address the following issues:
1.) In __main__.py there is no --vocoder argument for pretraining the DiffWave model. Should I use --pretrain instead?
2.) In train.sh the --pretrain argument is given the path of the pretrained model, although it is a store_false argument. Should I use --pretrain_path to pass the model path?
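For anyone hitting the same confusion as question 2: a minimal argparse sketch (argument names mirror the post; they are not taken from the actual train.sh) showing why a store_false flag is a boolean switch and cannot carry a path, so the path needs its own string argument:

```python
import argparse

# Sketch only: a flag declared with action="store_false" toggles a boolean,
# so a checkpoint path must go through a separate string argument.
parser = argparse.ArgumentParser()
parser.add_argument("--pretrain", action="store_false")         # boolean switch
parser.add_argument("--pretrain_path", type=str, default=None)  # takes a path

args = parser.parse_args(["--pretrain", "--pretrain_path", "ckpt/weights.pt"])
print(args.pretrain, args.pretrain_path)  # False ckpt/weights.pt
```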
3.) In params.py the inference noise schedule for fast reverse sampling differs from the one described in the paper. Should it be inference_noise_schedule=[0.0001, 0.001, 0.01, 0.05, 0.2, 0.35]?
4.) In the predict function of inference.py the noisy signal y is not considered. This function seems to implement the sampling algorithm of DiffuSE. Could it be that you accidentally uploaded the inference.py of your old repo?
Thanks a lot in advance!
Best, Julius
I tried to train your model on my device and also used your pretrained model directly on the VoiceBank dataset, but in both cases the performance differs from that in your paper. I can only reach the "Unprocessed" results reported in the paper. Could you share some tricks for reproducing your results?
Hi, thank you for sharing your work! I have some confusion about it.
audio = noisy_audio
(line 148), but according to Eq. (10), it should be audio = alpha_cum[-1]**0.5 * noisy_signal + delta[-1] * torch.randn_like(noisy_signal). I hope to get your answer, thank you!
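A minimal numpy sketch contrasting the two initializations above. The names beta, alpha_cum, and delta mirror the post, but the schedule values and the definition of delta as a noise scale are assumptions for illustration only:

```python
import numpy as np

# Sketch only: schedule values and the definition of `delta` are assumptions.
rng = np.random.default_rng(0)
noisy_signal = rng.standard_normal(16000).astype(np.float32)

beta = np.array([0.0001, 0.001, 0.01, 0.05, 0.2, 0.35])  # example schedule
alpha_cum = np.cumprod(1.0 - beta)
delta = np.sqrt(1.0 - alpha_cum)  # assumed noise scale term

# Current code (line 148):
audio_current = noisy_signal
# Initialization suggested by Eq. (10) in the post:
audio_eq10 = (alpha_cum[-1] ** 0.5) * noisy_signal \
    + delta[-1] * rng.standard_normal(noisy_signal.shape)
```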
Hi!
Thanks for your nice work! It helps me a lot. I have 2 questions:
How do you test PESQ? I found that there are some differences between the 'pypesq' and 'pesq' libraries.
Do you have plans to open-source the pre-trained model?
Thanks again for your time, looking forward to your reply.
Best regards
The initial paper directly uses
I run the command "./train.sh 0 se model_se"
The issue is
"""""""""""""""""""""""""""""""""
Preprocessing: 0%| | 0/11572 [00:00<?, ?it/s]
Traceback (most recent call last):
File "src/cdiffuse/preprocess.py", line 140, in <module>
main(parser.parse_args())
File "src/cdiffuse/preprocess.py", line 120, in main
list(tqdm(executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir)), desc='Preprocessing', total=len(filenames)))
File "/home/tiger/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/usr/lib/python3.7/concurrent/futures/process.py", line 476, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
"""""""""""""""""""""""""""""""""
How to solve this?
Although the "se_pre" mode can run with the dataset provided by your link, I MUST change sample_rate to 48000 in params.py; otherwise the code throws an error. Is this correct for reproducing your results?
Also, I have run the "se_pre" mode for 12 hours with 4 GPUs and reached step 156600. How long (how many epochs) do we need to train your model?
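One hedged debugging step for the BrokenProcessPool above (not from the repo): the exception only reports that a worker died, often from an out-of-memory kill; replacing the ProcessPoolExecutor call in preprocess.py with a plain serial loop usually surfaces the real error. spec_transform below is a stand-in with the same argument shape as the real function:

```python
# Sketch: run the per-file transform serially instead of via
# ProcessPoolExecutor, so any underlying exception is raised directly
# rather than hidden behind BrokenProcessPool.
def spec_transform(filename, indir, outdir):
    # stand-in for the real spectrogram computation
    return f"{outdir}/{filename}"

filenames = ["p226_001.wav", "p226_002.wav"]
results = [spec_transform(f, "noisy", "out") for f in filenames]
print(results)
```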
Dear Author,
May I ask how to modify the number of training epochs and the batch size?
Also, how can I print out all the parameters of the model?
Thank you!
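For the second question, a generic PyTorch sketch (not taken from this repo) for listing every parameter and the total trainable count; the tiny Linear model is just a stand-in for the real CDiffuSE model:

```python
import torch.nn as nn

# Stand-in model; in practice you would build/load the CDiffuSE model here.
model = nn.Linear(10, 5)

# Print every named parameter with its shape.
for name, p in model.named_parameters():
    print(name, tuple(p.shape))

# Total number of trainable parameters.
total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total)  # 10*5 weights + 5 biases = 55 for this stand-in
```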
train.sh: 16: [: -le: unexpected operator
train.sh: 31: [: -le: unexpected operator
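Not from the repo, but a common cause of this message is train.sh being interpreted by dash (Ubuntu's /bin/sh) while a variable in a [ ... -le ... ] test is empty and unquoted; quoting the variable with a default, or running the script with bash, avoids it. A minimal sketch:

```shell
#!/bin/sh
# With an empty, unquoted variable, dash reports "[: -le: unexpected operator":
#   stage=""; [ $stage -le 1 ] && echo run
# Quoting the variable and supplying a default avoids the error:
stage=""
if [ "${stage:-0}" -le 1 ]; then
    echo "stage runs"
fi
```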
I am a novice in this area. While reading the code, I couldn't find where training is supposed to finish. Could you please tell me when I should stop training? Thank you very much for your help.
Hello,
Can I try this model on my own 8 kHz dataset? Will it perform well?
I noticed there is an "snr_process" function in inference.py, and a "scoring" file seems to be referenced in valid, but I can't find either of them in this repository.
First of all, thank you for your great work
I tried to reproduce your results on the VoiceBank dataset with your code but got some problems. I ran inference with the 100k checkpoint, but the result is not comparable to your sample files and still contains background noise.
The steps I followed:
Could you give some insight into what I might be doing wrong?
Hello, thank you for making your code publicly available! Great work.
This is not a question, but I was confused by the implemented processing. That confusion is already resolved, but let me share what happened to me.
I know the README says, "this implementation assumes a sample rate of 16 kHz". On the other hand, the original sampling rate of VoiceBank-DEMAND is 48 kHz, as you know. So we need to resample the audio signals to apply CDiffuSE to this dataset.
Indeed, your scripts resample audio signals in preprocessing and inference.
For preprocessing:
CDiffuSE/src/cdiffuse/preprocess.py
Line 37 in e4b069f
For inference:
CDiffuSE/src/cdiffuse/inference.py
Line 185 in e4b069f
But, audio signals are not resampled in training:
CDiffuSE/src/cdiffuse/dataset.py
Lines 56 to 57 in e4b069f
To reproduce your experimental results, I read the README of this repository and used the script as it is. Then, 48 kHz audio signals are loaded in dataset.py and given to the diffusion model as they are during training. I noticed this when I checked an audio signal that is saved here:
CDiffuSE/src/cdiffuse/learner.py
Line 172 in e4b069f
It might be better either to resample the audio signals in dataset.py as well, or to explicitly note that users need to resample the audio themselves in advance. That would be clearer, at least to me.
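Along those lines, a minimal numpy sketch of the 48 kHz to 16 kHz step. A real pipeline should use a proper resampler (e.g. torchaudio or librosa) with anti-alias filtering; plain decimation is shown here only because 48000/16000 is an integer ratio:

```python
import numpy as np

# 1 second of 48 kHz audio as a stand-in for a VoiceBank-DEMAND file.
sr_in, sr_out = 48000, 16000
t = np.arange(sr_in) / sr_in
signal_48k = np.sin(2 * np.pi * 440 * t)

# Naive decimation by the integer factor 3; a real implementation should
# low-pass filter first (or use torchaudio/librosa) to avoid aliasing.
assert sr_in % sr_out == 0
signal_16k = signal_48k[:: sr_in // sr_out]
print(len(signal_16k))  # 16000
```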
Sorry for the long post. Best regards.