
cdiffuse's People

Contributors

neillu23


cdiffuse's Issues

Problems with execution

Dear authors,

Thank you very much for uploading the code for CDiffuSE! Unfortunately, I have some problems with the execution.

It would be great if you could address the following issues:

1.) In __main__.py there is no --vocoder argument for pretraining the DiffWave model. Should I use --pretrain instead?

2.) In train.sh the --pretrain argument is passed the path of the pretrained model, although it is defined as a store_false flag. Should I use --pretrain_path for passing the model path instead?

3.) In params.py the inference noise schedule for fast reverse sampling differs from the one described in the paper. Should it be inference_noise_schedule=[0.0001, 0.001, 0.01, 0.05, 0.2, 0.35]?

4.) In the predict function of inference.py the noisy signal y is not used. This function seems to implement the sampling algorithm of DiffuSE. Could it be that you accidentally uploaded the inference.py from your old repo?
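For reference on point 3, here is a minimal sketch of how I read the fast-schedule setup; the attribute names mirror params.py, but the training-schedule values are my assumptions, not necessarily the repo's:

```python
# Minimal sketch of the fast reverse-sampling schedule as I read it from the
# paper; the training betas below are assumed placeholders, not the repo's values.
import numpy as np

training_noise_schedule = np.linspace(1e-4, 0.035, 50)             # assumed betas
inference_noise_schedule = [0.0001, 0.001, 0.01, 0.05, 0.2, 0.35]  # from point 3

# Cumulative products used to map the six inference betas back onto training steps
alpha = 1 - training_noise_schedule
alpha_cum = np.cumprod(alpha)
```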

Thanks a lot in advance!

Best, Julius

The performance of CDiffuSE is unsatisfactory

I tried to train your model on my device and also used your pretrained model directly on the VoiceBank dataset, but both results differ from those in your paper. I can only match the "Unprocessed" results reported in the paper. Could you share some tricks for reproducing the results in your paper?

Some questions about inference.py

Hi, thank you for sharing your work! I have some questions about it.

  • As shown in Eq. (17) in the paper, delta_bar[n] should equal delta_cond[n] * delta[n] / delta[n-1]. However, the code computes `delta_bar[n] = delta_cond[n] * delta[n-1] / delta[n]`.
  • The code sets audio = noisy_audio (line 148), but according to Eq. (10), it should be audio = alpha_cum[-1]**0.5 * noisy_signal + delta[-1] * torch.randn_like(noisy_signal).
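To illustrate the second point, here is a hedged sketch of the initialization I expected from Eq. (10); the names alpha_cum_T and delta_T follow my reading of the paper, and the values and shapes are placeholder assumptions:

```python
# Hedged sketch of Eq. (10) as I read it: start the reverse process from a
# scaled noisy signal plus Gaussian noise, not from noisy_audio directly.
# alpha_cum_T and the waveform below are placeholder assumptions.
import torch

torch.manual_seed(0)
noisy_signal = torch.randn(1, 16000)    # stand-in for a loaded noisy waveform
alpha_cum_T = 0.05                      # assumed cumulative alpha at the last step
delta_T = (1 - alpha_cum_T) ** 0.5      # assumed noise level at the last step

audio = alpha_cum_T ** 0.5 * noisy_signal + delta_T * torch.randn_like(noisy_signal)
```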

Hope to obtain your answer, thank you!

PESQ Evaluation

Hi!

Thanks for your nice work! It helps me a lot. I have 2 questions:

  1. How do you compute the PESQ score? I found that there are some differences between the 'pypesq' and 'pesq' libraries.

  2. Do you have plans to open-source the pre-trained model?

Thanks again for your time, looking forward to your reply.

Best regards

A simple question

The original paper's training objective directly uses $\epsilon$ minus the noise estimated by the model; why does your implementation additionally divide by $\sqrt{1 - a_0}$?

Try to reproduce but some issues occur

I ran the command "./train.sh 0 se model_se".

The issue is:
"""""""""""""""""""""""""""""""""
Preprocessing: 0%| | 0/11572 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "src/cdiffuse/preprocess.py", line 140, in <module>
    main(parser.parse_args())
  File "src/cdiffuse/preprocess.py", line 120, in main
    list(tqdm(executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir)), desc='Preprocessing', total=len(filenames)))
  File "/home/tiger/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 476, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
"""""""""""""""""""""""""""""""""
How to solve this?
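To debug this, I tried running the transform serially so the underlying exception surfaces instead of being swallowed by BrokenProcessPool. A rough sketch, where the spec_transform signature (filename, dir, outdir) is my assumption based on the executor.map call in preprocess.py:

```python
# Hypothetical serial fallback to surface the real exception hidden by
# BrokenProcessPool; the spec_transform signature is assumed from the
# executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir))
# call in preprocess.py.
def run_serially(spec_transform, filenames, in_dir, out_dir):
    results = []
    for filename in filenames:
        # Any exception now propagates with a full traceback instead of
        # silently killing a worker process.
        results.append(spec_transform(filename, in_dir, out_dir))
    return results
```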

Although the "se_pre" mode can run with the dataset provided by your link, I MUST change sample_rate to 48000 in params.py, otherwise the code throws an error. Is this correct for reproducing your results?

Also, I have run for 12 hours with 4 GPUs and reached step 156600 in "se_pre" mode. How long (how many epochs) do we need to train your model?

A question about when to stop training

I am a novice in this area. When reading the code, I couldn't find where training finishes. Could you please tell me when I should stop training? Thank you very much for your help.

Some code is missing

I noticed there is a "snr_process" function in inference.py, and there may be a "scoring" file used in validation, but I can't find them in this repository.

Reproduce on Voicebank dataset

First of all, thank you for your great work!
I tried to reproduce on the VoiceBank dataset with your code but ran into some problems. I ran inference with the 100k checkpoint, but the result does not compare well with your sample files and still contains background noise.

The steps I took:

  • Preprocessing Voicebank dataset with flag se
  • Training without any modification
    And here is my loss figure:
    [loss figure attachment omitted]

Could you give some insight into what I was possibly doing wrong?

Resampling in training

Hello, thank you for making your code publicly available! Great work.

This is not a question, but I was confused about the implemented processing. That confusion is already resolved, but let me share what happened to me.

I know that the README says, "this implementation assumes a sample rate of 16 kHz". On the other hand, the original sampling rate of VoiceBank-DEMAND is 48 kHz, as you know. So we need to resample the audio signals to apply CDiffuSE to this dataset.

Indeed, your scripts resample audio signals in preprocessing and inference.

For preprocessing:

y, sr = librosa.load(filename, sr=16000)

For inference:

noisy_signal, _ = librosa.load(os.path.join(args.wav_path,spec.split("/")[-1].replace(".spec.npy","")),sr=16000)

But, audio signals are not resampled in training:

signal, _ = torchaudio.load(audio_filename)
noisy_signal, _ = torchaudio.load(noisy_filename)

To reproduce your experimental results, I read the README of this repository and used this script as it is. Then, 48 kHz audio signals are loaded in dataset.py and given to the diffusion model as they are during training. I checked an audio signal that is saved here:
writer.add_audio('feature/audio', features['audio'][0], step, sample_rate=self.params.sample_rate)

I confirmed that the signal sounds as if it is played slowly. This is because an audio signal of 48 kHz is saved as a 16 kHz signal.

It might be better either to resample audio signals in dataset.py as well, or to explicitly note that users need to resample audio signals themselves in advance. Either would be clearer, at least to me.

Sorry for the long post. Best regards.
