
Comments (24)

Smith42 commented on May 26, 2024

I've been training using this repo and am getting (very) good results on 256x256 images after around 800,000 global steps (batch of 16). Score-based models are known to take more compute to train than a comparable GAN, so perhaps more training time is required in your cases?

from denoising-diffusion-pytorch.

IceClear commented on May 26, 2024

> @IceClear Do you mind sharing sample images if you could?

Sure, here it is (sample 186) after 186k steps.
[image: sample-186]


ariel415el commented on May 26, 2024

Hi, I'm also trying to train this repo.
What image resolution are you using?
In the paper (Appendix B) they say they trained 256x256 CelebA-HQ for 500k steps at a batch size of 64.
Did your loss plateau or is it still decreasing?
And by the way, how much time did it take to train these 150k steps? What batch size?
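For scale, a back-of-the-envelope comparison against the Appendix B numbers quoted above; the 150k-step, batch-32 figures below are hypothetical stand-ins for the shorter runs discussed in this thread:

```python
# Rough compute comparison; the 150k/32 figures are made-up stand-ins
# for the shorter runs reported in this thread, not anyone's exact setup.
paper_images_seen = 500_000 * 64   # Appendix B: 500k steps at batch size 64
short_run_images = 150_000 * 32    # e.g. 150k steps at batch size 32

ratio = paper_images_seen / short_run_images
print(ratio)  # ~6.7x fewer images seen than the paper's CelebA-HQ run
```

So a run that feels long can still be well short of the paper's total data exposure.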


IceClear commented on May 26, 2024

Similar results after 145k steps on CIFAR. I wonder if it is harder to train than a GAN, or if it is just not stable enough yet...


qsh-zh commented on May 26, 2024

@ariel415el The loss plateaued for the figure I showed, if my memory serves me well. I forget some details of the experiments, but it ran for around 36-48 hours on one 2080 Ti. Batch size was 32 with fp16, U-Net dim 64.


qsh-zh commented on May 26, 2024

@IceClear Do you mind sharing sample images if you could?


ariel415el commented on May 26, 2024

Thanks @Smith42,
The thing is, for me and @qshzh the training loss plateaus, so I'm not sure how more steps can help. Did your loss continue decreasing throughout training?
Can you share some of your result images here so that we know what to expect?
BTW, for how long did you train the model? I guess it was more than 2 days.


Smith42 commented on May 26, 2024

> Can you share some of your result images here so that we know what to expect?

@ariel415el unfortunately I can't share the results just yet, but I should have a preprint out soon that I can share.

> The thing is, for me and @qshzh the training loss plateaus, so I'm not sure how more steps can help. Did your loss continue decreasing throughout training?

The loss didn't seem to plateau for me until very late in the training cycle, but this is with training on a dataset of order 10^6 examples.

> BTW, for how long did you train the model? I guess it was more than 2 days.

On a single V100 it took around 2 weeks of training.


qsh-zh commented on May 26, 2024

@IceClear @ariel415el This is the FID curve on CIFAR-10 for 1k sampled images.
[image: FID curve on CIFAR-10]
Step 26 in the figure corresponds to 108,000 global steps. For 50k samples, the FID is 15.13.


Sumching commented on May 26, 2024

Image size is 256, batch size is 32, after 480k steps; the results do not look good.
[image: samples after 480k steps]


gwang-kim commented on May 26, 2024

@Sumching @qshzh @IceClear @ariel415el @Smith42 Guys, how low are your training losses? In my case, the noise-prediction losses are in the several hundreds to thousands. Is this right?


Smith42 commented on May 26, 2024

> Guys, how low are your training losses? In my case, the noise-prediction losses are in the several hundreds to thousands. Is this right?

That's way too high; I'm getting sub-0.1 once fully trained. Have you checked your normalisations?
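A minimal sketch of the normalisation point, using made-up pixel values rather than the repo's actual data pipeline: if images are left in the [0, 255] range instead of being mapped to [-1, 1], an L1 noise-prediction loss lands in the hundreds, matching the numbers reported above.

```python
# Toy illustration (not the repo's pipeline): loss scale vs. input range.
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

raw_pixels = [0.0, 64.0, 128.0, 255.0]              # uint8-range values
normalized = [p / 127.5 - 1.0 for p in raw_pixels]  # map [0, 255] to [-1, 1]

target = [0.0] * 4  # stand-in for a noise target near zero
print(l1(raw_pixels, target))  # 111.75: hundreds, like the losses reported above
print(l1(normalized, target))  # ~0.63: order one, the expected scale
```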


jiangxiluning commented on May 26, 2024

@Smith42
Hi, I trained it with CIFAR-10. The batch size is 16 and the image size is 128. The loss is about 0.05, but the generated images seem blurred.


Smith42 commented on May 26, 2024

> Hi, I trained it with CIFAR-10. The batch size is 16 and the image size is 128. The loss is about 0.05, but the generated images seem blurred.

I use a fork of Phil's code in my paper and am not getting blurring problems. Maybe there is something up with your hyperparameters?


cajoek commented on May 26, 2024

Hi @Smith42 & @jiangxiluning, when you say you get a loss below 0.1, are you using an L1 or L2 loss?


jiangxiluning commented on May 26, 2024

@cajoek For me, it is L1.


Smith42 commented on May 26, 2024

L1 for me too.
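Since loss numbers are only comparable under the same objective, here is a toy illustration (with fabricated residuals) of how the same prediction errors score very differently under L1 and L2:

```python
# Same hypothetical prediction errors, two different loss scales.
def l1_loss(err):
    return sum(abs(e) for e in err) / len(err)

def l2_loss(err):
    return sum(e * e for e in err) / len(err)

residuals = [0.1, -0.2, 0.05, -0.15]  # made-up noise-prediction errors
print(l1_loss(residuals))  # ~0.125
print(l2_loss(residuals))  # ~0.019
```

So "loss below 0.1" means something different per objective; an L2 loss sits much lower than an L1 loss for the same error magnitudes.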


cajoek commented on May 26, 2024

Thanks @jiangxiluning @Smith42!

My loss unfortunately plateaus at about 0.10-0.15, so I decided to plot the mean L1 loss over one epoch versus the timestep t, and I noticed that the loss stays quite high for low values of t, as can be seen in this figure. Do you know if that is expected?
[image: Loss_vs_timestep]
(L1 loss vs timestep t after many epochs on a small dataset. Convergence is not quite reached yet.)
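The per-timestep diagnostic described above can be reproduced with a few lines of bookkeeping; the (t, loss) pairs below are fabricated, and in a real loop they would come from logging each batch's sampled t alongside its L1 loss:

```python
# Sketch of a per-timestep loss diagnostic; the logged pairs are made up.
from collections import defaultdict

sums = defaultdict(float)
counts = defaultdict(int)

# (sampled timestep t, batch L1 loss) as logged over one epoch
logged = [(10, 0.30), (500, 0.08), (10, 0.28), (900, 0.05), (500, 0.10)]
for t, loss in logged:
    sums[t] += loss
    counts[t] += 1

mean_per_t = {t: sums[t] / counts[t] for t in sums}
# Averages show the pattern described above: higher loss at low t.
print(sorted(mean_per_t.items()))
```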


malekinho8 commented on May 26, 2024

@Smith42 Would you be able to show some samples/results from training your CelebA model? It seems that a lot of other people are struggling to reproduce the results shown in the paper.


Smith42 commented on May 26, 2024

> Would you be able to show some samples/results from training your CelebA model?

@malekinho8 I ran a fork of lucidrains' model on a large galaxy image data set here, not on CelebA. However, the galaxy imagery is well replicated with this codebase, so I expect it will work okay on CelebA too.


DushyantSahoo commented on May 26, 2024

@jiangxiluning Can you please share your code? I am also training on cifar10 and the loss does not go below 0.7. Below is my trainer setup:

```python
model = Unet(
    dim = 16,
    dim_mults = (1, 2, 4)
)

trainer = Trainer(
    diffusion,
    new_train,
    train_batch_size = 32,
    train_lr = 1e-4,
    train_num_steps = 500000,       # total training steps
    gradient_accumulate_every = 2,  # gradient accumulation steps
    ema_decay = 0.995,              # exponential moving average decay
    amp = True                      # turn on mixed precision
)
```


greens007 commented on May 26, 2024

> @jiangxiluning Can you please share your code? I am also training on cifar10 and the loss does not go below 0.7. [...]

Hi, I got the same problem on cifar10. The model generated failed images even after 150k steps. Did you succeed?


yiyixuxu commented on May 26, 2024

Hi, CIFAR-10 contains tiny 32x32 pictures; they will naturally look blurry if you resize them to 128x128.
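A toy sketch of that point, assuming simple nearest-neighbour resizing (real resizers interpolate, but the principle is the same): upscaling only repeats or blends existing pixels, so no new detail can appear.

```python
# Nearest-neighbour 2x upscale: every output pixel is a copy of an input
# pixel, so a 32x32 image resized to 128x128 gains no information.
def upscale_nearest(img, factor):
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

tiny = [[0, 255],
        [255, 0]]
print(upscale_nearest(tiny, 2))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```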


177488ZL commented on May 26, 2024

Thanks for sharing your clean implementation.

I tried it on the CelebA dataset. After 150k steps, the generated images are not as good as those claimed in the paper or the flowers shown in the readme.

Is it something to do with the dataset, or do I need more time to train?

[image: generated samples after 150k steps]

Excuse me, did you modify the code or parameters during training, or load a pre-trained weight file? The loss drops to NaN during my training.

