
Comments (24)

Smith42 commented on May 26, 2024

I've been training using this repo and am getting (very) good results on 256x256 images after around 800,000 global steps (batch of 16). Score-based models are known to take more compute to train than a comparable GAN, so perhaps more training time is required in your cases?

from denoising-diffusion-pytorch.

IceClear commented on May 26, 2024

> @IceClear Do you mind sharing sample images if you could?

Sure, here it is (sample 186) after 186k steps.
[image: sample-186]


ariel415el commented on May 26, 2024

Hi, I'm also trying to train this repo.
What image resolution are you using?
In the paper (Appendix B) they say they trained 256x256 CelebA-HQ for 500k steps at a batch size of 64.
Did your loss plateau or is it still decreasing?
And by the way, how much time did it take to train these 150k steps? What batch size?
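For scale, a back-of-the-envelope comparison against the Appendix B numbers quoted above; the 150k-step, batch-32 figures below are hypothetical stand-ins for the shorter runs discussed in this thread:

```python
# Rough compute comparison; the 150k/32 figures are made-up stand-ins
# for the shorter runs reported in this thread, not anyone's exact setup.
paper_images_seen = 500_000 * 64   # Appendix B: 500k steps at batch size 64
short_run_images = 150_000 * 32    # e.g. 150k steps at batch size 32

ratio = paper_images_seen / short_run_images
print(ratio)  # ~6.7x fewer images seen than the paper's CelebA-HQ run
```

So a run that feels long can still be well short of the paper's total data exposure.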


IceClear commented on May 26, 2024

Similar results after 145k steps on CIFAR. I wonder if it is harder to train than a GAN, or if it is just not stable enough yet...


qsh-zh commented on May 26, 2024

@ariel415el The loss plateaued for the figure I showed, if my memory serves me well. I forget some details of the experiments, but it ran for around 36-48 hours on one 2080 Ti. Batch size was 32 with fp16, U-Net dim 64.


qsh-zh commented on May 26, 2024

@IceClear Do you mind sharing sample images if you could?


ariel415el commented on May 26, 2024

Thanks @Smith42,
The thing is, for me and @qshzh the training loss plateaus, so I'm not sure how more steps can help. Did your loss continue decreasing throughout training?
Can you share some of your result images here so that we know what to expect?
BTW, for how long did you train the model? I guess it was more than 2 days.


Smith42 commented on May 26, 2024

> Can you share some of your result images here so that we know what to expect?

@ariel415el unfortunately I can't share the results just yet, but I should have a preprint out soon that I can share.

> The thing is, for me and @qshzh the training loss plateaus, so I'm not sure how more steps can help. Did your loss continue decreasing throughout training?

The loss didn't seem to plateau for me until very late in the training cycle, but this is with training on a dataset of order 10^6 examples.

> BTW, for how long did you train the model? I guess it was more than 2 days.

On a single V100 it took around 2 weeks of training.


qsh-zh commented on May 26, 2024

@IceClear @ariel415el This is the FID curve on CIFAR-10 for 1k sampled images.
[image: FID curve on CIFAR-10]
Step 26 in the figure corresponds to 108,000 global steps. For 50k samples, the FID is 15.13.


Sumching commented on May 26, 2024

Image size is 256, batch size is 32, after 480k steps; the results do not look good.
[image: samples after 480k steps]


gwang-kim commented on May 26, 2024

@Sumching @qshzh @IceClear @ariel415el @Smith42 Guys, how low are your training losses? In my case, the noise-prediction losses are in the several hundreds to thousands. Is this right?


Smith42 commented on May 26, 2024

> Guys, how low are your training losses? In my case, the noise-prediction losses are in the several hundreds to thousands. Is this right?

That's way too high; I'm getting sub-0.1 once fully trained. Have you checked your normalisations?
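A minimal sketch of the normalisation point, using made-up pixel values rather than the repo's actual data pipeline: if images are left in the [0, 255] range instead of being mapped to [-1, 1], an L1 noise-prediction loss lands in the hundreds, matching the numbers reported above.

```python
# Toy illustration (not the repo's pipeline): loss scale vs. input range.
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

raw_pixels = [0.0, 64.0, 128.0, 255.0]              # uint8-range values
normalized = [p / 127.5 - 1.0 for p in raw_pixels]  # map [0, 255] to [-1, 1]

target = [0.0] * 4  # stand-in for a noise target near zero
print(l1(raw_pixels, target))  # 111.75: hundreds, like the losses reported above
print(l1(normalized, target))  # ~0.63: order one, the expected scale
```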


jiangxiluning commented on May 26, 2024

@Smith42
Hi, I trained it with CIFAR-10. The batch size is 16 and the image size is 128. The loss is about 0.05, but the generated images seem blurred.


Smith42 commented on May 26, 2024

> Hi, I trained it with CIFAR-10. The batch size is 16 and the image size is 128. The loss is about 0.05, but the generated images seem blurred.

I use a fork of Phil's code in my paper and am not getting blurring problems. Maybe there is something up with your hyperparameters?


cajoek commented on May 26, 2024

Hi @Smith42 & @jiangxiluning, when you say you get a loss below 0.1, are you using an L1 or L2 loss?


jiangxiluning commented on May 26, 2024

@cajoek For me, it is L1.


Smith42 commented on May 26, 2024

L1 for me too.
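Since loss numbers are only comparable under the same objective, here is a toy illustration (with fabricated residuals) of how the same prediction errors score very differently under L1 and L2:

```python
# Same hypothetical prediction errors, two different loss scales.
def l1_loss(err):
    return sum(abs(e) for e in err) / len(err)

def l2_loss(err):
    return sum(e * e for e in err) / len(err)

residuals = [0.1, -0.2, 0.05, -0.15]  # made-up noise-prediction errors
print(l1_loss(residuals))  # ~0.125
print(l2_loss(residuals))  # ~0.019
```

So "loss below 0.1" means something different per objective; an L2 loss sits much lower than an L1 loss for the same error magnitudes.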


cajoek commented on May 26, 2024

Thanks @jiangxiluning @Smith42!

My loss unfortunately plateaus at about 0.10-0.15, so I decided to plot the mean L1 loss over one epoch versus the timestep t, and I noticed that the loss stays quite high for low values of t, as can be seen in this figure. Do you know if that is expected?
[image: Loss_vs_timestep]
(L1 loss vs timestep t after many epochs on a small dataset. Convergence is not quite reached yet.)
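The per-timestep diagnostic described above can be reproduced with a few lines of bookkeeping; the (t, loss) pairs below are fabricated, and in a real loop they would come from logging each batch's sampled t alongside its L1 loss:

```python
# Sketch of a per-timestep loss diagnostic; the logged pairs are made up.
from collections import defaultdict

sums = defaultdict(float)
counts = defaultdict(int)

# (sampled timestep t, batch L1 loss) as logged over one epoch
logged = [(10, 0.30), (500, 0.08), (10, 0.28), (900, 0.05), (500, 0.10)]
for t, loss in logged:
    sums[t] += loss
    counts[t] += 1

mean_per_t = {t: sums[t] / counts[t] for t in sums}
# Averages show the pattern described above: higher loss at low t.
print(sorted(mean_per_t.items()))
```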


malekinho8 commented on May 26, 2024

@Smith42 Would you be able to show some samples/results from training your CelebA model? It seems that a lot of other people are struggling to reproduce the results shown in the paper.


Smith42 commented on May 26, 2024

> Would you be able to show some samples/results from training your CelebA model?

@malekinho8 I ran a fork of lucidrains' model on a large galaxy image data set here, not on CelebA. However, the galaxy imagery is well replicated with this codebase, so I expect it will work okay on CelebA too.


DushyantSahoo commented on May 26, 2024

@jiangxiluning Can you please share your code? I am also training on cifar10 and the loss does not go below 0.7. Below is my trainer setup:

```python
model = Unet(
    dim = 16,
    dim_mults = (1, 2, 4)
)

trainer = Trainer(
    diffusion,
    new_train,
    train_batch_size = 32,
    train_lr = 1e-4,
    train_num_steps = 500000,       # total training steps
    gradient_accumulate_every = 2,  # gradient accumulation steps
    ema_decay = 0.995,              # exponential moving average decay
    amp = True                      # turn on mixed precision
)
```


greens007 commented on May 26, 2024

> @jiangxiluning Can you please share your code? I am also training on cifar10 and the loss does not go below 0.7. [...]

Hi, I got the same problem on cifar10. The model generated failed images even after 150k steps. Did you succeed?


yiyixuxu commented on May 26, 2024

Hi, CIFAR-10 contains tiny 32x32 pictures; they will naturally look blurry if you resize them to 128x128.
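A toy sketch of that point, assuming simple nearest-neighbour resizing (real resizers interpolate, but the principle is the same): upscaling only repeats or blends existing pixels, so no new detail can appear.

```python
# Nearest-neighbour 2x upscale: every output pixel is a copy of an input
# pixel, so a 32x32 image resized to 128x128 gains no information.
def upscale_nearest(img, factor):
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

tiny = [[0, 255],
        [255, 0]]
print(upscale_nearest(tiny, 2))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```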


177488ZL commented on May 26, 2024

Thanks for sharing your clean implementation.

I tried it on the CelebA dataset. After 150k steps, the generated images are not as good as those claimed in the paper or the flowers shown in the readme.

Is it something to do with the dataset, or do I need more time to train?

[image: generated samples after 150k steps]

Excuse me, did you modify the code or parameters during training, or load a pre-trained weight file? The loss drops to NaN during my training.

