
denoising-diffusion-pytorch's Introduction

Denoising Diffusion Probabilistic Model, in Pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch. It is a new approach to generative modeling that may have the potential to rival GANs. It uses denoising score matching to estimate the gradient of the data distribution, followed by Langevin sampling to sample from the true distribution.

This implementation was inspired by the official Tensorflow version here

Youtube AI Educators - Yannic Kilcher | AI Coffeebreak with Letitia | Outlier

Flax implementation from YiYi Xu

Annotated code by Research Scientists / Engineers from 🤗 Huggingface

Update: Turns out none of the technicalities really matters at all | "Cold Diffusion" paper | Muse


Install

$ pip install denoising_diffusion_pytorch

Usage

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000    # number of steps
)

training_images = torch.rand(8, 3, 128, 128) # images are normalized from 0 to 1
loss = diffusion(training_images)
loss.backward()

# after a lot of training

sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 128, 128)
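To write the samples to disk for a quick visual check, one option is torchvision's image grid utility; a small sketch (assuming torchvision is installed, and that `sampled_images` comes from the snippet above):

from torchvision.utils import save_image

# sampled_images is (batch, channels, height, width) with values in [0, 1]
save_image(sampled_images, 'samples.png', nrow = 2)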

Or, if you simply want to pass in a folder name and the desired image dimensions, you can use the Trainer class to easily train a model.

from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,           # number of steps
    sampling_timesteps = 250    # number of sampling timesteps (using ddim for faster inference [see citation for ddim paper])
)

trainer = Trainer(
    diffusion,
    'path/to/your/images',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True,                       # turn on mixed precision
    calculate_fid = True              # whether to calculate fid during training
)

trainer.train()

Samples and model checkpoints will be logged to ./results periodically
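To resume from one of those checkpoints, recent versions of the Trainer expose a load method that reads ./results/model-{milestone}.pt; a minimal sketch (the milestone number is a placeholder, check your installed version):

trainer = Trainer(
    diffusion,
    'path/to/your/images',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000
)

trainer.load(10)   # e.g. resume from the checkpoint saved at milestone 10
trainer.train()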

Multi-GPU Training

The Trainer class is now equipped with 🤗 Accelerator. You can easily do multi-gpu training in two steps using their accelerate CLI

At the project root directory, where the training script is, run

$ accelerate config

Then, in the same directory

$ accelerate launch train.py
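Here train.py can simply be the single-GPU example above wrapped in a main guard; a minimal sketch (the image folder path and hyperparameters are placeholders):

# train.py -- minimal script for `accelerate launch`
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

def main():
    model = Unet(dim = 64, dim_mults = (1, 2, 4, 8), flash_attn = True)

    diffusion = GaussianDiffusion(model, image_size = 128, timesteps = 1000)

    trainer = Trainer(
        diffusion,
        'path/to/your/images',
        train_batch_size = 32,
        train_lr = 8e-5,
        train_num_steps = 700000
    )

    trainer.train()

if __name__ == '__main__':
    main()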

Miscellaneous

1D Sequence

By popular request, a 1D Unet + Gaussian Diffusion implementation.

import torch
from denoising_diffusion_pytorch import Unet1D, GaussianDiffusion1D, Trainer1D, Dataset1D

model = Unet1D(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    channels = 32
)

diffusion = GaussianDiffusion1D(
    model,
    seq_length = 128,
    timesteps = 1000,
    objective = 'pred_v'
)

training_seq = torch.rand(64, 32, 128) # features are normalized from 0 to 1

loss = diffusion(training_seq)
loss.backward()

# Or using trainer

dataset = Dataset1D(training_seq)  # this is just an example, but you can formulate your own Dataset and pass it into the `Trainer1D` below

trainer = Trainer1D(
    diffusion,
    dataset = dataset,
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True,                       # turn on mixed precision
)
trainer.train()

# after a lot of training

sampled_seq = diffusion.sample(batch_size = 4)
sampled_seq.shape # (4, 32, 128)

Trainer1D does not evaluate the generated samples in any way since the type of data is not known.

You could consider adding a suitable metric to the training loop yourself after doing an editable install of this package pip install -e ..
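For example, a quick post-training sanity check could compare low-order statistics of generated and held-out sequences; a sketch only (this metric is illustrative and not part of the package):

import torch

with torch.no_grad():
    samples = diffusion.sample(batch_size = 64)   # (64, 32, 128)
    real = training_seq[:64]

    # crude comparison of first and second moments between real and generated data
    mean_gap = (samples.mean() - real.mean()).abs().item()
    std_gap  = (samples.std() - real.std()).abs().item()
    print(f'mean gap: {mean_gap:.4f}, std gap: {std_gap:.4f}')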

Citations

@inproceedings{NEURIPS2020_4c5bcfec,
    author      = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
    booktitle   = {Advances in Neural Information Processing Systems},
    editor      = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
    pages       = {6840--6851},
    publisher   = {Curran Associates, Inc.},
    title       = {Denoising Diffusion Probabilistic Models},
    url         = {https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf},
    volume      = {33},
    year        = {2020}
}
@InProceedings{pmlr-v139-nichol21a,
    title       = {Improved Denoising Diffusion Probabilistic Models},
    author      = {Nichol, Alexander Quinn and Dhariwal, Prafulla},
    booktitle   = {Proceedings of the 38th International Conference on Machine Learning},
    pages       = {8162--8171},
    year        = {2021},
    editor      = {Meila, Marina and Zhang, Tong},
    volume      = {139},
    series      = {Proceedings of Machine Learning Research},
    month       = {18--24 Jul},
    publisher   = {PMLR},
    pdf         = {http://proceedings.mlr.press/v139/nichol21a/nichol21a.pdf},
    url         = {https://proceedings.mlr.press/v139/nichol21a.html},
}
@inproceedings{kingma2021on,
    title       = {On Density Estimation with Diffusion Models},
    author      = {Diederik P Kingma and Tim Salimans and Ben Poole and Jonathan Ho},
    booktitle   = {Advances in Neural Information Processing Systems},
    editor      = {A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan},
    year        = {2021},
    url         = {https://openreview.net/forum?id=2LdBqxc1Yv}
}
@article{Karras2022ElucidatingTD,
    title   = {Elucidating the Design Space of Diffusion-Based Generative Models},
    author  = {Tero Karras and Miika Aittala and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2206.00364}
}
@article{Song2021DenoisingDI,
    title   = {Denoising Diffusion Implicit Models},
    author  = {Jiaming Song and Chenlin Meng and Stefano Ermon},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2010.02502}
}
@misc{chen2022analog,
    title   = {Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning},
    author  = {Ting Chen and Ruixiang Zhang and Geoffrey Hinton},
    year    = {2022},
    eprint  = {2208.04202},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@article{Salimans2022ProgressiveDF,
    title   = {Progressive Distillation for Fast Sampling of Diffusion Models},
    author  = {Tim Salimans and Jonathan Ho},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2202.00512}
}
@article{Ho2022ClassifierFreeDG,
    title   = {Classifier-Free Diffusion Guidance},
    author  = {Jonathan Ho},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2207.12598}
}
@article{Sunkara2022NoMS,
    title   = {No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects},
    author  = {Raja Sunkara and Tie Luo},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2208.03641}
}
@inproceedings{Jabri2022ScalableAC,
    title   = {Scalable Adaptive Computation for Iterative Generation},
    author  = {A. Jabri and David J. Fleet and Ting Chen},
    year    = {2022}
}
@article{Cheng2022DPMSolverPlusPlus,
    title   = {DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models},
    author  = {Cheng Lu and Yuhao Zhou and Fan Bao and Jianfei Chen and Chongxuan Li and Jun Zhu},
    journal = {NeurIPS 2022 Oral},
    year    = {2022},
    volume  = {abs/2211.01095}
}
@inproceedings{Hoogeboom2023simpleDE,
    title   = {simple diffusion: End-to-end diffusion for high resolution images},
    author  = {Emiel Hoogeboom and Jonathan Heek and Tim Salimans},
    year    = {2023}
}
@misc{https://doi.org/10.48550/arxiv.2302.01327,
    doi     = {10.48550/ARXIV.2302.01327},
    url     = {https://arxiv.org/abs/2302.01327},
    author  = {Kumar, Manoj and Dehghani, Mostafa and Houlsby, Neil},
    title   = {Dual PatchNorm},
    publisher = {arXiv},
    year    = {2023},
    copyright = {Creative Commons Attribution 4.0 International}
}
@inproceedings{Hang2023EfficientDT,
    title   = {Efficient Diffusion Training via Min-SNR Weighting Strategy},
    author  = {Tiankai Hang and Shuyang Gu and Chen Li and Jianmin Bao and Dong Chen and Han Hu and Xin Geng and Baining Guo},
    year    = {2023}
}
@misc{Guttenberg2023,
    author  = {Nicholas Guttenberg},
    url     = {https://www.crosslabs.org/blog/diffusion-with-offset-noise}
}
@inproceedings{Lin2023CommonDN,
    title   = {Common Diffusion Noise Schedules and Sample Steps are Flawed},
    author  = {Shanchuan Lin and Bingchen Liu and Jiashi Li and Xiao Yang},
    year    = {2023}
}
@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}
@article{Bondarenko2023QuantizableTR,
    title   = {Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},
    author  = {Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2306.12929},
    url     = {https://api.semanticscholar.org/CorpusID:259224568}
}
@article{Karras2023AnalyzingAI,
    title   = {Analyzing and Improving the Training Dynamics of Diffusion Models},
    author  = {Tero Karras and Miika Aittala and Jaakko Lehtinen and Janne Hellsten and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.02696},
    url     = {https://api.semanticscholar.org/CorpusID:265659032}
}

denoising-diffusion-pytorch's People

Contributors

adversarian, amm1111, aryaaftab, bautajd, denproc, kashif, klasocki, lhaippp, lucidrains, lukovnikov, npielawski, parskatt, pengzhangzhi, qiyan98, ryanndagreat, siddancha, thedudefromci, wassname, xjlswd, yzx9


denoising-diffusion-pytorch's Issues

Self attention computation

Hi, thanks for porting the tf project to torch and open sourcing the code.

Looking at the linear attention code, it seems quite different from the standard self attention in at least two ways:

  • the softmax is applied in a different position than usual (normally it comes after computing QK^T).
  • the summation is over the w*h dimension, as opposed to the usual channel dimension.

multi-gpu training support?

It looks like it only supports single-GPU training right now. Do you have any plans to add a multi-GPU training option any time soon? Much appreciated.

Possible bug in alpha_bar

Thanks for all the work you shared.

I noticed the recent update in the ddpm code and found a possible bug:

alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (0, 1), value = 1.)

The line was previously:

alphas_cumprod_prev = np.append(1., alphas_cumprod[:-1])

From my understanding, padding should be applied to the left instead of the right, which would require the modification:

alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value = 1.) 

to make them equivalent.
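For reference, a quick check of the two padding directions (assuming a recent PyTorch):

import torch
import torch.nn.functional as F

alphas_cumprod = torch.tensor([0.9, 0.8, 0.7, 0.6])

# left pad prepends 1., matching np.append(1., alphas_cumprod[:-1])
print(F.pad(alphas_cumprod[:-1], (1, 0), value = 1.))  # tensor([1.0000, 0.9000, 0.8000, 0.7000])

# right pad appends 1. instead
print(F.pad(alphas_cumprod[:-1], (0, 1), value = 1.))  # tensor([0.9000, 0.8000, 0.7000, 1.0000])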

Alpha schedule instead of beta schedule?

In the "Improved Denoising Diffusion Probabilistic Models" paper, the authors claim that cosine schedule of beta makes alpha-bar change more smoothly, leading to better results. Then I wonder why not linearly schedule alpha-bar, instead of beta? We can compute beta from alpha-bar. Anyone tried that?

TQDM on trainer

I noticed the trainer reports loss + step manually with print.
Moving to tqdm or something of the sort to match the PyTorch standard would be a QOL boost.
I'd be happy to make the update if needed.

argument 'size' must be tuple of ints, but found element of type tuple at pos 3 when changing image size to 256

Dear @lucidrains ,

Thanks for the great work.

I changed the image_size to (256, 256) and encountered the following error.

It would be highly appreciated if you could give me some guidance to fix this error.

    trainer.train()
  File "/home/diffusion/denoising-diffusion-pytorch/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 566, in train
    all_images_list = list(map(lambda n: self.ema_model.sample(batch_size=n), batches))
  File "/home/diffusion/denoising-diffusion-pytorch/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 566, in <lambda>
    all_images_list = list(map(lambda n: self.ema_model.sample(batch_size=n), batches))
  File "/home/diffusion/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/diffusion/denoising-diffusion-pytorch/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 398, in sample
    return self.p_sample_loop((batch_size, channels, image_size, image_size))
  File "/home/diffusion/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/diffusion/denoising-diffusion-pytorch/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 388, in p_sample_loop
    img = torch.randn(shape, device=device)
TypeError: randn(): argument 'size' must be tuple of ints, but found element of type tuple at pos 3

[Already in] Feature: Output Channel Count Settings

Edit: Turns out it does have a channel argument, it just wasn't shown on the readme. Nevermind.

I'd like to train on grayscale images, without wasting computation on three channels. I reckon other folks might like the option of 4 channels for RGBA, or who knows what else.

LayerNorm appears different from Ba 2016

Looking at your ConvNextBlock I noticed your LayerNorm is different from torch.nn.LayerNorm(). In fact, yours appears to normalize only over the channel dimension, whereas Liu et al. 2022 specifically mention layer normalization. Normalizing over all dimensions would put a crimp on the model definition, since you would have to fix the image plane dimensions when defining the model, but if Liu is correct it 'should'(?) improve models.

Is there a reason you are normalizing along channels that I somehow missed from the papers?

Thanks.

Questions About DDPM

Hi,

May I ask a question about the DDPM (based on the implementation)?

I am a bit confused about its training loss: loss = (noise - x_recon).abs().mean().

Is the noise random noise? I am confused why the loss is based on the random noise here. Forgive my misunderstanding; could you please explain a bit?

Thanks for your help.

model image size conflict in Trainer unless specified as an argument

In denoising_diffusion_pytorch.py, the Trainer class __init__ has an 'image_size' option with a default value of 128, and these arguments are not documented anywhere I could locate.

The issue then comes later inside the init method where you set:
self.image_size = diffusion_model.image_size

however, when the Dataset instance is declared, it is initialized with:
self.ds = Dataset(folder, image_size, augment_horizontal_flip = augment_horizontal_flip)

So, if one has originally created a diffusion_model with an image size of 64, but does not know to set the argument on the Trainer, then a default image size of 128 will be used and cause a conflict.

I believe it can be fixed by simply declaring the Dataset with self.image_size, instead of image_size.
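A sketch of the proposed fix inside Trainer.__init__ (parameter names as in the issue):

# use the image size taken from the diffusion model when building the dataset
self.image_size = diffusion_model.image_size
self.ds = Dataset(folder, self.image_size, augment_horizontal_flip = augment_horizontal_flip)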

torch.pi expected

  File "/usr/local/lib/python3.6/dist-packages/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py", line 358, in cosine_beta_schedule
    alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
AttributeError: module 'torch' has no attribute 'pi'

I'm guessing this doesn't work with some versions of PyTorch?
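torch.pi is only available in newer PyTorch releases; on older versions the standard library constant is a drop-in replacement (assuming x, timesteps and s as in cosine_beta_schedule):

import math

alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2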

Resnet block

Hi!

Thank you so much for your nice work.
I have a question regarding the ResNet structure you used in the code. Reading the ResNet architecture in Jonathan Ho's repository, I saw a somewhat different implementation there. Could you please let me know what enhancements you applied on top of his implementation?

Thanks.

Why the model not converging?

Hi, thanks for the useful repo. I tried the code on a single spiral image. After 10k training steps, the model still doesn't converge. How could this happen? And why do the artifacts occur around the border?
I am showing the results (left) and the training image (right) below.
image

I'm using the default setting for the loss and the lr.

How to reduce time steps' computation time working on Google Colab

Hi,
first of all, thanks for this repository and for your work. I'm a student working with this code for a university project. My aim is to replicate the results reported in this paper using the same parameters.
I tried to run the code on Colab, but the computation time for each step is really huge (e.g. 1000 steps in 4 hours); how can I reduce the computation time for each step? Is there some optimization I can do to speed up the training?
Any suggestions (also from other users, not only the creator of this repo) would be appreciated. Thank you!

PS: I'm using CIFAR-10 as the dataset

How long does it take to train?

Thanks for sharing your clean implementation.

I tried it on the CelebA dataset. After 150k steps, the generated images are not as good as claimed in the paper or the flowers you show in the readme.

Is it something to do with the dataset, or do I need more time to train?

image

differences on sampling between the paper and your implementation

Thanks for sharing your code. While reading the code alongside the DDPM paper, I found that

  1. in the paper, the predicted noise is used to compute mean directly, then use the mean to do sampling
  2. in your implementation, the predicted noise is first used to compute x_start, then use x_start and x_t to compute mean, then use the mean to do sampling

I am not sure these two ways are the same in formulation, which confuses me a lot. Can you give me some pointers about this? Thanks.
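For reference, the two routes are algebraically identical. Substituting the predicted $\hat{x}_0 = \frac{1}{\sqrt{\bar\alpha_t}}\bigl(x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta\bigr)$ into the posterior mean used by the implementation gives

$\tilde\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,\hat{x}_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t = \frac{1}{\sqrt{\alpha_t}}\Bigl(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta\Bigr),$

which is exactly the mean in the paper's sampling step; the practical difference is only the optional clamping of $\hat{x}_0$.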

self.update_ema_every set but never used

In denoising_diffusion_pytorch/denoising_diffusion_pytorch.py
class Trainer takes an argument update_ema_every = 10, and sets self.update_ema_every = update_ema_every but then this is not used. self.ema.update() is called every step.

Suggested fix:

if self.step != 0 and self.step % self.update_ema_every == 0:
    self.ema.update()

EDIT: I see ema_pytorch can take in update_every, so a better fix is to switch self.ema = EMA(diffusion_model, beta = ema_decay) with self.ema = EMA(diffusion_model, beta = ema_decay, update_every=self.update_ema_every).

Not sure if this is just a temporary issue due to the EMA bits being partly moved to ema_pytorch but figured I should point it out in case it got missed :)

How should I resume training?

I just started training this model, but I noticed that it takes an enormous number of hours.
According to the README, checkpoints are saved while training, but I don't know how to use them to resume the training process.
Could you tell me how to restart training?

Question about p2_loss_weight

Thank you for such great work.

I have a question about p2_loss_weight here.

# calculate p2 reweighting
register_buffer("p2_loss_weight", (p2_loss_weight_k + alphas_cumprod / (1 - alphas_cumprod)) ** -p2_loss_weight_gamma)

It says it followed the original paper.
Eq. 5 of the paper is: $\mathcal E_{x_{0,\epsilon}}\left[\frac{\beta_t}{(1-\beta_t)(1-\bar{\alpha}_t)}\right]$.
But your implementation is: $\left(\frac{\bar{\alpha}}{1-\bar{\alpha}}\right)^{-\gamma}$.

I don't understand how you derived the formula used in the implementation.

Thank you.
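For reference, the buffer can be read in terms of the signal-to-noise ratio: with $\mathrm{SNR}(t) = \bar\alpha_t / (1 - \bar\alpha_t)$, the registered weight is $w_t = \bigl(k + \mathrm{SNR}(t)\bigr)^{-\gamma}$, i.e. the P2 reweighting factor applied on top of the simple $\epsilon$-prediction loss (where the standard ELBO weighting is already absorbed into the simple objective).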

inference results with noise

Hi, thanks for your excellent project !

I am new to diffusion models and I am training a vanilla diffusion model on a small dataset (100 images). After training for 3000 epochs (around 10k iterations) with an initial learning rate of 5e-5, the model, despite a small training loss, produces images with a lot of noise, like this one:

[noisy sample image]

How can I resolve the problem? Should I train for a longer time or increase the learning rate? Are there any useful tricks to debug this?

Appreciate any help!

Different image resolutions (training and sampling)

assert h == img_size and w == img_size, f'height and width of image must be {img_size}'

Hello,

Do you think it is possible to train a DDPM with images that have different resolutions? And therefore sample from the model some images at different resolutions?

I am currently trying it, but the samples do not seem good. Do you see any changes to the current architecture/training that would address this issue?

Thanks for your help.

Technical question about sampling function

Hi,

I was wondering why every diffusion model implementation uses this specific sampling procedure.
When I take a look at the DDPM paper, they show the sampling algorithm to be:
[image: the sampling algorithm from the DDPM paper]

However, it seems that no implementation follows that directly; instead they take a more complicated route of first predicting the noise, then calculating x_0, then the mean and log-variance, and then constructing x_{t-1} from that.

I implemented the above algorithm while using your codebase:

@torch.no_grad()
def my_sample(self, n):
    x = torch.randn((n, 3, self.image_size, self.image_size)).to(self.device)
    for i in tqdm(reversed(range(1, self.num_timesteps)), position=0):
        t = (torch.ones(n) * i).long().to(self.device)
        predicted_noise = self.denoise_fn(x, t)
        beta = extract(self.betas, t, x.shape)
        alpha_hat = extract(self.alphas_cumprod, t, x.shape)
        alpha = 1. - beta
        if i > 1:
            noise = torch.randn_like(x)
        else:
            noise = torch.zeros_like(x)
        x = 1 / torch.sqrt(alpha) * (x - ((1 - alpha) / (torch.sqrt(1 - alpha_hat))) * predicted_noise) + torch.sqrt(beta) * noise
        x = x.clamp(-1., 1.)
    return x.add(1).mul(0.5)

But the results are just gray images with a bit of shape and colour:
(top is the normal sampling, like your code, bottom is using the above sampling function)
image

Do you have any idea why this kind of sampling does not work?

the sample results are training images

Dear @lucidrains ,

Thanks for the great repo.

I was wondering whether the model is just reproducing training set images.

Here is a sample saved during training. The results look very good, but I found that all the images are training images.

sample-179

How do epochs work with ddpm?

For example, if I set 700k training steps and stop at ~30k, with 15k images, is there a way to tell how many times the training data has been cycled through?
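A back-of-the-envelope conversion from steps to epochs (batch size and gradient accumulation are assumptions here; use your actual Trainer settings):

dataset_size = 15_000
train_batch_size = 32            # assumed
gradient_accumulate_every = 2    # assumed
steps = 30_000

images_seen = steps * train_batch_size * gradient_accumulate_every
epochs = images_seen / dataset_size
print(epochs)   # 30_000 * 64 / 15_000 = 128 passes over the data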

Unet's "Groups" is unused

It seems that "groups" for the Unet aren't used. It's not passed to the ResNetBlock modules nor the Block in the final_conv.

How to compute Negative Log Likelihood?

Hi everyone,
Is there a way to compute the negative log likelihood using the function discretized_gaussian_log_likelihood(x, *, means, log_scales, thres = 0.999) defined in the learned_gaussian_diffusion file? What are the parameters of the function, and how do I get them after training the model?

non-square images

The images must be square, as image_size determines both the height and width. But what if the dataset is not square, like CelebA or the Fashion Product Images dataset?

Thanks

Outputs fixed in [-1,1] range

All of my outputs from the diffusion model, following the code snippets, are bounded between -1 and 1. Is this meant to be the case? I couldn't find anything obvious in this repo that would cause that.

My images are all square, matching the image size passed into the GaussianDiffusion model, and pixel values are between 0-255. Whenever I try to display the images returned by model.sample(...) I get what looks like noise, and I also noticed that during training my loss is stuck around 0.8 after 2.5k iterations (image size: 128, batch: 32, L1 loss, lr: 0.0001, time_steps: 4000 instead of the default 1000; results were similar with the default time steps):
[screenshot: training loss stuck around 0.8]
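If the samples really are in [-1, 1] (the model works internally in that range), mapping them back for display is straightforward; a hedged sketch:

import torch

images = diffusion.sample(batch_size = 16)
images = (images.clamp(-1., 1.) + 1.) * 0.5              # now in [0, 1]
images_uint8 = (images * 255).round().to(torch.uint8)    # now in [0, 255] for display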

Load faster

It seems that the image loading is slow.
For example, I think the code should load the training images in parallel.
Current:

self.dl = cycle(data.DataLoader(self.ds, batch_size = train_batch_size, shuffle=True, pin_memory=True))

After:
self.dl = cycle(data.DataLoader(self.ds, batch_size = train_batch_size, num_workers=os.cpu_count(), shuffle=True, pin_memory=True))

May I make a pull request?

ConvNext appears to look worse, concerned

I'm training a model on illustrations with the new ConvNext modules and it's not giving me the results I expected from training with the same parameters and data before the ConvNext change. Previously it used to create lines and shapes of solid color, but now no smooth lines appear in the output; it looks like fuzzy scribbles instead.

Previous version without ConvNext: https://i.imgur.com/qeF0dNn.png
Current version with ConvNext: https://i.imgur.com/S3ierK5.png

These outputs are typical for the old and new code when I trained different models with differing parameters. Without ConvNext I get lines and shapes, with ConvNext I get lots of fuzzy but hard lines/edges.

It seems like it may very slowly be becoming less fuzzy over time, but considering the amount of time that has to go into training, I want to be sure that it's definitely going to work and that these changes have actually produced good results.

Any suggestion to generate 1-D vectors using diffusion model

Recently I have been trying to use the diffusion model for a 1-D vector generation task, such as generating sentence embeddings originally produced by BERT. I have some questions about it:

  1. The diffusion model is mainly used to generate images; is it possible to use it to generate 1-D vectors?
  2. If it is possible, which part should I modify to do the training?
  3. If it is impossible, what is the problem?

Thanks for your suggestions.
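(The package now includes a 1D variant, documented in the 1D Sequence section above.) A hedged sketch for vector data, reshaping a 768-dim embedding into 32 channels x 24 positions; the split and the [0, 1] scaling follow the 1D usage example above and are illustrative only:

import torch
from denoising_diffusion_pytorch import Unet1D, GaussianDiffusion1D

# stand-in for real embeddings, pre-scaled to [0, 1]; a 768-dim vector
# is reshaped to (channels = 32, length = 24) purely for illustration
embeddings = torch.rand(256, 768)
training_seq = embeddings.view(256, 32, 24)

model = Unet1D(dim = 64, dim_mults = (1, 2, 4, 8), channels = 32)

diffusion = GaussianDiffusion1D(
    model,
    seq_length = 24,
    timesteps = 1000,
    objective = 'pred_v'
)

loss = diffusion(training_seq)
loss.backward()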

About recent code commit updates

Hello, I downloaded the code from your latest commit. I feel that the reconstruction quality is much worse than with the code I downloaded in March. Have you observed the same phenomenon?
Thanks a lot!

No attention

It seems that even though you implemented a LinearAttention module, you are not actually using it, since every time you instantiate it you use

Residual(Rezero(LinearAttention(mid_dim)))

where the Rezero module discards its input parameter, resulting in an identity operation.

Possible bug in cosine_beta_schedule

Thanks for all the work you shared.

I found a possible bug: the cosine_beta_schedule function should be

def cosine_beta_schedule(timesteps, s=0.008):
  steps = timesteps + 1
  x = torch.linspace(0, timesteps, steps)
  alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
  alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
  betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
  
  return torch.clip(betas, 0, 0.999)

Now the function is:

def cosine_beta_schedule(timesteps, s = 0.008):
    steps = timesteps + 1
    x = torch.linspace(0, steps, steps)
    alphas_cumprod = torch.cos(((x / steps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clip(betas, 0, 0.999)

Loss size

Hi!,
What loss values (L1 and L2) should I expect for a properly trained model? And which loss usually performs better?

Saw you changed ResNet to ConvNext

I was recently thinking of doing the same, but was wondering if you have tried it and seen any actual benefit.
I am currently using the U-Net architecture from OpenAI's improved/guided diffusion as well.

Thanks

Code for sample.png

Could you please provide the code with which "sample.png" was generated (with the dataset path)?
Or, even better, sample code with MNIST.
Thanks

Sampling

Hey there,

After we have trained the model, I would sample images with

sampled_images = diffusion.sample(128, batch_size = 750).

My question is: do we get new, unique images every time we execute the above line of code? I.e., is the first batch of 750 images different from the second time I sample?

Best
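Since each call to sample starts from freshly drawn Gaussian noise, the batches will differ between calls unless the random seed is fixed; a minimal sketch (using the current sample signature, and assuming all randomness comes from PyTorch's global generator):

import torch

batch_a = diffusion.sample(batch_size = 4)
batch_b = diffusion.sample(batch_size = 4)   # different images from batch_a

torch.manual_seed(0)
batch_c = diffusion.sample(batch_size = 4)
torch.manual_seed(0)
batch_d = diffusion.sample(batch_size = 4)   # identical to batch_c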

Sampled images are often mirrored across y-axis despite training images being oriented in same direction

Is there something in the repository that treats mirror images about the y-axis as equivalent? I am wondering if this is just an artifact of how the checkpoint sample-##.png stacks the images, or if those are the real outputs.

Reproducer

I'm using ~16k grayscale training images that are $64 \times 64$ pixels.

conda create -n xtal2png-ddpm python==3.9.*
conda activate xtal2png-ddpm
pip install denoising_diffusion_pytorch xtal2png
from os import path

import torch
from denoising_diffusion_pytorch import GaussianDiffusion, Trainer, Unet
from mp_time_split.core import MPTimeSplit

from xtal2png.core import XtalConverter

mpt = MPTimeSplit()
mpt.load()

fold = 0
train_inputs, val_inputs, train_outputs, val_outputs = mpt.get_train_and_val_data(fold)

data_path = path.join("data", "preprocessed", "mp-time-split")
xc = XtalConverter(save_dir=data_path)
xc.xtal2png(train_inputs.tolist())

model = Unet(dim=64, dim_mults=(1, 2, 4, 8), channels=1).cuda()

diffusion = GaussianDiffusion(
    model, channels=1, image_size=64, timesteps=1000, loss_type="l1"
).cuda()

trainer = Trainer(
    diffusion,
    data_path,
    image_size=64,
    train_batch_size=32,
    train_lr=2e-5,
    train_num_steps=700000,  # total training steps
    gradient_accumulate_every=2,  # gradient accumulation steps
    ema_decay=0.995,  # exponential moving average decay
    amp=True,  # turn on mixed precision
)

trainer.train()

sampled_images = diffusion.sample(batch_size=100)

Training examples

image

Notice how they're all oriented with the small square of zeros at the top-left (this is the case for all training data).

sample-25.png

loss: 0.0535:   4%|  | 25507/700000 [4:26:37<94:32:36,  1.98it/s]

Notice how many of them are mirrored about the y-axis, which is not desired. If it's just an artifact of how the images are stacked, that's one thing - but if it's the actual sampled images it's a bit worrisome. Note also that it never seems to do an x-axis mirror with the small square of zeros at the bottom-right or bottom-left.

sample-25

cc my labmate @hasan-sayeed who has also been working on this

Question about DDPM: Meaning of L_{0} (in loss) , and how it gets simplified into L_{simple}?

What does the last term of the loss, -log p(x_0 | x_1), mean? It seems similar to the log-likelihood of a single data point in a VAE's ELBO. If that's what it is, I'm puzzled how to interpret the conditional on x_1.

Also, the paper's authors mention that this L_{0} term is included in the L_{simple} loss that they used, saying:

The t = 1 case corresponds to L0 with the integral in the discrete decoder definition (13) approximated by the Gaussian probability density function times the bin width, ignoring $\sigma_1^2$ and edge effects.

How does this correspondence work?
