
pfgmpp's Introduction

PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

Pytorch implementation of the paper PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

by Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, Tommi S. Jaakkola

[Slide]

Sample generations: CIFAR-10 | FFHQ-64 | LSUN-Church-256

😇 Improvements over PFGM / Diffusion Models:

  • No longer requires the large-batch training target of PFGM, enabling flexible conditional generation and more efficient training!
  • More general $D \in \mathbb{R}^+$ dimensional augmented variable. PFGM++ subsumes PFGM and Diffusion Models: PFGM corresponds to $D=1$ and Diffusion Models correspond to $D\to \infty$.
  • A sweet spot $D^*$ exists in the interior of $(1,\infty)$!
  • Smaller $D$ is more robust than Diffusion Models ( $D\to \infty$ )
  • Enables adjusting the trade-off between model robustness and rigidity!
  • Enables direct transfer of well-tuned hyperparameters from any existing Diffusion Model ( $D\to \infty$ )

Abstract: We present a general framework termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for $N$ dimensional data by embedding paths in $N{+}D$ dimensional space while still controlling the progression with a simple scalar norm of the $D$ additional variables. The new models reduce to PFGM when $D{=}1$ and to diffusion models when $D{\to}\infty$. The flexibility of choosing $D$ allows us to trade off robustness against rigidity as increasing $D$ results in more concentrated coupling between the data and the additional variable norms. We dispense with the biased large batch field targets used in PFGM and instead provide an unbiased perturbation-based objective similar to diffusion models. To explore different choices of $D$, we provide a direct alignment method for transferring well-tuned hyperparameters from diffusion models ( $D{\to} \infty$ ) to any finite $D$ values. Our experiments show that models with finite $D$ can be superior to previous state-of-the-art diffusion models on CIFAR-10/FFHQ $64{\times}64$ datasets, with FID scores of $1.91/2.43$ when $D{=}2048/128$. In class-conditional generation, $D{=}2048$ yields current state-of-the-art FID of $1.74$ on CIFAR-10. In addition, we demonstrate that models with smaller $D$ exhibit improved robustness against modeling errors.

schematic


Outline

Our implementation is built upon the EDM repo. We first provide guidance on how to quickly transfer hyperparameters from well-tuned diffusion models ( $D\to \infty$ ), such as EDM and DDPM, to the PFGM++ family ( $D\in \mathbb{R}^+$ ) in a task/dataset-agnostic way (more details in Sec 4, Transfer hyperparameters to finite $D$, and Appendix C.2 of our paper). We highlight our modifications based on their original command lines for training, sampling and evaluation. We provide checkpoints in the checkpoints section.

We also provide the original setup instructions, such as environment requirements and dataset preparation, from the EDM repo.

Transfer guidance via the $r=\sigma\sqrt{D}$ formula

Below we provide guidance on how to quickly transfer well-tuned hyperparameters of diffusion models ( $D\to \infty$ ), such as $\sigma_{\textrm{max}}$ and $p(\sigma)$, to finite $D$. We adopt the $r=\sigma\sqrt{D}$ formula from our paper for the alignment (cf. Section 4). Please use the following guidance as a prototype.

😀 Please adjust the augmented dimension $D$ according to your task/dataset/model.
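
As a quick illustration of the alignment (a sketch only, using the EDM default noise range $\sigma \in [0.002, 80]$ as an assumed example), the transferred prior scale for a finite $D$ is simply $\sigma\sqrt{D}$:

import numpy as np

# Illustrative only: map assumed EDM-default sigma_min/sigma_max to r via r = sigma*sqrt(D).
sigma_min, sigma_max = 0.002, 80.0
for D in (128, 2048):
    print(f'D={D}: r_min={sigma_min * np.sqrt(D):.3f}, r_max={sigma_max * np.sqrt(D):.1f}')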

Training hyperparameter transfer. The example we provide is a simplified version of loss.py in this repo.

schematic

import numpy as np
import torch

def train(net, y, N, D, P_mean=-1.2, P_std=1.2, pfgmpp=True):
  '''
  net: denoiser network with an EDM-style interface net(x, sigma)
  y: mini-batch of clean images, shape [batch, C, H, W]
  N: data dimension (C*H*W)
  D: augmented dimension
  P_mean, P_std: parameters of the log-normal noise distribution p(\sigma)
  pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case)
  '''

  if not pfgmpp:
    ###################### === Diffusion Model === ######################
    rnd_normal = torch.randn([y.shape[0], 1, 1, 1], device=y.device)
    sigma = (rnd_normal * P_std + P_mean).exp() # sample sigma from p(\sigma)
    n = torch.randn_like(y) * sigma
    D_yn = net(y + n, sigma)
    loss = (D_yn - y) ** 2
    ###################### === Diffusion Model === ######################
  else:
    ###################### === PFGM++ === ######################
    rnd_normal = torch.randn(y.shape[0], device=y.device)
    sigma = (rnd_normal * P_std + P_mean).exp() # sample sigma from p(\sigma)
    r = sigma.double() * np.sqrt(D) # r = sigma\sqrt{D} formula

    # = sample noise from the perturbation kernel p_r = #
    # Sample from the inverse-beta distribution
    samples_norm = np.random.beta(a=N / 2., b=D / 2.,
                                  size=y.shape[0]).astype(np.double)
    inverse_beta = samples_norm / (1 - samples_norm + 1e-8)
    inverse_beta = torch.from_numpy(inverse_beta).to(y.device).double()
    # Sample the radius from p_r(R) by change-of-variables (cf. Appendix B)
    samples_norm = (r * torch.sqrt(inverse_beta + 1e-8)).view(len(samples_norm), -1)
    # Uniformly sample the angular component
    gaussian = torch.randn(y.shape[0], N).to(samples_norm.device)
    unit_gaussian = gaussian / torch.norm(gaussian, p=2, dim=1, keepdim=True)
    # Construct the perturbation
    perturbation_x = (unit_gaussian * samples_norm).float()
    # = sample noise from the perturbation kernel p_r = #

    sigma = sigma.reshape((len(sigma), 1, 1, 1))
    n = perturbation_x.view_as(y)
    D_yn = net(y + n, sigma)
    loss = (D_yn - y) ** 2
    ###################### === PFGM++ === ######################

  return loss
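
For instance, the simplified train() above could be dropped into an ordinary PyTorch training step like the sketch below (hypothetical names; the actual loop lives in training/training_loop.py):

def training_step(net, optimizer, y, D=128, pfgmpp=True):
  # Sketch only: `net` is any denoiser with an EDM-style interface net(x, sigma),
  # e.g. the ncsnpp/ddpmpp wrappers in this repo; `y` is a batch of clean images.
  N = y[0].numel()  # data dimension, e.g. 3*32*32 for CIFAR-10
  loss = train(net, y, N=N, D=D, pfgmpp=pfgmpp).mean()
  optimizer.zero_grad(set_to_none=True)
  loss.backward()
  optimizer.step()
  return loss.item()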

Sampling hyperparameter transfer. The example we provide is a simplified version of generate.py in this repo. As shown in the figure below, the only modification is the prior sampling process. Hence we only include the comparison of prior sampling for diffusion models / PFGM++ in the code snippet.

schematic

def generate(data_size, sigma_max, N, D, pfgmpp=True):
  '''
  data_size: shape of the generated batch, e.g. (batch, C, H, W)
  sigma_max: starting noise level for diffusion models
  N: data dimension (C*H*W)
  D: augmented dimension
  pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case)
  '''
  if not pfgmpp:
    ###################### === Diffusion Model === ######################
    x = torch.randn(data_size) * sigma_max
    ###################### === Diffusion Model === ######################
  else:
    ###################### === PFGM++ === ######################
    r = sigma_max * np.sqrt(D) # r = sigma\sqrt{D} formula
    # Sample from the inverse-beta distribution
    samples_norm = np.random.beta(a=N / 2., b=D / 2.,
                                  size=data_size[0]).astype(np.double)
    inverse_beta = samples_norm / (1 - samples_norm + 1e-8)
    inverse_beta = torch.from_numpy(inverse_beta).double()
    # Sample the radius from p_r(R) by change-of-variables (cf. Appendix B)
    samples_norm = (r * torch.sqrt(inverse_beta + 1e-8)).view(len(samples_norm), -1)
    # Uniformly sample the angular component
    gaussian = torch.randn(data_size[0], N)
    unit_gaussian = gaussian / torch.norm(gaussian, p=2, dim=1, keepdim=True)
    # Construct the prior sample
    x = (unit_gaussian * samples_norm).float().view(data_size)
    ###################### === PFGM++ === ######################

  ########################################################
  # Heun's 2nd order method (aka improved Euler method)  #
  ########################################################
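
For completeness, below is a minimal sketch of the Heun (2nd-order) step referenced above, following the deterministic EDM formulation (no stochastic churn); `net` denotes the trained denoiser and `t_steps` a decreasing noise-level schedule ending at 0. The full version, shared by diffusion models and PFGM++, is in generate.py.

import torch

def heun_sampler(net, x, t_steps):
  # x: prior sample from generate() above; t_steps: decreasing noise levels, last entry 0
  for t_cur, t_next in zip(t_steps[:-1], t_steps[1:]):
    d_cur = (x - net(x, t_cur)) / t_cur            # ODE derivative dx/dsigma at t_cur
    x_next = x + (t_next - t_cur) * d_cur          # Euler step
    if t_next > 0:                                 # 2nd-order (Heun) correction
      d_next = (x_next - net(x_next, t_next)) / t_next
      x_next = x + (t_next - t_cur) * 0.5 * (d_cur + d_next)
    x = x_next
  return x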

Please refer to Appendix C.2 for detailed hyperparameter transfer procedures from EDM and DDPM.

Training PFGM++

You can train new models using train.py. For example:

torchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --name exp_name \
--data=datasets/cifar10-32x32.zip --cond=0 --arch=arch \
--pfgmpp=1 --batch 512 \
--aug_dim aug_dim (--resume resume_path)

exp_name: name of experiments
aug_dim: D (additional dimensions)  
arch: model architectures. options: ncsnpp | ddpmpp
pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case). options: 0 | 1
resume_path: path to the resuming checkpoint

The above example uses the default batch size of 512 images (controlled by --batch) that is divided evenly among 8 GPUs (controlled by --nproc_per_node) to yield 64 images per GPU. Training large models may run out of GPU memory; the best way to avoid this is to limit the per-GPU batch size, e.g., --batch-gpu=32. This employs gradient accumulation to yield the same results as using full per-GPU batches. See python train.py --help for the full list of options.
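
For intuition, the gradient-accumulation behaviour described above amounts to the sketch below (illustrative names only; the actual logic is in training/training_loop.py): summing per-micro-batch gradients before a single optimizer step reproduces the update of one full batch.

def accumulation_step(net, loss_fn, optimizer, data_iterator,
                      batch_size=512, batch_gpu=32, num_gpus=8):
  # Sketch: each GPU processes batch_gpu images per round; after
  # batch_size // (batch_gpu * num_gpus) rounds the accumulated gradients
  # match those of a single full batch of `batch_size` images.
  rounds = batch_size // (batch_gpu * num_gpus)
  optimizer.zero_grad(set_to_none=True)
  for _ in range(rounds):
    images = next(data_iterator)                   # [batch_gpu, C, H, W] on this GPU
    loss = loss_fn(net, images).sum() / batch_size
    loss.backward()                                # gradients accumulate across rounds
  optimizer.step()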

The results of each training run are saved to a newly created directory training-runs/exp_name. The training loop exports network snapshots (training-state-*.pt) at regular intervals (controlled by --dump). The network snapshots can be used to generate images with generate.py, and the training states can be used to resume training later on (--resume). Other useful information is recorded in log.txt and stats.jsonl. To monitor training convergence, we recommend looking at the training loss ("Loss/loss" in stats.jsonl) as well as periodically evaluating FID for training-state-*.pt using generate.py and fid.py.

For the FFHQ dataset, replace --data=datasets/cifar10-32x32.zip with --data=datasets/ffhq-64x64.zip

Sidenote: The original EDM repo provides more datasets: FFHQ, AFHQv2, ImageNet-64. We did not test the performance of PFGM++ on these datasets due to limited computational resources. However, we believe that some finite $D$ values (sweet spots) would beat the diffusion models (the $D\to\infty$ case). Please let us know if you have those results 😀

Generate & Evaluations

  • Generate 50k samples:

    torchrun --standalone --nproc_per_node=8 generate.py \
    --seeds=0-49999 --outdir=./training-runs/exp_name \
    --pfgmpp=1 --aug_dim=aug_dim (--use_pickle=1)(--save_images)
       
    exp_name: name of experiments
    aug_dim: D (additional dimensions)  
    arch: model architectures. options: ncsnpp | ddpmpp
    pfgmpp: use PFGM++ framework, otherwise diffusion models (D\to\infty case). options: 0 | 1. (default:0)
    use_pickle: when the checkpoints are stored in pickle format (.pkl). (default:0)

Note that the numerical value of FID varies across different random seeds and is highly sensitive to the number of images. By default, fid.py will always use 50,000 generated images; providing fewer images will result in an error, whereas providing more will use a random subset. To reduce the effect of random variation, we recommend repeating the calculation multiple times with different seeds, e.g., --seeds=0-49999, --seeds=50000-99999, and --seeds=100000-149999. In the EDM paper, they calculated each FID three times and reported the minimum.

For the FID versus controlled $\alpha$/NFE/quantization, please use generate_alpha.py/generate_steps.py/generate_quant.py for generation.

  • FID evaluation

    torchrun --standalone --nproc_per_node=8 fid.py calc --images=training-runs/exp_name --ref=fid-refs/cifar10-32x32.npz --num 50000 
    
    exp_name: name of experiments

Checkpoints

All checkpoints are provided in this Google Drive folder. We borrow the dataset-specific hyperparameters, e.g., batch size and learning rate, from the EDM repo. Please refer to that repo for hyperparameters if you wish to try more datasets, like ImageNet 64. Some of the checkpoints are in .pkl format (for historical reasons); please use the --use_pickle=1 flag with generate.py for image generation. Please download the checkpoint into the designated ./training-runs/exp_name folder before running the generation command above.

| Model | Checkpoint path | $D$ | FID | Options |
| --- | --- | --- | --- | --- |
| cifar10-ncsnpp-D-128 | pfgmpp/cifar10_ncsnpp_D_128/ | 128 | 1.92 | --cond=0 --arch=ncsnpp --pfgmpp=1 --aug_dim=128 |
| cifar10-ncsnpp-D-2048 | pfgmpp/cifar10_ncsnpp_D_2048/ | 2048 | 1.91 | --cond=0 --arch=ncsnpp --pfgmpp=1 --aug_dim=2048 |
| cifar10-ncsnpp-D-2048-conditional | pfgmpp/cifar10_ncsnpp_D_2048_conditional/ | 2048 | 1.74 | --cond=1 --arch=ncsnpp --pfgmpp=1 --aug_dim=2048 |
| cifar10-ncsnpp-D-inf (EDM) | pfgmpp/cifar10_ncsnpp_D_inf/ | $\infty$ | 1.98 | --cond=0 --arch=ncsnpp |
| ffhq-ddpm-D-128 | pfgmpp/ffhq_ddpm_D_128/ | 128 | 2.43 | --cond=0 --arch=ddpmpp --batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.05 --augment=0.15 --pfgmpp=1 --aug_dim=128 |
| ffhq-ddpm-D-inf (EDM) | pfgmpp/ffhq_ddpm_D_inf/ | $\infty$ | 2.53 | --cond=0 --arch=ddpmpp --batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.05 --augment=0.15 |

Setup instructions from the EDM repo

Requirements

  • Python libraries: See environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
    • conda env create -f environment.yml -n edm
    • conda activate edm
  • Docker users:

Preparing datasets

Datasets are stored in the same format as in StyleGAN: uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information.
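
As a quick check of the expected layout (a sketch under the assumption that the ZIP follows the StyleGAN format described above, with a top-level dataset.json holding a "labels" list), you can peek inside a converted dataset with the standard library:

import json
import zipfile

def inspect_dataset(path='datasets/cifar10-32x32.zip', k=3):
  with zipfile.ZipFile(path) as z:
    names = z.namelist()
    print(f'{len(names)} files, e.g. {names[:k]}')
    if 'dataset.json' in names:
      labels = json.loads(z.read('dataset.json')).get('labels')
      print('first label entries:', labels[:k] if labels else labels)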

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz \
    --dest=datasets/cifar10-32x32.zip
python fid.py ref --data=datasets/cifar10-32x32.zip --dest=fid-refs/cifar10-32x32.npz

FFHQ: Download the Flickr-Faces-HQ dataset as 1024x1024 images and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/ffhq/images1024x1024 \
    --dest=datasets/ffhq-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/ffhq-64x64.zip --dest=fid-refs/ffhq-64x64.npz

AFHQv2: Download the updated Animal Faces-HQ dataset (afhq-v2-dataset) and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/afhqv2 \
    --dest=datasets/afhqv2-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/afhqv2-64x64.zip --dest=fid-refs/afhqv2-64x64.npz

ImageNet: Download the ImageNet Object Localization Challenge and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/imagenet/ILSVRC/Data/CLS-LOC/train \
    --dest=datasets/imagenet-64x64.zip --resolution=64x64 --transform=center-crop
python fid.py ref --data=datasets/imagenet-64x64.zip --dest=fid-refs/imagenet-64x64.npz

pfgmpp's People

Contributors

eltociear, hobbitlong, newbeeer


pfgmpp's Issues

run generate.py error

I can't run the code on Kaggle. It tells me RuntimeError: Distributed package doesn't have NCCL built in. What should I do?

opts.pfgm is not defined

It seems opts.pfgm in train.py Line 162 c.update(rbatch=opts.rbatch, stf=opts.stf, pfgm=opts.pfgm, D=opts.aug_dim, pfgmpp=opts.pfgmpp) is not defined.

How long is the training time?

Hi, Yilun,

Thank you so much for sharing this wonderful work.
I wonder how long the training takes with 8 GPUs under the following config, in your experience:
"
torchrun --standalone --nproc_per_node=2 train.py --outdir=training-runs --name 1st_exp --data=datasets/cifar10-32x32.zip --batch 512 --cond=1 --arch=ncsnpp --aug_dim=2048
"
Thank you.

Guided sampling, and knowledge distillation questions!

This is great work!

Is there any theoretical basis of how a diffusion model could be adapted or distilled into a pfgm model? If I understand correctly, the diffusion model has already captured the data distribution, so could this theoretically be more efficient than training a PFGM from data? I'm particularly interested in DDIM inversion, null text inversion in diffusion models. PFGM++ seems to excel at inversion!

EDIT - I had various other questions but reading the paper again I think you covered them, in the one sample per condition part.

conditional PFGM?

Dear Sir,
Thanks for your great work and sharing the code.
This may be a stupid question but I wonder know if PFGM can implement a conditional version? We may encounter situations where we have to require control input in specific situation, such as generating a high-resolution image given a low-resolution image

What are the options when training on a 256x256 dataset?

Hi, I just want to know what the options are when training on a 256x256 dataset.

Here is my options:

--outdir=C:\Users\ENeS\Desktop\outputs
--name=train_14
--data=C:\Users\ENeS\Desktop\Dataset\1\FFHQ_256\ffhq-train-256x256.zip
--cond=0
--arch=ddpmpp
--pfgmpp=1
--batch=1
--aug_dim=2048
--duration=50
--lr=1e-3
--tick=10
--dump=1
--cres=1,4,4,4
--lsun=True

I just used your dataset_tool.py to convert the FFHQ dataset to 256x256 resolution and used it as the dataset. Considering my GPU, I changed some training details to accelerate training.

However, the generated images are not good (I would say they look as if painted by Picasso rather than by the AI, and even worse). I didn't finish the full training, but I used the converged checkpoint (it is also strange that the loss can go down to 0.01).

Thank you very much!

It seems opts.small is not defined

It seems that opts.small in train.py Line 124/Line 129 is not defined or assigned a value. It gives the following error:

File "pfgmpp/train.py", line 124, in main
if opts.small:
File "pfgmpp/dnnlib/util.py", line 46, in getattr
raise AttributeError(name)
AttributeError: small
python-BaseException

A generate.py bug when loading a trained model

Hi,

When I generate images using my model, I find it outputs nothing.

I debugged the code and found that you didn't pass the option 'network' into the 'main' function.

def main(ckpt, end_ckpt, outdir, subdirs, seeds, class_idx, max_batch_size, save_images, pfgmpp, aug_dim, edm, use_pickle, device=torch.device('cuda'), **sampler_kwargs):

And the 'main' function will search the model checkpoint in the 'outdir' path.
stats = glob.glob(os.path.join(outdir, "training-state-*.pkl"))

Now I put the model checkpoint into the 'outdir' folder, and it works.

However, it may be confusing. Please check it, thank you.

Issues in pfgmpp_toy.ipynb

Y not defined before use:

train_dataset = Gaussain_data(X=X, Y=Y)

Is this ok?

# Integer labels 
Y = torch.cat([torch.zeros(size, dtype=torch.long), 
               torch.ones(size, dtype=torch.long)], dim=0)

Also, replace:

plt.savefig(title +'.png', bbox='tight', dpi=300)

with

plt.savefig(title +'.png', bbox_inches='tight', dpi=300)

whether the pfgmpp algorithm can handle temporal data in a similar manner as the ddpm algorithm

Dear Newbeeer,

I hope this letter finds you in good health and high spirits. I would like to express my utmost admiration for the incredible work you have done. It's truly remarkable!

I am particularly curious about whether the pfgmpp algorithm can handle temporal data in a similar manner as the ddpm algorithm, and produce satisfactory results.

Actually, I wanted to inquire if it would be possible to apply the pfgmpp algorithm in a conditional generation manner on trajectory data in the context of reinforcement learning. It would be fascinating to investigate its effectiveness in generating improved trajectories based on certain conditions.

Your expertise in this field is highly valued, and I would greatly appreciate your insights on both matters. Your guidance and responses are eagerly awaited.

Thank you in advance for your time and consideration.

Warm regards,
HQ

Question regarding `ema` in `training_loop`

Hello,

First of all, thank you for this amazing repo. the framework is quite powerful.

I've been reading your code and have a question. In the training_loop function in training/training_loop.py, what is the purpose of the variable ema? You can find it on line 107 where it is initialized, as well as on line 187 where you update its parameters using the net's weights.

I don't see where it's being used, other than being saved during checkpointing.

Michael

Problem resuming my own checkpoint, and some strange loss behavior

Hi, when I tried to resume my checkpoint, it always failed. However, when I resumed your checkpoint, it was fine.

I configured the environment as you described, but I only use one 3090 Ti to train the model. I use the following training config:
--outdir C:\Users\ENeS\Desktop\outputs --name train_2 --data C:\Users\ENeS\Desktop\Dataset\1\cifar10-32x32.zip --cond 0 --arch ddpmpp --pfgmpp 1 --batch 32 --aug_dim 128

I found this problem because of another problem. At tick 230, kimg=11503.7, the loss was first normal at around 0.2, but it suddenly became very high (over 18, just once) and then settled around 1. The loss went down to around 0.7 before tick 240; after that it stayed around 1 and never went down. So I exited, tried to resume the checkpoint, and found I could not resume.

The error message when I resume my checkpoint is:

Setting up optimizer...
In EDM loss: D:128, N:3072
Loading training state from "C:\Users\ENeS\Desktop\outputs\train_2\training-state-010003.pt"...
Traceback (most recent call last):
File "C:\Users\ENeS\Documents\GitHub\pfgmpp\train.py", line 255, in
main()
File "C:\Users\ENeS\anaconda3\envs\edm\lib\site-packages\click\core.py", line 1128, in call
return self.main(*args, **kwargs)
File "C:\Users\ENeS\anaconda3\envs\edm\lib\site-packages\click\core.py", line 1053, in main
rv = self.invoke(ctx)
File "C:\Users\ENeS\anaconda3\envs\edm\lib\site-packages\click\core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\ENeS\anaconda3\envs\edm\lib\site-packages\click\core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "C:\Users\ENeS\Documents\GitHub\pfgmpp\train.py", line 248, in main
training_loop.training_loop(**c)
File "C:\Users\ENeS\Documents\GitHub\pfgmpp\training\training_loop.py", line 132, in training_loop
misc.copy_params_and_buffers(src_module=data['net'], dst_module=net, require_all=True)
KeyError: 'net'

Process finished with exit code 1

Hope you can help me, thank you very much.
