
k-diffusion's Introduction

k-diffusion


An implementation of Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al., 2022) for PyTorch, with enhancements and additional features, such as improved sampling algorithms and transformer-based diffusion models.

Hourglass diffusion transformer

k-diffusion contains a new model type, image_transformer_v2, that uses ideas from Hourglass Transformer and DiT.

Requirements

To use the new model type you will need to install custom CUDA kernels:

  • NATTEN for the sparse (neighborhood) attention used at low levels of the hierarchy. There is a shifted window attention version of the model type which does not require a custom CUDA kernel, but it does not perform as well and is slower to train and run inference with.

  • FlashAttention-2 for global attention. It will fall back to plain PyTorch if it is not installed.

Also, you should make sure your PyTorch installation is capable of using torch.compile(). It will fall back to eager mode if torch.compile() is not available, but it will be slower and use more memory in training.
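A quick way to check whether these optional components are importable (a sketch; natten and flash_attn are, to the best of my knowledge, the usual module names for NATTEN and FlashAttention-2):

import torch

# Check whether torch.compile is available (PyTorch 2.0+).
print("torch.compile available:", hasattr(torch, "compile"))

# Check the optional custom-kernel packages.
for name in ("natten", "flash_attn"):
    try:
        __import__(name)
        print(f"{name}: installed")
    except ImportError:
        print(f"{name}: not installed")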

Usage

Demo

To train a 256x256 RGB model on Oxford Flowers without installing custom CUDA kernels, install Hugging Face Datasets:

pip install datasets

and run:

python train.py --config configs/config_oxford_flowers_shifted_window.json --name flowers_demo_001 --evaluate-n 0 --batch-size 32 --sample-n 36 --mixed-precision bf16

If you run out of memory, try adding --checkpointing or reducing the batch size. If you are using an older GPU (pre-Ampere), omit --mixed-precision bf16 to train in FP32. It is not recommended to train in FP16.

If you have NATTEN installed and working (preferred), you can train with neighborhood attention instead of shifted window attention by specifying --config configs/config_oxford_flowers.json.

Config file

In the "model" key of the config file:

  1. Set the "type" key to "image_transformer_v2".

  2. The base patch size is set by the "patch_size" key, like "patch_size": [4, 4].

  3. Model depth for each level of the hierarchy is specified by the "depths" config key, like "depths": [2, 2, 4]. This constructs a model with two transformer layers at the first level (4x4 patches), followed by two at the second level (8x8 patches), followed by four at the highest level (16x16 patches), followed by two more at the second level, followed by two more at the first level.

  4. Model width for each level of the hierarchy is specified by the "widths" config key, like "widths": [192, 384, 768]. The widths must be multiples of the attention head dimension.

  5. The self-attention mechanism for each level of the hierarchy is specified by the "self_attns" config key, like:

    "self_attns": [
        {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
        {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
        {"type": "global", "d_head": 64},
    ]

    If not specified, all levels of the hierarchy except for the highest use neighborhood attention with 64 dim heads and a 7x7 kernel. The highest level uses global attention with 64 dim heads. So the token count at every level but the highest can be very large.

  6. As a fallback if you or your users cannot use NATTEN, you can also train a model with shifted window attention at the low levels of the hierarchy. Shifted window attention does not perform as well as neighborhood attention and is slower to train and run inference with, but it does not require custom CUDA kernels. Specify it like:

    "self_attns": [
        {"type": "shifted-window", "d_head": 64, "window_size": 8},
        {"type": "shifted-window", "d_head": 64, "window_size": 8},
        {"type": "global", "d_head": 64},
    ]

    The window size at each level must evenly divide the image size at that level. Models trained with one attention type must be fine-tuned to be used with a different type.
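Putting the keys above together, a minimal "model" block can be assembled as follows (a sketch using only the keys documented here; the remaining keys, such as input size, channels, and the sigma schedule, should be copied from an existing config under configs/):

import json

# Minimal "model" section for an image_transformer_v2 config, built only from the
# keys documented above. The rest of the config (dataset, optimizer, etc.) follows
# the existing files under configs/.
model_config = {
    "type": "image_transformer_v2",
    "patch_size": [4, 4],
    "depths": [2, 2, 4],
    "widths": [192, 384, 768],
    "self_attns": [
        {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
        {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
        {"type": "global", "d_head": 64},
    ],
}

print(json.dumps({"model": model_config}, indent=4))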

Inference

TODO: write this section

Installation

k-diffusion can be installed via PyPI (pip install k-diffusion) but it will not include training and inference scripts, only library code that others can depend on. To run the training and inference scripts, clone this repository and run pip install -e <path to repository>.

Training

To train models:

$ ./train.py --config CONFIG_FILE --name RUN_NAME

For instance, to train a model on MNIST:

$ ./train.py --config configs/config_mnist_transformer.json --name RUN_NAME

The configuration file allows you to specify the dataset type. Currently supported types are "imagefolder" (finds all images in that folder and its subfolders, recursively), "cifar10" (CIFAR-10), "mnist" (MNIST), and "huggingface" (Hugging Face Datasets).

Multi-GPU and multi-node training is supported with Hugging Face Accelerate. You can configure Accelerate by running:

$ accelerate config

then running:

$ accelerate launch train.py --config CONFIG_FILE --name RUN_NAME

Enhancements/additional features

  • k-diffusion supports a highly efficient hierarchical transformer model type.

  • k-diffusion supports a soft version of Min-SNR loss weighting for improved training at high resolutions with fewer hyperparameters than the loss weighting used in Karras et al. (2022).

  • k-diffusion has wrappers for v-diffusion-pytorch, OpenAI diffusion, and CompVis diffusion models allowing them to be used with its samplers and ODE/SDE.

  • k-diffusion implements DPM-Solver, which produces higher quality samples at the same number of function evaluations as Karras Algorithm 2, as well as supporting adaptive step size control. DPM-Solver++(2S) and (2M) are implemented now too for improved quality with low numbers of steps (a minimal usage sketch follows this list).

  • k-diffusion supports CLIP guided sampling from unconditional diffusion models (see sample_clip_guided.py).

  • k-diffusion supports log likelihood calculation (not a variational lower bound) for native models and all wrapped models.

  • k-diffusion can calculate, during training, the FID and KID vs the training set.

  • k-diffusion can calculate, during training, the gradient noise scale (1 / SNR) from An Empirical Model of Large-Batch Training (https://arxiv.org/abs/1812.06162).
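The samplers share a simple interface: a denoiser callable plus a descending noise schedule. A minimal sampling loop for a wrapped model looks roughly like this (a sketch, not the repository's own inference script; the sigma range is taken from the example configs and should be treated as an assumption for other models):

import torch
import k_diffusion as K

# `denoiser` is any callable mapping (x, sigma) -> denoised image, e.g. a model
# wrapped with one of the classes in k_diffusion.external (see the wrappers bullet above).
@torch.no_grad()
def sample(denoiser, shape, steps=50, sigma_min=1e-2, sigma_max=80., device='cuda'):
    sigmas = K.sampling.get_sigmas_karras(steps, sigma_min, sigma_max, device=device)
    x = torch.randn(shape, device=device) * sigmas[0]  # sigmas are descending, so sigmas[0] == sigma_max
    return K.sampling.sample_dpmpp_2m(denoiser, x, sigmas)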

To do

  • Latent diffusion

k-diffusion's People

Contributors

crowsonkb, johnowhitaker, rom1504, storyicon, tmabraham


k-diffusion's Issues

Question of sampler

Hi, @crowsonkb

I'd like to ask about the roles of your implemented samplers.
sample_heun is a stochastic sampler and sample_lms is a deterministic sampler, right?

I'm sorry if I misunderstand your implementation and Karras' theory.

FID scores from paper

Hello, in the original K-Diffusion paper the authors report FID scores for CIFAR in the low-single-digits range (e.g. 1.8). However, the FID scores from this repo all come out much higher: e.g. 27, 36, 48.

Is the difference due to the FID calculation strategy? It's hard to imagine that the FID is over an order of magnitude off...

Sample many files more efficiently

sample.py's implementation depends on K.evaluation.compute_features(), which generates n samples and returns them all in memory at once. This is not efficient in terms of memory usage. It should instead save a chunk of generated image files and free or reuse the memory, rather than generating and gathering all the samples in memory first.
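For instance, a chunked loop along these lines would keep memory bounded (a sketch; sampler_fn and the output directory are hypothetical placeholders, and torchvision's save_image is just one way to write the files):

import os
import torch
from torchvision.utils import save_image

@torch.no_grad()
def sample_to_disk(sampler_fn, n, batch_size, out_dir="out"):
    """Generate n samples in chunks and write each chunk to disk immediately."""
    os.makedirs(out_dir, exist_ok=True)
    idx = 0
    while idx < n:
        cur = min(batch_size, n - idx)
        x = sampler_fn(cur)  # hypothetical callable returning a (cur, C, H, W) tensor in [-1, 1]
        x = ((x + 1) / 2).clamp(0, 1).cpu()
        for img in x:
            save_image(img, f"{out_dir}/{idx:06}.png")
            idx += 1
        del x  # the finished chunk can be freed before generating the next one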

Example for how to use the grow option?

I am trying to use the progressive growth option but I am getting an error when trying to use it as I think it is supposed to be used:

I have a trained 32x32 checkpoint which I am now trying to grow to a 64x64 one, so I am using the following arguments:

python3 train.py --config configs/config_64x64.json --name chkpt_64_1 --batch-size 100 --grow chkpt_32_2.pth --grow-config configs/config_32x32.json

The config_32x32.json is the default one from the repository, the config_64x64.json is using the additional layers and changed values as mentioned in #9:

#from 32x32:

 "model": {
        "type": "image_v1",
        "input_channels": 3,
        "input_size": [32, 32],
        "patch_size": 1,
        "mapping_out": 256,
        "depths": [2, 4, 4],
        "channels": [128, 256, 512],
        "self_attn_depths": [false, true, true],
        "dropout_rate": 0.05,
        "augment_prob": 0.12,
        "sigma_data": 0.5,
        "sigma_min": 1e-2,
        "sigma_max": 80,
        "sigma_sample_density": {
            "type": "lognormal",
            "mean": -1.2,
            "std": 1.2
        }
    },
    
#from 64x64:

"model": {
        "type": "image_v1",
        "input_channels": 3,
        "input_size": [64, 64],
        "patch_size": 1,
        "mapping_out": 256,
        "depths": [2, 2, 4, 4],
        "channels": [128, 256, 256, 512],
        "self_attn_depths": [false, false, true, true],
        "dropout_rate": 0.05,
        "augment_prob": 0.12,
        "sigma_data": 0.5,
        "sigma_min": 1e-2,
        "sigma_max": 80,
        "sigma_sample_density": {
            "type": "lognormal",
            "mean": -1.2,
            "std": 1.2
        }
    },

But when trying to run train.py I am getting a whole lot of "key missing" and "size mismatch" errors in
inner_model.load_state_dict(old_inner_model.state_dict())
Missing key(s) in state_dict: "inner_model.u_net.d_blocks.1.2.main.0.mapper.weight", "inner_model.u_net.d_blocks.1.2.main.0.mapper.bias", "inner_model.u_net.d_blocks.1.2.main.2.weight", "inner_model.u_net.d_blocks.1.2.main.2.bias", "inn....

So I am wondering whether I am doing something wrong here or if this is just one of those "work in progress" issues.

I suspect that I might rather have to do something that involves patch_size and skip_stages, since those are used in the wrapper, but I have no idea what their function is.

DPM2 ancestral produces odd noisy/sharpened output during the final iteration when used with Stable Diffusion

Hi, I'm cross-posting this issue from AUTOMATIC1111/stable-diffusion-webui#1435 since it seems like may be more relevant here.

Basically, when using DPM2a with the above Stable Diffusion UI, the output becomes "noisy", or looks like it had some kind of over-sharpening filter applied to it, during the final iteration. If you interrupt the image generation before the final pass, the output looks more or less normal.

Here's an example taken from the above issue:
broken
fixed

The top one is the default behavior, the bottom one is one I created by modifying sampler.py to skip the final DPM2 pass (effectively the same result you get when you interrupt the image generation).

Other people in the above issue have said there are similar issues with other samplers as well, but personally I've only confirmed it while using DPM2a.

Config for training other resolutions

Hello and thanks for the implementation of the paper!
I ran the code with the current config and it seems to do very well; how would one go about training a model with images of size 64x64 or 128x128?

Thanks,
Eliahu

Pretrained models of Hourglass Diffusion Transformers

Hourglass Diffusion Transformers (HDiT) are amazing for generating high-resolution images in pixel space. I'm looking forward to the pretrained model release. Will the pretrained model be available soon?

UnboundLocalError: local variable 'h' referenced before assignment

Automatic1111 startup options
--port 7800 --xformers --api --disable-safe-unpickle --skip-install

When using Sampler :
DPM++ 2M SDE
DPM++ 2M SDE Karras

Script X/Y/Z Plot with X defined as
Steps:1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,55,60,65,70,75,80,85,90,95,100

I get this error:

*** Error completing request | 1/2070 [00:02<1:14:01, 2.15s/it]
*** Arguments: ('task(psiv3pussxe3h92)', 'masterpiece, realistic, portrait of a girl, medieval armor, upper body, outdoors, far away castle, metal reflections, intense sunlight, dramatic, cinematic lighting, octane render, unreal engine', 'frame, bad anatomy, hand, hands', [], 20, 9, False, False, 1, 1, 7, 1587979927.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.25, 2, 'A-ArtStation1337-4x-v2', 5, 0, 0, 0, '', '', [], 3, 0, 0, 0, 0, 0.25, False, True, False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x00000231F51E19F0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x00000231F51E31F0>, None, False, '0', 'D:\Apps\sd\sd-webui\repo\models\roop\inswapper_128.onnx', 'CodeFormer', 1, '', 1, 1, False, True, False, False, '', False, False, 'positive', 'comma', 0, 4, '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,46,37,38,49,40,41,42,43,44,45,46,47,48,49,50,55,60,65,70,75,80,85,90,95,100', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, None, False, None, None, False, 50) {}
Traceback (most recent call last):
File "D:\Apps\sd\sd-webui\repo\modules\call_queue.py", line 55, in f
res = list(func(*args, **kwargs))
File "D:\Apps\sd\sd-webui\repo\modules\call_queue.py", line 35, in f
res = func(*args, **kwargs)
File "D:\Apps\sd\sd-webui\repo\modules\txt2img.py", line 54, in txt2img
processed = modules.scripts.scripts_txt2img.run(p, *args)
File "D:\Apps\sd\sd-webui\repo\modules\scripts.py", line 456, in run
processed = script.run(p, *script_args)
File "D:\Apps\sd\sd-webui\repo\scripts\too_std-xyz.py", line 678, in run
processed = draw_xyz_grid(
File "D:\Apps\sd\sd-webui\repo\scripts\too_std-xyz.py", line 317, in draw_xyz_grid
process_cell(x, y, z, ix, iy, iz)
File "D:\Apps\sd\sd-webui\repo\scripts\too_std-xyz.py", line 260, in process_cell
processed: Processed = cell(x, y, z, ix, iy, iz)
File "D:\Apps\sd\sd-webui\repo\scripts\too_std-xyz.py", line 641, in cell
res = process_images(pc)
File "D:\Apps\sd\sd-webui\repo\modules\processing.py", line 620, in process_images
res = process_images_inner(p)
File "D:\Apps\sd\sd-webui\repo\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
File "D:\Apps\sd\sd-webui\repo\modules\processing.py", line 739, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "D:\Apps\sd\sd-webui\repo\modules\processing.py", line 992, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "D:\Apps\sd\sd-webui\repo\modules\sd_samplers_kdiffusion.py", line 439, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "D:\Apps\sd\sd-webui\repo\modules\sd_samplers_kdiffusion.py", line 278, in launch_sampling
return func()
File "D:\Apps\sd\sd-webui\repo\modules\sd_samplers_kdiffusion.py", line 439, in
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "D:\Apps\sd\sd-webui\repo\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Apps\sd\sd-webui\repo\repositories\k-diffusion\k_diffusion\sampling.py", line 650, in sample_dpmpp_2m_sde
h_last = h
UnboundLocalError: local variable 'h' referenced before assignment

I fixed it by adding h = None in
\k-diffusion\k_diffusion\sampling.py
at line 623:

old_denoised = None
h = None
h_last = None

Feature Request: Support the "correct" SDE variant of DPM-Solver++

Hi crowsonkb, recently I've been studying the SDE solvers and I found that the current implementation of "DPM++2M SDE" is not exactly the solver of the reverse diffusion SDE.

I've implemented the correct version of SDE-variant DPM++2M, and its performance is quite amazing in DeepFloyd-IF. Could you please check this PR in huggingface diffusers: huggingface/diffusers#3344 and implement it in your own repo? Thank you so much!

Stabilize the sampling of DPM-Solver++2M by a stabilizing trick

Hi Katherine,

Thank you for your great work on supporting DPM-Solver++, and I've found that it has been used in stable-diffusion-webui and has a great performance: AUTOMATIC1111/stable-diffusion-webui#4304. Thank you for your contribution again!

However, the sampling by DPM-Solver++2M with steps <= 10 often suffers from instability issues (the image quality is much worse than DDIM). In my recent experience, I found that it is due to the non-Lipschitzness near t=0.
(In fact, the score function has numerical issues for t near 0, and it has been revealed in many previous papers, such as CLD and SoftTruncation. )

Therefore, in my recent PR to diffusers, I further added a new "stabilizing" trick to reduce such instability by using lower-order solvers at the final steps (e.g., for 2nd-order DPM-Solver++, I used DPM-Solver++2M at the first N-1 steps and DDIM at the final step.) I find it can greatly stabilize the sampling by DPM-Solver++2M. Please check this PR for details:
huggingface/diffusers#1132

Excuse me for my frequent issues, but could you please further support this "stabilizing" trick in k-diffusion, so that other projects, such as stable-diffusion-webui can further support it? Thank you very much!
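For illustration, the trick can be approximated without modifying k-diffusion by composing two existing samplers so that the final step runs at first order (a sketch; the helper name is made up here, and sigmas is the usual descending schedule ending in 0):

import k_diffusion as K

def sample_dpmpp_2m_stabilized(model, x, sigmas, extra_args=None):
    """DPM-Solver++(2M) for all but the final step, then one first-order step.

    With the default s_churn=0 the final Euler step is deterministic, playing the
    role of the DDIM step described above (a sketch, not the diffusers version).
    """
    x = K.sampling.sample_dpmpp_2m(model, x, sigmas[:-1], extra_args=extra_args)
    return K.sampling.sample_euler(model, x, sigmas[-2:], extra_args=extra_args)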

CUBLAS_STATUS_ALLOC_FAILED error

I'm getting an error from the text transformer component when trying to run the CLIP guided generation. Any ideas as to how I might approach debugging here?

CUDA_LAUNCH_BLOCKING=1 python sample_clip_guided.py "depth map image of intricate floral pattern" --config configs/config_256x256_depth.json --checkpoint depth_00100000.pth --batch-size 1 --clip-model RN50x4 2> error.txt
  File "/home/kevin/src/k-diffusion/sample_clip_guided.py", line 133, in <module>
    main()
  File "/home/kevin/src/k-diffusion/sample_clip_guided.py", line 99, in main
    target_embed = F.normalize(clip_model.encode_text(clip.tokenize(args.prompt, truncate=True).to(device)).float())
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/clip/model.py", line 348, in encode_text
    x = self.transformer(x)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/clip/model.py", line 203, in forward
    return self.resblocks(x)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/clip/model.py", line 190, in forward
    x = x + self.attention(self.ln_1(x))
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/clip/model.py", line 187, in attention
    return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1153, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/functional.py", line 5066, in multi_head_attention_forward
    q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
  File "/home/kevin/anaconda3/envs/k-diffusion/lib/python3.10/site-packages/torch/nn/functional.py", line 4745, in _in_projection_packed
    return linear(q, w, b).chunk(3, dim=-1)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Req: TQDM trange() update frequency / miniters option

I'm having a problem in Colab where the cell becomes unresponsive and can't be stopped if left running for over 10 minutes. Because I can reload the page and suddenly see a lot of new images at once, I think there's a backlog of cell outputs that weren't displayed yet, and because they weren't displayed I couldn't stop the cell.

The only possible solution to this I can find is the miniters argument for TQDM, where a value above zero will skip that many iterations between updates to the output progress bar.

I don't know of a good way to implement this option that wouldn't just stick it in the sampler arguments, but maybe that would be fine?
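As a stopgap, and assuming sampling.py imports trange at module level (which is how it looks from the samplers' code), the name the samplers use can be monkeypatched from the notebook:

from functools import partial
from tqdm.auto import trange

import k_diffusion.sampling as sampling

# Update the progress bar at most once every 10 iterations instead of every step.
sampling.trange = partial(trange, miniters=10)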

Seek for help about reconstructing the denoiser and sample function

Hello, I am trying to use the sampler with a custom OpenAI model, so I reconstructed the Denoiser and sample function as below, but it seems to produce wrong output, such as a nearly all-yellow image after decoding. So I wonder whether there is something wrong with my usage.
Could you have a look at my code when you have time? Thank you.

Below is my code.
In the code below, input always means a dict containing x and timesteps.

class NewOpenAIDenoiser(OpenAIDenoiser):
    def __init__(self, model, diffusion, quantize=False, has_learned_sigmas=True, device='cpu'):
        super().__init__(model, diffusion, quantize, has_learned_sigmas, device)
    
    def forward(self, input, sigma, **kwargs):
        c_out, c_in = [k_diffusion.utils.append_dims(x, input['x'].ndim) for x in self.get_scalings(sigma)]
        temp_input = input
        temp_input['x'] = input['x'] * c_in
        temp_input["timesteps"] = self.sigma_to_t(sigma)
        eps = self.get_eps(temp_input, **kwargs)
        return input['x'] + eps * c_out
    
    def get_eps(self, *args, **kwargs):
        model_output = self.inner_model(*args, **kwargs)
        if self.has_learned_sigmas:
            return model_output.chunk(2, dim=1)[0]
        return model_output

class KDiffusionSampler(object):
    def __init__(self, funcname, diffusion, model) -> None:
        super().__init__()
        denoiser = NewOpenAIDenoiser
        
        self.diffusion = diffusion
        self.device = diffusion.betas.device
        self.model_wrap = denoiser(model, diffusion, device=self.device, has_learned_sigmas=False)
        self.funcname = funcname
        self.func = getattr(self, funcname)
        self.extra_params = sampler_extra_params.get(funcname, [])
        
        self.sampler_noises = None
        self.eta = None
        self.last_latent = None
        
        self.config = None

        self.total_steps = 0

    def launch_sampling(self, steps, func):
        self.total_steps = steps
        
        return func()

    def initialize(self):
        self.eta = 1.
        
        extra_params_kwargs = {}
                
        if 'eta' in inspect.signature(self.func).parameters:
            extra_params_kwargs['eta'] = self.eta
            
        return extra_params_kwargs
    
    def get_sigmas(self, steps):
        discard_next_to_last_sigma = self.config is not None and self.config.get('discard_next_to_last_sigma', False)
        
        steps += 1 if discard_next_to_last_sigma else 0
        
        sigmas = self.model_wrap.get_sigmas(steps)
            
        if discard_next_to_last_sigma:
            sigmas = torch.cat([sigmas[:-2], sigmas[-1:]], dim=0)
        
        return sigmas

    def sample(self, steps, shape, input):
        
        h = input['x']
        if h is None:
            h = torch.randn(shape, device=self.device)
        steps = steps
        
        sigmas = self.get_sigmas(steps)
        
        h = h * sigmas[0]
        input['x'] = h
        
        extra_params_kwargs = self.initialize()
        parameters = inspect.signature(self.func).parameters
        
        if 'sigma_min' in parameters:
            extra_params_kwargs['sigma_min'] = self.model_wrap.sigmas[0].item()
            extra_params_kwargs['sigma_max'] = self.model_wrap.sigmas[-1].item()
            if 'n' in parameters:
                extra_params_kwargs['n'] = steps
        else:
            extra_params_kwargs['sigmas'] = sigmas
        
        self.last_latent = h
        samples = self.launch_sampling(
            steps, 
            lambda: self.func(
                self.model_wrap,
                input,
                **extra_params_kwargs
            )
        )
        
        return samples

    @torch.no_grad()
    def sample_euler(self, model, input, sigmas, extra_args=None, callback=None, disable=None, s_churn=0., s_tmin=0., s_tmax=float('inf'), s_noise=1.):
        """Implements Algorithm 2 (Euler steps) from Karras et al. (2022)."""
        x = input['x']
        extra_args = {} if extra_args is None else extra_args
        s_in = x.new_ones([x.shape[0]])
        for i in trange(len(sigmas) - 1, disable=disable):
            gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
            eps = torch.randn_like(x) * s_noise
            sigma_hat = sigmas[i] * (gamma + 1)
            if gamma > 0:
                x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
            input['x'] = x
            denoised = model(input, sigma_hat * s_in, **extra_args)
            d = to_d(x, sigma_hat, denoised)
            if callback is not None:
                callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigma_hat, 'denoised': denoised})
            dt = sigmas[i + 1] - sigma_hat
            # Euler method
            x = x + d * dt
            with open("./record_{}.txt".format(i), "w") as file:
                for i in range(x.shape[0]):
                    print(x[i], file=file)
                file.close()
        return x

White line augmentation artifact

Thank you for open sourcing this! I tried out your implementation of the non-leaky augmentations. In case it's helpful to you, I noticed that there seem to be some artifacts created in the augmentation pipeline that will probably not help training.

Left is normal. Right has an added white line
image

(I have a different implementation that I built a couple of weeks ago, but I didn't do the non-leaky augmentations then. For what it's worth, I can say that I've also gotten better results with these techniques than with, for example, v-diffusion on small real-world datasets.)

Loading kdiffy.py results in this error

[2022-08-23 02:19:51] scripts/kdiffy.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: Error when calling Cognitive Face API:
status_code: 401
code: 401
message: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

[2022-08-23 02:19:51] scripts/kdiffy.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: img_url:https://raw.githubusercontent.com/Microsoft/Cognitive-Face-Windows/master/Data/detection1.jpg
[2022-08-23 02:19:52] scripts/kdiffy.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: Error when calling Cognitive Face API:
status_code: 401
code: 401
message: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

[2022-08-23 02:19:52] scripts/kdiffy.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: img_url:/data1/mingmingzhao/label/data_sets_teacher_1w/47017613_1510574400_out-video-jzc70f41fa6f7145b4b66738f81f082b65_f_1510574403268_t_1510575931221.flv_0001.jpg
[]
Traceback (most recent call last):
File "scripts/kdiffy.py", line 27, in
from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm.util'; 'ldm' is not a package

Cache FID computation

Computing the Inception features currently takes up to half a minute at the start of the program, so caching them seems obvious.

The only thing I'm not sure about is what to use as the key: probably the hash for HF, path + modification date for an image folder, and a plain string for MNIST/CIFAR?

Stabilizing DPM++2M SDE for SDXL

Hi @crowsonkb, long time no see! I'm opening this issue to discuss potential improvements to sampling methods with SDXL.

As I listed in #43, SDXL with DPM++2M will have apparent artifacts due to the numerical instability, especially for SDE solvers.

One possible way is to let the final step be the first-order solver, e.g., sampling with 5 steps will be [1,2,2,2,1] orders instead of [1,2,2,2,2] orders, as discussed in #43 and I also list more examples in huggingface/diffusers#5541 .

Another possible way is to change the step size scheduler. For example, the Karras step size scheduler you implemented is the most widely used step size scheduler in the community, and it can significantly improve sample quality. Recently I found that the Karras step sizes with $\rho=7$ are closely related to my "uniform logSNR" scheduler, which was proposed in the original DPM-Solver paper.

Specifically, note that the definition of "Karras sigmas" is equivalent to $\alpha_t / \sigma_t = \exp(\lambda_t)$, so the "log sigmas" in Karras' setting are just $\lambda_t$. Moreover, since Karras uses an exponential splitting for sigmas with a hyperparameter $\rho$, we can prove that when $\rho$ goes to infinity, the step sizes are equivalent to uniform $\lambda_t$, because of the definition of the exponential function, $\exp(x) = \lim_{\rho \rightarrow \infty} (1 + \frac{x}{\rho})^{\rho}$. As $\rho=7$ is already quite large, the samples from Karras sigmas and my uniform lambdas are similar when using ODE solvers, and both can reduce the discretization errors.

However, for SDE solvers, Karras' step sizes and my uniform logSNR step sizes are quite different, due to the Gaussian noise added along the trajectory. For example, here is an example for a cat, DPM++2M SDE, steps=25, with SDXL (no refiner):

image

I think the uniform logSNR step size is quite interesting and it can also provide beautiful samples, so it may bring new insights to the community. Could you please also integrate this step size scheduler in your k-diffusion?

The code is quite simple, for example: huggingface/diffusers@892fec9
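For reference, in the EDM/k-diffusion parameterization ($\alpha_t = 1$, so $\lambda_t = -\log \sigma_t$), uniform-logSNR steps reduce to geometrically spaced sigmas, e.g. (a sketch; if I'm not mistaken, k-diffusion's existing get_sigmas_exponential schedule is already equivalent to this):

import math
import torch

def get_sigmas_uniform_logsnr(n, sigma_min, sigma_max, device='cpu'):
    """Uniform steps in lambda = -log(sigma), i.e. geometrically spaced sigmas."""
    lambdas = torch.linspace(-math.log(sigma_max), -math.log(sigma_min), n, device=device)
    sigmas = torch.exp(-lambdas)
    return torch.cat([sigmas, sigmas.new_zeros([1])])  # append the final 0 like the other schedules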

sample_dpmpp_2m has a bug?

Hi,

I've been playing around with the sample_dpmpp_2m sampling and found that swapping one variable changes/fixes blur. I don't know the math formula for this, so I might be wrong. But I think there might be a bug in the code?
Let me know what you think. And if you want me to create a PR for it.

Here are my results

[Feature request] Let user provide his own randn data for samplers in sampling.py

Please add an option for samplers to accept an argument with random data and use that if it is provided.

The reason for this is as follows.

We use samplers in stable diffusion to generate pictures, and we use seeds to make it possible for other users to reproduce results.

In a batch of one image, everything works perfectly: set seed beforehand, generate noise, run sampler, and get the image everyone else will be able to get.

If the user produces a batch of multiple images (which is desirable because it works faster than multiple independent batches), the expectation is that each image will have its own seed and will be reproducible individually outside of the batch. I achieve that for DDIM and PLMS samplers from stable diffusion by preparing the correct random noise according to seeds beforehand, and since those samplers do not have randomness in them, it works well.

Samplers here use torch.randn in a loop, so samples in a batch will get different random data than samples produced individually, which results in different output.

An example of what I want to have:

from

def sample_euler_ancestral(model, x, sigmas, extra_args=None, callback=None, disable=None):
    """Ancestral sampling with Euler method steps."""
    extra_args = {} if extra_args is None else extra_args
    s_in = x.new_ones([x.shape[0]])
    for i in trange(len(sigmas) - 1, disable=disable):
        denoised = model(x, sigmas[i] * s_in, **extra_args)
        sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1])
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
        d = to_d(x, sigmas[i], denoised)
        # Euler method
        dt = sigma_down - sigmas[i]
        x = x + d * dt
        x = x + torch.randn_like(x) * sigma_up
    return x

to

def sample_euler_ancestral(model, x, sigmas, extra_args=None, callback=None, disable=None, user_random_data=None):
    """Ancestral sampling with Euler method steps."""
    extra_args = {} if extra_args is None else extra_args
    s_in = x.new_ones([x.shape[0]])
    for i in trange(len(sigmas) - 1, disable=disable):
        denoised = model(x, sigmas[i] * s_in, **extra_args)
        sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1])
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
        d = to_d(x, sigmas[i], denoised)
        # Euler method
        dt = sigma_down - sigmas[i]
        x = x + d * dt
        x = x + (torch.randn_like(x) if user_random_data is None else user_random_data[i]) * sigma_up
    return x

(difference only in next-to-last line)

[Feature request] Combination of Inpainting and ImgToImg Generation for fine-tuning

Combination of InPainting and ImgToImg Generation for fine tuning:

You would go to InPainting,
select an option to upload a second layer,
then you could cut, trim, rotate, flip, make transparent (with a strength slider), and scale the second layer,
and then place the second layer where you want over the first layer (the picture you use for image-to-image generation).
This way you can add good generations of details (faces, hands, eyes, armor patterns, and so on), i.e. small parts of a picture, to the picture you want to use for generating a totally new one.
With the mask option you would also mask only the positions of the second layer, so the generation would fit the new second layer onto the first without destroying the parts of the first layer you like.

This process would save time and energy to get what you had in mind when you type the prompt.

k-diffusion triggering of torch.compile/torch.multiprocessing leaves multiple child processes

torch.compile triggered here

if not flags.get_use_compile():
    raise RuntimeError
geglu = torch.compile(_geglu)
rms_norm = torch.compile(_rms_norm)

has a very bad side effect of triggering torch.multiprocessing, since it executes on the CPU.
As a result, torch will start as many child processes as there are CPU cores (on my system that's 32 child Python processes).

  • best case: a clean exit from a parent app using Ctrl+C is no longer possible, as KeyboardInterrupt triggers
    a massive traceback (over 200 lines).
  • worst case: they do not exit and become defunct. In that case, they also do not release GPU resources, so a hard reboot is needed. Yes, torch.multiprocessing actually has a warning in its docs that this is a possible scenario.

traceback looks like:

Process ForkProcess-2:
Process ForkProcess-3:
Process ForkProcess-8:
Process ForkProcess-6:
Process ForkProcess-1:
...
KeyboardInterrupt
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^

Simply setting the K_DIFFUSION_USE_COMPILE=0 env variable disables compile and the issue is gone,
but the default behavior is more than suspect; I suggest revisiting this.
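For reference, the workaround can also be applied from Python, as long as it happens before the first k-diffusion import (a sketch; the flag is presumably read at import time, given the module-level torch.compile calls quoted above):

import os

# Disable torch.compile in k-diffusion before the library is imported.
os.environ["K_DIFFUSION_USE_COMPILE"] = "0"

import k_diffusion as K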

FID increases with training

I just ran train_cifar.py on a single GPU and found that the FID increased from 29.68 (50k steps) to 45.16 (260k steps). Any idea why this happens?

Fine-tuning/compatible pre-trained models

Is there currently a way to fine-tune an existing model with k-diffusion? And/or are there any existing large pre-trained models that would be compatible with the patch feature?

sample sigma scheduler bug

I found that k_diffusion.sampling.get_sigmas_karras returns a list that ends with 0,
which can make the last dt (sigma[i+1] - sigma[i]) bigger in magnitude than the others,
for example:

dt: [-1.3240442276000977, 
-1.1708478927612305, 
-1.0327138900756836, 
-0.9084315299987793, 
-0.7968769073486328, 
-0.6969814300537109, 
-0.6077454090118408, 
-0.5282387733459473, 
-0.4575979709625244, 
-0.39501309394836426, 
-0.3397289514541626, 
-0.2910478115081787, 
-0.24832558631896973, 
-0.21096330881118774, 
-0.17840701341629028, 
-0.1501503586769104, 
-0.12572622299194336, 
-0.10470682382583618, 
-0.08670148253440857, 
-0.07135394215583801, 
-0.05834062397480011, 
-0.04736843705177307, 
-0.03817273676395416, 
-0.030515573918819427,
-0.10000000149011612 **********
]

I think it may be a bug.

This issue was found at AUTOMATIC1111/stable-diffusion-webui#2794

Add the newest DPM-Solver

Thanks for your amazing work and interest in DPM-Solver! We have updated DPM-Solver v2.0, which supports four types of diffusion models: noise prediction model, data prediction model, v-prediction model, and score function.

Moreover, we supported both single-step and multi-step versions and the corresponding algorithms for the exponential integrators for both the noise prediction model and the data prediction model.

I'm glad to help if you want to further support our DPM-Solver in this repo :)

UnboundLocalError: local variable 'h' referenced before assignment

An exception occurs when using sd-webui to generate an image.
This is the stack:

dump json failed,data:{'file': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>}
    Traceback (most recent call last):
      File "/app/modules/api/api.py", line 204, in exception_handling
        return await call_next(request)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
        raise app_exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
        response = await self.dispatch_func(request, call_next)
      File "/app/modules/api/api.py", line 151, in log_and_time
        res: Response = await call_next(req)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
        raise app_exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
        response = await self.dispatch_func(request, call_next)
      File "/app/modules/he_ware/run_mode_middle_ware.py", line 28, in dispatch
        response = await call_next(request)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
        raise app_exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
        response = await self.dispatch_func(request, call_next)
      File "/app/modules/he_ware/progress_res_middle_ware.py", line 24, in dispatch
        response = await call_next(request)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
        raise app_exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
dump json failed,data:{'file': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>}
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
        response = await self.dispatch_func(request, call_next)
      File "/app/modules/he_ware/task_state_middle_ware.py", line 52, in dispatch
        response = await call_next(request)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
        raise app_exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
        response = await self.dispatch_func(request, call_next)
      File "/app/venv/lib/python3.10/site-packages/starlette_prometheus/middleware.py", line 57, in dispatch
        raise e from None
      File "/app/venv/lib/python3.10/site-packages/starlette_prometheus/middleware.py", line 53, in dispatch
        response = await call_next(request)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
        raise app_exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 84, in __call__
        await self.app(scope, receive, send)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 24, in __call__
        await responder(scope, receive, send)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 44, in __call__
        await self.app(scope, receive, self.send_with_gzip)
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
        raise exc
      File "/app/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
        await self.app(scope, receive, sender)
      File "/app/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
        raise e
      File "/app/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
        await self.app(scope, receive, send)
      File "/app/venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
        await route.handle(scope, receive, send)
      File "/app/venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
        await self.app(scope, receive, send)
      File "/app/venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
        response = await func(request)
      File "/app/venv/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
        raw_response = await run_endpoint_function(
      File "/app/venv/lib/python3.10/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
        return await run_in_threadpool(dependant.call, **values)
      File "/app/venv/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
        return await anyio.to_thread.run_sync(func, *args)
      File "/app/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
        return await get_asynclib().run_sync_in_worker_thread(
      File "/app/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
        return await future
      File "/app/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
        result = context.run(func, *args)
      File "/app/modules/api/api.py", line 457, in img2imgapi
        processed = process_images(p)
      File "/app/modules/processing.py", line 733, in process_images
        res = process_images_inner(p)
      File "/app/extensions/sd-webui-controlnet/scripts/batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "/app/modules/processing.py", line 868, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "/app/modules/processing.py", line 1529, in sample
        samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
      File "/app/modules/sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/app/modules/sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "/app/modules/sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "/app/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/app/repositories/k-diffusion/k_diffusion/sampling.py", line 651, in sample_dpmpp_2m_sde
        h_last = h
    UnboundLocalError: local variable 'h' referenced before assignment
---

This is the code link:


Changing the code like this avoids the exception, but I'm not sure whether the logic is correct.
img_v3_025f_18961979-25dc-45ff-bf5d-dfd06266caag
The request parameters are attached as the file req.json.

How to cite this?

Dear k-diffusion Maintainers,

I hope this message finds you well. I am currently working on an academic paper that references some of your samplers as a resource for our research. I would like to ensure that we properly attribute your project in our paper, as it has been useful for our work.

In accordance with academic standards, I kindly request information on how you would prefer to be cited in our paper. Specifically, I am seeking guidance on the following details:

  • The recommended citation format for the k-diffusion project.
  • Any specific information or details you would like us to include in the citation.

Please provide this information at your earliest convenience so that we can appropriately acknowledge your project in our research. We appreciate the availability of k-diffusion and want to ensure that your project is appropriately recognized in our paper.

If you have any additional requests or specific formatting preferences for the citation, please do not hesitate to let us know. We are committed to giving proper credit to your project and following your guidelines.

Thank you for your time and for maintaining the K-diffusion project. We look forward to your response.

Conditional *image* generation (img2img)

Hi,

In order to add support for conditional image generation, in addition to embedding the initial image into unet_cond
(extra_args['unet_cond'] = img_cond), what should I put in extra_args['cross_cond'] and extra_args['cross_cond_padding']?

(before the loss calculation in the line: losses = model.loss(reals, noise, sigma, aug_cond=aug_cond, **extra_args))

@crowsonkb
@nekoshadow1
@brycedrennan

Thanks !

Both the FID scores and the visual results are pretty bad on the FFHQ dataset

Hi authors:

Thanks for your work.

The original K-Diffusion paper reports an FID score of about 2 on the FFHQ dataset (64x64), but when we use your code repo, the FID is over 60 and the visual results are very bad after 100 epochs of training with batch size 256. What do you think?
image

Best,
Tutu Iroh Zhao

Accelerate is not working fine

I think there is a bug when using "accelerate launch" to launch train.py with acceleration. If you do not use "accelerate", FID is close to 2 points, but if you use "accelerate", FID reports very high values. Please, could you fix it?

LMS sampler question

Hi Katherine,

Wondering if something is wrong with the LMS sampler. We've had several issues open for quite a long time in the stable-diffusion-webui repo that were never really investigated until recently. DPM2 was resolved a long while back but LMS was not.

DPM2: #43, AUTOMATIC1111/stable-diffusion-webui#5797
LMS: AUTOMATIC1111/stable-diffusion-webui#1973, AUTOMATIC1111/stable-diffusion-webui#7244

Below is an example of a result produced currently by LMS using the default scheduler in the webui. PNG info is embedded in the image to reference if needed.

lms1

I had attempted to make a PR in AUTOMATIC1111/stable-diffusion-webui#12349 to resolve this. The way I did this was by discarding the penultimate sigma and also using the penultimate latent (what the last callback would return for the last step). A naive approach I'm sure, but this is that result:

lms2

@AUTOMATIC1111 reviewed this PR and had suggested the issue stems from the last step combining the previous 4 steps, and suggested a fix.

AUTOMATIC1111/stable-diffusion-webui#12349 (comment)

After going into that function with a debugger, it looks like it combines denoised image from 4 last sampling steps to produce the next denoised image. cur_order variable determines how many previous steps to combine - it starts at 1, and grows to 4, and stays at 4. The last step where it combines four last denoised images into the final result produces those artifacts. I found that making cur_order go back to 1 at the end of sampling instead of staying at 4 improves the result dramatically. The relevant change is from cur_order = min(i + 1, order) to cur_order = min(i + 1, order, len(sigmas) - i - 1).

When instead only applying that change, I get this result:

lms3

Both of these modifications are clearly better than how the current code behaves but neither of us are sure what is correct here. Hoping you could provide some explanation on whether this is expected or not and how it should be properly resolved.

(Missing) Loss weights for the Diffusion Loss?

Hi,

thank you for the great work! I have a general question regarding the loss weighting for the k-diffusion variant after reading the related paper from Karras etl al., (2022):

The loss for training the diffusion model does not have any additional scaling (https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/layers.py#L31), while there is an additional loss weighting in the k-diffusion model, described in Table 1 of the paper:
$$\lambda(\sigma) = (\sigma^{2} + \sigma_{\text{data}}^{2}) / (\sigma \cdot \sigma_{\text{data}})^2$$

Did I miss it somewhere else in the code or is there a reason for not using it?
Thanks!

Edit:

My mistake, it is actually there. The general loss of the diffusion model is defined in Eq. (2) of the paper:
$$\mathbb{E}_{\mathbf{y} \sim p_{\text{data}}}\,\mathbb{E}_{\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})}\,\lVert D(\mathbf{y} + \mathbf{n}; \sigma) - \mathbf{y} \rVert_2^2$$
The loss computation in https://github.com/crowsonkb/k-diffusion/blob/master/k_diffusion/layers.py#L31 does not compute $$D_{\theta}(\mathbf{x}; \sigma) = c_{\text{skip}}(\sigma)\,\mathbf{x} + c_{\text{out}}(\sigma)\,F_{\theta}(c_{\text{in}}(\sigma)\,\mathbf{x}; c_{\text{noise}}(\sigma))$$ and instead uses the inner network output $F_{\theta}(c_{\text{in}}(\sigma)\,\mathbf{x}; c_{\text{noise}}(\sigma))$. With this version the loss becomes, as in Eq. (8) of the paper, an expression in which the weighting term cancels out:

$$\mathbb{E}_{\sigma, \mathbf{y}, \mathbf{n}}\left[\lambda(\sigma)\, c_{\text{out}}(\sigma)^2 \left\lVert F_{\theta}\big(c_{\text{in}}(\sigma)(\mathbf{y} + \mathbf{n}); c_{\text{noise}}(\sigma)\big) - \tfrac{1}{c_{\text{out}}(\sigma)}\big(\mathbf{y} - c_{\text{skip}}(\sigma)(\mathbf{y} + \mathbf{n})\big)\right\rVert^2_{2}\right]$$
So no scaling is needed in the code.
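As a quick numerical sanity check of the cancellation (a sketch using the preconditioning from Table 1 of Karras et al. (2022) and sigma_data = 0.5 as in the example configs):

import torch

sigma_data = 0.5
sigma = torch.logspace(-2, 2, 100)  # sigmas spanning roughly the training range

# Preconditioning and loss weighting from Table 1 of Karras et al. (2022).
c_out = sigma * sigma_data / (sigma ** 2 + sigma_data ** 2).sqrt()
weight = (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2  # lambda(sigma)

# lambda(sigma) * c_out(sigma)^2 == 1, so the weighted loss on D_theta equals an
# unweighted MSE on the raw network output F_theta, as layers.py computes it.
print(torch.allclose(weight * c_out ** 2, torch.ones_like(sigma)))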

Standard FID

I created a nicer wrapper around cleanfid that works like your implementation, with the intention of creating a PR for this repo, but it's not working with your accelerator/multiprocessing.

I'm considering whether I should try to use your code and switch the model to the one they use in cleanfid to try to reproduce the results, but before I spend more time on this I thought I should ask whether you think it's possible to reproduce or whether I will run into issues.

Your implementation looks a lot nicer than cleanfid, so I expect it will be easy to work with at least.

Image inpainting

Hi,

how useful would a script for inpainting and image modification with a pre-trained model be at this stage? Do K-models need some modification to use the basic DDIM inpainting setup?

How to perform a forward process from x_0 to x_t

Hi, @crowsonkb !

I'm confused about the forward process from x_0 to x_t. Could you explain it?
I'd like to implement conditional augmentation in Imagen paper for a super-resolution.
It perturbs x_0 and obtains x_t and t (in your implementation, I guess t means sigma).

I know that your implementation samples sigma randomly here and creates noisy x_t samples here:
noised_input = c_in * (input + noise * utils.append_dims(sigma, input.ndim)), as shown in Eq. 7.
So noised_input is the x_t images created by Karras' forward diffusion process, right?
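For what it's worth, here is how I understand the plain forward perturbation (before the c_in input scaling): just add Gaussian noise scaled by a randomly drawn sigma. A sketch, with the lognormal parameters taken from the example configs above:

import torch

def forward_to_xt(x0, mean=-1.2, std=1.2):
    """Perturb clean images x0 to x_t for a randomly drawn sigma (here t <-> sigma)."""
    sigma = torch.exp(torch.randn(x0.shape[0], device=x0.device) * std + mean)
    noise = torch.randn_like(x0)
    x_t = x0 + noise * sigma.view(-1, *([1] * (x0.ndim - 1)))
    return x_t, sigma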

Is it possible to add any arbitrary loss terms into k-diffusion, e.g. to use lpips, edge stabilization, etc.

I used PyTTI a while back and it was easy to guide the animation into exhibiting various desired properties. If we wanted to keep the composition more stable between each frame, we could convolve the last frame and implement a loss which would attempt to preserve these edges. In Disco Diffusion, lpips is used to keep the image perceptually similar between frames and decrease flickering.

With k-diffusion, I can't figure out how to do this! I thought I might be able to do it similarly to CFGDenoiser, but no dice. I think these things must be possible because you were able to implement CLIP guidance, which as I understand it is a similar challenge, but the way it's implemented looks completely different from PyTTI!

Naturally my skills in DL are very surface level and I can only implement new things by reverse engineering other similar features; unfortunately I still can't make sense of the way CLIP guidance was implemented here well enough to relate it to my goals. It looks like it has to be written as part of the sampling, but it's still a mystery to me how this sampling process works. Any hints to push me in the right direction?

If laymen like me could better understand how to guide the generation process to any desired properties, that would be huge for AI animation! Cheers
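For what it's worth, one pattern that seems to match the spirit of sample_clip_guided.py (as far as I can tell; treat the exact sigma**2 scaling and the helper name as assumptions) is to wrap the denoiser so that an arbitrary differentiable loss on the denoised prediction steers sampling via its gradient:

import torch
import k_diffusion as K

def make_loss_guided_denoiser(model, loss_fn, scale=1.0):
    """Wrap a denoiser so a differentiable loss on its output steers sampling.

    loss_fn(denoised) -> scalar. The gradient of the loss w.r.t. x nudges the
    denoised prediction; the sigma**2 scaling mirrors score-based guidance and
    is an assumption rather than a statement about what the repo does.
    """
    def guided_model(x, sigma, **kwargs):
        with torch.enable_grad():
            x = x.detach().requires_grad_()
            denoised = model(x, sigma, **kwargs)
            grad = torch.autograd.grad(loss_fn(denoised), x)[0]
        return denoised.detach() - grad * K.utils.append_dims(sigma ** 2, x.ndim) * scale
    return guided_model

The wrapped callable can then be passed to any of the samplers in place of the original denoiser, e.g. with an LPIPS distance to the previous frame as loss_fn for animation.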
