yuanzhi-zhu / prolific_dreamer2d

Unofficial implementation of 2D ProlificDreamer

Python 92.18% Shell 6.41% Jupyter Notebook 1.42%

prolific_dreamer2d's Issues

Intuition behind `nerf_init`?

Hi there,

Thank you for providing a good reference implementation of ProlificDreamer for the community. May I ask what the idea behind `args.nerf_init` is for SDS with latent-NeRF?

    if args.nerf_init and args.rgb_as_latents and not args.use_mlp_particle:
        # current only support sds and experimental for only rgb_as_latents==True
        assert args.generation_mode == 'sds'
        with torch.no_grad():
            noise_pred = predict_noise0_diffuser(unet, particles, text_embeddings_vsd, t=999, guidance_scale=7.5, scheduler=scheduler)
        particles = scheduler.step(noise_pred, 999, particles).pred_original_sample

Could you also please share the command for running SDS?

DeepFloyd IF

Could you provide a version that uses DeepFloyd IF? Thank you very much!

Question: About the unet and unet_phi

In this implementation of VSD, the unet is the same as unet_phi. However, in threestudio's implementation, the unets are different.

According to the paper, shouldn't the unets be different? The phi model is initialized the same as unet, but after optimization, they should be quite different.

Question: About the calculation of x0

When save_x0=True, the following code is used to compute x0.

    pred_latents = scheduler.step(noise_pred-noise_pred_phi+noise, t, noisy_latents).pred_original_sample.to(dtype).clone().detach()

May I ask the rationale behind this formula? To my understanding, to compute $\hat{x}_0$ via the pretrained model, one should use:

    pred_latents = scheduler.step(noise_pred, t, noisy_latents).pred_original_sample.to(dtype).clone().detach()

The code I'm referring to is prolific_dreamer2d/prolific_dreamer2d.py, line 458 in e2ffc93:

    pred_latents = scheduler.step(noise_pred-noise_pred_phi+noise, t, noisy_latents).pred_original_sample.to(dtype).clone().detach()

What's the mathematical meaning of calculating `noise_pred-noise_pred_phi+noise`?

Thanks!
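
One possible reading of the expression, offered as a sketch and not as the author's stated rationale. It assumes an epsilon-prediction scheduler whose `pred_original_sample` is `(x_t - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar)`, and that `noisy_latents` was built as `sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise`. Under those assumptions, plugging in `noise_pred - noise_pred_phi + noise` cancels the injected noise and leaves the current latents shifted along the negative VSD direction: `x0 - sqrt((1 - alpha_bar) / alpha_bar) * (noise_pred - noise_pred_phi)`. The scalar values and the helper `pred_x0` below are hypothetical and only check the algebra.

```python
import math

def pred_x0(eps, x_t, alpha_bar):
    """x0 estimate for an epsilon-prediction model (assumed scheduler behavior)."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)

# Illustrative scalar values, not taken from the repository.
alpha_bar = 0.6
x0, noise = 0.8, -0.3
noise_pred, noise_pred_phi = 0.25, 0.10

# Forward noising of the current latents.
x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise

# What the quoted line computes, versus the closed form derived above.
via_step = pred_x0(noise_pred - noise_pred_phi + noise, x_t, alpha_bar)
closed_form = x0 - math.sqrt((1.0 - alpha_bar) / alpha_bar) * (noise_pred - noise_pred_phi)

print(via_step, closed_form)
```

When `noise_pred` equals `noise_pred_phi`, the estimate reduces to the current latents, which fits the idea that the VSD update vanishes when the two models agree.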

Strange artifacts after setting rgb_as_latents to false

Hi @yuanzhi-zhu,

thanks a lot for this awesome implementation of VSD!

I noticed that the variable being optimized (i.e. the “particle”) is Gaussian white noise in latent space (see: https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/prolific_dreamer2d.py#L274), and that the final image is decoded by the VAE of the diffusion model. Comments in the code suggest that pure Gaussian noise in image space results in weird artifacts. I observed similar behavior after setting rgb_as_latents to false (see the attached figure).

I'm wondering what could be the reason for this? Is this some trick from the original paper?

particle-based variational inference

Hello,

Thank you so much for your great work, it helps a lot in understanding the papers and concepts.

However, I am quite confused about the particle-based variational inference in ProlificDreamer and hoped to get some insight from your code, but I could not find where it is implemented. I am not sure I understand it correctly, but it seems theta is represented by the "latent" at line 227. Does that mean particle-based variational inference is not included in the code?

Thanks again for the great work and I look forward to your reply :)

What is the version of diffusers?

The version of diffusers in my environment is 0.14.0, and it does not contain the module diffusers.models.attention_processor. I also checked the newest version, diffusers 0.17.0, and could not find the module there either.

Code question

First, thank you for the great implementation!

https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L242

Could you explain why the LoRA scale "0" depends on lora_v?

Why use DDIM scheduler?

Hi, thanks for uploading your work. I really enjoy it!

I have a question about your code.
In prolific_dreamer2d.py (https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/prolific_dreamer2d.py#L163), you use the DDIM scheduler.
May I ask why you use the DDIM scheduler rather than DDPM or another one?
Thanks!
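
For context, a minimal sketch under the assumption that score distillation uses the scheduler mainly to noise the latents at a sampled timestep. DDIM and DDPM share the same forward marginal for a given beta schedule, so that step depends only on the cumulative alpha products and not on the sampler. The linear schedule, its endpoints, and the helper names below are illustrative and not taken from the repository.

```python
import math

def alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an illustrative linear beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, noise, t, a_bar):
    """Forward marginal q(x_t | x_0); identical for DDIM and DDPM given the same betas."""
    return math.sqrt(a_bar[t]) * x0 + math.sqrt(1.0 - a_bar[t]) * noise

a_bar = alphas_cumprod()
x_t = add_noise(0.5, 1.2, 500, a_bar)
print(a_bar[0], a_bar[-1], x_t)
```

The samplers differ in the reverse update, which only matters if the code also uses the scheduler to draw samples step by step.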

What's the difference between the VSD command line with multiple particles and the VSD command line?

Thanks for your great implementation!

Why is the loss term always 1.0 in VSD?

If I set loss_weight_type to none, the loss always seems to be 1.0, and I am curious why. I looked into the code but still could not find the reason.

Classifier-free guidance

Hi,

Thanks for your awesome implementation. I have one question regarding classifier-free guidance (CFG). Should this line be

    noise_pred = noise_pred_text + guidance_scale * (noise_pred_text - noise_pred_uncond)
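
For comparison, a small sketch of the two parameterizations. The form commonly used for classifier-free guidance anchors on the unconditional prediction, while the line proposed above anchors on the text-conditioned one. Algebraically the proposed form equals the common one with the scale increased by 1. The function names and scalar values are hypothetical, and the repository's actual line is not quoted in this issue.

```python
import math

def cfg_uncond_anchored(noise_pred_uncond, noise_pred_text, guidance_scale):
    """Commonly used CFG form: start from the unconditional prediction."""
    return noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

def cfg_text_anchored(noise_pred_uncond, noise_pred_text, guidance_scale):
    """Form proposed in the question: start from the text-conditioned prediction."""
    return noise_pred_text + guidance_scale * (noise_pred_text - noise_pred_uncond)

# Illustrative scalar predictions.
uncond, text, scale = 0.2, 0.5, 7.5

a = cfg_text_anchored(uncond, text, scale)
b = cfg_uncond_anchored(uncond, text, scale + 1.0)
print(a, b)
```

So the two forms describe the same family of updates, and a given guidance strength corresponds to scale values that differ by 1 between them.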

What does 2d mean?

Thanks for your nice work. What does "2d" mean?

Replacement of SDS loss

I want to try simply replacing DreamFusion's SDS loss with the VSD loss. Can I use the code in this project directly? Since this is a 2D ProlificDreamer, do I need to make any modifications? Please give me some guidance.