ailab-cvc / videocrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Home Page: https://ailab-cvc.github.io/videocrafter2/
License: Other
Is there a way to restrict the creation of videos? I would like to prevent not-safe-for-work videos.
(base) ➜ VideoCrafter conda activate lvdm
(lvdm) ➜ VideoCrafter pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.12.1+cu113
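The `+cu113` builds live on PyTorch's CUDA-specific wheel index, and they only exist for Python 3.10 and older; assuming a compatible Python version, pointing pip at that index is the usual fix (command shown as a sketch, verify the index URL against PyTorch's "previous versions" page):

```shell
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
```

If the environment's Python is newer than 3.10, no 1.12.1 wheel exists at all, which also produces this "no matching distribution" error.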
The same default prompt parameters generate significantly different effects during the inference process compared to those on the Discord server. What is the reason for this? Looking forward to your reply.
default is 2s
How to generate growth videos
Hi, thank you for open-sourcing this. The project says it is based on LVDM, but I only see the VDM part in the code; there seems to be no explanation of the autoregressive and hierarchical LVDM for long video generation. When will that part of the code be open-sourced?
I have a Radeon graphics card, and when I run
sh scripts/run_image2video.sh
I have this error
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
When I try to generate a video from an image, running the shell script gives the following error:
(videocrafter) [x@public VideoCrafter]# sh scripts/run_image2video.sh
@CoLVDM Inference: 2023-11-02-11-52-43
Global seed set to 123
AE working on z of shape (1, 4, 64, 64) = 16384 dimensions.
model checkpoint loaded.
[rank:0] 2/2 samples loaded.
[rank:0] batch-1 (1)x1 ...
Traceback (most recent call last):
File "scripts/evaluation/inference.py", line 139, in <module>
run_inference(args, gpu_num, rank)
File "scripts/evaluation/inference.py", line 117, in run_inference
img_emb = model.get_image_embeds(cond_images)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 691, in get_image_embeds
img_token = self.embedder(batch_imgs)
File "/opt/miniconda3/envs/videocrafter/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 341, in forward
z = self.encode_with_vision_transformer(image)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 348, in encode_with_vision_transformer
if self.model.visual.input_patchnorm:
File "/opt/miniconda3/envs/videocrafter/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VisionTransformer' object has no attribute 'input_patchnorm'
What kind of GPU? That's the most important thing.
Thank you very much for your excellent work. I don't see the training code; how can I use the pre-trained T2V model to train my own video-control model?
I'm getting
einops.EinopsError: Error while processing rearrange-reduction pattern "(b h) n d -> b n (h d)".
Input tensor shape: torch.Size([16, 2560, 320]). Additional info: {'h': 5}.
Shape mismatch, can't divide axis of length 16 in chunks of 5
when running the default sh scripts/run_image2video.sh
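This einops error usually means the attention head count implied by the config does not divide the tensor's leading axis, which often points to a config/checkpoint mismatch (e.g. loading one resolution's checkpoint with the other's YAML). A minimal check of the constraint einops enforces (helper name is illustrative, not from the repo):

```python
def heads_divide_batch(bh: int, h: int) -> bool:
    # "(b h) n d -> b n (h d)" requires the leading axis (b*h)
    # to be divisible by the number of heads h
    return bh % h == 0

print(heads_divide_batch(16, 5))  # the failing case: 16 is not divisible by 5
print(heads_divide_batch(16, 8))  # a head count that would work
```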
Hi, can you release your training code?
Getting this error:
Encountered 2 file(s) that may not have been copied correctly on Windows:
models/base_t2v/model_rm_wtm.ckpt
models/base_t2v/model.ckpt
Hello,
I notice that you have only released the inference code for the model. Since this model claims to be open source, do you have plans to release the training/model code as well?
I believe it would help researchers more if they were able to play around with the model architecture code, opening up the possibility of fine-tuning, LoRA training, etc.
The link you provided to download the T2V model is not working anymore. When you try to download the file, Google tells you this:
Sorry, you can't view or download this file at this time.
Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.
The link that does not work anymore: https://drive.google.com/file/d/13ZZTXyAKM3x0tObRQOQWdtnrI2ARWYf_/view?usp=share_link
I checked, and most of the links you are providing for the other models are also exceeding their download quota. The message I get when trying to download the LoRA models is this (can't copy the text, so it's a screenshot):
Missing Text2Video-512-v1 model file.
with gr.Row().style(equal_height=False): in gradio_app.py does not work.
Hi authors, thanks so much for releasing the model.
I ran into an issue when trying to load the checkpoint of the lower-resolution t2v model, i.e. "base_512_v1". When I load it using the provided text2video script, I get
RuntimeError: Error(s) in loading state_dict for LatentDiffusion: Missing key(s) in state_dict: "scale_arr", "model.diffusion_model.fps_embedding.0.weight", ...... ".
F.Y.I. I have no problem loading the other two models, i.e. "base_1024_v1" and "i2v_512_v1".
Thank you again for helping address this issue!
Thanks for opening the code. However, I noticed that some code in VideoCrafter/lvdm/modules/attention.py has a few mistakes.
First, in Line 94 and Line 110, the relative position seems wrong.
Then, in Line 123, I believe that "out_ip = rearrange(out, '(b h) n d -> b n (h d)', h=h)" should be "out_ip = rearrange(out_ip, '(b h) n d -> b n (h d)', h=h)".
Could the authors please check the code and fix these if they are bugs? Otherwise, I hope the authors can explain this code.
Thanks!
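The suspected bug above is a classic copy-paste slip: the result is assigned to `out_ip` but computed from `out`. A toy illustration (plain Python stand-ins, not the repo's tensors) of why that runs without error yet returns the wrong data:

```python
def double_each(xs):
    # stands in for rearrange(): any transformation of its input
    return [2 * x for x in xs]

out = [1, 2, 3]       # output of the base attention branch
out_ip = [10, 20, 30]  # output of the image-prompt branch

buggy = double_each(out)      # mirrors rearrange(out, ...): wrong source
fixed = double_each(out_ip)   # mirrors rearrange(out_ip, ...): intended
print(buggy, fixed)  # → [2, 4, 6] [20, 40, 60]
```

Because both variables have compatible shapes, nothing crashes; the image-prompt contribution is silently replaced by the base branch's values.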
Just installed everything according to your setup. When I try to run the sample script I get the attached errors. So what am I doing wrong? Many thanks
error.txt
I've got the model.ckpt and all the requirements installed through miniconda, but I don't think I can run .sh files.
Hello,
Thank you for this research!
I've tried to run the colab notebook demo that is provided, however it doesn't actually work. I believe there are some issues with the installation section. This carries forward to "Base T2V: Generic Text-to-video Generation", where I get the error:
" File "/content/VideoCrafter/VideoCrafter/scripts/sample_text2video.py", line 8, in <module>
from omegaconf import OmegaConf
ModuleNotFoundError: No module named 'omegaconf'"
Manually installing omegaconf also doesn't work, which suggests there is an issue with how the Python version is set in the installation section.
When I try to create the env, I get the error PackagesNotFoundError:
The following packages are not available from current channels:
I am using macOS; is this only for Linux systems?
Fantastic work! Do you plan on releasing a Colab by any chance?
What are the GPU requirements? Can I run this on an 8 GB card? Thanks
Traceback (most recent call last):
File "/VideoCrafter/scripts/evaluation/inference.py", line 137, in <module>
run_inference(args, gpu_num, rank)
File "/VideoCrafter/scripts/evaluation/inference.py", line 115, in run_inference
img_emb = model.get_image_embeds(cond_images)
File "/VideoCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 691, in get_image_embeds
img_token = self.embedder(batch_imgs)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 341, in forward
z = self.encode_with_vision_transformer(image)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 345, in encode_with_vision_transformer
x = self.preprocess(x)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 326, in preprocess
x = kornia.geometry.resize(x, (224, 224),
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/geometry/transform/affwarp.py", line 570, in resize
input_tmp = kornia.filters.gaussian_blur2d(input_tmp, ks, sigmas)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/gaussian.py", line 38, in gaussian_blur2d
get_gaussian_kernel2d(kernel_size, sigma), dim=0)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/kernels.py", line 541, in get_gaussian_kernel2d
kernel_x: torch.Tensor = get_gaussian_kernel1d(ksize_x, sigma_x, force_even)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/kernels.py", line 418, in get_gaussian_kernel1d
raise TypeError(
TypeError: kernel_size must be an odd positive integer. Got 6
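This kornia error typically appears when the installed kornia version computes an even Gaussian kernel size during antialiased resize; pinning kornia to the version in the repo's requirements usually resolves it. If patching locally, the kernel size simply has to be forced odd before the blur (illustrative helper, not kornia's API):

```python
def odd_kernel(ks: int) -> int:
    # kornia's gaussian_blur2d requires an odd positive kernel size;
    # bump even sizes up by one (a common local workaround)
    return ks if ks % 2 == 1 else ks + 1

print(odd_kernel(6))  # → 7, the failing size from the traceback, made valid
```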
Thanks for your amazing work. Could you please update us on the release date of the second version? I can't find version 2.
Hello there,
Great work, firstly. I am wondering: when you trained the VideoCrafter1 text2video model, did you train the spatial and temporal layers together, or tune only the temporal layers while keeping the spatial layers frozen?
waiting for update! 😁
I made a Dockerfile with the xformers setup and also a version that contains all the models.
The files aren't clean enough yet to make a pull request, but feel free to use them.
The Gradio demo can be run like
docker run --gpus all -it -p 7860:7860 wawa9000/videocrafter:latest-xformers-full python gradio_app.py
Hi, is it possible to generate videos longer than 1 second with VideoCrafter2? If I set the ETA to e.g. 4, the resulting video is just a black screen. Thank you
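For reference, DDIM's eta is a stochasticity knob (0 = deterministic DDIM sampling, 1 = DDPM-like sampling), not a duration setting, so values such as 4 fall outside its meaningful range. A hypothetical sanity check, not part of the repo's scripts:

```python
def check_ddim_eta(eta: float) -> float:
    # eta interpolates between deterministic DDIM (0.0) and
    # DDPM-like sampling (1.0); values outside [0, 1] are invalid
    if not 0.0 <= eta <= 1.0:
        raise ValueError(f"ddim_eta should be in [0, 1], got {eta}")
    return eta

print(check_ddim_eta(1.0))  # the default used in the scripts
```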
Hi, thanks for this amazing work. Will the training code be made public in April as well or is it planned for later down the line
It can't be used on a Mac unless you do a lot of tweaking, which isn't worth the effort in my opinion. Good idea, but not executed properly.
Do you have any good solutions for the flickering issue in generated videos?
Is there any plan to add LORA finetuning? If no, could you provide some simple advice how to do this on one's own?
When I try to run sh scripts/run_image2video.sh, I get AttributeError: 'VisionTransformer' object has no attribute 'input_patchnorm'.
All models have been downloaded and placed in the designated location.
How can I customize the duration of the generated video?
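Clip duration is frames divided by frame rate: the released models generate fixed 16-frame clips, so at 8 fps that is about 2 seconds. The arithmetic, with an illustrative helper (the actual flag names in the repo's scripts may differ):

```python
def n_frames(duration_s: float, fps: int) -> int:
    # number of frames needed for a clip of the given duration
    return round(duration_s * fps)

print(n_frames(2.0, 8))  # → 16, the models' fixed clip length at 8 fps
```

Changing fps stretches or compresses the same 16 frames over time; generating more frames than the model was trained on is a separate problem.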
This is really wonderful work!
Does the new NO WATERMARK model mean that you removed the watermark from the WebVid dataset, or that you used another dataset (public or private)?
Any chance we get .safetensors models instead of pickle?
Hi, @YingqingHe, thanks a lot for your interesting work! I am wondering whether you can add a license (e.g., MIT or Apache, etc.) to ease our further usage.
I'm running with the following command:
python inference.py --seed 123 --mode 'i2v' --ckpt_path checkpoints/i2v_512_v1/model.ckpt --config configs/inference_i2v_512_v1.0.yaml --savedir results/i2v_512_test --n_samples 1 --bs 1 --height 320 --width 512 --unconditional_guidance_scale 12.0 --ddim_steps 50 --ddim_eta 1.0 --prompt_file prompts/i2v_prompts/test_prompts.txt --cond_input prompts/i2v_prompts --fps 8
RuntimeError: Error(s) in loading state_dict for LatentVisualDiffusion:
Missing key(s) in state_dict: "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.bias", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.bias",
.....
"model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.0.weight", 
"model.diffusion_model.middle_block.0.temopral_conv.conv4.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.bias",
...
"model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.bias", 
"model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k_ip.weight",
...
"embedder.model.visual.transformer.resblocks.1.ln_2.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_1.weight", "embedder.model.visual.transformer.resblocks.2.ln_1.bias", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_2.weight", "embedder.model.visual.transformer.resblocks.2.ln_2.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_1.weight", "embedder.model.visual.transformer.resblocks.3.ln_1.bias", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_2.weight", "embedder.model.visual.transformer.resblocks.3.ln_2.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_1.weight", "embedder.model.visual.transformer.resblocks.4.ln_1.bias", 
"embedder.model.visual.transformer.resblocks.4.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.4.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_2.weight", "embedder.model.visual.transformer.resblocks.4.ln_2.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_1.weight", "embedder.model.visual.transformer.resblocks.5.ln_1.bias", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_2.weight", "embedder.model.visual.transformer.resblocks.5.ln_2.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_proj.weight",
....
"embedder.model.visual.transformer.resblocks.29.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.29.ln_2.weight", "embedder.model.visual.transformer.resblocks.29.ln_2.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_1.weight", "embedder.model.visual.transformer.resblocks.30.ln_1.bias", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_2.weight", "embedder.model.visual.transformer.resblocks.30.ln_2.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.31.ln_1.weight", "embedder.model.visual.transformer.resblocks.31.ln_1.bias", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_bias",
"image_proj_model.layers.1.1.1.weight", "image_proj_model.layers.1.1.3.weight", "image_proj_model.layers.2.0.norm1.weight", "image_proj_model.layers.2.0.norm1.bias", "image_proj_model.layers.2.0.norm2.weight", "image_proj_model.layers.2.0.norm2.bias", "image_proj_model.layers.2.0.to_q.weight", "image_proj_model.layers.2.0.to_kv.weight", "image_proj_model.layers.2.0.to_out.weight", "image_proj_model.layers.2.1.0.weight", "image_proj_model.layers.2.1.0.bias", "image_proj_model.layers.2.1.1.weight", "image_proj_model.layers.2.1.3.weight", "image_proj_model.layers.3.0.norm1.weight", "image_proj_model.layers.3.0.norm1.bias", "image_proj_model.layers.3.0.norm2.weight", "image_proj_model.layers.3.0.norm2.bias", "image_proj_model.layers.3.0.to_q.weight", "image_proj_model.layers.3.0.to_kv.weight", "image_proj_model.layers.3.0.to_out.weight", "image_proj_model.layers.3.1.0.weight", "image_proj_model.layers.3.1.0.bias", "image_proj_model.layers.3.1.1.weight", "image_proj_model.layers.3.1.3.weight".
Unexpected key(s) in state_dict: "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table",
...
"model.diffusion_model.init_attn.0.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", 
"model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table",
...
"model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table".
size mismatch for scale_arr: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([1400]).
Any idea what I'm missing?
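Mass missing/unexpected keys plus a `scale_arr` shape mismatch usually indicate that the config YAML does not match the checkpoint file (e.g. a config for one model variant pointed at another variant's weights). Before loading, the two key sets can be compared directly, mirroring what `load_state_dict` reports (sketch with made-up key names):

```python
def diff_keys(model_keys, ckpt_keys):
    # keys the model expects but the checkpoint lacks, and vice versa,
    # in the same spirit as torch's missing/unexpected key report
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected

# illustrative key names only, abbreviated from the error above
missing, unexpected = diff_keys(
    ["scale_arr", "fps_embedding.0.weight"],
    ["scale_arr", "relative_position_k.embeddings_table"],
)
print(missing, unexpected)
```

With a real checkpoint, `ckpt_keys` would come from `torch.load(path)["state_dict"].keys()` and `model_keys` from `model.state_dict().keys()`; if both lists are long, the fix is matching the config to the checkpoint, not `strict=False`.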
Hi! Thank you for the excellent library and utilities. Is it possible to convert the models to the diffusers format?
An attempt to use the diffusers conversion script failed. Is it the same architecture?
Thanks for your wonderful work!
Do you have plans to also release the code used to train the t2v model on the webvid dataset?
Thank you very much for your excellent work. I see that more than one thousand frames of video can be generated. How do I set this? My test only produces 16 frames of video.
Hi!
Any plans for us ONNX/DirectML users in the future?
Kind regards