ailab-cvc / videocrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Home Page: https://ailab-cvc.github.io/videocrafter2/
License: Other
Is there a way to restrict the creation of videos? I would like to prevent not-safe-for-work videos.
(base) ➜ VideoCrafter conda activate lvdm
(lvdm) ➜ VideoCrafter pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.12.1+cu113
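The `+cu113` builds live on PyTorch's CUDA-specific wheel index, and they only exist for Python 3.10 and older; assuming a compatible Python version, pointing pip at that index is the usual fix (command shown as a sketch, verify the index URL against PyTorch's "previous versions" page):

```shell
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
```

If the environment's Python is newer than 3.10, no 1.12.1 wheel exists at all, which also produces this "no matching distribution" error.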
The same default prompt parameters generate significantly different effects during the inference process compared to those on the Discord server. What is the reason for this? Looking forward to your reply.
default is 2s
How to generate growth videos
Hi, thank you for open-sourcing this. The project says it is based on LVDM, but I only see the VDM part in the code; there seems to be no explanation of the autoregressive and hierarchical LVDM for long video generation. When will that part of the code be open-sourced?
I have a Radeon graphics card, and when I run
sh scripts/run_image2video.sh
I have this error
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
When I try to generate a video from an image, running the shell script gives the following error:
(videocrafter) [x@public VideoCrafter]# sh scripts/run_image2video.sh
@CoLVDM Inference: 2023-11-02-11-52-43
Global seed set to 123
AE working on z of shape (1, 4, 64, 64) = 16384 dimensions.
model checkpoint loaded.
[rank:0] 2/2 samples loaded.
[rank:0] batch-1 (1)x1 ...
Traceback (most recent call last):
File "scripts/evaluation/inference.py", line 139, in <module>
run_inference(args, gpu_num, rank)
File "scripts/evaluation/inference.py", line 117, in run_inference
img_emb = model.get_image_embeds(cond_images)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 691, in get_image_embeds
img_token = self.embedder(batch_imgs)
File "/opt/miniconda3/envs/videocrafter/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 341, in forward
z = self.encode_with_vision_transformer(image)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 348, in encode_with_vision_transformer
if self.model.visual.input_patchnorm:
File "/opt/miniconda3/envs/videocrafter/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VisionTransformer' object has no attribute 'input_patchnorm'
What kind of GPU? That's the most important thing.
Thank you very much for your excellent work. I don't see the training code; how can I use the pre-trained T2V model to train my own video-control model?
I'm getting
einops.EinopsError: Error while processing rearrange-reduction pattern "(b h) n d -> b n (h d)".
Input tensor shape: torch.Size([16, 2560, 320]). Additional info: {'h': 5}.
Shape mismatch, can't divide axis of length 16 in chunks of 5
when running the default sh scripts/run_image2video.sh
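This einops error usually means the attention head count implied by the config does not divide the tensor's leading axis, which often points to a config/checkpoint mismatch (e.g. loading one resolution's checkpoint with the other's YAML). A minimal check of the constraint einops enforces (helper name is illustrative, not from the repo):

```python
def heads_divide_batch(bh: int, h: int) -> bool:
    # "(b h) n d -> b n (h d)" requires the leading axis (b*h)
    # to be divisible by the number of heads h
    return bh % h == 0

print(heads_divide_batch(16, 5))  # the failing case: 16 is not divisible by 5
print(heads_divide_batch(16, 8))  # a head count that would work
```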
Hi, can you release your training code?
Getting this error:
Encountered 2 file(s) that may not have been copied correctly on Windows:
models/base_t2v/model_rm_wtm.ckpt
models/base_t2v/model.ckpt
Hello,
I notice that you have only released the inference code for the model. Since this model claims to be open source, do you have plans to release the training/model code as well?
I believe it would help researchers more if they were able to play around with the model architecture code, opening up the possibility of fine-tuning, LoRA training, etc.
The link you provided to download the T2V model is not working anymore. When you try to download the file, Google tells you this:
Sorry, you can't view or download this file at this time.
Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.
The link that does not work anymore: https://drive.google.com/file/d/13ZZTXyAKM3x0tObRQOQWdtnrI2ARWYf_/view?usp=share_link
I checked, and most of the links you are providing for the other models are also exceeding their download quota. The message I get when trying to download the LoRA models is this (can't copy the text, so it's a screenshot):
Missing Text2Video-512-v1 model file.
with gr.Row().style(equal_height=False): in gradio_app.py does not work.
Hi authors, thanks so much for releasing the model.
I ran into an issue when trying to load the checkpoint of the lower-resolution t2v model, i.e. "base_512_v1". When I load it using the provided text2video script, I get
RuntimeError: Error(s) in loading state_dict for LatentDiffusion: Missing key(s) in state_dict: "scale_arr", "model.diffusion_model.fps_embedding.0.weight", ...... ".
F.Y.I. I have no problem loading the other two models, i.e. "base_1024_v1" and "i2v_512_v1".
Thank you again for helping address this issue!
Thanks for opening the code. However, I noticed that some code in VideoCrafter/lvdm/modules/attention.py has a few mistakes.
First, in Line 94 and Line 110, the relative position seems wrong.
Then, in Line 123, I believe that "out_ip = rearrange(out, '(b h) n d -> b n (h d)', h=h)" should be "out_ip = rearrange(out_ip, '(b h) n d -> b n (h d)', h=h)".
Could the authors please check the code and fix these if they are bugs? Otherwise, I hope the authors can explain this code.
Thanks!
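The suspected bug above is a classic copy-paste slip: the result is assigned to `out_ip` but computed from `out`. A toy illustration (plain Python stand-ins, not the repo's tensors) of why that runs without error yet returns the wrong data:

```python
def double_each(xs):
    # stands in for rearrange(): any transformation of its input
    return [2 * x for x in xs]

out = [1, 2, 3]       # output of the base attention branch
out_ip = [10, 20, 30]  # output of the image-prompt branch

buggy = double_each(out)      # mirrors rearrange(out, ...): wrong source
fixed = double_each(out_ip)   # mirrors rearrange(out_ip, ...): intended
print(buggy, fixed)  # → [2, 4, 6] [20, 40, 60]
```

Because both variables have compatible shapes, nothing crashes; the image-prompt contribution is silently replaced by the base branch's values.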
Just installed everything according to your setup. When I try to run the sample script I get the attached errors. So what am I doing wrong? Many thanks
error.txt
I've got the model.ckpt and all the requirements installed through miniconda, but I don't think I can run .sh files.
Hello,
Thank you for this research!
I've tried to run the colab notebook demo that is provided, however it doesn't actually work. I believe there are some issues with the installation section. This carries forward to "Base T2V: Generic Text-to-video Generation", where I get the error:
" File "/content/VideoCrafter/VideoCrafter/scripts/sample_text2video.py", line 8, in <module>
from omegaconf import OmegaConf
ModuleNotFoundError: No module named 'omegaconf'"
Manually installing omegaconf also doesn't work, which suggests there is an issue with how the Python version is set in the installation section.
When I try to create the env, I get the error PackagesNotFoundError:
The following packages are not available from current channels:
I am using macOS; is this only for Linux systems?
Fantastic work! Do you plan on releasing a Colab by any chance?
What are the GPU requirements? Can I run this on an 8 GB card? Thanks
Traceback (most recent call last):
File "/VideoCrafter/scripts/evaluation/inference.py", line 137, in <module>
run_inference(args, gpu_num, rank)
File "/VideoCrafter/scripts/evaluation/inference.py", line 115, in run_inference
img_emb = model.get_image_embeds(cond_images)
File "/VideoCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 691, in get_image_embeds
img_token = self.embedder(batch_imgs)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 341, in forward
z = self.encode_with_vision_transformer(image)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 345, in encode_with_vision_transformer
x = self.preprocess(x)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 326, in preprocess
x = kornia.geometry.resize(x, (224, 224),
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/geometry/transform/affwarp.py", line 570, in resize
input_tmp = kornia.filters.gaussian_blur2d(input_tmp, ks, sigmas)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/gaussian.py", line 38, in gaussian_blur2d
get_gaussian_kernel2d(kernel_size, sigma), dim=0)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/kernels.py", line 541, in get_gaussian_kernel2d
kernel_x: torch.Tensor = get_gaussian_kernel1d(ksize_x, sigma_x, force_even)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/kernels.py", line 418, in get_gaussian_kernel1d
raise TypeError(
TypeError: kernel_size must be an odd positive integer. Got 6
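This kornia error typically appears when the installed kornia version computes an even Gaussian kernel size during antialiased resize; pinning kornia to the version in the repo's requirements usually resolves it. If patching locally, the kernel size simply has to be forced odd before the blur (illustrative helper, not kornia's API):

```python
def odd_kernel(ks: int) -> int:
    # kornia's gaussian_blur2d requires an odd positive kernel size;
    # bump even sizes up by one (a common local workaround)
    return ks if ks % 2 == 1 else ks + 1

print(odd_kernel(6))  # → 7, the failing size from the traceback, made valid
```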
Thanks for your amazing work. Could you please update us on the release date of the second version? I can't find version 2.
Hello there,
Great work, firstly. I am wondering: when you trained the VideoCrafter1 text2video model, did you train the spatial and temporal layers together, or tune only the temporal layers while keeping the spatial layers frozen?
waiting for update! 😁
I made a Dockerfile with the xformers setup and also a version that contains all the models.
The files aren't clean enough yet to make a pull request, but feel free to use them.
The Gradio demo can be run like
docker run --gpus all -it -p 7860:7860 wawa9000/videocrafter:latest-xformers-full python gradio_app.py
Hi, is it possible to generate videos longer than 1 second with VideoCrafter2? If I set the ETA to e.g. 4, the resulting video is just a black screen. Thank you
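For reference, DDIM's eta is a stochasticity knob (0 = deterministic DDIM sampling, 1 = DDPM-like sampling), not a duration setting, so values such as 4 fall outside its meaningful range. A hypothetical sanity check, not part of the repo's scripts:

```python
def check_ddim_eta(eta: float) -> float:
    # eta interpolates between deterministic DDIM (0.0) and
    # DDPM-like sampling (1.0); values outside [0, 1] are invalid
    if not 0.0 <= eta <= 1.0:
        raise ValueError(f"ddim_eta should be in [0, 1], got {eta}")
    return eta

print(check_ddim_eta(1.0))  # the default used in the scripts
```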
Hi, thanks for this amazing work. Will the training code be made public in April as well or is it planned for later down the line
It can't be used on a Mac unless you do a lot of tweaking, which isn't worth the effort in my opinion. Good idea, but not executed properly.
Do you have any good solutions for the flickering issue in generated videos?
Is there any plan to add LORA finetuning? If no, could you provide some simple advice how to do this on one's own?
When I try to run sh scripts/run_image2video.sh, I get AttributeError: 'VisionTransformer' object has no attribute 'input_patchnorm'.
All models have been downloaded and placed in the designated location.
How can I customize the duration of the generated video?
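Clip duration is frames divided by frame rate: the released models generate fixed 16-frame clips, so at 8 fps that is about 2 seconds. The arithmetic, with an illustrative helper (the actual flag names in the repo's scripts may differ):

```python
def n_frames(duration_s: float, fps: int) -> int:
    # number of frames needed for a clip of the given duration
    return round(duration_s * fps)

print(n_frames(2.0, 8))  # → 16, the models' fixed clip length at 8 fps
```

Changing fps stretches or compresses the same 16 frames over time; generating more frames than the model was trained on is a separate problem.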
This is really wonderful work!
Does the new NO WATERMARK model mean that you removed the watermark from the WebVid dataset, or that you used another dataset (public or private)?
Any chance we get .safetensors models instead of pickle?
Hi, @YingqingHe, thanks a lot for your interesting work! I am wondering whether you can add a license (e.g., MIT or Apache, etc.) to ease our further usage.
I'm running with the following command:
python inference.py --seed 123 --mode 'i2v' --ckpt_path checkpoints/i2v_512_v1/model.ckpt --config configs/inference_i2v_512_v1.0.yaml --savedir results/i2v_512_test --n_samples 1 --bs 1 --height 320 --width 512 --unconditional_guidance_scale 12.0 --ddim_steps 50 --ddim_eta 1.0 --prompt_file prompts/i2v_prompts/test_prompts.txt --cond_input prompts/i2v_prompts --fps 8
RuntimeError: Error(s) in loading state_dict for LatentVisualDiffusion:
Missing key(s) in state_dict: "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.bias", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.bias",
.....
"model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.0.weight", 
"model.diffusion_model.middle_block.0.temopral_conv.conv4.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.bias",
...
"model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.bias", 
"model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k_ip.weight",
...
"embedder.model.visual.transformer.resblocks.1.ln_2.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_1.weight", "embedder.model.visual.transformer.resblocks.2.ln_1.bias", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_2.weight", "embedder.model.visual.transformer.resblocks.2.ln_2.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_1.weight", "embedder.model.visual.transformer.resblocks.3.ln_1.bias", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_2.weight", "embedder.model.visual.transformer.resblocks.3.ln_2.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_1.weight", "embedder.model.visual.transformer.resblocks.4.ln_1.bias", 
"embedder.model.visual.transformer.resblocks.4.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.4.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_2.weight", "embedder.model.visual.transformer.resblocks.4.ln_2.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_1.weight", "embedder.model.visual.transformer.resblocks.5.ln_1.bias", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_2.weight", "embedder.model.visual.transformer.resblocks.5.ln_2.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_proj.weight",
....
"embedder.model.visual.transformer.resblocks.29.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.29.ln_2.weight", "embedder.model.visual.transformer.resblocks.29.ln_2.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_1.weight", "embedder.model.visual.transformer.resblocks.30.ln_1.bias", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_2.weight", "embedder.model.visual.transformer.resblocks.30.ln_2.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.31.ln_1.weight", "embedder.model.visual.transformer.resblocks.31.ln_1.bias", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_bias",
"image_proj_model.layers.1.1.1.weight", "image_proj_model.layers.1.1.3.weight", "image_proj_model.layers.2.0.norm1.weight", "image_proj_model.layers.2.0.norm1.bias", "image_proj_model.layers.2.0.norm2.weight", "image_proj_model.layers.2.0.norm2.bias", "image_proj_model.layers.2.0.to_q.weight", "image_proj_model.layers.2.0.to_kv.weight", "image_proj_model.layers.2.0.to_out.weight", "image_proj_model.layers.2.1.0.weight", "image_proj_model.layers.2.1.0.bias", "image_proj_model.layers.2.1.1.weight", "image_proj_model.layers.2.1.3.weight", "image_proj_model.layers.3.0.norm1.weight", "image_proj_model.layers.3.0.norm1.bias", "image_proj_model.layers.3.0.norm2.weight", "image_proj_model.layers.3.0.norm2.bias", "image_proj_model.layers.3.0.to_q.weight", "image_proj_model.layers.3.0.to_kv.weight", "image_proj_model.layers.3.0.to_out.weight", "image_proj_model.layers.3.1.0.weight", "image_proj_model.layers.3.1.0.bias", "image_proj_model.layers.3.1.1.weight", "image_proj_model.layers.3.1.3.weight".
Unexpected key(s) in state_dict: "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table",
...
"model.diffusion_model.init_attn.0.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", 
"model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table",
...
"model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table".
size mismatch for scale_arr: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([1400]).
Any idea what I'm missing?
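Mass missing/unexpected keys plus a `scale_arr` shape mismatch usually indicate that the config YAML does not match the checkpoint file (e.g. a config for one model variant pointed at another variant's weights). Before loading, the two key sets can be compared directly, mirroring what `load_state_dict` reports (sketch with made-up key names):

```python
def diff_keys(model_keys, ckpt_keys):
    # keys the model expects but the checkpoint lacks, and vice versa,
    # in the same spirit as torch's missing/unexpected key report
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected

# illustrative key names only, abbreviated from the error above
missing, unexpected = diff_keys(
    ["scale_arr", "fps_embedding.0.weight"],
    ["scale_arr", "relative_position_k.embeddings_table"],
)
print(missing, unexpected)
```

With a real checkpoint, `ckpt_keys` would come from `torch.load(path)["state_dict"].keys()` and `model_keys` from `model.state_dict().keys()`; if both lists are long, the fix is matching the config to the checkpoint, not `strict=False`.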
Hi! Thank you for the excellent library and utilities. Is it possible to convert the models to the diffusers format?
An attempt to use the diffusers conversion script failed. Is it the same architecture?
Thanks for your wonderful work!
Do you have plans to also release the code used to train the t2v model on the webvid dataset?
Thank you very much for your excellent work. I see that more than one thousand frames of video can be generated. How do I set this? My test only produces 16 frames of video.
Hi!
Any plans for us ONNX/DirectML users in the future?
Kind regards