
videocrafter's People

Contributors

chenxwh, eltociear, mayuelala, menghanxia, scutpaul, vinthony, yingqinghe, yzhang2016


videocrafter's Issues

NSFW videos

Is there a way to restrict what videos can be generated? I would like to prevent not-safe-for-work (NSFW) outputs.
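One possible approach (a rough sketch, not part of this repository): score each generated frame with a CLIP model against a few unsafe text labels and reject the clip when too much probability mass lands on them. The model name, labels, and threshold below are illustrative assumptions.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a safe-for-work scene", "explicit nudity", "graphic violence"]  # illustrative labels

def frame_is_unsafe(frame: Image.Image, threshold: float = 0.5) -> bool:
    # True if the CLIP probability mass on the unsafe labels exceeds the threshold
    inputs = processor(text=labels, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
    return probs[1:].sum().item() > threshold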

ERROR: No matching distribution found for torch==1.12.1+cu113

(base) ➜  VideoCrafter conda activate lvdm
(lvdm) ➜  VideoCrafter pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Looking in links: https://download.pytorch.org/whl/torch_stable.html
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0)
ERROR: No matching distribution found for torch==1.12.1+cu113
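One possible fix (a hedged suggestion, assuming a Linux x86_64 machine; the +cu113 wheels were never published for macOS, where only the plain versions listed above exist) is to point pip at the CUDA 11.3 wheel index explicitly:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113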

Generate Quality

With the same default prompt and parameters, inference on my machine produces noticeably different results from those on the Discord server. What is the reason for this? Looking forward to your reply.

different from LVDM

Hi, thank you for open-sourcing this project. The project says it is based on LVDM, but I only see the VDM part in the code; there seems to be no code for the autoregressive and hierarchical LVDM used for long video generation. When will that part of the code be open-sourced?

How to use with a GPU other than NVIDIA

I have a Radeon graphics card, and when I run

sh scripts/run_image2video.sh

I get this error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
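For context, the released scripts assume a CUDA device, and a stock PyTorch build cannot see a Radeon card. PyTorch built for ROCm exposes AMD GPUs through the same torch.cuda API, so one possible route is installing a ROCm build on Linux; otherwise the scripts would need to be edited to run on CPU. A quick illustrative check (not a fix):

import torch

print(torch.cuda.is_available())             # False on a Radeon card without a ROCm build
print(torch.version.cuda)                    # None on a CPU-only build
print(getattr(torch.version, "hip", None))   # set only on ROCm builds of PyTorch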

AttributeError: 'VisionTransformer' object has no attribute 'input_patchnorm' error

When I try to generate a video from an image, running the shell script fails with the following error:
(videocrafter) [x@public VideoCrafter]# sh scripts/run_image2video.sh
@CoLVDM Inference: 2023-11-02-11-52-43
Global seed set to 123
AE working on z of shape (1, 4, 64, 64) = 16384 dimensions.

model checkpoint loaded.
[rank:0] 2/2 samples loaded.
[rank:0] batch-1 (1)x1 ...
Traceback (most recent call last):
File "scripts/evaluation/inference.py", line 139, in
run_inference(args, gpu_num, rank)
File "scripts/evaluation/inference.py", line 117, in run_inference
img_emb = model.get_image_embeds(cond_images)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 691, in get_image_embeds
img_token = self.embedder(batch_imgs)
File "/opt/miniconda3/envs/videocrafter/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 341, in forward
z = self.encode_with_vision_transformer(image)
File "/home/share/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 348, in encode_with_vision_transformer
if self.model.visual.input_patchnorm:
File "/opt/miniconda3/envs/videocrafter/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VisionTransformer' object has no attribute 'input_patchnorm'
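A hedged guess at the cause: the installed open_clip release predates the input_patchnorm attribute that VideoCrafter's image encoder accesses, so upgrading (or pinning) open-clip-torch to the version listed in the repo's requirements is one likely fix. A quick check (illustrative only; ViT-B-32 is used here just because the attribute is architecture-independent):

import open_clip

print(open_clip.__version__)
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32")   # no pretrained weights needed
print(hasattr(model.visual, "input_patchnorm"))                   # False on the older releases that trigger this error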

train

Thank you very much for your excellent work. I don't see the training code; how can I use the pre-trained t2v model to train my own VideoControl model?

shape mismatch when running default example

I'm getting

einops.EinopsError:  Error while processing rearrange-reduction pattern "(b h) n d -> b n (h d)".
 Input tensor shape: torch.Size([16, 2560, 320]). Additional info: {'h': 5}.
 Shape mismatch, can't divide axis of length 16 in chunks of 5

when running the default sh scripts/run_image2video.sh
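For context, a minimal illustration of the einops constraint behind this message (not repo code): the '(b h) n d -> b n (h d)' pattern can only split the leading axis when it is divisible by the head count h, so h=5 against a leading axis of 16 fails. In practice this usually points to a mismatch between the checkpoint and the config or script being used, which is an assumption worth checking.

import torch
from einops import rearrange

x = torch.randn(16, 2560, 320)
print(rearrange(x, "(b h) n d -> b n (h d)", h=8).shape)   # works: 16 is divisible by 8
try:
    rearrange(x, "(b h) n d -> b n (h d)", h=5)             # 16 is not divisible by 5
except Exception as err:
    print(err)                                              # Shape mismatch, can't divide axis of length 16 in chunks of 5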

Any plans to release the training code?

Hello,

I notice that you have only released the inference code for the model. Since this model claims to be open source, do you have plans to release the training/model code as well?

I believe it would help researchers more if they could experiment with the model architecture code, opening up the possibility of fine-tuning, LoRA training, etc.

pretrained T2V models download link not working anymore: quota exceeded

The link you provide to download the T2V model is not working anymore. When you try to download the file, Google tells you this:

Sorry, you can't view or download this file at this time.

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.

The link that does not work anymore: https://drive.google.com/file/d/13ZZTXyAKM3x0tObRQOQWdtnrI2ARWYf_/view?usp=share_link

I checked, and most of the links you provide for the other models are also exceeding their download quota. The message I get when trying to download the LoRA models is the same (I can't copy the text, so it's a screenshot):
Untitled-1

missing keys in checkpoint of base_512_v1

Hi authors, thanks so much for releasing the model.
I run into an issue when trying to load the checkpoint of the lower-resolution t2v model, i.e. "base_512_v1". When I load it using the provided text2video script, I get:

RuntimeError: Error(s) in loading state_dict for LatentDiffusion: Missing key(s) in state_dict: "scale_arr", "model.diffusion_model.fps_embedding.0.weight", ...... ".

F.Y.I. I have no problem loading the other two models, i.e. "base_1024_v1" and "i2v_512_v1".
Thank you again for helping address this issue!
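A diagnostic sketch (not the official loader; "model" is a stand-in for the LatentDiffusion instance that inference.py builds from the YAML config, and the checkpoint path is an assumption): loading with strict=False lists exactly which keys are missing or unexpected, which usually shows whether the config and checkpoint actually belong together.

import torch

ckpt = torch.load("checkpoints/base_512_v1/model.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)
missing, unexpected = model.load_state_dict(state, strict=False)   # model: assumed LatentDiffusion instance
print("missing keys:", len(missing))
print("unexpected keys:", len(unexpected))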

A Bug in codes

Thanks for open-sourcing the code. However, I noticed that some code in VideoCrafter/lvdm/modules/attention.py appears to contain a few mistakes.
First, on Line 94 and Line 110, the relative-position handling seems wrong.
Then, on Line 123, I believe that "out_ip = rearrange(out, '(b h) n d -> b n (h d)', h=h)" should be "out_ip = rearrange(out_ip, '(b h) n d -> b n (h d)', h=h)".

Please check the code and fix these if they are indeed bugs; otherwise, I hope the authors can explain this code.
Thanks!

Running sample script

Just installed everything according to your setup. When I try to run the sample script I get the attached errors. So what am I doing wrong? Many thanks
error.txt

Colab notebook doesn't run

Hello,
Thank you for this research!

I've tried to run the provided Colab notebook demo, but it doesn't actually work. I believe there are some issues with the installation section. This carries forward to "Base T2V: Generic Text-to-video Generation", where I get the error:

" File "/content/VideoCrafter/VideoCrafter/scripts/sample_text2video.py", line 8, in
from omegaconf import OmegaConf
ModuleNotFoundError: No module named 'omegaconf'"

Manually installing omegaconf also doesn't work, which suggests there is probably an issue with how the Python version is set in the installation section.
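One hedged workaround (an assumption about the cause: the notebook installs packages into a conda env while the Colab kernel keeps using its own Python) is to install the dependencies into the interpreter the notebook is actually running, e.g. in a Colab cell:

import subprocess, sys

subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "omegaconf", "einops", "pytorch-lightning"])
import omegaconf
print(omegaconf.__version__)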

PackagesNotFoundError: - python=3.8.5

When I am trying to create the env, I get the error:

PackagesNotFoundError: The following packages are not available from current channels:

  • python=3.8.5

I am using macOS. Is it only for Linux systems?
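One possible workaround (an assumption about the cause, e.g. an Apple-silicon Mac where the default channel has no osx-arm64 build of Python 3.8.5): create the env from conda-forge with a looser pin, then install the pip requirements as usual:

conda create -n lvdm python=3.8 -c conda-forge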

Colab

Fantastic work! Do you plan on releasing a Colab notebook by any chance?

What are the GPU requirements? Can I run it on an 8 GB card? Thanks

Got TypeError when running default image2video script

Traceback (most recent call last):
File "/VideoCrafter/scripts/evaluation/inference.py", line 137, in
run_inference(args, gpu_num, rank)
File "/VideoCrafter/scripts/evaluation/inference.py", line 115, in run_inference
img_emb = model.get_image_embeds(cond_images)
File "/VideoCrafter/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 691, in get_image_embeds
img_token = self.embedder(batch_imgs)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 341, in forward
z = self.encode_with_vision_transformer(image)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 345, in encode_with_vision_transformer
x = self.preprocess(x)
File "/VideoCrafter/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 326, in preprocess
x = kornia.geometry.resize(x, (224, 224),
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/geometry/transform/affwarp.py", line 570, in resize
input_tmp = kornia.filters.gaussian_blur2d(input_tmp, ks, sigmas)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/gaussian.py", line 38, in gaussian_blur2d
get_gaussian_kernel2d(kernel_size, sigma), dim=0)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/kernels.py", line 541, in get_gaussian_kernel2d
kernel_x: torch.Tensor = get_gaussian_kernel1d(ksize_x, sigma_x, force_even)
File "//miniconda3/envs/dev/lib/python3.10/site-packages/kornia/filters/kernels.py", line 418, in get_gaussian_kernel1d
raise TypeError(
TypeError: kernel_size must be an odd positive integer. Got 6
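For context, a minimal reproduction of the kornia behaviour behind this error (illustrative only, not repo code): the kornia release shown in the traceback only accepts odd Gaussian kernel sizes, and the resize path reached from condition.py ends up requesting an even one. Assuming a kornia version mismatch is the cause, pinning kornia to a release that accepts the computed kernel size, or patching the preprocess to force an odd kernel, are possible workarounds.

import torch
import kornia

x = torch.randn(1, 3, 256, 256)
try:
    kornia.filters.gaussian_blur2d(x, (6, 6), (1.0, 1.0))   # raises TypeError on the kornia release in the traceback above
except TypeError as err:
    print(err)
print(kornia.filters.gaussian_blur2d(x, (5, 5), (1.0, 1.0)).shape)   # an odd kernel works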

About the training details

Hello, there

Great work, firstly. I am wondering: when you trained the VideoCrafter1 text2video model, did you train the spatial and temporal layers together, or did you tune only the temporal layers and keep the spatial layers frozen?

Dockerfile

I made a Dockerfile with the xformers setup, and also a version that contains all the models.

The files aren't clean enough yet to make a pull request, but feel free to use them.

The Gradio demo can be run like this:

docker run --gpus all -it -p 7860:7860 wawa9000/videocrafter:latest-xformers-full python gradio_app.py

Longer videos than 1 second?

Hi, is it possible to generate videos longer than 1 second with VideoCrafter2? If I set the ETA to e.g. 4, the resulting video is just a black screen. Thank you.

Estimated training code timeframe

Hi, thanks for this amazing work. Will the training code be made public in April as well, or is it planned for later down the line?

about deflicker

Do you have any good solutions for the flickering issue in generated videos?

LoRA fine-tuning

Is there any plan to add LoRA fine-tuning? If not, could you provide some simple advice on how to do this on one's own?
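In the meantime, a generic sketch of the idea in plain PyTorch (an assumption about one way to do it, not code from this repository): wrap the frozen attention projections with a low-rank update and train only the new matrices.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base linear plus a trainable low-rank update: W x + scale * B(A x)
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)               # freeze the original weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)            # start as a no-op update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

layer = LoRALinear(nn.Linear(320, 320))           # 320 is an illustrative attention width
print(layer(torch.randn(2, 320)).shape)           # torch.Size([2, 320])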

about NO WATERMARK model

This is really wonderful work!
Does the new NO WATERMARK model mean that you removed the watermark from the WebVid dataset, or that you used another dataset (public or private)?

Safetensors

Any chance we get .safetensors models instead of pickle?
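In the meantime, a hedged local conversion sketch (the checkpoint path follows the released layout but is an assumption; non-tensor entries such as optimizer state are dropped, and any shared tensors would need de-duplication first):

import torch
from safetensors.torch import save_file

ckpt = torch.load("checkpoints/base_1024_v1/model.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)
tensors = {k: v.contiguous() for k, v in state.items() if isinstance(v, torch.Tensor)}
save_file(tensors, "checkpoints/base_1024_v1/model.safetensors")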

Could you add License?

Hi, @YingqingHe, thanks a lot for your interesting work! I am wondering whether you can add a license (e.g., MIT or Apache, etc.) to ease our further usage.

After following all steps still get huge missmatch in loading the checkpoint

I'm running with the following command:
python inference.py --seed 123 --mode 'i2v' --ckpt_path checkpoints/i2v_512_v1/model.ckpt --config configs/inference_i2v_512_v1.0.yaml --savedir results/i2v_512_test --n_samples 1 --bs 1 --height 320 --width 512 --unconditional_guidance_scale 12.0 --ddim_steps 50 --ddim_eta 1.0 --prompt_file prompts/i2v_prompts/test_prompts.txt --cond_input prompts/i2v_prompts --fps 8

RuntimeError: Error(s) in loading state_dict for LatentVisualDiffusion:
Missing key(s) in state_dict: "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.1.0.temopral_conv.conv4.3.bias", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.2.0.temopral_conv.conv2.3.bias",
.....
"model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv1.2.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv2.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv3.3.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.0.bias", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.weight", "model.diffusion_model.input_blocks.11.0.temopral_conv.conv4.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv1.2.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv2.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv3.3.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.0.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv4.0.bias", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.weight", "model.diffusion_model.middle_block.0.temopral_conv.conv4.3.bias",
...
"model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.0.bias", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.5.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k_ip.weight", "model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v_ip.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv1.2.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv2.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv3.3.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.0.bias", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.weight", "model.diffusion_model.output_blocks.6.0.temopral_conv.conv4.3.bias", "model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k_ip.weight",
...
"embedder.model.visual.transformer.resblocks.1.ln_2.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.1.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_1.weight", "embedder.model.visual.transformer.resblocks.2.ln_1.bias", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.2.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.2.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.2.ln_2.weight", "embedder.model.visual.transformer.resblocks.2.ln_2.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.2.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_1.weight", "embedder.model.visual.transformer.resblocks.3.ln_1.bias", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.3.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.3.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.3.ln_2.weight", "embedder.model.visual.transformer.resblocks.3.ln_2.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.3.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_1.weight", "embedder.model.visual.transformer.resblocks.4.ln_1.bias", "embedder.model.visual.transformer.resblocks.4.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.4.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.4.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.4.ln_2.weight", "embedder.model.visual.transformer.resblocks.4.ln_2.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.4.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_1.weight", "embedder.model.visual.transformer.resblocks.5.ln_1.bias", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.5.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.5.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.5.ln_2.weight", "embedder.model.visual.transformer.resblocks.5.ln_2.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.5.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.5.mlp.c_proj.weight",
....
"embedder.model.visual.transformer.resblocks.29.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.29.ln_2.weight", "embedder.model.visual.transformer.resblocks.29.ln_2.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.29.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_1.weight", "embedder.model.visual.transformer.resblocks.30.ln_1.bias", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.30.attn.in_proj_bias", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.weight", "embedder.model.visual.transformer.resblocks.30.attn.out_proj.bias", "embedder.model.visual.transformer.resblocks.30.ln_2.weight", "embedder.model.visual.transformer.resblocks.30.ln_2.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_fc.bias", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.weight", "embedder.model.visual.transformer.resblocks.30.mlp.c_proj.bias", "embedder.model.visual.transformer.resblocks.31.ln_1.weight", "embedder.model.visual.transformer.resblocks.31.ln_1.bias", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_weight", "embedder.model.visual.transformer.resblocks.31.attn.in_proj_bias",
"image_proj_model.layers.1.1.1.weight", "image_proj_model.layers.1.1.3.weight", "image_proj_model.layers.2.0.norm1.weight", "image_proj_model.layers.2.0.norm1.bias", "image_proj_model.layers.2.0.norm2.weight", "image_proj_model.layers.2.0.norm2.bias", "image_proj_model.layers.2.0.to_q.weight", "image_proj_model.layers.2.0.to_kv.weight", "image_proj_model.layers.2.0.to_out.weight", "image_proj_model.layers.2.1.0.weight", "image_proj_model.layers.2.1.0.bias", "image_proj_model.layers.2.1.1.weight", "image_proj_model.layers.2.1.3.weight", "image_proj_model.layers.3.0.norm1.weight", "image_proj_model.layers.3.0.norm1.bias", "image_proj_model.layers.3.0.norm2.weight", "image_proj_model.layers.3.0.norm2.bias", "image_proj_model.layers.3.0.to_q.weight", "image_proj_model.layers.3.0.to_kv.weight", "image_proj_model.layers.3.0.to_out.weight", "image_proj_model.layers.3.1.0.weight", "image_proj_model.layers.3.1.0.bias", "image_proj_model.layers.3.1.1.weight", "image_proj_model.layers.3.1.3.weight".
Unexpected key(s) in state_dict: "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.1.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.input_blocks.2.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table",
...
"model.diffusion_model.init_attn.0.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.middle_block.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.3.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.4.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.5.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.6.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.7.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table",
...
"model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.9.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.10.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn1.relative_position_v.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_k.embeddings_table", "model.diffusion_model.output_blocks.11.2.transformer_blocks.0.attn2.relative_position_v.embeddings_table".
size mismatch for scale_arr: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([1400]).

Any idea what I'm missing?

Training code of the t2v model

Thanks for your wonderful work!

Do you have plans to also release the code used to train the t2v model on the WebVid dataset?

about long video

Thank you very much for your excellent work. I see that more than one thousand frames of video can be generated. How do I set this? My test only produces 16 frames.
