
video-p2p's Introduction

[CVPR 2024] Video-P2P: Video Editing with Cross-attention Control

The official implementation of Video-P2P.

Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

Project Website | arXiv | Hugging Face Demo

Teaser

Changelog

  • 2023.03.20 Release Demo.
  • 2023.03.19 Release Code.
  • 2023.03.09 Paper preprint on arXiv.

Todo

  • Release the code with 6 examples.
  • Update a faster version.
  • Release data.
  • Release the Gradio Demo.
  • Add local Gradio Demo.
  • Release more configs and new applications.

Setup

pip install -r requirements.txt

The code was tested on a Tesla V100 32GB and an RTX 3090 24GB. At least 20 GB of VRAM is required.
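
A quick way to confirm that the visible GPU meets this requirement, using PyTorch (a small sketch of our own, not part of the repository):

# check_vram.py -- hypothetical helper, not shipped with Video-P2P
import torch

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB")
assert total_gb >= 20, "Video-P2P needs at least ~20 GB of VRAM"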

The environment is similar to Tune-A-Video and prompt-to-prompt.

xformers on an RTX 3090 may run into this issue.
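
If you are unsure whether your xformers build is usable, a minimal sanity check (our own sketch, assuming a CUDA GPU and an fp16-capable xformers build) is to run memory-efficient attention on dummy tensors:

# xformers_check.py -- hypothetical sanity check, not part of the repository
import torch
import xformers.ops as xops

# shape: (batch, sequence length, heads, head dim)
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
out = xops.memory_efficient_attention(q, q, q)
print("xformers memory-efficient attention OK:", out.shape)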

Quickstart

Please replace pretrained_model_path in the configs with the path to your Stable Diffusion checkpoint.

To download the pre-trained model, please refer to diffusers.
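
For example, the v1.5 weights can be fetched and saved locally with diffusers (a sketch; the model id runwayml/stable-diffusion-v1-5 and the target directory are our assumptions, pick whichever checkpoint and path you prefer):

# download_sd.py -- hypothetical helper, not part of the repository
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# The directory passed to save_pretrained is what pretrained_model_path
# in the configs should point to.
pipe.save_pretrained("./checkpoints/stable-diffusion-v1-5")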

# Stage 1: tuning for model initialization.
# You can reduce the number of tuning epochs to speed things up.
python run_tuning.py --config="configs/rabbit-jump-tune.yaml"

# Stage 2: attention control.

# Faster mode (about 1 minute on a V100):
python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml" --fast

# Official mode (about 10 minutes on a V100, more stable):
python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml"

Find your results in Video-P2P/outputs/xxx/results.

Dataset

We release our dataset here.

Download them under ./data and explore your creativity!

Results

  • configs/rabbit-jump-p2p.yaml
  • configs/penguin-run-p2p.yaml
  • configs/man-motor-p2p.yaml
  • configs/car-drive-p2p.yaml
  • configs/tiger-forest-p2p.yaml
  • configs/bird-forest-p2p.yaml

Gradio demo

Run the following command to launch the local demo built with Gradio:

python app_gradio.py

Find the demo on HuggingFace here. The demo code borrows heavily from Tune-A-Video.

Citation

@misc{liu2023videop2p,
      author={Liu, Shaoteng and Zhang, Yuechen and Li, Wenbo and Lin, Zhe and Jia, Jiaya},
      title={Video-P2P: Video Editing with Cross-attention Control}, 
      journal={arXiv:2303.04761},
      year={2023},
}


video-p2p's Issues

Error

OSError: Can't load tokenizer for '/cpfs01/user/lixiaohui/code/Video-P2P/outputs/rabbitjump'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Video-P2P/outputs/rabbitjump' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.

Questions about the paper

Thanks for your excellent work! I'm wondering about the difference between the initialized unconditional embedding and the optimized unconditional embedding. The latter is optimized during DDIM inversion, but there is no explanation of how the initialized one is generated.

The use of config

Hi,
I am amazed by your work, but I still have some doubts about parameters in the p2p config. I would like to know the specific roles of blend_word and eq_params.

For example, for the prompt pair:

  • "authentic style, a man is driving a motorbike in the forest"
  • "cartoon style, a Iron-Man is driving a motorbike in the river"

how should I set these two items?

Thanks

run_tuning.py does not work

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This
error indicates that your module has parameters that were not used in producing loss. You can enable
unused parameter detection by passing the keyword argument find_unused_parameters=True to
torch.nn.parallel.DistributedDataParallel, and by
making sure all forward function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the
output tensors in the return value of your module's forward function. Please include the loss function
and the structure of the return value of forward of your module when reporting this issue (e.g. list,
dict, iterable).
Parameters which did not receive grad for rank 0:
down_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_out.0.bias
Parameter indices which did not receive grad for rank 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41

Error: cannot import MODEL_LIBRARY_ORG_NAME

Hi, I get an error when running python app_gradio.py:

Traceback (most recent call last):
File "D:\Video-P2P\app_gradio.py", line 14, in <module>
from gradio_utils.app_training import create_training_demo
File "D:\Video-P2P\gradio_utils\app_training.py", line 9, in <module>
from constants import MODEL_LIBRARY_ORG_NAME, SAMPLE_MODEL_REPO, UploadTarget
ImportError: cannot import name 'MODEL_LIBRARY_ORG_NAME' from 'constants' (C:\ProgramData\Anaconda3\envs\p2p\lib\site-packages\constants.py)

Version of xformers

Thanks for your great work.

I cannot run the script run_videop2p.py on either a 4090 or an A5000. I have tried three versions of xformers.

xformers 0.0.15.dev0+0bad001.d20230429 will lead to this error:

NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cuda', index=0), k=80, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=1024, q_len=1024, kv=80, batch_size=64, num_heads=1, has_custom_scale=False, requires_grad=True)

xformers 0.0.16 and xformers 0.0.17 will lead to an out-of-memory error.

Also, I have tried both PyTorch 1.12.1 and 1.13.1. Neither of them works.

May I know the xformers version you use?

Error downloading model

Upon running python run_tuning.py --config="configs/rabbit-jump-tune.yaml" I get the following error:

We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like /data/stable-diffusion/stable-diffusion-v1-5 is not the path to a directory containing a scheduler_config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.

Is there somewhere I should download the model from? The README doesn't mention where to download a pretrained model and the huggingface space doesn't have any model checkpoints.

Run run_videop2p.py but get nan

As shown in the attached figure, the LDM outputs NaN and the generated images are all black. Could you please tell me some possible reasons?

The problem with rabbit-jump-p2p.yaml

Hi, I have a problem with rabbit-jump-p2p.yaml.
After I train the model using rabbit-jump-tune.yaml, the fine-tuned checkpoint is stored in the output folder. When I then use rabbit-jump-p2p.yaml to edit the video, which path should I use for pretrained_model_path in the config (line 1), since there are two folders (stable-diffusion-v1.5 and the output folder) from which I could load the model?
Thanks for answering!

Stage 2 Video Length

Thanks for releasing this great work!

Is it possible to increase the video_len from 8 frames without incurring significant additional memory cost in stage 2 editing?

How to calculate quantitative metrics

Hi,
I really appreciate your excellent work.
Could you please share more details about how you calculate these metrics?
The paper reports four quantitative metrics (three proposed previously, one proposed in your paper) and says that details can be found in the appendix; however, I cannot find the appendix. I am wondering if you can give more details about how to calculate them, e.g., what the mask in M.PSNR is, how PSNR is calculated (the PSNR of the R, G, B channels averaged, or RGB treated as a whole and then divided by 3?), which LPIPS version you use, whether per-frame LPIPS is averaged to represent video LPIPS, the definition of OSV, etc.

Thanks

Complete dataset

Hi,

Thank you for sharing the code. I was wondering if you will release the complete data the model was trained on, because the dataset you shared here contains only the testing data.

Thanks
