
video-p2p's Introduction

[CVPR 2024] Video-P2P: Video Editing with Cross-attention Control

The official implementation of Video-P2P.

Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

Project Website | arXiv | Hugging Face Demo

Teaser

Changelog

  • 2023.03.20 Release Demo.
  • 2023.03.19 Release Code.
  • 2023.03.09 Paper preprint on arXiv.

Todo

  • Release the code with 6 examples.
  • Update a faster version.
  • Release data.
  • Release the Gradio Demo.
  • Add local Gradio Demo.
  • Release more configs and new applications.

Setup

pip install -r requirements.txt

The code was tested on a Tesla V100 32GB and an RTX 3090 24GB. At least 20 GB of VRAM is required.
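
A quick way to confirm that the visible GPU meets this requirement, using PyTorch (a small sketch of our own, not part of the repository):

# check_vram.py -- hypothetical helper, not shipped with Video-P2P
import torch

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB")
assert total_gb >= 20, "Video-P2P needs at least ~20 GB of VRAM"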

The environment is similar to Tune-A-Video and prompt-to-prompt.

xformers on an RTX 3090 may run into this issue.
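
If you are unsure whether your xformers build is usable, a minimal sanity check (our own sketch, assuming a CUDA GPU and an fp16-capable xformers build) is to run memory-efficient attention on dummy tensors:

# xformers_check.py -- hypothetical sanity check, not part of the repository
import torch
import xformers.ops as xops

# shape: (batch, sequence length, heads, head dim)
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
out = xops.memory_efficient_attention(q, q, q)
print("xformers memory-efficient attention OK:", out.shape)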

Quickstart

Please replace pretrained_model_path in the configs with the path to your Stable Diffusion checkpoint.

To download the pre-trained model, please refer to diffusers.
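
For example, the v1.5 weights can be fetched and saved locally with diffusers (a sketch; the model id runwayml/stable-diffusion-v1-5 and the target directory are our assumptions, pick whichever checkpoint and path you prefer):

# download_sd.py -- hypothetical helper, not part of the repository
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# The directory passed to save_pretrained is what pretrained_model_path
# in the configs should point to.
pipe.save_pretrained("./checkpoints/stable-diffusion-v1-5")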

# Stage 1: tuning for model initialization.
# You can reduce the number of tuning epochs to speed things up.
python run_tuning.py --config="configs/rabbit-jump-tune.yaml"

# Stage 2: attention control.

# Faster mode (about 1 minute on a V100):
python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml" --fast

# Official mode (about 10 minutes on a V100, more stable):
python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml"

Find your results in Video-P2P/outputs/xxx/results.

Dataset

We release our dataset here.

Download them under ./data and explore your creativity!

Results

  • configs/rabbit-jump-p2p.yaml
  • configs/penguin-run-p2p.yaml
  • configs/man-motor-p2p.yaml
  • configs/car-drive-p2p.yaml
  • configs/tiger-forest-p2p.yaml
  • configs/bird-forest-p2p.yaml

Gradio demo

Run the following command to launch the local demo built with Gradio:

python app_gradio.py

Find the demo on HuggingFace here. The demo code borrows heavily from Tune-A-Video.

Citation

@misc{liu2023videop2p,
      author={Liu, Shaoteng and Zhang, Yuechen and Li, Wenbo and Lin, Zhe and Jia, Jiaya},
      title={Video-P2P: Video Editing with Cross-attention Control}, 
      journal={arXiv:2303.04761},
      year={2023},
}


video-p2p's Issues

Error

OSError: Can't load tokenizer for '/cpfs01/user/lixiaohui/code/Video-P2P/outputs/rabbitjump'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Video-P2P/outputs/rabbitjump' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.

Questions about the paper

Thanks for your excellent work! I'm wondering about the difference between the initialized unconditional embedding and the optimized unconditional embedding. The latter is optimized during DDIM inversion, but there is no explanation of how the initialized one is generated.

The use of config

Hi,
I am amazed by your work, but I still have some doubts about parameters in the p2p config. I would like to know the specific roles of blend_word and eq_params.

For example, for the prompt pair:

  • "authentic style, a man is driving a motorbike in the forest"
  • "cartoon style, a Iron-Man is driving a motorbike in the river"

how should I set these two items?

Thanks

run_tuning.py does not work

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This
error indicates that your module has parameters that were not used in producing loss. You can enable
unused parameter detection by passing the keyword argument find_unused_parameters=True to
torch.nn.parallel.DistributedDataParallel, and by
making sure all forward function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the
output tensors in the return value of your module's forward function. Please include the loss function
and the structure of the return value of forward of your module when reporting this issue (e.g. list,
dict, iterable).
Parameters which did not receive grad for rank 0:
down_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight,
down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight,
down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.0.attentions.0.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.0.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.2.attentions.0.transformer_blocks.0.attn_temp.to_out.0.bias,
down_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_q.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_k.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_v.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_out.0.weight,
down_blocks.2.attentions.1.transformer_blocks.0.attn_temp.to_out.0.bias
Parameter indices which did not receive grad for rank 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41

Error: cannot import MODEL_LIBRARY_ORG_NAME

Hi, I get an error when running python app_gradio.py:

Traceback (most recent call last):
File "D:\Video-P2P\app_gradio.py", line 14, in <module>
from gradio_utils.app_training import create_training_demo
File "D:\Video-P2P\gradio_utils\app_training.py", line 9, in <module>
from constants import MODEL_LIBRARY_ORG_NAME, SAMPLE_MODEL_REPO, UploadTarget
ImportError: cannot import name 'MODEL_LIBRARY_ORG_NAME' from 'constants' (C:\ProgramData\Anaconda3\envs\p2p\lib\site-packages\constants.py)

Version of xformers

Thanks for your great work.

I cannot run the script run_videop2p.py on either a 4090 or an A5000. I have tried three versions of xformers.

xformers 0.0.15.dev0+0bad001.d20230429 will lead to this error:

NotImplementedError: No operator found for this attention: AttentionOpDispatch(dtype=torch.float32, device=device(type='cuda', index=0), k=80, has_dropout=False, attn_bias_type=<class 'NoneType'>, kv_len=1024, q_len=1024, kv=80, batch_size=64, num_heads=1, has_custom_scale=False, requires_grad=True)

xformers 0.0.16 and xformers 0.0.17 will lead to an out-of-memory error.

Also, I have tried both PyTorch 1.12.1 and 1.13.1. Neither of them works.

May I know the xformers version you use?

Error downloading model

Upon running python run_tuning.py --config="configs/rabbit-jump-tune.yaml" I get the following error:

We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like /data/stable-diffusion/stable-diffusion-v1-5 is not the path to a directory containing a scheduler_config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.

Is there somewhere I should download the model from? The README doesn't mention where to download a pretrained model and the huggingface space doesn't have any model checkpoints.

Run run_videop2p.py but get nan

As shown in the attached figure, the LDM outputs NaN and the generated images are all black. Could you please tell me some possible reasons?

The problem with rabbit-jump-p2p.yaml

Hi, I have a problem with rabbit-jump-p2p.yaml.
After I train the model using rabbit-jump-tune.yaml, the fine-tuned checkpoint is stored in the output folder. When I then use rabbit-jump-p2p.yaml to edit the video, which path should I use for pretrained_model_path in the config (line 1), since there are two folders (stable-diffusion-v1.5 and the output folder) from which I could load the model?
Thanks for answering!

Stage 2 Video Length

Thanks for releasing this great work!

Is it possible to increase the video_len from 8 frames without incurring significant additional memory cost in stage 2 editing?

How to calculate quantitative metrics

Hi,
I really appreciate your excellent work.
Could you please share more details about how you calculate these metrics?
The paper reports four quantitative metrics (three proposed previously, one proposed in your paper) and says that details can be found in the appendix; however, I cannot find the appendix. I am wondering if you can give more details about how to calculate them, e.g., what the mask in M.PSNR is, how PSNR is calculated (the PSNR of the R, G, B channels averaged, or RGB treated as a whole and then divided by 3?), which LPIPS version you use, whether per-frame LPIPS is averaged to represent video LPIPS, the definition of OSV, etc.

Thanks

Complete dataset

Hi,

Thank you for sharing the code. I was wondering if you will release the complete data the model was trained on, because the dataset you shared here contains only the testing data.

Thanks
