
be-your-outpainter's Introduction

Be-Your-Outpainter

(figure: diversity of outpainting results)

(demo video: outpaint.mp4)

Run

  1. Install the environment:

conda env create -f environment.yml

  2. Download the models folder from Hugging Face:

git clone https://huggingface.co/wangfuyun/Be-Your-Outpainter

  3. Run the code for basic testing. A single GPU with 20GB of memory is required for the current code version; reduce the video length if GPU memory is limited:

bash run.sh

Check the outpainted results in the `results` folder.
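
As a rough guide to the memory note in step 3, here is a back-of-the-envelope sketch of how the exp.yaml fields drive memory. The 8x VAE downsample and the token count are my assumptions about Stable-Diffusion-style video pipelines, not something confirmed by this repository: attention runs over roughly n_sample_frames * (height/8) * (width/8) latent tokens, so memory grows about linearly with video length.

def latent_tokens(n_sample_frames: int, height: int, width: int) -> int:
    # tokens processed by spatial/temporal attention, assuming an 8x VAE downsample
    return n_sample_frames * (height // 8) * (width // 8)

print(latent_tokens(16, 256, 256))  # 16384 tokens for the default config
print(latent_tokens(8, 256, 256))   # 8192: halving the frames halves the load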

Outpaint Your Own Videos

Edit config/exp.yaml to outpaint your own videos.

exp: # Name of your task

  train_data:
    video_path: "data/outpaint_videos/SB_Dog1.mp4"                            # source video path
    prompt: "a cute dog, garden, flowers"                                     # source video prompts for tuning
    n_sample_frames: 16                                                       # source video length
    width: 256                                                                # source video width
    height: 256                                                               # source video height
    sample_start_idx: 0                                                       # set to 0 by default. Sampling frames from the beginning of the video
    sample_frame_rate: 1                                                      # frame sampling stride; 1 takes every source frame
  
  validation_data:
    prompts:
      - "a cute dog, garden, flowers"                                         # prompts applied for outpainting. 
    prompts_l:
      - "wall"
    prompts_r:
      - "wall"
    prompts_t:
      - ""
    prompts_b:
      - ""

    prompts_neg:
      - ""


    is_grid: False                                                            # set to True to enable prompts_l, prompts_r, prompts_t, prompts_b
    video_length: 16                                                          # video length; keep the same as n_sample_frames in train_data
    width: 256
    height: 256

    scale_l: 0
    scale_r: 0
    scale_t: 0.5                                                              # how far to expand each side. For a 512x512 source video, setting scale_l and scale_r to 0.5 yields a 512 x (512 + 512*0.5 + 512*0.5) = 512x1024 video. See the sketch below this config.
    scale_b: 0.5

    window_size: 16                                                           # only used in longer video outpainting
    stride: 4


    repeat_time: 0                                                            # set to 4 to enable noise regret
    jump_length: 3

    num_inference_steps: 50                                                   # inference steps for outpainting
    guidance_scale: 7.5             


    bwd_mask: null                                                            # not applied
    fwd_mask: null
    bwd_flow: null
    fwd_flow: null

    warp_step: [0,0.5]
    warp_time: 3

  mask_config:                                                                # per-side mask ranges used during tuning
    mask_l: [0., 0.4]
    mask_r: [0., 0.4]
    mask_t: [0., 0.4]
    mask_b: [0., 0.4]
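
To make the scale_* and mask_config fields concrete, here is a small Python sketch; the rounding and the sampling scheme are my assumptions, not code from this repository.

import random

def outpainted_size(width, height, scale_l=0.0, scale_r=0.0, scale_t=0.0, scale_b=0.0):
    # output resolution implied by the scale_* fields, per the comment above
    return int(width * (1 + scale_l + scale_r)), int(height * (1 + scale_t + scale_b))

print(outpainted_size(256, 256, scale_t=0.5, scale_b=0.5))  # (256, 512) for this config

def sample_mask_ratios(mask_config):
    # one plausible reading of mask_config: each [lo, hi] pair is a range from
    # which a per-side mask ratio is drawn at every tuning step
    return {side: random.uniform(lo, hi) for side, (lo, hi) in mask_config.items()}

print(sample_mask_ratios({"mask_l": (0.0, 0.4), "mask_r": (0.0, 0.4),
                          "mask_t": (0.0, 0.4), "mask_b": (0.0, 0.4)}))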

Cite

@article{wang2024your,
  title={Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation},
  author={Wang, Fu-Yun and Wu, Xiaoshi and Huang, Zhaoyang and Shi, Xiaoyu and Shen, Dazhong and Song, Guanglu and Liu, Yu and Li, Hongsheng},
  journal={arXiv preprint arXiv:2403.13745},
  year={2024}
}

be-your-outpainter's People

Contributors

g-u-n


be-your-outpainter's Issues

reproduce results on DAVIS datasets

Hi! Thanks for the great work!

I've been trying to reproduce the results on the DAVIS dataset shown in the demo video, but my results were not satisfactory. I assume this is caused by the hyperparameters and text prompts. Could you please share the configs and prompts you used on the DAVIS dataset?

Thank you very much!

(attached: outpainted results for the DAVIS lucia and rollerblade sequences)

What configuration to tune to avoid out of memory error?

Hi again,

I tried to execute run.sh with only one task (SB_Dog1) kept in config/exp.yaml, but I ran into an out-of-memory error.

I understand that it requires a lot of memory, but I wonder whether there are any parameters in config/exp.yaml or elsewhere that I could adjust to reduce memory usage.

Thank you very much for your help.

One question about the paper: what is the "Direct-tune"?

Excellent work. Your paper mentions "Direct-tune", which is described as:
"Direct-tune" refers to the approach of directly fitting the original video without outpainting training.
I don't understand this very well; could you give a more specific explanation?
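
One plausible reading, sketched below with stand-in names (my interpretation, not code from the paper): Direct-tune fits the denoiser on the full, unmasked source frames, while the paper's input-specific adaptation blanks out borders during tuning (cf. the mask_config ranges above) so the loss also covers regions the model must later invent.

import torch

# Every name below is a stand-in, not this repository's API.
def denoising_loss(model, noisy_latents, target_noise, mask=None):
    pred = model(noisy_latents)
    err = (pred - target_noise) ** 2
    if mask is not None:
        err = err * mask  # outpainting-style tuning: supervise the masked border
    return err.mean()

model = torch.nn.Conv3d(4, 4, 3, padding=1)   # toy denoiser
lat = torch.randn(1, 4, 16, 32, 32)           # (batch, channels, frames, h, w)
noise = torch.randn_like(lat)

# "Direct-tune": fit the original video as-is, no masking.
loss_direct = denoising_loss(model, lat + noise, noise)

# Input-specific adaptation: blank a border strip and supervise it via a mask.
mask = torch.zeros_like(lat)
mask[..., :8] = 1                             # left strip of the width axis
loss_outpaint = denoising_loss(model, lat * (1 - mask) + noise, noise, mask)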

Bugs for is_grid=True

When is_grid is turned on, errors occur.

In pipelineoutpaint.py, line 795: [..., :height, :, :] should be changed to [..., :height, :].
In pipelineoutpaint.py, line 815: text_embeddings should be changed to text_embeddings["text_embedding"].
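
Restated as code with stand-in tensors, since the surrounding pipeline code isn't shown here; only the corrected slice and the dict access come from the report above.

import torch

height = 256
latents = torch.randn(1, 4, 8, 320, 320)    # hypothetical (b, c, f, h, w) tensor
cropped = latents[..., :height, :]          # was: latents[..., :height, :, :]

text_embeddings = {"text_embedding": torch.randn(2, 77, 768)}
cond = text_embeddings["text_embedding"]    # was: the raw dict was passed through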

K value

Excellent work. In the output directory there are results with different K values (1/2, 1, 2). Which value is used in the paper?

Long video outpainting

Hi,
I noticed that inference does not currently support long-video outpainting: the window_size and stride parameters are not used in the inference pipeline.
When will the code be updated?
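
For reference, the usual way such window_size and stride parameters are used on long videos is a temporal sliding window whose overlapping frames are blended between denoising passes. A minimal sketch of the windowing alone, assuming nothing about this repository's eventual implementation:

def temporal_windows(length: int, window_size: int = 16, stride: int = 4):
    # start indices of overlapping temporal windows over `length` frames;
    # overlapping frames would typically be blended (e.g. averaged)
    starts = list(range(0, max(length - window_size, 0) + 1, stride))
    if starts and starts[-1] + window_size < length:   # cover the tail
        starts.append(length - window_size)
    return [(s, s + window_size) for s in starts]

print(temporal_windows(32))  # [(0, 16), (4, 20), (8, 24), (12, 28), (16, 32)]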

Are such experimental results normal?

(attached: two screenshots of the outpainted results)

I tried run_long.sh to reproduce the results on DAVIS, but the output suffers from obvious artifacts.
I noticed that in another issue you mentioned run_long.sh produces worse results than run.sh. Which one did you use to conduct the experiments?

Will the code become available, and questions on the pipeline

Hello, your work on video outpainting is very impressive!

I have a couple of questions and look forward to your answers :)

  1. Will the code become available at any point and if so, when?
  2. How did you outpaint a video using AnimateDiff? Could you explain how you managed to do that?

Thank you very much!

about the temporal module

Hi! Thanks for sharing such great work!

I have a question about the temporal module you are using. According to your paper, the temporal module is initialized from AnimateDiff's. Did you fine-tune it, or directly integrate it into your model?

Thanks!
