
be-your-outpainter's Introduction

Be-Your-Outpainter

(figure: diversity of outpainting results)

(demo video: outpaint.mp4)

Run

  1. Install the environment:

conda env create -f environment.yml

  2. Download the models folder from Hugging Face:

git clone https://huggingface.co/wangfuyun/Be-Your-Outpainter

  3. Run the code for basic testing. A single GPU with 20GB of memory is required for the current code version; reduce the video length if GPU memory is limited:

bash run.sh

Check the outpainted results in the `results` folder.
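
As a rough guide to the memory note in step 3, here is a back-of-the-envelope sketch of how the exp.yaml fields drive memory. The 8x VAE downsample and the token count are my assumptions about Stable-Diffusion-style video pipelines, not something confirmed by this repository: attention runs over roughly n_sample_frames * (height/8) * (width/8) latent tokens, so memory grows about linearly with video length.

def latent_tokens(n_sample_frames: int, height: int, width: int) -> int:
    # tokens processed by spatial/temporal attention, assuming an 8x VAE downsample
    return n_sample_frames * (height // 8) * (width // 8)

print(latent_tokens(16, 256, 256))  # 16384 tokens for the default config
print(latent_tokens(8, 256, 256))   # 8192: halving the frames halves the load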

Outpaint Your Own Videos

Edit config/exp.yaml to outpaint your own videos.

exp: # Name of your task

  train_data:
    video_path: "data/outpaint_videos/SB_Dog1.mp4"                            # source video path
    prompt: "a cute dog, garden, flowers"                                     # source video prompts for tuning
    n_sample_frames: 16                                                       # source video length
    width: 256                                                                # source video width
    height: 256                                                               # source video height
    sample_start_idx: 0                                                       # set to 0 by default. Sampling frames from the beginning of the video
    sample_frame_rate: 1                                                      # frame sampling stride; 1 takes every source frame
  
  validation_data:
    prompts:
      - "a cute dog, garden, flowers"                                         # prompts applied for outpainting. 
    prompts_l:
      - "wall"
    prompts_r:
      - "wall"
    prompts_t:
      - ""
    prompts_b:
      - ""

    prompts_neg:
      - ""


    is_grid: False                                                            # set to True to enable prompts_l, prompts_r, prompts_t, prompts_b
    video_length: 16                                                          # video length; keep the same as n_sample_frames in train_data
    width: 256
    height: 256

    scale_l: 0
    scale_r: 0
    scale_t: 0.5                                                              # how far to expand each side. For a 512x512 source video, setting scale_l and scale_r to 0.5 yields a 512 x (512 + 512*0.5 + 512*0.5) = 512x1024 video. See the sketch below this config.
    scale_b: 0.5

    window_size: 16                                                           # only used in longer video outpainting
    stride: 4


    repeat_time: 0                                                            # set to 4 to enable noise regret
    jump_length: 3

    num_inference_steps: 50                                                   # inference steps for outpainting
    guidance_scale: 7.5             


    bwd_mask: null                                                            # not applied
    fwd_mask: null
    bwd_flow: null
    fwd_flow: null

    warp_step: [0,0.5]
    warp_time: 3

  mask_config:                                                                # per-side mask ranges used during tuning
    mask_l: [0., 0.4]
    mask_r: [0., 0.4]
    mask_t: [0., 0.4]
    mask_b: [0., 0.4]
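
To make the scale_* and mask_config fields concrete, here is a small Python sketch; the rounding and the sampling scheme are my assumptions, not code from this repository.

import random

def outpainted_size(width, height, scale_l=0.0, scale_r=0.0, scale_t=0.0, scale_b=0.0):
    # output resolution implied by the scale_* fields, per the comment above
    return int(width * (1 + scale_l + scale_r)), int(height * (1 + scale_t + scale_b))

print(outpainted_size(256, 256, scale_t=0.5, scale_b=0.5))  # (256, 512) for this config

def sample_mask_ratios(mask_config):
    # one plausible reading of mask_config: each [lo, hi] pair is a range from
    # which a per-side mask ratio is drawn at every tuning step
    return {side: random.uniform(lo, hi) for side, (lo, hi) in mask_config.items()}

print(sample_mask_ratios({"mask_l": (0.0, 0.4), "mask_r": (0.0, 0.4),
                          "mask_t": (0.0, 0.4), "mask_b": (0.0, 0.4)}))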

Cite

@article{wang2024your,
  title={Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation},
  author={Wang, Fu-Yun and Wu, Xiaoshi and Huang, Zhaoyang and Shi, Xiaoyu and Shen, Dazhong and Song, Guanglu and Liu, Yu and Li, Hongsheng},
  journal={arXiv preprint arXiv:2403.13745},
  year={2024}
}

be-your-outpainter's People

Contributors

g-u-n


be-your-outpainter's Issues

reproduce results on DAVIS datasets

Hi! Thanks for the great work!

I've been trying to reproduce the results on the DAVIS dataset shown in the demo video, but my results were not satisfactory. I assume this is caused by the hyperparameters and text prompts. Could you please share the configs and prompts you used on the DAVIS dataset?

Thank you very much!

(attached: outpainted results for the DAVIS lucia and rollerblade sequences)

What configuration to tune to avoid out of memory error?

Hi again,

I tried to execute run.sh with only one task (SB_Dog1) kept in config/exp.yaml, but I ran into an out-of-memory error.

I understand that it requires a lot of memory, but I wonder whether there are any parameters in config/exp.yaml or elsewhere that I could adjust to reduce memory usage.

Thank you very much for your help.

One question about the paper: what is the "Direct-tune"?

Excellent work. Your paper mentions "Direct-tune", which is described as:
"Direct-tune" refers to the approach of directly fitting the original video without outpainting training.
I don't understand this very well; could you give a more specific explanation?
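
One plausible reading, sketched below with stand-in names (my interpretation, not code from the paper): Direct-tune fits the denoiser on the full, unmasked source frames, while the paper's input-specific adaptation blanks out borders during tuning (cf. the mask_config ranges above) so the loss also covers regions the model must later invent.

import torch

# Every name below is a stand-in, not this repository's API.
def denoising_loss(model, noisy_latents, target_noise, mask=None):
    pred = model(noisy_latents)
    err = (pred - target_noise) ** 2
    if mask is not None:
        err = err * mask  # outpainting-style tuning: supervise the masked border
    return err.mean()

model = torch.nn.Conv3d(4, 4, 3, padding=1)   # toy denoiser
lat = torch.randn(1, 4, 16, 32, 32)           # (batch, channels, frames, h, w)
noise = torch.randn_like(lat)

# "Direct-tune": fit the original video as-is, no masking.
loss_direct = denoising_loss(model, lat + noise, noise)

# Input-specific adaptation: blank a border strip and supervise it via a mask.
mask = torch.zeros_like(lat)
mask[..., :8] = 1                             # left strip of the width axis
loss_outpaint = denoising_loss(model, lat * (1 - mask) + noise, noise, mask)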

Bugs for is_grid=True

When is_grid is turned on, errors occur.

In pipelineoutpaint.py, line 795: [..., :height, :, :] should be changed to [..., :height, :].
In pipelineoutpaint.py, line 815: text_embeddings should be changed to text_embeddings["text_embedding"].
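
Restated as code with stand-in tensors, since the surrounding pipeline code isn't shown here; only the corrected slice and the dict access come from the report above.

import torch

height = 256
latents = torch.randn(1, 4, 8, 320, 320)    # hypothetical (b, c, f, h, w) tensor
cropped = latents[..., :height, :]          # was: latents[..., :height, :, :]

text_embeddings = {"text_embedding": torch.randn(2, 77, 768)}
cond = text_embeddings["text_embedding"]    # was: the raw dict was passed through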

K value

Excellent work. In the output directory there are results with different K values (1/2, 1, 2). Which value is used in the paper?

Long video outpainting

Hi,
I noticed that inference does not currently support long-video outpainting: the window_size and stride parameters are not used in the inference pipeline.
When will the code be updated?
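
For reference, the usual way such window_size and stride parameters are used on long videos is a temporal sliding window whose overlapping frames are blended between denoising passes. A minimal sketch of the windowing alone, assuming nothing about this repository's eventual implementation:

def temporal_windows(length: int, window_size: int = 16, stride: int = 4):
    # start indices of overlapping temporal windows over `length` frames;
    # overlapping frames would typically be blended (e.g. averaged)
    starts = list(range(0, max(length - window_size, 0) + 1, stride))
    if starts and starts[-1] + window_size < length:   # cover the tail
        starts.append(length - window_size)
    return [(s, s + window_size) for s in starts]

print(temporal_windows(32))  # [(0, 16), (4, 20), (8, 24), (12, 28), (16, 32)]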

Are such experimental results normal?

(attached: two screenshots of the outpainted results)

I tried run_long.sh to reproduce the results on DAVIS, but the output suffers from obvious artifacts.
I noticed that in another issue you mentioned run_long.sh produces worse results than run.sh. Which one did you use to conduct the experiments?

Will the code become available, and questions on the pipeline

Hello, your work on video outpainting is very impressive!

I have a couple of questions and look forward to your answers :)

  1. Will the code become available at any point and if so, when?
  2. How did you outpaint a video using AnimateDiff? Could you explain how you managed to do that?

Thank you very much!

about the temporal module

Hi! Thanks for sharing such great work!

I have a question about the temporal module you are using. According to your paper, the temporal module is initialized from AnimateDiff's. Did you fine-tune it, or directly integrate it into your model?

Thanks!
