M3DDM-Video-Outpainting

Official code for the paper:

Hierarchical Masked 3D Diffusion Model for Video Outpainting, ACM MM 23.

Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan

We propose a Masked 3D Diffusion Model (M3DDM) and a hybrid coarse-to-fine inference pipeline for video outpainting. Our method not only generates plausible outpainting results with high temporal consistency but also alleviates artifact accumulation in long video outpainting.

Environment Setup

  1. Install PyTorch 2.0.1 with CUDA support via conda:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Make sure Anaconda or Miniconda is installed before running this command. This is the environment we tested with; the code also runs on PyTorch versions greater than 1.10.0.

  2. Install the required dependencies from the requirements.txt file in this repository:
pip install -r requirements.txt
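
To confirm that an existing environment meets the version note above, a small check like the following may help. It is a minimal sketch and not part of this repository: the helper names are hypothetical, and it only reads the installed package metadata rather than importing torch. Versions are compared as integer tuples because plain string comparison would rank "1.9.1" above "1.10.0".

```python
import re
from importlib import metadata

# Lower bound quoted in the setup note; the tested version is 2.0.1.
MIN_TORCH = (1, 10, 0)


def parse_version(text):
    """Turn '2.0.1+cu117' into (2, 0, 1), ignoring local or build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_is_supported(version_text):
    """True when the version is strictly greater than 1.10.0, as the note says."""
    return parse_version(version_text) > MIN_TORCH


def installed_torch_version():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    else:
        print(found, "supported" if torch_is_supported(found) else "too old")
```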

Downloads

Before you can run the project, you need to download the following:

  1. Pre-trained Stable Diffusion Model Weights:

    We use the VAE encoder and decoder from Stable Diffusion. To get the pre-trained Stable Diffusion v1.5 weights, download them from the following link:

    https://huggingface.co/runwayml/stable-diffusion-v1-5

  2. Our Video-Outpainting Model Checkpoints:

    Our network architecture is a modified version of Stable Diffusion v1.5. To get our pre-trained model weights, download them from the following link:

    https://huggingface.co/alimama-creative/M3DDM-Video-Outpainting
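
Both sets of weights can also be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed, which this README does not state, and uses its `snapshot_download` function. The local directory names are hypothetical, and whether a full repository snapshot matches the layout that `src/inference.py` expects has not been verified here.

```python
from pathlib import Path

# Repository ids taken from the two links above; the keys are
# hypothetical local folder names chosen for this example.
REPOS = {
    "stable_diffusion_weights": "runwayml/stable-diffusion-v1-5",
    "video_outpainting_model": "alimama-creative/M3DDM-Video-Outpainting",
}


def plan_downloads(root):
    """Map each repository id to a target directory under `root`."""
    root = Path(root)
    return {repo_id: root / name for name, repo_id in REPOS.items()}


def download_all(root):
    """Fetch both repositories; needs network access and huggingface_hub."""
    # Imported lazily so that planning works without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan_downloads(root).items():
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all("checkpoints")` would place the two downloads under `checkpoints/`, and those paths could then be passed to the inference script.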

Usage

You can run the inference code with the following command:

python src/inference.py --input_video_path "/path/to/your/input_video.mp4" \
        --pretrained_sd_dir "/path/to/your/stable_diffusion_weights" \
        --video_outpainting_model_dir "/path/to/your/video_outpainting_model" \
        --output_dir "/path/to/your/output_directory" \
        --target_ratio_list "9:16" \
        --copy_original
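
When processing several videos, it can be convenient to assemble the same command from Python. The wrapper below is a sketch, not part of the repository: the function names are hypothetical, and it assumes `--copy_original` is a boolean switch that takes no value, as the command above suggests. Only the flags shown in that command are used.

```python
import subprocess
import sys


def build_inference_command(input_video, sd_dir, model_dir, output_dir,
                            target_ratio="9:16", copy_original=True):
    """Assemble the argument list for src/inference.py from the flags above."""
    command = [
        sys.executable, "src/inference.py",
        "--input_video_path", str(input_video),
        "--pretrained_sd_dir", str(sd_dir),
        "--video_outpainting_model_dir", str(model_dir),
        "--output_dir", str(output_dir),
        "--target_ratio_list", target_ratio,
    ]
    if copy_original:
        # Assumed to be a value-less switch, matching the shell example.
        command.append("--copy_original")
    return command


def run_inference(**kwargs):
    """Run the script from the repository root; raises if it exits non-zero."""
    subprocess.run(build_inference_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.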

Parameters

target_ratio_list: Specifies the aspect ratio of the output video. Pass a single value such as "1:1", "16:9", or "9:16", or a comma-separated list such as "16:9,9:16". For better results, we recommend passing a single value.
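
To get a feel for what a given ratio implies, the sketch below parses a `target_ratio_list` string and computes the canvas size needed to reach each ratio by padding alone. This is an illustration under stated assumptions, not the repository's parsing code: ratios are read as width:height, the source frame is kept at its own scale, and sizes are rounded up to whole pixels. The function names are hypothetical.

```python
import math
from fractions import Fraction


def parse_ratio_list(text):
    """Parse '16:9,9:16' into width/height fractions, one per entry."""
    ratios = []
    for item in text.split(","):
        width, height = item.strip().split(":")
        ratios.append(Fraction(int(width), int(height)))
    return ratios


def outpainted_size(width, height, ratio):
    """Smallest canvas of aspect `ratio` that contains a width x height frame.

    Assumption: the frame is only padded, never cropped or rescaled.
    """
    if Fraction(width, height) < ratio:
        # Frame is too narrow for the target: extend left and right.
        return (math.ceil(height * ratio), height)
    # Frame is too wide or already matches: extend top and bottom.
    return (width, math.ceil(width / ratio))
```

For example, `outpainted_size(1920, 1080, parse_ratio_list("9:16")[0])` returns `(1920, 3414)`, meaning a landscape 1920x1080 frame would need about 2334 rows of new content to become 9:16.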

copy_original: Specifies whether to replace the corresponding region of the generated video with the original video. Note that the original video content pasted back has already been resized to a resolution of 256.

Citation

If this repo is useful to you, please cite our paper.

@inproceedings{fan2023hierarchical,
  title={Hierarchical Masked 3D Diffusion Model for Video Outpainting},
  author={Fan, Fanda and Guo, Chaoxu and Gong, Litong and Wang, Biao and Ge, Tiezheng and Jiang, Yuning and Luo, Chunjie and Zhan, Jianfeng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={7890--7900},
  year={2023}
}

Contact Us

Please feel free to reach out to us:

Acknowledgement

We borrow heavily from the diffusers codebase. Thanks for open-sourcing it! We also gratefully acknowledge Stable Diffusion for providing the SD1.5 model weights. Any third-party packages are owned by their respective authors and must be used under their respective licenses.
