modelscope-image-to-video-v2

ModelScope text-to-video synthesis model converted into a text + image to video model

Training and inference scripts for creating videos from a static image

This project contains the code for converting the ModelScope text-to-video synthesis model into a text + image to video model, along with the code for generating videos with the converted model.

This is an alternative and improved version of this repo.

Getting Started

Installation

git clone https://github.com/motexture/ms-image-to-video
cd ms-image-to-video

Python Requirements

pip install deepspeed
pip install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

DeepSpeed is required only if you want to use the training script.
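
As a quick sanity check after installation, the sketch below reports which of the packages named in the commands above can be found by the current Python interpreter. It is only an illustration: the helper name `missing_packages` is hypothetical, and it checks the four packages installed explicitly above, not everything listed in requirements.txt.

```python
from importlib.util import find_spec

# Packages installed explicitly by the pip commands above.
REQUIRED = ["torch", "torchvision", "torchaudio"]
# Needed only for the training script, per the DeepSpeed note.
TRAINING_ONLY = ["deepspeed"]


def missing_packages(names):
    """Return the top-level package names that the interpreter cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    checks = (("inference", REQUIRED), ("training", REQUIRED + TRAINING_ONLY))
    for label, names in checks:
        missing = missing_packages(names)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

This only confirms that the packages are importable; it does not verify that the installed torch build matches your CUDA version.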

Preparing the config file for training

Open training.yaml and adjust the parameters to your needs.
Set upgrade_model to True if you want to convert an existing ModelScope text-to-video model into the new text + image to video model.
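
If you prefer to flip the flag from a script instead of editing the file by hand, a minimal sketch follows. It assumes that upgrade_model appears in training.yaml as a top-level `upgrade_model: <value>` line, which the text above does not confirm; check the actual file layout first. The function name `set_upgrade_model` is hypothetical, and it edits the text directly so that no YAML library is needed.

```python
import re


def set_upgrade_model(yaml_text: str, enabled: bool) -> str:
    """Rewrite the value on a top-level `upgrade_model:` line, leaving the rest untouched."""
    # Assumption: the key sits at the start of a line, followed by a single scalar value.
    pattern = re.compile(r"^(upgrade_model:[ \t]*)\S+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(enabled), yaml_text)
    if count == 0:
        raise KeyError("no top-level upgrade_model key found")
    return new_text


if __name__ == "__main__":
    # Hypothetical file content, used only to demonstrate the rewrite.
    sample = "some_other_key: 1\nupgrade_model: False  # convert old checkpoint\n"
    print(set_upgrade_model(sample, True))
```

To apply it to the real file, read training.yaml into a string, pass it through the function, and write the result back.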

Train

deepspeed train.py --config training.yaml

Running inference

The inference.py script can be used to render videos with trained checkpoints.

Using a custom 2D text-to-image diffusion model for image conditioning:

python inference.py \
  --model checkpoint-path \
  --prompt "an astronaut is walking on the moon" \
  --model-2d stabilityai/stable-diffusion-2-1 \
  --num-frames 16 \
  --width 512 \
  --height 512 \
  --times 1 \
  --sdp

Animating a static image:

python inference.py \
  --model checkpoint-path \
  --prompt "an astronaut is walking on the moon" \
  --init-image "image.png" \
  --num-frames 16 \
  --times 1 \
  --sdp
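
When launching many renders from a script, it can help to assemble the command line programmatically. The sketch below builds the argument list from the flags shown in the examples above and nothing else; the helper name `build_inference_command` and its defaults are assumptions for illustration, and it does not know which flag combinations inference.py actually accepts.

```python
import shlex


def build_inference_command(model, prompt, init_image=None, model_2d=None,
                            num_frames=16, width=None, height=None,
                            times=1, sdp=True):
    """Return an argv list for inference.py using only the flags shown in the examples."""
    argv = ["python", "inference.py", "--model", model, "--prompt", prompt]
    if init_image is not None:
        argv += ["--init-image", init_image]
    if model_2d is not None:
        argv += ["--model-2d", model_2d]
    argv += ["--num-frames", str(num_frames)]
    if width is not None:
        argv += ["--width", str(width)]
    if height is not None:
        argv += ["--height", str(height)]
    argv += ["--times", str(times)]
    if sdp:
        argv.append("--sdp")
    return argv


if __name__ == "__main__":
    cmd = build_inference_command(
        "checkpoint-path",
        "an astronaut is walking on the moon",
        init_image="image.png",
    )
    # Print a shell-safe version; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list instead of a single string avoids quoting problems when the prompt contains spaces or quotes.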

Creating arbitrarily long videos by reusing the last frame of each pass as the new init image and increasing the --times parameter:

python inference.py \
  --model checkpoint-path \
  --prompt "an astronaut is walking on the moon" \
  --model-2d stabilityai/stable-diffusion-2-1 \
  --num-frames 16 \
  --width 512 \
  --height 512 \
  --times 4 \
  --sdp
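
The chaining idea can be sketched independently of the model. In the code below, `generate_clip` stands in for one generation pass and is purely hypothetical, as is the name `chain_clips`; how inference.py actually stitches the passes together (for example, whether it drops a duplicated seed frame) is not stated here, so treat this as an illustration of the technique and not as the script's implementation.

```python
from typing import Callable, List, Optional, TypeVar

Frame = TypeVar("Frame")


def chain_clips(generate_clip: Callable[[Optional[Frame]], List[Frame]],
                times: int,
                init_image: Optional[Frame] = None) -> List[Frame]:
    """Run `times` passes, seeding each pass with the last frame of the previous one."""
    frames: List[Frame] = []
    current = init_image
    for _ in range(times):
        clip = generate_clip(current)
        frames.extend(clip)
        current = clip[-1]  # the last frame becomes the next init image
    return frames


def fake_generate_clip(init: Optional[int], num_frames: int = 16) -> List[int]:
    """Stand-in generator: frames are integers counting up from the init frame."""
    start = 0 if init is None else init
    return [start + i for i in range(1, num_frames + 1)]


if __name__ == "__main__":
    video = chain_clips(fake_generate_clip, times=4)
    print(len(video), video[0], video[-1])
```

With 16 frames per pass and four passes, this stand-in produces 64 frames in total, which is the arithmetic behind raising --times to extend a video.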

Shoutouts
