Giter VIP home page Giter VIP logo

-t-3ds-'s Introduction

WonderJourney: Going from Anywhere to Everywhere

a arXiv twitter

alice.mp4
real_campus_long.mp4

Getting Started

Installation

For the installation to be done correctly, please proceed only with CUDA-compatible GPU available. It requires 24GB GPU memory to run.

Clone the repo and create the environment:

git clone https://github.com/KovenYu/WonderJourney.git
cd WonderJourney
mamba create --name wonderjourney python=3.10
mamba activate wonderjourney

We are using Pytorch3D to perform rendering. Run the following commands to install it or follow their installation guide (it may take some time).

mamba install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
mamba install -c fvcore -c iopath -c conda-forge fvcore iopath
mamba install -c bottler nvidiacub
mamba install pytorch3d -c pytorch3d

Install the rest of the requirements:

pip install -r requirements.txt

Load English language model for spacy:

python -m spacy download en_core_web_sm

Export your OpenAI api_key (since we use GPT-4 to generate scene descriptions):

export OPENAI_API_KEY='your_api_key_here'

Download Midas DPT model and put it to the root directory.

wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt

Run examples

  • Example config file

    To run an example, first you need to write a config. An example config ./config/village.yaml is shown below:

    runs_dir: output/56_village
    
    example_name: village
    
    seed: -1
    frames: 10
    save_fps: 10
    
    finetune_decoder_gen: True
    finetune_decoder_interp: False  # Turn on this for higher-quality rendered video
    finetune_depth_model: True
    
    num_scenes: 4
    num_keyframes: 2
    use_gpt: True
    kf2_upsample_coef: 4
    skip_interp: False
    skip_gen: False
    enable_regenerate: True
    
    debug: True
    inpainting_resolution_gen: 512
    
    rotation_range: 0.45
    rotation_path: [0, 0, 0, 1, 1, 0, 0, 0]
    camera_speed_multiplier_rotation: 0.2
    

    The total frames of the generated example is num_scenes $\times$ num_keyframes. You can manually adjust rotation_path in the config file to control the rotation state of the camera in each frame. A value of $0$ indicates moving straight, $1$ signifies a right turn, and $-1$ indicates a left turn.

  • Run

    python run.py --example_config config/village.yaml
    

    You will see results in output/56_village/{time-string}_merged.

How to add more examples?

We highly encourage you to add new images and try new stuff! You would need to do the image-caption pairing separately (e.g., using DALL-E to generate image and GPT4V to generate description).

  • Add a new image in ./examples/images/.

  • Add content of this new image in ./examples/examples.yaml.

    Here is an example:

    - name: new_example
      image_filepath: examples/images/new_example.png
      style_prompt: DSLR 35mm landscape
      content_prompt: scene name, object 1, object 2, object 3
      negative_prompt: ''
      background: ''
    • content_prompt: "scene name", "object 1", "object 2", "object 3"

    • negative_prompt and background are optional

    For controlled journey, you need to add control_text. Examples are as follow:

    - name: poem_jiangxue
      image_filepath: examples/images/60_poem_jiangxue.png
      style_prompt: black and white color ink painting
      content_prompt: Expansive mountainous landscape, old man in traditional attire, calm river, mountains
      negative_prompt: ""
      background: ""
      control_text: ["千山鸟飞绝", "万径人踪灭", "孤舟蓑笠翁", "独钓寒江雪"]
      
    - name: poem_snowy_evening
      image_filepath: examples/images/72_poem_snowy_evening.png
      style_prompt: Monet painting
      content_prompt: Stopping by woods on a snowy evening, woods, snow, village
      negative_prompt: ""
      background: ""
      control_text: ["Snowy Woods and Farmhouse: A secluded farmhouse, a frozen lake, a dense thicket, a quiet meadow, a chilly wind, a pale twilight, a covered bridge, a rustic fence, a snow-laden tree, and a frosty ground", "The Traveler's Horse: A restless horse, a jingling harness, a snowy mane, a curious gaze, a sturdy hoof, a foggy breath, a leather saddle, a woolen blanket, a frost-covered tail, and a patient stance", "Snowfall in the Woods: A gentle snowflake, a whispering wind, a soft flurry, a white blanket, a twinkling icicle, a bare branch, a hushed forest, a crystalline droplet, a serene atmosphere, and a quiet night", "Deep, Dark Woods in the Evening: A mysterious grove, a shadowy tree, a darkened sky, a hidden trail, a silent owl, a moonlit glade, a dense underbrush, a quiet clearing, a looming branch, and an eerie stillness"]
  • Write a config config/new_example.yaml like ./config/village.yaml for the new example

  • Run

    python run.py --example_config config/new_example.yaml

Citation

@article{yu2023wonderjourney,
  title={WonderJourney: Going from Anywhere to Everywhere},
  author={Yu, Hong-Xing and Duan, Haoyi and Hur, Junhwa and Sargent, Kyle and Rubinstein, Michael and Freeman, William T and Cole, Forrester and Sun, Deqing and Snavely, Noah and Wu, Jiajun and Herrmann, Charles},
  journal={arXiv preprint arXiv:2312.03884},
  year={2023}
}

Acknowledgement

We appreciate the authors of SceneScape, MiDaS, SAM, Stable Diffusion, and OneFormer to share their code.

-t-3ds-'s People

Contributors

kovenyu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.