
4dfy's Introduction

4D-fy - threestudio

Random Sample

| Project Page | Paper | User Study Template | threestudio extension |

Installation

Install threestudio

This part is the same as in the original threestudio. Skip it if you have already installed the environment.

  • You must have an NVIDIA graphics card with at least 24 GB VRAM and have CUDA installed.
  • Install Python >= 3.8.
  • (Optional, Recommended) Create a virtual environment:
python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
  • Install PyTorch >= 1.12. We have tested torch 1.12.1+cu113 and torch 2.0.0+cu118, but other versions should also work fine (a quick sanity check follows this list).
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
  • (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
pip install ninja
  • Install dependencies:
pip install -r requirements.txt
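
Optionally, verify that the installed PyTorch build can see your GPU before continuing (a quick sanity check, not part of the original instructions):

# Should print the torch version followed by True
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"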

Install MVDream

The MVDream multi-view diffusion model is provided in a separate codebase. Install it with:

git clone https://github.com/bytedance/MVDream extern/MVDream
pip install -e extern/MVDream 
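
Optionally, confirm that the editable install succeeded by importing the package (the module name mvdream matches the MVDream repository layout):

# Exits silently if the mvdream package is importable
python -c "import mvdream"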

Quickstart

Our model is trained in three stages, with a separate config file for each stage. After a stage finishes, the next stage resumes training from its checkpoint (passed via system.weights below); a resume example for interrupted runs follows the script.

seed=0
gpu=0
exp_root_dir=/path/to

# Original configs used in the paper (require 80 GB of GPU memory)

# Stage 1
# python launch.py --config configs/fourdfy_stage_1.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing"

# Stage 2
# ckpt=/path/to/fourdfy_stage_1/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_2.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# Stage 3
# ckpt=/path/to/fourdfy_stage_2/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_3.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# Low-memory configs for GPUs with 24-48 GB of memory

# Stage 1
# python launch.py --config configs/fourdfy_stage_1_low_vram.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing"

# Stage 2
# ckpt=/path/to/fourdfy_stage_1_low_vram/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_2_low_vram.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# Stage 3
# ckpt=/path/to/fourdfy_stage_2_low_vram/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_3_low_vram.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt


### Alternatives
# Use VideoCrafter2 in stage 3

# ckpt=/path/to/fourdfy_stage_2/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_3_vc.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# ckpt=/path/to/fourdfy_stage_2_low_vram/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_3_low_vram_vc.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# Use a deformation-based approach to preserve static quality in the dynamic stage

# Stage 1
# python launch.py --config configs/fourdfy_stage_1_low_vram_deformation.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing"

# Stage 2
# ckpt=/path/to/fourdfy_stage_1_low_vram_deformation/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_2_low_vram_deformation.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# Stage 3 (low-memory config for 24-48 GB GPUs)
# ckpt=/path/to/fourdfy_stage_2_low_vram_deformation/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_3_low_vram_vc_deformation.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt

# Stage 3 (original 80 GB config)
# ckpt=/path/to/fourdfy_stage_2_low_vram_deformation/a_panda_dancing@timestamp/ckpts/last.ckpt
# python launch.py --config configs/fourdfy_stage_3_vc_deformation.yaml --train --gpu $gpu exp_root_dir=$exp_root_dir seed=$seed system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt
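
If a stage is interrupted, it can typically be resumed from its last checkpoint via the standard threestudio resume argument (a sketch following the threestudio convention; the trial directory is a placeholder):

# Resume an interrupted run from its last checkpoint (paths are illustrative)
python launch.py --config /path/to/fourdfy_stage_1/a_panda_dancing@timestamp/configs/parsed.yaml \
    --train --gpu $gpu resume=/path/to/fourdfy_stage_1/a_panda_dancing@timestamp/ckpts/last.ckpt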

Memory Usage

Depending on the text prompt, stage 3 might not fit on a 24-48 GB GPU; we trained our final models on an 80 GB GPU. There are several ways to reduce memory usage so that training fits on smaller GPUs (a combined example follows the list):

  • Use the _low_vram config files instead of the original ones.
  • If training still does not fit in GPU memory, reduce system.renderer.base_renderer.train_max_nums.
  • Reduce the rendering resolution for the video model with data.single_view.width_vid=144 and data.single_view.height_vid=80 (or even data.single_view.width_vid=72 and data.single_view.height_vid=40).
  • Use mixed precision: trainer.precision=16-mixed.
  • Use memory-efficient attention: set system.guidance_video.enable_memory_efficient_attention=true.
  • Reduce the number of frames with data.single_view.num_frames=8.
  • Reduce the hash grid capacity in system.geometry.pos_encoding_config, e.g., system.geometry.pos_encoding_config.n_levels=8. Note that this requires retraining the first two stages.
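
Several of these overrides can be combined in a single stage-3 command, for example (an illustrative sketch; the flags are exactly the ones listed above, and the checkpoint path is a placeholder):

# Illustrative: stage 3 with several memory-saving overrides combined
python launch.py --config configs/fourdfy_stage_3_low_vram.yaml --train --gpu $gpu \
    exp_root_dir=$exp_root_dir seed=$seed \
    system.prompt_processor.prompt="a panda dancing" system.weights=$ckpt \
    trainer.precision=16-mixed \
    system.guidance_video.enable_memory_efficient_attention=true \
    data.single_view.width_vid=144 data.single_view.height_vid=80 \
    data.single_view.num_frames=8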

More tips

  • More motion. To increase the motion, the loss weight for the video model can be increased, e.g., system.loss.lambda_sds_video=0.3 or system.loss.lambda_sds_video=0.5 (see the example after this list).
  • Use VideoCrafter2 video guidance. In the paper we used ZeroScope, but train.sh includes an option to use VideoCrafter2 instead, for more motion and higher quality.
  • Use a deformation-based approach. Instead of adding features from a static and a dynamic hash grid, we also provide a deformation-based approach in train.sh that preserves the static quality and learns motion only as a deformation.
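
For instance, a stage-3 run with a stronger video loss weight could look like this (a sketch reusing the variables from the Quickstart; the checkpoint path is a placeholder):

# Illustrative: stage 3 with an increased video SDS loss weight
ckpt=/path/to/fourdfy_stage_2/a_panda_dancing@timestamp/ckpts/last.ckpt
python launch.py --config configs/fourdfy_stage_3.yaml --train --gpu $gpu \
    exp_root_dir=$exp_root_dir seed=$seed \
    system.prompt_processor.prompt="a panda dancing" \
    system.weights=$ckpt system.loss.lambda_sds_video=0.3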

Credits

This code is built on the threestudio project, MVDream-threestudio, and VideoCrafter. Thanks to the maintainers for their contributions to the community!

Citing

If you find 4D-fy helpful, please consider citing:

@article{bahmani20244dfy,
  title={4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling},
  author={Bahmani, Sherwin and Skorokhodov, Ivan and Rong, Victor and Wetzstein, Gordon and Guibas, Leonidas and Wonka, Peter and Tulyakov, Sergey and Park, Jeong Joon and Tagliasacchi, Andrea and Lindell, David B.},
  journal={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2024}
}


4dfy's Issues

train_dynamic_camera

Hi, could you please explain the purpose of the code related to "train_dynamic_camera"? It appears never to be used.

FileNotFoundError: Text embedding file

Process SpawnProcess-1:
Traceback (most recent call last):
File "/tiamat-vePFS/share_data/minyue/miniconda3/envs/4dfy/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/tiamat-vePFS/share_data/minyue/miniconda3/envs/4dfy/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
TypeError: StableDiffusionPromptProcessor.spawn_func() takes 3 positional arguments but 4 were given
Traceback (most recent call last):
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/launch.py", line 178, in <module>
main()
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/launch.py", line 107, in main
system: BaseSystem = threestudio.find(cfg.system_type)(
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/threestudio/systems/base.py", line 40, in __init__
self.configure()
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/threestudio/systems/fourdfy.py", line 41, in configure
self.prompt_processor_multi_view = threestudio.find(self.cfg.prompt_processor_type_multi_view)(
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/threestudio/utils/base.py", line 63, in __init__
self.configure(*args, **kwargs)
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/threestudio/models/prompt_processors/base.py", line 191, in configure
self.load_text_embeddings()
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/threestudio/models/prompt_processors/base.py", line 249, in load_text_embeddings
self.text_embeddings = self.load_from_cache(self.prompt)[None, ...]
File "/tiamat-vePFS/share_data/zengbohan/4D_search/4dfy-main/threestudio/models/prompt_processors/base.py", line 267, in load_from_cache
raise FileNotFoundError(
FileNotFoundError: Text embedding file .threestudio_cache/text_embeddings/c99df661fbb1a4d3ff4ef8dff7508dea.pt for model stabilityai/stable-diffusion-2-1-base and prompt [a panda dancing] not found.

I ran into this problem and cannot figure out the cause.

More motion

Hi, following another issue, I increased system.loss.lambda_sds_video towards 1.0. Based on the results displayed in the gallery, "a dog riding a skateboard" doesn't seem to have much motion, so I tried "a panda dancing" instead.

When training for 50,000 iterations with system.loss.lambda_sds_video=0.1, I got:

it50000-0_video.mp4

When training for 50,000 iterations with system.loss.lambda_sds_video=1.0, I got:

it50000-0_video.mp4

And I found that, when training for 10,000 iterations with system.loss.lambda_sds_video=0.1:

it10000-0_video.mp4

and when training for 10,000 iterations with system.loss.lambda_sds_video=1.0:

it10000-0_video.mp4

Thus, I guess that with a small system.loss.lambda_sds_video, longer training is needed to obtain more motion. However, once training runs long enough, the impact of system.loss.lambda_sds_video no longer seems significant?

Optimization procedure in stage 3

Thanks again for sharing your work! According to the paper, all three losses are used in the third stage. However, in the code, it seems that VSD is not used in this stage?

cuda error

I followed the instructions but got an incompatible CUDA error every time. It is quite weird that torch.cuda.is_available() is True at the start but turns False at some point. Could you provide more detailed instructions for getting the environment right?

visualize the result

Sorry, I'm new to threestudio. I was wondering how we can visualize the result after training all three stages.

I got a checkpoint after the stage 3 training, but how can I use it?

Failed at the first step

Hey, I failed at the first Python command, which is:

python launch.py --config configs/fourdfy_stage_1.yaml --train --gpu 0 exp_root_dir=$exp_root_dir seed=0 system.prompt_processor.prompt="a dog riding a skateboard"

Here is my log:
dreamer@e9c9125073a7:~/threestudio$ python launch.py --config configs/fourdfy_stage_1.yaml --train --gpu 0 exp_root_dir=$exp_root_dir seed=0 system.prompt_processor.prompt="a dog riding a skateboard"
Traceback (most recent call last):
File "/home/dreamer/threestudio/launch.py", line 178, in <module>
main()
File "/home/dreamer/threestudio/launch.py", line 102, in main
cfg = load_config(args.config, cli_args=extras, n_gpus=n_gpus)
File "/home/dreamer/threestudio/threestudio/utils/config.py", line 85, in load_config
scfg = parse_structured(ExperimentConfig, cfg)
File "/home/dreamer/threestudio/threestudio/utils/config.py", line 99, in parse_structured
scfg = OmegaConf.structured(fields(**cfg))
File "<string>", line 21, in __init__
File "/home/dreamer/threestudio/threestudio/utils/config.py", line 75, in __post_init__
self.exp_dir = os.path.join(self.exp_root_dir, self.name)
File "/usr/lib/python3.10/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

4dfy result

I have the same issue as #6.
I tried to generate 'a_crocodile_playing_a_drum_set' with the default settings and seed, and got the result in a_crocodile_playing_a_drum_set. Is it a problem with the seed?

Version Problem

We have to use an old version of xformers, such as 0.0.12, to satisfy torch==1.12.1+cu113; installing a newer xformers automatically pulls in torch==2.1.2.

But with the old version, xformers.ops.memory_efficient_attention hits a version problem, and I have no idea how to fix it:


4dfy/extern/MVDream/mvdream/ldm/modules/diffusionmodules/model.py", line 258, in forward

 out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=self.attention_op)

TypeError: memory_efficient_attention() got an unexpected keyword argument 'attn_bias'

Any recommended parameter settings for stage 3 on an A100-40GB?

Hello, I encountered some issues in stage 3. I tried all the parameter settings you recommended, and I can barely run it on my A100-40GB.
My settings:
num_samples_per_ray: 512 → 64
width_vid: 256 → 72
height_vid: 160 → 40
num_frames: 16 → 8

The VRAM usage fluctuates between 30324/40536 MB and 40420/40536 MB. Are there recommended parameter settings to achieve the best results on an A100-40GB?
