
dynamicnerf's Introduction

Dynamic View Synthesis from Dynamic Monocular Video


Project Website | Video | Paper

Dynamic View Synthesis from Dynamic Monocular Video
Chen Gao, Ayush Saraf, Johannes Kopf, Jia-Bin Huang
in ICCV 2021

Setup

The code is tested with

  • Linux (tested on CentOS Linux release 7.4.1708)
  • Anaconda 3
  • Python 3.7.11
  • CUDA 10.1
  • 1 V100 GPU

To get started, please create the conda environment dnerf by running

conda create --name dnerf python=3.7
conda activate dnerf
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv -c pytorch
pip install imageio scikit-image configargparse timm lpips

and install COLMAP manually. Then download the MiDaS and RAFT weights:

ROOT_PATH=/path/to/the/DynamicNeRF/folder
cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/weights.zip
unzip weights.zip
rm weights.zip
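
With the environment, COLMAP, and the weights in place, a quick sanity check can confirm that PyTorch sees the GPU and that the pip dependencies import cleanly. This is a minimal sketch, not part of the repository; it assumes the versions installed above.

# Hypothetical environment check; not part of DynamicNeRF itself.
import torch, torchvision
import imageio, skimage, configargparse, timm, lpips, cv2

print(torch.__version__, torchvision.__version__)    # expected: 1.6.0 and 0.7.0
print("CUDA available:", torch.cuda.is_available())  # should be True with CUDA 10.1 set up
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))      # e.g. a single V100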

Dynamic Scene Dataset

The Dynamic Scene Dataset is used to quantitatively evaluate our method. Please download the pre-processed data by running:

cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip
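
After unzipping, data/ should contain one folder per scene evaluated below. This is an expectation inferred from the configs and the results table rather than a guaranteed layout:

data/
  Balloon1/  Balloon2/  Jumping/  Playground/  Skating/  Truck/  Umbrella/

Each config (e.g. configs/config_Balloon2.txt) points its datadir at the corresponding scene folder.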

Training

You can train a model from scratch by running:

cd $ROOT_PATH/
python run_nerf.py --config configs/config_Balloon2.txt

Every 100k iterations, you should get videos like the following examples.

The novel view-time synthesis results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/novelviewtime.

The reconstruction results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset.

The fix-view-change-time results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_view000.

The fix-time-change-view results will be saved in $ROOT_PATH/logs/Balloon2_H270_DyNeRF/testset_time000.
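
Taken together, the outputs of this run end up under the experiment's log folder, roughly as follows (directory names taken from the paths above):

logs/Balloon2_H270_DyNeRF/
  novelviewtime/     # novel view-time synthesis
  testset/           # reconstruction
  testset_view000/   # fix view, change time
  testset_time000/   # fix time, change view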

Rendering from pre-trained models

We also provide pre-trained models. You can download them by running:

cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/logs.zip
unzip logs.zip
rm logs.zip

Then you can render the results directly by running:

python run_nerf.py --config configs/config_Balloon2.txt --render_only --ft_path $ROOT_PATH/logs/Balloon2_H270_DyNeRF_pretrain/300000.tar

Evaluating our method and others

Our goal is to make the evaluation as simple as possible for you. We have collected the fix-view-change-time results of the following methods:

NeRF
NeRF + t
Yoon et al.
Non-Rigid NeRF
NSFF
DynamicNeRF (ours)

Please download the results by running:

cd $ROOT_PATH/
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/results.zip
unzip results.zip
rm results.zip

Then you can calculate the PSNR/SSIM/LPIPS by running:

cd $ROOT_PATH/utils
python evaluation.py

PSNR / LPIPS   Jumping         Skating         Truck           Umbrella        Balloon1        Balloon2        Playground      Average
NeRF           20.99 / 0.305   23.67 / 0.311   22.73 / 0.229   21.29 / 0.440   19.82 / 0.205   24.37 / 0.098   21.07 / 0.165   21.99 / 0.250
NeRF + t       18.04 / 0.455   20.32 / 0.512   18.33 / 0.382   17.69 / 0.728   18.54 / 0.275   20.69 / 0.216   14.68 / 0.421   18.33 / 0.427
NR NeRF        20.09 / 0.287   23.95 / 0.227   19.33 / 0.446   19.63 / 0.421   17.39 / 0.348   22.41 / 0.213   15.06 / 0.317   19.69 / 0.323
NSFF           24.65 / 0.151   29.29 / 0.129   25.96 / 0.167   22.97 / 0.295   21.96 / 0.215   24.27 / 0.222   21.22 / 0.212   24.33 / 0.199
Ours           24.68 / 0.090   32.66 / 0.035   28.56 / 0.082   23.26 / 0.137   22.36 / 0.104   27.06 / 0.049   24.15 / 0.080   26.10 / 0.082

Please note:

  1. The numbers reported in the paper are calculated using the TensorFlow code. The numbers here are calculated using this improved PyTorch version.
  2. In Yoon et al.'s results, the first and last frames are missing. To compare with Yoon's results, we therefore omit the first and last frames; to do so, uncomment line 72 and comment line 73 in evaluation.py.
  3. We obtain the results of NSFF and NR NeRF using the official implementation with default parameters.
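
For reference, the per-frame metrics could be computed roughly as in the sketch below. This is only an illustrative outline and not the actual utils/evaluation.py; the function here is hypothetical, and the real script handles the file layout and the frame skipping from note 2.

# Illustrative sketch of the PSNR/SSIM/LPIPS computation; utils/evaluation.py is authoritative.
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # LPIPS expects NCHW tensors scaled to [-1, 1]

def frame_metrics(gt, pred):
    """gt, pred: float RGB images in [0, 1], shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, multichannel=True)
    to_tensor = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_tensor(gt), to_tensor(pred)).item()
    return psnr, ssim, lp

# When comparing with Yoon et al., the first and last frames would be skipped
# (e.g. frames = frames[1:-1]), matching note 2 above.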

Train a model on your sequence

  1. Set some paths.
ROOT_PATH=/path/to/the/DynamicNeRF/folder
DATASET_NAME=name_of_the_video_without_extension
DATASET_PATH=$ROOT_PATH/data/$DATASET_NAME
  2. Prepare training images and background masks from a video.
cd $ROOT_PATH/utils
python generate_data.py --videopath /path/to/the/video
  3. Use COLMAP to obtain camera poses.
colmap feature_extractor \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--ImageReader.mask_path $DATASET_PATH/background_mask \
--ImageReader.single_camera 1

colmap exhaustive_matcher \
--database_path $DATASET_PATH/database.db

mkdir $DATASET_PATH/sparse
colmap mapper \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --output_path $DATASET_PATH/sparse \
    --Mapper.num_threads 16 \
    --Mapper.init_min_tri_angle 4 \
    --Mapper.multiple_models 0 \
    --Mapper.extract_colors 0
  4. Save camera poses into the format that NeRF reads.
cd $ROOT_PATH/utils
python generate_pose.py --dataset_path $DATASET_PATH
  5. Estimate monocular depth.
cd $ROOT_PATH/utils
python generate_depth.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/midas_v21-f6b98070.pt
  6. Predict optical flows.
cd $ROOT_PATH/utils
python generate_flow.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
  7. Obtain the motion mask (code adapted from NSFF).
cd $ROOT_PATH/utils
python generate_motion_mask.py --dataset_path $DATASET_PATH
  8. Train a model. Please change expname and datadir in configs/config.txt.
cd $ROOT_PATH/
python run_nerf.py --config configs/config.txt

Explanation of each parameter:

  • expname: experiment name
  • basedir: where to store ckpts and logs
  • datadir: input data directory
  • factor: downsample factor for the input images
  • N_rand: number of random rays per gradient step
  • N_samples: number of samples per ray
  • netwidth: channels per layer
  • use_viewdirs: whether to enable view dependence for StaticNeRF
  • use_viewdirsDyn: whether to enable view dependence for DynamicNeRF
  • raw_noise_std: std dev of noise added to regularize sigma_a output
  • no_ndc: do not use normalized device coordinates
  • lindisp: sample linearly in disparity rather than depth
  • i_video: frequency of novel view-time synthesis video saving
  • i_testset: frequency of testset video saving
  • N_iters: number of training iterations
  • i_img: frequency of tensorboard image logging
  • DyNeRF_blending: whether to use DynamicNeRF to predict the blending weight
  • pretrain: whether to pre-train StaticNeRF
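
Putting these together, a configuration file for a new sequence could look roughly like the sketch below. Only the parameter names come from the list above; the specific values are illustrative assumptions, and configs/config.txt in the repository remains the authoritative template.

# Hypothetical example; adjust expname, datadir, and the numeric values for your sequence.
expname = my_sequence_H270_DyNeRF
basedir = ./logs
datadir = ./data/my_sequence
factor = 2
N_rand = 1024
N_samples = 64
netwidth = 256
use_viewdirs = True
use_viewdirsDyn = False
raw_noise_std = 1e0
no_ndc = False
lindisp = False
N_iters = 300000      # the released checkpoints end at 300000.tar
i_video = 100000      # videos roughly every 100k iterations, as described above
i_testset = 100000
i_img = 1000
DyNeRF_blending = True
pretrain = True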

License

This work is licensed under the MIT License. See LICENSE for details.

If you find this code useful for your research, please consider citing the following paper:

@inproceedings{Gao-ICCV-DynNeRF,
    author    = {Gao, Chen and Saraf, Ayush and Kopf, Johannes and Huang, Jia-Bin},
    title     = {Dynamic View Synthesis from Dynamic Monocular Video},
    booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
    year      = {2021}
}

Acknowledgments

Our training code is built upon NeRF, NeRF-pytorch, and NSFF. Our flow prediction code is modified from RAFT. Our depth prediction code is modified from MiDaS.

dynamicnerf's People

Contributors

gaochen315, zhywanna


dynamicnerf's Issues

The motion mask

Hi, how do you generate such an accurate motion mask? I ran generate_motion_mask.py, but the generated mask is very messy.

Code about generating the motion mask

Nice work, but I have a question about the motion mask generation code: in utils/generate_motion_mask.py, lines 128-131, why do h and w have to be divided by 2? This confuses me.

Missing dependency

Hi @gaochen315 ,

Thanks a lot for releasing the code for DynamicNeRF! This is one of the smoothest experiences I have had so far when it comes to running a GitHub repo - great work!

Just one small thing I noticed is that scikit-image seems to be missing in the dependencies. When I run the motion mask generation script, an import error is raised.

no video output

Dear gaochen, thanks for sharing your great work!
I followed your instructions and have currently trained around 240k steps, but I got no videos as mentioned in the README ("Every 100k iterations, you should get videos like the following examples"), and the filenames are not the same as yours.

TypeError: expected Tensor as element 0 in argument 0, but got tuple

Step: 49993, Loss: 0.013911506161093712, Time: 0.13519692420959473, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Step: 49994, Loss: 0.012133732438087463, Time: 0.1316220760345459, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Step: 49995, Loss: 0.01327237207442522, Time: 0.1362135410308838, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Step: 49996, Loss: 0.01176927424967289, Time: 0.12955403327941895, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Step: 49997, Loss: 0.012316515669226646, Time: 0.13320422172546387, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Step: 49998, Loss: 0.012100623920559883, Time: 0.1324453353881836, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Step: 49999, Loss: 0.011780014261603355, Time: 0.13549208641052246, chain_5frames: False, expname: Balloon2_H270_DyNeRF_pretrain
Traceback (most recent call last):
File "run_nerf.py", line 763, in
train()
File "run_nerf.py", line 452, in train
**render_kwargs_train)
File "/DynamicNeRF/render_utils.py", line 98, in render
rays, chunk, **kwargs)
File "/DynamicNeRF/render_utils.py", line 26, in batchify_rays
all_ret = {k: torch.cat(all_ret[k], 0) for k in all_ret}
File "/DynamicNeRF/render_utils.py", line 26, in
all_ret = {k: torch.cat(all_ret[k], 0) for k in all_ret}
TypeError: expected Tensor as element 0 in argument 0, but got tuple

An issue about generating mask

Hi, thanks for your great work.

I am wondering why only people are considered as the foreground.

As shown in the umbrella case, the umbrella is not a person, but it is also successfully segmented.

Fail to reproduce COLMAP results on NVIDIA Dynamic Scene

Hi @gaochen315, first of all, many thanks for open-sourcing this great work!

When I visualize the provided COLMAP results on "Truck", I find that the moving truck is also reconstructed (the red circle in the issue's figure). This is strange, since it should have been filtered out by the background mask, right?

I followed the suggested COLMAP pipeline (https://github.com/gaochen315/DynamicNeRF#train-a-model-on-your-sequence) to estimate the camera poses myself, but my estimated camera poses are not very correct (we know this dataset is captured by 12 cameras stacked on 2 levels). Therefore, I wonder if I should set some parameters specially. Do you have any special settings when applying COLMAP to the NVIDIA Dynamic Scene dataset?

Thank you again and looking forward to your reply!

Dataset download

Hello, why am I unable to open the Dynamic Scene Dataset link to download it?

run on my own data

Hello, I tried your great work on my own video recorded with a phone, but I failed on the following COLMAP command:

colmap mapper \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --output_path $DATASET_PATH/sparse \
    --Mapper.num_threads 16 \
    --Mapper.init_min_tri_angle 4 \
    --Mapper.multiple_models 0 \
    --Mapper.extract_colors 0

Firstly, I have some difficulty understanding what Mapper.init_min_tri_angle, Mapper.multiple_models, and Mapper.extract_colors mean, and why you altered these parameters rather than using the defaults.
Secondly, when I removed the last four parameters (Mapper.num_threads, Mapper.init_min_tri_angle, Mapper.multiple_models, Mapper.extract_colors), things went well and I successfully got my camera.bin. I wonder how these parameters affect the result, and whether the extrinsic camera poses I get without them are still accurate.

Hoping for your reply, thanks!

KeyError: 'network_fn_d_state_dict'

Fixing random seed 1
factor 2
(270, 480, 3, 12)
(270, 480, 12)
(270, 480, 12)
(270, 480, 2, 12)
(270, 480, 12)
Loaded ./data/Balloon2/ 45.061882503348485 71.07477444180932
Loaded llff (12, 270, 480, 3) (60, 3, 5) [270. 480. 418.96216] ./data/Balloon2/
DEFINING BOUNDS
NEAR FAR 0.0 1.0
Found ckpts ['./logs/Balloon2_H270_DyNeRF_pretrain/300000.tar', './logs/Balloon2_H270_DyNeRF_pretrain/Pretrained_S.tar']
Reloading from ./logs/Balloon2_H270_DyNeRF_pretrain/Pretrained_S.tar
Traceback (most recent call last):
File "run_nerf.py", line 768, in
train()
File "run_nerf.py", line 215, in train
render_kwargs_train, render_kwargs_test, start, grad_vars, optimizer = create_nerf(args)
File "/home/chenghuan/DynamicNeRF/folder/run_nerf_helpers.py", line 355, in create_nerf
model_d.load_state_dict(ckpt['network_fn_d_state_dict'])
KeyError: 'network_fn_d_state_dict'

Hi, thanks for your great work.

I wonder why there is no 'network_fn_d_state_dict' in ckpt.

Your model training time is slow. Could Instant-NGP-related work be used to increase the speed?

I am confused about why 'network_fn_d_state_dict' is missing from the checkpoint when I run it. Your model training is very slow; could Instant-NGP-related work be used to speed up your model? Looking forward to your reply.

How long does it usually take to train a model?

Hi. I'm following the steps in "Train a model on your sequence" and have reached step 7, but it seems to run forever. It has been over 83k steps of pretraining and it is still ongoing:

...
Pretraining step: 83398, Loss: 0.0014436094788834453, Time: 0.125885009765625, expname: kids-at-playground
Pretraining step: 83399, Loss: 0.0024164789356291294, Time: 0.12565231323242188, expname: kids-at-playground
Pretraining step: 83400, Loss: 0.0017029627924785018, Time: 0.1292412281036377, expname: kids-at-playground
Pretraining step: 83401, Loss: 0.0016428650123998523, Time: 0.1261436939239502, expname: kids-at-playground
Pretraining step: 83402, Loss: 0.0021417145617306232, Time: 0.12614750862121582, expname: kids-at-playground

Do you have the original video file?

I need the original video file for some experiments. Have you kept it? The current frame-level images are too few, and compressing them into a video makes it uncomfortable to watch.

Windows Support

Hi all,

Has anyone tried running the code on Windows? Does the framework use any Linux-specific libraries? Is there any information regarding the average time needed to run a forward pass with the pretrained models?

Best Regards
-E
