
sinnerf's Introduction

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image

[Paper] [Website]

Pipeline

Code

Environment

pip install -r requirements.txt

Dataset Preparation

Please download the datasets from these links:

Please download the depth maps from here: https://drive.google.com/drive/folders/13Lc79Ox0k9Ih2o0Y9e_g_ky41Nx40eJw?usp=sharing

Training

If you hit an out-of-memory (OOM) issue, try the following (see the example after this list):

  1. enable mixed-precision training (precision=16)
  2. reduce the patch size --patch_size (or --patch_size_x, --patch_size_y) and enlarge the stride --sH, --sW
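For example, a lower-memory variant of the NeRF-synthetic Step 1 command below could keep all other flags unchanged and adjust only these (the values are illustrative, not tuned; note the issue further below reporting loss=nan with very small patch sizes):

    python train.py ... --patch_size 48 --sW 8 --sH 8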
NeRF synthetic
  • Step 1

    python train.py  --dataset_name blender_ray_patch_1image_rot3d  --root_dir  ../../dataset/nerf_synthetic/lego   --N_importance 64 --img_wh 400 400 --num_epochs 2000 --batch_size 1  --optimizer adam --lr 2e-4  --lr_scheduler steplr --decay_step 500 1000 --decay_gamma 0.5  --exp_name lego_s6 --with_ref --patch_size 64 --sW 6 --sH 6 --proj_weight 1 --depth_smooth_weight 0  --dis_weight 0 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 10 --scan 4
    
  • Step 2

    python train.py  --dataset_name blender_ray_patch_1image_rot3d  --root_dir  ../../dataset/nerf_synthetic/lego   --N_importance 64 --img_wh 400 400 --num_epochs 2000 --batch_size 1  --optimizer adam --lr 5e-5  --lr_scheduler steplr --decay_step 500 1000 --decay_gamma 0.5  --exp_name lego_s6_4ft --with_ref --patch_size 64 --sW 4 --sH 4 --proj_weight 1 --depth_smooth_weight 0  --dis_weight 0.01 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 0 --pt_model xxx.ckpt --nerf_only  --scan 4
    
LLFF
  • Step 1

    python train.py  --dataset_name llff_ray_patch_1image_proj  --root_dir  ../../dataset/nerf_llff_data/room   --N_importance 64 --img_wh 504 378 --num_epochs 2000 --batch_size 1  --optimizer adam --lr 2e-4  --lr_scheduler steplr --decay_step 500 1000 --decay_gamma 0.5  --exp_name llff_room_s4 --with_ref --patch_size_x 63 --patch_size_y 84 --sW 4 --sH 4 --proj_weight 1 --depth_smooth_weight 0  --dis_weight 0 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 10
    
  • Step 2

    python train.py  --dataset_name llff_ray_patch_1image_proj  --root_dir  ../../dataset/nerf_llff_data/room   --N_importance 64 --img_wh 504 378 --num_epochs 2000 --batch_size 1  --optimizer adam --lr 5e-5  --lr_scheduler steplr --decay_step 500 1000 --decay_gamma 0.5  --exp_name llff_room_s4_2ft --with_ref --patch_size_x 63 --patch_size_y 84 --sW 2 --sH 2 --proj_weight 1 --depth_smooth_weight 0  --dis_weight 0.01 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 0 --pt_model xxx.ckpt --nerf_only
    
DTU
  • Step 1

    python train.py  --dataset_name dtu_proj  --root_dir  ../../dataset/mvs_training/dtu   --N_importance 64 --img_wh 640 512 --num_epochs 2000 --batch_size 1  --optimizer adam --lr 2e-4  --lr_scheduler steplr --decay_step 500 1000 --decay_gamma 0.5  --exp_name dtu_scan4_s8 --with_ref --patch_size_y 70 --patch_size_x 56 --sW 8 --sH 8 --proj_weight 1 --depth_smooth_weight 0  --dis_weight 0 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 10 --scan 4
    
  • Step 2

    python train.py  --dataset_name dtu_proj  --root_dir  ../../dataset/mvs_training/dtu   --N_importance 64 --img_wh 640 512 --num_epochs 2000 --batch_size 1  --optimizer adam --lr 5e-5  --lr_scheduler steplr --decay_step 500 1000 --decay_gamma 0.5  --exp_name dtu_scan4_s8_4ft --with_ref --patch_size_y 70 --patch_size_x 56 --sW 4 --sH 4 --proj_weight 1 --depth_smooth_weight 0  --dis_weight 0.01 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 0 --pt_model xxx.ckpt --nerf_only  --scan 4
    

Further finetuning with smaller strides improves reconstruction quality.
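For example, a hypothetical third round on lego could reuse the Step 2 flags with the stride halved again (all elided flags unchanged; the checkpoint path is a placeholder, as in the commands above):

    python train.py ... --exp_name lego_s6_2ft --sW 2 --sH 2 --pt_model xxx.ckpt --nerf_only --scan 4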

Testing

python eval.py  --dataset_name llff  --root_dir /dataset/nerf_llff_data/room --N_importance 64 --img_wh 504 378 --model nerf --ckpt_path ckpts/room.ckpt --timestamp test

Please use --split val for the NeRF synthetic dataset.
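For example (checkpoint path assumed from the training commands above):

    python eval.py --dataset_name blender_ray_patch_1image_rot3d --root_dir ../../dataset/nerf_synthetic/lego --N_importance 64 --img_wh 400 400 --model nerf --ckpt_path ckpts/lego_s6_4ft/last.ckpt --split val --timestamp test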

Acknowledgement

The codebase is based on https://github.com/kwea123/nerf_pl. Thanks for sharing!

Citation

If you find this repo helpful, please cite:


@article{Xu_2022_SinNeRF,
  author  = {Xu, Dejia and Jiang, Yifan and Wang, Peihao and Fan, Zhiwen and Shi, Humphrey and Wang, Zhangyang},
  title   = {SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image},
  journal = {arXiv preprint arXiv:2204.00928},
  year    = {2022}
}


sinnerf's Issues

Where is the decay strategy of stride in the code ?

Hi, thanks for releasing the code of this awesome work. I am curious about the decay strategy of the stride in Progressive Strided Ray Sampling, but I cannot find where it is implemented in the code. It seems the stride is controlled by sW, sH in the Dataset class, but sW, sH are not modified during training. The same applies to vit_weight and dis_weight, the weights of the global structure prior loss and the local texture guidance loss.

The fragment in the paper about the decay strategy of the stride:
[screenshot]

The fragment in the paper about the annealing of the loss weights:
[screenshot]
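For reference, here is one way such a schedule could be implemented; this is a purely hypothetical sketch, not code from the repo:

    # Hypothetical: decay the sampling stride during training by
    # halving sW/sH every `decay_every` epochs, floored at s_min.
    def stride_at_epoch(epoch, s0=6, s_min=1, decay_every=500):
        return max(s_min, s0 // (2 ** (epoch // decay_every)))

    # e.g. in an on_epoch_start hook:
    # s = stride_at_epoch(epoch)
    # train_dataset.sH = train_dataset.sW = s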

About paper results

Thanks for your great work. I want to know how you trained DS-NeRF from a single view, since DS-NeRF needs structure-from-motion to acquire 3D sparse points. Thanks in advance!
[screenshot]

Training time

Hi,
Thanks for the great work and for uploading the training code.

Can you share the time it took to train the model room.ckpt and the device it was trained on, so that I have a reference for reproducing/training other scenes?

Thanks

There is a problem in image warping on LLFF dataset.

I appreciate your sharing this great work. I have a question: I have been trying to reproduce your work, but on the LLFF dataset I cannot obtain a reliable warped image using the depth maps you provided on this website. The warping error is severe when comparing the warped and target images, even though I aligned with your warping code. Image warping on the Blender dataset works well. Is there an additional step such as depth scaling? If so, please let me know.

GPU environments

Hi
I encountered a CUDA out-of-memory problem.
My GPU is a GeForce RTX 2080 Ti, and I'm using DDP.

Could you tell me your GPU spec?
Also, I would appreciate it if you could tell me how to generate the depth .npy files.

Overall Loss Function

Hello! I have read your paper and code, but I am confused about the overall loss function. Sorry if I am missing something.

You mention in the paper:
we reduce the weight of global structure prior λ3 and increase the weight of local texture guidance λ2.
=> I'm not sure I understand this correctly: since you train twice, does that mean each lambda shrinks or grows between the two experiments?
.....
In all our experiments, λ1, λ2, and λ3 are initialized to be 8, 0.1, and 0, respectively.
=> But this parameter setting does not match what is in your code.
.....
During the training process,
we gradually decrease λ2 to 0 and increase λ3 to 0.1 with a linear function.
=> I don't understand what this means. Can you explain it, and where can I find the lambda weights in your code?
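For what it's worth, the linear schedule quoted above could look like the following; this is a hypothetical sketch, not code from the repo, and which λ maps to which loss term is exactly the ambiguity raised here:

    # Hypothetical: linearly decrease lam2 from 0.1 to 0 and increase
    # lam3 from 0 to 0.1 over `total_steps` training steps.
    def annealed_weights(step, total_steps):
        t = min(step / total_steps, 1.0)   # progress in [0, 1]
        lam2 = (1.0 - t) * 0.1             # 0.1 -> 0
        lam3 = t * 0.1                     # 0   -> 0.1
        return lam2, lam3

    # total = 8 * loss1 + lam2 * loss2 + lam3 * loss3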

Training Code

Hi All,
Thank you for this repo and amazing paper!
Do you plan to release the training code anytime soon? (we would love that!)

Thanks.
@AvivNavon

Some minor issues and loss=nan problem

Hi there! Sorry for replying so late; I have been reading your code and running experiments recently.

I ran into some small problems, as follows:

1. When running the test code, the following command raises an error.

python eval.py  \
       --dataset_name blender_ray_patch_1image_rot3d  \
       --root_dir ./synthetic_SinNeRF/lego/  \
       --N_importance 64 --img_wh 400 400 --model nerf \
       --ckpt_path ./ckpts/lego_s6_4ft/last.ckpt \
       --timestamp test

With the default --split value of test, the error is raised at the following line (note the frame indexing):

c2w = torch.FloatTensor(frame['transform_matrix'])[:3, :4]  # 3x4 camera-to-world pose


2. Is the DTU archive you uploaded missing files, or is the path in the code wrong?
[screenshot]

    depth_filename = os.path.join(self.root_dir, 'MVSNet_pytorch_outputs/',
                                  f'scan{self.scan}/depth_est/rect_{vid + 1:03d}_{self.light_idx}_r5000.pfm')


3. The loss=nan problem
I am running the latest code on an RTX 3090 (24 GB); the environment was created from environment.yaml, but it still goes OOM, so I adjusted patch_size, precision, and --sH, --sW according to the README.

I set precision=16 and kept --sH, --sW the same (=6).

I found that the loss=nan problem appears when the patch_size is too small, e.g. patch_size=8 or patch_size=16, even patch_size=32.

It works (without loss=nan) when patch_size=50, but that's not a good number, is it?

I would be grateful if you could advise on how to deal with this.
Thank you!

Pretrained models

Thanks for sharing this great work.
You have provided one pretrained model, 'room.ckpt'.
Do you plan to share other models as well?
If possible, please share the checkpoints trained on NeRF synthetic 360 and DTU.
Thank you so much in advance.

how to generate depth of LLFF dataset

Hi, thanks for your amazing work.
I wonder how you generated the depth maps for the LLFF dataset.
You provide depth for 5 of the LLFF scenes; could you please provide the other 3, or the method to produce the same depth?

How many scans are used to evaluate on DTU?

Thanks for sharing your excellent work!
I want to know how many scenes are used to evaluate on the DTU dataset and what the scene indexes are (in your paper, Table 2: "We report average values across scenes").
It would be a huge convenience to know, so that I can compare with the reported scores. Thanks in advance!

Forward warping

Dear authors,
Thank you for your great work. I am trying to incorporate it into my pipeline, and I found a potential problem: you use forward warping to obtain the new RGB information and the new depth, but to my understanding, forward warping creates holes and loses accuracy. Please point it out if I am wrong. If you do rely on forward warping, how do you handle these problems? Thank you so much!
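For context, here is a minimal sketch of z-buffered forward warping (hypothetical code, not SinNeRF's actual implementation): when several source pixels splat onto the same target pixel the nearest one wins, and target pixels that nothing lands on remain holes.

    import numpy as np

    def forward_warp(rgb, depth, K, T):
        # rgb: (H,W,3), depth: (H,W), K: (3,3) intrinsics, T: (4,4) src->tgt pose
        H, W = depth.shape
        ys, xs = np.mgrid[0:H, 0:W]
        pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).T  # (3, H*W)
        pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)   # back-project
        pts = T[:3, :3] @ pts + T[:3, 3:4]                      # move to target frame
        uvz = K @ pts
        z = uvz[2]
        u = np.round(uvz[0] / z).astype(int)
        v = np.round(uvz[1] / z).astype(int)
        out = np.zeros_like(rgb)
        zbuf = np.full((H, W), np.inf)
        ok = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
        src = rgb.reshape(-1, 3)
        for i in np.flatnonzero(ok):          # z-buffered splat
            if z[i] < zbuf[v[i], u[i]]:
                zbuf[v[i], u[i]] = z[i]
                out[v[i], u[i]] = src[i]
        return out, zbuf                      # holes where zbuf stays inf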

Confusion about the inverse-depth-smooth-loss

Hello! I'm currently doing some research on NeRF, and I found your work SinNeRF.
It's awesome work, and I downloaded the code in this repository.
While reading it, I found that you use kornia.losses.inverse_depth_smoothness_loss to compute the "self-supervised inverse depth smoothness loss" described in the paper (Equation 4).
The conflict is that kornia.losses.inverse_depth_smoothness_loss uses the first-order gradient of the RGB image (per its official documentation and source code), while the equation in the paper uses the second-order gradient.
Thanks for any advice and help.
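To make the difference concrete, here is a small sketch of both edge-weighting variants (hypothetical code; the order=1 branch follows kornia's formulation, the order=2 branch the paper's equation as quoted above):

    import torch

    def inv_depth_smooth(idepth, img, order=1):
        # idepth: (B,1,H,W) inverse depth, img: (B,3,H,W) RGB
        d_dx = (idepth[..., :, 1:] - idepth[..., :, :-1]).abs()
        d_dy = (idepth[..., 1:, :] - idepth[..., :-1, :]).abs()
        i_dx = (img[..., :, 1:] - img[..., :, :-1]).abs().mean(1, keepdim=True)
        i_dy = (img[..., 1:, :] - img[..., :-1, :]).abs().mean(1, keepdim=True)
        if order == 2:                        # second-order image gradients
            i_dx = (i_dx[..., :, 1:] - i_dx[..., :, :-1]).abs()
            i_dy = (i_dy[..., 1:, :] - i_dy[..., :-1, :]).abs()
            d_dx = d_dx[..., :, :-1]          # crop depth grads to match shapes
            d_dy = d_dy[..., :-1, :]
        return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()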

Regarding results in the table 2

Thank you so much for sharing your great work.
Regarding the results in Table 2, I have the questions below.

Q1. Which scenes did you use for the evaluation and what is the reference camera ID?
According to the files and code, I assume you evaluated on these 19 scans:
[1, 3, 4, 5, 6, 8, 9, 14, 15, 30, 34, 40, 55, 60, 63, 82, 84, 103, 105]
Please correct me if I am wrong.

Q2. Did you use any object mask for the evaluation as RegNeRF did?
If so, would you please let me know how you created the mask?

Thank you very much for your kind reply in advance.

Unseen views during training

Can you explain why DTU uses only the 10 nearby cameras instead of random surrounding views like the LLFF dataset?
Is there a reason to use the existing camera poses during training for the DTU dataset?
I can understand using existing camera poses for evaluation, but not for training.

[Paper]
LLFF Dataset: We randomly select a single view as the reference view and use its surrounding views as unseen views during training.
DTU Dataset: We use 10 nearby cameras from the dataset as unseen views during training.

nan

I'm sorry to bother you.
I'm having some problems running your code on my own machine and hope you can help me out (I'm training with 4 A5000 cards; all other parameters are default).

  1. In Step 1 training on the NeRF synthetic dataset, after more than 1500 steps all the losses become nan. Is this normal?
  2. If the situation in 1 occurs, does it mean the model has finished training? Should I stop the training?
  3. There are several different pth files in the 'ckpts/lego_s6' folder; which one should I choose as the training weights for the second step?
  4. You mention supplementary material in your paper, but I didn't find the relevant link. Can you provide it?

I am looking forward to your reply. Thank you very much.

Question about the rectified DTU dataset.

Hi,
I have found that the DTU data you used seems to differ from the commonly used version from pixelNeRF, DietNeRF, etc. Specifically, the camera seems to be closer than in the standard set, e.g.:

standard scan8 cam0:
[screenshot]

yours:
[screenshot]

May I ask how you preprocessed the DTU dataset? Did you crop the images? This is important for me to perform fair comparisons.

Best,

Confusing results after step-2 training.

Hi! Thanks for your great work and implementation!

I'm currently trying your code on nerf_synthetic (lego) and dtu (scan4), but the results after the two-stage training are confusing.

Specifically, the evaluation psnr for lego is 20.6 by the end of step 1, while it drops to 14.9 after step 2 training. The same thing happens to dtu_scan_4, where the evaluation psnr is around 15.0 after step 1 and drops to 11.8 after step 2. The visualization results for lego are as below.

I'd appreciate it if you could provide some idea on this phenomenon and how to fix this problem. Thank you!

lego after step 1: [screenshot]
lego after step 2: [screenshot]

Testing poses for the result videos

Thanks again for your great work.
Did you provide the testing poses for the result videos somewhere in your code?
If so, please let me know where I can find them.
Thanks for your answer in advance.

Question about angle variable

I have a question about the Blender dataset: the dataset definition below contains the value angle=30. What does this angle indicate?

class Blender_ray_patch_1image_rot3d_camera_Dataset(Dataset):
    def __init__(self, root_dir, split='train', img_wh=(400, 400), patch_size=-1, factor=1, test_crop=False, with_ref=False, repeat=1, load_depth=False, depth_type='nerf', sH=1, sW=1, angle=30, **kwargs):

Versions

Can you share your package versions?
I'm getting errors because of mismatched versions of torch, Python, etc.

problems in load vit

Hi, thanks for your exciting work, but when I tried to train on the room scene I ran into the following problem while loading the ViT model. Could you give me some suggestions?

~/NeRFs/SinNeRF$
python train.py --dataset_name llff_ray_patch_1image_proj --root_dir data/nerf_llff_data/room --N_importance 64 --img_wh 504 378 --num_epochs 3000 --batch_size 1 --optimizer adam --lr 2e-4 --lr_scheduler steplr --decay_step 1000 2000 --decay_gamma 0.5 --exp_name llff_room_s4 --with_ref --patch_size_x 63 --patch_size_y 84 --sW 4 --sH 4 --proj_weight 1 --depth_smooth_weight 0 --dis_weight 0 --num_gpus 4 --load_depth --depth_type nerf --model sinnerf --depth_weight 8 --vit_weight 10
Namespace(N_importance=64, N_samples=64, angle=30, batch_size=1, chunk=32768, ckpt_path=None, dataset_name='llff_ray_patch_1image_proj', decay_gamma=0.5, decay_step=[1000, 2000], depth_anneal=False, depth_smooth_weight=0.0, depth_type='nerf', depth_weight=8.0, dis_weight=0.0, dloss='hinge', exp_name='llff_room_s4', img_wh=[504, 378], load_depth=True, loss_type='mse', lr=0.0002, lr_scheduler='steplr', model='sinnerf', momentum=0.9, nH=32, nW=32, nerf_only=False, noise_std=1.0, num_epochs=3000, num_gpus=4, optimizer='adam', patch_loss='mse', patch_size=-1, patch_size_x=63, patch_size_y=84, perturb=1.0, poly_exp=0.9, prefixes_to_ignore=['loss'], proj_weight=1.0, pt_model=None, repeat=1, root_dir='data/nerf_llff_data/room', sH=4, sW=4, scan=4, spheric_poses=False, use_disp=False, vit_weight=10.0, warmup_epochs=0, warmup_multiplier=1.0, weight_decay=0, with_ref=True)
Using cache found in /home/zhangzhongwei18/.cache/torch/hub/facebookresearch_dino_main
    Traceback (most recent call last):
      File "train.py", line 19, in <module>
        system = SinNeRF(hparams)
      File "/home/zhangzhongwei18/NeRFs/SinNeRF/models/sinnerf.py", line 148, in __init__
        self.ext = VitExtractor(
      File "/home/zhangzhongwei18/NeRFs/SinNeRF/models/extractor.py", line 22, in __init__
        self.model = torch.hub.load(
      File "/home/zhangzhongwei18/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/sinnerf/lib/python3.8/site-packages/torch/hub.py", line 404, in load
        model = _load_local(repo_or_dir, model, *args, **kwargs)
      File "/home/zhangzhongwei18/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/sinnerf/lib/python3.8/site-packages/torch/hub.py", line 430, in _load_local
        hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
      File "/home/zhangzhongwei18/.custom/cuda-10.2-cudnn8-devel-ubuntu18.04-pytorch1.8.0_full_tensorboard/envs/sinnerf/lib/python3.8/site-packages/torch/hub.py", line 76, in _import_module
        spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/zhangzhongwei18/.cache/torch/hub/facebookresearch_dino_main/hubconf.py", line 17, in <module>
        import vision_transformer as vits
      File "/home/zhangzhongwei18/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 24, in <module>
        from utils import trunc_normal_

    ImportError: cannot import name 'trunc_normal_' from 'utils' (/home/zhangzhongwei18/NeRFs/SinNeRF/utils/__init__.py)

How do I train SinNeRF with my own single RGBD image?

Thanks for your work!

I wonder how I can train a SinNeRF model with a single RGB-D image when there is no pose information.

If I need to modify the dataset class code, which file is the most convenient to modify? Thank you for your help.
