
mine's Introduction

MINE: Continuous-Depth MPI with Neural Radiance Fields

PyTorch implementation for our ICCV 2021 paper.

MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
Jiaxin Li*1, Zijian Feng*1, Qi She1, Henghui Ding1, Changhu Wang1, Gim Hee Lee2
1ByteDance, 2National University of Singapore
*denotes equal contribution

Our MINE takes a single image as input and densely reconstructs the frustum of the camera, through which we can easily render novel views of the given scene:

[GIF: novel views rendered from the fern scene]

The overall architecture of our method:

Run training on the LLFF dataset:

Firstly, set up your conda environment:

conda env create -f environment.yml 
conda activate MINE

Download the pre-downsampled version of the LLFF dataset from Google Drive, unzip it and put it in the root of the project, then start training by running the following command:

sh start_training.sh MASTER_ADDR="localhost" MASTER_PORT=1234 N_NODES=1 GPUS_PER_NODE=2 NODE_RANK=0 WORKSPACE=/run/user/3861/vs_tmp DATASET=llff VERSION=debug EXTRA_CONFIG='{"training.gpus": "0,1"}'

You may find the tensorboard logs and checkpoints in the sub-working directory (WORKSPACE + VERSION).

Apart from the LLFF dataset, we experimented on the RealEstate10K, KITTI Raw and the Flowers Light Fields datasets - the data pre-processing codes and training flow for these datasets will be released later.

Running our pretrained models:

We release the pretrained models trained on the RealEstate10K, KITTI and the Flowers datasets:

Dataset        N (planes)   Input Resolution   Download Link
RealEstate10K  32           384x256            Google Drive
RealEstate10K  64           384x256            Google Drive
KITTI          32           768x256            Google Drive
KITTI          64           768x256            Google Drive
Flowers        32           512x384            Google Drive
Flowers        64           512x384            Google Drive

To run the models, download the checkpoint and the hyper-parameter yaml file and place them in the same directory, then run the following script:

python3 visualizations/image_to_video.py --checkpoint_path MINE_realestate10k_384x256_monodepth2_N64/checkpoint.pth --gpus 0 --data_path visualizations/home.jpg --output_dir .

Citation

If you find our work helpful to your research, please cite our paper:

@inproceedings{mine2021,
  title={MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis},
  author={Jiaxin Li and Zijian Feng and Qi She and Henghui Ding and Changhu Wang and Gim Hee Lee},
  year={2021},
  booktitle={ICCV},
}

mine's People

Contributors

vincentfung13

mine's Issues

How to prepare my own dataset?

Hi,
thanks for your great work!
If I want to train on my own data, how should I process it? I see that the LLFF data includes cameras.bin, images.bin, points3D.bin, etc. How can I generate these files?
Could you share the code for that?
Thanks.

KITTI training code

Hi,
I was wondering whether you plan to release the KITTI training code at some point.
Apart from this, are the released model checkpoints all pretrained on ImageNet? Thanks!

Question about train my own photo set

It's great work!
When I train on my own photo set with the LLFF params, the code reports one of two errors: "assert len(xyzs) >= visible_points_count" in nerf_dataset.py, or "Matrix inverse contains nan!" in utils.py. Some data can be trained successfully, some data triggers the first error, and some the second.
I would like to know whether the problem lies in how the training data was captured or whether it can be fixed in the code. If the captured dataset is the problem, how should the images be captured correctly?

Question about Training Data Requirements

Hi,
Thanks for your interesting work.

I have a question regarding training data but I seem not to be able to find it in the paper.
Do you need ground truth depth maps during training or not?
Say I give you a purely image dataset like CIFAR-10: can you run your method on this data, or does it need to contain "additional" information? If so, what is this "additional" information?

I know that during inference you only need the image, but I want to know what information is required during training.

Sincerely,
Hadi.

out of memory

I am training on the LLFF dataset with two 2080 Ti GPUs, but it reports "out of memory". I changed the batch size from 2 to 1 in the config file, but it still does not work. What should I do?

Implementation detail of the plane homography warping between the src camera and the tgt camera

In the operations/homography_sampler.py file, lines 107-108 compute the plane homography warping matrix between the src camera and the tgt camera, following the plane-induced homography equation H = K · (R − t·nᵀ/a) · K⁻¹ [code and equation screenshots omitted].
However, the K_inv there should be K_tgt_inv, not K_src_inv, and K should be K_src. The issue does not show up when K_tgt = K_src, but it causes errors when the intrinsics are not equal. The corrected line would be:
H_tgt_src = torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))
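
For reference, a minimal sketch of how such a plane-induced homography could be assembled with the per-camera intrinsics on the correct sides (variable names are illustrative, not the repository's exact code):

import torch

def plane_homography(K_src, K_tgt_inv, R, t, n, a):
    # Plane-induced homography in the standard form
    #   H = K_src @ (R - t @ n^T / a) @ K_tgt_inv
    # K_src:     Bx3x3 source intrinsics
    # K_tgt_inv: Bx3x3 inverse of the target intrinsics
    # R, t:      Bx3x3 rotation and Bx3x1 translation between the two cameras
    # n, a:      Bx3x1 plane normal and Bx1x1 plane distance
    # (the sign of the t @ n^T / a term depends on the plane parameterization)
    R_tnd = R - torch.matmul(t, n.transpose(1, 2)) / a
    return torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))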

Preprocessing and Training Flow for Other Datasets

Hello authors, thank you for your great work.

You noted in the README:

Apart from the LLFF dataset, we experimented on the RealEstate10K, KITTI Raw and the Flowers Light Fields datasets - the data pre-processing codes and training flow for these datasets will be released later.

I believe the last update on this was in October 2021, so I am following up. Will you be able to release the dataloaders/code soon?

All the best,

Hyperparameters for training

Hello,
I would like to congratulate you on such great work!

Are the hyperparameters for the kitti_raw dataset included in params_kitti_raw.yaml the same ones used to obtain the results in the paper, or should they be changed?

Real-time rendering question

It is an honor to read your excellent work, but I have one question: can this method process video and live streams in real time?
Thanks.

KITTI split and LPIPS computation

Hi,

Thank you for the fantastic work! I have two small questions regarding model evaluation.

  1. KITTI raw data split
    Section 4.1 mentions that 20 city sequences from KITTI Raw are used for training and 4 sequences for testing. However, there are 28 city sequences in KITTI Raw in total. Do you use the remaining 4 sequences anywhere in the pipeline? Are the 20 training sequences and 4 test sequences exactly the same as those used in Tulsiani 2018, as implemented here?

  2. LPIPS computation
    You computed LPIPS here. According to the dataloader implemented here, your inputs to LPIPS are in the range [0, 1], while LPIPS expects inputs in the range [-1, 1], as mentioned in their docs. Am I missing something here, or should the inputs indeed be rescaled to obtain the correct LPIPS score? (See the sketch below.)
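
For reference, a minimal sketch of the rescaling I have in mind before calling LPIPS, using the pip lpips package for illustration (the repository may use a different LPIPS implementation):

import lpips

loss_fn = lpips.LPIPS(net="vgg")  # or net="alex"

def lpips_from_unit_range(pred_01, gt_01):
    # LPIPS expects inputs in [-1, 1]; rescale from [0, 1] first.
    pred = pred_01 * 2.0 - 1.0
    gt = gt_01 * 2.0 - 1.0
    return loss_fn(pred, gt)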

Thank you in advance for the time.

Minimum hardware requirements

Thank you for your nice work!

If I want to run your code, do I need the 48 V100 GPUs you mentioned in the paper?

What are the minimum requirements to run this code?

Thanks in advance.

Qualitative comparison on KITTI

Hi,
there is a qualitative comparison with single-view MPI on the KITTI dataset in your paper,
but I could not find their pretrained KITTI model in their repository.
Did you train their model to obtain the qualitative results?
Could you provide me with a copy of these qualitative results? (Just for academic purposes.)
Thank you.

Reproducibility Discrepancy

I've been trying to reproduce your results on the KITTI Raw dataset using the published code, and I used the code here to create the same splits and preprocessing indicated in the paper. I ran an evaluation using the pretrained KITTI (32 layers) weights, but I got the following results, which do not align with the results in the paper.

[2021-11-10 18:40:17,526 synthesis_task_kitti.py] Evaluation finished, average losses:
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_src 0.011722 (0.013352)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_src 0.018813 (0.022328)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_tgt 0.058807 (0.064112)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_tgt 0.343890 (0.348406)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_lpips_tgt 0.214107 (0.253099)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_psnr_tgt 18.623119 (18.415305)

I've also attached the synthesized target and src images. There could be an issue with the KITTI data loader that I created, so I can share it with you to pinpoint what is causing the discrepancy. Otherwise, I would appreciate it if you could share your KITTI data loader so that I can trace the error myself.

[attachments: src_final and tgt_1 images]

Training on multiple images per scene

Hi,

I noticed in your code that there is an option to train MINE with multiple images as input. In that case, there is no scale ambiguity, right? Can you give an example of a data-loader for that case?
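
For illustration, this is roughly the kind of loader I have in mind (a purely hypothetical sketch; the field names are not the repository's actual interface):

from torch.utils.data import Dataset

class MultiViewSceneDataset(Dataset):
    # Hypothetical sketch: each item is one scene with several posed views,
    # so metric scale can in principle be fixed by the known camera baselines.
    def __init__(self, scenes):
        # scenes: list of dicts with "images" (Nx3xHxW), "K" (Nx3x3), "G_cam_world" (Nx4x4)
        self.scenes = scenes

    def __len__(self):
        return len(self.scenes)

    def __getitem__(self, idx):
        scene = self.scenes[idx]
        return {
            "src_img": scene["images"][0],        # reference view fed to the network
            "tgt_imgs": scene["images"][1:],      # additional supervision views
            "K": scene["K"],
            "G_cam_world": scene["G_cam_world"],  # camera-from-world poses
        }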

Why image normalization twice

Hi,
Image normalization is already applied by "img_transforms" when loading images in "nerf_dataset.py". Why is the input image normalized again in the ResnetEncoder forward step? (See the sketch below for the pattern I mean.)
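
For illustration, the double-normalization pattern I am referring to looks roughly like this (paraphrased with assumed values, not the exact repository code):

from torchvision import transforms

# 1) Dataset-side normalization when loading images (img_transforms),
#    e.g. with ImageNet statistics:
img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 2) A second, encoder-side rescaling inside the ResNet encoder's forward pass,
#    in the monodepth2 style:
def encoder_forward_normalize(x):
    return (x - 0.45) / 0.225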

Question about KITTI raw dataset

Hi,
Thanks for sharing your work! I am wondering when you will release the dataset pipeline for KITTI Raw and the other datasets.
By the way, how do you evaluate the network on each dataset? And what is the reported performance on the LLFF dataset? I can't find it in the paper.
Thanks!

Questions about Eq. (3), Eq. (8) and Eq. (12) in the paper

I have some questions about the equations in the paper.
I think those equations should be corrected.
If I misunderstood something, please let me know.

(3)
In the paper: [equation screenshot]
Expected: [corrected equation screenshot]

(8) The parenthesis position seems off.
In the paper: [equation screenshot]
Expected: [corrected equation screenshot]

(12) The scale factors defined in MPI and in MINE are reciprocals of each other, but the equation does not reflect this difference.
In the paper: [equation screenshot]
Expected: [corrected equation screenshot]

Correspondence between the formula and the code (torch.cumprod)

Dear authors,
Thanks for your impressive work. I found the operation torch.cumprod in the code:

def plane_volume_rendering(rgb_BS3HW, sigma_BS1HW, xyz_BS3HW, is_bg_depth_inf):
    transparency_acc = torch.cumprod(transparency + 1e-6, dim=1)  # BxSx1xHxW

However, I cannot find an equation containing a cumprod operation in the MINE paper. Which formula should I refer to?
Thanks a lot.
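
For context, in standard NeRF-style volume rendering the accumulated transmittance is a running product over the samples/planes, which is what torch.cumprod computes along the plane dimension. A minimal sketch under that reading (tensor names are illustrative, not the repository's exact code):

import torch

def accumulated_transmittance(sigma_BS1HW, dist_BS1HW):
    # alpha_i = 1 - exp(-sigma_i * delta_i)   per-plane opacity
    # T_i     = prod_j (1 - alpha_j)          running transmittance over planes
    alpha = 1.0 - torch.exp(-sigma_BS1HW * dist_BS1HW)    # BxSx1xHxW
    transparency = 1.0 - alpha                             # per-plane transparency
    T = torch.cumprod(transparency + 1e-6, dim=1)          # cumulative product over the S planes
    # Note: an exclusive product (shifted by one plane) is sometimes used instead.
    return alpha, T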

Unable to train

Without downloading the ResNet-50 and VGG-16 pretrained models, the loss is 0 and training does not proceed. [screenshot omitted]
