
mine's Introduction

MINE: Continuous-Depth MPI with Neural Radiance Fields

PyTorch implementation for our ICCV 2021 paper.

MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
Jiaxin Li*1, Zijian Feng*1, Qi She1, Henghui Ding1, Changhu Wang1, Gim Hee Lee2
1ByteDance, 2National University of Singapore
*denotes equal contribution

Our MINE takes a single image as input and densely reconstructs the frustum of the camera, through which we can easily render novel views of the given scene:

[GIF: novel views rendered from the fern scene]

The overall architecture of our method:

Run training on the LLFF dataset:

Firstly, set up your conda environment:

conda env create -f environment.yml 
conda activate MINE

Download the pre-downsampled version of the LLFF dataset from Google Drive, unzip it and put it in the root of the project, then start training by running the following command:

sh start_training.sh MASTER_ADDR="localhost" MASTER_PORT=1234 N_NODES=1 GPUS_PER_NODE=2 NODE_RANK=0 WORKSPACE=/run/user/3861/vs_tmp DATASET=llff VERSION=debug EXTRA_CONFIG='{"training.gpus": "0,1"}'

You may find the tensorboard logs and checkpoints in the sub-working directory (WORKSPACE + VERSION).

Apart from the LLFF dataset, we experimented on the RealEstate10K, KITTI Raw and the Flowers Light Fields datasets - the data pre-processing codes and training flow for these datasets will be released later.

Running our pretrained models:

We release the pretrained models trained on the RealEstate10K, KITTI and the Flowers datasets:

Dataset        N (planes)   Input Resolution   Download Link
RealEstate10K  32           384x256            Google Drive
RealEstate10K  64           384x256            Google Drive
KITTI          32           768x256            Google Drive
KITTI          64           768x256            Google Drive
Flowers        32           512x384            Google Drive
Flowers        64           512x384            Google Drive

To run the models, download the checkpoint and the hyper-parameter yaml file and place them in the same directory, then run the following script:

python3 visualizations/image_to_video.py --checkpoint_path MINE_realestate10k_384x256_monodepth2_N64/checkpoint.pth --gpus 0 --data_path visualizations/home.jpg --output_dir .

Citation

If you find our work helpful to your research, please cite our paper:

@inproceedings{mine2021,
  title={MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis},
  author={Jiaxin Li and Zijian Feng and Qi She and Henghui Ding and Changhu Wang and Gim Hee Lee},
  year={2021},
  booktitle={ICCV},
}

mine's People

Contributors

vincentfung13

mine's Issues

How to prepare my own dataset?

Hi,
thanks for your great work!
If I want to train on my own data, how should I process it? I see that the LLFF data includes cameras.bin, images.bin, points3D.bin, etc. How can I generate these files?
Could you share the code for that?
Thanks.

KITTI training code

Hi,
I was wondering whether you plan to release the KITTI training code at some point.
Apart from this, are the released model checkpoints all pretrained on ImageNet? Thanks!

Question about train my own photo set

It's great work!
When I train on my own photo set with the LLFF params, the code reports one of two errors: "assert len(xyzs) >= visible_points_count" in nerf_dataset.py, or "Matrix inverse contains nan!" in utils.py. Some data can be trained successfully, some data triggers the first error, and some the second.
I would like to know whether the problem lies in how the training data was captured or whether it can be fixed in the code. If the captured dataset is the problem, how should the images be captured correctly?

Question about Training Data Requirements

Hi,
Thanks for your interesting work.

I have a question regarding training data but I seem not to be able to find it in the paper.
Do you need ground truth depth maps during training or not?
Say I give you a purely image dataset like CIFAR-10: can you run your method on this data, or does it need to contain "additional" information? If so, what is this "additional" information?

I know that during inference you only need the image, but I want to know what information is required during training.

Sincerely,
Hadi.

out of memory

I am training on the LLFF dataset with two 2080 Ti GPUs, but it reports "out of memory". I changed the batch size from 2 to 1 in the config file, but it still does not work. What should I do?

Implementation detail of the plane homography warping between the src camera and the tgt camera

In the operations/homography_sampler.py file, lines 107-108 compute the plane homography warping matrix between the src camera and the tgt camera, following the plane-induced homography equation H = K · (R − t·nᵀ/a) · K⁻¹ [code and equation screenshots omitted].
However, the K_inv there should be K_tgt_inv, not K_src_inv, and K should be K_src. The issue does not show up when K_tgt = K_src, but it causes errors when the intrinsics are not equal. The corrected line would be:
H_tgt_src = torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))
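
For reference, a minimal sketch of how such a plane-induced homography could be assembled with the per-camera intrinsics on the correct sides (variable names are illustrative, not the repository's exact code):

import torch

def plane_homography(K_src, K_tgt_inv, R, t, n, a):
    # Plane-induced homography in the standard form
    #   H = K_src @ (R - t @ n^T / a) @ K_tgt_inv
    # K_src:     Bx3x3 source intrinsics
    # K_tgt_inv: Bx3x3 inverse of the target intrinsics
    # R, t:      Bx3x3 rotation and Bx3x1 translation between the two cameras
    # n, a:      Bx3x1 plane normal and Bx1x1 plane distance
    # (the sign of the t @ n^T / a term depends on the plane parameterization)
    R_tnd = R - torch.matmul(t, n.transpose(1, 2)) / a
    return torch.matmul(K_src, torch.matmul(R_tnd, K_tgt_inv))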

Preprocessing and Training Flow for Other Datasets

Hello authors, thank you for your great work.

You noted in the README:

Apart from the LLFF dataset, we experimented on the RealEstate10K, KITTI Raw and the Flowers Light Fields datasets - the data pre-processing codes and training flow for these datasets will be released later.

I believe the last update on this was in October 2021, so I am following up. Will you be able to release the dataloaders/code soon?

All the best,

Hyperparameters for training

Hello,
I would like to congratulate you on such great work!

Are the hyperparameters for the kitti_raw dataset included in params_kitti_raw.yaml the same ones used to obtain the results in the paper, or should they be changed?

Real-time rendering question

It is an honor to read your excellent work, but I have one question: can this method process video and live streams in real time?
Thanks.

KITTI split and LPIPS computation

Hi,

Thank you for the fantastic work! I have two small questions regarding model evaluation.

  1. KITTI raw data split
    Section 4.1 mentions that 20 city sequences from KITTI Raw are used for training and 4 sequences for testing. However, there are 28 city sequences in KITTI Raw in total. Do you use the remaining 4 sequences anywhere in the pipeline? Are the 20 training sequences and 4 test sequences exactly the same as those used in Tulsiani 2018, as implemented here?

  2. LPIPS computation
    You computed LPIPS here. According to the dataloader implemented here, your inputs to LPIPS are in the range [0, 1], while LPIPS expects inputs in the range [-1, 1], as mentioned in their docs. Am I missing something here, or should the inputs indeed be rescaled to obtain the correct LPIPS score? (See the sketch below.)
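
For reference, a minimal sketch of the rescaling I have in mind before calling LPIPS, using the pip lpips package for illustration (the repository may use a different LPIPS implementation):

import lpips

loss_fn = lpips.LPIPS(net="vgg")  # or net="alex"

def lpips_from_unit_range(pred_01, gt_01):
    # LPIPS expects inputs in [-1, 1]; rescale from [0, 1] first.
    pred = pred_01 * 2.0 - 1.0
    gt = gt_01 * 2.0 - 1.0
    return loss_fn(pred, gt)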

Thank you in advance for the time.

Minimum hardware requirements

Thank you for your nice work!

If I want to run your code, do I need the 48 V100 GPUs you mentioned in the paper?

What are the minimum requirements to run this code?

Thanks in advance.

Qualitative comparison on KITTI

Hi,
there is a qualitative comparison with single-view MPI on the KITTI dataset in your paper,
but I could not find their pretrained KITTI model in their repository.
Did you train their model to obtain the qualitative results?
Could you provide me with a copy of these qualitative results? (Just for academic purposes.)
Thank you.

Reproducibility Discrepancy

I've been trying to reproduce your results on the KITTI Raw dataset using the published code, and I used the code here to create the same splits and preprocessing indicated in the paper. I ran an evaluation using the pretrained KITTI (32 layers) weights, but I got the following results, which do not align with the results in the paper.

[2021-11-10 18:40:17,526 synthesis_task_kitti.py] Evaluation finished, average losses:
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_src 0.011722 (0.013352)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_src 0.018813 (0.022328)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_rgb_tgt 0.058807 (0.064112)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_loss_ssim_tgt 0.343890 (0.348406)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_lpips_tgt 0.214107 (0.253099)
[2021-11-10 18:40:17,531 synthesis_task_kitti.py] val_psnr_tgt 18.623119 (18.415305)

I've also attached the synthesized target and src images. There could be an issue with the KITTI data loader that I created, so I can share it with you to pinpoint what is causing the discrepancy. Otherwise, I would appreciate it if you could share your KITTI data loader so that I can trace the error myself.

[attachments: src_final and tgt_1 images]

Training on multiple images per scene

Hi,

I noticed in your code that there is an option to train MINE with multiple images as input. In that case, there is no scale ambiguity, right? Can you give an example of a data-loader for that case?
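
For illustration, this is roughly the kind of loader I have in mind (a purely hypothetical sketch; the field names are not the repository's actual interface):

from torch.utils.data import Dataset

class MultiViewSceneDataset(Dataset):
    # Hypothetical sketch: each item is one scene with several posed views,
    # so metric scale can in principle be fixed by the known camera baselines.
    def __init__(self, scenes):
        # scenes: list of dicts with "images" (Nx3xHxW), "K" (Nx3x3), "G_cam_world" (Nx4x4)
        self.scenes = scenes

    def __len__(self):
        return len(self.scenes)

    def __getitem__(self, idx):
        scene = self.scenes[idx]
        return {
            "src_img": scene["images"][0],        # reference view fed to the network
            "tgt_imgs": scene["images"][1:],      # additional supervision views
            "K": scene["K"],
            "G_cam_world": scene["G_cam_world"],  # camera-from-world poses
        }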

Why image normalization twice

Hi,
Image normalization is already applied by "img_transforms" when loading images in "nerf_dataset.py". Why is the input image normalized again in the ResnetEncoder forward step? (See the sketch below for the pattern I mean.)
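
For illustration, the double-normalization pattern I am referring to looks roughly like this (paraphrased with assumed values, not the exact repository code):

from torchvision import transforms

# 1) Dataset-side normalization when loading images (img_transforms),
#    e.g. with ImageNet statistics:
img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 2) A second, encoder-side rescaling inside the ResNet encoder's forward pass,
#    in the monodepth2 style:
def encoder_forward_normalize(x):
    return (x - 0.45) / 0.225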

Question about KITTI raw dataset

Hi,
Thanks for sharing your work! I am wondering when you will release the dataset pipeline for KITTI Raw and the other datasets.
By the way, how do you evaluate the network on each dataset? And what is the reported performance on the LLFF dataset? I can't find it in the paper.
Thanks!

Questions about Eq. (3), Eq. (8) and Eq. (12) in the paper

I have some questions about the equations in the paper.
I think those equations should be corrected.
If I misunderstood something, please let me know.

(3)
In the paper: [equation screenshot]
Expected: [corrected equation screenshot]

(8) The parenthesis position seems off.
In the paper: [equation screenshot]
Expected: [corrected equation screenshot]

(12) The scale factors defined in MPI and in MINE are reciprocals of each other, but the equation does not reflect this difference.
In the paper: [equation screenshot]
Expected: [corrected equation screenshot]

Correspondence between the formula and the code (torch.cumprod)

Dear authors,
Thanks for your impressive work. I found the operation torch.cumprod in the code:

def plane_volume_rendering(rgb_BS3HW, sigma_BS1HW, xyz_BS3HW, is_bg_depth_inf):
    transparency_acc = torch.cumprod(transparency + 1e-6, dim=1)  # BxSx1xHxW

However, I cannot find an equation containing a cumprod operation in the MINE paper. Which formula should I refer to?
Thanks a lot.
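
For context, in standard NeRF-style volume rendering the accumulated transmittance is a running product over the samples/planes, which is what torch.cumprod computes along the plane dimension. A minimal sketch under that reading (tensor names are illustrative, not the repository's exact code):

import torch

def accumulated_transmittance(sigma_BS1HW, dist_BS1HW):
    # alpha_i = 1 - exp(-sigma_i * delta_i)   per-plane opacity
    # T_i     = prod_j (1 - alpha_j)          running transmittance over planes
    alpha = 1.0 - torch.exp(-sigma_BS1HW * dist_BS1HW)    # BxSx1xHxW
    transparency = 1.0 - alpha                             # per-plane transparency
    T = torch.cumprod(transparency + 1e-6, dim=1)          # cumulative product over the S planes
    # Note: an exclusive product (shifted by one plane) is sometimes used instead.
    return alpha, T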

Unable to train

Without downloading the ResNet-50 and VGG-16 pretrained models, the loss is 0 and training does not proceed. [screenshot omitted]
