
stereo-from-mono's Introduction

Jamie Watson, Oisin Mac Aodha, Daniyar Turmukhambetov, Gabriel J. Brostow and Michael Firman – ECCV 2020 (Oral presentation)

Link to paper

2 minute ECCV presentation video link

10 minute ECCV presentation video link

Training data and results qualitative comparison

Supervised deep networks are among the best methods for finding correspondences in stereo image pairs. Like all supervised approaches, these networks require ground truth data during training. However, collecting large quantities of accurate dense correspondence data is very challenging. We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs.

Overview of our stereo data generation approach

Inspired by recent progress in monocular depth estimation, we generate plausible disparity maps from single images. In turn, we use those flawed disparity maps in a carefully designed pipeline to generate stereo training pairs. Training in this manner makes it possible to convert any collection of single RGB images into stereo training data. This results in a significant reduction in human effort, with no need to collect real depths or to hand-design synthetic data. We can consequently train a stereo matching network from scratch on datasets like COCO, which were previously hard to exploit for stereo.
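To make the idea concrete, here is a minimal sketch (NumPy, with illustrative names; this is not the code used in this repository) of forward warping a left image into a synthesised right view using a per-pixel disparity map:

import numpy as np

def forward_warp(left_image, disparity):
    # left_image: (H, W, 3); disparity: (H, W), positive values in pixels (assumed convention).
    height, width, _ = left_image.shape
    right_image = np.zeros_like(left_image)      # pixels never written to remain occlusion holes
    best_disp = np.full((height, width), -1.0)   # nearest surface seen so far per target pixel

    for y in range(height):
        for x in range(width):
            d = disparity[y, x]
            x_right = int(round(x - d))          # column this pixel lands on in the right view
            if 0 <= x_right < width and d > best_disp[y, x_right]:
                # On collisions, keep the pixel with the larger disparity (closer to the camera)
                right_image[y, x_right] = left_image[y, x]
                best_disp[y, x_right] = d
    return right_image

Pixels that nothing maps to remain as occlusion holes, and several source pixels can collide on the same target pixel; the paper describes how both kinds of artifact are handled in the actual pipeline.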

Depth maps produced by stereo networks trained with Sceneflow and our method

Through extensive experiments we show that our approach outperforms stereo networks trained with standard synthetic datasets, when evaluated on KITTI, ETH3D, and Middlebury.

Quantitative comparison of stereo networks trained with Sceneflow and our method

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{watson-2020-stereo-from-mono,
 title   = {Learning Stereo from Single Images},
 author  = {Jamie Watson and
            Oisin Mac Aodha and
            Daniyar Turmukhambetov and
            Gabriel J. Brostow and
            Michael Firman
           },
 booktitle = {European Conference on Computer Vision ({ECCV})},
 year = {2020}
}

📊 Evaluation

We evaluate our performance on several datasets: KITTI (2015 and 2012), Middlebury (full resolution) and ETH3D (low-resolution two-view). To run inference on these datasets, first download them and update paths_config.yaml to point to their locations.
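paths_config.yaml is a plain YAML file that maps dataset names to local directories; a quick way to check what your copy contains is shown below (the key names in the comment are only hypothetical examples, not necessarily the real ones):

import yaml

with open('paths_config.yaml') as f:
    paths = yaml.safe_load(f)

# e.g. something like {'kitti': '/data/kitti_2015', 'eth3d': '/data/eth3d', ...}
for name, location in paths.items():
    print(name, '->', location)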

Note that we report scores on the training sets of each dataset since we never see these images during training.

Run evaluation using:

CUDA_VISIBLE_DEVICES=X  python main.py \
  --mode inference \
  --load_path <downloaded_model_path> 

optionally setting --test_data_types and --save_disparities.

A trained model can be found HERE.

🎯 Training

To train a new model, you will need to download several datasets: ADE20K, DIODE, Depth in the Wild, Mapillary and COCO. After doing so, update paths_config.yaml to point to these directories.

Additionally, you will need precomputed monocular depth estimates for these images. We provide these, computed with MiDaS, for: ADE20K, DIODE, Depth in the Wild, Mapillary and COCO. Download them and place them in the corresponding data paths (i.e. the paths specified in paths_config.yaml).

Now you can train a new model using:

CUDA_VISIBLE_DEVICES=X  python  main.py --mode train \
 --log_path <where_to_save_your_model> \
 --model_name <name_of_your_model>

Please see options.py for the full list of training options.

👩‍⚖️ License

Copyright © Niantic, Inc. 2020. Patent Pending. All rights reserved. Please see the license file for terms.

stereo-from-mono's People

Contributors

daniyar-niantic, jamiewatson683, mdfirman


stereo-from-mono's Issues

Custom left image to right image

Hello! First of all, thank you for this incredible repo!
I am discovering this project and I am wondering if I can use it to synthesize a right image from an arbitrary left image with the code provided here, for example by running main.py with some specific parameters. If so, how would I do this?
Thanks in advance for your feedback!

google colab version

More a request than an issue, but could we have a Google Colab version of this?

I think that it would benefit a lot of people including me.

It would be nice to be able to easily:

  • given a stereo pair, get the depth map.

It would also be nice to be able to:

  • given an image and its depth map, get a synthetic image representing the right image or any synthetic image that can be derived from the source image (going left or right and by any amount).

Can someone teach me how to run this code?

I would like to know the required environment (Python version, torch version and so on). Help!! If you could share more details, that would be great, thanks very much!

How is the right image synthesized, and which code is used for it?

Hi,
Thanks for your work, it's great. I have had some trouble synthesizing the right image. Could you share the code you used?

In particular, regarding the two main sources of artifacts, occlusion holes and collisions: which code and approach did you use to handle them? Could you share some experience?

Generating right side image

Good day! Thank you for your work on stereo from mono images.
I'm trying to generate a right side image from some left image. Currently I am looking at datasets/warp_dataset.py and base_dataset.py. Are these the only files I need if I am only interested in generating right side from a left image? Any help is greatly appreciated. Thank you very much!

About backward warping v.s. forward warping

Thanks for your great work!

I'm a bit confused: since some depth networks like monodepth can directly predict both left and right disparities from only the left image, why not just use backward warping from the left image and the right disparity to reconstruct the right image, and so avoid the problems with forward warping? More recent models, starting from monodepth2, remove the left-right consistency loss and the right-disparity prediction, but it seems feasible to add these components back to get the right disparities?
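For reference, the backward warping described above could be sketched roughly as follows (PyTorch, illustrative only; this is not code from this repository, and the disparity sign convention is an assumption):

import torch
import torch.nn.functional as F

def backward_warp(left_image, right_disparity):
    # left_image: (B, 3, H, W); right_disparity: (B, 1, H, W), in pixels,
    # defined in the right view's coordinates.
    b, _, h, w = left_image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    xs = xs.float().unsqueeze(0).expand(b, -1, -1)
    ys = ys.float().unsqueeze(0).expand(b, -1, -1)

    # For each right-view pixel, sample the left image at x + d
    x_src = xs + right_disparity.squeeze(1)

    # Normalise sampling locations to [-1, 1] for grid_sample
    grid_x = 2.0 * x_src / (w - 1) - 1.0
    grid_y = 2.0 * ys / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)

    return F.grid_sample(left_image, grid, padding_mode='border', align_corners=True)

This avoids explicit holes and collisions, but it relies on an accurate right-view disparity, and regions occluded in the left image will still sample incorrect content.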

Saving the Right sided images

I am unable to save the synthesized images; for example, with this code I cannot save the right-side image generated from the left-side image. I have tried using

_savepath = os.path.join(self.opt.load_path, data_type, 'Right_Image')
os.makedirs(_savepath, exist_ok=True)
io.imsave(os.path.join(_savepath, '{}.png'.format(str(idx).zfill(3))), right_image)

in warp_dataset.py after line number 336, but it is not working. I think some code changes need to be made in Inference.py, but I cannot figure out what.

Metrics used for evaluation

Hello, your paper is quite interesting. Did you evaluate the proposed method on commonly used metrics such as absolute relative error, RMSE, etc.?

Creating stereo frame pairs from monocular video to train monodepth2

As this work is from Niantic, who is also responsible for monodepth2, I am curious (1) if you have been able to leverage stereo-from-mono to generate stereo video from monocular video, and if so, (2) if those generated stereo videos produce the same depth estimation benefits with monodepth2 as that paper shows for real stereo videos vs. monocular videos. Have you tried this type of experiment? If so, how does it work out? To be honest, I thought this experiment would be in the paper for sure. I am particularly curious about what type of temporal consistency you can expect frame-to-frame.

No permission in Google Cloud

I tried to download the trained model from Google Cloud, but it shows "Additional permissions required to list objects in this bucket: Ask a bucket owner to grant you 'storage.objects.list' permission."

(screenshot of the permission error, 2021-07-01)

The code to get disparity

Hi,
How can I generate the disparity files, such as "midas_depths_diode/val/indoors/scene_00021/scan_00189/00021_00189_indoors_200_010.npy"?

Thanks.
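For what it's worth, one way to produce monocular depth files of this kind yourself (not necessarily how the released .npy files were generated, and the exact preprocessing may differ) is to run MiDaS via torch.hub and save its prediction; the file names below are placeholders:

import cv2
import numpy as np
import torch

# Load a pretrained MiDaS model and its preprocessing transforms from torch.hub
midas = torch.hub.load('intel-isl/MiDaS', 'MiDaS')
midas.eval()
midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')
transform = midas_transforms.default_transform

img = cv2.cvtColor(cv2.imread('example.jpg'), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))  # relative inverse depth, shape (1, H', W')
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode='bicubic', align_corners=False).squeeze()

np.save('example_midas_depth.npy', prediction.cpu().numpy())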

About the pretrained model

Hello, could you tell me how much data was used to train the pretrained model? I tested it with Intel D435 data, and the result does not look good enough (see the attached images). Also, will you provide the link to the MFS dataset from the paper?
