
stereo-from-mono's Introduction

Jamie Watson, Oisin Mac Aodha, Daniyar Turmukhambetov, Gabriel J. Brostow and Michael Firman – ECCV 2020 (Oral presentation)

Link to paper

2 minute ECCV presentation video link

10 minute ECCV presentation video link

Training data and results qualitative comparison

Supervised deep networks are among the best methods for finding correspondences in stereo image pairs. Like all supervised approaches, these networks require ground truth data during training. However, collecting large quantities of accurate dense correspondence data is very challenging. We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs.

Overview of our stereo data generation approach

Inspired by recent progress in monocular depth estimation, we generate plausible disparity maps from single images. In turn, we use those flawed disparity maps in a carefully designed pipeline to generate stereo training pairs. Training in this manner makes it possible to convert any collection of single RGB images into stereo training data. This results in a significant reduction in human effort, with no need to collect real depths or to hand-design synthetic data. We can consequently train a stereo matching network from scratch on datasets like COCO, which were previously hard to exploit for stereo.
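To make the idea concrete, here is a minimal sketch (NumPy, with illustrative names; this is not the code used in this repository) of forward warping a left image into a synthesised right view using a per-pixel disparity map:

import numpy as np

def forward_warp(left_image, disparity):
    # left_image: (H, W, 3); disparity: (H, W), positive values in pixels (assumed convention).
    height, width, _ = left_image.shape
    right_image = np.zeros_like(left_image)      # pixels never written to remain occlusion holes
    best_disp = np.full((height, width), -1.0)   # nearest surface seen so far per target pixel

    for y in range(height):
        for x in range(width):
            d = disparity[y, x]
            x_right = int(round(x - d))          # column this pixel lands on in the right view
            if 0 <= x_right < width and d > best_disp[y, x_right]:
                # On collisions, keep the pixel with the larger disparity (closer to the camera)
                right_image[y, x_right] = left_image[y, x]
                best_disp[y, x_right] = d
    return right_image

Pixels that nothing maps to remain as occlusion holes, and several source pixels can collide on the same target pixel; the paper describes how both kinds of artifact are handled in the actual pipeline.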

Depth maps produced by stereo networks trained with Sceneflow and our method

Through extensive experiments we show that our approach outperforms stereo networks trained with standard synthetic datasets, when evaluated on KITTI, ETH3D, and Middlebury.

Quantitative comparison of stereo networks trained with Sceneflow and our method

✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{watson-2020-stereo-from-mono,
 title   = {Learning Stereo from Single Images},
 author  = {Jamie Watson and
            Oisin Mac Aodha and
            Daniyar Turmukhambetov and
            Gabriel J. Brostow and
            Michael Firman
           },
 booktitle = {European Conference on Computer Vision ({ECCV})},
 year = {2020}
}

📊 Evaluation

We evaluate our performance on several datasets: KITTI (2015 and 2012), Middlebury (full resolution) and ETH3D (low-resolution two-view). To run inference on these datasets, first download them and update paths_config.yaml to point to their locations.
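paths_config.yaml is a plain YAML file that maps dataset names to local directories; a quick way to check what your copy contains is shown below (the key names in the comment are only hypothetical examples, not necessarily the real ones):

import yaml

with open('paths_config.yaml') as f:
    paths = yaml.safe_load(f)

# e.g. something like {'kitti': '/data/kitti_2015', 'eth3d': '/data/eth3d', ...}
for name, location in paths.items():
    print(name, '->', location)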

Note that we report scores on the training sets of each dataset since we never see these images during training.

Run evaluation using:

CUDA_VISIBLE_DEVICES=X  python main.py \
  --mode inference \
  --load_path <downloaded_model_path> 

optionally setting --test_data_types and --save_disparities.

A trained model can be found HERE.

🎯 Training

To train a new model, you will need to download several datasets: ADE20K, DIODE, Depth in the Wild, Mapillary and COCO. After doing so, update paths_config.yaml to point to these directories.

Additionally, you will need precomputed monocular depth estimates for these images. We provide these, computed with MiDaS, for: ADE20K, DIODE, Depth in the Wild, Mapillary and COCO. Download them and place them in the corresponding data paths (i.e. the paths specified in paths_config.yaml).

Now you can train a new model using:

CUDA_VISIBLE_DEVICES=X  python  main.py --mode train \
 --log_path <where_to_save_your_model> \
 --model_name <name_of_your_model>

Please see options.py for the full list of training options.

👩‍⚖️ License

Copyright © Niantic, Inc. 2020. Patent Pending. All rights reserved. Please see the license file for terms.

stereo-from-mono's People

Contributors

daniyar-niantic, jamiewatson683, mdfirman


stereo-from-mono's Issues

Custom left image to right image

Hello! First of all, thank you for this incredible repo!
I am discovering this project and I am wondering if I can use it to synthesize a right image from an arbitrary left image with the code provided here, for example by running main.py with some specific parameters. If so, how would I do this?
Thanks in advance for your feedback!

google colab version

More a request than an issue, but could we have a Google Colab version of this?

I think that it would benefit a lot of people including me.

It would be nice to be able to easily:

  • given a stereo pair, get the depth map.

It would also be nice to be able to:

  • given an image and its depth map, get a synthetic image representing the right image or any synthetic image that can be derived from the source image (going left or right and by any amount).

Can someone teach me how to run this code?

I would like to know the required environment (Python version, torch version and so on). Help!! If you could share more details, that would be great, thanks very much!

How is the right image synthesized, and which code is used for it?

Hi,
Thanks for your work, it's great. I have had some trouble synthesizing the right image. Could you share the code you used?

In particular, regarding the two main sources of artifacts, occlusion holes and collisions: which code and approach did you use to handle them? Could you share some experience?

Generating right side image

Good day! Thank you for your work on stereo from mono images.
I'm trying to generate a right side image from some left image. Currently I am looking at datasets/warp_dataset.py and base_dataset.py. Are these the only files I need if I am only interested in generating right side from a left image? Any help is greatly appreciated. Thank you very much!

About backward warping v.s. forward warping

Thanks for your great work!

I'm a bit confused: since some depth networks like monodepth can directly predict both left and right disparities from only the left image, why not just use backward warping from the left image and the right disparity to reconstruct the right image, and so avoid the problems with forward warping? More recent models, starting from monodepth2, remove the left-right consistency loss and the right-disparity prediction, but it seems feasible to add these components back to get the right disparities?
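For reference, the backward warping described above could be sketched roughly as follows (PyTorch, illustrative only; this is not code from this repository, and the disparity sign convention is an assumption):

import torch
import torch.nn.functional as F

def backward_warp(left_image, right_disparity):
    # left_image: (B, 3, H, W); right_disparity: (B, 1, H, W), in pixels,
    # defined in the right view's coordinates.
    b, _, h, w = left_image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    xs = xs.float().unsqueeze(0).expand(b, -1, -1)
    ys = ys.float().unsqueeze(0).expand(b, -1, -1)

    # For each right-view pixel, sample the left image at x + d
    x_src = xs + right_disparity.squeeze(1)

    # Normalise sampling locations to [-1, 1] for grid_sample
    grid_x = 2.0 * x_src / (w - 1) - 1.0
    grid_y = 2.0 * ys / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)

    return F.grid_sample(left_image, grid, padding_mode='border', align_corners=True)

This avoids explicit holes and collisions, but it relies on an accurate right-view disparity, and regions occluded in the left image will still sample incorrect content.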

Saving the Right sided images

I am unable to save the synthesized images; for example, with this code I cannot save the right-side image generated from the left-side image. I have tried using

_savepath = os.path.join(self.opt.load_path, data_type, 'Right_Image')
os.makedirs(_savepath, exist_ok=True)
io.imsave(os.path.join(_savepath, '{}.png'.format(str(idx).zfill(3))), right_image)

in warp_dataset.py after line number 336, but it is not working. I think some code changes need to be made in Inference.py, but I cannot figure out what.

Metrics used for evaluation

Hello, your paper is quite interesting. Did you evaluate the proposed method on commonly used metrics such as absolute relative error, RMSE, etc.?

Creating stereo frame pairs from monocular video to train monodepth2

As this work is from Niantic, who is also responsible for monodepth2, I am curious (1) if you have been able to leverage stereo-from-mono to generate stereo video from monocular video, and if so, (2) if those generated stereo videos produce the same depth estimation benefits with monodepth2 as that paper shows for real stereo videos vs. monocular videos. Have you tried this type of experiment? If so, how does it work out? To be honest, I thought this experiment would be in the paper for sure. I am particularly curious about what type of temporal consistency you can expect frame-to-frame.

No permission in Google Cloud

I tried to download the trained model from Google Cloud, but it shows "Additional permissions required to list objects in this bucket: Ask a bucket owner to grant you 'storage.objects.list' permission."

(screenshot of the permission error, 2021-07-01)

The code to get disparity

Hi,
How can I generate the disparity files, such as "midas_depths_diode/val/indoors/scene_00021/scan_00189/00021_00189_indoors_200_010.npy"?

Thanks.
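For what it's worth, one way to produce monocular depth files of this kind yourself (not necessarily how the released .npy files were generated, and the exact preprocessing may differ) is to run MiDaS via torch.hub and save its prediction; the file names below are placeholders:

import cv2
import numpy as np
import torch

# Load a pretrained MiDaS model and its preprocessing transforms from torch.hub
midas = torch.hub.load('intel-isl/MiDaS', 'MiDaS')
midas.eval()
midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')
transform = midas_transforms.default_transform

img = cv2.cvtColor(cv2.imread('example.jpg'), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))  # relative inverse depth, shape (1, H', W')
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode='bicubic', align_corners=False).squeeze()

np.save('example_midas_depth.npy', prediction.cpu().numpy())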

About the pretrained model

Hello, could you tell me how much data was used to train the pretrained model? I tested it with Intel D435 data, and the result does not look good enough (see the attached images). Also, will you provide the link to the MFS dataset from the paper?
