Depth Estimation using Data Driven Approaches

Introduction

     Time-of-flight, structured-light and stereo technologies have been widely used for depth map estimation. Each of these comes with its own pros and cons in terms of speed of image capture, structural description and ambient light performance. Monocular cues such as texture and gradient variation, shading, color/haze and defocus also aid accurate depth estimation. Methods built on these cues are complex statistical models that are susceptible to noise. Recently, data-driven approaches such as deep learning have been employed for depth estimation. These approaches are less prone to noise if given enough data to learn both coarse and fine details.

Convolutional Neural Networks (CNN)

     In deep learning, CNNs are widely used in image processing applications. Convolution layers are the basic building block of a CNN and are combined with pooling and ReLU activation layers. The kernels of each layer are learned through backpropagation. The CNN learns features from the input images by applying varied filters across the image, generating feature maps at each layer. Deeper in the network, the feature maps respond to increasingly complex features and objects. ConvNets have been very successful for image classification and have more recently been used for image prediction and other applications. Adding upscaling and deconvolution layers makes it possible to upscale the compressed feature map, so the network can predict dense data rather than a class label.
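The building blocks described above can be illustrated with a minimal pure-Python sketch. It is not the code used in this project: the toy image, the hand-picked kernel and the function names `conv2d`, `relu`, `max_pool2x2` and `upscale2x` are all assumptions made for illustration, and a real CNN would learn its kernel weights through backpropagation.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a 2D list with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU: negative responses are clamped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def upscale2x(fmap):
    """Nearest-neighbour upscaling: every value is repeated in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Toy 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked horizontal-gradient kernel (assumption; normally learned).
kernel = [[-1, 1], [-1, 1]]

features = relu(conv2d(image, kernel))  # 4x4 feature map, responds at the edge
pooled = max_pool2x2(features)          # 2x2 compressed feature map
restored = upscale2x(pooled)            # 4x4 again, but with coarser detail
```

Pooling followed by upscaling restores the spatial size but not the exact edge position, which is why the decoder half of such networks also needs learned deconvolution layers.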

![image](https://cloud.githubusercontent.com/assets/11435669/20927466/c186f656-bb8f-11e6-86a8-2d6661db827c.png)

Related Work

     Deep3D [1] is a fully automatic 2D-to-3D conversion algorithm that takes 2D images or video frames as input and outputs 3D stereo image pairs. David Eigen et al. from NYU proposed the Multi-Scale Network [2], an architecture based on a single monocular image that employs two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. It is trained on a real-world dataset. “FlowNet: Learning Optical Flow with Convolutional Networks” [3] uses virtually created video to make the network learn motion parameters and from them extract optical flow. “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches” [5] is a method for extracting depth information from stereo data by comparing image patches. Similar to [5], “Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs” [4] uses image patches at different scales to extract depth information.

![image](https://cloud.githubusercontent.com/assets/11435669/20927710/c855abca-bb90-11e6-9dd1-3fe86007c398.png)

Multi Scale network

![image](https://cloud.githubusercontent.com/assets/11435669/20927750/f034fe20-bb90-11e6-9cb8-262d661d205a.png) ![image](https://cloud.githubusercontent.com/assets/11435669/20927757/f4e5f3b6-bb90-11e6-91c3-ba2bf66dacb0.png)

FlowNet

Methods

A.   Stereo ConvNet Architecture

     The images and ground truth depth maps used for training, validation and testing are produced by varying the orientation of a 3D model generated with the Blender software tool. As our first step, we use StereoConvNet [6]; the first half of the network is shown below. The second half of the network mirrors the first about the last convolution layer, replacing convolution with deconvolution and pooling with upscaling. Although the input image consists of a concatenated left and right image pair, the network treats it as two separate images. The reference output label is the ground truth depth map generated using Blender's "Mist" function.

![image](https://cloud.githubusercontent.com/assets/11435669/20928225/cc7f1b58-bb92-11e6-9217-fa0811db36bd.png)

Stereo ConvNet Architecture
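As a rough sketch of the two ideas in the Stereo ConvNet description, the snippet below splits a concatenated stereo pair into its two images and builds a decoder by mirroring an encoder. It rests on several assumptions: the pair is laid out side by side, the mirror covers the whole encoder, and the layer list, `split_stereo_pair`, `mirror_encoder` and `MIRROR` are hypothetical names rather than anything taken from StereoConvNet [6].

```python
def split_stereo_pair(concatenated):
    """Split a side-by-side image (list of rows) into left and right halves."""
    width = len(concatenated[0])
    if width % 2 != 0:
        raise ValueError("expected an even width for a side-by-side pair")
    half = width // 2
    left = [row[:half] for row in concatenated]
    right = [row[half:] for row in concatenated]
    return left, right


# Each encoder layer type and its decoder counterpart.
MIRROR = {"conv": "deconv", "pool": "upscale"}


def mirror_encoder(encoder):
    """Build the decoder by reversing the encoder and swapping layer types."""
    return [MIRROR[layer] for layer in reversed(encoder)]


# Hypothetical encoder; the actual layers are the ones shown in the figure.
encoder = ["conv", "pool", "conv", "pool", "conv"]
decoder = mirror_encoder(encoder)
network = encoder + decoder

# Tiny 2x4 "image": the left half holds 1, 2, 5, 6 and the right half 3, 4, 7, 8.
left, right = split_stereo_pair([[1, 2, 3, 4], [5, 6, 7, 8]])
```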

B.   Deeper Stereo ConvNet Architecture

     In the Deeper Stereo ConvNet, the input remains the same but the architecture is modified with an extra convolution layer and an extra deconvolution layer. The depth of the filters is also increased, following [3], in order to capture more detail.
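To give a feel for what an extra layer and deeper filters cost, here is a small sketch that counts the weights of a stack of convolution layers. It reads "depth of the filters" as the number of channels per layer, which is an assumption, and the channel widths and the 3x3 kernel size are hypothetical values, not the ones used in this project.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel for a square-kernel conv layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def total_params(channels, kernel_size=3):
    """Sum the parameters of consecutive conv layers given their channel widths."""
    return sum(conv_params(kernel_size, c_in, c_out)
               for c_in, c_out in zip(channels, channels[1:]))


# Hypothetical channel widths (assumption, for illustration only).
base = [2, 16, 32]         # two conv layers
deeper = [2, 32, 64, 128]  # one extra conv layer and wider filters

base_count = total_params(base)
deeper_count = total_params(deeper)
print(base_count, deeper_count)
```

Under these assumed widths the deeper stack has many times more parameters, which is consistent with the longer test time reported for the deeper network in the Results section.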

C.   Patched Deeper Stereo ConvNet Architecture

     Following [4] and [5], the number of input streams is increased to 6 for the Patched Deeper Stereo ConvNet by decomposing the left image into 4 scaled parts. As in the referenced papers, a more accurate depth map is expected.

Patched Deeper Stereo ConvNet Architecture
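One possible reading of the six input streams is the left image, the right image and four patches cut from the left image. The sketch below assumes the four parts are the four quadrants of the left image and leaves out any rescaling of the patches; both the interpretation and the names `quadrants` and `build_input_streams` are assumptions, not the project's actual code.

```python
def quadrants(image):
    """Split a 2D list into top-left, top-right, bottom-left, bottom-right parts."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("expected even height and width")
    half_h, half_w = height // 2, width // 2
    top, bottom = image[:half_h], image[half_h:]
    return [
        [row[:half_w] for row in top],
        [row[half_w:] for row in top],
        [row[:half_w] for row in bottom],
        [row[half_w:] for row in bottom],
    ]


def build_input_streams(left, right):
    """Return six streams: left, right, and four patches of the left image."""
    return [left, right] + quadrants(left)


# Toy 4x4 images standing in for a stereo pair.
left = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
right = [[0] * 4 for _ in range(4)]

streams = build_input_streams(left, right)
```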

Results

      Stereo ConvNet Architecture
          + Smooth without holes
          + Coarse structure preserved
          - Blurred at edges
          - Sharp structures lost
          - Fine objects smeared or lost
          Time to test = 20 s

      Deeper Stereo ConvNet Architecture
          + Smooth without holes
          + Coarse structure preserved
          + Edges are sharper
          - Still noise at the edges
          - Fine details/objects smeared or lost
          Note: The increased depth of the network lets it learn more detail about the scene.
          Time to test = 70 s

      Patched Deeper Stereo ConvNet Architecture
          + Smooth without holes
          + Fine structure preserved
          + Image predicted with less noise
          - Time to train and test increases
          Note: The increased depth of the network and the increased resolution of its input data let it learn more detail about the scene.
          Time to test = 145 s
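The test times listed above can be compared directly. This short snippet only restates the three reported numbers as slowdown factors relative to the first architecture; the dictionary keys are shortened labels chosen for illustration.

```python
# Test times in seconds, as reported in the list above.
test_seconds = {
    "Stereo ConvNet": 20,
    "Deeper Stereo ConvNet": 70,
    "Patched Deeper Stereo ConvNet": 145,
}

baseline = test_seconds["Stereo ConvNet"]
slowdown = {name: seconds / baseline for name, seconds in test_seconds.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x")
```

The deeper network is 3.5 times slower to test than the baseline, and the patched variant is 7.25 times slower.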

Stereo ConvNet Architecture:

![examples](https://cloud.githubusercontent.com/assets/11435669/20932317/3d7a65d8-bba2-11e6-90d0-0589dc66ccee.png)

Deeper Stereo ConvNet Architecture:

![examples](https://cloud.githubusercontent.com/assets/11435669/20932408/8b2d6d0c-bba2-11e6-977c-10a82ce5e9aa.png)

Patched Deeper Stereo ConvNet Architecture:

![examples](https://cloud.githubusercontent.com/assets/11435669/20932444/a7e2b2ae-bba2-11e6-8bfa-d8b2e5dca250.png)

3D modeling for Patched Deeper Stereo ConvNet Architecture:

[Figure: two rows of examples, each with three panels: Image, Expected output, Derived output.]

Conclusion

      Data-driven depth estimation approaches would be effective if a sufficiently large, descriptive, labelled dataset were available. The Patched Deeper Stereo ConvNet predicts a depth map very similar to the ground truth. The time to train the network grows with the depth and complexity of the CNN architecture. In further implementations, we plan to combine the architecture of our Patched Deeper Stereo ConvNet with the Multi-Scale Deep Network [2] and observe the results for real-world images.

References

[1]     “Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks”, Junyuan Xie, Ross Girshick, Ali Farhadi, University of Washington.

[2]     “Depth Map Prediction from a Single Image using a Multi-Scale Deep Network”, David Eigen, Christian Puhrsch, Rob Fergus, Dept. of Computer Science, Courant Institute, New York University.

[3]     “FlowNet: Learning Optical Flow with Convolutional Networks”, A. Dosovitskiy, P. Fischer et al., ICCV, 2015.

[4]     “Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs”, Bo Li, Chunhua Shen, Yuchao Dai, Anton van den Hengel, Mingyi He, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[5]     “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches”, Jure Zbontar (University of Ljubljana), Yann LeCun, Journal of Machine Learning Research 17 (2016).

[6]     https://github.com/LouisFoucard/StereoConvNet
