
depth-estimation's Introduction

Depth Estimation by Convolutional Neural Networks

This is the repository for my master's thesis, Depth Estimation by CNNs. You can read the whole thesis here. Below I briefly present the solution and results.

Architecture:

I use an architecture similar to the one used by Eigen et al., with the difference that I also use a network that estimates gradients of the depth map:

Architecture

For the global context network I use a pretrained AlexNet, the gradient network is the convolutional part of AlexNet, and the refining network is also fully convolutional; more details are in the thesis. I trained each part separately: first the global context network and the gradient network, after which I fixed their parameters and trained the refining network.

Normalized loss function:

For training the global context network and the refining network I wanted to use a scale-invariant loss similar to the one used by Eigen et al., but I took it a step further and used a loss function that is scale-and-translation invariant. It can be explained fairly easily in words: to obtain a normalized depth map, you subtract its mean and divide by its variance. The normalized loss is then the squared distance between the normalized output depth map and the normalized target depth map. This significantly improved the speed of convergence.
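The normalized loss described above can be sketched in a few lines of numpy. This is only a sketch, not the actual Caffe layer from the thesis; note that dividing by the standard deviation instead of the variance is the variant that is fully invariant to scaling.

```python
import numpy as np

def normalize(depth, eps=1e-8):
    # Normalization as described above: subtract the mean, divide by
    # the variance. (Many MVN implementations divide by the standard
    # deviation instead; that variant is also invariant to scaling.)
    return (depth - depth.mean()) / (depth.var() + eps)

def normalized_loss(output, target):
    # Squared distance between the normalized output and target maps.
    diff = normalize(output) - normalize(target)
    return float(np.mean(diff ** 2))
```

Because the mean is subtracted before dividing, shifting all depth values of the target by a constant leaves the loss unchanged, which is the translation-invariance part of the claim above.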

Trained model

You can download the trained model here.

Results:

I ran several experiments for the thesis; you can have a look at all of them in chapter 5. Here I present just the most significant ones. All experiments were performed on the NYU Depth v2 dataset.

Comparison of different loss functions

I trained the refining network with different loss functions for 60,000 iterations.

Losses

From left to right: input; squared distance loss; squared distance loss in log space; scale invariant loss by Eigen et al.; normalized loss; ground truth

As you can see, networks using the other loss functions produce noticeably worse outputs than the network using the normalized loss. The difference shrinks when the network is trained longer (Eigen et al. ran training for ~1.5M iterations; here it's just 60k).

Comparison to existing solutions

How does the model fare against existing solutions? I compared the results of my model to those from two papers, [1] and [2], both by Eigen et al. The model with normalized loss has trouble estimating absolute depth values, but it estimates the relative structure of the depth map fairly well. To test this, I substituted the mean and variance of the ground truth into the output depth map; I call this the 'model with oracle'. It achieved state-of-the-art performance on the RMSE metric at the time of writing the thesis. Keep in mind that this model just aims to show that a model trained with the normalized loss estimates the structure of the depth map well, regardless of absolute depth values.

          [1]      [2]      Proposed model   With oracle
RMSE      0.907    0.641    1.169            0.569
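The 'oracle' step can be sketched as follows. This is a minimal numpy sketch under the assumption that matching the ground truth's mean and variance is done by ordinary mean/std rescaling; the function names are mine, not from the repository.

```python
import numpy as np

def fit_to_ground_truth(pred, gt, eps=1e-8):
    # Keep only the relative structure of the prediction, then give it
    # the ground truth's mean and spread ("oracle" evaluation).
    pred_n = (pred - pred.mean()) / (pred.std() + eps)
    return pred_n * gt.std() + gt.mean()

def rmse(pred, gt):
    return float(np.sqrt(np.mean((pred - gt) ** 2)))
```

A prediction that gets the relative structure right but the absolute scale wrong scores perfectly after the oracle fit, which is exactly why the oracle numbers above only demonstrate structure estimation, not absolute depth accuracy.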

Comparison to eigen

From left to right by columns: input image, ground truth; [1], proposed model; [2], model with oracle

Usage

python test_depth.py INPUT_DIR GT_DIR OUT_DIR SNAPSHOTS_DIR [--log]

  • INPUT_DIR is the path to the folder containing input images
  • GT_DIR is the path to the folder containing ground truth depth maps
  • OUT_DIR is the path to the folder to which output depth maps will be written
  • SNAPSHOTS_DIR is the path to the folder containing .caffemodel files with trained network models. All models in this folder will be evaluated.
  • the --log switch is used when the depth values produced by the network are in log space

Frameworks/Libraries needed:

  • Caffe
  • Python2.7: caffe, scipy, scikit-image, numpy, pypng, cv2, Pillow, matplotlib

Few notes

  • input images should be named in the same way as the corresponding ground truths, except that input images should have the suffix 'colors', while ground truth images should have the suffix 'depth'. Note that these suffixes should precede the file extension, e.g., 'image1_colors.png' and the corresponding depth map 'image1_depth.png'
  • along with each .caffemodel file, a corresponding deploy network definition file has to be placed into SNAPSHOTS_DIR, with the same name as the model file but with the extension 'prototxt' instead of 'caffemodel'
  • two output folders will actually be created: OUT_DIR and OUT_DIR + '_abs'. OUT_DIR contains output depths fit onto the ground truth using MVN normalization; OUT_DIR + '_abs' contains the raw output depth maps.
  • note that you need the AlexNet caffemodel to train the global context network, the gradient network, and their joint configuration. It can be downloaded here: https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet
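The naming convention above can be sketched like this. It is a hedged sketch: `pair_inputs_with_depths` is my own name for illustration, not a helper that exists in the repository.

```python
import os

def pair_inputs_with_depths(input_dir, gt_dir):
    # 'image1_colors.png' in input_dir pairs with 'image1_depth.png'
    # in gt_dir; the suffix comes right before the file extension.
    pairs = []
    for name in sorted(os.listdir(input_dir)):
        base, ext = os.path.splitext(name)
        if not base.endswith('_colors'):
            continue  # not an input image by the convention above
        gt_name = base[:-len('_colors')] + '_depth' + ext
        gt_path = os.path.join(gt_dir, gt_name)
        if os.path.exists(gt_path):
            pairs.append((os.path.join(input_dir, name), gt_path))
    return pairs
```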

depth-estimation's People

Contributors

janivanecky


depth-estimation's Issues

RuntimeWarning: invalid value encountered in true_divide

Hi @janivanecky
With your guide, I can run the code now. Thanks! However, there is a warning:
/usr/lib/python2.7/dist-packages/scipy/ndimage/interpolation.py:535: RuntimeWarning: invalid value encountered in true_divide
zoom = (numpy.array(input.shape) - 1) / zoom_div

Does it matter?

By the way, why do we need a ground truth depth map as an input? After all, the goal is to use the model to estimate a depth map without providing the ground truth.

Thanks,
Yingjun

About the dataset codes

Hello ,

When I run process_raw.m, I ran into several errors:

1)Error in get_synched_frames (line 20)
files = regexp(ls(sceneDir), '(\s+|\n)', 'split');
And the error shows:The 'STRING' input must be either a char row vector, a cell array of char row vectors, or a string array.

2)When I used cellstr to change it into a cell array. I got a second error:
Error using sort
Input argument must be a cell array of character vectors.

Error in get_synched_frames (line 24)
files = sort(files);

I am not really familiar with MATLAB, so can you give me some hints or advice about that?
By the way, I am using MATLAB 2018; could this be a problem with the MATLAB version? Which version did you use, and did you meet these errors too?

Can I use my data set while executing test_depth.py

Can I use my own data set while executing test_depth.py? If yes, how can I pass my images? I'm getting the result mentioned below:
TOP 0 for AbsRelDiff

TOP 0 for SqrRelDiff

TOP 0 for RMSE

TOP 0 for RMSELog

TOP 0 for SIMSE

TOP 0 for Log10

TOP 0 for MVN

TOP 0 for Threshold 1.25

TOP 0 for Threshold 1.25^2

TOP 0 for Threshold 1.25^3

Also, two directories were created, but there is no output in the folders.

get depth of new image

Hi,

I am trying to do inference of the trained model and get the depth of one of my images from KITTI dataset.

I am not quite sure how to do it. I have tried first creating an lmdb file with only my image and then running the get_depth script, but it does not work.

I have been able to run process_test.sh that calculates the depth of the nyu_depth_v2_dataset but I am not sure how to do it on new images.

Can you explain in detail, please, how I could obtain a depth map of my own image?

thanks

Cannot make the data with process_raw

Command windows output:

process_raw
590

 1

basement_0001a
0

 0

Found 0 depth, 0 rgb images, and 0 accel dumps.

filecount:1
filecount to process:1
Error using process_raw (line 29)
Reference to non-existent field 'rawRgbFilename'.

I checked everything 10 times and cannot make it work. It's as if basement_0001a contained nothing, but it contains all the needed data when I check the folder...

Thanks for your help

Can you share the source code of your modified caffe?

Hello, I am interested in your thesis on estimating depth information from a single image. I am a beginner in deep learning and would like to refer to your modified Caffe source code. I wonder if it would be convenient for you to share it with me.
My email address is "[email protected]".
Thank you.

Learning to train the first component of the net

Hi @janivanecky
I have read your thesis, but I still do not grasp too much about training the net.
To train the first component, is the command like this: ./build/tools/caffe train --solver=solver.prototxt ?
If so, where should I add below lines?
solver = caffe.get_solver('solver.prototxt')
solver.net.copy_from('bvlc_alexnet.caffemodel')

Thanks for the help!
Yingjun

Depth value scaling on test and train

Question A) In your script /dataset/test/crop.py, on lines 39-49 https://github.com/janivanecky/Depth-Estimation/blob/master/dataset/test/crop.py#L42 you have some logic that seems to divide the depth array by 65535.0. Where does that value come from? Is that the maximum depth value the Kinect sensor can produce?

Question B) Also, are the ground truth depth maps for the gradient network scaled from [0, 255] or from [0, 10]?

Question C) If the ground truth depths for the gradient network (that are of shape 75, 54) are scaled from [0, 255], are you simply converting them to [0, 10] inside the loss function? Specifically, the ScaleInvariantMeanSquaredError loss function inside /source/global_context_network/eval_depth.py. https://github.com/janivanecky/Depth-Estimation/blob/master/eval_depth.py#L44. I don't really understand what is being passed to the LogDepth() function, because if you divide by 10.0 and then multiply by 10.0, isn't that the same as multiplying by 1?

Thanks!

Creating training dataset

Hi @janivanecky,
I'm getting an error while running train.py from global_context_network folder. Error is
Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: data (known types: AbsVal, Accuracy,...

I've created the .lmdb with a Python script where the original images are RGB and the gt is grayscale. I'm not sure if these are correct at all.

Can you guide me on these initial points,

  1. How to create the dataset (image dimensions, RGB or grayscale)? If you can share some sample images for original and gt, that would be great.
  2. How to generate the .lmdb file?

Thank you.

Inference script?

Hello, I am new to Caffe. I was wondering if you had a script to read the trained caffe model and apply it on a test image. I am not sure how to get an image output after reading the caffemodel file.

Any help is appreciated.
Thank you.

model trained on Kitti

Hello,
Thanks for sharing your work on depth estimation.
I am wondering whether you have performed any experiments on the KITTI dataset.
In particular, I am interested to know if you could provide your model trained on KITTI for a fair comparison in my research work.

many thanks

Two problems to run the model

Hi @janivanecky

  1. Running test_depth.py needs eval_depth.py, but this file is not in the project.
  2. Besides the above, it also needs model_norm_abs_100k.prototxt, which is also missing. So I renamed net_deploy.prototxt to model_norm_abs_100k.prototxt. Not sure if this is OK.

Thanks,
Yingjun

The corresponding prototxt of the trained model

Hi Jan, your thesis is great. I want to have a quick try with your trained model, but it is just a .caffemodel file; which one is the corresponding .prototxt file for this trained model?

Why crop the image?

@janivanecky

Hi. Thanks for your great work. I have a question regarding the data processing. Why do you crop each image to a resolution of 561 x 427 in process_raw.m?

Missing folders in `split_train_set.sh` script

Hi @janivanecky ,

Thanks for your work.

I tried to use your code to get the training images from the NYU_Depth_V2 dataset,

but some folders are missing in the raw dataset compared to the results from get_train_scenes.m.

For example, 'book_store0000' is in the results of get_train_scenes but has no corresponding folder in the raw dataset. I downloaded both the split files of the raw dataset and the single-file version of the raw dataset; this folder doesn't exist in either.

Do you have any ideas how to solve this problem?

Or is it OK to just ignore this folder?

I am looking forward to your response.
