
sfmlearner's Introduction

SfMLearner

This codebase implements the system described in the paper:

Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

In CVPR 2017 (Oral).

See the project webpage for more details. Please contact Tinghui Zhou ([email protected]) if you have any questions.

Prerequisites

This codebase was developed and tested with TensorFlow 1.0, CUDA 8.0, and Ubuntu 16.04.

Running the single-view depth demo

We provide the demo code for running our single-view depth prediction model. First, download the pre-trained model from this Google Drive, and put the model files under models/. Then you can use the provided ipython-notebook demo.ipynb to run the demo.
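For reference, the inference calls used by the notebook look roughly like the following. This is a minimal sketch pieced together from the notebook code quoted in the issues further down this page; the checkpoint name is illustrative, and the exact API (e.g. setup_inference_graph) may differ between versions of the repository.

from __future__ import division
import numpy as np
import scipy.misc
import tensorflow as tf
from SfMLearner import SfMLearner
from utils import *

img_height, img_width = 128, 416
ckpt_file = 'models/model-190532'  # use whichever checkpoint you downloaded

# Load and resize a sample image to the network's input resolution
I = scipy.misc.imread('misc/sample.png')
I = scipy.misc.imresize(I, (img_height, img_width))

sfm = SfMLearner(batch_size=1, img_height=img_height, img_width=img_width)
sfm.setup_inference_graph(mode='depth')

saver = tf.train.Saver([var for var in tf.trainable_variables()])
with tf.Session() as sess:
    saver.restore(sess, ckpt_file)
    pred = sfm.inference(I[None, :, :, :], sess, mode='depth')

# Colormapped disparity suitable for display, as in demo.ipynb
disp = normalize_depth_for_display(pred['depth'][0, :, :, 0])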

Preparing training data

In order to train the model using the provided code, the data needs to be formatted in a certain manner.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command

python data/prepare_train_data.py --dataset_dir=/path/to/raw/kitti/dataset/ --dataset_name='kitti_raw_eigen' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=128 --num_threads=4

For the pose experiments, we used the KITTI odometry split, which can be downloaded here. Then change the --dataset_name option to kitti_odom when preparing the data.

For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. Then run the following command

python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=171 --num_threads=4

Notice that for Cityscapes the img_height is set to 171 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 128.
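In other words, each frame is resized to 416x171 and the bottom 43 rows (which contain the ego-vehicle hood/logo) are dropped to reach a height of 128. A small illustrative sketch of that arithmetic; the actual cropping happens inside data/prepare_train_data.py:

import numpy as np

resized_h, resized_w = 171, 416   # values passed to prepare_train_data.py
target_h = 128                    # height expected by train.py

frame = np.zeros((resized_h, resized_w, 3), dtype=np.uint8)  # placeholder for a resized frame
cropped = frame[:target_h]        # keep the top 128 rows, dropping the bottom 171 - 128 = 43 rows
assert cropped.shape == (128, 416, 3)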

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4

You can then start a tensorboard session by

tensorboard --logdir=/path/to/tensorflow/log/files --port=8888

and visualize the training progress by opening http://localhost:8888 in your browser. If everything is set up properly, you should start seeing reasonable depth prediction after ~100K iterations when training on KITTI.

Notes

After adding data augmentation and removing batch normalization (along with some other minor tweaks), we have been able to train depth models better than what was originally reported in the paper even without using additional Cityscapes data or the explainability regularization. The provided pre-trained model was trained on KITTI only with smooth weight set to 0.5, and achieved the following performance on the Eigen test split (Table 1 of the paper):

Abs Rel Sq Rel RMSE RMSE(log) Acc.1 Acc.2 Acc.3
0.183 1.595 6.709 0.270 0.734 0.902 0.959

When trained on 5-frame snippets, the pose model obtains the following performance on the KITTI odometry split (Table 3 of the paper):

Seq. 09 Seq. 10
0.016 (std. 0.009) 0.013 (std. 0.009)

Evaluation on KITTI

Depth

We provide evaluation code for the single-view depth experiment on KITTI. First, download our predictions (~140MB) from this Google Drive and put them into kitti_eval/.

Then run

python kitti_eval/eval_depth.py --kitti_dir=/path/to/raw/kitti/dataset/ --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy

If everything runs properly, you should get the numbers for Ours(CS+K) in Table 1 of the paper. To get the numbers for Ours cap 50m (CS+K), set an additional flag --max_depth=50 when executing the above command.

Pose

We provide evaluation code for the pose estimation experiment on KITTI. First, download the predictions and ground-truth pose data from this Google Drive.

Notice that all the predictions and ground truth are 5-frame snippets in the format timestamp tx ty tz qx qy qz qw, consistent with the TUM evaluation toolkit. Then you can run

python kitti_eval/eval_pose.py --gtruth_dir=/directory/of/groundtruth/trajectory/files/ --pred_dir=/directory/of/predicted/trajectory/files/

to obtain the results reported in Table 3 of the paper. For instance, to get the results of Ours for Seq. 10 you could run

python kitti_eval/eval_pose.py --gtruth_dir=kitti_eval/pose_data/ground_truth/10/ --pred_dir=kitti_eval/pose_data/ours_results/10/
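Each ground-truth or predicted file is a short text file with one timestamp tx ty tz qx qy qz qw line per frame, as noted above. A minimal sketch for loading such a snippet file into NumPy (the path in the comment is illustrative):

import numpy as np

def load_snippet(path):
    # One row per frame: timestamp tx ty tz qx qy qz qw
    data = np.loadtxt(path)        # shape (num_frames, 8)
    times = data[:, 0]
    trans = data[:, 1:4]           # tx, ty, tz
    quats = data[:, 4:8]           # qx, qy, qz, qw
    return times, trans, quats

# e.g. times, trans, quats = load_snippet('kitti_eval/pose_data/ours_results/10/000860.txt')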

KITTI Testing code

Depth

Once you have a trained model, you can obtain single-view depth predictions on the KITTI Eigen test split, formatted properly for evaluation, by running

python test_kitti_depth.py --dataset_dir /path/to/raw/kitti/dataset/ --output_dir /path/to/output/directory --ckpt_file /path/to/pre-trained/model/file/

Pose

We also provide sample testing code for obtaining pose predictions on the KITTI dataset with a pre-trained model. You can obtain the predictions formatted as above for pose evaluation by running

python test_kitti_pose.py --test_seq [sequence_id] --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file /path/to/pre-trained/model/file/

A sample model trained on 5-frame snippets can be downloaded at this Google Drive.

Then you can obtain predictions on, say Seq. 9, by running

python test_kitti_pose.py --test_seq 9 --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file models/model-100280

Other implementations

Pytorch (by Clement Pinard)

Disclaimer

This is the authors' implementation of the system described in the paper and not an official Google product.

sfmlearner's People

Contributors

huang-jin, tinghuiz


sfmlearner's Issues

Pose evaluation code and multi GPU training

First of all, thanks a lot for releasing the code, the paper is amazing!

I just wanted to ask when you are planning to release the evaluation code, and whether you're planning to add multi-GPU training as well.

Cheers,
Maciej

1. Problem with the process of generating image frames; 2. Some errors during training

  1. When executing python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=171 --num_threads=4, the result is 5 frames (see the "5frames" screenshot), while executing on another PC (GPU: 1080 Ti) it is 3 frames (see the "3frames" screenshot).

  2. Following the steps with python train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4, the result is shown in the "result" screenshot.

So I would like to know how to solve these problems. Looking forward to your reply, thank you very much!

Artifact in some test images?

Hi @tinghuiz ,

I tried to test your code and model on the KITTI test split (200 images, not the Eigen split). I observed that there are artifacts on some images, like the ones below. Have you encountered this before? Can you give any hints as to why this could happen?

[attached example frames: 000001_10, 000002_10, 000003_10]

Thank you!

How to train on dataset for which we don't have calibration information

Hi,

I am trying to train this architecture on a video dataset taken from YouTube. I have been able to create the dataset with three-frame sequences grouped together. I don't have the calibration matrices for these videos, so how best can I train with this architecture?

How can one derive the camera calibration matrix directly from the camera properties?

I am thinking of trying to run this architecture on video taken with my phone, but I am blocked on the camera calibration matrix part.

Any help would be greatly appreciated.
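For readers with the same question: a pinhole intrinsics matrix can be approximated from basic camera properties. This is a generic sketch, not code from this repository; it assumes square pixels and a principal point at the image center, and the example numbers are placeholders.

import numpy as np

def intrinsics_from_specs(img_width, img_height, focal_mm, sensor_width_mm):
    # Focal length in pixels from focal length in mm and sensor width in mm
    fx = focal_mm * img_width / sensor_width_mm
    fy = fx                                      # square-pixel assumption
    cx, cy = img_width / 2.0, img_height / 2.0   # principal point at the image center
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Placeholder example: a phone camera with a 4 mm lens on a 6.2 mm-wide sensor
# K = intrinsics_from_specs(1920, 1080, focal_mm=4.0, sensor_width_mm=6.2)

If the frames are later resized (e.g. to 416x128 for training), fx and cx scale with the width ratio and fy and cy with the height ratio.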

How to make my own pose trajectory file

Now I have completed the training process and want to output the pose trajectory from the trained model. According to your page, I need to run python test_kitti_pose.py --test_seq [sequence_id] --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file /path/to/pre-trained/model/file/, but where is KITTI/odometry/set/? Do you mean I should make my own test set? If so, how do I make it?

Error when running this code in a Jupyter notebook

I have read your related paper and believe this work will be a great one. When I try your code in a Jupyter notebook on Ubuntu 16.04 with TensorFlow, I encounter an error:

TypeError: concat() got an unexpected keyword argument 'axis'

The details are as follows:
%pylab inline
from __future__ import division
import os
import numpy as np
import scipy.misc
import tensorflow as tf
from SfMLearner import SfMLearner
from utils import *

mode = 'depth'
img_height=128
img_width=416
ckpt_file = 'models/model-145248'
I = scipy.misc.imread('misc/sample.png')
I = scipy.misc.imresize(I, (img_height, img_width))
Populating the interactive namespace from numpy and matplotlib
In [4]:

sfm = SfMLearner(batch_size=1,
img_height=img_height,
img_width=img_width)
sfm.setup_inference_graph(mode=mode)

TypeError Traceback (most recent call last)
in ()
2 img_height=img_height,
3 img_width=img_width)
----> 4 sfm.setup_inference_graph(mode=mode)

/home/timing/Program/SfMLearner/SfMLearner.pyc in setup_inference_graph(self, mode)
37 def setup_inference_graph(self, mode='depth'):
38 if mode == 'depth':
---> 39 self.build_depth_test_graph()
40
41 def inference(self, inputs, sess, mode='depth'):

/home/timing/Program/SfMLearner/SfMLearner.pyc in build_depth_test_graph(self)
18 input_mc = self.preprocess_image(input_uint8)
19 with tf.name_scope("depth_prediction"):
---> 20 pred_disp, depth_net_endpoints = disp_net(input_mc)
21 pred_depth = [1./disp for disp in pred_disp]
22 pred_depth = pred_depth[0]

/home/timing/Program/SfMLearner/nets.pyc in disp_net(tgt_image, is_training)
104 # There might be dimension mismatch due to uneven down/up-sampling
105 upcnv7 = resize_like(upcnv7, cnv6b)
--> 106 i7_in = tf.concat([upcnv7, cnv6b], axis=3)
107 icnv7 = slim.conv2d(i7_in, 512, [3, 3], stride=1, scope='icnv7')
108

TypeError: concat() got an unexpected keyword argument 'axis'
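For context, this error is what TensorFlow releases older than 1.0 raise: before 1.0, tf.concat took (concat_dim, values) and had no axis keyword, whereas this codebase targets the 1.0-style tf.concat(values, axis=...). Upgrading to TensorFlow 1.0 (see Prerequisites) is the straightforward fix; a small compatibility sketch, if upgrading is not an option:

import tensorflow as tf

def concat_compat(values, axis):
    # Call tf.concat with whichever argument order the installed TF expects
    try:
        return tf.concat(values, axis=axis)   # TensorFlow >= 1.0
    except TypeError:
        return tf.concat(axis, values)        # TensorFlow <= 0.12 fallback

print(tf.__version__)  # the codebase was developed against 1.0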

Generating the actual depth values of the kitti dataset

Hi,
I tested your evaluation code and it works fine with the downloaded predictions.
But I'm trying to find the prediction for a specific image from the KITTI dataset.
In which format do you save the disparity predictions?
I'm more interested in the actual depth values of a single image.

How to train on NYU dataset

Hi,

I have gone through the data loader code and realised that it is written for the KITTI dataset folder structure. Has anyone tried loading a different dataset like NYU?

Network weight decay

Hello, I have been trying to replicate your results in my own PyTorch implementation, but had some trouble converging with your hyperparameters.

Especially, the weight decay you use seems very large to me : https://github.com/tinghuiz/SfMLearner/blob/master/nets.py#L27

Weight decay is usually around 5e-5 ~ 5e-4 and here it is 0.05 ! When using it, my two networks just go to zero very quickly.

As I am not very familiar with tf.slim, I have done some research, and I am not sure you actually apply weight regularization, since apparently you have to call slim.losses.get_total_loss()

This is corroborated by the fact that setting the L2 regularization to extreme values (like 50.0) doesn't change anything.

The good news is that if weight decay is indeed not applied to your networks, you might have something interesting to work on if you want to improve your results even more!

Clément

Trained your network locally, but my eval. result is not as good as yours

Hi,

Thanks for sharing your wonderful work.
I followed your readme file, and everything seems to go well until I get a different evaluation result from yours (Ours (K) in Table 1 of your paper) when I evaluate the locally trained weights. During training, I used the same parameters as suggested on this webpage.

The following are the metrics I got:
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2621, 3.6171, 8.2036, 0.3806, 0.0000, 0.6577, 0.8520, 0.9258

I checked and reviewed my procedure, but I cannot find any explanation for the difference.
Let me know what you think about it.

Regards,
CJ

Question about seq_length

I am trying to train and test the SfMLearner model, but I found that the default seq_length values in train.py and test_kitti_pose.py are different:
seq_length is 3 in train.py and 5 in test_kitti_pose.py.

I set seq_length to the same number (3 or 5) for both training and testing, but an error was raised.

How to fix it?

Parameter updating by novel view synthesis

Hi @tinghuiz,

The method _interpolate in file utils.py samples pixels in the source image with tf.gather, given the indices computed from intrinsics matrices and depth. The gradient of tf.gather is not computed w.r.t. the indices, so backpropagation does not reach the tensor of the computed depth (or, similarly, the pose). So,

How does the network update the depth (pose) parameters? Am I missing something in the analysis?

Cheers.
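For context, bilinear samplers of this kind are usually differentiable not through the integer gather indices but through the interpolation weights, which are smooth functions of the projected (continuous) coordinates. A stripped-down 1-D sketch of that pattern (illustrative only, not the repository's _interpolate, and without boundary handling):

import tensorflow as tf

def linear_sample_1d(values, x):
    # tf.gather has no gradient w.r.t. its integer indices, but the gradient
    # still reaches x through the fractional weights (x1 - x) and (x - x0).
    x0 = tf.floor(x)
    x1 = x0 + 1.0
    w0 = x1 - x                                   # weight of the left neighbor
    w1 = x - x0                                   # weight of the right neighbor
    v0 = tf.gather(values, tf.cast(x0, tf.int32))
    v1 = tf.gather(values, tf.cast(x1, tf.int32))
    return w0 * v0 + w1 * v1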

question about cityscapes dataset preparation

Hi, @tinghuiz ,

I'm trying to prepare the Cityscapes dataset. According to the download page, one would generally download "leftImg8bit_trainvaltest.zip", which does not contain consecutive frames and is meant for segmentation tasks. I guess we should download a package with consecutive frames instead; could you tell me the exact name of the zip file to prepare, e.g. leftImg8bit_trainextra.zip / leftImg8bit_demoVideo.zip / leftImg8bit_sequence_trainvaltest.zip (which one)?

Thanks!

src to target image projection

Hi,
I am trying to extend your project for my thesis, and I think there might be a logical error in your implementation and even in the paper itself (Figure 3 and Section 3.1).
As I understand it, you extract depth information from the target image here:

SfMLearner/SfMLearner.py

Lines 30 to 31 in febf0d3

pred_disp, depth_net_endpoints = disp_net(tgt_image,
is_training=True)

and then pass this depth along with curr_src_image_stack into the projective_inverse_warp function to calculate curr_proj_image

SfMLearner/SfMLearner.py

Lines 68 to 72 in febf0d3

curr_proj_image = projective_inverse_warp(
curr_src_image_stack[:,:,:,3*i:3*(i+1)],
tf.squeeze(pred_depth[s], axis=3),
pred_poses[:,i,:],
intrinsics[:,s,:,:])
.
curr_proj_error = tf.abs(curr_proj_image - curr_tgt_image)
and Equation 1 in the original paper also shows that you are trying to build the target image based on pixel coordinates of the source images:
def pixel2cam(depth, pixel_coords, intrinsics, is_homogeneous=True):

If that is true, you need to feed the depth of the "src image", not the "target image", into the projective_inverse_warp function; or you can use the target image and its depth to build the src image (Figure 3 in the paper shows that the target image pixels are projected into the src view, but Section 3.1 and Equation 1 contradict this).

About connecting the 5-frame snippets into a complete sequence

I want to evaluate pose estimation over a whole sequence, but the pose net outputs 5-frame snippets. Looking at the snippets (format: timestamp tx ty tz qx qy qz qw), the world coordinate system of each snippet is different. How can I transform them into a common world coordinate system, e.g. with frame 0 as the reference frame? And how can I reduce the error when projecting into the same coordinate system?
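One common way to stitch such snippets is to convert each line into a 4x4 pose, take the relative transform between the first two frames of each snippet, and accumulate those relative motions starting from frame 0. A minimal sketch assuming the timestamp tx ty tz qx qy qz qw layout and one snippet per test frame (not code from this repository; the monocular scale ambiguity and drift are not addressed):

import numpy as np

def pose_matrix(t, q):
    # 4x4 homogeneous pose from translation (tx, ty, tz) and unit quaternion (qx, qy, qz, qw)
    x, y, z, w = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)]])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def chain_snippets(snippet_files):
    # Each snippet is expressed in the coordinates of its own first frame, so only
    # the first->second relative motion of every snippet is used here.
    global_T = np.eye(4)
    trajectory = [global_T.copy()]
    for path in snippet_files:
        data = np.loadtxt(path)                              # (num_frames, 8)
        first = pose_matrix(data[0, 1:4], data[0, 4:8])
        second = pose_matrix(data[1, 1:4], data[1, 4:8])
        rel = np.linalg.inv(first).dot(second)               # motion from frame k to k+1
        global_T = global_T.dot(rel)
        trajectory.append(global_T.copy())
    return trajectory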

Evaluation/training on higher resolutions

How did you obtain the depth predictions on Make3D? Can the network be evaluated/trained on higher resolution inputs or do you need to scale the inputs accordingly?

question about kitti dataset preparation

Hi, @tinghuiz ,

I would also like some clarification on KITTI dataset preparation.
First, according to the KITTI raw download page, there are up to 100 sequences (e.g. 2011_09_26_drive_0001, 2011_09_26_drive_0002, ...). Could you clarify which sequences we should download for preparation?

Thanks!

pose evaluation

While evaluating the pose, the data you provided in pose_eval_data (in the directory named "ours results") and the results produced by test_kitti_pose.py with the model 100892 differ a lot.
Take frame 000860 in sequence 10 for example: the car is turning right. In "ours results", tz/tx is almost 7, which is reasonable. But with the model 100892 the output is not right; tz/tx is almost 150, meaning there is no turning at all. Have you tested the pose estimation performance of the model 100892?

Single-view depth has limited generalization

I think that single view depth has more problems generalizing to previously unseen types of images. For example, when I rotate the input image 180 degrees, the depth output is totally wrong. The input of two images is necessary to get robust results.

training error

Hi, @tinghuiz ,

I successfully trained on the kitti_raw dataset. However, during training at Epoch: [20] [ 8722/10062], it got a
Segmentation fault (core dumped). Details follow:

Epoch: [20] [ 3822/10062] time: 0.5334/it loss: 0.769
 [*] Saving checkpoint to /data3/kitti_raw_checkpoints/...
Epoch: [20] [ 3922/10062] time: 0.5566/it loss: 0.697
Epoch: [20] [ 4022/10062] time: 0.5485/it loss: 0.813
Epoch: [20] [ 4122/10062] time: 0.5367/it loss: 0.772
Epoch: [20] [ 4222/10062] time: 0.5849/it loss: 0.865
Epoch: [20] [ 4322/10062] time: 0.5584/it loss: 0.811
Epoch: [20] [ 4422/10062] time: 0.5920/it loss: 0.816
Epoch: [20] [ 4522/10062] time: 0.5262/it loss: 0.715
Epoch: [20] [ 4622/10062] time: 0.5609/it loss: 0.823
Epoch: [20] [ 4722/10062] time: 0.5624/it loss: 0.691
Epoch: [20] [ 4822/10062] time: 0.5650/it loss: 0.705
Epoch: [20] [ 4922/10062] time: 0.5585/it loss: 0.675
Epoch: [20] [ 5022/10062] time: 0.5590/it loss: 0.720
Epoch: [20] [ 5122/10062] time: 0.5712/it loss: 0.648
Epoch: [20] [ 5222/10062] time: 0.5852/it loss: 0.756
Epoch: [20] [ 5322/10062] time: 0.5354/it loss: 0.759
Epoch: [20] [ 5422/10062] time: 0.5403/it loss: 0.772
Epoch: [20] [ 5522/10062] time: 0.5520/it loss: 0.845
Epoch: [20] [ 5622/10062] time: 0.5396/it loss: 0.766
Epoch: [20] [ 5722/10062] time: 0.5183/it loss: 0.729
Epoch: [20] [ 5822/10062] time: 0.5255/it loss: 0.727
Epoch: [20] [ 5922/10062] time: 0.5384/it loss: 0.606
Epoch: [20] [ 6022/10062] time: 0.5533/it loss: 0.663
Epoch: [20] [ 6122/10062] time: 0.5903/it loss: 0.640
Epoch: [20] [ 6222/10062] time: 0.5318/it loss: 0.738
Epoch: [20] [ 6322/10062] time: 0.5226/it loss: 0.807
Epoch: [20] [ 6422/10062] time: 0.5603/it loss: 0.889
Epoch: [20] [ 6522/10062] time: 0.5770/it loss: 0.845
Epoch: [20] [ 6622/10062] time: 0.5342/it loss: 0.649
Epoch: [20] [ 6722/10062] time: 0.5349/it loss: 0.681
Epoch: [20] [ 6822/10062] time: 0.5585/it loss: 0.631
Epoch: [20] [ 6922/10062] time: 0.5523/it loss: 0.881
Epoch: [20] [ 7022/10062] time: 0.5862/it loss: 0.813
Epoch: [20] [ 7122/10062] time: 0.5370/it loss: 0.828
Epoch: [20] [ 7222/10062] time: 0.5419/it loss: 0.767
Epoch: [20] [ 7322/10062] time: 0.5301/it loss: 0.731
Epoch: [20] [ 7422/10062] time: 0.5466/it loss: 0.643
Epoch: [20] [ 7522/10062] time: 0.5471/it loss: 0.835
Epoch: [20] [ 7622/10062] time: 0.5313/it loss: 0.774
Epoch: [20] [ 7722/10062] time: 0.5442/it loss: 0.825
Epoch: [20] [ 7822/10062] time: 0.5720/it loss: 0.697
Epoch: [20] [ 7922/10062] time: 0.5723/it loss: 0.700
Epoch: [20] [ 8022/10062] time: 0.6266/it loss: 0.880
Epoch: [20] [ 8122/10062] time: 0.5573/it loss: 0.783
Epoch: [20] [ 8222/10062] time: 0.5382/it loss: 0.705
Epoch: [20] [ 8322/10062] time: 0.5158/it loss: 0.896
Epoch: [20] [ 8422/10062] time: 0.5315/it loss: 0.796
Epoch: [20] [ 8522/10062] time: 0.5571/it loss: 0.651
Epoch: [20] [ 8622/10062] time: 0.6167/it loss: 0.880
Epoch: [20] [ 8722/10062] time: 0.5562/it loss: 0.855
Segmentation fault (core dumped)
(tf_1.0) root@milton-All-Series:/data/code2/SfMLearner#

I used the following command to train

 python train.py --dataset_dir=/data3/kitti_raw_formatted/ --checkpoint_dir=/data3/kitti_raw_checkpoints/ --img_width=416 --img_height=128 --batch_size=4 --smooth_weight=0.5 --explain_reg_weight=0.2

Any suggestions to fix it?
THX!

Question about the depth prediction result

Hi tinghuiz,
I tried your code recently. It worked cool! But after completing all the steps you stated on the website, I found a problem here.

I believe I strictly followed your steps, and trained the model for 200K steps (the default value in your code). The evaluated depth accuracy I got was as follows:
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2331, 3.7935, 7.4710, 0.3076, 0.0000, 0.7006, 0.8807, 0.9453

I also tried disabling the pose net and feeding the ground-truth pose to the model, and got the results shown below.
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.1889, 2.1794, 6.6112, 0.2708, 0.0000, 0.7413, 0.9053, 0.9597

It still seems worse than the result you post on the page. Is there any further fine-tuning I should do to reach your results, rather than simply running the code directly?

Thanks!

pose question

Hi,
It's not an error.
I want to know why image_seq's shape looks like that.
For a 5-sequence case,
(128, 416, 3) => np.hstack => (128, 832, 3) => np.hstack => ... => (128, 2080, 3)
I thought that it would be (5, 128, 416, 3) for the image_seq!
pred = sfm.inference(image_seq[None, :, :, :], sess, mode='pose')
Can anyone help?
Thanks!
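For what it's worth, the data loader packs a snippet into a single wide image by horizontal concatenation, so five 416-pixel-wide frames become width 5 x 416 = 2080; the frames can be recovered by slicing. A small illustrative sketch:

import numpy as np

frames = [np.zeros((128, 416, 3), dtype=np.uint8) for _ in range(5)]  # placeholder frames
image_seq = np.hstack(frames)                 # (128, 2080, 3), the layout fed to sfm.inference
assert image_seq.shape == (128, 5 * 416, 3)

# Recover the individual frames if needed
unpacked = [image_seq[:, i * 416:(i + 1) * 416, :] for i in range(5)]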

Generating ground truth for 3 frame snippet

Hello,

Firstly, thank you for sharing the code. I am using it in my project. Could you please tell me how you made the ground-truth files for every 5-frame snippet? I trained the model on 3-frame snippets and want to make ground truth for them.

How to save the disparity as an image

Hi,
I followed the demo code and tried to save the depth image using test_kitti_depth.py,
but the images I saved are not what I expected.
The original input image: [0000000069]
The predicted depth: [keisha_disp]
The resized predicted depth: [keisha_disp_resize]

The predictions I saved don't look like a proper depth map. Is there anything I misunderstood?
Here is my code:

pred = sfm.inference(inputs, sess, mode='depth')
pred_disp = normalize_depth_for_display(pred['depth'][0,:,:,0])
cv2.imwrite('data/kitti/raw_disp.png', pred_disp)
fh_disp = open('data/kitti/raw_disp.png', 'r')
raw_disp = pil.open(fh_disp)
img_disp = raw_disp.resize((1242, 375), pil.ANTIALIAS)
img_disp.save('data/kitti/disp_resize.png')

Thanks
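One detail that can produce images like this: if normalize_depth_for_display returns a float array in [0, 1] (an assumption; check utils.py in your checkout), cv2.imwrite will write a nearly black image unless the array is first scaled to 8-bit, and a color-mapped result also needs RGB-to-BGR conversion for cv2. A minimal sketch under those assumptions:

import numpy as np
import cv2
from PIL import Image

def save_float_image(img, path, out_size=(1242, 375)):
    # img: HxW or HxWx3 float array with values in [0, 1]
    img_u8 = np.clip(img * 255.0, 0, 255).astype(np.uint8)
    if img_u8.ndim == 3:
        img_u8 = cv2.cvtColor(img_u8, cv2.COLOR_RGB2BGR)   # cv2 expects BGR channel order
    cv2.imwrite(path, img_u8)
    # Optional: write a resized copy for easier viewing
    Image.open(path).resize(out_size, Image.ANTIALIAS).save(path.replace('.png', '_resize.png'))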

Pretrained model and evaluation

Hi @tinghuiz , thanks for releasing the code. It seems that you did not provide the full pipeline (training + testing + evaluation); for now you have just released the test results as .npy files, but not the testing code.

I wonder whether the provided pretrained model is the one you used in the paper; I want to use it to test the images in Eigen's test split and evaluate it. Also, can you provide the pretrained model for pose estimation? I want to see if the numbers are consistent with the paper.

Thank you so much!

Question about pose estimation network

I am wondering how you can use so many downsampling convolutions in your pose network even though the input image is only 128 pixels high. Wouldn't the resulting feature map shrink to nothing with all those convolutions, or am I wrong? Are you using zero padding or something for the convolutions? It's not clear to me.

Question about training difficulties for pose network

I have done my own Torch implementation of a model similar to yours that trains the depth and the camera pose transformation in the same unsupervised manner as you do.

However, I have difficulties with the pose network, which is somehow not able to learn the transformation properly. My implementation is correct: I tested it with stereo examples and it is able to learn the depth and the static transformation of the stereo setup. However, if I move to a non-static setup, like a monocular video, the pose network is no longer able to learn the varying transformation parameters. That happens even if I use stereo images and randomly swap the left and right image, so that it only has to learn two different transformations.

So I am asking you if you experienced any such difficulties with your pose network? Did you use the same learning rate for the depth and the pose network? And how did you normalize the image inputs (range between -1 and 1?)?

How to evaluate?

Hi @tinghuiz,

Thank you for sharing the training code with us. How should I evaluate based on the output? I observed that the depth output is rescaled to [0, 1]. How can I restore the absolute depth values for evaluation?

Thank you

about batch normalization

Hi, you said in the paper that BN is used for all layers except the output layers, but I cannot find BN in the code. As you explained in the Notes, removing BN helps training; could you explain this or give me some insights?

Running eval_depth.py gives errors about missing files that do exist

Hi, I have just encountered this situation:
I was trying to run the evaluation code, but it keeps saying that files are missing.
To clarify, all my data are in JPG format, so I altered "test_files_eigen.txt" to match my data.

The files do exist and the file paths are correct, but it says my files are missing.

Here are the error messages:

$python kitti_eval/eval_depth.py --kitti_dir=~/Downloads/datasets/KITTI/ --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy
~/Downloads/datasets/KITTI/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000069.jpg missing
~/Downloads/datasets/KITTI/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000054.jpg missing
~/Downloads/datasets/KITTI/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000042.jpg missing
...

Could anyone help me to solve the problem?
Thanks
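One thing worth checking (a guess, not a confirmed diagnosis): Python's open() and os.path do not expand ~, so if the literal string ~/Downloads/... reaches the script, every lookup fails even though the files exist. Expanding the path explicitly, or passing an absolute path on the command line, avoids that:

import os

kitti_dir = os.path.expanduser('~/Downloads/datasets/KITTI/')  # expands ~ to the home directory
print(os.path.exists(kitti_dir))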

Reasonable Training Loss

Hi, I have three questions:

  1. How low does the loss need to be to give reasonable depth predictions?
    I've tried training on another dataset (ViDRILO), so I'm not quite sure whether 100K iterations of training is enough.

  2. I still wonder where the log files are saved. Does the log file look like events.out.tfevents.1505446260.joy? I ran:
    tensorboard --logdir=/path/to/tensorflow/log/files --port=8888

When opening https://localhost:8888, errors show up:
E0916 00:12:21.511262 Thread-1 _internal.py:87] 127.0.0.1 - - [16/Sep/2017 00:12:21] code 400, message Bad request syntax ("\x16\x03\x01\x00\xc0\x01\x00\x00\xbc\x03\x03\x87\xf8Z'#5\x05[\x0f/\x89\xaf\x80G\x07\x98\xad.\xcb\xeah\xdeh\x1br\x87>\xc5\x90+\xed\xd3\x00\x00\x1c\x8a\x8a\xc0+\xc0/\xc0,\xc00\xcc\xa9\xcc\xa8\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00")

  3. Where is the pose model saved?
    After training, I only got one model (the depth model).

About Cityscapes datasets

In the Cityscapes link, I didn't find a package called leftImg8bit_sequence_trainvaltest.zip. Has it been renamed, or is there another reason?

How to separate model params?

When training, the depth net and pose net parameters are saved together, but the pretrained models you released are separated. How can I separate them when training?
Also, if I have a pretrained depth net and want to use its parameters, how can I do that?
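A common TensorFlow pattern for this is one tf.train.Saver per variable scope. Judging from variable names that appear elsewhere on this page (e.g. depth_net/upcnv3/...), the depth variables live under a depth_net prefix; the pose prefix below is an assumption, so print tf.trainable_variables() in your checkout to confirm. A sketch:

import tensorflow as tf

# Split trainable variables by name prefix; 'depth_net' appears in logs on this page,
# the pose prefix is an assumption -- inspect your own graph to confirm.
all_vars = tf.trainable_variables()
depth_vars = [v for v in all_vars if v.op.name.startswith('depth_net')]
pose_vars = [v for v in all_vars if v.op.name.startswith('pose')]

depth_saver = tf.train.Saver(depth_vars)
pose_saver = tf.train.Saver(pose_vars)

# with tf.Session() as sess:
#     depth_saver.restore(sess, '/path/to/pretrained/depth/checkpoint')
#     pose_saver.save(sess, '/where/to/store/pose_only_checkpoint')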

test_kitti_pose.py: error: batch_unpack_image_sequence

When I run test_kitti_pose.py, I get an error about the missing function batch_unpack_image_sequence:

Traceback (most recent call last):
  File "test_kitti_pose.py", line 97, in <module>
    main()
  File "test_kitti_pose.py", line 62, in main
    FLAGS.seq_length)
  File "/media/cssp/Data/JoyYang/SfMLearner/SfMLearner.py", line 306, in setup_inference
    self.build_pose_test_graph()
  File "/media/cssp/Data/JoyYang/SfMLearner/SfMLearner.py", line 273, in build_pose_test_graph
    self.batch_unpack_image_sequence(
AttributeError: 'SfMLearner' object has no attribute 'batch_unpack_image_sequence'

But batch_unpack_image_sequence is definitely defined in data_loader.py.
Could you tell me how to fix it?

script for generating groundtruth poses on kitti odom

Hi, tinghui

Thanks for generously sharing the code! However, I encountered some problems when generating ground-truth poses on the kitti_odom dataset with a different sequence-length setting. I treat the official ground-truth poses as projection matrices, i.e. in [R, t] form, but my generated poses differ from yours. Is there a problem with my understanding of the official pose format, or would you be willing to share your script for generating ground-truth poses?

Where to find the definition of the function projective_inverse_warp?

Hello, this research is great work in monocular depth and camera motion estimation!
I read most of the code in SfMLearner.py, but I cannot find the function projective_inverse_warp used at SfMLearner.py line 67. Could you tell me where to find it? I think it's the key to understanding Equation 2 in your paper!

Precondition Error when is_training is set to false

I noticed that when the depth test graph is being built, the is_training argument for disp_net is not set to False. Won't this negatively affect test performance, as the batch normalization won't be configured properly?

When setting this argument to True, an exception is raised. (Related to batch norm)

FailedPreconditionError: Attempting to use uninitialized value depth_net/upcnv3/BatchNorm/moving_mean
	 [[Node: depth_net/upcnv3/BatchNorm/moving_mean/read = Identity[T=DT_FLOAT, _class=["loc:@depth_net/upcnv3/BatchNorm/moving_mean"], _device="/job:localhost/replica:0/task:0/gpu:0"](depth_net/upcnv3/BatchNorm/moving_mean)]]
	 [[Node: depth_prediction/truediv/_131 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_459_depth_prediction/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I get this when using the model that was provided in the "download_model.sh" script
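For context, this error pattern is consistent with the batch-norm moving statistics never being restored: a Saver built from tf.trainable_variables() (as in the notebook code quoted later on this page) skips moving_mean and moving_variance, which are not trainable. A hedged sketch that restores all saved variables instead (whether the released checkpoint actually contains them is an assumption worth verifying with the checkpoint inspection tools):

import tensorflow as tf

# Build the saver over all variables so BatchNorm moving statistics are restored too
saver = tf.train.Saver(tf.global_variables())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # initialize anything not stored in the checkpoint
    saver.restore(sess, '/path/to/pre-trained/model/file')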

Traceback on demo.ipynb

First cell:

UnicodeDecodeError Traceback (most recent call last)
in ()
12 ckpt_file = 'models/model-190532'
13 fh = open('misc/sample.png', 'r')
---> 14 I = pil.open(fh)
15 I = I.resize((img_width, img_height), pil.ANTIALIAS)
16 I = np.array(I)

~/anaconda3/envs/Tensorflow/lib/python3.6/site-packages/PIL/Image.py in open(fp, mode)
2484 exclusive_fp = True
2485
-> 2486 prefix = fp.read(16)
2487
2488 preinit()

~/anaconda3/envs/Tensorflow/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
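This traceback comes from opening a binary PNG in text mode under Python 3: open('misc/sample.png', 'r') hands PIL a text stream, and the PNG signature byte 0x89 is not valid UTF-8. Opening the file in binary mode, or passing the path directly to PIL, avoids it:

from PIL import Image as pil

# Either open the file in binary mode ...
with open('misc/sample.png', 'rb') as fh:
    I = pil.open(fh)
    I.load()   # read the pixel data while the file handle is still open

# ... or simply let PIL open the path itself
I = pil.open('misc/sample.png')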

Running Error

Hi, @tinghuiz ,

Thanks for releasing this package.
However, when I run the following cell, I get this error:

saver = tf.train.Saver([var for var in tf.trainable_variables()]) 
with tf.Session() as sess:
    saver.restore(sess, ckpt_file)
    pred = sfm.inference(I[None,:,:,:], sess, mode=mode)

My system is Ubuntu 14.04 with TensorFlow 1.0 and Python 3.5.

Any suggestion to fix this issue?

THX!
