
sfmlearner's Introduction

SfMLearner

This codebase implements the system described in the paper:

Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

In CVPR 2017 (Oral).

See the project webpage for more details. Please contact Tinghui Zhou ([email protected]) if you have any questions.

Prerequisites

This codebase was developed and tested with TensorFlow 1.0, CUDA 8.0, and Ubuntu 16.04.

Running the single-view depth demo

We provide the demo code for running our single-view depth prediction model. First, download the pre-trained model from this Google Drive, and put the model files under models/. Then you can use the provided ipython-notebook demo.ipynb to run the demo.
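For reference, the inference calls used by the notebook look roughly like the following. This is a minimal sketch pieced together from the notebook code quoted in the issues further down this page; the checkpoint name is illustrative, and the exact API (e.g. setup_inference_graph) may differ between versions of the repository.

from __future__ import division
import numpy as np
import scipy.misc
import tensorflow as tf
from SfMLearner import SfMLearner
from utils import *

img_height, img_width = 128, 416
ckpt_file = 'models/model-190532'  # use whichever checkpoint you downloaded

# Load and resize a sample image to the network's input resolution
I = scipy.misc.imread('misc/sample.png')
I = scipy.misc.imresize(I, (img_height, img_width))

sfm = SfMLearner(batch_size=1, img_height=img_height, img_width=img_width)
sfm.setup_inference_graph(mode='depth')

saver = tf.train.Saver([var for var in tf.trainable_variables()])
with tf.Session() as sess:
    saver.restore(sess, ckpt_file)
    pred = sfm.inference(I[None, :, :, :], sess, mode='depth')

# Colormapped disparity suitable for display, as in demo.ipynb
disp = normalize_depth_for_display(pred['depth'][0, :, :, 0])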

Preparing training data

In order to train the model using the provided code, the data needs to be formatted in a certain manner.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command

python data/prepare_train_data.py --dataset_dir=/path/to/raw/kitti/dataset/ --dataset_name='kitti_raw_eigen' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=128 --num_threads=4

For the pose experiments, we used the KITTI odometry split, which can be downloaded here. Then change the --dataset_name option to kitti_odom when preparing the data.

For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. Then run the following command

python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=171 --num_threads=4

Notice that for Cityscapes the img_height is set to 171 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 128.
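In other words, each frame is resized to 416x171 and the bottom 43 rows (which contain the ego-vehicle hood/logo) are dropped to reach a height of 128. A small illustrative sketch of that arithmetic; the actual cropping happens inside data/prepare_train_data.py:

import numpy as np

resized_h, resized_w = 171, 416   # values passed to prepare_train_data.py
target_h = 128                    # height expected by train.py

frame = np.zeros((resized_h, resized_w, 3), dtype=np.uint8)  # placeholder for a resized frame
cropped = frame[:target_h]        # keep the top 128 rows, dropping the bottom 171 - 128 = 43 rows
assert cropped.shape == (128, 416, 3)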

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4

You can then start a tensorboard session by

tensorboard --logdir=/path/to/tensorflow/log/files --port=8888

and visualize the training progress by opening http://localhost:8888 in your browser. If everything is set up properly, you should start seeing reasonable depth prediction after ~100K iterations when training on KITTI.

Notes

After adding data augmentation and removing batch normalization (along with some other minor tweaks), we have been able to train depth models better than what was originally reported in the paper even without using additional Cityscapes data or the explainability regularization. The provided pre-trained model was trained on KITTI only with smooth weight set to 0.5, and achieved the following performance on the Eigen test split (Table 1 of the paper):

Abs Rel Sq Rel RMSE RMSE(log) Acc.1 Acc.2 Acc.3
0.183 1.595 6.709 0.270 0.734 0.902 0.959

When trained on 5-frame snippets, the pose model obtains the following performance on the KITTI odometry split (Table 3 of the paper):

Seq. 09 Seq. 10
0.016 (std. 0.009) 0.013 (std. 0.009)

Evaluation on KITTI

Depth

We provide evaluation code for the single-view depth experiment on KITTI. First, download our predictions (~140MB) from this Google Drive and put them into kitti_eval/.

Then run

python kitti_eval/eval_depth.py --kitti_dir=/path/to/raw/kitti/dataset/ --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy

If everything runs properly, you should get the numbers for Ours(CS+K) in Table 1 of the paper. To get the numbers for Ours cap 50m (CS+K), set an additional flag --max_depth=50 when executing the above command.

Pose

We provide evaluation code for the pose estimation experiment on KITTI. First, download the predictions and ground-truth pose data from this Google Drive.

Notice that all the predictions and ground truth are 5-frame snippets in the format timestamp tx ty tz qx qy qz qw, consistent with the TUM evaluation toolkit. Then you can run

python kitti_eval/eval_pose.py --gtruth_dir=/directory/of/groundtruth/trajectory/files/ --pred_dir=/directory/of/predicted/trajectory/files/

to obtain the results reported in Table 3 of the paper. For instance, to get the results of Ours for Seq. 10 you could run

python kitti_eval/eval_pose.py --gtruth_dir=kitti_eval/pose_data/ground_truth/10/ --pred_dir=kitti_eval/pose_data/ours_results/10/
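Each ground-truth or predicted file is a short text file with one timestamp tx ty tz qx qy qz qw line per frame, as noted above. A minimal sketch for loading such a snippet file into NumPy (the path in the comment is illustrative):

import numpy as np

def load_snippet(path):
    # One row per frame: timestamp tx ty tz qx qy qz qw
    data = np.loadtxt(path)        # shape (num_frames, 8)
    times = data[:, 0]
    trans = data[:, 1:4]           # tx, ty, tz
    quats = data[:, 4:8]           # qx, qy, qz, qw
    return times, trans, quats

# e.g. times, trans, quats = load_snippet('kitti_eval/pose_data/ours_results/10/000860.txt')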

KITTI Testing code

Depth

Once you have a trained model, you can obtain single-view depth predictions on the KITTI Eigen test split, formatted properly for evaluation, by running

python test_kitti_depth.py --dataset_dir /path/to/raw/kitti/dataset/ --output_dir /path/to/output/directory --ckpt_file /path/to/pre-trained/model/file/

Pose

We also provide sample testing code for obtaining pose predictions on the KITTI dataset with a pre-trained model. You can obtain the predictions formatted as above for pose evaluation by running

python test_kitti_pose.py --test_seq [sequence_id] --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file /path/to/pre-trained/model/file/

A sample model trained on 5-frame snippets can be downloaded at this Google Drive.

Then you can obtain predictions on, say Seq. 9, by running

python test_kitti_pose.py --test_seq 9 --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file models/model-100280

Other implementations

Pytorch (by Clement Pinard)

Disclaimer

This is the authors' implementation of the system described in the paper and not an official Google product.

sfmlearner's People

Contributors

huang-jin, tinghuiz


sfmlearner's Issues

Pose evaluation code and multi GPU training

First of all, thanks a lot for releasing the code, the paper is amazing!

I just wanted to ask when you are planning to release the evaluation code, and whether you're planning to add multi-GPU training as well.

Cheers,
Maciej

1. Problem with the process of generating image frames; 2. Some errors during training

  1. When executing python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=171 --num_threads=4, the result is 5 frames (see the "5frames" screenshot), while executing on another PC (GPU: 1080 Ti) it is 3 frames (see the "3frames" screenshot).

  2. Following the steps with python train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4, the result is shown in the "result" screenshot.

So I would like to know how to solve these problems. Looking forward to your reply, thank you very much!

Artifact in some test images?

Hi @tinghuiz ,

I tried to test your code and model on the KITTI test split (200 images, not the Eigen split). I observed that there are artifacts on some images, like the ones below. Have you encountered this before? Can you give any hints as to why this could happen?

[attached example frames: 000001_10, 000002_10, 000003_10]

Thank you!

How to train on dataset for which we don't have calibration information

Hi,

I am trying to train this architecture on a video dataset taken from YouTube. I have been able to create the dataset with three-frame sequences grouped together. I don't have the calibration matrices for these videos, so how best can I train with this architecture?

How can one derive the camera calibration matrix directly from the camera properties?

I am thinking of trying to run this architecture on video taken with my phone, but I am blocked on the camera calibration matrix part.

Any help would be greatly appreciated.
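For readers with the same question: a pinhole intrinsics matrix can be approximated from basic camera properties. This is a generic sketch, not code from this repository; it assumes square pixels and a principal point at the image center, and the example numbers are placeholders.

import numpy as np

def intrinsics_from_specs(img_width, img_height, focal_mm, sensor_width_mm):
    # Focal length in pixels from focal length in mm and sensor width in mm
    fx = focal_mm * img_width / sensor_width_mm
    fy = fx                                      # square-pixel assumption
    cx, cy = img_width / 2.0, img_height / 2.0   # principal point at the image center
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Placeholder example: a phone camera with a 4 mm lens on a 6.2 mm-wide sensor
# K = intrinsics_from_specs(1920, 1080, focal_mm=4.0, sensor_width_mm=6.2)

If the frames are later resized (e.g. to 416x128 for training), fx and cx scale with the width ratio and fy and cy with the height ratio.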

How to make my own pose trajectory file

Now I have completed the training process and want to output the pose trajectory from the trained model. According to your page, I need to run python test_kitti_pose.py --test_seq [sequence_id] --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file /path/to/pre-trained/model/file/, but where is KITTI/odometry/set/? Do you mean I should make my own test set? If so, how do I make it?

Error when running this code in a Jupyter notebook

I have read your related paper and believe this work will be a great one. When I try your code in a Jupyter notebook on Ubuntu 16.04 with TensorFlow, I encounter an error:

TypeError: concat() got an unexpected keyword argument 'axis'

The details are as follows:
%pylab inline
from __future__ import division
import os
import numpy as np
import scipy.misc
import tensorflow as tf
from SfMLearner import SfMLearner
from utils import *

mode = 'depth'
img_height=128
img_width=416
ckpt_file = 'models/model-145248'
I = scipy.misc.imread('misc/sample.png')
I = scipy.misc.imresize(I, (img_height, img_width))
Populating the interactive namespace from numpy and matplotlib
In [4]:

sfm = SfMLearner(batch_size=1,
img_height=img_height,
img_width=img_width)
sfm.setup_inference_graph(mode=mode)

TypeError Traceback (most recent call last)
in ()
2 img_height=img_height,
3 img_width=img_width)
----> 4 sfm.setup_inference_graph(mode=mode)

/home/timing/Program/SfMLearner/SfMLearner.pyc in setup_inference_graph(self, mode)
37 def setup_inference_graph(self, mode='depth'):
38 if mode == 'depth':
---> 39 self.build_depth_test_graph()
40
41 def inference(self, inputs, sess, mode='depth'):

/home/timing/Program/SfMLearner/SfMLearner.pyc in build_depth_test_graph(self)
18 input_mc = self.preprocess_image(input_uint8)
19 with tf.name_scope("depth_prediction"):
---> 20 pred_disp, depth_net_endpoints = disp_net(input_mc)
21 pred_depth = [1./disp for disp in pred_disp]
22 pred_depth = pred_depth[0]

/home/timing/Program/SfMLearner/nets.pyc in disp_net(tgt_image, is_training)
104 # There might be dimension mismatch due to uneven down/up-sampling
105 upcnv7 = resize_like(upcnv7, cnv6b)
--> 106 i7_in = tf.concat([upcnv7, cnv6b], axis=3)
107 icnv7 = slim.conv2d(i7_in, 512, [3, 3], stride=1, scope='icnv7')
108

TypeError: concat() got an unexpected keyword argument 'axis'
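For context, this error is what TensorFlow releases older than 1.0 raise: before 1.0, tf.concat took (concat_dim, values) and had no axis keyword, whereas this codebase targets the 1.0-style tf.concat(values, axis=...). Upgrading to TensorFlow 1.0 (see Prerequisites) is the straightforward fix; a small compatibility sketch, if upgrading is not an option:

import tensorflow as tf

def concat_compat(values, axis):
    # Call tf.concat with whichever argument order the installed TF expects
    try:
        return tf.concat(values, axis=axis)   # TensorFlow >= 1.0
    except TypeError:
        return tf.concat(axis, values)        # TensorFlow <= 0.12 fallback

print(tf.__version__)  # the codebase was developed against 1.0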

Generating the actual depth values of the kitti dataset

Hi,
I tested your evaluation code and it works fine with the downloaded predictions.
But I'm trying to find the prediction for a specific image from the KITTI dataset.
In which format do you save the disparity predictions?
I'm more interested in the actual depth values of a single image.

How to train on NYU dataset

Hi,

I have gone through the data loader code and realised that it is written for the KITTI dataset folder structure. Has anyone tried loading a different dataset like NYU?

Network weight decay

Hello, I have been trying to replicate your results in my own PyTorch implementation, but had some trouble converging with your hyperparameters.

Especially, the weight decay you use seems very large to me : https://github.com/tinghuiz/SfMLearner/blob/master/nets.py#L27

Weight decay is usually around 5e-5 ~ 5e-4 and here it is 0.05 ! When using it, my two networks just go to zero very quickly.

As I am not very familiar with tf.slim, I have done some research, and I am not sure you actually apply weight regularization, since apparently you have to call slim.losses.get_total_loss()

This is corroborated by the fact that setting the L2 regularization to extreme values (like 50.0) doesn't change anything.

The good news is that if weight decay is indeed not applied to your networks, you might have something interesting to work on if you want to improve your results even more!

Clément

Trained your network locally, but my eval. result is not as good as yours

Hi,

Thanks for sharing your wonderful work.
I followed your readme file, and everything seems to go well until I get a different evaluation result from yours (Ours (K) in Table 1 of your paper) when I evaluate the locally trained weights. During training, I used the same parameters as suggested on this webpage.

The following are the metrics I got:
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2621, 3.6171, 8.2036, 0.3806, 0.0000, 0.6577, 0.8520, 0.9258

I checked and reviewed my procedure, but I cannot find any explanation for the difference.
Let me know what you think about it.

Regards,
CJ

Question about seq_length

I am trying to train and test the SfMLearner model, but I found that the default seq_length values in train.py and test_kitti_pose.py are different:
seq_length is 3 in train.py and 5 in test_kitti_pose.py.

I set seq_length to the same number (3 or 5) for both training and testing, but an error was raised.

How to fix it?

Parameter updating by novel view synthesis

Hi @tinghuiz,

The method _interpolate in file utils.py samples pixels in the source image with tf.gather, given the indices computed from intrinsics matrices and depth. The gradient of tf.gather is not computed w.r.t. the indices, so backpropagation does not reach the tensor of the computed depth (or, similarly, the pose). So,

How does the network update the depth (pose) parameters? Am I missing something in the analysis?

Cheers.
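For context, bilinear samplers of this kind are usually differentiable not through the integer gather indices but through the interpolation weights, which are smooth functions of the projected (continuous) coordinates. A stripped-down 1-D sketch of that pattern (illustrative only, not the repository's _interpolate, and without boundary handling):

import tensorflow as tf

def linear_sample_1d(values, x):
    # tf.gather has no gradient w.r.t. its integer indices, but the gradient
    # still reaches x through the fractional weights (x1 - x) and (x - x0).
    x0 = tf.floor(x)
    x1 = x0 + 1.0
    w0 = x1 - x                                   # weight of the left neighbor
    w1 = x - x0                                   # weight of the right neighbor
    v0 = tf.gather(values, tf.cast(x0, tf.int32))
    v1 = tf.gather(values, tf.cast(x1, tf.int32))
    return w0 * v0 + w1 * v1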

question about cityscapes dataset preparation

Hi, @tinghuiz ,

I'm trying to prepare the Cityscapes dataset. According to the download page, one would generally download "leftImg8bit_trainvaltest.zip", which does not contain consecutive frames and is meant for segmentation tasks. I guess we should download a package with consecutive frames instead; could you tell me the exact name of the zip file to prepare, e.g. leftImg8bit_trainextra.zip / leftImg8bit_demoVideo.zip / leftImg8bit_sequence_trainvaltest.zip (which one)?

Thanks!

src to target image projection

Hi,
I am trying to extend your project for my thesis, and I think there might be a logical error in your implementation and even in the paper itself (Figure 3 and Section 3.1).
As I understand it, you extract depth information from the target image here:

SfMLearner/SfMLearner.py

Lines 30 to 31 in febf0d3

pred_disp, depth_net_endpoints = disp_net(tgt_image,
is_training=True)

and then pass this depth along with curr_src_image_stack into the projective_inverse_warp function to calculate curr_proj_image

SfMLearner/SfMLearner.py

Lines 68 to 72 in febf0d3

curr_proj_image = projective_inverse_warp(
curr_src_image_stack[:,:,:,3*i:3*(i+1)],
tf.squeeze(pred_depth[s], axis=3),
pred_poses[:,i,:],
intrinsics[:,s,:,:])
.
curr_proj_error = tf.abs(curr_proj_image - curr_tgt_image)
and Equation 1 in the original paper also shows that you are trying to build the target image based on pixel coordinates of the source images:
def pixel2cam(depth, pixel_coords, intrinsics, is_homogeneous=True):

If that is true, you need to feed the depth of the "src image", not the "target image", into the projective_inverse_warp function; or you can use the target image and its depth to build the src image (Figure 3 in the paper shows that the target image pixels are projected into the src view, but Section 3.1 and Equation 1 contradict this).

About connecting the 5-frame snippets into a complete sequence

I want to evaluate pose estimation over a whole sequence, but the pose net outputs 5-frame snippets. Looking at the snippets (format: timestamp tx ty tz qx qy qz qw), the world coordinate system of each snippet is different. How can I transform them into a common world coordinate system, e.g. with frame 0 as the reference frame? And how can I reduce the error when projecting into the same coordinate system?
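One common way to stitch such snippets is to convert each line into a 4x4 pose, take the relative transform between the first two frames of each snippet, and accumulate those relative motions starting from frame 0. A minimal sketch assuming the timestamp tx ty tz qx qy qz qw layout and one snippet per test frame (not code from this repository; the monocular scale ambiguity and drift are not addressed):

import numpy as np

def pose_matrix(t, q):
    # 4x4 homogeneous pose from translation (tx, ty, tz) and unit quaternion (qx, qy, qz, qw)
    x, y, z, w = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)]])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def chain_snippets(snippet_files):
    # Each snippet is expressed in the coordinates of its own first frame, so only
    # the first->second relative motion of every snippet is used here.
    global_T = np.eye(4)
    trajectory = [global_T.copy()]
    for path in snippet_files:
        data = np.loadtxt(path)                              # (num_frames, 8)
        first = pose_matrix(data[0, 1:4], data[0, 4:8])
        second = pose_matrix(data[1, 1:4], data[1, 4:8])
        rel = np.linalg.inv(first).dot(second)               # motion from frame k to k+1
        global_T = global_T.dot(rel)
        trajectory.append(global_T.copy())
    return trajectory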

Evaluation/training on higher resolutions

How did you obtain the depth predictions on Make3D? Can the network be evaluated/trained on higher resolution inputs or do you need to scale the inputs accordingly?

question about kitti dataset preparation

Hi, @tinghuiz ,

I would also like some clarification on KITTI dataset preparation.
First, according to the KITTI raw download page, there are up to 100 sequences (e.g. 2011_09_26_drive_0001, 2011_09_26_drive_0002, ...). Could you clarify which sequences we should download for preparation?

Thanks!

pose evaluation

While evaluating the pose, the data you provided in pose_eval_data (in the directory named "ours results") and the results produced by test_kitti_pose.py with the model 100892 differ a lot.
Take frame 000860 in sequence 10 for example: the car is turning right. In "ours results", tz/tx is almost 7, which is reasonable. But with the model 100892 the output is not right; tz/tx is almost 150, meaning there is no turning at all. Have you tested the pose estimation performance of the model 100892?

Single-view depth has limited generalization

I think that single view depth has more problems generalizing to previously unseen types of images. For example, when I rotate the input image 180 degrees, the depth output is totally wrong. The input of two images is necessary to get robust results.

training error

Hi, @tinghuiz ,

I successfully trained on the kitti_raw dataset. However, during training at Epoch: [20] [ 8722/10062], it got a
Segmentation fault (core dumped). Details follow:

Epoch: [20] [ 3822/10062] time: 0.5334/it loss: 0.769
 [*] Saving checkpoint to /data3/kitti_raw_checkpoints/...
Epoch: [20] [ 3922/10062] time: 0.5566/it loss: 0.697
Epoch: [20] [ 4022/10062] time: 0.5485/it loss: 0.813
Epoch: [20] [ 4122/10062] time: 0.5367/it loss: 0.772
Epoch: [20] [ 4222/10062] time: 0.5849/it loss: 0.865
Epoch: [20] [ 4322/10062] time: 0.5584/it loss: 0.811
Epoch: [20] [ 4422/10062] time: 0.5920/it loss: 0.816
Epoch: [20] [ 4522/10062] time: 0.5262/it loss: 0.715
Epoch: [20] [ 4622/10062] time: 0.5609/it loss: 0.823
Epoch: [20] [ 4722/10062] time: 0.5624/it loss: 0.691
Epoch: [20] [ 4822/10062] time: 0.5650/it loss: 0.705
Epoch: [20] [ 4922/10062] time: 0.5585/it loss: 0.675
Epoch: [20] [ 5022/10062] time: 0.5590/it loss: 0.720
Epoch: [20] [ 5122/10062] time: 0.5712/it loss: 0.648
Epoch: [20] [ 5222/10062] time: 0.5852/it loss: 0.756
Epoch: [20] [ 5322/10062] time: 0.5354/it loss: 0.759
Epoch: [20] [ 5422/10062] time: 0.5403/it loss: 0.772
Epoch: [20] [ 5522/10062] time: 0.5520/it loss: 0.845
Epoch: [20] [ 5622/10062] time: 0.5396/it loss: 0.766
Epoch: [20] [ 5722/10062] time: 0.5183/it loss: 0.729
Epoch: [20] [ 5822/10062] time: 0.5255/it loss: 0.727
Epoch: [20] [ 5922/10062] time: 0.5384/it loss: 0.606
Epoch: [20] [ 6022/10062] time: 0.5533/it loss: 0.663
Epoch: [20] [ 6122/10062] time: 0.5903/it loss: 0.640
Epoch: [20] [ 6222/10062] time: 0.5318/it loss: 0.738
Epoch: [20] [ 6322/10062] time: 0.5226/it loss: 0.807
Epoch: [20] [ 6422/10062] time: 0.5603/it loss: 0.889
Epoch: [20] [ 6522/10062] time: 0.5770/it loss: 0.845
Epoch: [20] [ 6622/10062] time: 0.5342/it loss: 0.649
Epoch: [20] [ 6722/10062] time: 0.5349/it loss: 0.681
Epoch: [20] [ 6822/10062] time: 0.5585/it loss: 0.631
Epoch: [20] [ 6922/10062] time: 0.5523/it loss: 0.881
Epoch: [20] [ 7022/10062] time: 0.5862/it loss: 0.813
Epoch: [20] [ 7122/10062] time: 0.5370/it loss: 0.828
Epoch: [20] [ 7222/10062] time: 0.5419/it loss: 0.767
Epoch: [20] [ 7322/10062] time: 0.5301/it loss: 0.731
Epoch: [20] [ 7422/10062] time: 0.5466/it loss: 0.643
Epoch: [20] [ 7522/10062] time: 0.5471/it loss: 0.835
Epoch: [20] [ 7622/10062] time: 0.5313/it loss: 0.774
Epoch: [20] [ 7722/10062] time: 0.5442/it loss: 0.825
Epoch: [20] [ 7822/10062] time: 0.5720/it loss: 0.697
Epoch: [20] [ 7922/10062] time: 0.5723/it loss: 0.700
Epoch: [20] [ 8022/10062] time: 0.6266/it loss: 0.880
Epoch: [20] [ 8122/10062] time: 0.5573/it loss: 0.783
Epoch: [20] [ 8222/10062] time: 0.5382/it loss: 0.705
Epoch: [20] [ 8322/10062] time: 0.5158/it loss: 0.896
Epoch: [20] [ 8422/10062] time: 0.5315/it loss: 0.796
Epoch: [20] [ 8522/10062] time: 0.5571/it loss: 0.651
Epoch: [20] [ 8622/10062] time: 0.6167/it loss: 0.880
Epoch: [20] [ 8722/10062] time: 0.5562/it loss: 0.855
Segmentation fault (core dumped)
(tf_1.0) root@milton-All-Series:/data/code2/SfMLearner#

I used the following command to train

 python train.py --dataset_dir=/data3/kitti_raw_formatted/ --checkpoint_dir=/data3/kitti_raw_checkpoints/ --img_width=416 --img_height=128 --batch_size=4 --smooth_weight=0.5 --explain_reg_weight=0.2

Any suggestions to fix it?
THX!

Question about the depth prediction result

Hi tinghuiz,
I tried your code recently. It worked cool! But after completing all the steps you stated on the website, I found a problem here.

I believe I strictly followed your steps, and trained the model for 200K steps (the default value in your code). The evaluated depth accuracy I got was as follows:
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.2331, 3.7935, 7.4710, 0.3076, 0.0000, 0.7006, 0.8807, 0.9453

I also tried disabling the pose net and feeding the ground-truth pose to the model, and got the results shown below.
abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.1889, 2.1794, 6.6112, 0.2708, 0.0000, 0.7413, 0.9053, 0.9597

It still seems worse than the result you post on the page. Is there any further fine-tuning I should do to reach your results, rather than simply running the code directly?

Thanks!

pose question

Hi,
It's not an error.
I want to know why image_seq's shape looks like that.
For a 5-sequence case,
(128, 416, 3) => np.hstack => (128, 832, 3) => np.hstack => ... => (128, 2080, 3)
I thought that it would be (5, 128, 416, 3) for the image_seq!
pred = sfm.inference(image_seq[None, :, :, :], sess, mode='pose')
Can anyone help?
Thanks!
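For what it's worth, the data loader packs a snippet into a single wide image by horizontal concatenation, so five 416-pixel-wide frames become width 5 x 416 = 2080; the frames can be recovered by slicing. A small illustrative sketch:

import numpy as np

frames = [np.zeros((128, 416, 3), dtype=np.uint8) for _ in range(5)]  # placeholder frames
image_seq = np.hstack(frames)                 # (128, 2080, 3), the layout fed to sfm.inference
assert image_seq.shape == (128, 5 * 416, 3)

# Recover the individual frames if needed
unpacked = [image_seq[:, i * 416:(i + 1) * 416, :] for i in range(5)]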

Generating ground truth for 3 frame snippet

Hello,

Firstly, thank you for sharing the code. I am using it in my project. Could you please tell me how you made the ground-truth files for every 5-frame snippet? I trained the model on 3-frame snippets and want to make ground truth for them.

How to save the disparity as an image

Hi,
I followed the demo code and tried to save the depth image using test_kitti_depth.py,
but the images I saved are not what I expected.
The original input image: [0000000069]
The predicted depth: [keisha_disp]
The resized predicted depth: [keisha_disp_resize]

The predictions I saved don't look like a proper depth map. Is there anything I misunderstood?
Here is my code:

pred = sfm.inference(inputs, sess, mode='depth')
pred_disp = normalize_depth_for_display(pred['depth'][0,:,:,0])
cv2.imwrite('data/kitti/raw_disp.png', pred_disp)
fh_disp = open('data/kitti/raw_disp.png', 'r')
raw_disp = pil.open(fh_disp)
img_disp = raw_disp.resize((1242, 375), pil.ANTIALIAS)
img_disp.save('data/kitti/disp_resize.png')

Thanks
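One detail that can produce images like this: if normalize_depth_for_display returns a float array in [0, 1] (an assumption; check utils.py in your checkout), cv2.imwrite will write a nearly black image unless the array is first scaled to 8-bit, and a color-mapped result also needs RGB-to-BGR conversion for cv2. A minimal sketch under those assumptions:

import numpy as np
import cv2
from PIL import Image

def save_float_image(img, path, out_size=(1242, 375)):
    # img: HxW or HxWx3 float array with values in [0, 1]
    img_u8 = np.clip(img * 255.0, 0, 255).astype(np.uint8)
    if img_u8.ndim == 3:
        img_u8 = cv2.cvtColor(img_u8, cv2.COLOR_RGB2BGR)   # cv2 expects BGR channel order
    cv2.imwrite(path, img_u8)
    # Optional: write a resized copy for easier viewing
    Image.open(path).resize(out_size, Image.ANTIALIAS).save(path.replace('.png', '_resize.png'))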

Pretrained model and evaluation

Hi @tinghuiz , thanks for releasing the code. It seems that you did not provide the full pipeline (training + testing + evaluation); for now you have just released the test results as .npy files, but not the testing code.

I wonder whether the provided pretrained model is the one you used in the paper; I want to use it to test the images in Eigen's test split and evaluate it. Also, can you provide the pretrained model for pose estimation? I want to see if the numbers are consistent with the paper.

Thank you so much!

Question about pose estimation network

I am wondering how you can use so many downsampling convolutions in your pose network even though the input image is only 128 pixels high. Wouldn't the resulting feature map shrink to nothing with all those convolutions, or am I wrong? Are you using zero padding or something for the convolutions? It's not clear to me.

Question about training difficulties for pose network

I have done my own Torch implementation of a model similar to yours that trains the depth and the camera pose transformation in the same unsupervised manner as you do.

However, I have difficulties with the pose network, which is somehow not able to learn the transformation properly. My implementation is correct: I tested it with stereo examples and it is able to learn the depth and the static transformation of the stereo setup. However, if I move to a non-static setup, like a monocular video, the pose network is no longer able to learn the varying transformation parameters. That happens even if I use stereo images and randomly swap the left and right image, so that it only has to learn two different transformations.

So I am asking you if you experienced any such difficulties with your pose network? Did you use the same learning rate for the depth and the pose network? And how did you normalize the image inputs (range between -1 and 1?)?

How to evaluate?

Hi @tinghuiz,

Thank you for sharing the training code with us. How should I evaluate based on the output? I observed that the depth output is rescaled to [0, 1]. How can I restore the absolute depth values for evaluation?

Thank you

about batch normalization

Hi, you said in the paper that BN is used for all layers except the output layers, but I cannot find BN in the code. As you explained in the Notes, removing BN helps training; could you explain this or give me some insights?

Running eval_depth.py gives errors about missing files that do exist

Hi, I have just encountered this situation:
I was trying to run the evaluation code, but it keeps saying that files are missing.
To clarify, all my data are in JPG format, so I altered "test_files_eigen.txt" to match my data.

The files do exist and the file paths are correct, but it says my files are missing.

Here are the error messages:

$python kitti_eval/eval_depth.py --kitti_dir=~/Downloads/datasets/KITTI/ --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy
~/Downloads/datasets/KITTI/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000069.jpg missing
~/Downloads/datasets/KITTI/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000054.jpg missing
~/Downloads/datasets/KITTI/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000042.jpg missing
...

Could anyone help me to solve the problem?
Thanks
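One thing worth checking (a guess, not a confirmed diagnosis): Python's open() and os.path do not expand ~, so if the literal string ~/Downloads/... reaches the script, every lookup fails even though the files exist. Expanding the path explicitly, or passing an absolute path on the command line, avoids that:

import os

kitti_dir = os.path.expanduser('~/Downloads/datasets/KITTI/')  # expands ~ to the home directory
print(os.path.exists(kitti_dir))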

Reasonable Training Loss

Hi, I have three questions:

  1. How low does the loss need to be to give reasonable depth predictions?
    I've tried training on another dataset (ViDRILO), so I'm not quite sure whether 100K iterations of training is enough.

  2. I still wonder where the log files are saved. Does the log file look like events.out.tfevents.1505446260.joy? I ran:
    tensorboard --logdir=/path/to/tensorflow/log/files --port=8888

When opening https://localhost:8888, errors show up:
E0916 00:12:21.511262 Thread-1 _internal.py:87] 127.0.0.1 - - [16/Sep/2017 00:12:21] code 400, message Bad request syntax ("\x16\x03\x01\x00\xc0\x01\x00\x00\xbc\x03\x03\x87\xf8Z'#5\x05[\x0f/\x89\xaf\x80G\x07\x98\xad.\xcb\xeah\xdeh\x1br\x87>\xc5\x90+\xed\xd3\x00\x00\x1c\x8a\x8a\xc0+\xc0/\xc0,\xc00\xcc\xa9\xcc\xa8\xc0\x13\xc0\x14\x00\x9c\x00\x9d\x00/\x005\x00")

  3. Where is the pose model saved?
    After training, I only got one model (the depth model).

About Cityscapes datasets

In the Cityscapes link, I didn't find a package called leftImg8bit_sequence_trainvaltest.zip. Has it been renamed, or is there another reason?

How to separate model params?

When training, the depth net and pose net parameters are saved together, but the pretrained models you released are separated. How can I separate them when training?
Also, if I have a pretrained depth net and want to use its parameters, how can I do that?
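A common TensorFlow pattern for this is one tf.train.Saver per variable scope. Judging from variable names that appear elsewhere on this page (e.g. depth_net/upcnv3/...), the depth variables live under a depth_net prefix; the pose prefix below is an assumption, so print tf.trainable_variables() in your checkout to confirm. A sketch:

import tensorflow as tf

# Split trainable variables by name prefix; 'depth_net' appears in logs on this page,
# the pose prefix is an assumption -- inspect your own graph to confirm.
all_vars = tf.trainable_variables()
depth_vars = [v for v in all_vars if v.op.name.startswith('depth_net')]
pose_vars = [v for v in all_vars if v.op.name.startswith('pose')]

depth_saver = tf.train.Saver(depth_vars)
pose_saver = tf.train.Saver(pose_vars)

# with tf.Session() as sess:
#     depth_saver.restore(sess, '/path/to/pretrained/depth/checkpoint')
#     pose_saver.save(sess, '/where/to/store/pose_only_checkpoint')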

test_kitti_pose.py: error: batch_unpack_image_sequence

When I run test_kitti_pose.py, I get an error about the missing function batch_unpack_image_sequence:

Traceback (most recent call last):
  File "test_kitti_pose.py", line 97, in <module>
    main()
  File "test_kitti_pose.py", line 62, in main
    FLAGS.seq_length)
  File "/media/cssp/Data/JoyYang/SfMLearner/SfMLearner.py", line 306, in setup_inference
    self.build_pose_test_graph()
  File "/media/cssp/Data/JoyYang/SfMLearner/SfMLearner.py", line 273, in build_pose_test_graph
    self.batch_unpack_image_sequence(
AttributeError: 'SfMLearner' object has no attribute 'batch_unpack_image_sequence'

But batch_unpack_image_sequence is definitely defined in data_loader.py.
Could you tell me how to fix it?

script for generating groundtruth poses on kitti odom

Hi, tinghui

Thanks for generously sharing the code! However, I encountered some problems when generating ground-truth poses on the kitti_odom dataset with a different sequence-length setting. I treat the official ground-truth poses as projection matrices, i.e. in [R, t] form, but my generated poses differ from yours. Is there a problem with my understanding of the official pose format, or would you be willing to share your script for generating ground-truth poses?

Where to find the definition of the function projective_inverse_warp?

Hello, this research is great work in monocular depth and camera motion estimation!
I read most of the code in SfMLearner.py, but I cannot find the function projective_inverse_warp used at SfMLearner.py line 67. Could you tell me where to find it? I think it's the key to understanding Equation 2 in your paper!

Precondition Error when is_training is set to false

I noticed that when the depth test graph is being built, the is_training argument for disp_net is not set to False. Won't this negatively affect test performance, as the batch normalization won't be configured properly?

When setting this argument to True, an exception is raised. (Related to batch norm)

FailedPreconditionError: Attempting to use uninitialized value depth_net/upcnv3/BatchNorm/moving_mean
	 [[Node: depth_net/upcnv3/BatchNorm/moving_mean/read = Identity[T=DT_FLOAT, _class=["loc:@depth_net/upcnv3/BatchNorm/moving_mean"], _device="/job:localhost/replica:0/task:0/gpu:0"](depth_net/upcnv3/BatchNorm/moving_mean)]]
	 [[Node: depth_prediction/truediv/_131 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_459_depth_prediction/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I get this when using the model that was provided in the "download_model.sh" script
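For context, this error pattern is consistent with the batch-norm moving statistics never being restored: a Saver built from tf.trainable_variables() (as in the notebook code quoted later on this page) skips moving_mean and moving_variance, which are not trainable. A hedged sketch that restores all saved variables instead (whether the released checkpoint actually contains them is an assumption worth verifying with the checkpoint inspection tools):

import tensorflow as tf

# Build the saver over all variables so BatchNorm moving statistics are restored too
saver = tf.train.Saver(tf.global_variables())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # initialize anything not stored in the checkpoint
    saver.restore(sess, '/path/to/pre-trained/model/file')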

Traceback on demo.ipynb

First cell:

UnicodeDecodeError Traceback (most recent call last)
in ()
12 ckpt_file = 'models/model-190532'
13 fh = open('misc/sample.png', 'r')
---> 14 I = pil.open(fh)
15 I = I.resize((img_width, img_height), pil.ANTIALIAS)
16 I = np.array(I)

~/anaconda3/envs/Tensorflow/lib/python3.6/site-packages/PIL/Image.py in open(fp, mode)
2484 exclusive_fp = True
2485
-> 2486 prefix = fp.read(16)
2487
2488 preinit()

~/anaconda3/envs/Tensorflow/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
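This traceback comes from opening a binary PNG in text mode under Python 3: open('misc/sample.png', 'r') hands PIL a text stream, and the PNG signature byte 0x89 is not valid UTF-8. Opening the file in binary mode, or passing the path directly to PIL, avoids it:

from PIL import Image as pil

# Either open the file in binary mode ...
with open('misc/sample.png', 'rb') as fh:
    I = pil.open(fh)
    I.load()   # read the pixel data while the file handle is still open

# ... or simply let PIL open the path itself
I = pil.open('misc/sample.png')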

Running Error

Hi, @tinghuiz ,

Thanks for releasing this package.
However, when I run the following cell, I get this error:

saver = tf.train.Saver([var for var in tf.trainable_variables()]) 
with tf.Session() as sess:
    saver.restore(sess, ckpt_file)
    pred = sfm.inference(I[None,:,:,:], sess, mode=mode)

My system is Ubuntu 14.04 with TensorFlow 1.0 and Python 3.5.

Any suggestion to fix this issue?

THX!
