princeton-vl / DeepV2D
License: BSD 3-Clause "New" or "Revised" License
Hi, I have a question: since the KITTI ground truth is used during training, why is the final output not on the same scale as the KITTI ground truth? (In evaluation, median scaling is applied to the depth prediction.)
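For readers unfamiliar with the median scaling mentioned in this question, here is a minimal sketch. It assumes flat lists of predicted and ground-truth depths, with invalid ground-truth pixels marked by 0. The function name `median_scale` is hypothetical and is not taken from the DeepV2D evaluation code.

```python
from statistics import median

def median_scale(pred, gt):
    """Rescale pred so that its median matches the median of gt on valid pixels."""
    # Assumption: a ground-truth value of 0 marks a pixel without a measurement.
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    scale = median(g for _, g in valid) / median(p for p, _ in valid)
    return [p * scale for p in pred], scale
```

With this kind of alignment, only the relative depth structure is evaluated, which is why the question of absolute scale comes up for a supervised method.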
Hi,
I have a question about retrieving the pose data.
As referenced below, after the pose is converted from a quaternion to a matrix, an inverse operation follows. Why is this inverse operation necessary?
DeepV2D/deepv2d/data_stream/nyuv2.py
Lines 112 to 113 in eb362f2
Thanks
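As background for this question: inverting a rigid-body pose matrix switches between the camera-to-world and world-to-camera conventions. Which convention the NYUv2 pose files use is not stated in this thread, so the sketch below only illustrates the inverse itself. `se3_inverse` is a hypothetical helper, written in plain Python rather than TensorFlow.

```python
def se3_inverse(T):
    """Invert a 4x4 rigid transform [[R, t], [0, 1]] as [[R^T, -R^T t], [0, 1]]."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
```

Because the rotation block is orthonormal, the inverse needs only a transpose and one matrix-vector product, not a general matrix inversion.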
Hello,
I have read your paper. Thanks for uploading the code.
However, I would like to ask whether your method can be trained end-to-end.
As I understand it, the Depth module builds a cost volume around the keyframe and then uses a 3D CNN to predict the depth of that keyframe. The Motion module requires images and depths as input to predict the relative poses.
If you have N = 5 input images, does that mean you have to run the Depth module N times to get all N depth maps as input to the Motion module?
Hi @zachteed @heilaw @anewell @jiadeng ,
Thank you for your work! I have a question about the ckpt for demo and evaluation.
When we train the model we get two checkpoints, one for stage_1 and one for stage_2, but I notice that only one ckpt file needs to be loaded for evaluation and the demo. How can we obtain this final ckpt file, and could you please explain the relationship between the final ckpt and the two-stage ckpts produced by training?
Thank you so much!
module 'tensorflow' has no attribute 'custom_gradient'
failed to run optimizer arithmeticoptimizer, stage removestackstridedslicesameaxis node
python demos/demo_slam.py --dataset=scannet --n_keyframes=3
Here's my conda environment.yml:
name: py37-deepv2d
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _tflow_select=2.1.0=gpu
- absl-py=0.7.1=py37_0
- astor=0.7.1=py37_0
- blas=1.0=mkl
- c-ares=1.15.0=h7b6447c_1
- ca-certificates=2019.5.15=0
- certifi=2019.3.9=py37_0
- cudatoolkit=10.0.130=0
- cudnn=7.6.0=cuda10.0_0
- cupti=10.0.130=0
- gast=0.2.2=py37_0
- grpcio=1.16.1=py37hf8bcb03_1
- h5py=2.9.0=py37h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- intel-openmp=2019.4=243
- keras-applications=1.0.8=py_0
- keras-preprocessing=1.1.0=py_1
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libprotobuf=3.8.0=hd408876_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- markdown=3.1.1=py37_0
- mkl=2019.4=243
- mkl_fft=1.0.12=py37ha843d7b_0
- mkl_random=1.0.2=py37hd81dba3_0
- mock=3.0.5=py37_0
- ncurses=6.1=he6710b0_1
- numpy=1.16.4=py37h7e9f1db_0
- numpy-base=1.16.4=py37hde5b4d6_0
- openssl=1.1.1c=h7b6447c_1
- pip=19.1.1=py37_0
- protobuf=3.8.0=py37he6710b0_0
- python=3.7.3=h0371630_0
- readline=7.0=h7b6447c_5
- scipy=1.2.1=py37h7c811a0_0
- setuptools=41.0.1=py37_0
- six=1.12.0=py37_0
- sqlite=3.28.0=h7b6447c_0
- tensorboard=1.13.1=py37hf484d3e_0
- tensorflow=1.13.1=gpu_py37hc158e3b_0
- tensorflow-base=1.13.1=gpu_py37h8d69cac_0
- tensorflow-estimator=1.13.0=py_0
- tensorflow-gpu=1.13.1=h0d30ee6_0
- termcolor=1.1.0=py37_1
- tk=8.6.8=hbc83047_0
- werkzeug=0.15.4=py_0
- wheel=0.33.4=py37_0
- xz=5.2.4=h14c3975_4
- zlib=1.2.11=h7b6447c_3
- pip:
- attrs==19.1.0
- backcall==0.1.0
- bleach==3.1.0
- cycler==0.10.0
- decorator==4.4.0
- defusedxml==0.6.0
- easydict==1.9
- entrypoints==0.3
- google-pasta==0.1.8
- ipykernel==5.1.1
- ipython==7.5.0
- ipython-genutils==0.2.0
- jedi==0.13.3
- jinja2==2.10.1
- jsonschema==3.0.1
- jupyter-client==5.2.4
- jupyter-core==4.4.0
- jupyterlab==0.35.6
- jupyterlab-server==0.2.0
- kiwisolver==1.1.0
- markupsafe==1.1.1
- matplotlib==3.1.0
- mistune==0.8.4
- nbconvert==5.5.0
- nbformat==4.4.0
- notebook==5.7.8
- opencv-python==3.4.5.20
- pandas==0.24.2
- pandocfilters==1.4.2
- parso==0.4.0
- pexpect==4.7.0
- pickleshare==0.7.5
- prometheus-client==0.7.0
- prompt-toolkit==2.0.9
- ptyprocess==0.6.0
- pygments==2.4.2
- pyparsing==2.4.0
- pyrsistent==0.15.2
- python-dateutil==2.8.0
- pytz==2019.1
- pyyaml==5.3
- pyzmq==18.0.1
- seaborn==0.9.0
- send2trash==1.5.0
- terminado==0.8.2
- testpath==0.4.2
- toposort==1.5
- tornado==6.0.2
- tqdm==4.43.0
- traitlets==4.3.2
- vtk==8.1.2
- wcwidth==0.1.7
- webencodings==0.5.1
- wrapt==1.12.0
prefix: /home/yoyee/miniconda3/envs/py37-deepv2d
Hi, thanks for your work. The tfrecord is too big to download. Could you share a compressed file containing the pose information?
After installing the requirements, entering "python demos/kitti_demo.py --cfg cfgs/kitti.yaml --sequence demo_videos/kitti_demos/032/" yields ...
Traceback (most recent call last):
File "demos/kitti_demo.py", line 67, in <module>
main(args)
File "demos/kitti_demo.py", line 44, in main
depths = net.forward(data_blob)
File "lib/deepv2d.py", line 113, in forward
output = self.sess.run(self.outputs, feed_dict=feed_dict)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3,47,271] = [3, 47, 272] does not index into param shape [4,48,272,64]
[[node motion/GatherNd_1 (defined at lib/utils/bilinear_sampler.py:50) ]]
Caused by op 'motion/GatherNd_1', defined at:
File "demos/kitti_demo.py", line 67, in <module>
main(args)
File "demos/kitti_demo.py", line 38, in main
net = DeepV2D(INPUT_DIMS, cfg)
File "lib/deepv2d.py", line 38, in __init__
poses_pred = motion.forward(images[:, 1:], image_star, depth, intrinsics)
File "lib/networks/motion.py", line 84, in forward
G = self.flowse3(feat1, feat2, depth1, intrinsics/SC, G=G, reuse=i>0)
File "lib/networks/motion.py", line 108, in flowse3
featw = bilinear_sampler.bilinear_sampler(feat2, coords)
File "lib/utils/bilinear_sampler.py", line 88, in bilinear_sampler
output = bilinear_sampler_general(imgs, coords)
File "lib/utils/bilinear_sampler.py", line 50, in bilinear_sampler_general
img01 = tf.gather_nd(imgs, coords01)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3647, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/cgebbe/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): indices[3,47,271] = [3, 47, 272] does not index into param shape [4,48,272,64]
[[node motion/GatherNd_1 (defined at lib/utils/bilinear_sampler.py:50) ]]
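The message above says that an x index of 272 was requested from a feature map whose width is 272 (valid indices 0 to 271). A common way to avoid this in a bilinear sampler is to clamp both neighbouring indices to the valid range. The following is a generic pure-Python sketch of that idea, not the code in lib/utils/bilinear_sampler.py. `bilinear_sample` is a hypothetical name.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a 2D grid (list of rows) at (x, y), clamping indices to the border."""
    h, w = len(img), len(img[0])
    # Clamp the query point itself so that floor() stays inside the grid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # The "+1" neighbours are the ones that can step past the last row/column.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom
```

Whether clamping is the right fix for this particular traceback is an assumption; a mismatch between the input image size and the configured dimensions could produce the same error.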
Hi, I was trying to run the demo:
python demos/demo_v2d.py --model=models/scannet.ckpt --sequence=data/demos/scannet_0
But I got the following error:
2020-02-27 14:07:27.062479: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2020-02-27 14:07:27.062517: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
[[{{node motion/PnP/einsum_1/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
[[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "demos/demo_v2d.py", line 82, in <module>
main(args)
File "demos/demo_v2d.py", line 64, in main
depths, poses = deepv2d(images, intrinsics, viz=True, iters=args.n_iters)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 462, in __call__
self.update_poses(i)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 368, in update_poses
self.poses, self.intrinsics, self.weights = self.sess.run(outputs, feed_dict=feed_dict)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
[[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
[[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'motion/PnP/einsum_1/MatMul', defined at:
File "demos/demo_v2d.py", line 82, in <module>
main(args)
File "demos/demo_v2d.py", line 55, in main
deepv2d = DeepV2D(cfg, args.model, use_fcrn=args.fcrn, is_calibrated=is_calibrated, mode=args.mode)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 68, in __init__
self._build_motion_graph()
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 129, in _build_motion_graph
images, depths, intrinsics, edge_inds, init=do_init)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/modules/motion.py", line 282, in forward
Tij = Tij.keyframe_optim(target, weight, depths, intrinsics)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/geometry/transformation.py", line 364, in keyframe_optim
J = einsum('...ij,...jk->...ik', jproj, jtran)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py", line 49, in einsum
out = tf.einsum(equation, *inputs)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 257, in einsum
axes_to_sum)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 389, in _einsum_reduction
product = math_ops.matmul(t0, t1)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
[[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
[[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
My environment is Python 3.6.7 with tensorflow-gpu 1.12.0.
It seems the problem is that the batch size is too big: it works when I use only 4 images. Can you help?
Hi, thank you for your great work!
https://github.com/princeton-vl/DeepV2D/blob/master/deepv2d/modules/motion.py#L318
It looks like this line needs "intrinsics_pred =" or "intrinsics =".
Am I misunderstanding something?
Hi, thanks for sharing the good work.
However, I'm curious about the scale here in evaluation.
From my understanding, deepv2d is supervised, and should require no scaling in depth or pose evaluation.
However, in your evaluation script, all depths and poses are rescaled. Why is that needed?
Another question concerns the scaling factor used when calculating trans (cm):
DeepV2D/evaluation/eval_utils.py
Line 57 in a3fbef1
Shouldn't it be np.dot(t1,t1)/np.dot(t1*t2)?
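For reference, the scale that best aligns one translation vector to another in the least-squares sense has a closed form. The sketch below uses the hypothetical names `t_pred` and `t_gt`; how these map to `t1` and `t2` in eval_utils.py is not shown in this thread.

```python
def best_scale(t_pred, t_gt):
    """Return s minimizing ||s * t_pred - t_gt||^2, i.e. <t_gt, t_pred> / <t_pred, t_pred>."""
    num = sum(g * p for g, p in zip(t_gt, t_pred))
    den = sum(p * p for p in t_pred)
    return num / den
```

Swapping the roles of the two vectors and taking the reciprocal gives the other form, <t_gt, t_gt> / <t_pred, t_gt>. The two expressions agree only when the vectors are parallel, so which one is appropriate depends on which vector is being rescaled.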
Hi,
I'd like to test on a video sequence in TUM. I'm wondering how you test on a video sequence.
If the poses are unknown, how do you compute them? What batch size do you use to optimize the poses? Which images do you sample to compute the pose for a given frame?
If the poses are known, do you update the depth for only one iteration?
Xuan
I believe your results might be even slightly better if you use the default validation method:
The paper states that you directly use the 192x1088 output image of the CNN for evaluation. In contrast, other papers first resize the inferred image to the RGB size, crop it and then evaluate it, see https://github.com/nianticlabs/monodepth2/blob/master/evaluate_depth.py#L187
You can do the same if you first pad the output image with 108 pixels to undo the previous cropping and then perform the resizing and cropping. In that case I get an absRelErr=0.0640. I believe the improvement is due to the fact that I see some artifacts at the top which are simply cropped away with this method.
Note however, that I have skipped some of the 697 images from the Eigen split, if one of the four neighboring images was not available. How have you dealt with these cases? It is not mentioned at all in the paper.
Hi Zachary,
I am trying to check the depth performance of DeepV2D with different numbers of frames. If I change the KITTI config to Frames:2, the network parameters of the motion predictor no longer match. Do we have to retrain the network for the two-frame setting?
Best
Hi @heilaw @anewell @jiadeng @zachteed
Thanks for your work. I notice that for uncalibrated video with unknown focal length you provide demos/demo_uncalibrated.py, which can estimate the focal length during inference. Do you estimate the distortion parameters k1, k2, p1, p2 as well? I want to run on uncalibrated video with heavy distortion.
Thanks.
When I run the demo on a GPU, something goes wrong:
Caused by op 'stereo/MatMul', defined at:
File "demos/demo_v2d.py", line 81, in
main(args)
File "demos/demo_v2d.py", line 55, in main
deepv2d = DeepV2D(cfg, args.model, use_fcrn=args.fcrn, is_calibrated=is_calibrated, mode=args.mode)
File "Deepv2d/deepv2d.py", line 73, in init
self._build_depth_graph()
File "Deepv2d/deepv2d.py", line 164, in _build_depth_graph
depths = self.depth_net.forward(Ts, images, intrinsics, adj_list)
File "Deepv2d/modules/depth.py", line 187, in forward
spred = self.stereo_network_avg(poses, images, intrinsics, idx)
File "Deepv2d/modules/depth.py", line 116, in stereo_network_avg
volume = operators.backproject_avg(Ts, depths, intrinsics, fmaps, adj_list)
File "Deepv2d/special_ops/operators.py", line 55, in backproject_avg
Tii = Ts.gather(ii) * Ts.gather(ii).inv() # this is just a set of id trans.
File "Deepv2d/geometry/transformation.py", line 146, in inv
Ginv = se3_matrix_inverse(self.matrix())
File "Deepv2d/geometry/se3.py", line 203, in se3_matrix_inverse
t = -tf.matmul(R, t)
File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/duanzm/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[7,3,3], b.shape=[7,3,1], m=3, n=1, k=3, batch_size=7
[[node stereo/MatMul (defined at Deepv2d/geometry/se3.py:203) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](stereo/transpose, stereo/strided_slice_1)]]
[[{{node Sum/_2107}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3915_Sum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
My CUDA version is 9.0. What should I do?
I tried the demo
python3 demos/demo_uncalibrated.py --video=data/demos/golf.mov
but it crashed:
tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key stereo/BatchNorm/moving_stddev not found in checkpoint
Traceback (most recent call last):
File "/Users/l0stpenguin/Library/Python/3.7/lib/python/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/Users/l0stpenguin/Library/Python/3.7/lib/python/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/Users/l0stpenguin/Library/Python/3.7/lib/python/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key stereo/BatchNorm/moving_stddev not found in checkpoint
[[{{node save/RestoreV2}}]]
I do not have a GPU machine. Is it possible to run it without a GPU?
In the paper, in the appendix under the title LS-OPTIMIZATION LAYER, I found the sentence
"In the backward pass, the gradients can be found by solving another linear system."
1.) Which linear system is that?
2.) How did you get equation (16) in the appendix?
Can anyone please help me?
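As general background (not a reproduction of the paper's equation (16), which is not quoted in this thread): when a layer computes x by solving H x = b, the standard result is that the gradient with respect to b is obtained by solving a second system, H^T y = dL/dx, and the gradient with respect to H is then -y x^T. The sketch below illustrates this on a 2x2 system in plain Python. `solve2` and `solve_backward` are hypothetical helper names.

```python
def solve2(H, b):
    """Solve the 2x2 system H x = b by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(b[0] * H[1][1] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - H[1][0] * b[0]) / det]

def solve_backward(H, x, grad_x):
    """Given x = H^{-1} b and dL/dx, return (dL/dH, dL/db)."""
    Ht = [[H[0][0], H[1][0]], [H[0][1], H[1][1]]]
    # The "other linear system": H^T y = dL/dx, and y is dL/db.
    grad_b = solve2(Ht, grad_x)
    # dL/dH is the negative outer product of y and x.
    grad_H = [[-grad_b[i] * x[j] for j in range(2)] for i in range(2)]
    return grad_H, grad_b
```

When H is symmetric, as it is for Gauss-Newton normal equations, H^T equals H, so the backward solve can reuse the factorization computed in the forward pass.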
Line 203 in eb362f2
Thanks for your great work. Please help me understand why self.total_loss is set to depth_loss here.
Hello. I notice that you scale both the depth and the pose estimates for evaluation. Scaling the pose is reasonable and matches previous work, but scaling depth_pred as well seems unfair, since the ground-truth depth is used in the loss function. Yours is a supervised depth estimation method, so why do you also scale the estimated depth?
Hi,
Thanks for sharing this great work.
I'm wondering: where does the nyu_train.tfrecords file (https://github.com/princeton-vl/DeepV2D#nyuv2-1) come from?
It seems there are 13776 examples, each with 9 RGB images, 1 depth image and smaller things like intrinsics and poses.
It's about 138GB but NYU Depth V2 is more like 400GB, which surprises me (even though encoding is not the same). Maybe this file was built using NYU Depth V1, which is 90GB? Is this file the one used in the experiments reported in the paper?
The pretrained model at "https://www.dropbox.com/s/1cerfm260gqu9bt/models.zip" in the script "./data/download_models.sh" is not reachable. Is there another way to get the pretrained model, such as Google Drive? Thanks.
Hi, thank you for your nice work.
I'm wondering how you get the results from the paper.
I ran the code with
python demos/demo_slam.py --dataset=tum
and extracted the poses from slam.poses.
Then I used evo_rpe for evaluation.
But the metrics from evo_rpe are
{"title": "RPE w.r.t. translation part (m)\nfor delta = 1 (frames) using consecutive pairs\n(with Sim(3) Umeyama alignment)", "ref_name": "DeepV2D/data/slam/tum/rgbd_dataset_freiburg1_room/groundtruth.txt", "est_name": "DeepV2D/results/tum/poses.tum", "label": "RPE (m)"}
The aligned trajectory also doesn't look right.
May I ask if there's some conversion I missed?
Thank you.
Hi,
Thanks for the great work. I see from the code in demo_slam.py that it takes the video sequence from the nyu/kitti/scannet dataset. Is there a way to use demo_slam with my own video of an indoor scene recorded with a smartphone camera?
Hi, I ran the demo as described in the README. I installed vtk and I am sure that 'vtkOpenGLKitPython.so' is in the /usr/local/lib/python3.5/dist-packages/vtk/ folder,
but I still got the error mentioned above.
Hi, thanks for your code. Although I have read the code, I still don't understand the meaning of view pooling. In '3D Matching Network with view concatenation', you build a cost volume for each image pair, then stack all of them, refine the result with a 3D CNN (_hourglass_3d), and output the depth probability. I can't see where the pooling across the different cost volumes happens; it seems that you stack all the cost volumes and output the depth map.
Hi, I'm trying to implement a single-view NYU evaluation. I noticed that you use 8 frames in your code and estimate the depth map of the first keyframe. When I reduced the number of frames to 1, the predicted depth maps were all NaN. I also tried concatenating two copies of the same frame, but the resulting depth was poor.
How can a single-view demo be implemented correctly?
I started training on NYU at night and found that the process had been killed by the next morning. I repeated the training and saw that memory usage keeps increasing. My total memory is 32G. I use NVIDIA TensorFlow r1.15 https://github.com/NVIDIA/tensorflow/tree/r1.15.
Caused by op 'motion/PnP_1/Cholesky', defined at:
File "demos/demo_uncalibrated.py", line 152, in <module>
main(args)
File "demos/demo_uncalibrated.py", line 90, in main
use_fcrn=True, is_calibrated=False, use_regressor=False)
File "deepv2d/deepv2d.py", line 68, in __init__
self._build_motion_graph()
File "deepv2d/deepv2d.py", line 129, in _build_motion_graph
images, depths, intrinsics, edge_inds, init=do_init)
File "deepv2d/modules/motion.py", line 287, in forward
(jj,ii), num_fixed=num_fixed, include_intrinsics=(not self.is_calibrated))
File "deepv2d/geometry/transformation.py", line 527, in global_optim
delta_update = cholesky_solve(H, b)
File "deepv2d/geometry/cholesky.py", line 32, in solve
x = cholesky_solve(H, b)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
return _graph_mode_decorator(f, *args, **kwargs)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 132, in
result, grad_fn = f(*args)
File "deepv2d/geometry/cholesky.py", line 9, in cholesky_solve
chol = tf.linalg.cholesky(H)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_linalg_ops.py", line 709, in
"Cholesky", input=input, name=name)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in
op_def=op_def)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in
return func(*args, **kwargs)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in
op_def=op_def)
File "/mnt/lustre/xiehaozhe/Applications/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Cholesky decomposition was not successful. The input might not be valid.
[[node motion/PnP_1/Cholesky (defined at deepv2d/geometry/cholesky.py:9) = Cholesky[T=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/device:CPU:0"](motion/PnP_1/Cast_3)]]
[[{{node motion/PnP_2/Cast_5/_2999}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_5542_motion/PnP_2/Cast_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
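Cholesky factorization requires a symmetric positive-definite matrix, so this error typically means the matrix passed to it was singular, indefinite, or contained NaNs. A common safeguard in Gauss-Newton style solvers is to add a small multiple of the identity before factoring. The pure-Python sketch below illustrates that idea only; it is not the code in deepv2d/geometry/cholesky.py. The names `cholesky` and `damped_cholesky` and the default `eps` are assumptions.

```python
import math

def cholesky(A):
    """Return lower-triangular L with L L^T = A; raise ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def damped_cholesky(A, eps=1e-4):
    """Factor A + eps * I, which is positive definite when A is positive semi-definite."""
    n = len(A)
    damped = [[A[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]
    return cholesky(damped)
```

Damping does not help if the input already contains NaN or infinite values, for example from invalid depths upstream, so checking the inputs is a reasonable first step.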
Hi,
Thanks for uploading the code for this research paper.
I am successfully able to run the demo code for nyu, however the output is a single depth image and same goes for demo_uncalibrated script where the entire video is provided as input.
Shouldn't the output be multiple depth maps for different video frames or something similar as written in the paper?
Thank you for your great work and codes, this work is amazing.
TensorFlow is difficult to use and develop with. Are there any plans to release a PyTorch version?
Line 273 in eb362f2
Hi, thanks for sharing the code.
However, I have one problem about the dataloader for NYUv2.
DeepV2D/deepv2d/data_stream/nyuv2.py
Lines 183 to 184 in a3fbef1
I have already downloaded the NYUv2 raw dataset, and want to generate the tfrecords on my own.
However, it seems that the association file and the ground-truth pose file are not provided in the official dataset.
Did you generate them some other way?
Hi, thanks for the great work. I wonder if you can provide a demo code to perform tracking (camera pose estimation) and mapping (depth estimation) simultaneously.
In all kitti demo sequences there is a file called "intrinsics.txt" with four numbers. What do they mean, why are they necessary and where do you get the values from?
If I followed the code correctly, they refer to fx, fy, cx, cy (see camera.py, line50) and you need them to reproject 2d points to 3d. This makes sense to me. But how do you get those values? In my understanding, Kitti raw provides the camera intrinsic matrix in the files "calib_cam_to_cam.txt" in lines starting with "K_0". It also provides the projection matrix directly in lines starting with "P_0". But the values specified there significantly differ from the values in the "intrinsics.txt" file. In particular, fx is approximately equal to fy in "calib_cam_to_cam.txt", whereas in the "intrinsics.txt" file, the first and second value differ by ~10% ?!
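One possible explanation, which is not confirmed anywhere in this thread, is that the intrinsics were adjusted for a cropped and resized image: a crop shifts the principal point, and a resize scales each axis independently, so fx and fy can differ after a non-uniform resize. The sketch below shows that adjustment under those assumptions. `adjust_intrinsics` and its parameters are hypothetical, and half-pixel offset conventions are ignored.

```python
def adjust_intrinsics(fx, fy, cx, cy, crop_x, crop_y, scale_x, scale_y):
    """Intrinsics after removing crop_x/crop_y pixels from the left/top, then resizing."""
    return (fx * scale_x, fy * scale_y,
            (cx - crop_x) * scale_x, (cy - crop_y) * scale_y)
```

Comparing the values in "intrinsics.txt" against the calibration file with candidate crop and scale factors would show whether this explanation fits.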
Hi @zachteed @heilaw @anewell @jiadeng
I notice that you scale both the depth map and the translation in pose matrix with scaling ratio 0.1 when training the KITTI dataset.
However, in the data streaming script for NYU and SCANNET, I only find the scaling for depth map with scaling ratio 1/5000 and 1/1000. Could you please explain why we don't need to scale the translation for NYU and SCANNET?
Thank you so much!
Hello Teed
I'm new to video to depth area, thanks for your excellent work.
I'm using your code to predict depth maps from "golf.mov". However, I found that you only predict 8 depth maps from a single video. I tried to remove this constraint, but then an out-of-memory error occurred.
How to predict dense depth maps from a single video with your project? I'm looking forward to your reply, thank you very much!
Hi,
in the paper you train on KITTI, NYU and ScanNet for your best results (ScanNet gets the most iterations during stage 1). However, only training scripts for KITTI and NYU are present here.
What is the reason for this? Could this be additionally provided?
Another question: Afterwards, you report that stage 2 is trained for another 120k iterations. On what benchmark is this? Is this on the individual benchmark, where you report your results or do you train on several as in stage 1?
Best regards!
Hi, when I use kitti.ckpt and set --mode=global, the following error arises:
Traceback (most recent call last):
File "demos/demo_v2d.py", line 84, in <module>
main(args)
File "demos/demo_v2d.py", line 66, in main
depths, poses = deepv2d(images, intrinsics, viz=True, iters=args.n_iters)
File "deepv2d/deepv2d.py", line 467, in __call__
self.update_poses(i)
File "deepv2d/deepv2d.py", line 368, in update_poses
self.poses, self.intrinsics, self.weights = self.sess.run(outputs, feed_dict=feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1149, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 192, 1088) for Tensor 'Placeholder_1:0', which has shape '(5, 192, 1088)'
The nyu.ckpt is normal in both global and keyframe mode.
What is the problem?
Many thanks.
Hi, I'm trying to run the demo with both KITTI and NYU, but I'm getting the following error:
Backprojection Op not available: Using python implementation
Traceback (most recent call last):
File "demos/nyu_demo.py", line 68, in
main(args)
File "demos/nyu_demo.py", line 41, in main
net = DeepV2D(INPUT_DIMS, cfg)
File "lib/deepv2d.py", line 38, in init
poses_pred = motion.forward(images[:, 1:], image_star, depth, intrinsics)
File "lib/networks/motion.py", line 84, in forward
G = self.flowse3(feat1, feat2, depth1, intrinsics/SC, G=G, reuse=i>0)
File "lib/networks/motion.py", line 107, in flowse3
coords = camera.camera_transform_project(G, depth, intrinsics)
File "lib/camera.py", line 87, in camera_transform_project
X = point_cloud_from_depth(depth, intrinsics)
File "lib/camera.py", line 72, in point_cloud_from_depth
X = iproj(pix, depth, kv)
File "lib/camera.py", line 54, in iproj
fx, fy, cx, cy = tf.split(kv, [1, 1, 1, 1], axis=-1)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1226, in split
name=name)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3289, in _split_v
num_split=num_split, name=name)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2508, in create_op
set_shapes_for_outputs(ret)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1873, in set_shapes_for_outputs
shapes = shape_func(op)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1823, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
debug_python_shape_fn, require_shape_fn)
File "/data/work/depth_estimation/DeepV2D/venv/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Dimension size, given by scalar input 2, must be non-negative but is -1 for 'motion/split' (op: 'SplitV') with input shapes: [7,4], [4], [] and with computed input tensors: input[2] = <-1>.
Hello. Thanks for the great work.
I am trying to run your evaluation code on ScanNet. I think I am using a newer version of ScanNet in which some intrinsics files are placed elsewhere, so I get a FileNotFound error here:
https://github.com/princeton-vl/DeepV2D/blob/master/deepv2d/data_stream/scannet.py#L111
Can you let me know what intrinsics (with respect to what image size) should be put here so that I can assign them manually?
Also, I am confused about why only the depth intrinsics are used:
https://github.com/princeton-vl/DeepV2D/blob/master/deepv2d/data_stream/scannet.py#L143
I would expect the network to need the color intrinsics instead.
Hi! I am trying to reconstruct the image at frame T using the image at frame T+1. However, the visualization looks odd.
Here is how I do the reconstruction:
Let D, RgbT, RgbT+, PoseT, PoseT+ denote the predicted depth (unscaled), the input RGB at frame T, the input RGB at frame T+1, the pose predicted at T, and the pose predicted at T+1.
Then:
1. pts3d = backproject(D)
2. pts3d_at_frameT+1 = PoseT+ * inv(PoseT) * pts3d
3. pts2d_at_frameT+1 = project(pts3d_at_frameT+1)
4. grid sample
Below is a visualized reconstruction for 2011_10_03_drive_0027_0000000799.png. The first row is the original input, the second row is the reconstructed RGB, and the third row is the flow visualization:
I notice an obvious lack of scale in the reconstruction, and the same holds for other sequences. The poses I used come from the depth prediction process (the pose results from the eval_kitti script). Ideally, the car in the left corner should not move, since it is static.
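The four steps listed above can be sketched in NumPy as follows. This is only an illustration, not the repository's code: the function names are hypothetical, the poses are assumed to be 4x4 world-to-camera matrices, K is assumed to be a 3x3 pinhole intrinsics matrix, and nearest-neighbour lookup stands in for a bilinear grid sample. Note that the depth and the pose translations must be expressed in the same unit for the warp to line up.

```python
import numpy as np

def backproject(depth, K):
    """Step 1: lift every pixel of frame T to a 3D point in frame T's camera."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    rays = np.linalg.inv(K) @ pix
    return rays * depth.reshape(1, -1)  # shape (3, H*W)

def warp_from_next(rgb_next, depth_t, K, pose_t, pose_next):
    """Reconstruct frame T by sampling frame T+1 at the reprojected pixels."""
    h, w = depth_t.shape
    pts = backproject(depth_t, K)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])

    # Step 2: move the points from camera T to camera T+1.
    # Assumption: both poses map world coordinates to camera coordinates.
    rel = pose_next @ np.linalg.inv(pose_t)
    pts_next = (rel @ pts_h)[:3]

    # Step 3: project into the image plane of frame T+1.
    proj = K @ pts_next
    z = proj[2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)
    ui = np.rint(proj[0] / safe_z).astype(int)
    vi = np.rint(proj[1] / safe_z).astype(int)
    valid = in_front & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)

    # Step 4: nearest-neighbour sampling; invalid pixels stay zero.
    out = np.zeros((h * w,) + rgb_next.shape[2:], dtype=rgb_next.dtype)
    out[valid] = rgb_next[vi[valid], ui[valid]]
    return out.reshape((h, w) + rgb_next.shape[2:]), valid.reshape(h, w)
```

With identical poses the function returns the input image unchanged, which is a quick sanity check before trying real predictions.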
Hi, thanks for your work. I encountered the following problem:
2020-09-06 16:43:02.192510: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
Is this a CUDA version problem? My environment is TF 1.12, CUDA 9.2.
Hi, thanks for your excellent work. If I create the NYUD tfrecord myself, should I first preprocess the depth (using the official MATLAB tool) to align it?
Hi,
Thank you for sharing the code.
I do not understand the significance of multiplying the translation vector by the constant 0.1 (args['scale']) in kitti.py when updating trajectory[i][0:3, 3].
Can you explain why the translation vector is multiplied by 0.1 (args['scale'])?
for i in range(len(trajectory)):
    trajectory[i] = np.dot(imu2cam, util.inv_SE3(trajectory[i]))
    trajectory[i][0:3, 3] *= self.args['scale']
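One way to read the snippet above is as a change of length unit: scaling only the translation of a rigid transform leaves the geometry consistent as long as the depth is scaled by the same factor. The sketch below checks this numerically. It is an illustration under that assumption, not a statement of the authors' motivation, and `reproject`, `scale_translation` and all numeric values are hypothetical.

```python
import numpy as np

def reproject(pixel, depth, K, T):
    """Map a pixel with known depth in camera A to pixel coordinates in camera B.

    T is a 4x4 rigid transform taking camera-A coordinates to camera-B coordinates.
    """
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    point_a = depth * (np.linalg.inv(K) @ uv1)
    point_b = T[:3, :3] @ point_a + T[:3, 3]
    p = K @ point_b
    return p[:2] / p[2]

def scale_translation(T, scale):
    """Return a copy of T whose translation is multiplied by scale."""
    T = T.copy()
    T[0:3, 3] *= scale
    return T

# Hypothetical example values.
K = np.array([[720.0, 0.0, 600.0], [0.0, 720.0, 180.0], [0.0, 0.0, 1.0]])
angle = 0.05
T = np.eye(4)
T[:3, :3] = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(angle), 0.0, np.cos(angle)]])
T[:3, 3] = [0.5, 0.0, 1.2]

scale = 0.1
original = reproject((400.0, 150.0), 8.0, K, T)
rescaled = reproject((400.0, 150.0), 8.0 * scale, K, scale_translation(T, scale))
print(np.allclose(original, rescaled))
```

The rotation is untouched because it is dimensionless; only quantities measured in length units change.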
Thanks for sharing the wonderful work.
I have a question about how the scenes in the ScanNet dataset are used.
While ScanNet itself provides train/val/test splits, it seems like this paper utilized specific scenes as below.
DeepV2D/data/scannet/scannet_test.txt
Line 1 in eb362f2
I want to double-check whether I correctly understand the author's intentions.
Hi, thanks for your nice work!
I wonder why the intrinsic parameters are divided by 4 here.
Thanks!
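For context, pinhole intrinsics are expressed in pixels, so they scale with image resolution. If the division accompanies a 4x downsampled image or feature map (an assumption, since the referenced line is not shown here), the sketch below illustrates the effect. The helper names and values are hypothetical, and the half-pixel offset that some conventions apply to the principal point is ignored.

```python
def scale_intrinsics(intrinsics, factor):
    """Divide (fx, fy, cx, cy) by factor to match an image downsampled by factor."""
    fx, fy, cx, cy = intrinsics
    return (fx / factor, fy / factor, cx / factor, cy / factor)

def project(point, intrinsics):
    """Project a 3D point (x, y, z) in camera coordinates to pixel coordinates."""
    x, y, z = point
    fx, fy, cx, cy = intrinsics
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical full-resolution intrinsics and a 3D point.
full_res = (720.0, 720.0, 600.0, 180.0)
point = (1.5, -0.4, 10.0)

u_full, v_full = project(point, full_res)
u_quarter, v_quarter = project(point, scale_intrinsics(full_res, 4))

# The same point lands at one quarter of the full-resolution pixel coordinates.
print(abs(u_quarter - u_full / 4) < 1e-9, abs(v_quarter - v_full / 4) < 1e-9)
```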