
sfmlearner-pytorch's Introduction

SfMLearner Pytorch version

This codebase implements the system described in the paper:

Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

In CVPR 2017 (Oral).

See the project webpage for more details.

Original author: Tinghui Zhou ([email protected])
PyTorch implementation: Clément Pinard ([email protected])

sample_results

Preamble

This codebase was developed and tested with PyTorch 1.0.1, CUDA 10 and Ubuntu 16.04. The original code was developed in TensorFlow; you can access it here.

Prerequisite

pip3 install -r requirements.txt

or install the following packages manually:

pytorch >= 1.0.1
pebble
matplotlib
imageio
scipy
scikit-image
argparse
tensorboardX
blessings
progressbar2
path.py

Note

Because it uses the latest PyTorch features, it is not compatible with earlier versions of PyTorch.

If you don't have an up-to-date PyTorch, the tags can help you check out the commits corresponding to your PyTorch version.

What has been done

  • Training has been tested on KITTI and CityScapes.
  • Dataset preparation has been largely improved, and now stores image sequences in folders, making sure that the motion between consecutive frames is always large enough.
  • As a result, training is now significantly faster, running at ~0.14 s per step vs ~0.2 s per step initially (on a single GTX 980 Ti).
  • In addition, you no longer need to prepare data for a particular sequence length, as stacking is done on the fly.
  • You can still choose the former stacked-frames dataset format.
  • Convergence is now almost as good as in the original paper with the same hyperparameters.
  • You can now compare with ground truth for your validation set. It is still possible to validate without it, but you can now see that minimizing photometric error is not equivalent to optimizing the depth map.

Differences with official Implementation

  • Smooth loss is different from the official repo. Instead of applying it to disparity, we apply it to depth (see the sketch after this list). The original disparity smooth loss did not work well (reason unknown!) and did not converge at all with the weight value used (0.5).
  • The loss is divided by 2.3 instead of 2 when downscaling. This is the result of empirical experiments, so the optimal value is clearly not carefully determined.
  • As a consequence, with a smooth loss weight of 2.0, the depth test is better, but the pose test is worse. To revert the smooth loss back to the original, you can change it here.
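
As an illustration, here is a minimal sketch of a first-order smoothness penalty applied to a predicted depth map, in the spirit described above (an assumption about the general form, not a copy of the repo's loss_functions.py, which also works on multiple scales):

import torch

def depth_smoothness_loss(depth):
    # depth: predicted depth map of shape [B, 1, H, W]
    dx = torch.abs(depth[:, :, :, :-1] - depth[:, :, :, 1:])  # horizontal gradients
    dy = torch.abs(depth[:, :, :-1, :] - depth[:, :, 1:, :])  # vertical gradients
    return dx.mean() + dy.mean()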

Preparing training data

Preparation is roughly the same command as in the original code.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command. The --with-depth option will save resized copies of the ground truth to help you set hyperparameters. The --with-pose option will dump the sequence poses in the same format as the Odometry dataset (see pose evaluation).

python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti_raw' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 128 --num-threads 4 [--static-frames /path/to/static_frames.txt] [--with-depth] [--with-pose]

For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. You will probably need to contact the administrators to be able to get them. Then run the following command

python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 171 --num-threads 4

Note that for Cityscapes the image height is set to 171 because we crop out the bottom part of the image, which contains the car logo; the resulting image has height 128.

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python3 train.py /path/to/the/formatted/data/ -b4 -m0.2 -s0.1 --epoch-size 3000 --sequence-length 3 --log-output [--with-gt]

You can then start a tensorboard session in this folder by

tensorboard --logdir=checkpoints/

and visualize the training progress by opening http://localhost:6006 in your browser. If everything is set up properly, you should start seeing reasonable depth predictions after ~30K iterations when training on KITTI.

Evaluation

Disparity map generation can be done with run_inference.py

python3 run_inference.py --pretrained /path/to/dispnet --dataset-dir /path/pictures/dir --output-dir /path/to/output/dir

This will run inference on all pictures inside dataset-dir and save a jpg of the disparity (or depth) to output-dir for each one. See the script help (-h) for more options.

Disparity evaluation is available

python3 test_disp.py --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list

The test file list is available in the kitti eval folder. For a fair comparison with the original paper's evaluation code, don't specify a posenet. However, if you do, it will be used to resolve the scale factor ambiguity; the only ground truth used for this is the vehicle speed, which is far more acceptable for a real-conditions quality measurement, but you will obviously get worse results.

Pose evaluation is also available on the Odometry dataset. Be sure to download both the color images and the poses!

python3 test_pose.py /path/to/posenet --dataset-dir /path/to/KITTI_odometry --sequences [09]

ATE (Absolute Trajectory Error) is computed along with RE (Rotation Error). RE between R1 and R2 is defined as the angle of R1*R2^-1 when converted to axis/angle; it corresponds to RE = arccos( (trace(R1 @ R2^-1) - 1) / 2). While ATE is often said to be sufficient for trajectory estimation, RE seems important here, as sequences are only seq_length frames long.
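
For reference, a minimal sketch of that RE formula in NumPy (illustrative only; the function and variable names are not from the repo):

import numpy as np

def rotation_error(R1, R2):
    # RE = arccos((trace(R1 @ R2^-1) - 1) / 2), the axis/angle magnitude of R1 @ R2^-1
    R = R1 @ np.linalg.inv(R2)
    cos_angle = np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)  # clip guards against numerical drift
    return np.arccos(cos_angle)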

Pretrained Nets

Available here

Arguments used :

python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt

Depth Results

Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3
0.181   | 1.341  | 6.236 | 0.262     | 0.733 | 0.901 | 0.964

Pose Results

5-frame snippets used

    | Seq. 09              | Seq. 10
ATE | 0.0179 (std. 0.0110) | 0.0141 (std. 0.0115)
RE  | 0.0018 (std. 0.0009) | 0.0018 (std. 0.0011)

Discussion

Here I try to link the issues that I think raised interesting questions about scale factor, pose inference, and training hyperparameters.

  • Issue 48 : Why is target frame at the center of the sequence ?
  • Issue 39 : Getting pose vector without the scale factor uncertainty
  • Issue 46 : Is Interpolated groundtruth better than sparse groundtruth ?
  • Issue 45 : How come the inverse warp is absolute and pose and depth are only relative ?
  • Issue 32 : Discussion about validation set, and optimal batch size
  • Issue 25 : Why filter out static frames ?
  • Issue 24 : Filtering pixels out of the photometric loss
  • Issue 60 : Inverse warp is only one way !

Other Implementations

TensorFlow by tinghuiz (original code, and paper author)

sfmlearner-pytorch's People

Contributors

anuragranj, clementpinard, evanmays, innovarul, mbaradad, micat001


sfmlearner-pytorch's Issues

bugs

In inverse_warp.py, line 40,
cam_coords = (intrinsics_inv @ current_pixel_coords).reshape(b, 3, h, w)
SyntaxError: invalid syntax
What is @ used for? A decorator, or an operator? The platform I used is Python 2.7 Anaconda.
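
For context, @ here is the matrix-multiplication operator introduced in Python 3.5 (PEP 465), which is why Python 2.7 reports a syntax error. A small illustrative sketch (not code from the repo):

import torch

a = torch.randn(2, 3, 4)
b = torch.randn(2, 4, 5)
c1 = a @ b               # Python >= 3.5 matrix-multiplication operator (PEP 465)
c2 = torch.matmul(a, b)  # equivalent call that does not need the operator
assert torch.allclose(c1, c2)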

Can not reproduce pose results.

I am getting the following with the pretrained models, which are much worse than reported here. https://github.com/ClementPinard/SfmLearner-Pytorch#pose-results
It is possible that the pretrained models in google drive are not updated.

$ python test_pose.py ../SfmLearner-Pytorch/pretrained/exp_pose_model_best.pth.tar --dataset-dir kitti/odometry/ --sequence 09
getting test metadata for theses sequences : {Path('kitti/odometry/sequences/09')}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.50it/s]
1591 snippets to test
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 1587/1591 [02:35<00:00, 10.20it/s]

Results
	        ATE,         RE
mean 	     0.0195,     0.0041
std 	     0.0106,     0.0022
$ python test_pose.py ../SfmLearner-Pytorch/pretrained/exp_pose_model_best.pth.tar --dataset-dir kitti/odometry/ --sequence 10
getting test metadata for theses sequences : {Path('kitti/odometry/sequences/10')}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.54s/it]
1201 snippets to test
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌| 1197/1201 [03:11<00:00,  6.24it/s]

Results
	        ATE,         RE
mean 	     0.0148,     0.0042
std 	     0.0096,     0.0026

Question on the sequence of reference images

Hi Clement

Thanks for your prompt answer for my last question, but I have another one here which is not related with the code.

As mentioned in the paper and the code, the ref images are [t-n, t-n+1, .., t-1, t+1, ..., t+n], but what if I use [t-n, .., t-1] only? What do you suppose the impact on the result would be? Thank you.

Cheers,
Rui

Inference quality is not good

The quality of depth generated during inference is much poorer compared to that seen on tensorboard (during training).

Here is an example inference output:
0000000000_disp

It seems inference is not working correctly.

Motion between consecutive frames

In the README you state that enough motion is guaranteed between consecutive frames.
I wonder how that affects performance when the vehicle is static.
Can you elaborate on that design decision?
Is it due to convergence issues?

Thanks,
Guy

Multi scale training

The original SfMLearner uses a multi-scale approach for training (4 different scales/resolutions trained together). However, this code uses a RandomScale approach, as seen in the set of augmentations:

input_transform = custom_transforms.Compose([
    custom_transforms.RandomHorizontalFlip(),
    custom_transforms.RandomScaleCrop(),
    custom_transforms.ArrayToTensor(),
    normalize
])

I feel that the multi-scale approach would be better since pose and depth may depend on scale.

This is not really an issue but a request to support multi-scale training instead of RandomScale.

difference in validation with sparse ground truth and filled ground truth of depth

Hi,
The depth predictions are validated with the sparse ground truth depths of KITTI here, but there are also other papers validating against full ground truth (filled by interpolation). Will there be a large difference in the validation result between these two methods? Which one is the main criterion nowadays in monocular depth estimation?

Pre-trained model

Hi,

Is there a pre-trained PyTorch model trained on Cityscapes and KITTI available? The scripts models/download_model.sh and download_model_5frame.sh don't work.

Question regarding implementation of the variable out_of_bound

Consider the following line:
out_of_bound = 1 - (ref_img_warped == 0).prod(1, keepdim=True).type_as(ref_img_warped)
in loss_functions.py. If I am not wrong, this is computed so that the loss is not taken at pixels which are out of bounds (or possibly disoccluded) due to warping. My question here is: wouldn't PyTorch calculate the gradient with respect to this variable out_of_bound as well and update it, although it shouldn't be updated?

This may not change the overall code and performance, but a more correct way to do it would be to specify that the variable out_of_bound should not receive any gradients and should not be updated.
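
A minimal sketch of what that suggestion could look like (an illustration, not the repo's actual fix; note that an equality comparison already produces a non-differentiable tensor, so the explicit detach mostly documents intent):

out_of_bound = 1 - (ref_img_warped == 0).prod(1, keepdim=True).type_as(ref_img_warped)
out_of_bound = out_of_bound.detach()  # treat the validity mask as a constant weight in the loss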

prepare_train_data.py

Hi @ClementPinard

  • I came across a problem when running prepare_train_data.py, as follows:
    python data/prepare_train_data.py C:\TensorFlow\SfmLearner-Pytorch\raw_data_downloader\2011_09_26\2011_09_26_drive_0001_sync --dataset-format "kitti" --dump-root C:\TensorFlow\SfmLearner-Pytorch\formatted_data --width 416 --height 128 --num-threads 4

  • And, it gives me the following error message:

    Traceback (most recent call last):
      File "data/prepare_train_data.py", line 94, in <module>
        main()
      File "data/prepare_train_data.py", line 67, in main
        get_gt=args.with_gt)
      File "C:\TensorFlow\SfmLearner-Pytorch\data\kitti_raw_loader.py", line 33, in __init__
        self.collect_train_folders()
      File "C:\TensorFlow\SfmLearner-Pytorch\data\kitti_raw_loader.py", line 51, in collect_train_folders
        drive_set = (self.dataset_dir/date).dirs()
      File "C:\Users\sheng\AppData\Local\Programs\Python\Python36\lib\site-packages\path.py", line 560, in dirs
        return [p for p in self.listdir(pattern) if p.isdir()]
      File "C:\Users\sheng\AppData\Local\Programs\Python\Python36\lib\site-packages\path.py", line 545, in listdir
        for child in os.listdir(self)
    FileNotFoundError: [WinError 3] The system cannot find the path specified: Path('C:\\TensorFlow\\SfmLearner-Pytorch\\raw_data_downloader\\2011_09_26\\2011_09_26_drive_0001_sync\\2011_09_26')

  • This is a screenshot of my command when running prepare_train_data and the error described above (screenshot omitted).

  • I tried both relative and absolute paths, and it gives me the same error. I also tried / and \, but neither seems to work.

  • I work on a Windows 10 machine and here is my directory structure:

SfmLearner-Pytorch
|-- data
|-- datasets
|-- formatted_data
|-- raw_data_downloader 
|     |-- 2011_09_26
|           |-- 2011_09_26_drive_0001_sync
|           |-- 2011_09_26_drive_0002_sync
|           |-- ..........................................................

Thank you in advance.

Possibly wrong calculation of ATE and RE?

When testing the pose prediction accuracy (in test_pose.py), the ATE and RE are divided by the snippet length.
Both predicted and GT poses have the Identity transformation as their first entry.
Therefore, the error on that first "pose" will be 0 which results in an overall snippet ATE and RE that is considerably lower than it should be.
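
A tiny numerical illustration of the concern (hypothetical error values, not actual results):

import numpy as np

ate_per_pose = np.array([0.0, 0.02, 0.03, 0.025, 0.02])  # hypothetical per-pose errors; the first (identity) is 0 by construction
print(ate_per_pose.mean())      # 0.019   -- dividing by the full snippet length includes the guaranteed zero
print(ate_per_pose[1:].mean())  # 0.02375 -- averaging only over the actually predicted poses gives a larger error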

test_disp.py fails with the recent commits

There were some changes to test_disp.py and kitti_eval/depth_evaluation_utils.py in the recent commits, which result in the following error while testing (the old version still works):

  File "test_disp.py", line 160, in <module>
    main()
  File "test_disp.py", line 61, in main
    framework = test_framework(dataset_dir, test_files, seq_length, args.min_depth, args.max_depth)
  File "kitti_eval/depth_evaluation_utils.py", line 14, in __init__
    self.calib_dirs, self.gt_files, self.img_files, self.displacements, self.cams = read_scene_data(self.root, test_files, seq_length, step)
  File "kitti_eval/depth_evaluation_utils.py", line 99, in read_scene_data
    displacements.append(get_displacements(data_root/date/scene/'oxts', ref_indices, demi_length))
  File "kitti_eval/depth_evaluation_utils.py", line 60, in get_displacements
    reordered_indices = [indices[tgt_index]] + [*indices[:tgt_index]] + [*indices[tgt_index + 1:]]
IndexError: index -1 is out of bounds for axis 0 with size 0

Visualizing world coordinate pose map

Hello,
I am a student who is very new to everything covered in this project (Python, PyTorch, CNNs, motion estimation, etc.) and I want to learn the basics by replicating your work :).

I could replicate most of your work, but now I am stuck on visualizing the predicted poses and comparing them to the KITTI ground truth poses, similar to what is shown below:
(example trajectory plot omitted)

I could draw the ground truth map with no problems. Since your code, as explained in the paper, uses 5-frame snippets for prediction, the output final_poses has size [pics_number, frame_number, 3, 4], while the KITTI ground truth poses only have size [pics_number, 3, 4]. I don't know how to deal with this array to draw a visual odometry map, so could you please show me how to do it?

One more thing (sorry for my bad English): the file times.txt obtained from the KITTI odometry dataset is described as "Timestamps for each of the synchronized image pairs in seconds". Is this the time between frames?

Please correct me if I am wrong at any point.
Thank you very much!

Help to add a learning rate scheduler

Hi,

I tried to add a learning rate scheduler - file attached.
SfmLearner-Pytorch.zip

It can be used in this way:
python3 train.py ./data/kitti/kitti_rawdata_formatted/ -b 48 -m 1.0 -s 0.1 --sequence-length 3 --log-output --checkpoint_dir ./training/checkpoints/ --optimizer Adam --scheduler step --lr 2e-4

However, the loss doesn't come down if I use the scheduler. Could you please take a look and let me know if I am doing something wrong?

About Pose Results

Thanks for your cool work. This implementation is very useful.
I have a question about the accuracy of poses in Seq 09, 10.
./data/test_scenes.txt doesn't have 09, 10.
According to KITTI-odometry-development-kit,

Seq09 -> 2011_09_30_drive_0033 000000 001590       
Seq10 -> 2011_09_30_drive_0034 000000 001200

This means that this framework uses these 2 sequences in training or validation.
(Is this right?)
How did you get the pose results in the README?
Did you use 09 and 10 in training, or did you use another method?
Also, I generated KITTI odometry data using this script and then tried using your framework (with --dataset-format stacked).
However, the photometric error in validation didn't decrease.
This is an example command.

python train.py /hoge/kitti_odom/ -b4 -m0.3 -s1.8 --epoch-size 1000 --sequence-length 5 --log-output  --rotation-mode euler --dataset-format stacked

The original paper uses 00-08 in training. I want to reproduce that score, but I cannot.
(Later, I will add Seq 09, 10 to test_scenes.txt and start training.)
Can you tell me your method?
Thank you for reading.

inconsistent pose and depth result

I trained the model with this command:
python train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5
but my result was not as good as yours.

Abs rel | Sq rel | RMSE | RMSE log | A1 | A2 | A3 |
0.3794 | 3.8478 | 10.6573 | 0.5115 | 0.3702 | 0.6512 | 0.8179 |

Pose:
       | ATE    | RE
Seq 09 | 0.0308 | 1.2759
Seq 10 | 0.0261 | 1.3253

Can you tell me the possible reason? Or did you train with ground truth?

About implementation of inverse warp

Hi, @ClementPinard, I am a little confused by your implementation of the inverse warp function below:

if padding_mode == 'zeros':
    X_mask = ((X_norm > 1)+(X_norm < -1)).detach()
    X_norm[X_mask] = 2  # make sure that no point in warped image is a combinaison of im and gray
    Y_mask = ((Y_norm > 1)+(Y_norm < -1)).detach()
    Y_norm[Y_mask] = 2

For the pixels out of range here, why do you set their value to 2?
It seems like these pixels will be padded with zeros regardless of whether their value is 2 or not. Also, what do you mean by your comment here? Could you please give some explanation? Thanks!

Inverse warp with scaled depth

Since the depth output of this model is scaled by a factor according to the ground truth, why does this code manage to inverse warp correctly? Does that mean we use a wrong depth map for warping?

Inconsistent results with test pretrained posenet model

I used the Odometry dataset and the pretrained PoseNet model that you provided on Google Drive to test on sequences 09 and 10. However, I did not get the same results that you show on GitHub. Did you update the model after finishing the README page?

raw kitti loader's calibration reader might be wrong

The cam.txt files generated by the kitti_raw_loader are always (0,0,0,0,0,0,0,0,1). This doesn't make sense; I think fx, fy should be 1 instead of zero. Is it because of lines 102-103 in kitti_raw_loader.py that set them to zero? When I run the code, zoom_x and zoom_y are always 0.

P_rect[0] *= zoom_x
P_rect[1] *= zoom_y
        
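
For context, a hedged sketch of how such intrinsics rescaling is usually meant to work (variable names here are assumptions, not the repo's exact code; if the zoom factors were computed with integer division they would indeed come out as 0):

# hypothetical rescaling of the rectified projection matrix to the output resolution
zoom_x = float(out_width) / original_width    # e.g. 416 / 1242 for KITTI raw images
zoom_y = float(out_height) / original_height  # e.g. 128 / 375
P_rect[0] *= zoom_x  # scales fx and cx
P_rect[1] *= zoom_y  # scales fy and cy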

Depth and Pose on Image with Black Borders

Hi! I already posted about this on the original TensorFlow implementation, but I would also like to know your opinion on the matter :)

Do you think that the black borders on an image like the one below would affect the training and the predictions of depth/pose?

exemplo

Visualizing pose

Hello,
I would like to know how to visualize the pose estimation outputs.
@ClementPinard Could you please explain the final_pose format and how to convert pose gt to kitti odometry pose?
Thanks

Does the size of batch-size affect the training results?

Hi,
I have run train.py with the command below on the KITTI raw data:
python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt
The only differences are that the batch size is 80 and the train (41664) / valid (2452) split is different.
The result I get is:
disp:
Results with scale factor determined by GT/prediction ratio (like the original paper) :
abs_rel, sq_rel, rms, log_rms, a1, a2, a3
0.2058, 1.6333, 6.7410, 0.2895, 0.6762, 0.8853, 0.9532

pose:
Results 10
       ATE,     RE
mean   0.0223,  0.0053
std    0.0188,  0.0036

Results 09
       ATE,     RE
mean   0.0284,  0.0055
std    0.0241,  0.0035
You can see that there is still quite a big margin compared with yours:
Abs Rel | Sq Rel | RMSE | RMSE(log) | Acc.1 | Acc.2 | Acc.3
0.181 | 1.341 | 6.236 | 0.262 | 0.733 | 0.901 | 0.964

I think there are no other factors causing this difference other than the batch size and the data split. Therefore, does the batch size affect the training results?

What's more, when I try to train my model with two Titan GPUs (batch-size = 80*2 = 160), the memory usage of each GPU is:
GPU0: about 11 GB, GPU1: about 6 GB.
There is a huge memory usage difference between the two GPUs, and it seriously impacts multi-GPU training.
I found that the loss calculations are all placed on the first GPU; the memory is mainly used to compute the 4 scales of the photometric reconstruction loss, so we could move some scales to cuda:0 and others to cuda:1, which might be better, I think.

Confusion about PoseNet architecture

Hi, thank you for this work!
I am confused with the network architecture of PoseNet. In PoseNet, you concatenate the target image and all the reference images (2 or 4 images) together (https://github.com/ClementPinard/SfmLearner-Pytorch/blob/master/models/PoseExpNet.py#L28) and predict a relative pose for each of these reference images.
Actually we want to estimate the relative pose between two frames, so in my opinion taking each pair of frames as input to PoseNet may be more reasonable. Have you tried this? Why design PoseNet in such a way?
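
A minimal sketch of the pairwise variant being suggested (hypothetical; pose_net and the tensor names are assumptions, not the repo's PoseExpNet interface):

import torch

def pairwise_poses(pose_net, tgt_img, ref_imgs):
    # tgt_img: [B, 3, H, W]; ref_imgs: list of [B, 3, H, W] reference frames
    poses = []
    for ref in ref_imgs:
        pair = torch.cat([tgt_img, ref], dim=1)  # [B, 6, H, W]: one target/reference pair
        poses.append(pose_net(pair))             # hypothetical network returning one 6-DoF pose per pair
    return torch.stack(poses, dim=1)             # [B, num_refs, 6]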

stacked and sequence data loaders

Hi!

  1. Could you please explain the difference between the SequenceFolder class from sequence_folders.py and the one from stacked_sequence_folders.py more precisely?

  2. Also, I didn't understand how to get data in a directory in this format:

A sequence data loader where the images are arranged in this way:

    root/scene_1/0000000.jpg
    root/scene_1/0000000_cam.txt
    root/scene_1/0000001.jpg
    root/scene_1/0000001_cam.txt
    .
    root/scene_2/0000000.jpg
    root/scene_2/0000000_cam.txt

I think your script data/prepare_train_data.py doesn't provide any option to prepare the KITTI dataset in this format.

Thanks!

Equal number of batches expected at THCTensorMathBlas.cu:471

Hi, I'm trying to train using a non-KITTI dataset and keep getting this error about an equal number of batches (note that I have modified the code, so the line numbers aren't very relevant):

Traceback (most recent call last):
  File "train.py", line 568, in <module>
    main()
  File "train.py", line 246, in main
    train_loss = train(args, train_loader, disp_net, pose_exp_net, optimizer, args.epoch_size, logger, training_writer, ...)
  File "train.py", line 336, in train
    args.rotation_mode, args.padding_mode)
  File "/.../loss_functions.py", line 44, in photometric_reconstruction_loss
    loss += one_scale(d, mask)
  File "/.../loss_functions.py", line 25, in one_scale
    ref_img_warped = inverse_warp(ref_img, depth[:,0], current_pose, intrinsics_scaled, intrinsics_scaled_inv, rotation_mode, padding_mode)
  File "/.../inverse_warp.py", line 183, in inverse_warp
    cam_coords = pixel2cam(depth, intrinsics_inv)  # [B,3,H,W]
  File "/.../inverse_warp.py", line 39, in pixel2cam
    cam_coords = intrinsics_inv.bmm(current_pixel_coords).view(b, 3, h, w)
RuntimeError: invalid argument 7: equal number of batches expected at /.../THC/generic/THCTensorMathBlas.cu:471

I'm not exactly sure which batch numbers have to be the same. Also, the problem seems to be fixed by changing the number of images in the training and validation datasets, which suggests that perhaps both the number of training and validation images have to be divisible by the batch size, or something similar. Do you have an idea what's going on? Thanks!
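
If divisibility by the batch size really is the issue, one common workaround is to drop the last, smaller batch; a hedged sketch (this is an assumption about the cause, not a confirmed diagnosis, and the variable names are illustrative):

from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True,
                          num_workers=args.workers,
                          drop_last=True)  # discard the final incomplete batch so every batch has the same size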

Pytorch version

In the README.md, you mention:
pytorch 0.3
scipy
argparse
tensorboard-pytorch
tensorboard
blessings
progressbar2
path.py

Is there a mistake? PyTorch 0.2 seems to be the latest version.

Still the problem of "@"

Hi Clément Pinard,
I met the invalid syntax problem because of the "@" in inverse_warp.py. Though there was already an issue about this symbol, I am still confused about how to change and fix it in the following line:
rotMat = xmat @ ymat @ zmat

BTW, I am using torch 0.4.1, Python 2.7 & 3.6.
Thanks for your answer!

Multi-GPU training

Training is not using multiple GPUs. If you just add DataParallel(), it will use all the GPUs in the system - snippet from train.py below:

if args.pretrained_disp:
    print("=> using pre-trained weights for Dispnet")
    a = torch.load(args.pretrained_disp)
    disp_net.load_state_dict(a['state_dict'])
else:
    disp_net.init_weights()

disp_net = torch.nn.DataParallel(disp_net).cuda()
pose_exp_net = torch.nn.DataParallel(pose_exp_net).cuda()
        
cudnn.benchmark = True
print('=> setting adam solver')

parameters = set()
for net_ in [disp_net, pose_exp_net]:
    parameters |= set(net_.parameters())

When using multi-GPU training, you have to make some changes for inference while loading the checkpoint:

from collections import OrderedDict

disp_net = models.DispNetS().cuda()
weights = torch.load(args.ckpt_file)

# if the original model was created with DataParallel, remove the 'module.' prefix from its keys
new_state_dict = OrderedDict()
for k, v in weights['state_dict'].items():
    name = k[7:] if k.startswith('module.') else k  # strip 'module.'
    new_state_dict[name] = v
weights['state_dict'] = new_state_dict

disp_net.load_state_dict(weights['state_dict'])

Reconstruction loss as NaN

When I went through the function that calculates the photometric reconstruction loss, I found this line of code:
assert((reconstruction_loss == reconstruction_loss).data[0] == 1)
I figured out that this line is there to check whether the reconstruction loss is NaN (a NaN value never equals itself). But I couldn't quite figure out why we are doing this. Under what circumstances could the reconstruction loss become NaN?

I'm trying to train only the pose network, using the depth map obtained from a Kinect camera in place of training the DispNet. I'm getting the assertion error randomly at run time and am unable to figure out the cause. I want to know why we are checking for NaNs in the reconstruction loss. Also, what are the possible causes for the reconstruction loss to become NaN?
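
For reference, a minimal sketch of the same NaN check written with torch.isnan, which is a bit more explicit than the x == x trick (an illustration, not the repo's code):

import torch

if torch.isnan(reconstruction_loss).any():
    raise RuntimeError("photometric reconstruction loss became NaN")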

LinAlgError during training

Hi,

Thanks for the PyTorch version. I met a LinAlgError when I'm trying to train on KITTI. Here's the detailed information.

Traceback (most recent call last):
  File "train.py", line 390, in <module>
    main()
  File "train.py", line 188, in main
    train_loss = train(train_loader, disp_net, pose_exp_net, optimizer, args.epoch_size, logger, train_writer)
  File "train.py", line 236, in train
    for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv) in enumerate(train_loader):
  File "/home/liyuchen/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 210, in __next__
    return self._process_next_batch(batch)
  File "/home/liyuchen/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
numpy.linalg.linalg.LinAlgError: Traceback (most recent call last):
  File "/home/liyuchen/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "./datasets/sequence_folders.py", line 61, in __getitem__
    return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics)
  File "/home/liyuchen/anaconda2/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 513, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File "/home/liyuchen/anaconda2/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix

Do you know why?
Thank you.

About the logging the warped image

Hi Clement,

Thank you very much for your implementation. I just got a question on this line:

ref_warped = inverse_warp(ref[:1], depth[:1,0], pose[:1,j],

Maybe I am thinking about it incorrectly, but in the process of logging the warped image, the code seems to apply the pose for t+1 to the ref_img at t-1? Could you please kindly help explain this line a bit?

Thank you very much.

Best regards,
Rui

error throwing with multi threads in prepare_train_data.py

Hi,
I tried to run prepare_train_data.py using the command you provided (--num-threads 4), but got the following error message:

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/usr/local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/siwei/Desktop/SfmLearner-Pytorch-master/data/prepare_train_data.py", line 33, in dump_example
    dump_dir = args.dump_root/scene_data['rel_path']
AttributeError: 'Namespace' object has no attribute 'dump_root'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/siwei/Desktop/SfmLearner-Pytorch-master/data/prepare_train_data.py", line 109, in <module>
    main()
  File "/Users/siwei/Desktop/SfmLearner-Pytorch-master/data/prepare_train_data.py", line 86, in main
    Parallel(n_jobs=args.num_threads)(delayed(dump_example)(scene) for scene in tqdm(data_loader.scenes))
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 930, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
AttributeError: 'Namespace' object has no attribute 'dump_root'

Do you have any idea why? The error didn't happen when I used only one thread.

Blank output after training on KITTI

I used the following command to train on KITTI

python3 train.py ./data/kitti/kitti_rawdata_formatted/ -b4 -m0.2 -s0.1 --epoch-size 3000 --sequence-length 3 --log-output

However, I get a blank image when I run the inference (attached).
0000000000_disp

Even while training, the dispnet output and the depth outputs (seen on tensorboard) are blank images.

Please help.

inconsistent depth evaluation code with the original author

Hi, @ClementPinard, great PyTorch implementation of SfMLearner!

But when I evaluate the same depth prediction results using your evaluation code test_disp.py and the original author's eval_depth.py, I get inconsistent results.

After some experiments, I found that you clip the predicted depth values to [min_depth, max_depth] first, then compute the scale factor, as in loss_functions.py:

valid_pred = current_pred[valid].clamp(1e-3, 80)

valid_pred = valid_pred * torch.median(valid_gt)/torch.median(valid_pred)

However, the original author computed the scale factor first, then clipped the predicted depth, as in eval_depth.py:

scalor = np.median(gt_depth[mask])/np.median(pred_depth[mask])
pred_depth[mask] *= scalor

pred_depth[pred_depth < args.min_depth] = args.min_depth
pred_depth[pred_depth > args.max_depth] = args.max_depth

This does produce different results. By simply modifying your original code to

valid_pred = current_pred[valid]
valid_pred = valid_pred * torch.median(valid_gt) / torch.median(valid_pred)
valid_pred = valid_pred.clamp(1e-3, 80)

we can get the same results as the original author when evaluating the same depth predictions. Also, this inconsistency exists in test_disp.py.

Hope you can fix it, thanks.

the number of samples

Hi,
Thanks for your implementation in PyTorch; it helps me a lot. Recently I have been learning ego-motion with your code. The data preparation process is quite confusing for me, and I'm quite unsure whether I have made some errors. Therefore, can you tell me the final number of samples in KITTI and Cityscapes?
For me, I got
KITTI:
43154 samples found in 64 train scenes
1090 samples found in 8 valid scenes

Error during test

I get the following error when trying to run the test:

Traceback (most recent call last):
  File "test_disp.py", line 144, in <module>
    main()
  File "test_disp.py", line 50, in main
    pose_net.load_state_dict(weights['state_dict'], strict=False)
TypeError: load_state_dict() got an unexpected keyword argument 'strict'

Please help.

what are those "@" symbol?

Hi, thanks for the wonderful code. I haven't started to play with it yet, but I saw some @ symbols, for example:

rot_matrices @ inv_transform_matrices[:,:,-1:]

R = gt_pose[:,:3] @ np.linalg.inv(pred_pose[:,:3])

I don't really know what these operations (@) are trying to do. Thank you if you could clarify.

interpolate() unexpected keyword argument

Got the following error message when using the latest version:

tgt_img_scaled = F.interpolate(tgt_img, (h, w), method='area', align_corners=False)
TypeError: interpolate() got an unexpected keyword argument 'method'

(line 312)

parameters used: --epoch-size 1000 --log-output --with-gt

Changed that line to:

tgt_img_scaled = F.interpolate(tgt_img, (h, w), mode='area')

It seems to be working now, did I break something? (Great job on the port btw! :) )

Thanks!
