
BRNet: Exploring Comprehensive Features for Monocular Depth Estimation

This is the reference PyTorch implementation for training and testing depth estimation models using the method described in BRNet: Exploring Comprehensive Features for Monocular Depth Estimation, accepted at ECCV 2022:

@inproceedings{han2022brnet,
  title={BRNet: Exploring Comprehensive Features for Monocular Depth Estimation},
  author={Han, Wencheng and Yin, Junbo and Jin, Xiaogang and Dai, Xiangdong and Shen, Jianbing},
  booktitle={European Conference on Computer Vision},
  pages={586--602},
  year={2022},
  organization={Springer}
}

[Example input and output GIF]

This code is for non-commercial use; please see the license file for terms.

⚙️ Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=0.4.1 torchvision=0.2.1 -c pytorch
pip install tensorboardX==1.4
conda install opencv=3.3.1   # just needed for evaluation

We ran our experiments with PyTorch 0.4.1, CUDA 9.1, Python 3.6.6 and Ubuntu 18.04. We have also successfully trained models with PyTorch 1.0, and our code is compatible with Python 2.7. You may have issues installing OpenCV 3.3.1 if you use Python 3.7; we recommend creating a virtual environment with Python 3.6.6: conda create -n brnet python=3.6.6 anaconda. Note that resnet_encoder.py uses nn.MultiheadAttention, which is only available in torch.nn from PyTorch 1.1 onwards, so the pinned versions above may not work as written (see the issues section below).
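As a quick sanity check of your environment (a minimal sketch, not part of the original instructions), you can confirm that nn.MultiheadAttention is available:

import torch
import torch.nn as nn

# resnet_encoder.py relies on nn.MultiheadAttention, which only exists in
# PyTorch >= 1.1, so this assert will fail on the 0.4.1 environment above.
print(torch.__version__)
assert hasattr(nn, "MultiheadAttention"), "PyTorch >= 1.1 is required"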

🖼️ Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192

or, if you are using a stereo-trained model, you can estimate metric depth with

python test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192 --pred_metric_depth
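For reference, the network's sigmoid disparity output is converted to depth following the usual monodepth2-style convention; a minimal sketch is below. The 0.1/100 depth range is the common KITTI default and an assumption here, so check disp_to_depth in layers.py for the exact values.

# Minimal sketch of the monodepth2-style disparity-to-depth conversion
# (assumed here; see layers.py for the authoritative version).
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp  # disp is in [0, 1]
    depth = 1.0 / scaled_disp
    return scaled_disp, depth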

💾 KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..

Warning: it weighs about 175GB, so make sure you have enough space to unzip too!

Note: Unlike our baseline, we do not convert the PNG images to JPEG, because we found that with our architecture the information lost in the conversion noticeably degrades model performance.

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
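For example, to train on the odometry split (the model name here is a placeholder; the flags match the odometry evaluation section below):

python train.py --model_name mono_odom_model --split odom --dataset kitti_odom --data_path /path/to/kitti/odometry/dataset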

Custom dataset

You can train on a custom monocular or stereo dataset by writing a new dataloader class which inherits from MonoDataset – see the KITTIDataset class in datasets/kitti_dataset.py for an example.
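A minimal sketch of such a subclass is shown below, modeled on the KITTIDataset example. The intrinsics, image resolution, and file naming are placeholders you would replace for your own data.

# Minimal sketch of a custom dataset, modeled on datasets/kitti_dataset.py.
# The intrinsics, resolution and path layout below are placeholders.
import os
import numpy as np
import PIL.Image as pil

from .mono_dataset import MonoDataset  # assumes this file lives in datasets/

class MyDataset(MonoDataset):
    def __init__(self, *args, **kwargs):
        super(MyDataset, self).__init__(*args, **kwargs)
        # Normalized camera intrinsics (fx, cx divided by width; fy, cy by height).
        self.K = np.array([[0.58, 0, 0.5, 0],
                           [0, 1.92, 0.5, 0],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=np.float32)
        self.full_res_shape = (1242, 375)  # (width, height) of the raw images

    def check_depth(self):
        # Return True only if ground-truth depth is available for this dataset.
        return False

    def get_color(self, folder, frame_index, side, do_flip):
        # self.loader is the PIL image loader provided by MonoDataset.
        path = os.path.join(self.data_path, folder, "{:010d}.png".format(frame_index))
        color = self.loader(path)
        if do_flip:
            color = color.transpose(pil.FLIP_LEFT_RIGHT)
        return color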

⏳ Training

By default models and tensorboard event files are saved to ~/tmp/<model_name>. This can be changed with the --log_dir flag.
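For example (the path is a placeholder):

python train.py --model_name mono_model --log_dir /path/to/logs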

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set – see paper for details.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo

GPUs

The code can only be run on a single GPU. You can specify which GPU to use with the CUDA_VISIBLE_DEVICES environment variable:

CUDA_VISIBLE_DEVICES=2 python train.py --model_name mono_model

All our experiments were performed on a single NVIDIA Titan Xp.

Training modality | Approximate GPU memory | Approximate training time
Mono              | 9 GB                   | 12 hours
Stereo            | 6 GB                   | 8 hours
Mono + Stereo     | 11 GB                  | 15 hours

💽 Finetuning a pretrained model

Add the following to the training command to load an existing model for finetuning:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19

🔧 Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

📊 KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the epoch 19 weights of a model named mono_model:

python evaluate_depth.py --load_weights_folder ~/tmp/mono_model/models/weights_19/ --eval_mono

For stereo models, you must use the --eval_stereo flag (see note below):

python evaluate_depth.py --load_weights_folder ~/tmp/stereo_model/models/weights_19/ --eval_stereo

If you train your own model with our code, you are likely to see slight differences from the published results due to randomness in weight initialization and data loading.

An additional parameter --eval_split can be set. Its three possible values are explained here:

--eval_split    | Test set size | For models trained with...                         | Description
eigen           | 697           | --split eigen_zhou (default) or --split eigen_full | The standard Eigen test files
eigen_benchmark | 652           | --split eigen_zhou (default) or --split eigen_full | Evaluate with the improved ground truth from the new KITTI depth benchmark
benchmark       | 500           | --split benchmark                                  | The new KITTI depth benchmark test files

Because no ground truth is available for the new KITTI depth benchmark, no scores will be reported when --eval_split benchmark is set. Instead, a set of .png images will be saved to disk ready for upload to the evaluation server.

External disparities evaluation

Finally, you can also use evaluate_depth.py to evaluate raw disparities (or inverse depth) from other methods by using the --ext_disp_to_eval flag:

python evaluate_depth.py --ext_disp_to_eval ~/other_method_disp.npy
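The file is expected to hold one disparity map per test image as a single NumPy array; a minimal sketch of producing it is below. The (N, H, W) float32 layout follows the monodepth2-style evaluation this code builds on, so verify it against evaluate_depth.py; disparity_maps is a hypothetical list of your method's outputs.

import os
import numpy as np

# Sketch: stack your method's H x W disparity (or inverse-depth) arrays,
# one per test image, into a single (N, H, W) array.
pred_disps = np.stack(disparity_maps).astype(np.float32)
np.save(os.path.expanduser("~/other_method_disp.npy"), pred_disps)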

📷📷 Note on stereo evaluation

Our stereo models are trained with an effective baseline of 0.1 units, while the actual KITTI stereo rig has a baseline of 0.54m. This means a scaling of 5.4 must be applied for evaluation. In addition, for models trained with stereo supervision we disable median scaling. Setting the --eval_stereo flag when evaluating will automatically disable median scaling and scale predicted depths by 5.4.
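In code, the two evaluation modes amount to the following sketch; pred_depth and gt_depth are hypothetical per-image depth maps used for illustration.

import numpy as np

STEREO_SCALE_FACTOR = 5.4  # real 0.54 m baseline / 0.1 unit training baseline

# --eval_stereo: fixed metric scaling, median scaling disabled.
pred_depth_metric = STEREO_SCALE_FACTOR * pred_depth

# --eval_mono: per-image median scaling instead of a fixed factor.
ratio = np.median(gt_depth) / np.median(pred_depth)
pred_depth_scaled = ratio * pred_depth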

⤴️⤵️ Odometry evaluation

We include code for evaluating poses predicted by models trained with --split odom --dataset kitti_odom --data_path /path/to/kitti/odometry/dataset.

For this evaluation, the KITTI odometry dataset (color, 65GB) and ground truth poses zip files must be downloaded. As above, we keep the images in PNG format and do not convert them to JPEG.

If this data has been unzipped to folder kitti_odom, a model can be evaluated with:

python evaluate_pose.py --eval_split odom_9 --load_weights_folder ./odom_split.M/models/weights_29 --data_path kitti_odom/
python evaluate_pose.py --eval_split odom_10 --load_weights_folder ./odom_split.M/models/weights_29 --data_path kitti_odom/

👩‍⚖️ License

Copyright © Niantic, Inc. 2019. Patent Pending. All rights reserved. Please see the license file for terms.


brnet's Issues

Need Requirements Details

Could you please provide more environment details? I see that nn.MultiheadAttention is used in resnet_encoder.py, but torch.nn has no MultiheadAttention in PyTorch versions older than 1.1. It seems the ⚙️ Setup section was simply copied from monodepth2. 😂

RuntimeError: The size of tensor a (639) must match the size of tensor b (319) at non-singleton dimension 3

Hello, thank you for your excellent work. The code always reports a tensor size mismatch when running monocular training. Could you suggest some solutions? Thank you very much.
The error message is as follows:
Traceback (most recent call last):
File "train.py", line 18, in
trainer.train()
File "/home/yzhang/BRNet/trainer.py", line 189, in train
self.run_epoch()
File "/home/yzhang/BRNet/trainer.py", line 205, in run_epoch
outputs, losses = self.process_batch(inputs)
File "/home/yzhang/BRNet/trainer.py", line 260, in process_batch
losses = self.compute_losses(inputs, outputs)
File "/home/yzhang/BRNet/trainer.py", line 492, in compute_losses
smooth_loss = get_smooth_loss(norm_disp, color)
File "/home/yzhang/BRNet/layers.py", line 214, in get_smooth_loss
grad_disp_x *= torch.exp(-grad_img_x)
RuntimeError: The size of tensor a (639) must match the size of tensor b (319) at non-singleton dimension 3

Need some help on test_simple please

Thanks for your great work. I wanted to test the code by running test_simple, but I got the following error.
-> Loading model from models/mono+stereo_640x192
Loading pretrained encoder
Traceback (most recent call last):
File "test_simple.py", line 171, in
test_simple(args)
File "test_simple.py", line 88, in test_simple
encoder.load_state_dict(filtered_dict_enc)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetEncoder:
Missing key(s) in state_dict: "mha_scale", "global_encoder.layer1.0.conv1.weight", "global_encoder.layer1.0.bn1.weight", "global_encoder.layer1.0.bn1.bias", "global_encoder.layer1.0.bn1.running_mean", "global_encoder.layer1.0.bn1.running_var", "global_encoder.layer1.0.conv2.weight", "global_encoder.layer1.0.bn2.weight", "global_encoder.layer1.0.bn2.bias", "global_encoder.layer1.0.bn2.running_mean", "global_encoder.layer1.0.bn2.running_var", "global_encoder.layer1.1.conv1.weight", "global_encoder.layer1.1.bn1.weight", "global_encoder.layer1.1.bn1.bias", "global_encoder.layer1.1.bn1.running_mean", "global_encoder.layer1.1.bn1.running_var", "global_encoder.layer1.1.conv2.weight", "global_encoder.layer1.1.bn2.weight", "global_encoder.layer1.1.bn2.bias", "global_encoder.layer1.1.bn2.running_mean", "global_encoder.layer1.1.bn2.running_var", "global_encoder.layer2.0.conv1.weight", "global_encoder.layer2.0.bn1.weight", "global_encoder.layer2.0.bn1.bias", "global_encoder.layer2.0.bn1.running_mean", "global_encoder.layer2.0.bn1.running_var", "global_encoder.layer2.0.conv2.weight", "global_encoder.layer2.0.bn2.weight", "global_encoder.layer2.0.bn2.bias", "global_encoder.layer2.0.bn2.running_mean", "global_encoder.layer2.0.bn2.running_var", "global_encoder.layer2.1.conv1.weight", "global_encoder.layer2.1.bn1.weight", "global_encoder.layer2.1.bn1.bias", "global_encoder.layer2.1.bn1.running_mean", "global_encoder.layer2.1.bn1.running_var", "global_encoder.layer2.1.conv2.weight", "global_encoder.layer2.1.bn2.weight", "global_encoder.layer2.1.bn2.bias", "global_encoder.layer2.1.bn2.running_mean", "global_encoder.layer2.1.bn2.running_var", "mha_weight.0.weight", "mha_weight.0.bias", "multihead_attn.in_proj_weight", "multihead_attn.in_proj_bias", "multihead_attn.out_proj.weight", "multihead_attn.out_proj.bias", "pos_edb.row_embed.weight", "pos_edb.col_embed.weight".

What should I do to make it work? Thanks again!
