MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

Official implementation of the paper 'MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection'.

For our multi-view version on the nuScenes dataset, please refer to MonoDETR-MV.

Introduction

MonoDETR is the first DETR-based model for monocular 3D detection that requires no additional depth supervision, anchors, or NMS, and it achieves leading performance on the KITTI val and test sets. We make the vanilla transformer in DETR depth-aware and let depth guide the whole detection process. In this way, each object estimates its 3D attributes adaptively from the depth-informative regions of the image, rather than being limited to features around its center.

Main Results

Training randomness in monocular detection can cause a variance of about ±1 AP3D. For reproducibility, we provide four training logs of MonoDETR on the KITTI val set for the car category (the stable version is still being tuned).

We have released the ckpts of our implementation for reproducibility. The module names might have some mismatches, which will be rectified in a few days.

Models      Val, AP3D|R40                 Logs    Ckpts
            Easy      Mod.      Hard
MonoDETR    28.84%    20.61%    16.38%    log     ckpt
            26.66%    20.14%    16.88%    log     ckpt
            29.53%    20.13%    16.57%    log     ckpt
            27.11%    20.08%    16.18%    log     ckpt
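
To put the run-to-run variance in numbers, the four validation runs listed above can be summarized per difficulty level. The following is a minimal sketch, not part of the repository: the names `RUNS` and `summarize` are illustrative, and the values are copied by hand from the table rather than parsed from the logs.

```python
from statistics import mean, pstdev

# AP3D|R40 (%) on KITTI val, car category, copied from the four runs above.
RUNS = {
    "Easy": [28.84, 26.66, 29.53, 27.11],
    "Mod.": [20.61, 20.14, 20.13, 20.08],
    "Hard": [16.38, 16.88, 16.57, 16.18],
}


def summarize(runs):
    """Return {level: (mean, population std, max - min)} for each difficulty."""
    return {
        level: (mean(vals), pstdev(vals), max(vals) - min(vals))
        for level, vals in runs.items()
    }


if __name__ == "__main__":
    for level, (avg, std, spread) in summarize(RUNS).items():
        print(f"{level:5s} mean={avg:.2f} std={std:.2f} range={spread:.2f}")
```

On these numbers the Easy scores span 2.87 points (26.66% to 29.53%), while the Mod. scores span only 0.53 points, so most of the variance shows up at the Easy level.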

MonoDETR on the test set from the official KITTI benchmark for the car category:

Models      Test, AP3D|R40
            Easy      Mod.      Hard
MonoDETR    24.52%    16.26%    13.93%
            25.00%    16.47%    13.58%

Installation

  1. Clone this project and create a conda environment:

    git clone https://github.com/ZrrSkywalker/MonoDETR.git
    cd MonoDETR
    
    conda create -n monodetr python=3.8
    conda activate monodetr
    
  2. Install pytorch and torchvision matching your CUDA version:

    conda install pytorch torchvision cudatoolkit
    
  3. Install requirements and compile the deformable attention:

    pip install -r requirements.txt
    
    cd lib/models/monodetr/ops/
    bash make.sh
    
    cd ../../../..
    
  4. Create a directory for saving the training logs:

    mkdir logs
    
  5. Download the KITTI dataset and arrange the directory structure as follows:

    │MonoDETR/
    ├──...
    ├──data/KITTIDataset/
    │   ├──ImageSets/
    │   ├──training/
    │   ├──testing/
    ├──...
    

    You can also change the data path at "dataset/root_dir" in configs/monodetr.yaml.
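
Before training, it can help to confirm that the dataset layout from step 5 is in place. The sketch below is not part of the repository: `missing_kitti_dirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the check only covers the three sub-directories shown in the layout above, not the files inside them.

```python
from pathlib import Path

# Sub-directories shown in the layout above.
EXPECTED_SUBDIRS = ("ImageSets", "training", "testing")


def missing_kitti_dirs(root="data/KITTIDataset"):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_kitti_dirs()
    if missing:
        print("Missing under data/KITTIDataset:", ", ".join(missing))
    else:
        print("KITTI directory layout looks complete.")
```

Run it from the MonoDETR root directory. If you changed the data path in configs/monodetr.yaml, pass that path to `missing_kitti_dirs` instead of the default.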

Get Started

Train

You can modify the model and training settings in configs/monodetr.yaml and specify the GPU in train.sh:

    bash train.sh configs/monodetr.yaml > logs/monodetr.log

Test

The best checkpoint is evaluated by default. You can change it at "tester/checkpoint" in configs/monodetr.yaml:

    bash test.sh configs/monodetr.yaml
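
Settings such as "dataset/root_dir" and "tester/checkpoint" are written as slash-separated paths. Reading them as nested keys in the YAML file is an assumption here, and the sketch below only illustrates that reading on a plain dictionary. `set_by_path` and the toy `cfg` are hypothetical and do not come from the repository; in practice you would simply edit configs/monodetr.yaml by hand.

```python
import copy


def set_by_path(config, path, value, sep="/"):
    """Return a copy of `config` with the nested key at `path` set to `value`."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = path.split(sep)
    for key in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


if __name__ == "__main__":
    # Hypothetical stand-in for the parsed contents of configs/monodetr.yaml.
    cfg = {"dataset": {"root_dir": "data/KITTIDataset"}, "tester": {}}
    cfg = set_by_path(cfg, "dataset/root_dir", "/path/to/KITTIDataset")
    print(cfg["dataset"]["root_dir"])
```

The helper returns a modified copy rather than mutating its input, so the original settings remain available for comparison.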

Acknowledgment

This repo benefits from the excellent Deformable-DETR and MonoDLE.

Citation

@article{zhang2022monodetr,
  title={MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection},
  author={Zhang, Renrui and Qiu, Han and Wang, Tai and Xu, Xuanzhuo and Guo, Ziyu and Qiao, Yu and Gao, Peng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2203.13310},
  year={2022}
}

Contact

If you have any questions about this project, please feel free to contact [email protected].
