nicholasli1995 / egonet

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

Home Page: https://arxiv.org/abs/2011.08464

License: MIT License

Python 76.23% C++ 23.77%

3d-pose-estimation autonomous-driving orientation-estimation kitti-3d 3d-object-detection cvpr2021 pose-estimation vehicles robot-perception representation-learning

egonet's Introduction

EgoNet

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo includes an implementation that performs vehicle orientation estimation on the KITTI dataset from a single RGB image.

News:

(2022-??-??): v-1.1 will be released, which includes pre-trained models for other object classes (Pedestrian and Cyclist in KITTI).

(2021-08-16): v-1.0 is released. The training documentation is added.

(2021-06-21): v-0.9 (beta version) is released. The inference utility is here! For Q&A, go to discussions. If you believe there is a technical problem, submit an issue.

(2021-06-16): This repo is under final code cleaning and documentation preparation. Stay tuned and come back in a week!

Check our 5-min video (YouTube, iQIYI) for an introduction.

Detailed explanation in Chinese: Bilibili

Run a demo with a one-line command!

Check instructions here.

Performance: AP^BEV@R₄₀ on KITTI val set for Car (monocular RGB)

The validation results in the paper were based on R₁₁; the results using R₄₀ are attached here.

Method	Reference	Easy	Moderate	Hard
M3D-RPN	ICCV 2019	20.85	15.62	11.88
MonoDIS	ICCV 2019	18.45	12.58	10.66
MonoPair	CVPR 2020	24.12	18.17	15.76
D4LCN	CVPR 2020	31.53	22.58	17.87
Kinematic3D	ECCV 2020	27.83	19.72	15.10
GrooMeD-NMS	CVPR 2021	27.38	19.75	15.92
MonoDLE	CVPR 2021	24.97	19.33	17.01
Ours (@R₁₁)	CVPR 2021	33.60	25.38	22.80
Ours (@R₄₀)	CVPR 2021	34.31	24.80	20.16
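
For readers unfamiliar with the two metrics in the table above, the following is a minimal sketch of how R₁₁ and R₄₀ differ. It assumes the usual interpolated average precision: R₁₁ averages over 11 recall positions (0, 0.1, ..., 1), while R₄₀ averages over 40 positions (1/40, 2/40, ..., 1) and skips recall 0. The function name `interpolated_ap` and the toy precision/recall values are illustrative only; the official KITTI evaluation code is more involved than this.

```python
def interpolated_ap(recalls, precisions, num_points):
    """Average the interpolated precision over fixed recall positions.

    num_points=11 gives the R11 variant, num_points=40 gives R40.
    """
    if num_points == 11:
        positions = [i / 10 for i in range(11)]      # 0, 0.1, ..., 1
    elif num_points == 40:
        positions = [i / 40 for i in range(1, 41)]   # 1/40, ..., 1 (no recall 0)
    else:
        raise ValueError("num_points must be 11 or 40")

    total = 0.0
    for r in positions:
        # Interpolated precision: best precision at any recall >= r.
        reachable = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(reachable, default=0.0)
    return total / len(positions)


# Toy precision/recall curve (illustrative values, not from the paper).
recalls = [0.5, 1.0]
precisions = [1.0, 0.5]
ap_r11 = interpolated_ap(recalls, precisions, 11)
ap_r40 = interpolated_ap(recalls, precisions, 40)
print(f"AP@R11 = {ap_r11:.4f}, AP@R40 = {ap_r40:.4f}")
```

The same curve yields different numbers under the two protocols, which is why the R₁₁ and R₄₀ rows are not directly comparable.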

Performance: AOS@R₄₀ on KITTI test set for Car (RGB)

Method	Reference	Configuration	Easy	Moderate	Hard
M3D-RPN	ICCV 2019	Monocular	88.38	82.81	67.08
DSGN	CVPR 2020	Stereo	95.42	86.03	78.27
Disp-RCNN	CVPR 2020	Stereo	93.02	81.70	67.16
MonoPair	CVPR 2020	Monocular	91.65	86.11	76.45
D4LCN	CVPR 2020	Monocular	90.01	82.08	63.98
Kinematic3D	ECCV 2020	Monocular	58.33	45.50	34.81
MonoDLE	CVPR 2021	Monocular	93.46	90.23	80.11
Ours	CVPR 2021	Monocular	96.11	91.23	80.96

Inference/Deployment

Check instructions here to reproduce the above quantitative results.

Training

Check instructions here to train Ego-Net and learn how to prepare your own training dataset other than KITTI.

Citation

Please star this repository and cite the following paper in your publications if it helps your research:

@InProceedings{Li_2021_CVPR,
author    = {Li, Shichao and Yan, Zengqiang and Li, Hongyang and Cheng, Kwang-Ting},
title     = {Exploring intermediate representation for monocular vehicle pose estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month     = {June},
year      = {2021},
pages     = {1873-1883}
}

License

This repository can be used freely for non-commercial purposes. Contact me if you are interested in a commercial license.

Links

Link to the paper: Exploring intermediate representation for monocular vehicle pose estimation

Link to the presentation video: YouTube, iQIYI

Relevant ECCV 2020 work: GSNet

egonet's Issues

Generate 3D rectangular coordinates using 2D rectangular boxes

Thank you for your work and for open-sourcing the code.
I would like to ask how to use your code: given only images with labeled 2D bounding boxes, can it generate 3D bounding boxes for use in detection tasks?
I also have a problem: is the following file necessary? I tried to modify it and got an error, and I would like to know what it does.
{{{ Download the resources folder and unzip its contents. Place the resource folder at ${EgoNet_DIR}/resources }}}
My English is machine-translated, so please forgive any improper wording.
Thank you.

Use EgoNet on custom data

Thank you for sharing your work.
I want to use EgoNet for 3D bbox and object orientation estimation on custom data. How should I proceed? Do I need another model's 2D/3D bbox predictions to start with, or can I get both 3D bbox and orientation predictions simply by changing the input data at test time?

Reproduce Result in Kitti dataset

Hello @Nicholasli1995, thank you for your implementation. I have several questions.

Which train/validation split did you use for this code? Is it the subcnn split?
During inference on the testing set, do you use ROIs from an object detection algorithm?
Does it still use the calibration files, or can the system generate the calibration automatically?
How can I produce a table similar to the one in the paper? I see that you report AOS and AP for the Easy, Moderate and Hard settings. However, when I ran the validation set evaluation with kitti_eval_offline, I only got some plots, statistics in txt files, and the final result shown here.

python inference.py --cfg "../configs/try_KITTI_inference:test_submission.yml"

Wrote prediction file at ../result/gt_box_test/data/007025.txt
Warning: 007025.png not included in detected images!
PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
==> 1 page written on `car_detection.pdf'.
PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
==> 1 page written on `car_orientation.pdf'.
PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
==> 1 page written on `car_detection_ground.pdf'.
PDFCROP 1.38, 2012/11/02 - Copyright (c) 2002-2012 by Heiko Oberdiek.
==> 1 page written on `car_detection_3d.pdf'.
Thank you for participating in our evaluation!
Loading detections...
number of files for evaluation: 1634
  done.
save ../result/submission/plot/car_detection.txt
car_detection AP: 93.934082 84.951851 67.594185
save ../result/submission/plot/car_orientation.txt
car_orientation AP: 93.664360 84.683617 67.141930
save ../result/submission/plot/car_detection_ground.txt
car_detection_ground AP: 41.456226 31.880611 24.876181
save ../result/submission/plot/car_detection_3d.txt
car_detection_3d AP: 29.933027 24.017235 18.650723
Your evaluation results are available at:
../result/submission

It only produces AP, not AOS. Is that because you used the testing set and evaluated directly on the KITTI evaluation server, which provides the detailed results? Suppose we want to evaluate on the validation set: which value should I use? There are several outputs, namely car_detection AP, car_orientation AP, car_detection_ground AP and car_detection_3d AP. Specifically, how can I produce Table 1 of the paper using the validation set?

-Thank you-
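
For reference, the AP lines in the log above can be collected into a small table with a sketch like the one below. The function name `parse_kitti_eval_log` is hypothetical and not part of this repository. Reading the three numbers as Easy/Moderate/Hard, and treating the `car_orientation` row as the AOS metric, are assumptions based on how the KITTI devkit usually reports results; they are not confirmed by the log itself.

```python
import re

# Matches lines such as "car_detection AP: 93.934082 84.951851 67.594185".
_AP_LINE = re.compile(r"^(\w+) AP: ([\d.]+) ([\d.]+) ([\d.]+)\s*$")


def parse_kitti_eval_log(text):
    """Return {metric_name: {"Easy": ..., "Moderate": ..., "Hard": ...}}."""
    results = {}
    for line in text.splitlines():
        match = _AP_LINE.match(line.strip())
        if match:
            values = [float(match.group(i)) for i in (2, 3, 4)]
            results[match.group(1)] = dict(zip(("Easy", "Moderate", "Hard"), values))
    return results


# Excerpt copied from the log in this issue.
LOG = """\
save ../result/submission/plot/car_detection.txt
car_detection AP: 93.934082 84.951851 67.594185
save ../result/submission/plot/car_orientation.txt
car_orientation AP: 93.664360 84.683617 67.141930
save ../result/submission/plot/car_detection_ground.txt
car_detection_ground AP: 41.456226 31.880611 24.876181
save ../result/submission/plot/car_detection_3d.txt
car_detection_3d AP: 29.933027 24.017235 18.650723
"""

table = parse_kitti_eval_log(LOG)
for name, row in table.items():
    print(f"{name}\t{row['Easy']:.2f}\t{row['Moderate']:.2f}\t{row['Hard']:.2f}")
```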

Cannot use my predicted files to run the inference.py

I replaced the load_prediction_file entry in test_submission.yml with my own prediction file, which was generated by YOLOv8 and is in the KITTI format. But when I run python inference.py --cfg "../configs/KITTI_inference:test_submission.yml" I receive an error like this:
'''
Traceback (most recent call last):
  File "inference.py", line 288, in <module>
    main()
  File "inference.py", line 278, in main
    generate_empty_file(output_dir, test_calib_dir)
  File "inference.py", line 207, in generate_empty_file
    detected = os.listdir(os.path.join(output_dir, 'data'))
FileNotFoundError: [Errno 2] No such file or directory: '../MyOutput/submission/data'
'''

My folder structure is the same as the provided one, and the .txt files have the same names too.
I don't know what causes this error; please give me some advice!
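
The traceback shows that os.listdir is called on a 'data' directory that does not exist under the output directory. A minimal sketch of a defensive variant follows, assuming it is acceptable to create the directory when it is missing. The helper name `list_detected_files` is hypothetical, and this only avoids the crash: it does not explain why no predictions were written to that directory in the first place.

```python
import os
import tempfile


def list_detected_files(output_dir):
    """List prediction files under <output_dir>/data, creating it if absent."""
    data_dir = os.path.join(output_dir, "data")
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(data_dir, exist_ok=True)
    return sorted(os.listdir(data_dir))


# Demonstration in a temporary location so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as tmp:
    output_dir = os.path.join(tmp, "MyOutput", "submission")
    print(list_detected_files(output_dir))
```

If the returned list is empty after inference, the more likely problem is upstream, for example the prediction file not being picked up by the configuration.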

Visualization problem

Thank you for sharing your great work! Can the output coordinates of the model be visualized directly, or is a transformation needed to display them in the original image?

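How to run inference on my own images with this model

Hello, and thank you for open-sourcing your code.
My coding skills are limited, so I would like to ask a simple question: how can I use the trained parameters to run inference on local images?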

Training Script

Thanks for your awesome work!
Could you provide some hints on how to train EgoNet?
Thank you in advance.

Stream from camera

Hello, thanks for your amazing work.

I would like to ask what changes I need to make in order to process a camera stream and detect the pose of objects, e.g. pedestrians.
Do I need to re-train your model as well?

Thanks in advance for your answer.

Reproduce results on the test split

Thanks for your great work.
However, I am confused about "Reproduce results on the test split". If I understand correctly, the inputs to the model are the 2D bounding boxes located in "../resources/test_boxes", and the outputs should be 3D bounding boxes placed in "../output/submission/data". However, the results I got were not 3D bounding boxes, as follows:

The results do not contain 3D information. Is it because I made a mistake somewhere?

About the license for this model

Thank you for sharing your great code. 😺

What is the license for this model? I'd like to cite it to the repository I'm working on if possible, but I want to post the license correctly.
https://github.com/PINTO0309/PINTO_model_zoo

Thank you.

Running the model on a single image

Hey Nicholasli1995,

Thanks for putting together this awesome repo, I really appreciate how thoroughly documented the setup is.

Would it be possible to get a few hints as to how to run the model on a single image as opposed to a directory?

I see that the data_loader in 'inference' is nearly directly from PyTorch, so it's probably my lack of experience with PyTorch that has me confused.

Separately, I also tried making my own directory and referencing it in my .yaml file, but the following command still gave me an error about referencing the KITTI dataset:

python .\inference.py --cfg "../configs/single_inference_test.yml" --visualize True --batch_to_show 1

I was a little bit surprised to see "training" in the file not found path as I'm not asking to train -- I would expect to have to place an analogous 'test.txt' in this directory with the single test image filename, but I don't think that's currently the issue.

Thanks!
Clayton

Inference on custom dataset does not give proper 3d bboxes

Hi! I am trying to use the EgoNet on a custom dataset (such as Waymo Dataset) and generate 3D bboxes from the 2D predictions generated by 2D bbox detectors such as Faster R-CNN.

A sample of KITTI-format predictions, in which the 2D bboxes are filled in but the 3D bboxes are still placeholders, is as follows:

Car -1 -1 0.0 158.0 386.0 220.0 438.0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 0.95
Car -1 -1 0.0 565.0 386.0 720.0 438.0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 0.95

However, when I set these 2D predictions as additional input for predicting bboxes using EgoNet, the 3D bboxes are not properly output at all and are just a bunch of zeros and minus-ones.

What should I do?
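
To make the sample lines above easier to read, here is a minimal sketch that splits a KITTI-format result line into named fields. The field order follows the KITTI object label convention (type, truncated, occluded, alpha, 2D bbox, 3D dimensions, 3D location, rotation_y, optional score). The names `parse_kitti_line` and `has_3d_box` are hypothetical, and treating all-zero dimensions as "no 3D box yet" is a heuristic assumed here, not something the repository defines.

```python
def parse_kitti_line(line):
    """Split one KITTI-format label/result line into named fields."""
    fields = line.split()
    if len(fields) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(fields)}")
    return {
        "type": fields[0],
        "truncated": float(fields[1]),
        "occluded": int(float(fields[2])),
        "alpha": float(fields[3]),
        "bbox": [float(v) for v in fields[4:8]],         # left, top, right, bottom
        "dimensions": [float(v) for v in fields[8:11]],  # height, width, length
        "location": [float(v) for v in fields[11:14]],   # x, y, z in camera coordinates
        "rotation_y": float(fields[14]),
        "score": float(fields[15]) if len(fields) == 16 else None,
    }


def has_3d_box(obj):
    """Heuristic (assumption): a real 3D box has at least one positive dimension."""
    return any(d > 0 for d in obj["dimensions"])


sample = "Car -1 -1 0.0 158.0 386.0 220.0 438.0 0.0 0.0 0.0 -1.0 -1.0 -1.0 0.0 0.95"
obj = parse_kitti_line(sample)
print(obj["bbox"], obj["score"], has_3d_box(obj))
```

A check like this can be run on the output files to tell whether the 3D fields were ever overwritten or are still the placeholder values from the 2D detector.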

Result Image

Hi,

May I know how to generate a visual result image such as,

All I got is,

No arg_max in KITTI_train_IGRs.yml file

Hello, first I would like to thank you for this amazing work and for making it public with great documentation, so that the network can be reproduced and re-trained.

Today I wanted to train your network according to training - stage2. However, running the code gave me the following error:

$ python train_IGRs.py --cfg "../configs/KITTI_train_IGRs.yml"

=> init weights from normal distribution
=> loading pretrained model ../resources/start_point.pth

Total Parameters: 63,978,471
----------------------------------------------------------------------------------------------------------------------------------
Total Multiply Adds (For Convolution and Linear Layers only): 19.573845863342285 GFLOPs
----------------------------------------------------------------------------------------------------------------------------------
Number of Layers
Conv2d : 306 layers   BatchNorm2d : 304 layers   ReLU : 269 layers   Bottleneck : 4 layers   BasicBlock : 108 layers   Upsample : 28 layers   HighResolutionModule : 8 layers   Sigmoid : 1 layers   
Initializing KITTI train set, please wait...
Found prepared keypoints at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_train_['Car'].npy
Found prepared instance_ids at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_train_['Car']_ids.npy
Found prepared rotations at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_train_['Car']_rots.npy
Initialization finished for KITTI train set
Initializing KITTI valid set, please wait...
Found prepared keypoints at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_valid_['Car'].npy
Found prepared instance_ids at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_valid_['Car']_ids.npy
Found prepared rotations at ../kitti_dataset/training/keypoints/bbox9[0.332, 0.667]_valid_['Car']_rots.npy
Initialization finished for KITTI valid set
Traceback (most recent call last):
  File "train_IGRs.py", line 159, in <module>
    main()
  File "train_IGRs.py", line 154, in main
    train(model, model_settings, GPUs, cfgs, logger, final_output_dir)
  File "train_IGRs.py", line 89, in train
    trainer.train(train_dataset=train_dataset, 
  File "../libs/trainer/trainer.py", line 154, in train
    evaluator = Evaluator(cfgs['training_settings']['eval_metrics'], 
  File "../libs/metric/criterions.py", line 546, in __init__
    self.metrics.append(eval(metric + '(cfgs=cfgs, num_joints=num_joints)'))
  File "<string>", line 1, in <module>
  File "../libs/metric/criterions.py", line 183, in __init__
    self.arg_max = cfgs['testing_settings']['arg_max']
KeyError: 'arg_max'

It seems that the arg_max key is missing from KITTI_train_IGRs.yml.

How should the arg_max parameter be defined?

Thanks in advance for your answer.
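
The traceback above comes from a plain dictionary lookup, cfgs['testing_settings']['arg_max'], on a configuration that lacks that entry. As a debugging aid, the sketch below checks a nested key before it is used and reports which keys are actually present. The helper name `require_config` and the example configuration contents are hypothetical. The sketch deliberately does not supply a default for arg_max, since the correct value is not stated in this issue.

```python
def require_config(cfgs, *keys):
    """Return cfgs[k1][k2]...; raise a descriptive KeyError if any level is missing."""
    node = cfgs
    for depth, key in enumerate(keys):
        if not isinstance(node, dict) or key not in node:
            path = "/".join(keys[: depth + 1])
            available = sorted(node) if isinstance(node, dict) else []
            raise KeyError(
                f"missing config entry '{path}'; keys available at this level: {available}"
            )
        node = node[key]
    return node


# Hypothetical configuration, shaped like a loaded YAML file without arg_max.
cfgs = {"testing_settings": {"some_other_option": 1}}

try:
    require_config(cfgs, "testing_settings", "arg_max")
except KeyError as err:
    print(err)
```

Running such a check right after the YAML file is loaded surfaces a missing entry before the datasets are initialized, rather than at evaluator construction.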

Relation between kpts_3d_pred and pose

Hello,
Thank you for open-sourcing this amazing project!

I have a question about the convention for the transformation of the 3D box. EgoNet only produces an egocentric pose (i.e. in camera coordinates) corresponding to the rotation between the 3D box extracted from the keypoints and a template 3D box. We also have a translation corresponding to the first point in kpts_3d_pred, here.

To better understand the coordinate systems involved I'm doing the following experiment:

Create a template 3D bounding box following this, in the canonical pose.
Rotate it with the rotation matrix given by EgoNet, this one

After doing these two steps, I still need one translation to place the 3D box in space (in the camera system). The question is, what translation should I use? Is it the one corresponding to the first point in kpts_3d_pred?

Thank you for your time
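
The experiment described above can be written down as a short sketch: build a template box centered at the origin, rotate it, then translate it. Everything convention-related here is an assumption made for illustration: the corner ordering, the mapping of length/height/width to the x/y/z axes, the use of a rotation about the y axis, and the use of the box center as the translation. The sketch does not settle whether the first point of kpts_3d_pred is that translation in EgoNet; the function names are hypothetical.

```python
import math


def template_box(length, height, width):
    """Eight corners of a box centered at the origin (axis assignment assumed)."""
    hx, hy, hz = length / 2, height / 2, width / 2
    return [(sx * hx, sy * hy, sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def rotation_about_y(theta):
    """3x3 rotation matrix for a rotation of theta radians about the y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transform(points, rotation, translation):
    """Apply p' = R p + t to every point."""
    return [tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                  for i in range(3))
            for p in points]


corners = template_box(length=4.0, height=1.5, width=1.8)
placed = transform(corners, rotation_about_y(math.pi / 6), (1.0, 1.6, 20.0))

# Because the template is centered at the origin, the centroid of the placed
# corners coincides with the translation that was applied.
centroid = tuple(sum(p[i] for p in placed) / len(placed) for i in range(3))
print(centroid)
```

Under these assumptions the translation is simply the box center in camera coordinates, so comparing it against the first predicted keypoint is one way to test the convention empirically.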

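A question about the code in "car_instance"

Hello, and thank you very much for the open-source code!
While reading the "_prepare_2d_pose_annot" method in "car_instance", I noticed that line 313, "for image_name in self.keypoints.keys():", uses "self.keypoints", but I could not find a definition for it in the KITTI class. What is "self.keypoints" here? If it is defined, where is it defined?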