abhi1kumar / deviant

[ECCV 2022] Official PyTorch Code of DEVIANT: Depth Equivariant Network for Monocular 3D Object Detection

Home Page: https://arxiv.org/abs/2207.10758

License: MIT License

Languages: CMake 0.01%, Shell 0.80%, C++ 64.72%, Python 34.47%

Topics: 3d-computer-vision, 3d-object-detection, autonomous-driving, autonomous-vehicle, depth-equivariance, eccv, eccv-2022, eccv2022, equivariance, geometric-deep-learning, kitti-3d, monocular-3d-detection, monocular-3d-localization, nuscenes, object-detection, one-stage-detector, projective-geometry, projective-manifold, waymo

deviant's People

Contributors

abhi1kumar


deviant's Issues

Testing on Rope3D Dataset

I converted the Rope3D dataset to KITTI format and tried to test the model with the KITTI pre-trained weights you provided, but the results are not very satisfactory.
Here is a description of the data after converting Rope3D:

  1. Original image resolution: 1920×1080; validation resolution set to 960×512
  2. calib:
    P0: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
    P1: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
    P2: 2.173379882812e+03 0.000000000000e+00 9.618704833984e+02 0.000000000000e+00 0.000000000000e+00 2.322043945312e+03 5.883443603516e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
    P3: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
    R0_rect: 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00
    Tr_velo_to_cam: 1.994594966642e-03 -9.998606204387e-01 1.657520384002e-02 -1.115697257486e-01 -2.372202408477e-01 -1.657520384002e-02 -9.713144706380e-01 6.538036584690e+00 9.714538501993e-01 -1.994594966642e-03 -2.372202408477e-01 1.596758475422e+00
    Tr_imu_to_velo: 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00
  3. Visualisation of validation results: [screenshot attached]

So I have the following thoughts:

  1. For KITTI, the data is acquired with the camera of the acquisition vehicle roughly parallel to the ground. In Rope3D, the camera sits on a roadside traffic-light gantry and is tilted towards the ground. Does this mean that the geometric projection priors of traditional monocular 3D object detection do not apply to roadside datasets like Rope3D?
  2. In addition, when I validated on my private dataset, I only provided the P2 intrinsics, and I don't know how to fold the extrinsic rotation and translation matrices into it (I think this must be a gap in my knowledge base, but I couldn't find a corresponding question and answer), so I am asking here and hope you can answer; see the sketch after this list.
  3. As a next step, I would like to use Rope3D for training. Thank you very much for your outstanding contribution and activity. Salute!
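
A minimal sketch (not code from this repo) of how an extrinsic rotation R and translation t are usually folded into a single KITTI-style 3×4 projection matrix, P2 = K [R | t]. The intrinsics below are read off the P2 printed above; R and t are placeholders to be replaced with the real roadside extrinsics:

import numpy as np

# Fold extrinsics into a KITTI-style projection matrix: P2 = K [R | t].
# With a roadside camera, R encodes the tilt towards the ground, so it
# cannot simply be dropped when building P2.
K = np.array([[2173.38, 0.0, 961.87],
              [0.0, 2322.04, 588.34],
              [0.0,     0.0,   1.0]])    # intrinsics from the P2 above
R = np.eye(3)                            # placeholder: real camera rotation
t = np.zeros((3, 1))                     # placeholder: real camera translation
P2 = K @ np.hstack([R, t])               # 3x4 projection matrix
print(P2)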

CUDA memory issue and multi-GPU training

Hi,
First of all, thanks for the code.
I installed everything following the steps you provided, and I'm trying to run only the DEVIANT model using the following command taken from scripts_training.sh:

CUDA_VISIBLE_DEVICES=0 python -u tools/train_val.py --config=experiments/run_221.yaml

My problem is that I get a CUDA out-of-memory error in the first epoch, just after the weights are logged.
I did manage to run the code on a very small subset of the KITTI dataset (100 images).
Do you have any advice on how to approach this error?

Waymo dataset conversion

Thanks for the great work. I used the conversion script you provided to convert Waymo to the KITTI format, but there are no labels in the label_all folder. I then tried the label.zip from #5 and found that it does not correspond to the converted images. Do you have any experience with this?

Waymo Dataset Training Epochs

I'm trying to train DEVIANT on Waymo, and I read 1051.yaml. Is it OK to train for 30 epochs on Waymo, or is that just a pretrained-model config and I should train for more epochs?

Run on raw live video

I downloaded the pre-trained weights. I would like to use the KITTI weights to run on my raw video/webcam and get the output with 3D boxes and a bird's-eye view. How can I do that? Also, how do I include my extrinsic and intrinsic camera calibration parameters?
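
A minimal sketch of the calibration part of this, assuming a monocular camera whose intrinsics fx, fy, cx, cy come from your own calibration (the numbers below are placeholders): a KITTI-format P2 can be assembled directly from them, with no extrinsic part.

import numpy as np

# KITTI-style P2 from monocular intrinsics alone: the model consumes this
# 3x4 matrix the same way it consumes KITTI's P2.
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0   # placeholder calibration
P2 = np.array([[fx, 0.0, cx, 0.0],
               [0.0, fy, cy, 0.0],
               [0.0, 0.0, 1.0, 0.0]])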

For submission on KITTI 3D OD benchmark

Hello. Thank you for the great research.

I have two questions.

  1. When submitting to the KITTI dataset benchmark, is it correct to use all 7,481 images in the training folder for training?

  2. If so, could you please explain how you validated for submission to the test?

Thank you.

Evaluation result is too low

Hello, thanks for your great work~
I trained for 20 epochs on 2× GTX 1080 GPUs, but the evaluation results are too low. Please help.

Here is the comparison between yours (left) and mine (right): [screenshot attached]

Here is my training log:
20230802_171210.txt

nuScenes evaluation

Thanks for your great work!

Do you happen to know how to evaluate on nuScenes with metrics other than MAE, such as mAP, using your code?
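
A minimal sketch using the official nuscenes-devkit (an assumption on my part: this is not part of this repo, and it presumes the KITTI-format predictions have already been converted back into a nuScenes-format results JSON, here called results_nusc.json):

from nuscenes import NuScenes
from nuscenes.eval.detection.config import config_factory
from nuscenes.eval.detection.evaluate import DetectionEval

# Official nuScenes detection evaluation: reports mAP, NDS, and the TP
# metrics (ATE, ASE, AOE, ...) rather than the MAE used in this repo.
nusc = NuScenes(version='v1.0-trainval', dataroot='data/nuscenes')
cfg = config_factory('detection_cvpr_2019')
nusc_eval = DetectionEval(nusc, config=cfg,
                          result_path='results_nusc.json',  # hypothetical path
                          eval_set='val', output_dir='nusc_eval_out')
nusc_eval.main()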

There may be an error in rect_to_img

In your project, I trained the model on my own data and found that some targets could never be trained. I traced the problem to rect_to_img. [Screenshots of pts_rect, P2, and the resulting projections are attached.] My image size is 2560×1150, and the projected 3D coordinates exceed the image boundary, so those targets never enter training.
According to the coordinate transformation rules [derivation screenshot attached], I think this line should be

pts_img = (pts_2d_hom[:, 0:2].T / pts_2d_hom[:, 2]).T

rather than

pts_img = (pts_2d_hom[:, 0:2].T / pts_rect_hom[:, 2]).T

However, after changing this, I still haven't achieved ideal results. Is my reasoning correct? If there is an error, please point it out; if it is correct, are there other related places that need to be changed, and why might I still not get the desired result?
I sincerely need help. Thank you very much.
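
A self-contained sketch of the corrected projection (my reading of the fix proposed above, not the repo's code): the division should use the projected depth pts_2d_hom[:, 2], which coincides with pts_rect[:, 2] only when the third row of P2 is [0, 0, 1, 0]:

import numpy as np

# Project rectified-camera 3D points into the image with P2, dividing by
# the projected depth. For KITTI's own P2 the third row is [0, 0, 1, c]
# with a tiny c, so the two divisors nearly coincide; for a general P2
# only pts_2d_hom[:, 2] is correct.
def rect_to_img(pts_rect, P2):
    pts_rect_hom = np.hstack([pts_rect, np.ones((pts_rect.shape[0], 1))])
    pts_2d_hom = pts_rect_hom @ P2.T
    pts_img = (pts_2d_hom[:, 0:2].T / pts_2d_hom[:, 2]).T
    return pts_img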

Local vs Global Orientation

I'm very sorry, but I have come across another point that I don't quite understand and would like to ask for advice.
The KITTI dataset already annotates alpha, so why is alpha obtained by converting from ry? I also found through testing that the converted alpha and the annotated alpha are not exactly the same. I don't really understand why everyone does it this way in practice.
Hoping for guidance, thank you very much.
[screenshot of the conversion code attached]
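
For what it's worth, the standard reasoning is that alpha (the observation angle) is what the appearance of an image crop actually determines: two cars with the same ry look different at different image positions, while two cars with the same alpha look the same, so detectors regress alpha and recover ry at inference. A minimal sketch of the usual KITTI conversion (my own summary, not this repo's code); small mismatches against the annotated alpha are common because the annotation and the conversion use slightly different reference points:

import numpy as np

# Global yaw ry -> observation angle alpha, using the object's camera-frame
# position (x, z): alpha = ry - arctan2(x, z), wrapped to [-pi, pi].
def ry_to_alpha(ry, x, z):
    alpha = ry - np.arctan2(x, z)
    return (alpha + np.pi) % (2 * np.pi) - np.pi

print(ry_to_alpha(1.5, 10.0, 20.0))   # illustrative values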

Understanding hard-coded values

Thank you again for your great work. I am still failing to make it work on another custom dataset. I tried getting more images, now more than 10,000, but the model still seems to make no predictions at all. Looking at the code, I noticed certain assumptions about maximum depth in kitti_utils.py that aren't mentioned elsewhere:

  • L302 : np.linspace(2,78,wsize*hsize).reshape(hsize,wsize,1)],-1)).reshape(-1,3)

  • L333 : random_depth = np.linspace(2,78,wsize*hsize).reshape(hsize,wsize,1)

Is there any documentation of similar assumptions made to fit the KITTI dataset?
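
My reading of those two lines (a sketch, with illustrative grid sizes): they build a grid of hypothetical depths spanning roughly 2 m to 78 m, i.e. KITTI's usable depth range, so a custom dataset whose objects live at other depths would need these bounds changed:

import numpy as np

# The hard-coded 2..78 appears to be a KITTI-specific depth range
# (nearest labeled object ~2 m, far clip ~78 m). hsize/wsize here are
# illustrative stand-ins for the grid size used in kitti_utils.py.
MIN_DEPTH, MAX_DEPTH = 2.0, 78.0
hsize, wsize = 32, 106
random_depth = np.linspace(MIN_DEPTH, MAX_DEPTH, wsize * hsize).reshape(hsize, wsize, 1)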

nan loss after 5 epochs on custom dataset

Hi,
Thanks for sharing your work.
I was training on a custom dataset.
The losses become NaN after 6 epochs. I tried reducing the learning rate, but that didn't help either. Wondering if you, @abhi1kumar, encountered this issue while training.

INFO  ------ TRAIN EPOCH 006 ------
INFO  Learning Rate: 0.001250
INFO  Weights:  depth_:nan, heading_:nan, offset2d_:1.0000, offset3d_:nan, seg_:1.0000, size2d_:1.0000, size3d_:nan,
INFO  BATCH[0020/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO  BATCH[0040/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO  BATCH[0060/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO  BATCH[0080/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO  BATCH[0100/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO  BATCH[0120/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,

Before epoch 6, the losses were decreasing as expected.
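
Two generic PyTorch tools for chasing this kind of NaN (a sketch with a stand-in model, not DEVIANT-specific advice): anomaly detection raises at the first op that produces NaN/inf, and gradient clipping keeps one bad batch from blowing up the weights:

import torch
import torch.nn as nn

# 1) Make autograd raise at the op that first produces NaN/inf.
torch.autograd.set_detect_anomaly(True)

# 2) Clip gradients before each optimizer step. The tiny model below is a
#    stand-in for the real training loop's model and loss.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = nn.functional.l1_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()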

Shape of tensor a does not match the shape of tensor b

I am trying to use the model on a custom dataset. I made a config file very similar to run_221.yaml.
I changed the dataset type (I created another one to fit my custom classes and dimensions) and the resolution. The resolution of my images is W×H = 1024×750. I believe the downsampling factor is causing the error, which states that the shape of tensor a (188,) does not match the shape of tensor b (187,). Note that 750/4 = 187.5, right in the middle of the shape mismatch.

After several tries of playing with resolutions, the one that worked and was closest to my original aspect ratio was 704×512. I wanted to know whether there is some part of the code I could change so that I can keep my original image size, and whether this generally causes problems for the model to learn. Also, do I need to change the sesn_scales?

Thanks in advance!
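
A minimal sketch of the constraint at play (my interpretation of the 188-vs-187 mismatch, not repo code): with a downsampling factor of 4, both image dimensions must be divisible by 4, so either pick a resolution that already is, as 704×512 is, or pad up to the next multiple:

# Round an image dimension up to the next multiple of the network stride.
def pad_to_multiple(size, stride=4):
    return ((size + stride - 1) // stride) * stride

w, h = 1024, 750
print(pad_to_multiple(w), pad_to_multiple(h))   # 1024 752: 750 is the culprit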

I'm new to 3D object detection and have met some trouble. Can you give me some advice? Thanks!

INFO ------ EVAL EPOCH 020 ------
Evaluation Progress: 100%|████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:54<00:00, 2.75s/it]
2023-10-07 15:28:18,751 INFO ==> Saving results in output/config_run_201_a100_v0_1/result_20
Traceback (most recent call last):
  File "/home/ys/DEVIANT/code/tools/train_val.py", line 158, in <module>
    main()
  File "/home/ys/DEVIANT/code/tools/train_val.py", line 152, in main
    trainer.train()
  File "/home/ys/DEVIANT/code/lib/helpers/trainer_helper.py", line 87, in train
    self.eval_one_epoch()
  File "/home/ys/DEVIANT/code/lib/helpers/trainer_helper.py", line 207, in eval_one_epoch
    use_logging= True, logger= self.logger)
  File "/home/ys/DEVIANT/code/lib/helpers/rpn_util.py", line 254, in evaluate_kitti_results_verbose
    results_obj.main = run_kitti_eval_script(eval_binary_path, results_data= stats_save_folder, gt_folder= gt_folder, lbls= lbls, use_40=True)
  File "/home/ys/DEVIANT/code/lib/helpers/rpn_util.py", line 345, in run_kitti_eval_script
    _ = subprocess.check_output([eval_binary_path, results_data, gt_folder], stderr=devnull)
  File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object': 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object'

Cannot find nuScenes blobs for converting to KITTI format in data_setup_README

First, many thanks for your work. I am new to object detection and have spent a lot of time on this data_setup part, yet I cannot set up the data in your structure.
A small suggestion on the step "Download the nuScenes and Waymo datasets": as I understand it, these two datasets can be downloaded somewhere outside the project, because soft links are used to connect them to the DEVIANT project.

My confusion lies in this step:
Then follow the instructions at convert_nuscenes_to_kitti_format_and_evaluate.sh to get the nusc_kitti_org folder.

I think I should download the datasets following your and nuScenes' GitHub instructions. But I found it hard to follow convert_nuscenes_to_kitti_format_and_evaluate.sh, as I don't have v1.0-trainval#number_blobs_camera.tgz, v1.0-trainval01_blobs_lidar.tgz, or many of the other directories in this .sh file.

So I am not able to generate the nusc_kitti_org folder and continue.

P-matrix in KITTI

What exactly does the P matrix in KITTI's calib files mean?
According to my search, it is the product of the camera's intrinsic and extrinsic parameters, where the extrinsic parameters are the rotation matrix and translation vector.
Here is a simple matrix I obtained with checkerboard calibration in MATLAB. It seems to have problems at [0,3] and [1,3]. When I use this matrix to replace P2 in calib, the inference results are empty. This seems worse than before, when you suggested I use something containing only the intrinsic parameters without the extrinsics: with intrinsics alone I can get inference results, except that the 3D boxes do not match exactly.
I'm sorry to bother you again, but this question has been bothering me for a long time and I can't find any useful information!
[screenshot of the MATLAB-estimated matrix]
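
A small numeric sketch of why [0,3] and [1,3] need not be zero (my own illustration, with made-up numbers): since P = K [R | t], the fourth column of P is K @ t, i.e. translation folded through the intrinsics; a MATLAB checkerboard calibration yields K alone, which corresponds to t = 0 and hence zeros in those entries:

import numpy as np

# P = K [R | t]: the fourth column of P is K @ t, so [0,3] and [1,3]
# encode translation scaled by the focal lengths (in KITTI's P2 this is
# the stereo-baseline term), not intrinsic parameters.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])     # illustrative intrinsics
R = np.eye(3)
t = np.array([[-0.06], [0.0], [0.0]])   # illustrative 6 cm baseline shift
P = K @ np.hstack([R, t])
print(P[:, 3])                          # [-43.29, 0, 0]: entry [0,3] is -fx * 0.06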

How can I visualize the 3D boxes?

Hi all!
I managed to run the experiments on KITTI with no issues.
But I would like to run the model and visualize the output for a single image. Is there a script to do that?
thank you!
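
In case it helps while waiting for an answer, a generic sketch (not a script from this repo) that projects one KITTI-format 3D box, given by h, w, l, x, y, z, ry and the calibration P2, into the image and draws its 12 edges with OpenCV:

import cv2
import numpy as np

def project_box3d(h, w, l, x, y, z, ry, P2):
    # 8 corners in object coordinates; KITTI's origin is the bottom
    # center of the box and y points down, hence y in {0, -h}.
    xc = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    yc = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    zc = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [ 0,          1, 0         ],
                   [-np.sin(ry), 0, np.cos(ry)]])
    corners = Ry @ np.vstack([xc, yc, zc]) + np.array([[x], [y], [z]])
    pts = P2 @ np.vstack([corners, np.ones((1, 8))])   # project with P2
    return (pts[:2] / pts[2]).T                        # 8 x 2 pixel coordinates

def draw_box3d(img, pts, color=(0, 255, 0)):
    edges = [(0, 1), (1, 2), (2, 3), (3, 0),   # bottom face
             (4, 5), (5, 6), (6, 7), (7, 4),   # top face
             (0, 4), (1, 5), (2, 6), (3, 7)]   # vertical edges
    for i, j in edges:
        p, q = tuple(map(int, pts[i])), tuple(map(int, pts[j]))
        cv2.line(img, p, q, color, 2)
    return img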

Waymo Dataset converter.py

First of all, thank you for the great work. I used the converter as instructed to convert the Waymo dataset. Afterward, I counted the number of images and calibrations in the Waymo validation_org set and found that there were 39,987 of each, but only 39,047 labels. When I applied the setup_split, I discovered that some data had not been converted due to the lack of labels. As a result, I only have 51,257 training samples and 38,960 validation samples, while your paper states that there are 52,386 training and 39,848 validation samples. How can I obtain the same number of samples as described in the paper (52,386, 39,848)?

Inference Error

Firstly, thanks for your excellent work!

When I try to validate my checkpoint on the KITTI validation set, i.e. CUDA_VISIBLE_DEVICES=7 python -u tools/train_val.py --config=experiments/run_221.yaml --resume_model output/run_221/checkpoints/checkpoint_epoch_20.pth -e,
I get FileNotFoundError: [Errno 2] No such file or directory: 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object': 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object'

The log is as follows:

Traceback (most recent call last):
  File "tools/train_val.py", line 155, in <module>
    main()
  File "tools/train_val.py", line 132, in main
    tester.test()
  File "/home/linhb/code/DEVIANT-main/code/lib/helpers/tester_helper.py", line 118, in test
    use_logging= True, logger= self.logger)
  File "/home/linhb/code/DEVIANT-main/code/lib/helpers/rpn_util.py", line 254, in evaluate_kitti_results_verbose
    results_obj.main = run_kitti_eval_script(eval_binary_path, results_data= stats_save_folder, gt_folder= gt_folder, lbls= lbls, use_40=True)
  File "/home/linhb/code/DEVIANT-main/code/lib/helpers/rpn_util.py", line 345, in run_kitti_eval_script
    _ = subprocess.check_output([eval_binary_path, results_data, gt_folder], stderr=devnull)
  File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object': 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object'

Any help?
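
For what it's worth, the missing file is the KITTI devkit's C++ evaluation binary, which must be compiled before train_val.py can shell out to it. A sketch of a check-and-build step, assuming the standard devkit source evaluate_object.cpp sits in that folder (the KITTI devkit also needs the Boost headers installed):

import os
import subprocess

# The evaluator is a C++ program shipped as source; the Python code only
# invokes the compiled binary, hence the FileNotFoundError above.
devkit = "data/KITTI/kitti_split1/devkit/cpp"
if not os.path.isfile(os.path.join(devkit, "evaluate_object")):
    subprocess.check_call(
        ["g++", "-O3", "-DNDEBUG", "-o", "evaluate_object", "evaluate_object.cpp"],
        cwd=devkit,
    )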

About the Total Loss Function

I've encountered a problem that I haven't quite understood; it exists in multiple monocular 3D object detection algorithms.
The total loss is obtained by summing multiple loss terms, but I have found that some terms can become negative during training, as shown in the figure below. Will this simple summation have a negative impact on the total loss?
[training-curve screenshot]
Through investigation I found that the negative values come from laplacian_aleatoric_uncertainty_loss.
[loss-curve screenshot]
I have also tried adding an offset to the negative loss terms so that all the loss terms are positive, but the resulting model performance is not ideal.
I have not fully understood this part and hope to receive guidance. Looking forward to your reply. Thank you very much.
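
A sketch of the loss in question as it is usually written in GUP Net-style heads (my reconstruction, so treat the exact form as an assumption): L = (sqrt(2)/sigma)|gt - pred| + log(sigma). The log(sigma) term is unbounded below, so the loss legitimately goes negative once the network is both accurate and confident (sigma < 1); since only gradients matter for optimization, shifting it by a constant would not change training:

import torch

def laplacian_aleatoric_uncertainty_loss(pred, gt, log_sigma):
    # sqrt(2)/sigma * |gt - pred| + log(sigma), with sigma = exp(log_sigma)
    loss = (2 ** 0.5) * torch.exp(-log_sigma) * torch.abs(pred - gt) + log_sigma
    return loss.mean()

pred, gt = torch.tensor([10.0]), torch.tensor([10.05])
print(laplacian_aleatoric_uncertainty_loss(pred, gt, torch.tensor([-2.0])))
# ~ -1.48: negative, yet a perfectly valid (confident and accurate) state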

Failure during Multi-GPU evaluation

Hi,
I'm encountering an error in the first eval epoch. The error I get is: [screenshot attached]
I am running the gupnet model training:

CUDA_VISIBLE_DEVICES=0,1 python -u tools/train_val.py --config=experiments/run_221.yaml 

I successfully trained the model on a sub-dataset of only 300 images. The error appears when I train on the full dataset.
Any suggestions?

Waymo Dataset Filtering

Hi,
congratulations on your very nice paper!

I have a question regarding Waymo. In the paper you mention that you filter out objects with depth <= 2 m and objects with too few lidar points (car: 100, pedestrian/cyclist: 50). In general I think it makes sense to do that.

I wonder whether that is even strict enough. Here is an example where I used your approach for data generation and plotted the results:
[visualization of frame 284152]
Here are the labels for the two cars on the right that are not even visible anymore:
Car 0 0 -10 1858.27 625.86 1920.0 872.42 1.84 2.14 4.75 9.95 1.78 17.5 1.52 1094
Car 0 0 -10 1769.02 728.22 1920.0 1280.0 1.8 2.16 4.86 4.17 2.15 5.16 -1.62 7286

Do you do more filtering that I am not aware of at the moment?
And do you also filter the ground truth labels in the same way for evaluation as for training? If not, what is the difference?

Best wishes

Johannes
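
For reference, a sketch of the filtering rule as stated in the paper (only the published rule; anything beyond it would have to come from the authors), assuming, as the two label lines above suggest, that the trailing field of each KITTI-format line is the lidar point count:

# Keep a KITTI-format Waymo label only if the box is farther than 2 m and
# has enough lidar points (car: 100, pedestrian/cyclist: 50); fields 13
# and 15 are the camera-frame depth z and the point count, respectively.
MIN_POINTS = {'Car': 100, 'Pedestrian': 50, 'Cyclist': 50}

def keep(label_line):
    f = label_line.split()
    cls, z, num_pts = f[0], float(f[13]), int(f[15])
    return z > 2.0 and num_pts >= MIN_POINTS.get(cls, 0)

print(keep("Car 0 0 -10 1858.27 625.86 1920.0 872.42 1.84 2.14 4.75 9.95 1.78 17.5 1.52 1094"))
# True: depth 17.5 m and 1094 points pass, matching the observation above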
