abhi1kumar / deviant
[ECCV 2022] Official PyTorch Code of DEVIANT: Depth Equivariant Network for Monocular 3D Object Detection
Home Page: https://arxiv.org/abs/2207.10758
License: MIT License
I converted the Rope3D dataset to KITTI format and tried to test the model with the KITTI pre-trained weights you provided, but the results are not very satisfactory.
Here is a description of the data after converting Rope3D:
So I have the following thoughts:
Hi,
First of all, thanks for the code.
I installed everything by the steps you provided, and I'm trying to run only the deviant model using the following cmd taken from scripts_training.sh:
CUDA_VISIBLE_DEVICES=0 python -u tools/train_val.py --config=experiments/run_221.yaml
My problem is that I get a CUDA out-of-memory error in the first epoch, just after the weights are logged.
I did manage to run the code on a very small subset of the KITTI dataset (100 images).
Do you have any advice on how to approach this error?
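(In case it helps other readers: two generic things to try first, neither specific to this repo, are lowering batch_size in run_221.yaml and running the forward/backward pass under mixed precision. A minimal sketch of the latter; model, optimizer, images, and targets are placeholders, and the model is assumed to return its total loss:)

import torch

def train_step_amp(model, optimizer, scaler, images, targets):
    # mixed precision roughly halves activation memory for most ops
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(images, targets)  # placeholder: assumes the model returns the total loss
    scaler.scale(loss).backward()      # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()

# usage: create scaler = torch.cuda.amp.GradScaler() once, then call train_step_amp per batch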
Thanks for the great work. I used the conversion script you provided to convert to the KITTI format, but there are no labels in the label_all folder. I then tried the label.zip from #5 and found that it does not correspond to the converted images. Do you have any experience with this?
I'm trying to train DEVIANT on Waymo, and I read 1051.yaml. Is it OK to train for 30 epochs on Waymo, or is that just a pre-trained model config and I should train for more epochs?
I downloaded the pre-trained weights. I would like to use the KITTI weights to run on my raw video/webcam and get the output with 3D boxes and a bird's-eye view. How can I do that? Also, how do I include my extrinsic and intrinsic camera calibration parameters?
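(Not an official answer, but for the calibration part: KITTI-style pipelines typically read the calibration from a calib .txt containing a P2 row. A hedged sketch of writing your own intrinsics into that row, assuming a single camera with zero extrinsics; whether this repo's loader also needs the other rows, e.g. R0_rect, depends on the code path:)

import numpy as np

def write_kitti_calib(path, fx, fy, cx, cy):
    # 3x4 projection: intrinsics with identity rotation and zero translation
    P2 = np.array([[fx, 0.0, cx, 0.0],
                   [0.0, fy, cy, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
    with open(path, "w") as f:
        f.write("P2: " + " ".join("%.6e" % v for v in P2.flatten()) + "\n")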
Hello. Thank you for the great research.
I have two questions.
When submitting to the KITTI benchmark, is it correct to use all 7,481 images in the training folder for training?
If so, could you please explain how you validated the model before submitting to the test set?
Thank you.
Hello, thanks for your great work!
I trained for 20 epochs on two GTX 1080 GPUs, but the evaluation results are too low. Please help.
Here is the comparison between yours (left) and mine (right):
Here is my training log:
20230802_171210.txt
Thanks for your great work!
Do you happen to know of ways to evaluate nuScenes with metrics other than MAE, such as mAP, in your code?
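(For reference: the official nuScenes mAP comes from the nuscenes-devkit rather than this codebase. A rough sketch of invoking it, untested with this repo's outputs; the paths and eval_set are placeholders, and your detections must first be written in the nuScenes result JSON format:)

from nuscenes.nuscenes import NuScenes
from nuscenes.eval.detection.config import config_factory
from nuscenes.eval.detection.evaluate import DetectionEval

nusc = NuScenes(version="v1.0-trainval", dataroot="/path/to/nuscenes")  # placeholder path
cfg = config_factory("detection_cvpr_2019")
ev = DetectionEval(nusc, config=cfg, result_path="results_nusc.json",   # placeholder file
                   eval_set="val", output_dir="nusc_eval")
ev.main(plot_examples=0, render_curves=False)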
In your project, I used my own training data to train the model and found that some targets were never included in training.
So I looked into the problem and located it in the image-projection code.
My pts_rect, P2, and the obtained results are shown in the attached screenshots.
But my image size is 2560x1150, and the projected 3D coordinates exceed the image boundary, so those targets never enter training.
According to the coordinate transformation rules, I think this line should be
pts_img = (pts_2d_hom[:, 0:2].T / pts_2d_hom[:, 2]).T
rather than
pts_img = (pts_2d_hom[:, 0:2].T / pts_rect_hom[:, 2]).T
However, after changing this line, I still have not achieved ideal results. Is my reasoning correct? If there is an error, please point it out. If it is correct, are there other related places that need to be changed, and why did I not achieve the desired result?
Sincerely in need of help, thank you very much
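(For reference, the standard rectified-to-image projection written out in full; it agrees with the fix proposed above: divide by the third homogeneous coordinate after multiplying by P2, not by the rectified z, though the two differ only by P2's small [2,3] entry:)

import numpy as np

def rect_to_img(pts_rect, P2):
    # pts_rect: (N, 3) points in the rectified camera frame; P2: (3, 4) projection matrix
    pts_rect_hom = np.hstack([pts_rect, np.ones((pts_rect.shape[0], 1))])  # (N, 4)
    pts_2d_hom = pts_rect_hom @ P2.T                                       # (N, 3)
    return pts_2d_hom[:, 0:2] / pts_2d_hom[:, 2:3]  # divide by the projected depth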
What causes this?
Your work is great! Can you provide a nuScenes config?
I'm very sorry, but I have come across another thing I don't quite understand and would like to ask for advice.
The KITTI dataset is already annotated with alpha, so why is alpha obtained through conversion from ry? And I found through testing that the converted alpha and the annotated alpha are not exactly the same. I really don't understand why everyone does this in practice.
Hoping for guidance, thank you very much.
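(For reference, the standard conversion as I understand it, not specific to this repo: alpha is the observation angle, i.e. the global yaw ry minus the viewing direction of the object center, so it can be recomputed from ry and the 3D location. Small mismatches with the annotated alpha are expected since annotations are stored with limited precision:)

import numpy as np

def ry_to_alpha(ry, x, z):
    # observation angle = global yaw minus the ray angle to the object center
    alpha = ry - np.arctan2(x, z)
    return (alpha + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)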
Thank you again for your great work. I am still failing to make it work on another custom dataset. I tried getting more images, now more than 10,000, but the model still seems not to make any predictions at all. Looking at the code, I noticed certain assumptions about maximum depth in kitti_utils.py that aren't mentioned elsewhere:
L302 : np.linspace(2,78,wsize*hsize).reshape(hsize,wsize,1)],-1)).reshape(-1,3)
L333 : random_depth = np.linspace(2,78,wsize*hsize).reshape(hsize,wsize,1)
Is there any documentation of similar assumptions made to fit the KITTI dataset?
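(A hedged reading of those two lines: they lay a fixed uniform depth prior from 2 m to 78 m over the feature grid, which roughly matches KITTI's annotation range. A sketch of how one might expose that as parameters for a custom dataset; min_depth/max_depth are my names, not the repo's:)

import numpy as np

def make_depth_grid(wsize, hsize, min_depth=2.0, max_depth=78.0):
    # uniform depth prior over an hsize x wsize grid, one depth value per cell
    return np.linspace(min_depth, max_depth, wsize * hsize).reshape(hsize, wsize, 1)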
Hi,
Thanks for sharing your work.
I was training on a custom dataset.
The losses become NaN after 6 epochs. I tried reducing the learning rate, but that didn't help either. Wondering if you, @abhi1kumar, encountered this issue while training.
INFO ------ TRAIN EPOCH 006 ------
INFO Learning Rate: 0.001250
INFO Weights: depth_:nan, heading_:nan, offset2d_:1.0000, offset3d_:nan, seg_:1.0000, size2d_:1.0000, size3d_:nan,
INFO BATCH[0020/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO BATCH[0040/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO BATCH[0060/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO BATCH[0080/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO BATCH[0100/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
INFO BATCH[0120/3150] depth_loss:nan, heading_loss:nan, offset2d_loss:nan, offset3d_loss:nan, seg_loss:nan, size2d_loss:nan, size3d_loss:nan,
Before epoch 6, the losses decrease as expected.
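(A generic PyTorch debugging sketch for NaNs, not this repo's own tooling: anomaly detection points at the first op producing NaN in backward, and gradient clipping caps exploding gradients. Anomaly detection slows training, so enable it only while debugging; model/optimizer/images/targets are placeholders:)

import torch

def train_step_debug(model, optimizer, images, targets):
    torch.autograd.set_detect_anomaly(True)  # raises at the op that first yields NaN in backward
    optimizer.zero_grad()
    loss = model(images, targets)  # placeholder: assumes the model returns the total loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)  # cap exploding gradients
    optimizer.step()
    return loss.detach()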
I am trying to use the model on a custom dataset. I made a config file which is very similar to run_221.yaml.
I changed the dataset type (I created another one to fit my custom classes and dimensions) and the resolution. The resolution of my images is WxH = 1024x750. I believe the downsampling factor is causing the error, which states that the shape of tensor a (188,) does not match the shape of tensor b (187,). Note that 750/4 = 187.5, right in the middle of the shape mismatch.
After several tries of playing with the resolution, the one that worked and was closest to my original aspect ratio was 704x512. I wanted to know whether there is some part of the code I could change so that I can keep my original image size. I also wanted to know, generally, whether this causes problems for the model to learn. And do I need to change the sesn_scales?
Thanks in advance!
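(For other readers, a sketch of the constraint as described above: each side of the input resolution has to divide evenly by the network's total downsampling stride. The stride value 4 is taken from the 750/4 = 187.5 observation; some backbones need a larger multiple:)

def fit_to_stride(w, h, stride=4):
    # snap each side to the nearest multiple of the downsampling stride
    snap = lambda x: max(stride, int(round(x / stride)) * stride)
    return snap(w), snap(h)

# e.g. fit_to_stride(1024, 750) -> (1024, 752), close to the original aspect ratio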
INFO ------ EVAL EPOCH 020 ------
Evaluation Progress: 100%|████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:54<00:00, 2.75s/it]
2023-10-07 15:28:18,751 INFO ==> Saving results in output/config_run_201_a100_v0_1/result_20
Traceback (most recent call last):
File "/home/ys/DEVIANT/code/tools/train_val.py", line 158, in
main()
File "/home/ys/DEVIANT/code/tools/train_val.py", line 152, in main
trainer.train()
File "/home/ys/DEVIANT/code/lib/helpers/trainer_helper.py", line 87, in train
self.eval_one_epoch()
File "/home/ys/DEVIANT/code/lib/helpers/trainer_helper.py", line 207, in eval_one_epoch
use_logging= True, logger= self.logger)
File "/home/ys/DEVIANT/code/lib/helpers/rpn_util.py", line 254, in evaluate_kitti_results_verbose
results_obj.main = run_kitti_eval_script(eval_binary_path, results_data= stats_save_folder, gt_folder= gt_folder, lbls= lbls, use_40=True)
File "/home/ys/DEVIANT/code/lib/helpers/rpn_util.py", line 345, in run_kitti_eval_script
_ = subprocess.check_output([eval_binary_path, results_data, gt_folder], stderr=devnull)
File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 488, in run
with Popen(*popenargs, **kwargs) as process:
File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/home/ys/.conda/envs/DEVIANT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object': 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object'
Does the Waymo model use the v1.1 version released on the official website?
What should I do if I downloaded version 1.0?
Thank you!
First, many thanks for your work. I am new to object detection and have spent a lot of time on this data_setup part, yet I cannot set up the data to match your structure.
A light suggestion: for "Download the nuScenes and Waymo datasets", as I understand it, these two datasets can be downloaded somewhere outside the project because soft links are used to connect them to the DEVIANT project.
My confusion lies in this step:
Then follow the instructions at convert_nuscenes_to_kitti_format_and_evaluate.sh to get the nusc_kitti_org folder.
I think I should download the datasets following your and nuScenes' GitHub instructions, but I found it hard to follow convert_nuscenes_to_kitti_format_and_evaluate.sh, as I don't have v1.0-trainval#number_blobs_camera.tgz, v1.0-trainval01_blobs_lidar.tgz, or many of the other directories in this .sh file.
So I am not able to generate the nusc_kitti_org folder and continue.
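(In case it helps other readers: since the setup expects the raw datasets under the project's data folder, a soft link such as ln -s /path/to/nuscenes data/nuscenes, with both paths as placeholders for your actual locations, lets the data live elsewhere on disk, as suggested above.)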
What exactly does the P matrix in the KITTI calib files mean?
According to my search, it is the product of the camera's intrinsic and extrinsic parameters, where the extrinsic parameters are a rotation matrix and a translation vector.
Here's a simple matrix I obtained using checkerboard calibration in MATLAB. It seems to have problems at [0,3] and [1,3]. When I use this matrix to replace P2 in calib, the inference results are null. It seems worse than before, when you suggested I just use something containing only the intrinsic parameters without the extrinsics. It is possible to get inference results using only the intrinsics, except that the 3D boxes do not match exactly.
I'm sorry to bother you again, but this question has been bothering me for a long time and I can't find any useful information!
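(For context, my understanding of the KITTI convention, not from this repo: P2 = K [R | t] maps rectified-camera coordinates to cam-2 pixels, and because cam 2 is offset from the reference camera, P2[0,3] carries a term like fx * t_x. For a single monocular camera those entries should be near zero, so nonzero values at [0,3]/[1,3] from a checkerboard calibration would shift every projection. A sketch that factors a P matrix back into K, R, and the camera center using OpenCV, to sanity-check what your MATLAB matrix encodes:)

import cv2
import numpy as np

def split_projection(P):
    # factor a 3x4 projection into intrinsics K, rotation R, and camera center c
    K, R, c = cv2.decomposeProjectionMatrix(np.asarray(P, dtype=np.float64))[:3]
    return K / K[2, 2], R, (c[:3] / c[3]).ravel()  # c is returned in homogeneous form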
Hi all!
I managed to run the experiments on KITTI with no issues.
But I would like to run the model and visualize the output on a single image. Is there a script to do that?
thank you!
Thanks for your outstanding work. I have finished training the model; how do I validate it on raw images or videos? Can you give me some advice?
First of all, thank you for the great work. I used the converter as instructed to convert the Waymo dataset. Afterward, I counted the number of images and calibrations in the Waymo validation_org set and found that there were 39,987 of each, but only 39,047 labels. When I applied the setup_split, I discovered that some data had not been converted due to the lack of labels. As a result, I only have 51,257 training samples and 38,960 validation samples, while your paper states that there are 52,386 training and 39,848 validation samples. How can I obtain the same number of samples as described in the paper (52,386, 39,848)?
Firstly, thanks for your excellent work!
When I try to validate my checkpoint on the KITTI validation set, i.e. CUDA_VISIBLE_DEVICES=7 python -u tools/train_val.py --config=experiments/run_221.yaml --resume_model output/run_221/checkpoints/checkpoint_epoch_20.pth -e,
I get FileNotFoundError: [Errno 2] No such file or directory: 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object': 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object'
The log is as follows:
Traceback (most recent call last):
File "tools/train_val.py", line 155, in <module>
main()
File "tools/train_val.py", line 132, in main
tester.test()
File "/home/linhb/code/DEVIANT-main/code/lib/helpers/tester_helper.py", line 118, in test
use_logging= True, logger= self.logger)
File "/home/linhb/code/DEVIANT-main/code/lib/helpers/rpn_util.py", line 254, in evaluate_kitti_results_verbose
results_obj.main = run_kitti_eval_script(eval_binary_path, results_data= stats_save_folder, gt_folder= gt_folder, lbls= lbls, use_40=True)
File "/home/linhb/code/DEVIANT-main/code/lib/helpers/rpn_util.py", line 345, in run_kitti_eval_script
_ = subprocess.check_output([eval_binary_path, results_data, gt_folder], stderr=devnull)
File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 488, in run
with Popen(*popenargs, **kwargs) as process:
File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/home/linhb/miniconda3/envs/DEVIANT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object': 'data/KITTI/kitti_split1/devkit/cpp/evaluate_object'
Any help?
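(In case it helps: this error usually just means the KITTI C++ evaluation binary was never compiled, so the path data/KITTI/kitti_split1/devkit/cpp/evaluate_object does not exist. As far as I know, the standard KITTI devkit builds with something like g++ -O3 -DNDEBUG -o evaluate_object evaluate_object.cpp run inside the devkit's cpp folder; check the repo's setup instructions for the exact step.)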
I've encountered a problem that I don't quite understand; it exists in multiple monocular 3D object detection algorithms.
The total loss is obtained by summing multiple loss terms, but I have found that some of the terms can become negative during training, as shown in the figure above. Will this simple summation have a negative impact on the total loss?
Through investigation, I found that the negative values come from laplacian_aleatoric_uncertainty_loss.
I also tried shifting the negative loss terms by their absolute value so that all terms are positive, but the resulting model performance was not ideal.
I have not fully understood this part and hope to receive guidance. Looking forward to your reply; thank you very much.
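(For reference, a sketch of the usual Laplacian aleatoric uncertainty loss as used in GUP-Net-style heads; the function name matches the one mentioned above, but the exact form here is my assumption. Because of the log-scale term it is a negative log-likelihood shifted by a constant, so values below zero are expected when the predicted uncertainty is small; only the gradients matter, and shifting it to be positive changes nothing about training:)

import math
import torch

def laplacian_aleatoric_uncertainty_loss(pred, target, log_sigma):
    # NLL of a Laplace distribution with scale exp(log_sigma), up to a constant
    loss = math.sqrt(2.0) * torch.exp(-log_sigma) * torch.abs(pred - target) + log_sigma
    return loss.mean()  # can be negative when log_sigma < 0 and |pred - target| is small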
Hi,
I'm encountering an error in the first eval epoch. The error I get is:
I am running the gupnet model training:
CUDA_VISIBLE_DEVICES=0,1 python -u tools/train_val.py --config=experiments/run_221.yaml
I was successful in training the model on a sub-dataset of only 300 images. The error appears when I train on the full dataset.
Any suggestions?
Hi, thanks for your excellent work. I want to know whether this model can be exported to libtorch or ONNX, and whether the SES blocks can run on TensorRT.
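(I haven't verified this on DEVIANT, but the generic PyTorch export path would look like the sketch below; the input size and opset are placeholders, and whether the SES blocks trace cleanly, and then run under TensorRT, depends on whether they reduce to standard conv ops once the steerable basis is baked in:)

import torch

def export_onnx(model, out_path="deviant.onnx", input_hw=(384, 1280)):
    # trace with a dummy image batch; adjust input_hw to your config's resolution
    model.eval()
    dummy = torch.randn(1, 3, *input_hw)
    torch.onnx.export(model, dummy, out_path, opset_version=13,
                      input_names=["image"], output_names=["outputs"])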
Hi,
congratulations on your very nice paper!
I would have a question regarding Waymo. In the paper you mention that you filter out objects with depth <= 2m and objects with too few lidar points (car: 100, pedestrian / cyclists: 50). In general I think it makes sense to do that.
I wonder whether that is even strict enough. Here is an example where I used your approach for data generation and plotted the results:
Here are the labels for the two cars on the right that are not even visible anymore:
Car 0 0 -10 1858.27 625.86 1920.0 872.42 1.84 2.14 4.75 9.95 1.78 17.5 1.52 1094
Car 0 0 -10 1769.02 728.22 1920.0 1280.0 1.8 2.16 4.86 4.17 2.15 5.16 -1.62 7286
Do you do more filtering that I am not aware of at the moment?
And do you also filter the ground truth labels in the same way for evaluation as for training? If not, what is the difference?
Best wishes
Johannes
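(For other readers, a sketch of the filtering rule as stated in this thread, with the thresholds quoted from the paper in the message above; the LiDAR point count is assumed to be the extra last column visible in the two labels pasted here:)

def keep_label(cls, depth_z, num_lidar_pts):
    # drop objects closer than 2 m and objects with too few LiDAR points
    min_pts = {"Car": 100, "Pedestrian": 50, "Cyclist": 50}
    return depth_z > 2.0 and num_lidar_pts >= min_pts.get(cls, 0)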
Where is your visualization code?