- 2024.04.17 Support for the RayIoU metric.
- 2024.03.22 Release the code for FlashOCCV2.
- 2024.02.03 Release the training code for FlashOCC on UniOcc.
- 2024.01.20 TensorRT implementation written in C++ with CUDA acceleration.
- 2023.12.23 Release the quick testing code via TensorRT in MMDeploy.
- 2023.11.28 Release the training code for FlashOCC.
This repository is an official implementation of FlashOCC.
Given the capability of mitigating the long-tail deficiencies and intricate-shaped absence prevalent in 3D object detection, occupancy prediction has become a pivotal component in autonomous driving systems. However, the processing of three-dimensional voxel-level representations inevitably introduces large overhead in both memory and computation, obstructing the deployment of to-date occupancy prediction approaches. In contrast to the trend of making the model larger and more complicated, we argue that a desirable framework should be deployment-friendly to diverse chips while maintaining high precision. To this end, we propose a plug-and-play paradigm, namely FlashOCC, to consolidate rapid and memory-efficient occupancy prediction while maintaining high precision. Specifically, FlashOCC makes two improvements over contemporary voxel-level occupancy prediction approaches. Firstly, the features are kept in bird's-eye view (BEV), enabling the use of efficient 2D convolutional layers for feature extraction. Secondly, a channel-to-height transformation is introduced to lift the output logits from the BEV into 3D space. We apply FlashOCC to diverse occupancy prediction baselines on the challenging Occ3D-nuScenes benchmark and conduct extensive experiments to validate its effectiveness. The results substantiate the superiority of our plug-and-play paradigm over previous state-of-the-art methods in terms of precision, runtime efficiency, and memory cost, demonstrating its potential for deployment.
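The channel-to-height transformation mentioned above is essentially a reshape: the BEV feature map stores the height dimension folded into the channel axis, and the output logits are unfolded back into a 3D voxel grid. A minimal sketch (in numpy, with illustrative class/height-bin counts — the actual grid sizes depend on the config):

```python
import numpy as np

def channel_to_height(bev_logits, num_classes, num_z):
    """Lift BEV output logits of shape (B, C*Z, H, W) into a 3D voxel
    grid of shape (B, C, Z, H, W) via a plain reshape: the height axis Z
    was folded into the channel axis, so no learned parameters are needed."""
    b, cz, h, w = bev_logits.shape
    assert cz == num_classes * num_z, "channel dim must factor as C*Z"
    return bev_logits.reshape(b, num_classes, num_z, h, w)

# Illustrative example: 18 semantic classes, 16 height bins, 200x200 BEV grid
bev = np.random.randn(1, 18 * 16, 200, 200).astype(np.float32)
voxel = channel_to_height(bev, num_classes=18, num_z=16)
print(voxel.shape)  # (1, 18, 16, 200, 200)
```

Because the lift is a zero-cost view of memory, all heavy computation stays in 2D convolutions on the BEV plane, which is what makes the paradigm deployment-friendly.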
Config | Backbone | Input Size | mIoU | FPS (Hz) | Flops (G) | Params (M) | Model | Log
---|---|---|---|---|---|---|---|---
BEVDetOCC (1f) | R50 | 256x704 | 31.60 | 92.1 | 241.76 | 29.02 | gdrive | log |
M0: FlashOCC (1f) | R50 | 256x704 | 31.95 | 197.6 | 154.1 | 39.94 | gdrive | log |
M1: FlashOCC (1f) | R50 | 256x704 | 32.08 | 152.7 | 248.57 | 44.74 | gdrive | log |
BEVDetOCC-4D-Stereo (2f) | R50 | 256x704 | 36.1 | - | - | - | baidu | log |
M2: FlashOCC-4D-Stereo (2f) | R50 | 256x704 | 37.84 | - | - | - | gdrive | log |
BEVDetOCC-4D-Stereo (2f) | Swin-T | 512x1408 | 42.0 | - | - | - | baidu | log |
M3: FlashOCC-4D-Stereo (2f) | Swin-T | 512x1408 | 43.52 | - | 1490.77 | 144.99 | gdrive | log |
FPS is measured via TensorRT on an RTX 3090 with FP16 precision. Please refer to Tab. 2 in the paper for the detailed model settings of each M-number.
In FlashOCCV2, we make the following three adjustments to FlashOCC:
- Training without the camera mask. Using the mask significantly improves prediction in the visible region, but at the expense of prediction in the invisible region.
- Using category balancing.
- Using stronger loss settings.
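The README does not spell out the exact category-balancing scheme, so the following is only an illustrative sketch of one common approach: weighting the per-class loss by inverse log-frequency so that rare categories contribute more, then normalizing the weights to mean 1 so the overall loss scale is preserved.

```python
import numpy as np

def class_balance_weights(class_counts):
    """Illustrative category-balancing weights (not necessarily the scheme
    used in FlashOCCV2): inverse log-frequency, normalized to mean 1."""
    counts = np.asarray(class_counts, dtype=np.float64)
    freqs = counts / counts.sum()            # per-class frequency in [0, 1]
    weights = 1.0 / np.log(1.02 + freqs)     # rare classes -> larger weight
    return weights / weights.mean()          # keep overall loss scale stable

# Example: a very common class, a mid-frequency class, and a rare class
w = class_balance_weights([1_000_000, 100_000, 10_000])
print(w)  # monotonically increasing: rarest class gets the largest weight
```

Weights like these would typically be passed to a per-class weighted cross-entropy during training.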
More results for different configurations will be released soon.
Config | Backbone | Input Size | RayIoU | mIoU | FPS (Hz) | Flops (G) | Params (M) | Model | Log
---|---|---|---|---|---|---|---|---|---
M1: FlashOCC (1f) | R50 | 256x704 | - | 15.41 | 25.5 | 248.57 | 44.74 | gdrive | log |
FlashOCCV2-Depth-tiny (1f) | R50 | 256x704 | 34.57 | 28.83 | 29.0 | 175.00 | 45.32 | gdrive | log |
FlashOCCV2-Depth (1f) | R50 | 256x704 | 34.93 | 28.91 | 22.6 | 269.47 | 50.12 | gdrive | log |
FlashOCCV2-4D-Depth (2f) | R50 | 256x704 | 35.99 | 29.57 | 22.0 | - | - | gdrive | log |
FlashOCCV2-4DLongterm-Depth (8f) | R50 | 256x704 | 38.51 | 31.49 | 20.3 | - | - | gdrive | log |
FlashOCCV2-4DLongterm-Depth (16f) | R50 | 256x704 | 38.31 | 31.55 | 19.2 | - | - | gdrive | log |
- Please note that the FPS here is measured with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz and an NVIDIA RTX 3090 GPU (PyTorch FP32 backend).
Backend | mIoU | FPS (Hz) |
---|---|---|
PyTorch-FP32 | 31.95 | - |
TRT-FP32 | 30.78 | 96.2 |
TRT-FP16 | 30.78 | 197.6 |
TRT-FP16+INT8(PTQ) | 29.60 | 383.7 |
TRT-INT8(PTQ) | 29.59 | 397.0 |
A detailed video can be found at baidu.
Many thanks to the authors of BEVDet, FB-BEV, RenderOcc, and SparseBEV.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{yu2023flashocc,
title={FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin},
author={Zichen Yu and Changyong Shu and Jiajun Deng and Kangjie Lu and Zongdai Liu and Jiangyong Yu and Dawei Yang and Hui Li and Yan Chen},
year={2023},
eprint={2311.12058},
archivePrefix={arXiv},
primaryClass={cs.CV}
}