Vision-Centric-BEV-Perception

Vision-Centric BEV Perception: A Survey

Introduction

(1) Datasets

(2) GEOMETRY BASED PV2BEV

Homography based PV2BEV

Public Papers:

  • IPM: Inverse perspective mapping simplifies optical flow computation and obstacle detection (Biological Cybernetics'1991) [paper]
  • DSM: Automatic Dense Visual Semantic Mapping from Street-Level Imagery (IROS'12) [paper]
  • MapV: Learning to map vehicles into bird’s eye view (ICIAP'17) [paper]
  • BridgeGAN: Generative Adversarial Frontal View to Bird View Synthesis (3DV'18) [paper] [project page]
  • VPOE: Deep learning based vehicle position and orientation estimation via inverse perspective mapping image (IV'19) [paper]
  • 3D-LaneNet: End-to-End 3D Multiple Lane Detection (ICCV'19) [paper]
  • The Right (Angled) Perspective: Improving the Understanding of Road Scenes Using Boosted Inverse Perspective Mapping (IV'19) [paper]
  • Cam2BEV: A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View (ITSC'20) [paper] [project page]
  • MonoLayout: Amodal Scene Layout from a Single Image (WACV'20) [paper] [project page]
  • MVNet: Multiview Detection with Feature Perspective Transformation (ECCV'20) [paper] [project page]
  • OGMs: Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning (WACV'21) [paper] [project page]
  • TrafCam3D: Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography (IROS'21) [paper] [project page]
  • SHOT: Stacked Homography Transformations for Multi-View Pedestrian Detection (ICCV'21) [paper]
  • HomoLoss: Homography Loss for Monocular 3D Object Detection (CVPR'22) [paper]

Chronological Overview:

Depth based PV2BEV

Public Papers:

  • OFT: Orthographic Feature Transform for Monocular 3D Object Detection (BMVC'19) [paper] [project page]
  • CaDDN: Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR'21) [paper] [project page]
  • DSGN: Deep Stereo Geometry Network for 3D Object Detection (CVPR'20) [paper] [project page]
  • Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV'20) [paper] [project page]
  • PanopticSeg: Bird’s-Eye-View Panoptic Segmentation Using Monocular Frontal View Images (RA-L'22) [paper] [project page]
  • FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV'21) [paper] [project page]
  • LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector (ICCV'21) [paper] [project page]
  • ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection (WACV'22) [paper] [project page]
  • BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View (Arxiv'21) [paper] [project page]
  • M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird’s-Eye View Representation (Arxiv'22) [paper] [project page]
  • StretchBEV: Stretching Future Instance Prediction Spatially and Temporally (ECCV'22) [paper] [project page]
  • DfM: Monocular 3D Object Detection with Depth from Motion (ECCV'22) [paper] [project page]
  • BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection (Arxiv'22) [paper] [project page]
  • BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (Arxiv'22) [paper] [project page]
  • MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones (Arxiv'22) [paper] [project page]
  • Putting People in their Place: Monocular Regression of 3D People in Depth (CVPR'22) [Code] [Project Page] [Paper] [Video] [RH Dataset]

Chronological Overview:

Benchmark Results:

(3) NETWORK BASED PV2BEV

MLP based PV2BEV

Public Papers:

  • VED: Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks (RA-L'19) [paper] [project page]
  • VPN: Cross-view Semantic Segmentation for Sensing Surroundings (IROS'20) [paper] [project page]
  • FishingNet: Future Inference of Semantic Heatmaps In Grids (Arxiv'20) [paper]
  • PON: Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks (CVPR'20) [paper] [project page]
  • STA-ST: Enabling spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation (ICRA'21) [paper]
  • HDMapNet: An Online HD Map Construction and Evaluation Framework (ICRA'22) [paper] [project page]
  • Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation (CVPR'21) [paper] [project page]
  • HFT: Lifting Perspective Representations via Hybrid Feature Transformation (Arxiv'22) [paper] [project page]

Chronological Overview:

Benchmark Results:

Transformer based PV2BEV

Public Papers:

  • STSU: Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images (ICCV'21) [paper] [project page]
  • Image2Map: Translating Images into Maps (ICRA'22) [paper] [project page]
  • DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (CoRL'21) [paper] [project page]
  • TopologyPL: Topology Preserving Local Road Network Estimation from Single Onboard Camera Image (CVPR'22) [paper] [project page]
  • PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV'22) [paper] [project page]
  • BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs (Arxiv'22) [paper]
  • PersFormer: a New Baseline for 3D Laneline Detection (ECCV'22) [paper] [project page]
  • MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer (CVPR'22) [paper] [project page]
  • MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection (Arxiv'22) [paper] [project page]
  • BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (ECCV'22) [paper] [project page]
  • GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation (ECCV'22) [paper]
  • Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection (MM'22) [paper]
  • CVT: Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR'22) [paper] [project page]
  • PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Arxiv'22) [paper] [project page]
  • Ego3RT: Learning Ego 3D Representation as Ray Tracing (ECCV'22) [paper] [project page]
  • GKT: Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer (Arxiv'22) [paper] [project page]
  • PolarDETR: Polar Parametrization for Vision-based Surround-View 3D Detection (Arxiv'22) [paper] [project page]
  • LaRa: Latents and Rays for Multi-Camera Bird’s-Eye-View Semantic Segmentation (Arxiv'22) [paper]
  • SRCN3D: Sparse R-CNN 3D Surround-View Cameras 3D Object Detection and Tracking for Autonomous Driving (Arxiv'22) [paper] [project page]
  • PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (Arxiv'22) [paper] [project page]
  • ORA3D: Overlap Region Aware Multi-view 3D Object Detection (Arxiv'22) [paper]
  • CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers (Arxiv'22) [paper]

Chronological Overview:

Benchmark Results:

(4) EXTENSION

Multi-Task Learning under BEV

  • FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV'21) [paper] [project page]
  • StretchBEV: Stretching Future Instance Prediction Spatially and Temporally (ECCV'22) [paper] [project page]
  • BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (Arxiv'22) [paper] [project page]
  • M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Bird’s-Eye View Representation (Arxiv'22) [paper] [project page]
  • STSU: Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images (ICCV'21) [paper] [project page]
  • BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (ECCV'22) [paper] [project page]
  • Ego3RT: Learning Ego 3D Representation as Ray Tracing (ECCV'22) [paper] [project page]
  • PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Arxiv'22) [paper] [project page]
  • PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (Arxiv'22) [paper] [project page]

Fusion under BEV

Multi-Modality Fusion:

  • PointPainting: Sequential Fusion for 3D Object Detection (CVPR'20) [paper] [project page]
  • 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection (ECCV'20) [paper] [project page]
  • FUTR3D: A Unified Sensor Fusion Framework for 3D Detection (Arxiv'22) [paper] [project page]
  • MVP: Multimodal Virtual Point 3D Detection (NeurIPS'21) [paper] [project page]
  • PointAugmenting: Cross-Modal Augmentation for 3D Object Detection (CVPR'21) [paper] [project page]
  • FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection (ITSC'21) [paper] [project page]
  • Unifying Voxel-based Representation with Transformer for 3D Object Detection (Arxiv'22) [paper] [project page]
  • TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers (CVPR'22) [paper] [project page]
  • AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (IJCAI'22) [paper] [project page]
  • AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection (ECCV'22) [paper] [project page]
  • CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection (WACV'21) [paper] [project page]
  • MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection (Arxiv'22) [paper] [project page]

Temporal Fusion:

  • BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection (Arxiv'22) [paper] [project page]
  • Image2Map: Translating Images into Maps (ICRA'22) [paper] [project page]
  • FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV'21) [paper] [project page]
  • Ego3RT: Learning Ego 3D Representation as Ray Tracing (ECCV'22) [paper] [project page]
  • PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (Arxiv'22) [paper] [project page]
  • BEVStitch: Understanding Bird’s-Eye View of Road Semantics using an Onboard Camera (ICRA'22) [paper] [project page]
  • PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Arxiv'22) [paper] [project page]
  • BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers (ECCV'22) [paper] [project page]
  • UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird’s-Eye-View (Arxiv'22) [paper]
  • DfM: Monocular 3D Object Detection with Depth from Motion (ECCV'22) [paper] [project page]

Multi-agent Fusion:

  • CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers (Arxiv'22) [paper]

Empirical Know-Hows

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{Ma2022VisionCentricBP,
  title={Vision-Centric BEV Perception: A Survey},
  author={Yuexin Ma and Tai Wang and Xuyang Bai and Huitong Yang and Yuenan Hou and Yaming Wang and Y. Qiao and Ruigang Yang and Dinesh Manocha and Xinge Zhu},
  year={2022}
}

Contributing

Please feel free to submit a pull request to add new papers or related project pages.

Related Repos
