OccNeRF

Project Page | Paper | Checkpoints & Videos

OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields

Chubin Zhang*, Juncheng Yan*, Yi Wei*, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu

Updates:

  • 🔔 2023/12/15 Initial code and paper release.

🕹 Demos

The demos are large and may take a moment to load. If they fail to load or look blurry, click the hyperlink of each demo to view the full-resolution raw video.

๐Ÿ“ Introduction

In this paper, we propose OccNeRF, a method for self-supervised multi-camera occupancy prediction. Unlike 3D occupancy labels, which cover a bounded region, raw image supervision requires handling unbounded scenes. To address this, we parameterize the reconstructed occupancy fields and reorganize the sampling strategy. Neural rendering converts the occupancy fields into multi-camera depth maps, which are supervised by multi-frame photometric consistency. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.

💡 Method

Method Pipeline:

We first use a 2D backbone to extract multi-camera features, which are lifted into 3D space by interpolation to obtain volume features. Parameterized occupancy fields are then reconstructed to describe unbounded scenes. To obtain the rendered depth and semantic maps, we perform volume rendering with our reorganized sampling strategy. The multi-frame depths are supervised by a photometric loss. For semantic prediction, we adopt a pretrained Grounded-SAM with prompt cleaning. Green arrows in the pipeline figure indicate supervision signals.
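
To make the rendering step more concrete, the sketch below composites densities sampled along a single ray into an expected depth, using the standard NeRF-style quadrature. It is a minimal illustration under stated assumptions, not the repository's implementation: the helper name `render_depth`, the toy sample values, and the weight normalisation are all hypothetical, and OccNeRF's reorganized sampling strategy and scene parameterization are not reproduced here.

```python
import math
from typing import List, Sequence


def render_depth(densities: Sequence[float], depths: Sequence[float]) -> float:
    """Composite densities sampled along one ray into an expected depth.

    Standard volume-rendering quadrature (hypothetical helper, not OccNeRF's code):
      alpha_i = 1 - exp(-sigma_i * delta_i)
      T_i     = prod_{j<i} (1 - alpha_j)
      depth   = sum_i T_i * alpha_i * t_i, normalised by the total weight
    """
    if len(densities) != len(depths) or len(depths) < 2:
        raise ValueError("need matching densities/depths with at least 2 samples")

    # Spacing between consecutive samples; the last interval reuses the previous one.
    deltas = [depths[i + 1] - depths[i] for i in range(len(depths) - 1)]
    deltas.append(deltas[-1])

    transmittance = 1.0  # fraction of light that has not yet been absorbed
    weights: List[float] = []
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha

    total = sum(weights)
    if total == 0.0:
        return depths[-1]  # empty ray: fall back to the farthest sample
    return sum(w * t for w, t in zip(weights, depths)) / total


if __name__ == "__main__":
    # Toy ray: free space until a dense surface starting at t = 3.0.
    ts = [1.0, 2.0, 3.0, 4.0]
    sigmas = [0.0, 0.0, 50.0, 50.0]
    print(render_depth(sigmas, ts))
```

In the toy ray, almost all of the rendering weight falls on the first dense sample, so the rendered depth sits at the surface. Depth maps rendered this way are what a photometric loss can supervise across frames.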

🔧 Installation

Clone this repo and install the dependencies:

git clone --recurse-submodules https://github.com/LinShan-Bin/OccNeRF.git
cd OccNeRF
conda create -n occnerf python=3.8
conda activate occnerf
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt

Our code is tested with Python 3.8, PyTorch 1.9.1 and CUDA 11.3 and can be adapted to other versions of PyTorch and CUDA with minor modifications.
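
After installation, a quick check of the active environment can confirm which versions were actually resolved. The script below is a small sketch and not part of the repository; the function name `check_environment` is hypothetical. It only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) and still runs if PyTorch is missing.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report the Python version and, if PyTorch is installed, its version and CUDA build."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if info["torch_installed"]:
        import torch  # imported lazily so the script also runs without PyTorch

        info["torch"] = torch.__version__
        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If the reported versions differ from the tested combination (Python 3.8, PyTorch 1.9.1, CUDA 11.3), minor modifications to the code may be needed, as noted above.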

๐Ÿ— Dataset Preparation

  1. Download the nuScenes v1.0 full dataset from nuScenes and link the data folder to ./data/nuscenes/nuscenes/.

  2. Download the ground truth occupancy labels from Occ3D and extract gts.tar.gz to ./data/nuscenes/gts. Note that we only use the 3D occupancy labels for validation.

  3. Generate the ground truth depth maps for validation:

    python tools/export_gt_depth_nusc.py
  4. Download the dataset index pickle file from SurroundOcc and place nuscenes_infos_train.pkl under ./data/nuscenes/. Then generate the ground truth semantic maps:

    cd GroundedSAM_OccNeRF
    bash ./run.sh
  5. Download the pretrained weights of our model from Checkpoints and move them to ./ckpts/.

  6. Refer to README.md in ./GroundedSAM_OccNeRF/ and prepare semantic prediction results of the training dataset if you want to train OccNeRF with semantic supervision.

The final folder structure should look like this:

OccNeRF/
├── ckpts/
│   ├── nusc-depth/
│   │   ├── encoder.pth
│   │   ├── depth.pth
│   ├── nusc-sem/
│   │   ├── encoder.pth
│   │   ├── depth.pth
├── data/
│   ├── nuscenes/
│   │   ├── nuscenes/
│   │   │   ├── maps/
│   │   │   ├── samples/
│   │   │   ├── sweeps/
│   │   │   ├── v1.0-trainval/
│   │   ├── gts/
│   │   ├── nuscenes_depth/
│   │   ├── nuscenes_semantic/
│   │   ├── nuscenes_infos_train.pkl
├── ...
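
Before training or evaluating, it can help to confirm that the layout above is in place. The following script is a sketch and not part of the repository: the path list is copied from the folder structure above, while the names `EXPECTED_PATHS` and `find_missing` are hypothetical. It only checks that each path exists, not that the contents are complete.

```python
from pathlib import Path
from typing import List

# Paths taken from the folder structure above, relative to the repository root.
EXPECTED_PATHS = [
    "ckpts/nusc-depth/encoder.pth",
    "ckpts/nusc-depth/depth.pth",
    "ckpts/nusc-sem/encoder.pth",
    "ckpts/nusc-sem/depth.pth",
    "data/nuscenes/nuscenes/maps",
    "data/nuscenes/nuscenes/samples",
    "data/nuscenes/nuscenes/sweeps",
    "data/nuscenes/nuscenes/v1.0-trainval",
    "data/nuscenes/gts",
    "data/nuscenes/nuscenes_depth",
    "data/nuscenes/nuscenes_semantic",
    "data/nuscenes/nuscenes_infos_train.pkl",
]


def find_missing(root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing paths:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected paths are present.")
```

Run it from the OccNeRF root directory. Symbolic links, such as the linked nuScenes data folder, count as present as long as their targets exist.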

🚀 Quick Start

Training

Train OccNeRF without semantic supervision:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-depth.txt

Training the full model requires at least 80 GB of GPU memory. With less GPU memory (e.g., 40 GB), you can train with a single frame by setting auxiliary_frame = False in the config file; see Section 4.4 of the paper for the ablation study. Evaluation can be done with 24 GB of GPU memory.

Train OccNeRF with semantic supervision:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-sem.txt

Evaluation

Evaluate the depth estimation:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-depth.txt --eval_only --load_weights_folder ckpts/nusc-depth

Evaluate the occupancy prediction:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-sem.txt --eval_only --load_weights_folder ckpts/nusc-sem
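
The training and evaluation commands above differ only in the config file, the number of processes, and two optional flags. If you launch them from a script, for example to use a different number of GPUs, a small helper can assemble the argument list. This is a sketch under that assumption; `build_command` is a hypothetical name, and it uses only the flags shown in the commands above.

```python
import shlex
from typing import List, Optional


def build_command(
    config: str,
    nproc_per_node: int = 8,
    eval_only: bool = False,
    load_weights_folder: Optional[str] = None,
) -> List[str]:
    """Assemble the launch command for run.py as an argument list."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        "run.py", "--config", config,
    ]
    if eval_only:
        cmd.append("--eval_only")
    if load_weights_folder is not None:
        cmd += ["--load_weights_folder", load_weights_folder]
    return cmd


if __name__ == "__main__":
    # Reproduces the depth evaluation command shown above.
    cmd = build_command(
        "configs/nusc-depth.txt",
        eval_only=True,
        load_weights_folder="ckpts/nusc-depth",
    )
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root. `shlex.join` requires Python 3.8 or later, which matches the tested environment.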

Visualization

Visualize the depth estimation:

python tools/export_vis_data.py  # You can modify this file to choose scenes you want to visualize. Otherwise, all validation scenes will be visualized.
python -m torch.distributed.launch --nproc_per_node=8 run_vis.py --config configs/nusc-depth.txt --load_weights_folder ckpts/nusc-depth --log_dir your_log_dir
python gen_scene_video.py scene_folder_generated_by_the_above_command

๐Ÿ™ Acknowledgement

Many thanks to these excellent projects:

📃 Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{chubin2023occnerf, 
      title   = {OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields}, 
      author  = {Chubin Zhang and Juncheng Yan and Yi Wei and Jiaxin Li and Li Liu and Yansong Tang and Yueqi Duan and Jiwen Lu},
      journal = {arXiv preprint arXiv:2312.09243},
      year    = {2023}
}
