VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos (CVPR 2023)



Installation

sudo apt install libsparsehash-dev
conda env create -f environment.yaml
conda activate visfusion

ScanNet Dataset

We use the same input data structure as NeuralRecon. You can download and extract the ScanNet v2 dataset by following the instructions at http://www.scan-net.org/ or by using the scannet_wrangling_scripts provided by SimpleRecon.

Expected directory structure of ScanNet:

DATAROOT
└───scannet
│   └───scans
│   │   └───scene0000_00
│   │       └───color
│   │       │   │   0.jpg
│   │       │   │   1.jpg
│   │       │   │   ...
│   │       │   ...
│   └───scans_test
│   │   └───scene0707_00
│   │       └───color
│   │       │   │   0.jpg
│   │       │   │   1.jpg
│   │       │   │   ...
│   │       │   ...
│   └───scannetv2_test.txt
│   └───scannetv2_train.txt
│   └───scannetv2_val.txt
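Before generating fragments, it can help to confirm that the layout above is in place. The sketch below is a hypothetical helper, not part of the VisFusion code; it only checks for the folders and split files shown in the tree, and assumes `DATAROOT` is the directory that contains `scannet/`.

```python
from pathlib import Path

# Hypothetical helper (not part of VisFusion): lists which items of the
# expected ScanNet layout are missing under DATAROOT/scannet.
SPLIT_FILES = ("scannetv2_train.txt", "scannetv2_val.txt", "scannetv2_test.txt")


def missing_scannet_items(dataroot):
    root = Path(dataroot) / "scannet"
    missing = []
    # Top-level split folders.
    for sub in ("scans", "scans_test"):
        if not (root / sub).is_dir():
            missing.append(sub)
    # Split list files sit next to the split folders.
    for name in SPLIT_FILES:
        if not (root / name).is_file():
            missing.append(name)
    # Every scene folder should contain a color/ directory of frames.
    for split in ("scans", "scans_test"):
        split_dir = root / split
        if split_dir.is_dir():
            for scene in sorted(split_dir.iterdir()):
                if scene.is_dir() and not (scene / "color").is_dir():
                    missing.append(f"{split}/{scene.name}/color")
    return missing


if __name__ == "__main__":
    import sys

    problems = missing_scannet_items(sys.argv[1] if len(sys.argv) > 1 else "DATAROOT")
    print("layout OK" if not problems else "missing: " + ", ".join(problems))
```

Saved as, say, check_layout.py (an illustrative name), it can be run as `python check_layout.py DATAROOT`; an empty result means every item in the tree was found.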

Then generate the input fragments and the ground-truth TSDFs for the training and validation splits with:

python tools/tsdf_fusion/generate_gt.py --data_path PATH_TO_SCANNET \
                                        --save_name all_tsdf_9 \
                                        --window_size 9

and for the test split with:

python tools/tsdf_fusion/generate_gt.py --test \
                                        --data_path PATH_TO_SCANNET \
                                        --save_name all_tsdf_9 \
                                        --window_size 9
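Ground-truth TSDFs of this kind are typically produced by fusing depth frames into a voxel volume. As an illustration of that general technique, here is the standard weighted running-average TSDF update (Curless and Levoy) for a single voxel. It is a scalar sketch, not code from tools/tsdf_fusion, which works on whole volumes at once; the function name `integrate_voxel`, the truncation handling, and the unit observation weight are assumptions.

```python
def integrate_voxel(tsdf_old, weight_old, depth, voxel_z, trunc, obs_weight=1.0):
    """One running-average TSDF update for a single voxel (illustrative sketch).

    depth:   measured depth at the pixel the voxel projects to.
    voxel_z: the voxel's depth in the camera frame.
    trunc:   truncation distance, in the same unit as depth.
    Returns the updated (tsdf, weight).
    """
    sdf = depth - voxel_z  # positive in front of the observed surface
    # Skip invalid depth and voxels hidden far behind the surface.
    if depth <= 0 or sdf < -trunc:
        return tsdf_old, weight_old
    tsdf_obs = min(1.0, sdf / trunc)  # normalise into [-1, 1]
    weight_new = weight_old + obs_weight
    # Weighted average of the stored value and the new observation.
    tsdf_new = (weight_old * tsdf_old + obs_weight * tsdf_obs) / weight_new
    return tsdf_new, weight_new
```

Averaging over many frames is what smooths out per-frame depth noise; the weight records how many observations have been fused into each voxel.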

Example data

We provide an example ScanNet scene (scene0785_00) for quickly trying out the code. Download it from here and unzip it into the root directory of the project.

Run the following command; the reconstructed meshes will be saved to PROJECT_PATH/results.

python main.py --cfg ./config/test.yaml \
                SCENE scene0785_00 \
                TEST.PATH ./example_data/ScanNet \
                LOGDIR ./checkpoints \
                LOADCKPT pretrained/model_000049.ckpt
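The arguments after `--cfg` are KEY VALUE pairs that override entries of the YAML config. NeuralRecon-derived code typically hands them to yacs (`CfgNode.merge_from_list`), which rejects keys it does not know, such as `LOGDIR:` with a trailing colon. The simplified parser below only illustrates the pairing and value decoding; `parse_overrides` is a hypothetical name and this is not the yacs implementation.

```python
import ast


def parse_overrides(opts):
    """Pair a flat KEY VALUE list into {dotted_key: value} (simplified sketch)."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    overrides = {}
    for key, raw in zip(opts[0::2], opts[1::2]):
        # Keys are dotted names such as TEST.PATH; reject stray punctuation.
        if not key.replace(".", "").replace("_", "").isalnum():
            raise ValueError(f"malformed key {key!r}")
        try:
            value = ast.literal_eval(raw)  # "True" -> True, "49" -> 49
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as paths and scene names
        overrides[key] = value
    return overrides


print(parse_overrides(["SCENE", "scene0785_00", "MODEL.SINGLE_LAYER_MESH", "True"]))
```

Decoding values with `ast.literal_eval` and falling back to the raw string lets booleans and numbers keep their types while paths stay as text.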

By default, the model outputs double-layer meshes (for NeuralRecon's evaluation). Set MODEL.SINGLE_LAYER_MESH=True to output single-layer meshes directly for TransformerFusion's evaluation:

python main.py --cfg ./config/test.yaml \
                SCENE scene0785_00 \
                TEST.PATH ./example_data/ScanNet \
                LOGDIR ./checkpoints \
                LOADCKPT pretrained/model_000049.ckpt \
                MODEL.SINGLE_LAYER_MESH True

Training

Change TRAIN.PATH to your own data path in config/train.yaml and start training by running ./train.sh.

train.sh:

#!/usr/bin/env bash
export CUDA_VISIBLE_DEVICES=0

python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 20 MODEL.FUSION.FUSION_ON False
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 41
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 44 TRAIN.FINETUNE_LAYER 0 MODEL.PASS_LAYERS 0
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 47 TRAIN.FINETUNE_LAYER 1 MODEL.PASS_LAYERS 1
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 50 TRAIN.FINETUNE_LAYER 2 MODEL.PASS_LAYERS 2

Training is split into five phases:

  • Phase 1 (epochs 1-20): train on single fragments (MODEL.FUSION.FUSION_ON=False).

  • Phase 2 (epochs 21-41): train the whole model with GRUFusion.

  • Phase 3 (epochs 42-44): fine-tune the first layer with GRUFusion (TRAIN.FINETUNE_LAYER=0, MODEL.PASS_LAYERS=0).

  • Phase 4 (epochs 45-47): fine-tune the second layer with GRUFusion (TRAIN.FINETUNE_LAYER=1, MODEL.PASS_LAYERS=1).

  • Phase 5 (epochs 48-50): fine-tune the third layer with GRUFusion (TRAIN.FINETUNE_LAYER=2, MODEL.PASS_LAYERS=2).
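The five phases can also be driven from Python, for example to add logging or checks between phases. This hypothetical script mirrors train.sh command for command; it assumes it is run from the project root and defaults to a dry run that only prints the commands.

```python
import os
import subprocess

# Hypothetical Python equivalent of train.sh. TRAIN.EPOCHS is the cumulative
# epoch count at the end of each phase, as in the phase list above.
PHASES = [
    (20, ["MODEL.FUSION.FUSION_ON", "False"]),
    (41, []),
    (44, ["TRAIN.FINETUNE_LAYER", "0", "MODEL.PASS_LAYERS", "0"]),
    (47, ["TRAIN.FINETUNE_LAYER", "1", "MODEL.PASS_LAYERS", "1"]),
    (50, ["TRAIN.FINETUNE_LAYER", "2", "MODEL.PASS_LAYERS", "2"]),
]


def phase_commands(cfg="./config/train.yaml"):
    """Build one argument list per training phase."""
    return [
        ["python", "main.py", "--cfg", cfg, "TRAIN.EPOCHS", str(epochs), *extra]
        for epochs, extra in PHASES
    ]


def run_all(dry_run=True, gpu="0"):
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}  # same as the export in train.sh
    for cmd in phase_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the schedule as soon as a phase fails.
            subprocess.run(cmd, check=True, env=env)
```

Call `run_all(dry_run=False)` to actually launch the phases; unlike train.sh, a failing phase stops the later ones instead of letting them start.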

Test

Change TEST.PATH to your own data path in config/test.yaml and start testing by running

python main.py --cfg ./config/test.yaml

Evaluation

We use NeuralRecon's evaluation for our main results.

python tools/evaluation.py --model ./results/scene_scannet_checkpoints_fusion_eval_49 --n_proc 16

You can print previous evaluation results with:

python tools/visualize_metrics.py --model ./results/scene_scannet_checkpoints_fusion_eval_49

Here are the 3D metrics on ScanNet obtained with the provided checkpoint using NeuralRecon's evaluation:

Acc ↓   Comp ↓   Chamfer ↓   Prec ↑   Recall ↑   F-Score ↑
5.6     10.0     7.80        0.694    0.537      0.604

and using TransformerFusion's evaluation (set MODEL.SINGLE_LAYER_MESH=True to output single-layer meshes):

Acc ↓   Comp ↓   Chamfer ↓   Prec ↑   Recall ↑   F-Score ↑
4.10    8.66     6.38        0.757    0.588      0.660
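For reference, metrics of this kind are commonly computed from point sets sampled on the predicted and ground-truth meshes: Acc is the mean distance from predicted points to their nearest ground-truth points, Comp the same in the reverse direction, Chamfer their average, and Prec/Recall the fractions of those distances below a threshold. The sketch below assumes a 5 cm threshold (0.05 with coordinates in metres) and uses brute-force nearest neighbours; the function names are hypothetical, and tools/evaluation.py may downsample, cull, or threshold differently. As a consistency check, the Chamfer values above are the mean of Acc and Comp: (5.6 + 10.0) / 2 = 7.80 and (4.10 + 8.66) / 2 = 6.38.

```python
import math


def _nearest_dists(src, dst):
    # Brute-force nearest-neighbour distance from every point in src to dst.
    return [min(math.dist(p, q) for q in dst) for p in src]


def mesh_metrics(pred, gt, threshold=0.05):
    """Acc/Comp/Chamfer/Prec/Recall/F-score on sampled 3D point sets (sketch)."""
    d_pred_to_gt = _nearest_dists(pred, gt)  # drives accuracy and precision
    d_gt_to_pred = _nearest_dists(gt, pred)  # drives completeness and recall
    acc = sum(d_pred_to_gt) / len(d_pred_to_gt)
    comp = sum(d_gt_to_pred) / len(d_gt_to_pred)
    prec = sum(d < threshold for d in d_pred_to_gt) / len(d_pred_to_gt)
    recall = sum(d < threshold for d in d_gt_to_pred) / len(d_gt_to_pred)
    fscore = 0.0 if prec + recall == 0 else 2 * prec * recall / (prec + recall)
    return {"acc": acc, "comp": comp, "chamfer": (acc + comp) / 2,
            "prec": prec, "recall": recall, "fscore": fscore}
```

For real meshes with hundreds of thousands of points, a KD-tree would replace the brute-force search; the definitions stay the same.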

ARKit data

To try the model on your own data captured with ARKit, please refer to NeuralRecon's DEMO.md for details.

python test_scene.py --cfg ./config/test_scene.yaml \
                     DATASET ARKit \
                     TEST.PATH ./example_data/ARKit_scan \
                     LOADCKPT pretrained/model_000049.ckpt

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{gao2023visfusion,
  title={VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos},
  author={Gao, Huiyu and Mao, Wei and Liu, Miaomiao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={17317--17326},
  year={2023}
}

Acknowledgment

This repository is partly based on the NeuralRecon repository. Many thanks to Jiaming Sun for the great code!
