
CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving

This repository contains the code for the paper 'CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving', which is currently under review at the IEEE Transactions on Intelligent Vehicles journal.

It provides the source code for training and testing the CLFT and CLFCN models.

Please note that this repository is still under maintenance. The author is focusing on his PhD thesis at the moment and will clean up the code and improve the README gradually. You can write to [email protected] for details.

TODO list: provide segmentation videos of the Waymo Open Dataset for the three models compared in the paper: CLFT, CLFCN, and Panoptic SegFormer.

Abstract

Critical research on camera-and-LiDAR-based semantic object segmentation for autonomous driving has significantly benefited from recent developments in deep learning. Specifically, the vision transformer is the novel ground-breaker that successfully brought the multi-head attention mechanism to computer vision applications. Therefore, we propose a vision-transformer-based network to carry out camera-LiDAR fusion for semantic segmentation applied to autonomous driving. Our proposal uses the novel progressive-assemble strategy of vision transformers on a double-direction network and then integrates the results in a cross-fusion strategy over the transformer decoder layers. Unlike other works in the literature, our camera-LiDAR fusion transformers have been evaluated in challenging conditions like rain and low illumination, showing robust performance. The paper reports the segmentation results over the vehicle and human classes in different modalities: camera-only, LiDAR-only, and camera-LiDAR fusion. We perform coherent controlled benchmark experiments of CLFT against other networks that are also designed for semantic segmentation. The experiments aim to evaluate the performance of CLFT independently from two perspectives: multimodal sensor fusion and backbone architectures. The quantitative assessments show that our CLFT networks yield an improvement of up to 10% in challenging dark-wet conditions when compared with the Fully-Convolutional-Neural-Network-based (FCN) camera-LiDAR fusion network. Compared to the network with a transformer backbone but single-modality input, the all-around improvement is 5-10%.

Method

The framework overview diagram will be available here as soon as the preprint of the paper is available.

Installation

The experiments were carried out on the TalTech HPC. For CLFT and CLFCN, we programmed directly upon PyTorch and avoided high-level APIs, so we believe the code should be compatible with various environments. The package versions used on the HPC are listed below:
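Independent of the exact versions, here is a minimal sanity check of the environment; this is a sketch assuming the core stack is PyTorch plus timm (timm supplies the ViT backbone names used in 'config.json'):

import timm
import torch

# Minimal environment check (a sketch; assumes PyTorch and timm are the
# core dependencies, since the backbone names in config.json are timm names).
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("timm:", timm.__version__)

# The CLFT backbone names referenced below should resolve through timm:
assert timm.list_models("vit_base_patch16_384"), "ViT backbone not found in timm"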

Dataset

RUN

The script 'visual_run.py' loads a single camera (PNG) and LiDAR (PKL) file from the folder 'test_images', then produces the segmentation result. The 'vehicle' class is rendered in green and the 'human' class in red. We provide example CLFT and FCN models for visualized prediction; a sketch of the rendering step follows.
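The snippet below is a minimal sketch of that rendering step, not the repository's actual implementation; the file names are placeholders for the samples in 'test_images', and the class indices (1 = vehicle, 2 = human) are hypothetical:

import pickle

import numpy as np
from PIL import Image

# Load one paired sample: a camera frame (PNG) and its LiDAR data (PKL).
# File names are hypothetical placeholders for the files in 'test_images'.
rgb = np.array(Image.open("test_images/sample.png").convert("RGB"))
with open("test_images/sample.pkl", "rb") as f:
    lidar = pickle.load(f)  # LiDAR input paired with the camera frame

# 'pred' stands in for the network output: an HxW map of class indices.
# A dummy all-background map is used here; the real map comes from the model.
pred = np.zeros(rgb.shape[:2], dtype=np.uint8)

overlay = rgb.copy()
overlay[pred == 1] = (0, 255, 0)  # hypothetical index 1: 'vehicle' in green
overlay[pred == 2] = (255, 0, 0)  # hypothetical index 2: 'human' in red
Image.fromarray(overlay).save("segmentation_overlay.png")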

CLFT

python visual_run.py -m <modality, choices: 'rgb' 'lidar' 'cross_fusion'> -bb dpt
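For example, to run the camera-LiDAR fusion prediction with the example CLFT model:

python visual_run.py -m cross_fusion -bb dpt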

Here is an example of the CLFT segmentation prediction:

[Image: dpt_seg_visual]

FCN

python visual_run.py -m <modality, choices: 'rgb' 'lidar' 'cross_fusion'> -bb fcn
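For example:

python visual_run.py -m cross_fusion -bb fcn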

Here is an example of the FCN segmentation prediction:

Training and Testing

The parameters related to training and testing are all defined in the file 'config.json'. Some important definitions in this file are listed here (a sketch of the file follows the list).

  • [General][sensors_modality] --> Model modality; choose 'rgb', 'lidar', or 'cross_fusion'.
  • [General][model_timm] --> The backbone of the CLFT variants. We propose base, large, and hybrid in the paper; choose 'vit_base_patch16_384', 'vit_large_patch16_384', or 'vit_base_resnet50_384'.
  • [General][emb_dim] --> The embedding dimension of the CLFT models: 768 for base and hybrid, 1024 for large.
  • [General][resume_training] --> The flag to resume training from a saved path. Set to false to train from scratch.
  • [Log] --> Where the model paths are saved.
  • [Dataset][name] --> We provide two datasets; choose 'waymo' or 'iseauto'. The pre-processing parameters of the two datasets differ.
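For reference, here is a minimal sketch of 'config.json', assuming the [Section][key] notation above maps directly to JSON nesting; the 'Log' key name ('path') is a hypothetical placeholder, and the values shown would configure CLFT-base on Waymo:

import json

# Illustrative config; nesting assumed from the [Section][key] notation above.
config = {
    "General": {
        "sensors_modality": "cross_fusion",    # 'rgb', 'lidar', or 'cross_fusion'
        "model_timm": "vit_base_patch16_384",  # CLFT-base backbone
        "emb_dim": 768,                        # 768 for base/hybrid, 1024 for large
        "resume_training": False,              # set to true to resume training
    },
    "Log": {"path": "logs/"},                  # hypothetical key: where models are saved
    "Dataset": {"name": "waymo"},              # 'waymo' or 'iseauto'
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)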

CLFT

Training.

python3 train.py -bb dpt

If you want to resume training, use the same command but set the 'resume_training' flag to true in the 'config.json' file.

Testing.

python3 test.py -bb dpt

FCN

Training.

python3 train.py -bb fcn

Testing.

python3 test.py -bb fcn

TO BE CONTINUED...
