This repository contains the code for the paper 'CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving', which is currently under review at the IEEE Transactions on Intelligent Vehicles journal.
It provides the source code for training and testing the CLFT and CLFCN models.
Please note that this repository is still under maintenance. The author is focusing on his PhD thesis at the moment and will clean up the code and optimize the README gradually. You can write to [email protected] for details.
TODO: provide segmentation videos of the Waymo Open Dataset for the three models compared in the paper: CLFT, CLFCN, and Panoptic SegFormer.
Critical research on camera-and-LiDAR-based semantic object segmentation for autonomous driving has significantly benefited from recent developments in deep learning. Specifically, the vision transformer is the novel ground-breaker that successfully brought the multi-head attention mechanism to computer vision applications. We therefore propose a vision-transformer-based network to carry out camera-LiDAR fusion for semantic segmentation in autonomous driving. Our proposal uses the novel progressive-assemble strategy of vision transformers on a double-direction network and then integrates the results in a cross-fusion strategy over the transformer decoder layers. Unlike other works in the literature, our camera-LiDAR fusion transformers have been evaluated in challenging conditions like rain and low illumination, showing robust performance. The paper reports the segmentation results over the vehicle and human classes in different modalities: camera-only, LiDAR-only, and camera-LiDAR fusion. We perform coherent controlled benchmark experiments of CLFT against other networks that are also designed for semantic segmentation. The experiments aim to evaluate the performance of CLFT independently from two perspectives: multimodal sensor fusion and backbone architectures. The quantitative assessments show that our CLFT networks yield an improvement of up to 10% in challenging dark-wet conditions compared with the Fully-Convolutional-Neural-Network-based (FCN) camera-LiDAR fusion network. Compared to the network with a transformer backbone but single-modality input, the all-around improvement is 5-10%.
The framework overview diagram will be added here as soon as the preprint of the paper is available.
The experiments were carried out on the TalTech HPC. For CLFT and CLFCN, we programmed directly upon PyTorch and avoided high-level APIs, so we believe the code should be compatible with various environments. The package versions on the HPC are listed below:
The script 'visual_run.py' loads a single camera (PNG) and LiDAR (PKL) file from the folder 'test_images', then produces the segmentation result. The 'vehicle' class is rendered in green and the 'human' class in red. We provide example CLFT and FCN models for visualized prediction.
python visual_run.py -m <modality, choices: 'rgb' 'lidar' 'cross_fusion'> -bb dpt
Here is the example of the CLFT segmentation prediction:
python visual_run.py -m <modality, choices: 'rgb' 'lidar' 'cross_fusion'> -bb fcn
Here is the example of the FCN segmentation prediction:
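The class-to-color rendering described above can be reproduced with a small helper that maps class ids in a segmentation mask to RGB colors. This is a hedged sketch, not the repository's actual implementation; the class-id assignments (0 = background, 1 = vehicle, 2 = human) are assumptions for illustration only.

```python
import numpy as np

# Assumed class-id convention: 0 = background, 1 = vehicle, 2 = human.
# The repository may use a different internal mapping.
PALETTE = {
    1: (0, 255, 0),   # vehicle -> green
    2: (255, 0, 0),   # human   -> red
}

def colorize_mask(mask: np.ndarray) -> np.ndarray:
    """Turn an (H, W) integer class mask into an (H, W, 3) RGB image."""
    rgb = np.zeros((*mask.shape, 3), dtype=np.uint8)
    for cls, color in PALETTE.items():
        rgb[mask == cls] = color
    return rgb
```

Unlisted class ids simply stay black, so the same helper works regardless of how many classes a model predicts.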
The parameters related to training and testing are all defined in the file 'config.json'. Some important definitions in this file are listed below.
- [General][sensors_modality] --> Model modality; choose 'rgb', 'lidar', or 'cross_fusion'.
- [General][model_timm] --> The backbone of the CLFT variants. We proposed base, large, and hybrid in the paper; choose 'vit_base_patch16_384', 'vit_large_patch16_384', or 'vit_base_resnet50_384'.
- [General][emb_dim] --> The embedding dimension of the CLFT models: 768 for base and hybrid, 1024 for large.
- [General][resume_training] --> The flag to resume training from a saved checkpoint. Set to false to train from scratch.
- [Log] --> The place to save the model paths.
- [Dataset][name] --> We provide two datasets; choose 'waymo' or 'iseauto'. The pre-processing parameters of the two datasets are different.
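For reference, a minimal 'config.json' fragment covering the keys described above might look like this. The key layout follows the list above; the nesting details and the specific values shown are illustrative assumptions, not the repository's defaults.

```json
{
  "General": {
    "sensors_modality": "cross_fusion",
    "model_timm": "vit_base_patch16_384",
    "emb_dim": 768,
    "resume_training": false
  },
  "Log": {
    "model_path": "logs/"
  },
  "Dataset": {
    "name": "waymo"
  }
}
```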
Training.
python3 train.py -bb dpt
To resume training, use the same command but set the 'resume_training' flag in the 'config.json' file.
Testing.
python3 test.py -bb dpt
Training.
python3 train.py -bb fcn
Testing.
python3 test.py -bb fcn