
CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving

This repository contains the code for the paper 'CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving', which is currently under review at the IEEE Transactions on Intelligent Vehicles journal.

It provides the source code for training and testing the CLFT and CLFCN models.

Please note that this repository is still under maintenance. The author is focusing on his PhD thesis at the moment and will clean up the code and improve the README gradually. You can write to [email protected] for details.

TODO list: provide segmentation videos of the Waymo Open Dataset for the three models compared in the paper: CLFT, CLFCN, and Panoptic SegFormer.

Abstract

Critical research on camera-and-LiDAR-based semantic object segmentation for autonomous driving has significantly benefited from recent developments in deep learning. Specifically, the vision transformer is the novel ground-breaker that successfully brought the multi-head attention mechanism to computer vision applications. Therefore, we propose a vision-transformer-based network to carry out camera-LiDAR fusion for semantic segmentation applied to autonomous driving. Our proposal uses the novel progressive-assemble strategy of vision transformers on a double-direction network and then integrates the results in a cross-fusion strategy over the transformer decoder layers. Unlike other works in the literature, our camera-LiDAR fusion transformers have been evaluated in challenging conditions like rain and low illumination, showing robust performance. The paper reports the segmentation results over the vehicle and human classes in different modalities: camera-only, LiDAR-only, and camera-LiDAR fusion. We perform coherent controlled benchmark experiments of CLFT against other networks that are also designed for semantic segmentation. The experiments aim to evaluate the performance of CLFT independently from two perspectives: multimodal sensor fusion and backbone architectures. The quantitative assessments show that our CLFT networks yield an improvement of up to 10% in challenging dark-wet conditions when compared with the Fully-Convolutional-Neural-Network-based (FCN) camera-LiDAR fusion network. Compared to the network with a transformer backbone but single-modality input, the all-around improvement is 5-10%.

Method

The framework overview diagram will be available here as soon as the preprint of the paper is available.

Installation

The experiments were carried out on the TalTech HPC. For CLFT and CLFCN, we programmed directly upon PyTorch and avoided high-level APIs, so we believe the code should be compatible with various environments. The package versions used on the HPC are listed below:
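Independent of the exact versions, here is a minimal sanity check of the environment; this is a sketch assuming the core stack is PyTorch plus timm (timm supplies the ViT backbone names used in 'config.json'):

import timm
import torch

# Minimal environment check (a sketch; assumes PyTorch and timm are the
# core dependencies, since the backbone names in config.json are timm names).
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("timm:", timm.__version__)

# The CLFT backbone names referenced below should resolve through timm:
assert timm.list_models("vit_base_patch16_384"), "ViT backbone not found in timm"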

Dataset

RUN

The script 'visual_run.py' loads a single camera (PNG) and LiDAR (PKL) file from the folder 'test_images', then produces the segmentation result. The 'vehicle' class is rendered in green and the 'human' class in red. We provide example CLFT and FCN models for visualized prediction; a sketch of the rendering step follows.
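The snippet below is a minimal sketch of that rendering step, not the repository's actual implementation; the file names are placeholders for the samples in 'test_images', and the class indices (1 = vehicle, 2 = human) are hypothetical:

import pickle

import numpy as np
from PIL import Image

# Load one paired sample: a camera frame (PNG) and its LiDAR data (PKL).
# File names are hypothetical placeholders for the files in 'test_images'.
rgb = np.array(Image.open("test_images/sample.png").convert("RGB"))
with open("test_images/sample.pkl", "rb") as f:
    lidar = pickle.load(f)  # LiDAR input paired with the camera frame

# 'pred' stands in for the network output: an HxW map of class indices.
# A dummy all-background map is used here; the real map comes from the model.
pred = np.zeros(rgb.shape[:2], dtype=np.uint8)

overlay = rgb.copy()
overlay[pred == 1] = (0, 255, 0)  # hypothetical index 1: 'vehicle' in green
overlay[pred == 2] = (255, 0, 0)  # hypothetical index 2: 'human' in red
Image.fromarray(overlay).save("segmentation_overlay.png")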

CLFT

python visual_run.py -m <modality, choices: 'rgb' 'lidar' 'cross_fusion'> -bb dpt
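For example, to run the camera-LiDAR fusion prediction with the example CLFT model:

python visual_run.py -m cross_fusion -bb dpt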

Here is an example of the CLFT segmentation prediction:

[Image: dpt_seg_visual]

FCN

python visual_run.py -m <modality, choices: 'rgb' 'lidar' 'cross_fusion'> -bb fcn
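For example:

python visual_run.py -m cross_fusion -bb fcn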

Here is an example of the FCN segmentation prediction:

Training and Testing

The parameters related to training and testing are all defined in the file 'config.json'. Some important definitions in this file are listed here (a sketch of the file follows the list).

  • [General][sensors_modality] --> Model modality; choose 'rgb', 'lidar', or 'cross_fusion'.
  • [General][model_timm] --> The backbone of the CLFT variants. We propose base, large, and hybrid in the paper; choose 'vit_base_patch16_384', 'vit_large_patch16_384', or 'vit_base_resnet50_384'.
  • [General][emb_dim] --> The embedding dimension of the CLFT models: 768 for base and hybrid, 1024 for large.
  • [General][resume_training] --> The flag to resume training from a saved path. Set to false to train from scratch.
  • [Log] --> Where the model paths are saved.
  • [Dataset][name] --> We provide two datasets; choose 'waymo' or 'iseauto'. The pre-processing parameters of the two datasets differ.
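For reference, here is a minimal sketch of 'config.json', assuming the [Section][key] notation above maps directly to JSON nesting; the 'Log' key name ('path') is a hypothetical placeholder, and the values shown would configure CLFT-base on Waymo:

import json

# Illustrative config; nesting assumed from the [Section][key] notation above.
config = {
    "General": {
        "sensors_modality": "cross_fusion",    # 'rgb', 'lidar', or 'cross_fusion'
        "model_timm": "vit_base_patch16_384",  # CLFT-base backbone
        "emb_dim": 768,                        # 768 for base/hybrid, 1024 for large
        "resume_training": False,              # set to true to resume training
    },
    "Log": {"path": "logs/"},                  # hypothetical key: where models are saved
    "Dataset": {"name": "waymo"},              # 'waymo' or 'iseauto'
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)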

CLFT

Training.

python3 train.py -bb dpt

If you want to resume training, use the same command but set the 'resume_training' flag to true in the 'config.json' file.

Testing.

python3 test.py -bb dpt

FCN

Training.

python3 train.py -bb fcn

Testing.

python3 test.py -bb fcn

TO BE CONTINUED...
