
SYGNet-for-Real-time-Driving-Scene-Parsing

Executive summary

In this research, SYGNet is proposed to further strengthen the perception ability of autonomous driving under complicated road conditions. SYGNet consists of a feature extraction module and an SVD-YOLO GhostNet module; the latter combines Singular Value Decomposition (SVD), You Only Look Once (YOLO) and GhostNet. In the feature extraction module, we propose an algorithm based on VoxelNet to extract point cloud features and image features. In the SVD-YOLO GhostNet module, the image data is decomposed by SVD to obtain data with stronger spatial and environmental characteristics; YOLOv3 is then used to obtain the feature map, which is passed to GhostNet to generate more feature maps with fewer parameters. In the system model evaluation, we run experiments on public datasets, and the results show that the proposed SYGNet is more robust and further improves the accuracy of autonomous driving perception.
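
To make the SVD step concrete, the following is a minimal sketch (not the released code) of a truncated SVD of a single-channel image matrix: keeping only the largest singular values retains the dominant spatial structure. The array sizes and the rank k are illustrative assumptions.

```python
# Minimal sketch (not the released code): truncated SVD of a single-channel image.
# Keeping only the top-k singular values preserves the dominant spatial structure.
import numpy as np

def svd_lowrank(image: np.ndarray, k: int = 50) -> np.ndarray:
    """Reconstruct an (H, W) image from its k largest singular values."""
    u, s, vt = np.linalg.svd(image.astype(np.float64), full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Illustrative usage; a real pipeline would apply this per color channel.
img = np.random.rand(375, 1242)          # roughly KITTI-sized single-channel image
approx = svd_lowrank(img, k=30)
print(approx.shape)                       # (375, 1242)
```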

Code Organization

All code is written in Python 3.

Model

SYGNet is composed of two modules: a feature extraction module and an SVD-YOLO GhostNet module. The first module extracts the important perceptual scene features. The second module uses the model and trained parameters to obtain highly accurate perceptual recognition results.



  • Architecture diagram of feature extraction module.png: This figure illustrates the architecture of the feature extraction module, the first module in SYGNet. Feature extraction has two branches, a LiDAR stream and a camera stream, which extract point cloud features and image features respectively. The VFE layer designed in VoxelNet is inspired by PointNet and is built from stacked fully connected layers (FCN). Through this mapping, the features of the points inside each voxel grid are aggregated to encode the surface shape information within the voxel. Here an FCN consists of a linear layer, a batch normalization (BN) layer and a ReLU layer. Once a point-wise feature representation is obtained, element-wise max pooling is applied to obtain the locally aggregated feature. Finally, to strengthen the point-level features, the point-wise features output by the FCN layer are concatenated with the aggregated feature to obtain the final combined point-level features. The voxel feature extractor designed in this paper consists of two VFE layers (a minimal sketch of one such layer is given after this list). For the camera branch, a 2D convolutional neural network similar to ResNet extracts the image features corresponding to the point cloud data, capturing deeper image texture features for better modal fusion. The results of the two streams are merged in the feature fusion stage, where an attention fusion layer processes the combined data. The fused features are then passed to a Region Proposal Network (RPN) in the 3D box estimation stage, and finally the feature map with the extracted features is generated.

  • Architecture diagram of SVD-YOLO3 structure.png: This figure illustrates the architecture of the SVD-YOLOv3 structure in the second module of SYGNet. After the SVD-YOLO operations in this module, a GhostNet is connected at the end. GhostNet is a plug-and-play module that converts the original model into a more compact model while maintaining comparable performance (a minimal sketch of a Ghost module follows after this list).
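
As referenced in the feature extraction item above, here is a minimal sketch, under our own assumptions rather than the released code, of a VoxelNet-style VFE layer: a shared linear + BN + ReLU applied per point, element-wise max pooling over the points of each voxel, and concatenation of the point-wise and locally aggregated features. The channel sizes and point counts are illustrative.

```python
# Minimal sketch (assumptions, not the released code) of a VoxelNet-style VFE layer.
import torch
import torch.nn as nn

class VFELayer(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.units = out_channels // 2          # half the output; doubled by the concat
        self.fcn = nn.Linear(in_channels, self.units)
        self.bn = nn.BatchNorm1d(self.units)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_voxels, max_points_per_voxel, in_channels)
        n_voxels, n_points, _ = x.shape
        pointwise = torch.relu(self.bn(self.fcn(x).view(-1, self.units)))
        pointwise = pointwise.view(n_voxels, n_points, self.units)
        # Element-wise max pooling gives one locally aggregated feature per voxel.
        aggregated, _ = pointwise.max(dim=1, keepdim=True)
        # Concatenate the point-wise feature with the broadcast voxel feature.
        return torch.cat([pointwise, aggregated.expand(-1, n_points, -1)], dim=-1)

# Example: two stacked VFE layers, as described for the voxel feature extractor.
vfe = nn.Sequential(VFELayer(7, 32), VFELayer(32, 128))
voxels = torch.rand(100, 35, 7)                 # 100 voxels, 35 points, 7-dim input
print(vfe(voxels).shape)                        # torch.Size([100, 35, 128])
```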
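
For the GhostNet mentioned above, the following is a minimal sketch of a standard Ghost module (an assumption about the plug-and-play block, not the exact implementation used here): a primary convolution produces a few intrinsic feature maps, and a cheap depthwise convolution generates additional "ghost" maps from them, so more feature maps are obtained with fewer parameters.

```python
# Minimal sketch of a standard Ghost module (assumed structure, not this repo's code).
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, ratio: int = 2):
        super().__init__()
        intrinsic = out_channels // ratio
        ghost = out_channels - intrinsic
        self.primary = nn.Sequential(           # ordinary 1x1 convolution
            nn.Conv2d(in_channels, intrinsic, kernel_size=1, bias=False),
            nn.BatchNorm2d(intrinsic), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(              # depthwise "cheap operation"
            nn.Conv2d(intrinsic, ghost, kernel_size=3, padding=1,
                      groups=intrinsic, bias=False),
            nn.BatchNorm2d(ghost), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        intrinsic = self.primary(x)
        return torch.cat([intrinsic, self.cheap(intrinsic)], dim=1)

# Example: replace an ordinary 3x3 convolution that would produce 256 channels.
feat = torch.rand(1, 128, 52, 52)
print(GhostModule(128, 256)(feat).shape)  # torch.Size([1, 256, 52, 52])
```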

Data

We use the KITTI dataset, which is available at this link (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).
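
As a pointer to how the 2D annotations look, here is a minimal sketch of parsing a KITTI label_2 file (the standard object-label text format); the path and filtering shown are illustrative assumptions, not part of this repository.

```python
# Minimal sketch (assuming the standard KITTI label_2 text format) of reading the
# 2D bounding boxes used for evaluation; the file path below is illustrative.
from pathlib import Path

def read_kitti_labels(label_file: str):
    """Return a list of (class_name, (x1, y1, x2, y2)) tuples from one label file."""
    boxes = []
    for line in Path(label_file).read_text().splitlines():
        fields = line.split()
        cls, bbox = fields[0], tuple(float(v) for v in fields[4:8])
        if cls != "DontCare":
            boxes.append((cls, bbox))
    return boxes

# Example (hypothetical path into the extracted KITTI training split):
# print(read_kitti_labels("kitti/training/label_2/000000.txt"))
```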

Experiment figures

  • Figure1-ablation.png: This figure illustrates that the SVD-YOLOv3 algorithm has the best performance over the whole test process compared with YOLOv3, RCNN, SVD-YOLOv2 and SVD-RCNN. Hence, we decide to use SVD-YOLOv3 as the first component of the second module in SYGNet.

Figure1-ablation

  • Figure2-qualitative.png: This figure shows the qualitative results on the KITTI dataset. Different colors mark the different categories of recognized objects. We can see that our perceptual recognition effect is good, and the edges between two different types of objects are handled well.

Figure2-qualitative

Experiment tables

  • Table1.png: This table presents the results of the ablation experiments. We can see that the overall model is better than either of the two components alone, and the GhostNet model is also better than YOLOv3 in most cases, which shows that the combination of components in our system model can significantly improve perceptual accuracy.
Table1
  • Table2.png: This table demonstrates that the SVD-YOLOv3 algorithm has the best performance in five different modes (Cars, People, Edge, Side and Light) on the KITTI dataset, which further confirms that SVD-YOLOv3 performs well in perceptual recognition. Hence we decide to use SVD-YOLOv3 in this module.
Table2
  • Table3.png: This table shows the comparison with different methods in three different scene modes (cars, human and edge mode); it can be found that SYGNet performs well in terms of reliability, which verifies its reliability on the same dataset.
