Traffic Sign Recognition

Build a Traffic Sign Recognition Project

The goals / steps of this project are the following:

Load the data set (see below for links to the project data set)
Explore, summarize and visualize the data set
Design, train and test a model architecture
Use the model to make predictions on new images
Analyze the softmax probabilities of the new images
Summarize the results with a written report

Data Set Summary & Exploration

I used the pandas and numpy to analyze the data set.

The size of training set is 34799
The size of the validation set is 4410
The size of test set is 12630
The shape of a traffic sign image is (32, 32, 3)
The number of unique classes/labels in the data set is 43

Below are the distributions of train, val and test dataset:

Examples of each class

Distribution of training dataset

Distribution of validation dataset

Distribution of test dataset

Design and Test a Model Architecture

Data Normalization

I tested several ways of data normalization including transfer images into range of -0.5 to 0.5, range of 0 to 1, zero mean equal standard deviation and grayscale.

I compared the performances with different methods of data normalization and show the validation accuracy during training based on LeNet below:

Grayscale input (accuracy around 92%)

Transfer images into range of 0 to 1 (accuracy around 90%)

Transfer images into range of -0.5 to 0.5 (accuracy around 75%)

Rescale the images by zero mean and equal standard deviation (accuracy around 93%)

So I use the method of zero mean and equal standard deviation to normalize the images.

Training hyperparameters

Batch size: 128
Number of epochs trained: 20
Type of optimizer: Adam tf.train.AdamOptimizer
Learning rate: 0.001

Then, based on those training hyperparameters, I saved the model which gets Validation Accuracy = 0.936 on the validation dataset.

Besides, I tried to exploit the cross entropy loss with weights, which is to deal with the imbalanced dataset. At first I calculate the median frequence of each class and use tf.losses.softmax_cross_entropy with the corresponding weights according to the classes selected in each batch. Below is the result of LeNet with weighted loss:

LeNet with weighted loss (accuracy around 94%)

Then, based on the saved model, I evaluate its performance of the test dataset and got Test Accuracy = 0.928. Below is the confusion matrix of each class on the test dataset (for the drawing, please see the codes):

Confusion matrix of test dataset based on the trained LeNet

For further testing the performance on the web images, I downloaded five traffic sign images shown as:

Downloaded test images from website

Based on the saved LeNet model, the inference results are displayed as:

Classified result based on the trained LeNet model

It can be seen that 3 out of 5 are correctly classified.

Further, I demonstrated the top 5 predicted classes with the probabilities of those images as:

Classified result of top 5 classes based on the trained LeNet model

Discussion about the performance on the downloaded images

As we can see that, since the original downloaded images from website are of big size, after resizing to (32x32x3), the symbols inside the sign cannot be seen very clearly in the image, e.g. slippery road. This can be one reason which influences the performance of the network. Another reason is that there is other information such as copyright characters in the orginal images, which can be condisered as noise, and also influence the classification.

As comparing with the performance of the provided test images, the accuracy of downloaded 5 images is lower than that (0.6 vs 0.928). The reason is that the generalizaiton capability of the network on the website images is lower than the test images collected from the same domain as the train and validation images.

Feature map demonstration

In order to obtain the intuition that what the network learnt, I got the feature maps of the first Conv2D layer as:

Feature maps of the first Conv2D layer of the trained network on one image from website

As we can see, the active region is the borders of the traffic sign and the symbol inside it.

Improve the architecture

Next, I would like to improve the architecture for learning a more powerful feature for the traffic sign classification. Since I run the training on my laptop with the GPU GeForce GTX 960M, I should not use big network architectures.

Inspired from the utilized network in blog, I modified a little bit and achieved the performance of validation accuracy with 98.6% and test accuracy with 97.2%. Note that the training is without any data augmentation.

There are two modifications, the first one is dimension reduction of each scale of features to the same size (128) of embedded feature, and then concatenate them as a feature which combines each level of feature maps. In this way, the combined feature is just with the size 3*128=384, which is far more less parameters than this. The second one is further reduction of the number of parameters of the last 2 fully connected layers, which are 256 and 128, respectively.

Based on that, this network can be handled for training on my laptop.

Below are some results to display:

Validation accuracy during training SinNet

Confusion matrix of SinNet on the test dataset

My final model (SinNet) consisted of the following layers:

Layer	Description
Input X	32x32x3 RGB image
Convolution 1x1	1x1 stride, same padding, outputs 32x32x3
Convolution 3x3	1x1 stride, valid padding, outputs 30x30x32
RELU
Convolution 3x3	1x1 stride, valid padding, outputs 28x28x32
RELU
Max pooling	2x2 stride, outputs 14x14x32
Dropout => X	0.5
Dimension reduction => X_1	6272=>128
X => Convolution 3x3	1x1 stride, valid padding, outputs 12x12x64
RELU
Max pooling	2x2 stride, outputs 6x6x64
Dropout => X	0.5
Dimension reduction => X_2	2304=>128
X => Convolution 3x3	1x1 stride, valid padding, outputs 4x4x128
RELU
Dropout => X	0.5
Dimension reduction => X_3	2048=>128
Concatenate(X_1, X_2, X_3)	384
Fully connected	384=>256
Fully connected	256=>128
Fully connected	128=>43
Softmax	43

Summary

Combining the different level of features can be helpful for the network classification.
A linear transformation of input image with 1x1x3x3 Conv2D kernel may learn a proper color space for the classification.
Data augmentation and other training tricks, such as learning rate decay, may further improve the performance of SinNet.

jtpils / traffic_sign_classifier Goto Github PK

traffic_sign_classifier's Introduction

Traffic Sign Recognition

Data Set Summary & Exploration

Design and Test a Model Architecture

Data Normalization

Training hyperparameters

Discussion about the performance on the downloaded images

Feature map demonstration

Improve the architecture

Summary

traffic_sign_classifier's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent