Traffic Sign Recognition

Build a Traffic Sign Recognition Project

The goals / steps of this project are the following:

  • Load the data set (see below for links to the project data set)
  • Explore, summarize and visualize the data set
  • Design, train and test a model architecture
  • Use the model to make predictions on new images
  • Analyze the softmax probabilities of the new images
  • Summarize the results with a written report

Data Set Summary & Exploration

I used pandas and numpy to analyze the data set; a minimal sketch of the computation is shown after the list below.

  • The size of the training set is 34799
  • The size of the validation set is 4410
  • The size of the test set is 12630
  • The shape of a traffic sign image is (32, 32, 3)
  • The number of unique classes/labels in the data set is 43
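A minimal sketch of how these numbers can be computed, assuming the pickled data has already been loaded into X_train, y_train, X_valid, y_valid, X_test and y_test (these variable names are my assumption, not necessarily the original code):

```python
import numpy as np

# Assumed variable names: X_* are image arrays, y_* are label arrays.
n_train = X_train.shape[0]           # 34799
n_validation = X_valid.shape[0]      # 4410
n_test = X_test.shape[0]             # 12630
image_shape = X_train.shape[1:]      # (32, 32, 3)
n_classes = np.unique(y_train).size  # 43

print("Training set size:", n_train)
print("Validation set size:", n_validation)
print("Test set size:", n_test)
print("Image shape:", image_shape)
print("Number of classes:", n_classes)
```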

Below are examples of each class and the class distributions of the training, validation and test datasets:

[Figure: Examples of each class]

[Figure: Distribution of the training dataset]

[Figure: Distribution of the validation dataset]

[Figure: Distribution of the test dataset]

Design and Test a Model Architecture

Data Normalization

I tested several ways of normalizing the data, including scaling the images into the range -0.5 to 0.5, scaling them into the range 0 to 1, standardizing them to zero mean and equal standard deviation, and converting them to grayscale.

I compared the performance of these normalization methods; the validation accuracy during training with LeNet for each method is shown below:

[Figure: Grayscale input (accuracy around 92%)]

[Figure: Images scaled into the range 0 to 1 (accuracy around 90%)]

[Figure: Images scaled into the range -0.5 to 0.5 (accuracy around 75%)]

[Figure: Images rescaled to zero mean and equal standard deviation (accuracy around 93%)]

So I chose zero-mean, equal-standard-deviation normalization for the images.
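
A minimal sketch of this normalization (my reconstruction, assuming per-channel statistics computed on the training set and the variable names X_train, X_valid, X_test; the original code may differ):

```python
import numpy as np

def normalize(images, mean, std):
    """Standardize images with statistics estimated on the training set."""
    return (images.astype(np.float32) - mean) / std

# Per-channel mean/std estimated on the training images only,
# then applied unchanged to the validation and test data.
train_mean = X_train.astype(np.float32).mean(axis=(0, 1, 2))
train_std = X_train.astype(np.float32).std(axis=(0, 1, 2))

X_train_norm = normalize(X_train, train_mean, train_std)
X_valid_norm = normalize(X_valid, train_mean, train_std)
X_test_norm = normalize(X_test, train_mean, train_std)
```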

Training hyperparameters

  • Batch size: 128
  • Number of epochs trained: 20
  • Optimizer: Adam (tf.train.AdamOptimizer)
  • Learning rate: 0.001

Then, using these training hyperparameters, I trained the model and saved the checkpoint that reached a validation accuracy of 0.936 on the validation dataset.
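
A minimal sketch of the training setup with these hyperparameters, written against the TensorFlow 1.x API referenced in this write-up (the placeholders x and y, the LeNet logits, and the normalized arrays X_train_norm / y_train are assumed to be defined elsewhere):

```python
import tensorflow as tf
from sklearn.utils import shuffle

EPOCHS = 20
BATCH_SIZE = 128
LEARNING_RATE = 0.001

# Assumed graph pieces: x (image placeholder), y (integer label placeholder),
# logits (LeNet output of shape [batch, 43]).
one_hot_y = tf.one_hot(y, 43)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
training_operation = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(loss_operation)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(EPOCHS):
        # Reshuffle the training data every epoch and train in mini-batches.
        X_shuf, y_shuf = shuffle(X_train_norm, y_train)
        for offset in range(0, len(X_shuf), BATCH_SIZE):
            batch_x = X_shuf[offset:offset + BATCH_SIZE]
            batch_y = y_shuf[offset:offset + BATCH_SIZE]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
```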

In addition, I experimented with a weighted cross-entropy loss to deal with the imbalanced dataset. I first computed median-frequency weights for each class and then used tf.losses.softmax_cross_entropy with the weights corresponding to the labels in each batch. Below is the result of LeNet with the weighted loss:

[Figure: LeNet with weighted loss (accuracy around 94%)]
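
A minimal sketch of the median-frequency weighting described above (my reconstruction; the exact weighting in the original code may differ):

```python
import numpy as np
import tensorflow as tf

# Median-frequency balancing: weight_c = median(freq) / freq_c,
# so rare classes get weights above 1 and frequent classes below 1.
class_counts = np.bincount(y_train, minlength=43).astype(np.float32)
class_freq = class_counts / class_counts.sum()
class_weights = np.median(class_freq) / class_freq

# Look up a weight for every example in the batch from its label.
weights_tensor = tf.constant(class_weights, dtype=tf.float32)
example_weights = tf.gather(weights_tensor, y)  # y: integer label placeholder
weighted_loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.one_hot(y, 43),
    logits=logits,
    weights=example_weights)
```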

Then, based on the saved model, I evaluated its performance on the test dataset and obtained a test accuracy of 0.928. Below is the per-class confusion matrix on the test dataset (see the code for the plotting details):

[Figure: Confusion matrix of the test dataset based on the trained LeNet]
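
A minimal sketch of how such a confusion matrix can be produced from the test-set predictions (scikit-learn and matplotlib are my choice here; the original plotting code may differ):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# y_test: true labels; test_predictions: argmax of the logits on the test set.
cm = confusion_matrix(y_test, test_predictions, labels=np.arange(43))
cm_normalized = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)

plt.figure(figsize=(10, 10))
plt.imshow(cm_normalized, cmap='Blues')
plt.colorbar()
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.title('Confusion matrix on the test set')
plt.show()
```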

To further test the performance on images from the web, I downloaded five traffic sign images, shown below:

[Figure: Test images downloaded from the web]

Based on the saved LeNet model, the inference results are displayed below:

[Figure: Classification results based on the trained LeNet model]

It can be seen that 3 out of 5 are correctly classified.

Further, I show the top five predicted classes and their probabilities for these images:

[Figure: Top-5 predictions based on the trained LeNet model]
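
A minimal sketch of how the top-5 class probabilities can be obtained with tf.nn.top_k (saver, x, logits and web_images_norm are assumed names from the rest of the notebook; additional placeholders such as a dropout keep probability may also need to be fed):

```python
import tensorflow as tf

softmax_probs = tf.nn.softmax(logits)
top5 = tf.nn.top_k(softmax_probs, k=5)

with tf.Session() as sess:
    saver.restore(sess, './lenet')  # assumed checkpoint path
    top5_values, top5_indices = sess.run(top5, feed_dict={x: web_images_norm})
    print(top5_values)   # probabilities, shape [5 images, 5]
    print(top5_indices)  # class ids,     shape [5 images, 5]
```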

Discussion about the performance on the downloaded images

As can be seen, the original images downloaded from the web are large; after resizing them to 32x32x3, the symbols inside the signs are no longer clearly visible (e.g., the slippery road sign). This is one reason the network's performance drops. Another reason is that the original images contain extra information, such as copyright text, which can be considered noise and also affects the classification.

Compared with the provided test images, the accuracy on the five downloaded images is lower (0.6 vs. 0.928). The reason is that the network generalizes less well to web images than to test images collected from the same domain as the training and validation images.

Feature map demonstration

To gain intuition about what the network has learned, I visualized the feature maps of the first Conv2D layer:

[Figure: Feature maps of the first Conv2D layer of the trained network on one image from the web]

As we can see, the activations concentrate on the border of the traffic sign and the symbol inside it.
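
A minimal sketch of how the first-layer feature maps can be pulled out of the graph for one image (conv1 is my assumed name for the first Conv2D activation tensor; saver, x and web_images_norm as before):

```python
import tensorflow as tf
import matplotlib.pyplot as plt

with tf.Session() as sess:
    saver.restore(sess, './lenet')  # assumed checkpoint path
    # conv1: the tensor holding the activations of the first Conv2D layer.
    feature_maps = sess.run(conv1, feed_dict={x: web_images_norm[:1]})

# Plot each channel of the feature map as a grayscale image.
n_maps = feature_maps.shape[-1]
cols = (n_maps + 3) // 4
for i in range(n_maps):
    plt.subplot(4, cols, i + 1)
    plt.imshow(feature_maps[0, :, :, i], cmap='gray')
    plt.axis('off')
plt.show()
```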


Improve the architecture

Next, I wanted to improve the architecture so that it learns more powerful features for traffic sign classification. Since I train on my laptop with a GeForce GTX 960M GPU, I avoid large network architectures.

Inspired by the network used in a blog post, I made a few modifications and achieved a validation accuracy of 98.6% and a test accuracy of 97.2%. Note that training was done without any data augmentation.

There are two modifications. The first is to reduce the features at each scale to an embedding of the same size (128) and then concatenate these embeddings into a single feature that combines every level of the feature maps; the combined feature then has only 3*128 = 384 dimensions, which requires far fewer parameters than concatenating the flattened feature maps directly. The second is to further shrink the last two fully connected layers to 256 and 128 units, respectively.

With these changes, the network can be trained on my laptop.

Below are some results to display:

[Figure: Validation accuracy during training of SinNet]

[Figure: Confusion matrix of SinNet on the test dataset]

My final model (SinNet) consisted of the following layers:

| Layer | Description |
|-------|-------------|
| Input X | 32x32x3 RGB image |
| Convolution 1x1 | 1x1 stride, same padding, outputs 32x32x3 |
| Convolution 3x3 | 1x1 stride, valid padding, outputs 30x30x32 |
| RELU | |
| Convolution 3x3 | 1x1 stride, valid padding, outputs 28x28x32 |
| RELU | |
| Max pooling | 2x2 stride, outputs 14x14x32 |
| Dropout => X | 0.5 |
| Dimension reduction => X_1 | 6272 => 128 |
| X => Convolution 3x3 | 1x1 stride, valid padding, outputs 12x12x64 |
| RELU | |
| Max pooling | 2x2 stride, outputs 6x6x64 |
| Dropout => X | 0.5 |
| Dimension reduction => X_2 | 2304 => 128 |
| X => Convolution 3x3 | 1x1 stride, valid padding, outputs 4x4x128 |
| RELU | |
| Dropout => X | 0.5 |
| Dimension reduction => X_3 | 2048 => 128 |
| Concatenate(X_1, X_2, X_3) | 384 |
| Fully connected | 384 => 256 |
| Fully connected | 256 => 128 |
| Fully connected | 128 => 43 |
| Softmax | 43 |
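
A minimal sketch of this architecture in the TensorFlow 1.x tf.layers API, reconstructed from the table above (layer names, initializers, the dense-layer activations and the exact dropout interpretation are my assumptions; the original implementation may differ in these details):

```python
import tensorflow as tf

def sinnet(x, keep_prob):
    """x: [batch, 32, 32, 3] images; keep_prob: dropout keep probability."""
    # 1x1 convolution: a learned linear color-space transform.
    net = tf.layers.conv2d(x, 3, 1, padding='same')                              # 32x32x3
    net = tf.layers.conv2d(net, 32, 3, padding='valid', activation=tf.nn.relu)   # 30x30x32
    net = tf.layers.conv2d(net, 32, 3, padding='valid', activation=tf.nn.relu)   # 28x28x32
    net = tf.layers.max_pooling2d(net, 2, 2)                                     # 14x14x32
    net = tf.nn.dropout(net, keep_prob)
    x1 = tf.layers.dense(tf.layers.flatten(net), 128)                            # 6272 => 128

    net = tf.layers.conv2d(net, 64, 3, padding='valid', activation=tf.nn.relu)   # 12x12x64
    net = tf.layers.max_pooling2d(net, 2, 2)                                     # 6x6x64
    net = tf.nn.dropout(net, keep_prob)
    x2 = tf.layers.dense(tf.layers.flatten(net), 128)                            # 2304 => 128

    net = tf.layers.conv2d(net, 128, 3, padding='valid', activation=tf.nn.relu)  # 4x4x128
    net = tf.nn.dropout(net, keep_prob)
    x3 = tf.layers.dense(tf.layers.flatten(net), 128)                            # 2048 => 128

    # Concatenate the per-scale embeddings into one 384-dimensional feature.
    features = tf.concat([x1, x2, x3], axis=1)                                   # 384
    fc = tf.layers.dense(features, 256, activation=tf.nn.relu)                   # 384 => 256
    fc = tf.layers.dense(fc, 128, activation=tf.nn.relu)                         # 256 => 128
    logits = tf.layers.dense(fc, 43)                                             # 128 => 43
    return logits  # softmax is applied in the loss / at inference time
```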

Summary

  1. Combining features from different levels can help the network's classification.
  2. A linear transformation of the input image with a 1x1x3x3 Conv2D kernel may learn a more suitable color space for classification.
  3. Data augmentation and other training tricks, such as learning rate decay, may further improve the performance of SinNet.
