Traffic Sign Detection under Challenging Conditions

This repository hosts a pipeline for the detection, tracking and classification of traffic signs, developed for the IEEE Video & Image Processing Cup 2017. The dataset used in this project is the CURE-TSD dataset, which consists of video sequences of traffic signs under real and augmented conditions. This project finished as 2nd runner-up in the IEEE Video and Image Processing Cup 2017.

Simplified System Overview:

Introduction

Traffic sign recognition is a multi-class classification problem in which the class frequencies are practically random. It is a real-time computer vision problem of high practical interest. Nowadays, it has become an essential component of Driver Assistance Systems (DAS) and Unmanned Ground Vehicles (UGV).

Dataset

The proposed traffic sign detection architecture performs detection on video sequences from the CURE-TSD dataset.

The video sequences in the CURE-TSD dataset are grouped into two classes:

Real data corresponds to processed versions of sequences acquired from the real world (49 real sequences).
Unreal data corresponds to synthesized sequences generated in a virtual environment (49 unreal sequences).
Train-Test Split: 70-30 (34 Training Videos, 15 Test Videos)
300 frames/sequence
12 types of effects
5 different challenge levels
Total of 2,989 (49 × 12 × 5 + 49) real video sequences
Total of 2,744 (49 × 11 × 5 + 49) synthesized video sequences
Total No. of Frames: Around 1.72 Million
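
The totals in this list can be reproduced with a few lines of arithmetic. The sketch below assumes that each challenging sequence is one base sequence rendered with one effect at one level, plus one challenge-free copy per base sequence. The figure of 11 effect types for the synthesized group is inferred from the stated total of 2,744, and all names are illustrative.

```python
# Counts taken from the dataset list above.
BASE_SEQUENCES = 49        # per group (real or synthesized)
CHALLENGE_LEVELS = 5
FRAMES_PER_SEQUENCE = 300


def total_sequences(effect_types: int) -> int:
    """Challenging versions plus the challenge-free base sequences."""
    return BASE_SEQUENCES * effect_types * CHALLENGE_LEVELS + BASE_SEQUENCES


real_total = total_sequences(12)         # real group: 12 effect types
synthesized_total = total_sequences(11)  # synthesized group: 11 (inferred)
total_frames = (real_total + synthesized_total) * FRAMES_PER_SEQUENCE

print(real_total, synthesized_total, total_frames)
```

The frame count comes out at 1,719,900, which matches the "around 1.72 million" figure.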

Challenging Conditions

Algorithm Overview:

An overview of the Traffic Sign Recognition and Classification Pipeline is illustrated in the following figure:

The input to the system is a video sequence. A Frame Extractor extracts the individual frames and feeds them to the Challenge Classifier, which uses an RCNN (Recurrent Convolutional Neural Network) to classify the challenge type of the video. The decision of the Challenge Classifier is used to select the appropriate sign-type classifier.

The extracted frames are also fed to the Bounding Box Detector module, which uses an FRCNN to generate raw bounding boxes for the regions of interest. A hybrid Tracker Module keeps track of the bounding boxes from frame to frame.

The region proposals are fed to the Sign Type Classifier, which classifies the traffic signs into the relevant categories or provides a negative output if the proposal does not contain a sign. This negative feedback is also fed to the Tracker Module.

Frame Extractor

The frame extractor utilizes OpenCV to extract frames from the video sequence. These frames are utilized for training all the subsequent deep learning blocks of the system.
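
A minimal sketch of such a frame extractor is shown below, assuming the OpenCV Python bindings (`cv2`). The output naming scheme and the function names are hypothetical and not taken from the repository.

```python
from pathlib import Path


def frame_path(out_dir: str, index: int) -> Path:
    """Hypothetical naming scheme: zero-padded frame index, PNG format."""
    return Path(out_dir) / f"frame_{index:05d}.png"


def extract_frames(video_path: str, out_dir: str) -> int:
    """Write every frame of video_path into out_dir; return the frame count."""
    import cv2  # imported here so frame_path works without OpenCV installed

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    count = 0
    try:
        while True:
            ok, frame = capture.read()  # ok is False once no frame is returned
            if not ok:
                break
            cv2.imwrite(str(frame_path(out_dir, count)), frame)
            count += 1
    finally:
        capture.release()
    return count
```

For a CURE-TSD sequence, `extract_frames` would be expected to return 300, the number of frames per sequence stated above.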

Challenge Classifier

Objectives:

Classify the Challenge Type
Aid Decision Making of Sign Classifier and Tracker Module
Allow dynamic image processing to be applied to frames to remove artifacts

Network Used: RCNN (Recurrent Convolutional Neural Network)

6 Recurrent Layers with Pooling & Dropout in between

Recurrent Layer Structure:

4 Convolutional Layers
Activation Layers & Normalization Layers
Output of different Conv. Layers summed

A few of the 12 challenge type classes were combined to yield higher accuracy in classification:

No Challenge + Codec Error + Decolor
Gaussian Blur + Lens Blur
Rain + Noise

The challenge classifier was trained for a total of 9 classes instead of 12
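
The merging step amounts to a label lookup applied before training. The sketch below encodes only the three groups listed above; the merged class names are hypothetical, and any label not listed is passed through unchanged.

```python
# Groups listed above; the merged class names are hypothetical.
MERGE_GROUPS = {
    "clean_like": ("No Challenge", "Codec Error", "Decolor"),
    "blur": ("Gaussian Blur", "Lens Blur"),
    "rain_noise": ("Rain", "Noise"),
}

# Invert the groups into a label -> merged-class lookup.
LABEL_TO_CLASS = {
    label: merged
    for merged, labels in MERGE_GROUPS.items()
    for label in labels
}


def merged_class(label: str) -> str:
    """Return the merged class, or the label itself if it was not merged."""
    return LABEL_TO_CLASS.get(label, label)


print(merged_class("Lens Blur"))  # merged into the blur group
```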

Bounding Box Detector

Objective:

To Locate Region of Interest (ROI)
Provide Bounding Boxes around Possible Regions
Provide Potential Locations for Tracking

The Bounding Box Detector uses FRCNN (Faster Region-based Convolutional Neural Network, i.e. Faster R-CNN) to detect regions of interest. The "R" here stands for "Region-based", unlike the recurrent RCNN used by the Challenge Classifier. FRCNN is used instead of a typical region-based RCNN because it takes less time. The FRCNN contains two modules: the Region Proposal Network (RPN) and the Fast R-CNN detector. The RPN outputs rectangular object proposals together with their objectness scores, telling the detector module where to look. A ResNet50 network is used as the backbone of both the RPN and the detector. The maximum overlap threshold used for the RPN is 0.7 and the RPN stride is 16.
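
The two RPN parameters can be illustrated with a short sketch. It assumes that "overlap" means intersection over union (IoU) and that an anchor is labelled positive when its IoU with a ground-truth box exceeds the threshold, which is a common convention but is not confirmed here. The function names are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

RPN_STRIDE = 16        # from the text
RPN_MAX_OVERLAP = 0.7  # from the text


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cell_to_image(col: int, row: int) -> Tuple[float, float]:
    """Image-space centre of a feature-map cell, given the RPN stride."""
    return ((col + 0.5) * RPN_STRIDE, (row + 0.5) * RPN_STRIDE)


def is_positive_anchor(anchor: Box, ground_truth: Box) -> bool:
    """Assumed labelling rule: positive when IoU exceeds the threshold."""
    return iou(anchor, ground_truth) > RPN_MAX_OVERLAP
```

With a stride of 16, each feature-map cell corresponds to a 16x16 pixel patch of the input image, so very small signs occupy only a handful of cells.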

The following figures illustrate the working principle of the Bounding Box Detector:

Only the top half of each frame is searched
Logical assumption: traffic signs are placed in the upper field of view of the driver

The cropped frame is divided into two halves: left and right
The halves are passed separately to the FRCNN network
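
The cropping and splitting step can be sketched as plain coordinate arithmetic. The frame size used in the example is hypothetical, and the helper names are illustrative. Boxes detected inside a half must be shifted back into full-frame coordinates before they are tracked or classified.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive


def search_regions(width: int, height: int) -> List[Rect]:
    """Left and right halves of the top half of a frame."""
    top = height // 2
    mid = width // 2
    return [(0, 0, mid, top), (mid, 0, width, top)]


def to_frame_coords(box: Rect, region: Rect) -> Rect:
    """Shift a box detected inside a region back into full-frame coordinates."""
    dx, dy = region[0], region[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


# Hypothetical 1600x1200 frame.
left, right = search_regions(1600, 1200)
print(to_frame_coords((10, 20, 50, 60), right))
```

One consequence of this split is that a sign straddling the vertical midline is seen only partially by each half.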

Convolution:

Creates Feature Maps
Reveals sub-surface features

Output of final convolution passed to the Region Proposal Network (RPN)

A sliding window (3x3) moves across the feature map, generating a feature array
Feature array fed to paired, fully connected networks:
- Box Regression Network: Outputs co-ordinates of bounding boxes
- Box Classification Network: Decides whether proposed boxes are ROI
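
In the standard Faster R-CNN formulation, the box regression network does not output absolute coordinates directly. Instead it predicts offsets relative to a reference (anchor) box. The sketch below decodes such offsets using that standard parameterization. Whether this repository applies additional scaling to the offsets is not stated, so treat the details as an assumption.

```python
import math
from typing import Tuple

Anchor = Tuple[float, float, float, float]  # (centre x, centre y, width, height)
Deltas = Tuple[float, float, float, float]  # (tx, ty, tw, th)
Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)


def decode_box(anchor: Anchor, deltas: Deltas) -> Box:
    """Turn regression offsets into corner coordinates of a proposal."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    x = xa + tx * wa       # centre shifts are relative to the anchor size
    y = ya + ty * ha
    w = wa * math.exp(tw)  # sizes are predicted in log space
    h = ha * math.exp(th)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


print(decode_box((100.0, 100.0, 32.0, 32.0), (0.5, 0.0, 0.0, 0.0)))
```

All-zero offsets return the anchor itself, which makes the decoding easy to check by hand.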

Tracker Module

Objective:

Contingency Plan for FRCNN failure
Keeps track of potential Regions of Interest

Two Separate Tracker modules are used to improve Robustness:

Lucas-Kanade Tracker
Kalman Filtering

Tracker System compensates for FRCNN dropping boxes
Green Box is tracker-predicted position of Box
Prediction based on
- Optical Flow (Pixel Motion)
- Kalman Filtering

Kalman Filters

A Kalman filter is used to track traffic signs in the system. The Kalman filter works by alternately predicting and correcting the state of a linear process.
The dataset sometimes contains multiple signs in one frame. As the FRCNN detects multiple signs at once, there are multiple predictions and multiple measurements. The challenge is to assign each measurement of the current state to the matching prediction of the current state, which is based on the estimate of the previous state. A nearest-neighbour approach is adopted to solve this problem. Because the positions of traffic signs do not change abruptly from frame to frame, the Euclidean distance between each prediction and each measurement in the current frame is calculated. If the distance between a prediction and a measurement is less than 50 pixels, that measurement is assumed to belong to that prediction, and the measurement-prediction pair is used to update the a priori estimate into the a posteriori estimate.
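
The association step can be sketched as follows. The 50-pixel gate comes from the text. The greedy one-to-one matching (closest pairs first, each measurement used at most once) is an assumption added for illustration, as are the function and variable names.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

GATE_PIXELS = 50.0  # from the text


def associate(predictions: List[Point], measurements: List[Point]) -> Dict[int, int]:
    """Map prediction index -> measurement index by greedy nearest neighbour."""
    pairs = sorted(
        (math.hypot(p[0] - m[0], p[1] - m[1]), i, j)
        for i, p in enumerate(predictions)
        for j, m in enumerate(measurements)
    )
    matches: Dict[int, int] = {}
    used_measurements = set()
    for distance, i, j in pairs:
        if distance >= GATE_PIXELS:
            break  # pairs are sorted, so all remaining ones are too far apart
        if i in matches or j in used_measurements:
            continue
        matches[i] = j
        used_measurements.add(j)
    return matches


predictions = [(100.0, 100.0), (300.0, 120.0)]
measurements = [(305.0, 118.0), (104.0, 97.0), (600.0, 600.0)]
print(associate(predictions, measurements))
```

Predictions left without a match would fall back on the prediction step alone, and unmatched measurements could start new tracks. How the repository handles either case is not described here.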

Lucas-Kanade Method for Traffic Sign Tracking

The Lucas-Kanade method assumes that the displacement of the image contents between two nearby instants (frames) is small and approximately constant within a neighborhood of the point p under consideration. Thus the optical flow equation can be assumed to hold for all pixels within a window centered at p.

Applying the optical flow equation to every pixel in the window yields more equations than unknowns. The Lucas-Kanade method obtains a compromise solution by the least squares principle.
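
As a worked illustration of the least squares step: each pixel in the window contributes one equation Ix*vx + Iy*vy = -It, where Ix, Iy and It are the spatial and temporal image derivatives at that pixel and (vx, vy) is the unknown flow. The sketch below solves the resulting 2x2 normal equations in pure Python. It is a textbook formulation, not the repository's code, and a practical tracker would typically rely on a pyramidal library implementation instead.

```python
from typing import Sequence, Tuple


def lucas_kanade_flow(
    ix: Sequence[float], iy: Sequence[float], it: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares flow (vx, vy) from per-pixel derivatives in one window."""
    sxx = sum(a * a for a in ix)
    syy = sum(b * b for b in iy)
    sxy = sum(a * b for a, b in zip(ix, iy))
    sxt = sum(a * t for a, t in zip(ix, it))
    syt = sum(b * t for b, t in zip(iy, it))
    # Normal equations: [[sxx, sxy], [sxy, syy]] @ [vx, vy] = [-sxt, -syt]
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window lacks gradient structure in two directions")
    vx = (-syy * sxt + sxy * syt) / det
    vy = (sxy * sxt - sxx * syt) / det
    return vx, vy


# Synthetic derivatives consistent with a true flow of (2, -1).
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, 1.0]
it = [-(a * 2.0 + b * -1.0) for a, b in zip(ix, iy)]
print(lucas_kanade_flow(ix, iy, it))
```

The determinant check reflects why corner points are good features to track: along a straight edge or in a flat region the system is singular and the flow cannot be recovered.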

The Harris corner detection method is used to detect corners in the image, and the Lucas-Kanade method is used to track those points and obtain their optical flow vectors. The positions of the signs, i.e. the regions of interest (ROIs), are then obtained from the FRCNN. Finally, the Harris corners nearest to each ROI are found, and their optical flow vectors are used to estimate the new position of the ROI in the next frame.
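
The final step can be sketched as below. For simplicity the sketch uses only the single corner nearest to the ROI centre and shifts the box by that corner's flow vector. Using one corner rather than several, and measuring distance to the centre, are assumptions made for illustration. The corner positions and flow vectors are taken as given inputs.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def predict_roi(roi: Box, corners: List[Point], flows: List[Point]) -> Box:
    """Shift an ROI by the flow vector of the corner nearest to its centre."""
    if not corners:
        return roi  # no tracked points available: keep the previous position
    cx = (roi[0] + roi[2]) / 2
    cy = (roi[1] + roi[3]) / 2
    nearest = min(
        range(len(corners)),
        key=lambda k: math.hypot(corners[k][0] - cx, corners[k][1] - cy),
    )
    dx, dy = flows[nearest]
    return (roi[0] + dx, roi[1] + dy, roi[2] + dx, roi[3] + dy)


roi = (100.0, 100.0, 140.0, 140.0)
corners = [(118.0, 122.0), (400.0, 50.0)]
flows = [(3.0, -1.0), (10.0, 10.0)]
print(predict_roi(roi, corners, flows))
```

Averaging the flow of several nearby corners would be a natural refinement, since a single corner can belong to the background rather than to the sign.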

Traffic Sign Classifier

Objective:

Determine whether a region contains a sign or not
Classify the incoming bounding boxes
Be able to deal with undesired output from the FRCNN

Network: CNN (Convolutional Neural Net)

Classes: 14 Sign Types + 9 Extra Classes

Traffic Signs:

Speed Limit
Goods Vehicles
No Overtaking
No Stopping
No Parking
Stop
Bicycle
Hump
No Left
No Right
Priority To
No entry
Yield
Parking

Why Extra Classes?

FRCNN has a tendency to detect unwanted elements as potential ROI (doors, windows, rims etc.)
Certain road signs / sign-like objects exist which are not labeled in the ground truth
Finally, the CNN needs to decide whether a region is an ROI or not, to help the tracker make its decision.

Extra Classes:

Tree Leaves
Miscellaneous Road Signs
Vertical Pole-like Structures
Car Parts and windows
Car Tires
House windows
Texture Fills
Horizontal Pole-like structures
Diagonal Pole-like structures
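
The role of the extra classes can be sketched as a simple decision on the classifier output: a region is kept as a sign only if its highest-scoring class is one of the 14 sign types. The index layout (sign types first, extra classes after) and the function names are hypothetical.

```python
from typing import Sequence

NUM_SIGN_CLASSES = 14   # from the text
NUM_EXTRA_CLASSES = 9   # from the text


def predicted_class(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    if len(scores) != NUM_SIGN_CLASSES + NUM_EXTRA_CLASSES:
        raise ValueError("expected one score per class")
    return max(range(len(scores)), key=lambda k: scores[k])


def is_sign(class_index: int) -> bool:
    """Assumed layout: indices 0-13 are sign types, 14-22 are extra classes."""
    return 0 <= class_index < NUM_SIGN_CLASSES


# A region whose best score falls in an extra class is rejected.
scores = [0.0] * 23
scores[16] = 0.9
print(is_sign(predicted_class(scores)))
```

A rejected region is the negative output that the tracker can use when deciding whether to keep following a box.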

Demonstration

Accuracy Metrics

Limitations & Further Improvement

Implement Pre-Processing: Reduce Challenging artifacts
FRCNN (Bounding Box Detector):
- Use Single FRCNN instead of splitting: Effectively speed up the program
- More Anchor Box Sizes
- Take Larger Training Sample
- Tune FRCNN Parameters
- Hard Negative Mining: Retrain Model with more negative samples, reinforces Model
Tracker System:
- Implement advanced Kalman Filter that takes into account varying acceleration
- Tune parameters (Life Expectancy, Box Decay rate: Will filter bad boxes)
Sign Classifier:
- More Sign Classes: to account for sign-like objects that were detected
