Traffic Sign Detection under Challenging Conditions

This repository hosts a pipeline for the detection, tracking and classification of traffic signs, developed for the IEEE Video & Image Processing Cup 2017. The dataset used in this project is the CURE-TSD dataset, which consists of video sequences of traffic signs under real and augmented conditions. The project placed 2nd runner-up in the competition.

Simplified System Overview:

Introduction

Traffic sign recognition is a multi-class classification problem where the class frequencies are practically random. It is a real-time computer vision problem of high practical interest. Nowadays, it has become an essential component of Driver Assistance Systems (DAS) and Unmanned Ground Vehicles (UGVs).

Dataset

The proposed traffic sign detection architecture is designed to detect signs in the CURE-TSD dataset.

The video sequences in the CURE-TSD dataset are grouped into two classes:

  • Real data corresponds to processed versions of sequences acquired from the real world (49 real sequences).

  • Unreal data corresponds to synthesized sequences generated in a virtual environment (49 unreal sequences).

  • Train-Test Split: 70-30 (34 Training Videos, 15 Test Videos)

  • 300 frames/sequence

  • 12 types of effects

  • 5 different challenge levels

  • Total of 2,989 (49 × 12 × 5 + 49) real video sequences

  • Total of 2,744 (49 × 11 × 5 + 49) synthesized video sequences

  • Total No. of Frames: Around 1.72 Million
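The totals in the list above can be checked with a few lines of arithmetic. This is only a sanity check under the reading that every base sequence appears once per effect type and challenge level, plus once without any challenge; the figure of 11 effect types for the synthesized set is read off the stated total of 2,744 rather than given explicitly.

```python
# Sequence counts taken from the dataset summary above.
BASE_SEQUENCES = 49          # challenge-free sequences per data type
CHALLENGE_LEVELS = 5
FRAMES_PER_SEQUENCE = 300


def total_sequences(effect_types: int) -> int:
    """Challenge versions of every base sequence plus the clean originals."""
    return BASE_SEQUENCES * effect_types * CHALLENGE_LEVELS + BASE_SEQUENCES


real = total_sequences(12)           # 49 * 12 * 5 + 49
synthesized = total_sequences(11)    # 49 * 11 * 5 + 49 (11 is inferred)
total_frames = (real + synthesized) * FRAMES_PER_SEQUENCE

print(real, synthesized, total_frames)  # 2989 2744 1719900
```

The frame count of 1,719,900 matches the "around 1.72 million" quoted above.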

Challenging Conditions

Algorithm Overview:

An overview of the Traffic Sign Recognition and Classification Pipeline is illustrated in the following figure:

The input to the system is the video frames, which are extracted using a Frame Extractor. The Frame Extractor feeds the frames to the Challenge Classifier, which uses an RCNN (Recurrent Convolutional Neural Network) to classify the challenge type of the video. The decision of the Challenge Classifier is used to select the appropriate sign-type classifier.

The extracted frames are also fed to the Bounding Box Detector Module which uses an F-RCNN to generate Raw Bounding Boxes of the Region of Interest. A Hybrid Tracker Module is used to keep track of the bounding boxes frame-by-frame.

The region proposals are fed to the Sign Type Classifier, which classifies the traffic signs into the relevant categories or provides a negative output if the proposal does not contain a sign. This negative output is also fed back to the Tracker Module.

Frame Extractor

The frame extractor utilizes OpenCV to extract frames from the video sequence. These frames are utilized for training all the subsequent deep learning blocks of the system.

Challenge Classifier

Objectives:

  • Classify the Challenge Type
  • Aid Decision Making of Sign Classifier and Tracker Module
  • Allow application of dynamic image processing on image to remove artifacts

Network Used: RCNN (Recurrent Convolutional Neural Network)

  • 6 Recurrent Layers with Pooling & Dropout in between

Recurrent Layer Structure:

  • 4 Convolutional Layers
  • Activation Layers & Normalization Layers
  • Output of different Conv. Layers summed

A few of the 12 challenge type classes were combined to yield higher accuracy in classification:

  • No Challenge + Codec Error + Decolor
  • Gaussian Blur + Lens Blur
  • Rain + Noise

The challenge classifier was trained for a total of 9 classes instead of 12
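The merging can be written as a simple lookup table. In the sketch below the challenge names are assumed from the CURE-TSD documentation (12 challenge types plus the challenge-free case), and the merged group labels are hypothetical names chosen for illustration.

```python
# Challenge names assumed from the CURE-TSD documentation.
CHALLENGE_TYPES = [
    "No Challenge", "Decolor", "Lens Blur", "Codec Error", "Darkening",
    "Dirty Lens", "Exposure", "Gaussian Blur", "Noise", "Rain",
    "Shadow", "Snow", "Haze",
]

# Hypothetical labels for the three merged groups listed above.
MERGED = {
    "No Challenge": "clean_like",
    "Codec Error": "clean_like",
    "Decolor": "clean_like",
    "Gaussian Blur": "blur",
    "Lens Blur": "blur",
    "Rain": "rain_noise",
    "Noise": "rain_noise",
}


def merged_label(challenge):
    """Map a raw challenge name to the class the classifier is trained on."""
    return MERGED.get(challenge, challenge)


classes = sorted({merged_label(name) for name in CHALLENGE_TYPES})
print(len(classes))  # 9
```

Seven raw labels collapse into three groups and the remaining six stay as they are, which gives the 9 training classes.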

Bounding Box Detector

Objective:

  • To Locate Region of Interest (ROI)
  • Provide Bounding Boxes around Possible Regions
  • Provide Potential Locations for Tracking

The Bounding Box Detector uses an FRCNN (Faster Region-based Convolutional Neural Network, commonly written Faster R-CNN) to detect regions of interest. Note that "RCNN" in this section means Region-based CNN, not the Recurrent CNN used by the Challenge Classifier. FRCNN is used instead of a plain region-based CNN because it is faster. The FRCNN contains two modules: the Region Proposal Network (RPN) and the Fast R-CNN detector. The RPN outputs a set of rectangular object proposals together with their objectness scores, which tells the detector module where to look. A ResNet50 network is used as the backbone of both the RPN and the detector. The maximum overlap threshold used for the RPN is 0.7 and the RPN stride is 16.
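To make the overlap threshold concrete, the sketch below computes intersection over union (IoU) between two boxes and applies the 0.7 threshold. It assumes the threshold is used in the usual Faster R-CNN way, where an anchor counts as a positive training example when its IoU with a ground-truth box reaches the threshold; the function names are hypothetical.

```python
RPN_MAX_OVERLAP = 0.7  # threshold quoted in the text above


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_positive_anchor(anchor, ground_truth, threshold=RPN_MAX_OVERLAP):
    """Assumed labelling rule: positive when the IoU reaches the threshold."""
    return iou(anchor, ground_truth) >= threshold
```

For example, an anchor `(0, 0, 10, 10)` against a ground-truth box `(0, 0, 10, 8)` has an IoU of 0.8 and would count as positive under this rule.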

The following figures illustrate the working principle of the Bounding Box Detector:

  • Only top half of frames are searched
  • Logical Assumption: Traffic Signs placed in upper field of view of driver

  • Cropped frame divided into two halves: Left & Right
  • Halves separately passed to FRCNN network
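The cropping and splitting steps above amount to array slicing. The following is a minimal sketch, assuming frames are NumPy arrays of shape (height, width, channels); the helper names are hypothetical. Each crop is returned with its offset so that boxes detected in a half can be mapped back to full-frame coordinates.

```python
import numpy as np


def split_top_half(frame):
    """Return [(crop, (x_offset, y_offset)), ...] for the left and right
    halves of the top half of a frame."""
    height, width = frame.shape[:2]
    top = frame[: height // 2]
    middle = width // 2
    return [
        (top[:, :middle], (0, 0)),       # left half
        (top[:, middle:], (middle, 0)),  # right half
    ]


def to_frame_coords(box, offset):
    """Shift a box (x1, y1, x2, y2) from crop to full-frame coordinates."""
    x1, y1, x2, y2 = box
    dx, dy = offset
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

A box `(10, 5, 30, 25)` found in the right half of a 200-pixel-wide frame maps back to `(110, 5, 130, 25)`.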

Convolution:

  • Creates Feature Maps
  • Reveals sub-surface features

Output of final convolution passed to the Region Proposal Network (RPN)

  • Sliding Window (3x3) Moves across feature map, Generating Feature array
  • Feature array fed to paired, fully connected networks:
    • Box Regression Network: Outputs co-ordinates of bounding boxes
    • Box Classification Network: Decides whether proposed boxes are ROI

Tracker Module

Objective:

  • Contingency Plan for FRCNN failure
  • Keeps track of potential Regions of Interest

Two Separate Tracker modules are used to improve Robustness:

  • Lucas-Kanade Tracker
  • Kalman Filtering

  • Tracker System compensates for FRCNN dropping boxes
  • Green Box is tracker-predicted position of Box
  • Prediction based on
    • Optical Flow (Pixel Motion)
    • Kalman Filtering

Kalman Filters

A Kalman filter is used for tracking traffic signs in the system. The Kalman filter works by alternately predicting and correcting the state of a linear process.
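The predict and correct steps can be sketched as follows. This is an illustrative constant-velocity filter for a single box centre, not the repository's implementation: the state layout, the motion model and all noise values are assumptions.

```python
import numpy as np


class ConstantVelocityKalman:
    """State [x, y, vx, vy]; measurement [x, y]. Noise values are illustrative."""

    def __init__(self, x, y, dt=1.0, process_noise=1.0, measurement_noise=10.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 100.0                      # state covariance
        self.F = np.array([[1, 0, dt, 0],               # motion model
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                # measurement model
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_noise
        self.R = np.eye(2) * measurement_noise

    def predict(self):
        """A priori estimate of the position in the next frame."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def correct(self, measurement):
        """A posteriori estimate after seeing a measured position."""
        z = np.asarray(measurement, dtype=float)
        innovation = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2].copy()
```

When the detector drops a box in some frame, `predict()` can be called without `correct()` so that the track keeps moving on its estimated velocity.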
In the dataset there are sometimes multiple signs in one frame. As the FRCNN detects multiple signs at once, there are multiple predictions and multiple measurements. The challenge is to assign each measurement of the current state to the right prediction of the current state, where the prediction is based on the estimate of the previous state. The nearest-neighbour concept is adopted to solve this problem. As the positions of the traffic signs do not change abruptly from frame to frame, the Euclidean distance is calculated between the prediction and measurement points of the current frame. If the distance between a prediction and a measurement is less than 50 pixels, that measurement is assumed to belong to that prediction. The measurement-prediction pair is then used to update the a priori estimate and obtain the a posteriori estimate.
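The association step can be sketched as a greedy nearest-neighbour match with a 50-pixel gate. The one-to-one, closest-pair-first strategy below is an assumption made for this sketch; the text above only specifies the distance measure and the threshold.

```python
import numpy as np

GATE_PIXELS = 50.0  # distance threshold quoted in the text above


def associate(predictions, measurements, gate=GATE_PIXELS):
    """Greedy nearest-neighbour matching of (x, y) points.

    Returns a list of (prediction_index, measurement_index) pairs whose
    Euclidean distance is below the gate. Each index is used at most once.
    """
    predictions = np.asarray(predictions, dtype=float).reshape(-1, 2)
    measurements = np.asarray(measurements, dtype=float).reshape(-1, 2)
    if len(predictions) == 0 or len(measurements) == 0:
        return []

    # distances[i, j] = distance between prediction i and measurement j
    distances = np.linalg.norm(
        predictions[:, None, :] - measurements[None, :, :], axis=2
    )

    pairs, used_p, used_m = [], set(), set()
    # Visit candidate pairs from closest to farthest.
    for flat_index in np.argsort(distances, axis=None):
        p, m = (int(i) for i in np.unravel_index(flat_index, distances.shape))
        if distances[p, m] >= gate:
            break  # every remaining pair is at least this far apart
        if p in used_p or m in used_m:
            continue
        pairs.append((p, m))
        used_p.add(p)
        used_m.add(m)
    return pairs
```

Predictions left without a pair correspond to signs the detector missed in this frame, and unpaired measurements are candidates for new tracks.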

Lucas-Kanade Method for Traffic Sign Tracking

The Lucas-Kanade method assumes that the displacement of the image contents between two nearby instants (frames) is small and approximately constant within a neighborhood of the point p under consideration. Thus the optical flow equation can be assumed to hold for all pixels within a window centered at p.

The resulting system contains more equations than unknowns. The Lucas-Kanade method obtains a compromise solution by the least squares principle.
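The least squares step can be sketched for a single window as follows. Each pixel contributes one optical flow equation, Ix·vx + Iy·vy = −It, and the overdetermined system is solved with `numpy.linalg.lstsq`. The function name and argument layout are assumptions for this sketch.

```python
import numpy as np


def lucas_kanade_flow(Ix, Iy, It):
    """Least-squares flow (vx, vy) for one window.

    Ix, Iy: spatial image derivatives at the window pixels.
    It: temporal derivative at the same pixels.
    """
    A = np.stack([np.ravel(Ix), np.ravel(Iy)], axis=1)  # one row per pixel
    b = -np.ravel(It)
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow
```

With a 5×5 window this gives 25 equations for the 2 unknowns vx and vy. The solution is only well defined when the window contains gradients in more than one direction, which is one reason corner points are convenient to track.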

The Harris corner detection method is used to detect corners in the images, and the Lucas-Kanade method is used to track those points and obtain their optical flow vectors. The positions of the signs, i.e. the regions of interest (ROIs) in the image, are then obtained from the FRCNN. Finally, the corner points nearest to each ROI are found, and their optical flow vectors are used to estimate the new position of that ROI in the next frame.
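The last step, moving each ROI by the flow of nearby corners, can be sketched as below. The sketch assumes the corner positions and flow vectors are already available (in OpenCV they could come from `cv2.goodFeaturesToTrack` with `useHarrisDetector=True` and from `cv2.calcOpticalFlowPyrLK`), and it simplifies "nearest points" to the single corner closest to the ROI centre. The function name is hypothetical.

```python
import numpy as np


def shift_rois(rois, points, flows):
    """Move each ROI (x1, y1, x2, y2) by the flow vector of the tracked
    corner closest to the ROI centre."""
    rois = np.asarray(rois, dtype=float).reshape(-1, 4)
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    flows = np.asarray(flows, dtype=float).reshape(-1, 2)
    if len(rois) == 0 or len(points) == 0:
        return rois  # nothing to move, or no corners to move it with

    centres = np.stack(
        [(rois[:, 0] + rois[:, 2]) / 2, (rois[:, 1] + rois[:, 3]) / 2], axis=1
    )
    distances = np.linalg.norm(centres[:, None, :] - points[None, :, :], axis=2)
    nearest = np.argmin(distances, axis=1)  # index of closest corner per ROI
    shift = flows[nearest]
    return rois + np.hstack([shift, shift])  # same shift for both box corners
```

Averaging the flow of several nearby corners instead of using a single one would be a natural refinement when individual corner tracks are noisy.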

Traffic Sign Classifier

Objective:

  • Determine whether a region contains a sign or not
  • Classify the incoming bounding boxes
  • Be able to deal with undesired output from the FRCNN

Network: CNN (Convolutional Neural Net)

Classes: 14 Sign Types + 9 Extra Classes

Traffic Signs:

  1. Speed Limit
  2. Goods Vehicles
  3. No Overtaking
  4. No Stopping
  5. No Parking
  6. Stop
  7. Bicycle
  8. Hump
  9. No Left
  10. No Right
  11. Priority To
  12. No entry
  13. Yield
  14. Parking

Why Extra Classes?

  • FRCNN has a tendency to detect unwanted elements as potential ROI (doors, windows, rims etc.)
  • Certain road signs / sign like objects exist which are not labeled in the ground truth
  • Finally, the CNN needs to decide whether a proposed region is a valid ROI or not, to help the tracker make its decision.

Extra Classes:

  1. Tree Leaves
  2. Miscellaneous Road Signs
  3. Vertical Pole-like Structures
  4. Car Parts and windows
  5. Car Tires
  6. House windows
  7. Texture Fills
  8. Horizontal Pole-like structures
  9. Diagonal Pole-like structures
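The 14 sign types and 9 extra classes together give a 23-way classifier whose output also answers the sign / no-sign question. The sketch below shows one way to interpret a score vector; the index ordering (signs first, extra classes after) and the function name are assumptions.

```python
SIGN_CLASSES = [
    "Speed Limit", "Goods Vehicles", "No Overtaking", "No Stopping",
    "No Parking", "Stop", "Bicycle", "Hump", "No Left", "No Right",
    "Priority To", "No entry", "Yield", "Parking",
]
EXTRA_CLASSES = [
    "Tree Leaves", "Miscellaneous Road Signs", "Vertical Pole-like Structures",
    "Car Parts and windows", "Car Tires", "House windows", "Texture Fills",
    "Horizontal Pole-like structures", "Diagonal Pole-like structures",
]
# Assumed ordering: sign classes first, then the extra (negative) classes.
ALL_CLASSES = SIGN_CLASSES + EXTRA_CLASSES


def interpret(scores):
    """Return (label, is_sign) for a score vector with one entry per class."""
    if len(scores) != len(ALL_CLASSES):
        raise ValueError("expected one score per class")
    index = max(range(len(scores)), key=lambda i: scores[i])
    return ALL_CLASSES[index], index < len(SIGN_CLASSES)
```

A region whose best class is one of the extra classes is reported as not containing a sign, which is the negative output the tracker can use to discard a box.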

Demonstration

Accuracy Metrics

Limitations & Further Improvement

  • Implement Pre-Processing: Reduce Challenging artifacts
  • FRCNN (Bounding Box Detector):
    • Use Single FRCNN instead of splitting: Effectively speed up the program
    • More Anchor Box Sizes
    • Take Larger Training Sample
    • Tune FRCNN Parameters
    • Hard Negative Mining: Retrain Model with more negative samples, reinforces Model
  • Tracker System:
    • Implement advanced Kalman Filter that takes into account varying acceleration
    • Tune parameters (Life Expectancy, Box Decay rate: Will filter bad boxes)
  • Sign Classifier:
    • More Sign Classes: to account for sign like objects that were detected
