Street View House Numbers (SVHN) Detection and Classification using CNN

This is my (not very successful) attempt to do both detection and classification of numbers in the SVHN dataset using two CNNs.

This project contains 2 parts:

  1. Use a CNN to perform bounding-box regression, predicting the top, left, width and height of the box that contains all the digits in a given image.
  2. Use the bounding box from step 1 to crop the part of the image that contains only the digits, then use another, multi-output CNN to classify the digits in the cropped image.

My original intention was that this would improve accuracy compared to feeding the entire SVHN image into a CNN and letting it predict all the digits in the image. However, the full pipeline achieved only 51% accuracy when all digits must match exactly, with individual digit accuracies of 71%, 65%, 84% and 98% for the 1st, 2nd, 3rd and 4th digit respectively (predictions are limited to a maximum of 4 digits).
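The two metrics quoted above (exact match and per-digit accuracy) can be computed along the following lines. This is a minimal sketch, not the project's evaluation code: it assumes numbers are compared position by position from the left, with positions after a number's last digit padded by a blank marker, so a correctly predicted blank counts as a hit. The function name `pipeline_accuracy` is hypothetical.

```python
import numpy as np


def pipeline_accuracy(predicted, actual, max_digits=4):
    """Return (exact-match accuracy, per-position accuracies) for digit strings.

    Assumption: strings are truncated to `max_digits`, left-aligned and padded
    with "_" (blank), then compared position by position.
    """
    def pad(number):
        return list(number[:max_digits].ljust(max_digits, "_"))

    pred = np.array([pad(p) for p in predicted])
    true = np.array([pad(a) for a in actual])
    hits = pred == true                      # shape: (n_images, max_digits)
    exact = float(hits.all(axis=1).mean())   # every position must match
    per_position = hits.mean(axis=0)         # accuracy of the 1st, 2nd, ... digit
    return exact, per_position


exact, per_position = pipeline_accuracy(
    ["1522", "135", "861", "348", "114", "23"],
    ["1502", "135", "861", "348", "114", "23"],
)
print(exact, per_position)
```

Applied to the six "worked well" rows listed further down, only 1522 vs 1502 is wrong (in the 3rd position), giving an exact-match accuracy of 5/6 and per-position accuracies of 1, 1, 5/6 and 1.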

The detection and classification pipeline:

  1. Get input image (so far, this has only been tested on test dataset images of SVHN dataset)
  2. Resize to 64x64, convert to greyscale and normalize the image
  3. Feed processed image into detection CNN to get bounding box
  4. Re-scale bounding box to image's original size
  5. Crop the region inside the bounding box and resize it to 64x64
  6. Feed the cropped and resized image to the classification CNN to get the digits
  7. Convert the CNN predictions into a human-readable format
  8. Output digits
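Steps 2 to 6 can be sketched with plain NumPy as below. This is an illustrative sketch, not the code in combi_models.py: the nearest-neighbour resize stands in for OpenCV's resizing, the BT.601 greyscale weights and the scaling to [0, 1] are assumed normalisation choices, and `detect` and `classify` are hypothetical callables wrapping the two trained CNNs. Boxes are (top, left, width, height). Greyscale conversion happens before resizing here; for a per-pixel conversion and a nearest-neighbour resize this gives the same result as the order listed above.

```python
import numpy as np

INPUT_SIZE = 64  # both CNNs take 64x64 inputs (steps 2 and 5)
GREY_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # ITU-R BT.601 (assumed)


def to_grey(img_rgb):
    """H x W x 3 image -> H x W float32 greyscale image in [0, 255]."""
    return img_rgb[..., :3].astype(np.float32) @ GREY_WEIGHTS


def resize_nearest(img, size=INPUT_SIZE):
    """Nearest-neighbour resize to size x size (a stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows][:, cols]


def rescale_box(box, orig_h, orig_w, size=INPUT_SIZE):
    """Step 4: map (top, left, width, height) from the 64x64 frame to the original image."""
    top, left, width, height = box
    sy, sx = orig_h / size, orig_w / size
    return top * sy, left * sx, width * sx, height * sy


def crop_box(img, box):
    """Step 5: cut out the box, clipped to the image bounds, and resize it to 64x64."""
    h, w = img.shape[:2]
    top, left, width, height = box
    y0, x0 = max(int(round(top)), 0), max(int(round(left)), 0)
    y1, x1 = min(int(round(top + height)), h), min(int(round(left + width)), w)
    if y1 <= y0 or x1 <= x0:  # degenerate box: fall back to the whole image
        return resize_nearest(img)
    return resize_nearest(img[y0:y1, x0:x1])


def run_pipeline(img_rgb, detect, classify):
    """Steps 2-6; `detect` and `classify` are hypothetical model wrappers."""
    grey = to_grey(img_rgb) / 255.0          # step 2: greyscale, scaled to [0, 1]
    box_64 = detect(resize_nearest(grey))    # step 3: box in the 64x64 frame
    box = rescale_box(box_64, *grey.shape)   # step 4: box in original pixels
    return classify(crop_box(grey, box))     # steps 5-6
```

Clipping in `crop_box` matters because a regression CNN can predict coordinates that fall partly outside the image.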

Examples where the detection and classification pipeline worked well:

In the images below, the bounding boxes were predicted by the detection CNN and the numbers by the classification CNN.

Image             Predicted value   Actual value
working_img1      1522              1502
working_img2      135               135
working_img3      861               861
working_img4      348               348
working_img5      114               114
working_img6      23                23

Examples where the detection and classification pipeline did not work well:

In the images below, the bounding boxes were predicted by the detection CNN and the numbers by the classification CNN.

Image             Predicted value   Actual value
not_working_img1  32                863
not_working_img2  6                 7
not_working_img3  8                 26
not_working_img4  1                 184
not_working_img5  1410              44
not_working_img6  27                6

Improvements that can be made:

  • I did not want to use YOLO for such a simple task, but the detection CNN could still be improved.
  • Augmenting the training data for the detection CNN by shifting the ground-truth bounding boxes improved its accuracy slightly (+5%); more augmentation could be explored.
  • The same could be done for the classification CNN, but it was not done in this project.
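One way to implement the bounding-box shifting mentioned above is to translate the image by a random offset and move the box by the same amount. The README does not say exactly how the shifts were generated, so this is a hypothetical sketch: `shift_augment` zero-fills the pixels that enter the frame and limits the offset so the whole box stays visible.

```python
import math

import numpy as np


def _pick_shift(rng, max_shift, start, length, size):
    """Random offset in [-max_shift, max_shift] keeping [start, start + length] inside [0, size]."""
    lo = max(-max_shift, math.ceil(-start))
    hi = min(max_shift, math.floor(size - start - length))
    return int(rng.integers(lo, hi + 1)) if lo <= hi else 0


def shift_augment(img, box, max_shift, rng):
    """Translate `img` by a random (dy, dx) and move `box` (top, left, width, height) with it."""
    h, w = img.shape[:2]
    top, left, width, height = box
    dy = _pick_shift(rng, max_shift, top, height, h)
    dx = _pick_shift(rng, max_shift, left, width, w)
    out = np.zeros_like(img)  # pixels shifted in from outside the frame stay zero
    dst = (slice(max(0, dy), h + min(0, dy)), slice(max(0, dx), w + min(0, dx)))
    src = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
    out[dst] = img[src]
    return out, (top + dy, left + dx, width, height)


rng = np.random.default_rng(0)
image = np.arange(100).reshape(10, 10)
shifted, new_box = shift_augment(image, (2, 3, 4, 4), max_shift=3, rng=rng)
print(new_box)
```

Calling this several times per training image, each with a different draw from `rng`, yields several shifted copies whose box labels stay consistent with the pixels.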

Project files:

construct_datasets.py
Uses the images downloaded from the SVHN dataset website, along with the .mat files describing the bounding boxes, to build a single table each for the train and test sets for easy use in the other files. If you don't want to run this file, download the .h5 files from the Google Drive link below.
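SVHN's full-numbers format annotates each digit with its own box, while the detection CNN needs one box around all the digits. A sketch of that merging step with pandas follows; the per-digit column layout of `digits` and the name `enclosing_boxes` are assumptions, not the actual structure used by construct_datasets.py, and named aggregation needs pandas 0.25 or later (newer than the 0.20.3 listed under Environment). SVHN labels the digit 0 as 10, hence the `% 10`.

```python
import pandas as pd


def enclosing_boxes(digits: pd.DataFrame) -> pd.DataFrame:
    """One row per image: the box around all of its digits plus the digit string.

    Assumed input: one row per digit with columns
    img_name, label, top, left, width, height.
    """
    d = digits.assign(bottom=digits["top"] + digits["height"],
                      right=digits["left"] + digits["width"])
    out = d.groupby("img_name", as_index=False, sort=False).agg(
        top=("top", "min"),
        left=("left", "min"),
        bottom=("bottom", "max"),
        right=("right", "max"),
        # Digits keep their file order within each image; label 10 means "0".
        labels=("label", lambda s: "".join(str(int(v) % 10) for v in s)),
    )
    out["width"] = out["right"] - out["left"]
    out["height"] = out["bottom"] - out["top"]
    return out.drop(columns=["bottom", "right"])


digits = pd.DataFrame({
    "img_name": ["1.png", "1.png", "2.png"],
    "label": [1, 10, 5],
    "top": [5, 7, 2], "left": [10, 20, 3],
    "width": [8, 9, 4], "height": [20, 18, 6],
})
print(enclosing_boxes(digits))
```

Grouping on a plain `img_name` column, rather than on an index that shares that name, also avoids the "both an index level and a column label" ambiguity reported in the issues below.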

train_digit_classification.py
Uses the processed .h5 files in the data folder to train the classification CNN.
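A multi-output classification CNN needs one target per digit position. A common encoding, assumed here rather than taken from train_digit_classification.py, uses 11 classes per output head: the digits 0 to 9 plus a hypothetical blank class (10) for positions after the last digit. The matching decoder is one way to carry out step 7 of the pipeline.

```python
import numpy as np

MAX_DIGITS = 4  # the pipeline predicts at most four digits
BLANK = 10      # hypothetical class index meaning "no digit in this position"


def encode_labels(number: str) -> np.ndarray:
    """Digit string -> one class index per output head, padded with BLANK."""
    classes = [int(c) for c in number[:MAX_DIGITS]]
    classes += [BLANK] * (MAX_DIGITS - len(classes))
    return np.array(classes)


def to_one_hot(classes, num_classes=11):
    """Class indices -> one-hot rows, one row per output head."""
    return np.eye(num_classes, dtype=np.float32)[classes]


def decode_predictions(head_probs) -> str:
    """Per-head probability vectors -> digit string, stopping at the first blank."""
    digits = []
    for probs in head_probs:
        cls = int(np.argmax(probs))
        if cls == BLANK:
            break  # assumes blanks only appear after the last digit
        digits.append(str(cls))
    return "".join(digits)


print(encode_labels("135"))
print(decode_predictions(to_one_hot(encode_labels("135"))))
```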

train_digit_detection.py
Uses the processed .h5 files in the data folder to train the detection CNN.
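The README does not say how the quality of the predicted bounding boxes was measured. Intersection over union (IoU) is a standard way to compare a predicted box with the ground truth; here is a minimal version for the (top, left, width, height) boxes used in this project, offered as an optional evaluation aid rather than something the original scripts are known to compute.

```python
def iou(box_a, box_b):
    """Intersection over union of two (top, left, width, height) boxes."""
    ta, la, wa, ha = box_a
    tb, lb, wb, hb = box_b
    # Overlap along each axis; zero when the boxes do not intersect.
    inter_h = max(0.0, min(ta + ha, tb + hb) - max(ta, tb))
    inter_w = max(0.0, min(la + wa, lb + wb) - max(la, lb))
    inter = inter_h * inter_w
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 5, 10, 10)))  # boxes overlapping by half their width
```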

combi_models.py
After training both networks, this file uses both networks to implement all the steps described in the pipeline section above.

Download weights and processed datasets from here:

Weights for both CNNs and .h5 files for train and test datasets are available in the link below:

CNN Weights: https://drive.google.com/open?id=1vv7vzqzGjjUqjcCZYeX_NaGrqSU1Ami2
Dataset: https://drive.google.com/open?id=1KfVqQHjimQnXdzsCtQurwmTSpMe2mmA7

Environment

Python 3.5
All code was run on Amazon EC2 Deep Learning AMI version 7 (ami-139a476c)
I also tested this on my local Windows 10 PC with the following libraries:

  • Numpy 1.13.1
  • Keras 2.0.5
  • Pandas 0.20.3
  • OpenCV 3.2.0
  • TensorFlow 1.2.1 (with GPU support)

Contributors

pavitrakumar78

Issues

Error

ValueError: 'img_name' is both an index level and a column label, which is ambiguous.

I am getting this error when I run the file construct_datasets.py

How do I solve this error?
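The script itself is not reproduced here, so this is a guess at the cause: the message appears when a DataFrame has an index level and a column with the same name and is then grouped or merged on that name. Older pandas releases (the README lists 0.20.3) resolved such a clash silently or with a warning, whereas newer releases raise this ValueError. A hypothetical helper that clears the clashing index name before the groupby or merge:

```python
import pandas as pd


def drop_ambiguous_index_name(df: pd.DataFrame, key: str = "img_name") -> pd.DataFrame:
    """Return a copy of `df` whose index levels are no longer named like column `key`."""
    if key not in df.columns:
        return df
    out = df.copy()
    out.index = out.index.set_names(
        [None if name == key else name for name in out.index.names]
    )
    return out


raw = pd.DataFrame({"img_name": ["a.png", "a.png", "b.png"], "top": [1, 2, 3]})
df = raw.set_index("img_name", drop=False)  # index and column are both named img_name
fixed = drop_ambiguous_index_name(df)
print(fixed.groupby("img_name")["top"].min())
```

Calling `reset_index(drop=True)` works too when the index carries no information beyond the column.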

Weights Link

Hi, can you share the link mentioned here:

'Weights for both CNNs and .h5 files for train and test datasets are available in the link below:'

Inaccurate prediction

Hi, I have a clear house number image, but I can't get a correct prediction.
What kind of preprocessing should I do?
Thank you for your time.

How we can use it ?

Hello,
thank you so much for sharing your project.

Can I ask you for more details about how to use it well?

Thanks in advance.

cnn models

Hi
You have not included the CNN models (model JSON).

Also, could you describe the order in which the scripts should be run? If you have already trained the model, I should only need to upload an image and test it, so where are the main testing and image loading handled?

HDF5 error / Unable to open train_data_processed.h5

Hello, when I run combi_models.py an error occurs.
Here is the error message:
Traceback (most recent call last):
  File "d:\RCB\Street-View-House-Numbers-SVHN-Detection-and-Classification-using-CNN-master\combi_models.py", line 47, in <module>
    train_data = pd.read_hdf(os.path.join(root_dir,'data','train_data_processed.h5'),'table')
  File "D:\ANACONDA\lib\site-packages\pandas\io\pytables.py", line 389, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File "D:\ANACONDA\lib\site-packages\pandas\io\pytables.py", line 740, in select
    return it.get_result()
  File "D:\ANACONDA\lib\site-packages\pandas\io\pytables.py", line 1518, in get_result
    results = self.func(self.start, self.stop, where)
  File "D:\ANACONDA\lib\site-packages\pandas\io\pytables.py", line 733, in func
    columns=columns)
  File "D:\ANACONDA\lib\site-packages\pandas\io\pytables.py", line 2995, in read
    start=_start, stop=_stop)
  File "D:\ANACONDA\lib\site-packages\pandas\io\pytables.py", line 2540, in read_array
    ret = node[0][start:stop]
  File "D:\ANACONDA\lib\site-packages\tables\vlarray.py", line 681, in __getitem__
    return self.read(start, stop, step)[0]
  File "D:\ANACONDA\lib\site-packages\tables\vlarray.py", line 821, in read
    listarr = self._read_array(start, stop, step)
  File "tables\hdf5extension.pyx", line 2155, in tables.hdf5extension.VLArray._read_array
ValueError: cannot set WRITEABLE flag to True of this array

Thank you in advance.