Quick-Draw

Project Overview

This project was completed as part of a Computer Vision course.

In Quick Draw, an AI system tries to classify a hand-drawn doodle into one of a predetermined set of categories. This project attempts the same task using several feature extraction techniques (HOG, LBP, SIFT, SURF, and raw pixel values), the feature reduction techniques PCA and LDA, and a range of classifiers (Naive Bayes, Random Forest, SVM, XGBoost, Bagging, AdaBoost, KNN, and CNN). The classifiers are compared on several evaluation metrics: accuracy, MAP@3, CMC curve, and confusion matrix.

The project poster can be found in CV-Poster-Final.pdf.

Problem Usecase

  • Handling noisy data, and datasets with many different representations of the same class, is a challenge in Computer Vision and Machine Learning. The Quick Draw Doodle Recognition challenge is a good example of both issues: different users may draw the same object differently, and doodles may be incomplete, which resembles noisy data.
  • The application can serve as a fast prototyping tool for designers or artists by suggesting accurate templates based on their rough doodles.
  • It can be extended by replacing the doodles with hand-drawn letters of the alphabet, so that handwritten text is converted into digital text.

Dataset

  • The Quick Draw dataset is a collection of millions of doodle drawings across 300+ categories. The drawings made by the players were captured as 28 x 28 grayscale images and stored in .npy format, one file per category.
  • The complete dataset is huge (~73GB), so we used only a subset of it (20 categories).
  • The dataset is split into training and test sets with an 80-20 ratio; the training set is further split into train and validation sets with a 70-30 ratio.
  • Fig above shows a doodle image of each class in our sampled dataset.
  • Dataset can be downloaded from here.
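The two-stage split described above can be sketched as follows. This is only an illustration, not the project's actual loading code: the helper name `split_indices`, the random seed, and the stand-in array are assumptions, and a real category file is assumed to hold one flattened 28 x 28 image (784 values) per row.

```python
import numpy as np

def split_indices(n_samples, test_frac=0.2, val_frac=0.3, seed=0):
    """Return shuffled (train, val, test) index arrays.

    test_frac is taken from the full set; val_frac is taken from
    whatever remains after the test set has been removed.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    test_idx, rest = order[:n_test], order[n_test:]
    n_val = int(round(len(rest) * val_frac))
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, val_idx, test_idx

# Stand-in for one category file (assumption). A real file would be read
# with np.load(path) and reshaped the same way.
doodles = np.zeros((1000, 784), dtype=np.uint8)
images = doodles.reshape(-1, 28, 28)

train_idx, val_idx, test_idx = split_indices(len(images))
print(len(train_idx), len(val_idx), len(test_idx))  # 560 240 200
```

With an 80-20 split followed by a 70-30 split, the train, validation, and test sets end up holding 56%, 24%, and 20% of the samples respectively.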

Proposed Algorithm

  • CNN Model Architecture
  • We followed a conventional computer vision pipeline to train our models. Fig. below shows the training pipeline.

  • Feature Extraction: Texture information is extracted with HOG and LBP, spatial information with SIFT and SURF, and pixel information directly from the grayscale image.

  • Preprocessing: Features are normalized with Min-Max and Z-score scaling to bring them onto a similar scale.

  • Dimensionality Reduction: PCA or LDA was applied to project the features onto a lower-dimensional space with maximum separation. For PCA, the number of components was selected by plotting the variance retained by the projected data.

  • Classification: Different classifiers were trained and tested with different parameters and feature combinations.

  • Prediction and Evaluation Metrics: Accuracy, MAP@3, and CMC curves were computed to compare the performance of the classifiers.

  • At production time, the following pipeline was used, in which contours are used to locate the drawn object.
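The preprocessing and PCA steps of the pipeline above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's code: the function names, the synthetic feature matrix, and the 95% variance threshold are all hypothetical (the source only says the number of components was chosen from a variance plot).

```python
import numpy as np

def min_max_scale(X, eps=1e-12):
    """Rescale each feature column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo + eps)

def z_score(X, eps=1e-12):
    """Shift each feature column to zero mean and scale to unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def pca_fit(X, variance_to_keep=0.95):
    """Return (mean, components, cumulative explained-variance ratio).

    Keeps the smallest number of components whose cumulative explained
    variance reaches variance_to_keep (an assumed threshold).
    """
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(s))
    return mean, vt[:k], cumulative

# Synthetic stand-in for extracted features: 200 samples, 10 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 10)) * np.arange(1, 11)

scaled = z_score(features)
mean, components, cumulative = pca_fit(scaled, variance_to_keep=0.95)
projected = (scaled - mean) @ components.T
print(projected.shape, cumulative.round(3))
```

In practice the scaling statistics and the PCA projection would be fitted on the training split only and then applied unchanged to the validation and test splits. Plotting `cumulative` against the component index gives the kind of variance curve the text refers to.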

Evaluation Metrics and Results

The following are the results of the project:

  • Confusion matrices were plotted for the best-performing classifiers.

  • The Mean Average Precision (MAP@3) score was computed for each classifier to measure performance over its top 3 predictions.

  • The CMC curve was plotted to show the identification accuracy at different ranks.

  • The accuracy of the different classifiers was used to compare performance with PCA and with LDA.
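For a single-label problem such as this one, MAP@3 and the CMC curve can both be derived from the rank at which the true class appears in each sample's sorted predictions. The sketch below assumes a hypothetical score matrix with one row per sample and one column per class; the function names and toy data are illustrative and not taken from the project.

```python
import numpy as np

def ranked_labels(scores):
    """Class indices per sample, ordered from highest to lowest score."""
    return np.argsort(-scores, axis=1)

def map_at_k(scores, y_true, k=3):
    """Mean of 1/rank when the true class is in the top k, else 0."""
    top_k = ranked_labels(scores)[:, :k]
    hits = top_k == y_true[:, None]        # at most one True per row
    ranks = np.argmax(hits, axis=1) + 1    # 1-based rank of the hit
    return float(np.mean(np.where(hits.any(axis=1), 1.0 / ranks, 0.0)))

def cmc_curve(scores, y_true):
    """Fraction of samples whose true class is within the top r ranks."""
    order = ranked_labels(scores)
    ranks = np.argmax(order == y_true[:, None], axis=1)  # 0-based rank
    counts = np.bincount(ranks, minlength=scores.shape[1])
    return np.cumsum(counts) / len(y_true)

# Toy example: 4 samples, 4 classes; true class found at ranks 1, 2, 3, 4.
scores = np.array([
    [0.70, 0.20, 0.06, 0.04],
    [0.12, 0.50, 0.30, 0.08],
    [0.40, 0.30, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
])
y_true = np.array([0, 2, 2, 3])

print(map_at_k(scores, y_true))   # (1 + 1/2 + 1/3 + 0) / 4
print(cmc_curve(scores, y_true))  # [0.25 0.5  0.75 1.  ]
```

The first CMC value equals ordinary top-1 accuracy, and the curve always reaches 1.0 at the last rank because every true class appears somewhere in the full ranking.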

Interpretation of Results

  • Among the dimensionality reduction techniques, LDA performs better than PCA because it separates the data on the basis of class labels.
  • Texture-based features gave better classification accuracy than the other features.
  • XGBoost performs best among the non-deep-learning models: the dataset contains images of multiple classes, which XGBoost learns better thanks to its boosting technique.
  • CNN gives the best performance overall, with a MAP@3 of 96.01%. This is because its kernels learn different feature representations, which help the model differentiate between the classes well.

References

  1. Lu, W., & Tran, E. (2017). Free-hand Sketch Recognition Classification.
  2. M. Eitz, J. Hays, and M. Alexa. How do humans sketch objects? ACM Trans. Graph. (Proc. SIGGRAPH), 31(4):44:1–44:10, 2012.
  3. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  4. Kim, J., Kim, B. S., & Savarese, S. (2012). Comparing image classification methods: K-nearest-neighbor and support-vector machines. Ann Arbor, 1001, 48109-2122.
  5. Ha, D., & Eck, D. (2017). A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477.

Project Team Members

  1. Anubhav Shrimal
  2. Vrutti Patel

quick-draw's Issues

ValueError: not enough values to unpack (expected 3, got 2)

Traceback (most recent call last):
  File "draw.py", line 78, in <module>
    _, contour_gs, _ = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
ValueError: not enough values to unpack (expected 3, got 2)

Environment:
Package Version
certifi 2020.6.20
chardet 3.0.4
cycler 0.10.0
idna 2.10
joblib 0.17.0
kiwisolver 1.2.0
matplotlib 3.3.2
mkl-fft 1.2.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
olefile 0.46
opencv-python 4.4.0.44
Pillow 8.0.1
pip 20.2.4
pyparsing 2.4.7
python-dateutil 2.8.1
quickdraw 0.1.0
requests 2.24.0
scikit-learn 0.23.2
scipy 1.5.3
setuptools 50.3.0.post20201006
six 1.15.0
sklearn 0.0
threadpoolctl 2.1.0
torch 1.6.0
torchvision 0.7.0
urllib3 1.25.11
wheel 0.35.1
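
The error in the traceback above is consistent with the listed opencv-python 4.4.0.44: in OpenCV 4.x, cv2.findContours returns two values (contours, hierarchy), whereas OpenCV 3.x returned three (image, contours, hierarchy). One possible workaround, sketched here with a hypothetical helper named `unpack_contours`, is to take only the last two items of whatever is returned, which works for both layouts. The helper and the commented replacement line are suggestions, not a fix from the repository.

```python
def unpack_contours(result):
    """Return (contours, hierarchy) from a cv2.findContours return value.

    OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns
    (contours, hierarchy). The last two items are the same in both cases.
    """
    contours, hierarchy = result[-2:]
    return contours, hierarchy

# Hypothetical replacement for the failing line in draw.py:
# contour_gs, _ = unpack_contours(
#     cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE))

# Both return layouts yield the same pair.
print(unpack_contours(("image", ["c1"], "hier")))  # (['c1'], 'hier')
print(unpack_contours((["c1"], "hier")))           # (['c1'], 'hier')
```

If only OpenCV 4.x needs to be supported, unpacking two values directly (`contour_gs, _ = cv2.findContours(...)`) is the simpler change.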
