Giter VIP home page Giter VIP logo

hackerearth-keep-babies-safe's Introduction

KEEP BABIES SAFE MODEL

GOAL

Develop a model to classify products into consumer products and toys and Extract and tag brand names of these products from each image. In case no brand names are mentioned, tag it as ‘Unnamed’.

NOTE : We can cluster visually similar images (image clustering) together using deep learning and clustering.

DATASET

https://www.kaggle.com/datasets/kunalgupta2616/hackerearth-dl-challenge-keep-babies-safe

CONTENT

Data for the case is available in CSV format having 1131 rows and 3 columns. It also has an images folder containing 1131 images of different products.

STEPS TAKEN

All the required libraries and packages were imported and then the required dataset for the project was loaded.

Basic EDA was carried out.

The code uses Resnet50, a pre-trained CNN, for feature extraction, we remove its topmost/head or the final layer of neurons used for prediction of classes, we then feed our image to the CNN and get a feature vector as an output, which essentially is a flattened array of all the feature maps learned by our CNN at the second last layer of Resnet50. This output vector can be fed into any clustering algorithm ( kmeans(n_cluster = 2) or agglomerative clustering) which classifies our images into the desired number of classes.

Model building was then implemented using different algorithms. Two different OCR libraries were used.

MODELS USED

The classification models used are:

  1. PyTesseract
  2. Keras-OCR
  3. Agglomerative Clustering

THEORY

Optical Character Recognition (OCR) is the process that converts an image of text into a machine-readable text format. An OCR program extracts and repurposes data from scanned documents, camera images and image-only pdfs. OCR software singles out letters on the image, puts them into words and then puts the words into sentences, thus enabling access to and editing of the original content.

ocr_flow

AGGLOMERATIVE CLUSTERING

Agglomerative Clustering is a type of Hierarchical Method of Clustering which creates a tree-like structure through decomposition. It uses distance between the nearest/farthest points in neighbouring clusters for refinement.

Agglomerative Clustering uses a bottom-up approach. It starts with each object forming its own cluster and then iteratively merges the clusters according to their similarity hence forming Large Clusters. It terminates either when all the clusters merge into a single cluster or if a certain clustering threshold is imposed.

aglo

LIBRARIES REQUIRED

  • Pandas - for data analysis
  • Numpy - for data analysis
  • matplotlib - for data visualization
  • seaborn - for data visualization
  • scikit-learn - for data analysis

VISUALIZATION

Dataset Head snapshot

Head

Samples

Sample

RESULT

tesseract_result

EVALUATION OF ACCURACY

CER calculation is based on the concept of Levenshtein distance, where we count the minimum number of character-level operations required to transform the ground truth text (aka reference text) into the OCR output. image

  • S = Number of Substitutions
  • D = Number of Deletions
  • I = Number of Insertions
  • N = Number of characters in reference text (aka ground truth)

Word Error Rate (WER) might be more applicable if it involves the transcription of paragraphs and sentences of words with meaning (e.g., pages of books, newspapers).

WER operates at the word level instead. It represents the number of word substitutions, deletions, or insertions needed to transform one sentence into another.

WER is generally well-correlated with CER (provided error rates are not excessively high), although the absolute WER value is expected to be higher than the CER value. image

TESSERACT OCR ACCURACY image

image

Keras-OCR

image

CONCLUSION

Tesseract OCR has an upper-hand over the keras-ocr mostly for high-resolution images.

After comparing the CER and WER accuracies of Tesseract OCR and Keras-OCR, we can conclude that for this case Pytesseract will be a better fit as it has a lower CER and WER as compared to Keras-OCR, hence better accuracy.

Prajwal Uday

Connect with me on Linkedin: https://www.linkedin.com/in/prajwal-uday-1b9678229/

Check out my Github profile: https://github.com/prajwal-144

hackerearth-keep-babies-safe's People

Contributors

prajwal-144 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.