OCR Model for Character Recognition (Chars74K-Digital-English-Font)

Project Overview

This project builds an Optical Character Recognition (OCR) model on the Chars74K-Digital-English-Font dataset. The model uses the MobileNetV2 architecture as its base and recognizes 62 character classes: digits (0-9), uppercase letters (A-Z), and lowercase letters (a-z).

The project involves several stages, including data preparation, model training, character segmentation, and prediction. The model is trained with TensorFlow and Keras, and character segmentation is performed with OpenCV.

Project Structure

Data Preparation

The dataset is organized into one folder per character and is split into training and validation sets. The images are loaded with the ImageDataGenerator class from Keras, which normalizes the pixel values and resizes each image to the model's input size.
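
As a rough illustration of this step, the sketch below builds training and validation iterators with ImageDataGenerator. The directory arguments train_dir and val_dir, the 224x224 target size, the batch size of 32, and the helper name make_generators are all assumptions made for the example; the README does not state the values the project actually uses.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed input size; not stated in the README
BATCH_SIZE = 32        # assumed batch size


def make_generators(train_dir, val_dir, img_size=IMG_SIZE, batch_size=BATCH_SIZE):
    """Return (train, validation) iterators over class-per-folder image trees.

    train_dir and val_dir are hypothetical paths, each containing one
    subfolder per character class.
    """
    # rescale maps 8-bit pixel values from [0, 255] to [0, 1].
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

    train_gen = datagen.flow_from_directory(
        train_dir,
        target_size=img_size,      # every image is resized to this shape
        batch_size=batch_size,
        class_mode="categorical",  # one-hot labels
        shuffle=True,
    )
    val_gen = datagen.flow_from_directory(
        val_dir,
        target_size=img_size,
        batch_size=batch_size,
        class_mode="categorical",
        shuffle=False,             # keep validation order stable
    )
    return train_gen, val_gen
```

flow_from_directory assigns class indices by sorting the folder names alphanumerically, and the resulting mapping is available as train_gen.class_indices.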

Model Training

The model is built using the following steps:

  • Load the MobileNetV2 architecture as the base model.
  • Freeze the base layers so that the pre-trained weights stay unchanged during training.
  • Add custom layers, including a GlobalAveragePooling2D layer, a Dense layer, and a Dropout layer, to adapt the model to our specific task.
  • Compile the model using the Adam optimizer and categorical_crossentropy loss function.
  • Train the model using the preprocessed data from the ImageDataGenerator.
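
The steps above could look roughly like the following sketch. The helper name build_model, the Dense layer width (128), the dropout rate (0.5), the input shape, and the final 62-way softmax layer are assumptions made for illustration; the README names only the layer types, the optimizer, and the loss.

```python
import tensorflow as tf


def build_model(num_classes=62, input_shape=(224, 224, 3), weights="imagenet"):
    """MobileNetV2 base with a small classification head (illustrative values)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,  # drop the ImageNet classifier
        weights=weights,
    )
    base.trainable = False  # freeze the pre-trained layers

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)  # assumed width
    x = tf.keras.layers.Dropout(0.5)(x)                   # assumed rate
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as model.fit(train_gen, validation_data=val_gen, epochs=...), where train_gen and val_gen stand for the iterators produced during data preparation and the epoch count is left to the reader.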

Character Segmentation

Character segmentation is performed using OpenCV by:

  • Converting the Region of Interest (ROI) to grayscale.
  • Applying binary thresholding to segment the characters.
  • Finding contours and drawing bounding boxes around each character.

Character Prediction

After segmenting the characters, the model predicts the class of each character by:

  • Preprocessing each character image to match the input requirements of the model.
  • Predicting the class using the trained model.
  • Mapping the predicted class index to the corresponding character using an index_to_char dictionary.
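
The index-to-character mapping can be sketched as below. The class order (digits, then uppercase, then lowercase) is an assumption; the real order depends on how the dataset folders are named and sorted, so it should be checked against the class indices reported by the data loader. The helper decode_characters is hypothetical and takes the probability array that a call like model.predict would return for a batch of preprocessed character images.

```python
import string

import numpy as np

# Assumed class order: 0-9, then A-Z, then a-z (62 classes in total).
CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase
index_to_char = dict(enumerate(CHARS))


def decode_characters(probabilities):
    """Map a (num_chars, 62) array of class probabilities to a string."""
    probabilities = np.asarray(probabilities)
    class_ids = probabilities.argmax(axis=1)  # most likely class per character
    return "".join(index_to_char[int(i)] for i in class_ids)
```

If the boxes from segmentation are processed left to right, the returned string reads in the same order as the text in the image.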

Visualization

The segmented characters and their bounding boxes are displayed with cv2_imshow, the Google Colab helper that replaces cv2.imshow in notebooks.

Contributors

catorch
