This project forked from kanikakj/document-interaction-assistant

Our Document Interaction Assistant optimizes document tasks with advanced machine learning, OCR, and efficient recognition of various documents. The project prioritizes user-friendly interactions with PDFs and images, featuring "Read Aloud" for enhanced accessibility and "Document Summarization" for efficient and concise summaries.

Document-Interaction-Assistant

Table of Contents
  1. About The Project
  2. Salient Features
  3. Data Description
  4. Data Preprocessing
  5. Document Classification Model
  6. Results
  7. Information extraction model
  8. Team

About the project

We propose a model that identifies the documents contained in a PDF or image made up of several documents. The input PDF is first split into individual pages, and a CNN model classifies each page into the appropriate document category. The data in each document is then extracted using OCR (optical character recognition). The approach is proposed for five document types, including voter ID, driver's license, PAN, and Aadhar. Each page of the input PDF must contain a single document; the only exception is the front and back of the same document. Initially, our document classification model achieved an accuracy of 0.7342 on the training set and 0.7736 on the validation set, with gains of 0.6923 and losses of 0.8340.
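The page-by-page flow described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `process_pages`, `classify_page`, and `extract_text` are hypothetical names, and the classifier and OCR step are passed in as plain callables so the sketch stays independent of any particular model or OCR engine.

```python
from collections import defaultdict


def process_pages(pages, classify_page, extract_text):
    """Classify each page, run OCR on it, and group the results by document type.

    `pages` is any sequence of page objects (for example, page images).
    `classify_page` and `extract_text` are hypothetical callables standing in
    for the CNN classifier and the OCR engine.
    """
    documents = defaultdict(list)
    for page_number, page in enumerate(pages, start=1):
        label = classify_page(page)   # e.g. "pan", "aadhar"
        text = extract_text(page)     # raw OCR text for this page
        documents[label].append({"page": page_number, "text": text})
    return dict(documents)


if __name__ == "__main__":
    # Stand-in pages and stub functions, purely for demonstration.
    demo_pages = ["pan-front", "aadhar-front", "aadhar-back"]
    result = process_pages(
        demo_pages,
        classify_page=lambda page: page.split("-")[0],
        extract_text=lambda page: page.upper(),
    )
    print(result)
```

Grouping by predicted label keeps pages of the same document type together, which is convenient when several pages of one type appear in the same PDF.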

In our ongoing efforts to improve performance, we explored the pre-trained VGG16 and VGG19 models. We applied hyperparameter tuning and added layers on top of the pre-trained models. As a result, VGG16 reached a validation loss of 0.3677 and a validation accuracy of 0.8769.

In addition to this, we incorporated two more features:

1. Read Aloud:

  • Utilizes text-to-speech technology for accessibility.
  • Translates text into spoken words.
  • Supports auditory learners and those with visual impairments.
  • Enhances accessibility and consumability.

2. Document Summarization:

  • Aids time-constrained users by condensing lengthy papers.
  • Uses Hugging Face Transformers library for NLP models.
  • Provides clear and instructive document synopses.
  • Maximizes time efficiency by distilling crucial insights.

Salient Features

Hyperparameter tuning, regularization (early stopping, dropout), document split
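As an illustration of the early-stopping idea, here is a small framework-free sketch. It assumes the common patience-based rule (stop once the validation loss has not improved for `patience` consecutive epochs), which is the same idea behind Keras's `EarlyStopping` callback. The function name and default values are illustrative and not taken from the project.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


if __name__ == "__main__":
    losses = [0.9, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]
    print(early_stopping_epoch(losses, patience=3))
```

With a small `patience`, training can stop before a later improvement arrives, so the value is itself a hyperparameter worth tuning.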

Tech stack used

  • Models: CNN, VGG16, VGG19, and the Tesseract OCR engine
  • Google TTS (text-to-speech), Hugging Face Transformers for text summarization
  • Framework: Keras

User Flow

image

Data Description

When we began searching for a suitable dataset, we found no publicly available dataset of identity documents, because such documents hold sensitive personal information. We did, however, come across a Kaggle dataset consisting of six folders: Aadhar Card, PAN Card, Voter ID, single-page Gas Bill, Passport, and Driver's License. We added a few more images to each folder: some were our own documents, which we scanned manually, and the rest came from Google Images. These are the documents we classify and extract information from.

image

Data Preprocessing

Initially, we augmented the data with random horizontal and vertical flips to increase dataset size and diversity. We have since switched to image data generators for both the training and test sets.
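To make the flip-based augmentation concrete, here is a dependency-free sketch that treats an image as a list of pixel rows. It only illustrates the idea: the project itself relies on Keras image data generators, and the function name and the probability `p` are assumptions made for this example.

```python
import random


def random_flip(image, rng, p=0.5):
    """Randomly flip an image horizontally and/or vertically.

    `image` is a list of rows (each row a list of pixel values).
    `rng` is a `random.Random` instance, so results are reproducible.
    Each flip is applied independently with probability `p`.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]  # horizontal flip: mirror each row
    if rng.random() < p:
        image = image[::-1]                   # vertical flip: reverse row order
    return image


if __name__ == "__main__":
    sample = [[1, 2], [3, 4]]
    rng = random.Random(0)
    augmented = [random_flip(sample, rng) for _ in range(4)]
    print(augmented)
```

Vertical flips can turn text upside down, so whether they help depends on how documents are scanned in practice.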

Document Classification Model

CNN model

image

Various hyperparameters, such as the number of layers, the number of neurons in each layer, the number of filters, the kernel size, the value of p in the dropout layers, the number of epochs, and the batch size, were tuned until satisfactory training and validation accuracy was achieved.

image

image

CNN Model results

image

image

VGG16

The VGG architecture combines small convolution filters with a deep structure, which allows it to capture fine details. This is crucial for distinguishing between ID documents, which often differ only subtly.

image

Four additional layers were added on top of the pre-trained model.
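A classification head on top of VGG16 could look like the sketch below. The source does not say which four layers were added, so the choice here (Flatten, Dense, Dropout, Dense) is an assumption, as are the layer sizes, the dropout rate, the input shape, the five output classes, and the function name `build_vgg16_classifier`.

```python
import tensorflow as tf


def build_vgg16_classifier(num_classes=5, input_shape=(224, 224, 3), weights="imagenet"):
    """Build a VGG16 base with a small trainable head (assumed layers)."""
    base = tf.keras.applications.VGG16(
        weights=weights,        # "imagenet" downloads pre-trained weights; None skips it
        include_top=False,      # drop VGG16's original classifier
        input_shape=input_shape,
    )
    base.trainable = False      # freeze the convolutional base

    # Assumed head: four layers added on top of the pre-trained base.
    x = tf.keras.layers.Flatten()(base.output)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    # weights=None avoids the download; use "imagenet" for transfer learning.
    model = build_vgg16_classifier(weights=None, input_shape=(64, 64, 3))
    model.summary()
```

Freezing the base means only the new head is trained at first, which is a common starting point when the dataset is small.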

image

Before settling on the final model shown above, we tweaked the pre-trained architecture until satisfactory results were achieved.

image: Comparative results of identity document classification models

Information extraction model

The OCR steps applied to each image are as follows:

image

image
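Once OCR has produced raw text, identifiers can be pulled out with pattern matching. The sketch below is an illustration only and is not taken from the project's notebook. It assumes a PAN consists of five letters, four digits, and one letter, and that an Aadhar number has twelve digits, often printed in groups of four. The function name `extract_id_numbers` is hypothetical.

```python
import re

# Assumed formats: PAN = 5 letters + 4 digits + 1 letter; Aadhar = 12 digits,
# optionally separated into groups of four by single spaces.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAR_PATTERN = re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")


def extract_id_numbers(ocr_text):
    """Return the first PAN and Aadhar numbers found in OCR text, or None."""
    pan_match = PAN_PATTERN.search(ocr_text)
    aadhar_match = AADHAR_PATTERN.search(ocr_text)
    return {
        "pan": pan_match.group(0) if pan_match else None,
        # Strip the spaces so the number is stored in one canonical form.
        "aadhar": re.sub(r"\s", "", aadhar_match.group(0)) if aadhar_match else None,
    }


if __name__ == "__main__":
    sample_text = "Permanent Account Number ABCDE1234F\nAadhar: 1234 5678 9012"
    print(extract_id_numbers(sample_text))
```

Real OCR output is noisy, so patterns like these usually need tolerance for misread characters and extra whitespace.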

Ongoing Improvements:

  1. Interactive Summarization and Query Answering
  2. Advanced Handwritten Text Extraction
  3. Global Accessibility with Multilingual Support
  4. Wider document classification systems covering legal documents
  5. Exploring advanced CNN architectures

Team

Contributors: kanikakj, princysinghal
