dibyendubiswas1998 / document-tagging

Create a CI-CD pipeline that automates the assignment of relevant tags or categories to large volumes of unstructured textual data.

License: GNU General Public License v3.0

Python 4.82% Jupyter Notebook 94.96% CSS 0.12% HTML 0.08% Dockerfile 0.02%

Document Tagging

Problem Statement:

This project is a document-tagging system that uses natural language processing (NLP) techniques and transformer-based pre-trained models to categorize documents. It aims to automatically generate relevant tags or categories for large volumes of unstructured textual data. Users upload documents, which are preprocessed and then classified into multiple relevant categories by the NLP models. Download Problem Statement PDF

Here I build a CI-CD pipeline that generates relevant tags from the given textual data.
DagsHub
WebApp

Project Documentation:

High Level Design Document (HLD)

Architecture Document

Wireframe Document

Low Level Design Document (LLD)

Detailed Project Report (DPR)

Project Workflow:

Step-01: Load the raw or custom data provided by the user from an AWS S3 bucket, and save it into a particular directory.
Step-02: Preprocess the raw data: handle missing values and duplicate values, apply text preprocessing and vectorization, separate X and Y, create a tensor dataset, and split it into train, test and validation sets.
Step-03: Create the model (default: bert-base-uncased) and train it. After that, save the trained model and tokenizer in a particular directory.
Step-04: Evaluate the model on the test dataset and save the information on DagsHub using mlflow.
Step-05: Create a web application for generating the tags and host the entire application on AWS.
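
The cleaning and splitting part of Step-02 can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the record fields `text` and `tags`, the function names, the 80/10/10 split ratio and the fixed seed are all assumptions made for illustration, and tokenization and tensor creation are left out.

```python
import random


def clean_records(records):
    """Drop rows with missing text or tags, and rows whose text is a duplicate.

    The field names "text" and "tags" are hypothetical; adapt them to the
    columns of the real dataset.
    """
    seen = set()
    cleaned = []
    for row in records:
        text = (row.get("text") or "").strip()
        tags = row.get("tags") or []
        if not text or not tags:
            continue  # missing value: skip the row
        key = text.lower()
        if key in seen:
            continue  # duplicate text (case-insensitive): skip the row
        seen.add(key)
        cleaned.append({"text": text, "tags": list(tags)})
    return cleaned


def split_records(records, train=0.8, val=0.1, seed=42):
    """Shuffle deterministically and split into train, validation and test.

    The 80/10/10 ratio is an assumption; whatever remains after the train
    and validation slices becomes the test set.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train_set, val_set, test_set
```

Calling `clean_records` on the loaded rows and passing the result to `split_records` gives three disjoint lists that can then be tokenized and converted to tensors.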

Tech Stack:

How to Run the Application:

    # For Windows OS:
    docker pull dibyendubiswas1998/document_tagging
    docker run -p 8080:8080 dibyendubiswas1998/document_tagging

    # For Ubuntu OS:
    sudo docker pull dibyendubiswas1998/document_tagging
    sudo docker run -p 8080:8080 dibyendubiswas1998/document_tagging
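
After starting the container, a quick way to confirm that something is listening on the published port is a plain TCP check. The sketch below is an illustration only: the function name `is_port_open` is hypothetical, it assumes the container runs on the local machine, and it only tests that port 8080 accepts connections, not that the web application itself responds correctly.

```python
import socket


def is_port_open(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8080 matches the -p 8080:8080 mapping used in the docker run command.
    print("reachable" if is_port_open() else "not reachable yet")
```

If the check reports that the port is not reachable, the container may still be starting, or the port mapping may differ from the one shown above.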

Web Interface:
