
dibyendubiswas1998 / document-tagging


Create a CI/CD pipeline that automates assigning relevant tags or categories to large volumes of unstructured textual data.

Home Page: http://65.0.74.77:8080/

License: GNU General Public License v3.0

Python 4.82% Jupyter Notebook 94.96% CSS 0.12% HTML 0.08% Dockerfile 0.02%
aws-ec2 docker dvc git mlflow python3 pytorch s3-bucket


Document Tagging

Problem Statement:

The Document-Tagging system leverages advanced natural language processing (NLP) techniques and transformer-based pre-trained models for efficient document categorization. It aims to automatically generate relevant tags or categories for large volumes of unstructured textual data. Users can upload documents, which are preprocessed and then classified into multiple relevant categories using NLP models.
Download Problem Statement PDF
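Classifying a document into multiple relevant categories is a multi-label problem. A common decoding rule, sketched below under assumptions, gives each category its own sigmoid probability and keeps every category that clears a threshold. The 0.5 threshold, the label names, and the `decode_tags` helper are hypothetical; the project's actual decoding step is not described in this README.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: maps a raw logit to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def decode_tags(
    logits: Sequence[float],
    labels: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off; tune on validation data
) -> list[str]:
    """Return every label whose sigmoid probability reaches the threshold.

    Each category is scored independently (unlike a softmax over classes),
    so a document can receive zero, one, or several tags.
    """
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    return [
        label
        for logit, label in zip(logits, labels)
        if sigmoid(logit) >= threshold
    ]


if __name__ == "__main__":
    # Made-up category names and logits, for illustration only.
    labels = ["finance", "sports", "technology", "politics"]
    logits = [2.1, -3.0, 0.4, -0.2]
    print(decode_tags(logits, labels))  # ['finance', 'technology']
```

Using an independent sigmoid per category, rather than a softmax, is what lets one document carry several tags at once.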

Here I build a CI/CD pipeline that generates relevant tags for the given textual data.
DagsHub
WebApp

Project Documentation:

High Level Design Document (HLD)

Architecture Document

Wireframe Document

Low Level Design Document (LLD)

Detailed Project Report (DPR)

Project Workflow:

  • Step-01: Load the raw or custom data provided by the user from an AWS S3 bucket, and save it into a designated directory.

  • Step-02: Preprocess the raw data: handle missing and duplicate values, apply text preprocessing and vectorization, separate X and Y, create a tensor dataset, and split it into train, test, and validation sets.

  • Step-03: Create the model (default: bert-base-uncased) and train it. Then save the trained model and its tokenizer in a designated directory.

  • Step-04: Evaluate the model on the test dataset and log the results to DagsHub using MLflow.

  • Step-05: Create a web application for generating tags and host the entire application on AWS.
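Step-02 can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes each raw record is a `(text, tags)` pair, uses a simple hypothetical cleaning regex, reduces "vectorization" to multi-hot encoding of the tags, and splits 80/10/10 with a fixed seed. The real pipeline presumably also tokenizes the text (for example with the `bert-base-uncased` tokenizer) and wraps the result in PyTorch tensors, which this stdlib-only sketch leaves out. The helper names `clean_text` and `preprocess` are hypothetical.

```python
import random
import re
from typing import Optional, Sequence

# Hypothetical record shape: (raw text, list of tag names); either may be missing.
Record = tuple[Optional[str], Optional[list[str]]]


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(
    records: Sequence[Record],
    ratios: tuple[float, float, float] = (0.8, 0.1, 0.1),  # train, test, val
    seed: int = 42,
) -> tuple[dict[str, tuple[list[str], list[list[int]]]], list[str]]:
    # 1. Missing values: drop rows with no text or no tags.
    rows = [(text, tags) for text, tags in records if text and tags]

    # 2. Text preprocessing + duplicates: keep the first row per cleaned text.
    seen: set[str] = set()
    unique: list[tuple[str, list[str]]] = []
    for text, tags in rows:
        cleaned = clean_text(text)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append((cleaned, tags))

    # 3. Separate X and Y; Y becomes a multi-hot vector over the sorted tag set.
    label_names = sorted({tag for _, tags in unique for tag in tags})
    index = {name: i for i, name in enumerate(label_names)}
    X = [text for text, _ in unique]
    Y = []
    for _, tags in unique:
        vector = [0] * len(label_names)
        for tag in tags:
            vector[index[tag]] = 1
        Y.append(vector)

    # 4. Shuffle reproducibly, then cut into train / test / validation.
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_train = int(ratios[0] * len(order))
    n_test = int(ratios[1] * len(order))
    parts = {
        "train": order[:n_train],
        "test": order[n_train:n_train + n_test],
        "val": order[n_train + n_test:],
    }
    splits = {
        name: ([X[i] for i in idx], [Y[i] for i in idx])
        for name, idx in parts.items()
    }
    return splits, label_names
```

Deduplicating after cleaning means rows that differ only in case, punctuation, or URLs collapse into one; whether that is desirable depends on the data.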
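For Step-04, a common way to score a multi-label tagger is micro-averaged precision, recall, and F1 over every (document, tag) decision, plus subset accuracy: the share of documents whose full tag set is predicted exactly. The README does not say which metrics the project logs, so this choice is an assumption, as is the hypothetical `multilabel_scores` helper. Its result is a `str -> float` dict, the shape MLflow's `mlflow.log_metrics` accepts, which is one way such numbers could be sent to a DagsHub-hosted MLflow tracking server.

```python
from typing import Sequence


def multilabel_scores(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> dict[str, float]:
    """Micro-averaged precision/recall/F1 and subset accuracy for 0/1 label matrices."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of documents")
    tp = fp = fn = exact = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != len(pred):
            raise ValueError("each row must have one entry per label")
        for t, p in zip(truth, pred):
            if t and p:
                tp += 1  # tag present and predicted
            elif p:
                fp += 1  # tag predicted but not present
            elif t:
                fn += 1  # tag present but missed
        exact += list(truth) == list(pred)  # whole tag set matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    subset_accuracy = exact / len(y_true) if y_true else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "subset_accuracy": subset_accuracy,
    }
```

Micro-averaging weights frequent tags more heavily; a macro average (per-tag F1, then the mean) would be the alternative if rare tags matter as much as common ones.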

Tech Stack:

Tech Stack

How to Run the Application:

    # For Windows OS:
    docker pull dibyendubiswas1998/document_tagging
    docker run -p 8080:8080 dibyendubiswas1998/document_tagging

    # For Ubuntu OS:
    sudo docker pull dibyendubiswas1998/document_tagging
    sudo docker run -p 8080:8080 dibyendubiswas1998/document_tagging
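After `docker run`, the container may need some time (for example, to load the model) before it answers on port 8080. The hypothetical helper below polls the published port from the host using only the Python standard library. It assumes the app serves an HTTP response at `/` and treats any response, even an error status, as a sign that the server is up.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_app(
    url: str = "http://localhost:8080/",  # port published by `docker run -p 8080:8080`
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Return True once `url` answers any HTTP request, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            err.close()
            return True  # the server answered, albeit with an error status
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # not listening yet: refused, reset, or timed out
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_app("http://localhost:8080/", timeout=120)` returns `True` once the web interface is reachable, or `False` if it never comes up within two minutes.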



Web Interface:

Web Interface
