Document Tag Generator


Table of Contents

  1. Introduction
  2. Local Installation
  3. Links
  4. System Organization

Introduction

✨ Problem

The project website of the Department of Computer Engineering currently lists nearly 150 projects, categorized only by batch and by subject. Some projects have tags, but a number of those tags are not relevant to the project, and other projects have no tags at all. Users can currently search projects by keyword, but those keywords are derived only from the project descriptions.

✨ Our goal

Our goal is to generate relevant tags for each project from its description and the other valid data available on its project page.

✨ Solution

Our plan is to build an ML model that generates relevant tags. The data needed to build the model is retrieved from the project pages and the project repositories. The website's API is used to get the details of each project (links to its repository and project page, plus other details). A scraping tool will be used to get the data from the project pages.
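As a minimal sketch of the scraping step, the following uses only Python's standard library to pull the visible text out of a project page's HTML. The names `VisibleTextExtractor` and `extract_text` are hypothetical, and the scraping tool the project actually uses is not specified here.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The HTML itself could be fetched with `requests.get(page_url).text`, where `page_url` stands for a project page link returned by the website's API.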




Local Installation

The site is built with Jekyll and hosted on GitHub Pages.

  • Fork the repository and clone it to your local machine.
  • Follow the build instructions below to install the dependencies needed to run the Jekyll builder on your local machine.

Build Instructions

gem install just-the-docs
gem install jekyll-sitemap
bundle exec just-the-docs rake search:init jekyll-sitemap

Note: If you face any dependency or version issue, follow the instructions in this link to downgrade or upgrade the versions.

The current Ruby version is 2.7.1.

rbenv install 2.7.1
rbenv global 2.7.1

For the API, install these additional Python packages:

pip install requests
cd ./python_scripts/
python3 stat_script.py


Architecture


On the department projects website, the frontend and backend are already implemented. In the current implementation, users can search projects using tags, but the tagging is done by a simple algorithm that checks whether the project description contains the searched tag. Our goal is to implement a machine learning model that produces more relevant tags.
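The simple algorithm described above can be sketched as a substring check. This is an illustration only: the function name `naive_tags` and the case-insensitive comparison are assumptions, not the website's actual code.

```python
def naive_tags(description, candidate_tags):
    """Return the candidate tags that occur as substrings of the description."""
    text = description.lower()  # assumption: matching ignores case
    return [tag for tag in candidate_tags if tag.lower() in text]


# "maintain" contains "ai", so the tag "ai" matches even though the
# project has nothing to do with artificial intelligence.
print(naive_tags("A tool to maintain lab equipment records", ["ai", "iot", "database"]))
```

The example shows why substring matching is a weak signal: it produces false positives such as `ai`, and it misses a project entirely when the description never mentions the tag word.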

To train the machine learning model, we need a dataset that contains the details of the projects. We hope to use project descriptions, project repositories, and project pages to generate the dataset. Using this dataset, we have to train an ML model that produces more relevant tags for the projects on the department website.
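The model itself is not specified here. As a rough baseline for comparison, and not the project's actual method, candidate tags can be ranked with TF-IDF over the collected project texts. The sketch below uses only the standard library; `tokenize` and `tfidf_keywords` are hypothetical names, and the three-letter minimum token length is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens of 3+ letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def tfidf_keywords(documents, top_k=3):
    """Rank each document's tokens by TF-IDF and return the top_k per document."""
    tokenized = [tokenize(doc) for doc in documents]

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n_docs = len(documents)

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results
```

Each input document would be the concatenated description, repository text, and page text of one project. Terms that appear in every project score zero, which is the behaviour the substring approach lacks.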

After implementing the ML model, we have to integrate it with the backend of the department website. We then run the model to generate tags, which are stored in a JSON file inside the backend repository.
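A minimal sketch of the storage step follows. The layout of the JSON file is not given here, so the mapping from a project identifier to a list of tags, the function name `write_tags_file`, and the output path are all assumptions.

```python
import json
from pathlib import Path


def write_tags_file(project_tags, path):
    """Write a {project_id: [tags]} mapping as sorted, indented JSON."""
    # Deduplicate and sort tags so repeated runs produce stable diffs.
    normalized = {
        project: sorted(set(tags)) for project, tags in project_tags.items()
    }
    Path(path).write_text(
        json.dumps(normalized, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return normalized
```

Sorting keys and tags keeps the file deterministic, so a commit to the backend repository only shows the tags that actually changed.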

The backend of the department project website can be accessed through an API. It contains an endpoint for that JSON file, which holds all the projects and their corresponding generated tags. When a user searches for a project by tag, the tags file is used to show the relevant projects.
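The lookup against the tags file could look like the sketch below. It assumes the file maps each project identifier to a list of tags; that layout, the function name `projects_for_tag`, and the sample identifiers are hypothetical.

```python
import json


def projects_for_tag(tags_json, query):
    """Return the project ids whose tag list contains the query (case-insensitive)."""
    project_tags = json.loads(tags_json)
    wanted = query.strip().lower()
    return sorted(
        project
        for project, tags in project_tags.items()
        if wanted in (tag.lower() for tag in tags)
    )


sample = '{"e17-project-a": ["IoT", "robotics"], "e17-project-b": ["web"]}'
print(projects_for_tag(sample, "iot"))
```

Unlike the substring check on descriptions, this compares the query against whole tags, so a search for `ai` no longer matches a project merely because its text contains the word "maintain".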

When a new project is added to the department project website, we need to run the ML model again and update the tags file. GitHub Actions can be used for that.

Since project pages and project repositories are updated regularly, we hope to run the ML model weekly.




Project Owner: Mr. Nuwan Jaliyagoda
Scrum Master: Mr. Thushara Bandara



✨ Our Team

E/17/100 - Gunathilaka R.M.S.M

E/17/246 - Perera K.S.D

E/17/284 - Rathnayaka R.L.D.A.S

Links
