learningunit_skillassignment

Data Science Internship Assignment

Assignment for candidates

Table of Contents

  1. Data
  2. Task - Assigning skills to learning units
  3. Working with files
  4. Modelling
  5. Further Development

Data

  • I have chosen the [provided file](data/raw/Learning Catalogue.csv) as the dataset.

Task

  • Assigning skills to learning units: building a word-level model that predicts which skills should be assigned to a learning unit.

Working with files

At the end of this file, you can enter a new text description and see which skills are assigned to it.

For convenience, you can also use the Colab notebook directly to check the results and the approach (link).

Caution: when using Colab, you have to upload the dataset through the Files panel on the left-hand side.

  • app.py - a simple Streamlit app for using the trained model.

A glimpse of the app:

[Screenshot of the app]

  • predict.py - predicts the skills for a new test input; you only need to call one function.

  • requirements.txt - lists all the dependencies required to run this project.

How to execute the files/code

- Either go through/run the Python notebook LearningUnitSkillAssignment.ipynb directly,

- or launch the app with streamlit run app.py and use the model of your choice.

Modelling 🚀

  • Create embeddings using the Universal Sentence Encoder (link), which is a multilingual model.
  • The model is word-based, so embeddings are created for each word in the description and for each skill.
  • Compute the similarity between each word of the description and each skill.
  • For each word, keep the skills whose similarity crosses a given threshold; the skills that cross it most often (highest count) are assigned to the description.
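The matching steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code: `cosine` and `assign_skills` are hypothetical names, each word casts one vote for every skill whose cosine similarity reaches the threshold, and the toy 3-dimensional vectors only stand in for the Universal Sentence Encoder embeddings the project actually uses. The threshold value is also illustrative, since the README does not state one.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_skills(word_vectors, skill_vectors, threshold=0.5, top_k=1):
    """Hypothetical helper: assign skills to a single description.

    word_vectors:  {word: embedding} for the words of the description
    skill_vectors: {skill: embedding} for every candidate skill
    Each word gives one vote to every skill whose similarity reaches `threshold`;
    the `top_k` skills with the most votes are returned (fewer if fewer got votes).
    """
    votes = Counter()
    for word_vec in word_vectors.values():
        for skill, skill_vec in skill_vectors.items():
            if cosine(word_vec, skill_vec) >= threshold:
                votes[skill] += 1
    return [skill for skill, _ in votes.most_common(top_k)]


# Toy 3-d vectors standing in for Universal Sentence Encoder embeddings.
skills = {"Python": [1.0, 0.0, 0.0], "Leadership": [0.0, 1.0, 0.0]}
words = {"programmieren": [0.9, 0.1, 0.0], "code": [0.8, 0.0, 0.2], "team": [0.1, 0.9, 0.0]}

print(assign_skills(words, skills, threshold=0.8))  # ['Python']
```

Here "programmieren" and "code" both vote for Python while only "team" votes for Leadership, so Python wins the count; raising `top_k` would return both skills in order of votes.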

Further development

I have trained two models: one that removes standard German stop words, and a second that additionally removes the words occurring most frequently in our dataset.
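A minimal sketch of the two preprocessing variants, assuming simple whitespace tokenisation. The stop-word set is a hypothetical, deliberately tiny stand-in for a full German list (for example, the German list shipped with NLTK), and both the `top_n` cutoff and the choice to count frequent words only after standard stop words are removed are assumptions, not details taken from the project.

```python
from collections import Counter

# Hypothetical, deliberately tiny German stop-word list for illustration only.
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "in", "zu", "zur", "mit", "für", "ein", "eine"}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def remove_stop_words(text, stop_words):
    """Return the tokens of `text` that are not in `stop_words`."""
    return [tok for tok in tokenize(text) if tok not in stop_words]


def frequent_words(descriptions, top_n, exclude=frozenset()):
    """Return the `top_n` most frequent tokens across all descriptions,
    ignoring tokens that are already in `exclude`."""
    counts = Counter(tok for d in descriptions for tok in remove_stop_words(d, exclude))
    return {tok for tok, _ in counts.most_common(top_n)}


descriptions = [
    "Ein Kurs zur Datenanalyse mit Python",
    "Der Kurs für Python Einsteiger",
    "Ein Kurs für Führungskräfte",
]

# Variant 1: remove standard German stop words only.
variant_1 = [remove_stop_words(d, GERMAN_STOP_WORDS) for d in descriptions]

# Variant 2: additionally remove the words that occur most often in this dataset.
extended_stop_words = GERMAN_STOP_WORDS | frequent_words(
    descriptions, top_n=1, exclude=GERMAN_STOP_WORDS
)
variant_2 = [remove_stop_words(d, extended_stop_words) for d in descriptions]

print(variant_1[0])  # ['kurs', 'datenanalyse', 'python']
print(variant_2[0])  # ['datenanalyse', 'python']
```

In this toy catalogue "kurs" appears in every description, so it carries no information about which skill fits; the second variant drops it before the words are embedded.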

Given more time, the following could be done:

- Try different embeddings and compare their metric scores.
- Because the dataset is unlabelled, supervised learning cannot be applied; with more data, self-learning could be applied first, followed by supervised learning.
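Reading "self-learning" as self-training (pseudo-labelling), the loop could look roughly like the sketch below. It assumes a small hand-labelled seed set exists; `self_train`, `train_centroids` and `predict_nearest` are hypothetical names, and the one-dimensional nearest-centroid classifier is only a toy stand-in for whatever supervised model would actually be trained on the skill labels.

```python
def self_train(labeled, unlabeled, train, predict, min_confidence=0.8, max_rounds=5):
    """Generic self-training loop (a sketch, not the project's code).

    labeled:   list of (x, y) pairs, e.g. a small hand-labelled seed set
    unlabeled: list of inputs without labels
    train:     callable taking a list of (x, y) pairs and returning a model
    predict:   callable (model, x) -> (label, confidence)
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        confident, uncertain = [], []
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= min_confidence:
                confident.append((x, label))
            else:
                uncertain.append(x)
        if not confident:
            break  # no new pseudo-labels, so retraining would change nothing
        labeled.extend(confident)  # pseudo-labels join the training set
        pool = uncertain
        model = train(labeled)
    return model, labeled


# Toy stand-in model: 1-D nearest-centroid classifier.
def train_centroids(pairs):
    """Return {label: mean of its x values}."""
    sums, counts = {}, {}
    for x, y in pairs:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_nearest(model, x):
    """Nearest centroid; confidence is near 1 when x is much closer to it than to the runner-up."""
    (d_near, label), (d_other, _) = sorted((abs(x - c), y) for y, c in model.items())[:2]
    total = d_near + d_other
    return label, (d_other / total if total else 0.5)


model, labeled = self_train(
    labeled=[(0.0, "a"), (10.0, "b")],
    unlabeled=[1.0, 9.0, 5.0],
    train=train_centroids,
    predict=predict_nearest,
)
print(model)  # {'a': 0.5, 'b': 9.5}
```

With real data, `x` would be a description (or its embedding) and the seed labels would be hand-assigned skills; `min_confidence` controls how many possibly wrong pseudo-labels are fed back into training, which is why the ambiguous input 5.0 is never labelled here.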

Contributors

ridhimagarg