group-9c-capstone-1's Introduction

Group 9C Capstone

Improving Demand Forecasting with Natural Language Processing (NLP)

This project is our group's first foray into Natural Language Processing and Machine Learning. We worked on it as part of the NUS SGUS Fintech Program's Capstone Project. This repository focuses on delivering two things:

  1. Classifying statements into different "types".
  2. Extracting temporal information from text.

Train Data File: data for training the model

Test Data File: data for testing the model

In file "1 SemEval10 Multiclassification", you will find Python code, run on Google Colab, that evaluates several models for text classification. The training and test data contain 10 "types" of statements, including cause-effect, product-producer and instrument-agency. We evaluated 4 models: Multinomial Naive Bayes, Multinomial Naive Bayes with optimized parameters, Support Vector Machine, and Support Vector Machine with a linear kernel (optimized). The last model returned the highest accuracy.
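As a rough illustration of this kind of model comparison, here is a minimal sketch. It assumes scikit-learn and TF-IDF features, neither of which is confirmed by this README, and it uses a handful of made-up sentences rather than the SemEval data. The helper name `compare_models` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-ins for the train and test files (NOT the project's data).
train_texts = [
    "the storm caused the flood",
    "smoking causes lung disease",
    "the factory produces cars",
    "the bakery makes fresh bread",
    "the carpenter uses a hammer",
    "the surgeon works with a scalpel",
]
train_labels = [
    "cause-effect", "cause-effect",
    "product-producer", "product-producer",
    "instrument-agency", "instrument-agency",
]
test_texts = ["heavy rain caused the landslide", "the company produces phones"]
test_labels = ["cause-effect", "product-producer"]


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each classifier on TF-IDF features and return its test accuracy."""
    scores = {}
    for name, clf in models.items():
        # A fresh vectorizer per model keeps the pipelines independent.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return scores


models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "SVM (linear kernel)": SVC(kernel="linear"),
}
scores = compare_models(models, train_texts, train_labels, test_texts, test_labels)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

Accuracies on a toy set this small say nothing about the real results; the point is only the shape of the comparison loop.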

After showcasing file 1 to our stakeholders, we gathered feedback on how to improve the code. In file "2 SemEval10 Binary Classification", we implemented some of these improvements, including changing the task to a binary classification of "cause-effect" vs "others". This change resulted in skewed train/test data, with far more statements falling under "others". In file "2A SemEval10 Binary Classification", we tried oversampling to address the skew. We also ran the data through a preprocessing step, though for this dataset it did not seem to change the accuracies. Lastly, we did additional research on grid search for hyperparameter tuning and corrected our code to incorporate it.
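The relabelling and the resulting class imbalance can be illustrated with a small standard-library sketch. It uses plain random oversampling, which may differ from the oversampling method used in the notebook. The function names `to_binary` and `random_oversample` and the sample sentences are hypothetical.

```python
import random
from collections import Counter


def to_binary(label, positive="cause-effect"):
    """Collapse a multiclass label into the positive class or "others"."""
    return positive if label == positive else "others"


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) only for under-represented classes.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Made-up examples, not taken from the training file.
texts = [
    "the storm caused the flood",
    "the factory produces cars",
    "the carpenter uses a hammer",
    "the bakery makes fresh bread",
    "the surgeon works with a scalpel",
]
multiclass = [
    "cause-effect", "product-producer", "instrument-agency",
    "product-producer", "instrument-agency",
]

binary = [to_binary(label) for label in multiclass]
balanced_texts, balanced_labels = random_oversample(texts, binary)

print(Counter(binary))           # class counts before oversampling
print(Counter(balanced_labels))  # class counts after oversampling
```

Oversampling is normally applied to the training split only, so that duplicated statements do not leak into the test set.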

In file "3 SVM Binary Classification with Temporal Extraction", we used the model with the best accuracy from file "2 SemEval10 Binary Classification", a Support Vector Machine with a linear kernel, to run a binary classification that identifies "cause-effect" statements. The code then uses spaCy with a RoBERTa transformer to extract temporal information. This performed much better than spaCy without the transformer: it could pick out dates in any form, e.g. "04.10.2022", as well as relevant time-period information such as "Q2", which spaCy without the transformer could not.
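A minimal sketch of the temporal-extraction step might look like the following. It assumes the transformer pipeline is spaCy's `en_core_web_trf` package and that temporal information means entities labelled `DATE` or `TIME`; the README confirms neither. The helper names `extract_temporal` and `extract_from_text` are hypothetical.

```python
# Entity labels treated as temporal (an assumption, see the note above).
TEMPORAL_LABELS = {"DATE", "TIME"}


def extract_temporal(doc):
    """Return (text, label) pairs for the temporal entities in a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in TEMPORAL_LABELS]


def extract_from_text(text, model_name="en_core_web_trf"):
    """Load a spaCy pipeline and extract temporal entities from raw text.

    spaCy is imported here so the filtering helper above can be used
    without the (large) transformer model installed.
    """
    import spacy

    nlp = spacy.load(model_name)
    return extract_temporal(nlp(text))
```

Calling `extract_from_text("Orders are expected to rise in Q2, after the launch on 04.10.2022.")` requires the model to be installed first, for example with `python -m spacy download en_core_web_trf`. Passing `model_name="en_core_web_sm"` runs the same text through a pipeline without the transformer for comparison.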

Contributors

beatriceyapsm, squaluz, chenjinghao
