sentiment-analysis's Introduction

Approach

On reading the problem statement, the easiest approach seemed to be using an OpenAI API key with the langchain library and passing a prompt with a few examples.

However, this would not make use of the labelled dataset. On further data exploration, I saw that the 8129 labelled rows contain only 32 unique categories, which means the problem is twofold:

- Perform multi-label classification on a given sentence.
- Using the given categories, run sentiment analysis to determine whether each one is positive or negative.

For the sake of my sanity and ease of readability, I've made two datasets, one to be used for category training and the other to be used for sentiment training. This means we will also be using two models.

The key advantage of this is that we keep the weights of our pretrained model locally, so we have reproducible results.

Kindly refer to the notebook for a step-by-step guide.

Methodology

Data Preprocessing

  • Break the dataset into two: One for category classification and the other for sentiment analysis training
  • Split the values in each column into separate tokens, e.g. value for money positive -> "value for money" and "positive"
  • Basic preprocessing, such as dropping all rows that have no labels
  • More info in the preprocess.py file
  • Files are category.csv and sentiment.csv
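
The label-splitting step above can be sketched roughly as follows. This is a minimal illustration, not the code in preprocess.py: it assumes the sentiment is always the last word of a label and is either "positive" or "negative", and the names `split_label` and `split_row` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: every label ends in one of these two words.
SENTIMENTS = {"positive", "negative"}


def split_label(label: str) -> Optional[Tuple[str, str]]:
    """Split 'value for money positive' into ('value for money', 'positive').

    Returns None for empty or malformed labels so the caller can drop them.
    """
    parts = label.strip().rsplit(" ", 1)
    if len(parts) != 2 or parts[1].lower() not in SENTIMENTS:
        return None
    return parts[0].strip(), parts[1].lower()


def split_row(text: str, labels: List[str]):
    """Turn one labelled sentence into rows for the two datasets."""
    pairs = [p for p in (split_label(l) for l in labels if l) if p is not None]
    # Rows for the category dataset: (sentence, category)
    category_rows = [(text, cat) for cat, _ in pairs]
    # Rows for the sentiment dataset: (sentence, category, sentiment)
    sentiment_rows = [(text, cat, sent) for cat, sent in pairs]
    return category_rows, sentiment_rows
```

Splitting from the right with `rsplit` keeps multi-word categories such as "value for money" intact.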

Training

  • Using the BERT transformer for training, specifically BertForSequenceClassification with the 'bert-base-uncased' checkpoint.
  • Using the transformers library, which makes the code much more readable and reduces the number of lines
  • More information in the notebook
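
As a rough sketch of how the category model might be set up with the transformers library: the helper names `build_label_maps` and `load_category_model` are hypothetical, and treating the task as multi-label via `problem_type` is an assumption; the notebook may configure this differently.

```python
from typing import Dict, List, Tuple


def build_label_maps(categories: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each unique category name to an integer id and back."""
    ordered = sorted(set(categories))
    label2id = {name: i for i, name in enumerate(ordered)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def load_category_model(categories: List[str]):
    """Load tokenizer and classifier for the category task (downloads weights)."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import BertForSequenceClassification, BertTokenizer

    label2id, id2label = build_label_maps(categories)
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=len(label2id),
        # Assumption: one sentence can carry several categories at once.
        problem_type="multi_label_classification",
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

The sentiment model could be loaded the same way with two labels and the default single-label setting.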

Testing

  • The final block in the Python notebook joins both models and gives a dictionary output for the input sentence
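
One way the two models could be joined is sketched below. It is not the notebook's code: the two models are stood in for by plain callables, and the 0.5 threshold and the category-to-sentiment shape of the output dictionary are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(
    sentence: str,
    categories: Sequence[str],
    category_logits_fn: Callable[[str], List[float]],
    sentiment_fn: Callable[[str, str], str],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Return {category: sentiment} for every category the first model selects.

    category_logits_fn stands in for the category model (one logit per category);
    sentiment_fn stands in for the sentiment model.
    """
    logits = category_logits_fn(sentence)
    # Multi-label decision: each category is accepted independently.
    chosen = [c for c, z in zip(categories, logits) if sigmoid(z) >= threshold]
    return {c: sentiment_fn(sentence, c) for c in chosen}
```

Keeping the models behind callables means either one can be swapped or retrained without touching the joining logic.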

NOTE:

I've found that some of the labels are wrong. This has affected training, but we still get reasonable answers. For example, "The two tyres were fitted efficiently and I was unable to attend at the time of the appointment so they fitted me in when I was able to attend. The staff were friendly, generous with their coffee and were able to fit the two tyres on the wheels of my choice. I'm very satisfied!" is labelled

"garage service negative", "wait time negative", "length of fitting negative"

Clearly, that is not the case here.
