Airline Twitter Sentiment Prediction and Negative Reason Identification

Twitter is one of the most widely used social media platforms in the world, where people can express their thoughts. A tweet may relate to a situation, person, product or service. These tweets can be used to identify customers' opinions of a product or service, which helps estimate the ratio of satisfied to disappointed customers. The same tweets can also be used to identify the reason for that satisfaction or disappointment.

Dataset

The dataset used is "Twitter US Airline Sentiment", obtained from Kaggle. It contains scraped tweets about US airline companies from February 2015. It has 15 columns, including the tweet text, the sentiment of the tweet, the reason for a negative tweet, the airline name, the location the tweet was posted from and the retweet count.

Objective

Create two machine learning models using Python:

  • The first model predicts whether a tweet has "Positive", "Negative" or "Neutral" sentiment.
  • The second model predicts the reason for the customer's disappointment when the tweet is "Negative". This helps the airline identify the services in which it falls short, so that it can focus on improving them.

Text Preprocessing Steps performed on a Tweet

  • Identify all the negative stopwords and remove them from the stopwords list, so that negations are not filtered out.
  • Split the text into words.
  • Remove words that are stopwords, Twitter account names (starting with '@'), website links (starting with 'http') or ampersands.
  • Convert the words to lower case.
  • Replace every negative stopword with 'not'.
  • Retain only the English alphabet characters in each word.
  • Identify the POS tag (part-of-speech tag) of the word and pass it to the lemmatizer to obtain the root word. If an error is thrown, pass the word to the lemmatizer without the POS tag; in that case the word is treated as a noun by default.
  • Concatenate all the cleaned words into a single string.
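
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stopword sets are tiny hypothetical stand-ins for a full English stopword list, the function name `clean_tweet` is made up, and the stopword check is done on the lower-cased word for simplicity. Lemmatization is left as a pluggable hook (identity by default); a real lemmatizer such as NLTK's `WordNetLemmatizer` could be passed in, though which toolkit the project used is an assumption.

```python
import re

# Hypothetical, deliberately tiny stopword lists for illustration only.
STOPWORDS = {"the", "a", "an", "is", "was", "to", "my", "and", "i", "it", "on",
             "no", "not", "don't", "didn't", "isn't", "wasn't"}
NEGATIVE_STOPWORDS = {"no", "not", "don't", "didn't", "isn't", "wasn't"}

# Take the negative words out of the stopword list so they survive filtering.
FILTER_WORDS = STOPWORDS - NEGATIVE_STOPWORDS


def identity_lemmatizer(word):
    """Placeholder hook: returns the word unchanged."""
    return word


def clean_tweet(text, lemmatize=identity_lemmatizer):
    """Apply the preprocessing steps to one tweet and return a string."""
    cleaned = []
    for word in text.split():                       # split into words
        lower = word.lower()                        # lower-case
        # Drop stopwords, @handles, links and ampersands.
        if (lower in FILTER_WORDS or word.startswith("@")
                or lower.startswith("http") or lower in {"&", "&amp;"}):
            continue
        if lower in NEGATIVE_STOPWORDS:             # normalise negations
            lower = "not"
        letters = re.sub(r"[^a-z]", "", lower)      # keep English letters only
        if letters:
            cleaned.append(lemmatize(letters))      # root-word hook
    return " ".join(cleaned)                        # join into one string
```

For example, `clean_tweet("@united my flight was NOT on time &amp; I didn't get help http://t.co/x")` keeps both negations as 'not' while dropping the handle, the link and the ordinary stopwords.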

Steps for training models

Steps for training Airline Sentiment Prediction Models:

  • Drop the rows that have "airline_sentiment_confidence" less than 1.
  • Retain only "text" as the independent feature and "airline_sentiment" as the dependent feature.
  • Apply the text preprocessing steps described above to each tweet.
  • Create a unigram bag of words and, separately, a bigram bag of words.
    • Split each bag of words into train and test sets with a ratio of 0.8 to 0.2.
    • Try fitting models.
  • The dataset is imbalanced, so accuracy alone is a misleading metric. Both precision and recall are important here, so the F1-score is used as the metric.
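
To illustrate why accuracy can mislead on imbalanced classes, the sketch below scores a degenerate classifier that always predicts the majority class. The labels are made up, and macro averaging of the per-class F1-scores is an assumption; the averaging mode used in the project is not stated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Imbalanced toy labels: 8 negative, 1 neutral, 1 positive.
y_true = ["negative"] * 8 + ["neutral", "positive"]
# A model that ignores its input and always answers "negative".
y_pred = ["negative"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

The always-negative model is right on 8 of 10 tweets, yet its macro F1 is far lower because it never identifies a neutral or positive tweet.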

Steps for training Reason Prediction For Negative Tweet Models:

  • Drop the rows that have "negativereason_confidence" less than 1.
  • Retain only "text" as the independent feature and "negativereason" as the dependent feature.
  • Apply the text preprocessing steps described above to each tweet.
  • Create a unigram bag of words.
  • Split the bag of words into train and test sets with a ratio of 0.8 to 0.2.
  • Try fitting models.
  • The dataset is imbalanced, so accuracy alone is a misleading metric. Both precision and recall are important here, so the F1-score is used as the metric.
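
The first two steps of both training procedures (filtering on the confidence column, then keeping only the text and the target) might look like this with pandas. The helper name `select_confident_rows` is hypothetical and the rows are made up; only the column names come from the dataset description above.

```python
import pandas as pd


def select_confident_rows(df, confidence_col, target_col):
    """Keep rows whose confidence is 1, with only the text and target columns."""
    # Rows with a missing confidence compare as False and are dropped too.
    confident = df[df[confidence_col] >= 1]
    return confident[["text", target_col]].reset_index(drop=True)


# Tiny in-memory stand-in for the Kaggle CSV (rows are invented).
tweets = pd.DataFrame({
    "text": ["late again", "lost my bag", "thanks for the help"],
    "airline_sentiment": ["negative", "negative", "positive"],
    "airline_sentiment_confidence": [1.0, 0.65, 1.0],
    "negativereason": ["Late Flight", "Lost Luggage", None],
    "negativereason_confidence": [1.0, 0.7, float("nan")],
})

sentiment_data = select_confident_rows(
    tweets, "airline_sentiment_confidence", "airline_sentiment")
reason_data = select_confident_rows(
    tweets, "negativereason_confidence", "negativereason")
```

Filtering on "negativereason_confidence" also removes the non-negative tweets in this toy frame, because they have no reason and therefore no confidence value.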

Machine Learning Approaches Used:

  • MultinomialNB
  • LinearSVC (linear support vector machine)
  • Random Forest

All three models are fitted for Airline Sentiment Prediction on both the unigram and the bigram bag of words. For Reason Prediction For Negative Tweet, they are fitted on the unigram bag of words only.
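
A comparison of the three models could be sketched with scikit-learn as below. This assumes the project used scikit-learn (the model names match its classes, but the source does not say so); the function name `compare_models`, the hyperparameters, the stratified split, macro-averaged F1 and the toy data are all illustrative choices.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def compare_models(texts, labels, ngram_range=(1, 1), seed=0):
    """Return {model name: macro F1 on the 20% test split}."""
    # Build the bag of words first, then split 0.8 / 0.2, as in the steps above.
    features = CountVectorizer(ngram_range=ngram_range).fit_transform(texts)
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels)
    models = {
        "MultinomialNB": MultinomialNB(),
        "LinearSVC": LinearSVC(random_state=seed),
        "RandomForest": RandomForestClassifier(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = f1_score(y_test, model.predict(x_test), average="macro")
    return scores


# Made-up, already-cleaned tweets: five per class.
texts = [
    "flight delay not good", "lost bag bad service", "flight cancel bad",
    "rude staff bad service", "delay not happy",
    "flight time tomorrow", "check bag question", "flight gate number",
    "seat change question", "book flight tomorrow",
    "great flight thank", "good service thank", "love crew great",
    "thank great staff", "good flight love",
]
labels = ["negative"] * 5 + ["neutral"] * 5 + ["positive"] * 5

unigram_scores = compare_models(texts, labels, ngram_range=(1, 1))
bigram_scores = compare_models(texts, labels, ngram_range=(2, 2))
```

Passing `ngram_range=(2, 2)` builds a bag of words from bigrams only, which corresponds to keeping the unigram and bigram representations separate.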

Output:

For Airline Sentiment Prediction, the unigram LinearSVC model was selected as the final model. It has

  • an F1-score of 80.74%
  • an average accuracy of 87.00%

For Reason Prediction For Negative Tweet, the unigram LinearSVC model was selected as the final model. It has

  • an F1-score of 65.40%
  • an average accuracy of 82.45%

Finally, a function called "Sentiment_Prediction" was created. It combines both final unigram LinearSVC models to predict the sentiment of a tweet and, if the sentiment is negative, to predict the reason.
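
The chaining logic of such a function might look like the sketch below. It is not the project's "Sentiment_Prediction" implementation: the signature is hypothetical, and each model is assumed to be any object whose `predict` method accepts a list of cleaned strings (for instance a scikit-learn `Pipeline` that bundles the vectorizer with the classifier).

```python
def sentiment_prediction(tweet, preprocess, sentiment_model, reason_model):
    """Return (sentiment, reason); reason is None unless the tweet is negative."""
    cleaned = preprocess(tweet)
    sentiment = sentiment_model.predict([cleaned])[0]
    reason = None
    # Only consult the second model when the first one says "negative".
    if str(sentiment).lower() == "negative":
        reason = reason_model.predict([cleaned])[0]
    return sentiment, reason
```

Keeping the reason model behind the sentiment check means it is never asked about tweets it was not trained on, since it was fitted on negative tweets only.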
