TNM108 Project - Twitter Sentiment Analysis

This project was made for the university course TNM108 - Machine Learning for Social Media at Linköping University in 2022.

The project is made by Anna Jonsson and Amanda Bigelius, and the goal is to make a Twitter Sentiment Analysis Algorithm.

In the end, the project resulted in two different solutions: one that uses TextBlob, a lexicon-based method, and one that uses Logistic Regression.

Twitter Sentiment Analysis using TextBlob

The algorithm is heavily based on Nikita Silaparasetty's code from this tutorial.

Her repository for the tutorial can be found here.

Our modifications and thoughts

Our first modification was to move all the API_KEYS to a separate file so that the code could be uploaded to GitHub without exposing them.
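One way to keep credentials out of the repository could look like the sketch below. It is not the project's actual code: the file name `api_keys.json`, the JSON format, and the four key names are hypothetical choices made for illustration.

```python
import json
from pathlib import Path

# Hypothetical key names; adjust them to whatever the credentials file contains.
REQUIRED_KEYS = ("api_key", "api_secret", "access_token", "access_token_secret")


def load_api_keys(path):
    """Read API credentials from a JSON file that is kept out of version control."""
    keys = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_KEYS if name not in keys]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return keys
```

With this approach the credentials file is listed in `.gitignore`, and the main script calls something like `load_api_keys("api_keys.json")` at startup.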

We also added our own list of stopwords since the NLTK stopwords removed some words we found important for the classification.
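As a rough illustration of why a custom list can matter: NLTK's English stopword list contains negations such as "not" and "no", which can flip the sentiment of a sentence. The word list below is a hypothetical example, not the list used in the project.

```python
# Hypothetical custom stopword list: common function words are removed, but
# negations such as "not" and "no" are deliberately left out of the set so
# that they survive the filtering step.
CUSTOM_STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it", "this",
}


def remove_stopwords(text, stopwords=CUSTOM_STOPWORDS):
    """Drop every whitespace-separated word whose lowercase form is a stopword."""
    return " ".join(word for word in text.split() if word.lower() not in stopwords)


print(remove_stopwords("This is not a good movie"))  # prints "not good movie"
```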

We added a way to check the most frequent words in the tweets, excluding the query itself and only counting words longer than 2 characters. Later on we also filtered the NLTK stopwords out of our most common words, since the sentiment analysis was already done at that point and these stopwords weren't relevant when looking at word frequency. Then we displayed the result as a bar plot.
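The counting step described above could be sketched as follows. The function name, its parameters, and the simple ASCII-only tokenizer are assumptions made for this example and are not taken from the project.

```python
import re
from collections import Counter


def most_common_words(tweets, query, stopwords=frozenset(), top_n=10):
    """Count words across tweets, skipping the query terms, stopwords, and short words."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for tweet in tweets:
        # Simplified tokenizer: ASCII letters and apostrophes only.
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if len(word) > 2 and word not in query_terms and word not in stopwords:
                counts[word] += 1
    return counts.most_common(top_n)
```

The returned list of `(word, count)` pairs can be unpacked with `words, counts = zip(*result)` and passed to Matplotlib's `plt.bar(words, counts)` to draw the bar plot.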

Lastly we added a simple GUI to make it more intuitive for the user where to put the query.

Our assignment was to make an algorithm using machine learning, and although TextBlob is a good tool, it doesn't cover our needs for this assignment.

Graphical User Interface

The GUI was made with the library PySimpleGUI, and this Stack Overflow answer was very helpful.

Requirements 🛠️

For this algorithm to work you need to have Python installed on your computer, as well as the following libraries:

Install libraries using pip

To install the libraries using pip, run the following commands one by one:

  • Tweepy: pip install tweepy
  • Matplotlib: pip install matplotlib
  • Pandas: pip install pandas
  • TextBlob: pip install -U textblob as well as python -m textblob.download_corpora to download the necessary NLTK corpora.
  • WordCloud: pip install wordcloud
  • Better Profanity: pip install better_profanity
  • PySimpleGUI: pip install pysimplegui
  • NLTK: pip install nltk
  • Collections: part of the Python standard library, so no installation is needed

Twitter Sentiment Analysis using Logistic Regression

The algorithm is heavily based on Kate Arbuzova's code from this tutorial.

The dataset used for this method can be found on Kaggle.

Our modifications and thoughts

Our first modification to Kate's code was to only look at the Logistic Regression methods she used.

We also increased the number of features to 10,000 - this was probably a bad move, but we still did it.

Then we commented out a lot of code, just to reduce the amount of output the program prints.

The runtime for this was extremely long, so we would recommend scaling everything down, for example the number of features or the amount of training data.
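To make the feature cap concrete, here is a minimal sketch using scikit-learn. It assumes a TF-IDF vectorizer feeding a Logistic Regression model; the original tutorial's exact feature extraction is not described here, so treat the pipeline, the function name `build_classifier`, and the toy data as illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_classifier(max_features=10_000):
    """Build a text classifier whose vocabulary is capped at max_features terms."""
    return make_pipeline(
        TfidfVectorizer(max_features=max_features),  # keeps only the most frequent terms
        LogisticRegression(max_iter=1000),
    )


# Toy data for illustration only (1 = positive, 0 = negative).
texts = ["i love this", "what a great day", "i hate this", "what an awful day"]
labels = [1, 1, 0, 0]

model = build_classifier(max_features=50)  # a smaller cap keeps training fast
model.fit(texts, labels)
print(model.predict(["such a great day"]))
```

Lowering `max_features` or training on a sample of the dataset are two simple ways to shorten the runtime while experimenting.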

Requirements 🛠️

For this algorithm to work you need to have Python installed on your computer, as well as the following libraries:

Install libraries using pip

To install the libraries using pip, run the following commands one by one:

  • Scikit-learn: pip install scikit-learn
  • SciPy: pip install scipy
  • NLTK: pip install nltk
  • Statsmodels: pip install statsmodels
  • Emoji: pip install emoji
  • Regex: pip install regex
  • Spacy: pip install spacy
  • TQDM: pip install tqdm
  • Matplotlib: pip install matplotlib
  • Pandas: pip install pandas
  • Pickle: part of the Python standard library, so no installation is needed
  • Seaborn: pip install seaborn
