tweets-sentiment-analysis

Brief Summary of Project

This project trained several sentiment analysis models on different training datasets, using Logistic Regression and Bernoulli Naive Bayes classifiers.

Five training datasets were used to train the classification models: Sentiment140, Apple Twitter Sentiment, Twitter US Airline Sentiment, Depression Sentiment, and Russia invade tweets. The resulting models were then tested on the [Putin tweets] dataset to measure how accurately they classify the sentiment of tweets about Russian President Putin. (The [Putin tweets] dataset is provided in this repository.)

How to use the code: usage examples

Introduction: There are five .py files: preprocess.py, building_model.py, evaluatemodel.py, predicting.py, and analyzing.py. preprocess.py and evaluatemodel.py are helper files for building_model.py and predicting.py, and analyzing.py is used to analyze the datasets.

How to use building_model.py: First, import the dataset. Then, choose the appropriate commands and adjust the parameters by following the code comments, based on the dataset you uploaded. The code then preprocesses the data, splits it into training and test sets, and transforms X_train into TF-IDF features. Next, it creates and evaluates a Bernoulli Naive Bayes model and a Logistic Regression model. Finally, you can save the vectorizer and models to pickle files.
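The training steps of building_model.py can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the cleaning rules, the 80/20 split, and the names preprocess, split_data, SimpleTfidf, SimpleBernoulliNB and save_pickle are all hypothetical. A typical implementation would more likely use scikit-learn's TfidfVectorizer, BernoulliNB and LogisticRegression; the Logistic Regression step is omitted here for brevity.

```python
import math
import pickle
import random
import re
from collections import Counter


def preprocess(text):
    """Lowercase, drop URLs and @mentions, keep letters only (hypothetical rules)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.sub(r"[^a-z\s]", " ", text).split()


def split_data(docs, labels, test_size=0.2, seed=42):
    """Shuffle indices reproducibly, then split docs and labels into train/test."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_size))
    train, test = idx[:cut], idx[cut:]
    return ([docs[i] for i in train], [docs[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


class SimpleTfidf:
    """Term counts weighted by smoothed IDF, then L2-normalized (hypothetical class)."""

    def fit(self, docs):
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        self.vocab = sorted(df)
        # Smoothed IDF, the formula scikit-learn uses by default: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in self.vocab}
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            # Terms not seen during fit are ignored
            counts = Counter(t for t in doc if t in self.idf)
            row = {t: c * self.idf[t] for t, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in row.values())) or 1.0
            rows.append({t: v / norm for t, v in row.items()})
        return rows  # one sparse {term: weight} dict per document


class SimpleBernoulliNB:
    """Bernoulli Naive Bayes over binarized features (hypothetical class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, rows, labels, vocab):
        self.classes = sorted(set(labels))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            class_rows = [r for r, y in zip(rows, labels) if y == c]
            n_c = len(class_rows)
            # A term counts as "present" when its TF-IDF weight is > 0
            present = Counter(t for r in class_rows for t, v in r.items() if v > 0)
            self.log_prior[c] = math.log(n_c / len(rows))
            self.p[c] = {t: (present[t] + self.alpha) / (n_c + 2 * self.alpha)
                         for t in vocab}
        return self

    def _score(self, on, c):
        # Absent terms contribute log(1 - p); this is what makes it Bernoulli NB
        return self.log_prior[c] + sum(
            math.log(p if t in on else 1 - p) for t, p in self.p[c].items())

    def predict(self, rows):
        preds = []
        for r in rows:
            on = {t for t, v in r.items() if v > 0}
            preds.append(max(self.classes, key=lambda c: self._score(on, c)))
        return preds


def save_pickle(obj, path):
    """Save a fitted vectorizer or model so the prediction step can reload it."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```

The vectorizer is fitted on the training split only and then reused on the test split, so no vocabulary or IDF information leaks from the test data into training.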

How to use the [predicting_model.py]: First, load the vectorizer and models from the pickle files. Second, load the text and labels of the test dataset. Third, use the models to make predictions. Fourth, calculate specificity and the other evaluation metrics.
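The evaluation step can be sketched as below. Specificity is the true negative rate, TN / (TN + FP): the share of non-target tweets that the model correctly leaves unflagged. The function names load_pickle and binary_metrics, and treating the label "negative" as the target class, are assumptions for illustration rather than the project's actual code.

```python
import pickle


def load_pickle(path):
    """Reload a vectorizer or model saved during training (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def binary_metrics(y_true, y_pred, positive="negative"):
    """Confusion-matrix metrics, treating `positive` as the target class.

    The label value "negative" is an assumption; pass whatever label the
    dataset uses for negative tweets.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

Computing the same dictionary for every reloaded model keeps the Bernoulli Naive Bayes and Logistic Regression results directly comparable on the [Putin tweets] test set.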

How to use the [analyzing.py]: The file serves two purposes. First, it can create a wordnet plot and list the top negative and non-negative words in several of the datasets. Second, it can label a dataset using the VADER model.
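Listing top words per class and turning VADER scores into labels can be sketched as follows. The helper names are hypothetical, and the -0.05 cutoff is the threshold commonly used with VADER's compound score for negative sentiment; the project's actual threshold is not stated. The scoring function is passed in so the sketch runs without VADER installed.

```python
from collections import Counter


def top_words(docs, labels, target, k=10, stopwords=frozenset()):
    """Return the k most frequent tokens in documents labeled `target`."""
    counts = Counter(
        tok for doc, y in zip(docs, labels) if y == target
        for tok in doc if tok not in stopwords
    )
    return counts.most_common(k)


def vader_label(compound, threshold=-0.05):
    """Map a VADER compound score to a binary label (assumed cutoff)."""
    return "negative" if compound <= threshold else "non-negative"


def label_with_vader(texts, score_fn):
    """Label texts given a function that returns a VADER compound score.

    With the vaderSentiment package installed, score_fn could be built as:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
        score_fn = lambda t: analyzer.polarity_scores(t)["compound"]
    """
    return [vader_label(score_fn(t)) for t in texts]
```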

Writeup

  1. Models: In this task, Logistic Regression performed better overall than Naive Bayes. In the future, other types of Naive Bayes, such as Multinomial Naive Bayes, could be explored.
  2. Data size: In this task, a larger training dataset performed slightly better than a smaller one. To better understand the relationship between corpus size and performance, more training datasets on the same topic but with different corpus sizes could be tried.
  3. Topic: In the future, datasets on other topics could be explored, especially tweets about other controversial political figures.
  4. Label: What counts as negative or non-negative content may vary from person to person. A better way to label the test dataset could be to have more people label the data.
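If several people label the test data, as item 4 suggests, their labels still have to be combined. One simple option is a majority vote, sketched below; the function names, the tie policy (returning None so tied tweets can be reviewed separately), and the agreement-rate helper are illustrative assumptions, not part of the project.

```python
from collections import Counter


def majority_label(annotations):
    """Return the majority label for one tweet, or None on a tie.

    `annotations` is a non-empty list of labels from different annotators,
    e.g. ["negative", "non-negative", "negative"].
    """
    counts = Counter(annotations).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: leave for manual adjudication
    return counts[0][0]


def agreement_rate(annotations):
    """Fraction of annotators who chose the most common label."""
    top_count = Counter(annotations).most_common(1)[0][1]
    return top_count / len(annotations)
```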

Contributors

jchen255, yuqingjinjess
