Develop a Twitter Data Crawler and Perform Sentiment Analysis

Background:
A New York Times article stated: "The U.S. is falling to the lowest vaccination rates of the world's wealthiest democracies."

Project Overview

Problem statement:

  1. The US is experiencing a significant decrease in its COVID-19 vaccination rate.
  2. Evaluate President Joe Biden's sentiment towards COVID-19 vaccination on Twitter and its impact on the US vaccination rate.

Code and Resources Used

Python Version: 3.9
Packages: Tweepy, pandas, time, re, SQLAlchemy, Matplotlib, NLTK, wordcloud, NumPy, Pillow, itertools, collections, TextBlob, datetime, decimal
Dataset: Our World in Data's COVID Deaths dataset, used to observe the impact of Biden's tweets

Database Schema

[Figure: database schema diagram]
The crawled data is loaded into three tables stored in a PostgreSQL database, namely:

user_info:

  • Twitter user id
  • User's name
  • User's profile description
  • User's location

tweets:

  • Tweet ID
  • Tweet
  • Tweet's like count
  • Tweet's retweet count
  • User's tweet
  • User's name

user_social_network:

  • User's follower count
  • User's following count
  • User's name
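The three tables listed above could be laid out roughly as follows. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the PostgreSQL database, and every column name and type is an assumption, since the README lists the fields but not their definitions. The ambiguous "User's tweet" field is left out.

```python
import sqlite3

# Hypothetical column names and types (assumptions, not taken from the project code).
SCHEMA = """
CREATE TABLE user_info (
    user_id     INTEGER PRIMARY KEY,  -- Twitter user id
    name        TEXT NOT NULL,        -- user's name
    description TEXT,                 -- user's profile description
    location    TEXT                  -- user's location
);
CREATE TABLE tweets (
    tweet_id      INTEGER PRIMARY KEY,
    tweet         TEXT NOT NULL,
    like_count    INTEGER,
    retweet_count INTEGER,
    user_name     TEXT
);
CREATE TABLE user_social_network (
    user_name       TEXT,
    followers_count INTEGER,
    following_count INTEGER
);
"""


def create_tables(conn):
    """Create the three tables on the given connection."""
    conn.executescript(SCHEMA)


# In-memory SQLite database, used here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
create_tables(conn)
conn.execute(
    "INSERT INTO user_info VALUES (?, ?, ?, ?)",
    (1, "Example User", "Example bio", "Example City"),
)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

With PostgreSQL, the same statements would be run through a database connection instead (for example one created via SQLAlchemy, which the project lists among its packages), with types adjusted as needed.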

Data Cleaning

  • Crawled tweets are filtered on the keywords "vaccinated" and "covid-19"
  • The filtered tweets are stored in a list
  • Tweets are converted to lowercase and split into individual words
  • Unwanted characters and URLs are removed using Python's regular expressions (re) library
  • Stop words are removed using the NLTK library
  • The remaining words are then passed on for analysis
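The cleaning steps above can be sketched as follows. This is a minimal illustration under several assumptions: the stop-word set is a small hard-coded stand-in for NLTK's English list (so the sketch runs without downloads), a tweet is kept if it contains either keyword, and the regular expressions and function names are hypothetical rather than the project's own.

```python
import re

# Small stand-in stop-word set; the project used NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "to", "is", "and", "of", "in", "for", "if", "you", "your", "are", "it"}
KEYWORDS = ("vaccinated", "covid-19")

# Matches http(s) links and bare www links.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Matches anything that is not a lowercase letter, digit, hyphen or whitespace.
UNWANTED_PATTERN = re.compile(r"[^a-z0-9\-\s]")


def matches_keywords(tweet):
    """Return True if the tweet mentions at least one keyword (assumed OR filter)."""
    text = tweet.lower()
    return any(keyword in text for keyword in KEYWORDS)


def clean_tweet(tweet):
    """Lowercase, strip URLs and unwanted characters, split, and drop stop words."""
    text = tweet.lower()
    text = URL_PATTERN.sub(" ", text)
    text = UNWANTED_PATTERN.sub(" ", text)
    return [word for word in text.split() if word not in STOP_WORDS]


sample_tweets = [
    "Get vaccinated today! https://example.com/vaccines",
    "Happy Friday, everyone.",
]
filtered = [tweet for tweet in sample_tweets if matches_keywords(tweet)]
cleaned = [clean_tweet(tweet) for tweet in filtered]
print(cleaned)
```

The hyphen is kept in the character filter so that "covid-19" survives as a single word.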

Analysis Results

1. Common words found in tweets

[Figure: most common words found in the tweets]

2. Sentiment analysis - Word cloud

[Figure: word cloud of Joe Biden's tweets]

  • After pre-processing Joe Biden’s tweets by removing stop words and URLs, we created a word cloud using Python in a Jupyter notebook.
  • The size of each word indicates its frequency in his tweets: the bigger the word, the higher the frequency count. Other than the two search keywords “vaccinated” and “covid-19”, the most frequently appearing words are “get” and “vaccine”.
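The frequency counts behind both the common-words chart and the word sizes can be sketched with the standard library. The sample word lists and the function name below are hypothetical; they only illustrate the counting step, not the project's actual data.

```python
from collections import Counter
from itertools import chain


def word_frequencies(cleaned_tweets):
    """Count how often each word appears across all cleaned tweets."""
    return Counter(chain.from_iterable(cleaned_tweets))


# Hypothetical cleaned tweets (lists of words after stop-word removal).
cleaned_tweets = [
    ["get", "vaccinated", "today"],
    ["get", "vaccine", "covid-19"],
    ["vaccinated", "get", "protected"],
]
frequencies = word_frequencies(cleaned_tweets)
print(frequencies.most_common(2))
```

A word-to-count mapping like this is the kind of input that the wordcloud package's `WordCloud.generate_from_frequencies` method accepts, which is one plausible way to render the cloud.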

3. Sentiment analysis - Polarity

[Figure: pie chart of tweet polarity categories]

  • Using the Python module “TextBlob”, we calculated the polarity score of each of his tweets, which ranges from -1 to 1. Values closer to 1 indicate more positivity, while values closer to -1 indicate more negativity.
  • We then grouped the polarity scores into three buckets: positive (score > 0), neutral (score = 0) and negative (score < 0).
  • The pie chart shows that 58% of Joe Biden’s tweets are positive, so we conclude that his sentiment towards COVID-19 vaccination is largely positive.
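The bucketing rule described above can be sketched as follows. The scores here are made-up values for illustration; in the project they would come from TextBlob (for example `TextBlob(text).sentiment.polarity`), and the function names are hypothetical.

```python
from collections import Counter


def categorize(polarity):
    """Map a polarity score in [-1, 1] to one of three buckets."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def category_shares(scores):
    """Return the percentage of scores falling in each bucket."""
    labels = ("positive", "neutral", "negative")
    if not scores:
        return {label: 0.0 for label in labels}
    counts = Counter(categorize(score) for score in scores)
    return {label: round(100 * counts[label] / len(scores), 1) for label in labels}


# Hypothetical polarity scores, not the project's data.
scores = [0.5, 0.0, -0.2, 0.8]
shares = category_shares(scores)
print(shares)
```

The resulting percentages are what a pie chart of the three categories would display.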

4. Impact of President Joe Biden's tweets on vaccination rate

Tweet's impact on vaccination rate
Red lines indicate when the President tweeted about vaccination

  • From the previous observations, we know that Joe Biden's sentiment towards getting vaccinated is positive, and that he has a high level of influence on Twitter compared with Boris Johnson on the same issue.
  • The assumption, then, is that Joe Biden's tweets should have a positive and powerful impact on the COVID-19 vaccination rate in the USA.
  • However, the graph of Joe Biden's tweet dates against the number of new vaccinations shows no obvious trend to support this assumption. The result is therefore inconclusive.

Limitations

  • Only a limited number of tweets could be pulled due to Twitter constraints (7-day limit)
  • Popularity score and reach score might not be sufficient indicators of Joe Biden's influence. A better indicator would be Klout, which takes all of his social media presence into account
  • To better measure the impact of Biden's tweets on the vaccination rate, we would need to run specific statistical analyses such as time series analysis, which is not in the scope of this project
  • TextBlob may not be sophisticated enough to properly determine whether a statement is truly "positive" or "negative"
  • The analysis may not be accurate enough, as a large amount of data is usually needed

Contributors

Hello there! We are a group of mid-career switchers undergoing a 3-month bootcamp in Business Intelligence and Data Analytics. This is our 2-week interim project, where we showcase what we have learnt and brainstorm a solution to a given problem.

By Team Yellow Minions

GitHub contributors: sinlihchin, khloeli, olliechan92, alaric-tan-92, tonygao0125
