Giter VIP home page Giter VIP logo

covid19sentiments's Introduction

COVID-19 Sentiment Analysis

Using Twitter data to perform sentiment analysis on the tweets on the COVID-19 virus.

Background

The recent Coronavirus COVID-19 has created a global emergency demanding social distancing, working from home, and self quarantining to control the spread of the virus. As a result, in my spare time I decided to explore how people are reacting and talking about the pandemic online.

Data

For this project, 10000 tweets with the "COVID-19" related keywords between March 20 to March 28 in 2020 were fetched for analysis.

Keywords

Generic

#COVID19
#covid19
#coronavirus
#CoronaVirus

More Specific

#coronavirustexas
#Coronavirustexas
#coronavirusnewyork
#coronaviruscalifornia

I. Non-API Method of data collection using Python3

Twitter API has limits which vary over time and currently allows one week's data. Some packages allow to collect historical Twitter data.

Package used here: GetOldTweets3

This non-API method scrapes Twitter data based on Twitter search results by parsing the result page with a scroll loader, then calling to a JSON provider. While theoretically it can search through oldest tweets and collect data accordingly, the number of variables are limited to the layout of search results.

  1. Creating virtual environment (optional) and install GetOldTweets3 package using pip
python3 -m venv env
source ./env/bin/activate 
python3 -m pip install GetOldTweets3

Alternatively,

pip3 install -e git+https://github.com/Mottl/GetOldTweets3#egg=GetOldTweets3
  1. Collecting Twitter data.

GetOldTweets3

## Keyword search
GetOldTweets3 
  --querysearch "COVID19 covid19 Coronavirus coronavirus" 
  --lang en 
  --since 2020-03-20 
  --until 2020-03-28
  --maxtweets 10000 
  --output data/corona_0320_0327.csv

II. API Method using R

  1. Acquire API key and token from Twitter developer website
  2. Install and load the required R package(s) for collecting and vizualizing Twitter data. Examples: rtweet, twitteR, vosonSML. rtweet gives most detail in twitter variables (> 90).
rtweet, ggmap, igraph, tidyverse, ggraph, ggplot2, data.table, maps, mapdata
  1. Store and check the API keys/tokens
  2. Check token
  3. Search using query
  4. Preview data
  5. Time series plot

Sentiment Analysis

Preprocessing

  1. The Twitter data obtained is converted to a data frame.
  2. The text of the tweets is tokenized, i.e. broken into words. Each row is split such that there is one token (word) in each row of the new data frame.
  3. The stopwords are removed from the data.
  4. The typical keywords are removed from the data.
  5. Sentiment words from the Bing Lexicon are used for analysing the tweet words.

Visualization

Most commonly used words in the tweets

common_words

Word cloud of the Sentiment words in the tweets

sentiment_words

Classification of the Sentiment words in the tweets

sentiment_words_class


Insights

Overall, the tweets convey a mixed, but slightly optimistic sentiment - with the relatively high frequency of words such as "positive", "safe", "relief", "support". The most frequent words are related to "lockdown", "health", "stayhome", "deaths" stood out among the other words, which suggests that people are talking more about the health and deaths, are much more concerned about the measure to deal with it.

covid19sentiments's People

Contributors

likarajo avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.