Bitcoin Sentiment in Large Publications

This repository holds code for a sentiment analysis of Bitcoin-related publications in traditional media (large, popular, print-based sources). The codebase consists of two tools: a web-scraping suite, and the analysis code that applies sentiment analysis libraries to the scraped corpus.

Code

Scraping Tool

The scraper has a basic interactive terminal interface through which a user chooses a scraping source and keywords. The available sources are NYT, CNN, BBC and Reuters. The tool runs a keyword search on the chosen source, collects the article hyperlinks from the results, and then extracts the article text from each linked page.
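The last two steps of that flow (collecting hyperlinks from a results page, then pulling text out of an article page) can be sketched with the Python standard library alone. This is a minimal illustration, not the repository's scraper: the class and function names are hypothetical, the assumption that article text lives in `<p>` tags is ours, and fetching the pages over the network is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute hrefs from <a> tags in a search-results page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the site's base URL.
                self.links.append(urljoin(self.base_url, href))


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> tags of an article page."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a <p> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            self.chunks.append(" ")  # keep paragraphs separated

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def collect_links(search_html, base_url):
    """Return every hyperlink found in a search-results page."""
    parser = LinkCollector(base_url)
    parser.feed(search_html)
    return parser.links


def extract_text(article_html):
    """Return the paragraph text of an article with whitespace normalised."""
    parser = ParagraphText()
    parser.feed(article_html)
    return " ".join("".join(parser.chunks).split())
```

In practice each source needs its own rules for which links are articles and which elements hold the body text, so a real scraper would filter far more selectively than this.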

Sentiment Tool

To derive sentiment scores from the scraped text, two 'out-of-the-box', unsupervised methods are used: the VADER and TextBlob sentiment libraries. The two differ somewhat in approach, but both work in a similar way, applying predefined polarity values to the words in an article that the library recognises. Both methods also have basic functionality for taking context into account (e.g. negation).
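The general idea behind lexicon-based scoring can be shown with a toy example. The sketch below is neither VADER's nor TextBlob's algorithm (both use far larger lexicons and more elaborate rules); the lexicon, the negation list and the `polarity` function are hypothetical and exist only to illustrate word-level polarity lookup with a simple negation flip.

```python
import re

# Hypothetical miniature lexicon; real libraries ship thousands of rated words.
LEXICON = {
    "good": 1.0, "gain": 0.5, "surge": 0.75,
    "bad": -1.0, "crash": -0.75, "fraud": -1.0,
}
NEGATIONS = {"not", "no", "never"}


def polarity(text):
    """Average the polarity of known words, flipping a word that follows a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue  # words unknown to the lexicon contribute nothing
        value = LEXICON[token]
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value  # "not good" counts as negative
        scores.append(value)
    return sum(scores) / len(scores) if scores else 0.0
```

Words missing from the lexicon are simply ignored, which is one reason scores from such methods depend heavily on a publication's vocabulary.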

Data

2053 articles with keyword = ‘bitcoin’

BBC (318 stories – avg. length 451 words)
NYT (402 stories – avg. length 1011 words)
CNN (720 stories – avg. length 379 words)
Reuters (602 stories – avg. length 544 words)

Time range from 2011 to May, 2019

BBC: June, 2011 – May, 2019
NYT: January, 2012 – May, 2019
CNN: August, 2012 – May, 2019
Reuters: April, 2012 – May, 2019

Output

Each article receives a sentiment polarity score. Articles are then aggregated in rolling time windows (monthly and fortnightly) to create smooth time series, which are plotted against Bitcoin prices. Visualisations of publishing frequency and comparison plots of VADER and TextBlob scores are also produced.
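A rolling-window aggregation of per-article scores can be sketched as follows. This is an assumption about how such smoothing might be done, not the repository's code: the `rolling_mean` function is hypothetical, the window is a trailing one measured in days, and 14 days stands in for "fortnightly".

```python
from datetime import date, timedelta


def rolling_mean(scored_articles, window_days):
    """Return (date, mean score) pairs, one per publication date.

    Each mean covers the articles published in the `window_days` days
    up to and including that date.
    """
    window = timedelta(days=window_days)
    result = []
    for day in sorted({d for d, _ in scored_articles}):
        in_window = [s for d, s in scored_articles if day - window < d <= day]
        result.append((day, sum(in_window) / len(in_window)))
    return result


# Illustrative scores only, not results from the corpus.
articles = [
    (date(2019, 1, 1), 0.25),
    (date(2019, 1, 10), 0.5),
    (date(2019, 1, 20), -0.75),
    (date(2019, 2, 15), 0.125),
]
fortnightly = rolling_mean(articles, window_days=14)
```

A longer window (for example 30 days for a monthly view) gives a smoother series at the cost of reacting more slowly to changes in tone.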

Publishing Frequency Over Time

Sentiment vs. BTC Price

Sentiment scores are calculated using both VADER and TextBlob polarity scoring. For each data source below, the first plot shows the VADER scores, while the second shows those derived via TextBlob. Cursory causality testing using Granger causality tests indicated that, among the selected sources, BTC price is more a driver of news sentiment than the other way around.
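The logic of a Granger causality test can be illustrated with a dependency-free sketch. It compares a model that predicts a series from its own previous value with one that also uses the previous value of a second series, and reports the F statistic for the improvement. The README does not say which implementation or lag length was used, so the single lag, the function names and the omission of a p-value are all assumptions here; a library routine such as `grangercausalitytests` in statsmodels would be the usual choice for real work.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def _rss(rows, target):
    """Residual sum of squares of an ordinary least squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, target)
    )


def granger_f(cause, effect):
    """F statistic for 'lagged cause helps predict effect', using one lag."""
    target = effect[1:]
    steps = range(len(effect) - 1)
    restricted = [[1.0, effect[t]] for t in steps]              # own lag only
    unrestricted = [[1.0, effect[t], cause[t]] for t in steps]  # plus lagged cause
    rss_r = _rss(restricted, target)
    rss_u = _rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)
```

Calling `granger_f(price, sentiment)` and `granger_f(sentiment, price)` on two aligned series and comparing the results gives a rough indication of which direction carries more predictive information; a larger F statistic means the lagged series adds more.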

Aggregate

BBC

CNN

NYT

Reuters

VADER / TextBlob Comparison

The main difference between the two methods is that TextBlob produces consistently lower polarity values than VADER. What is interesting, however, is that the size of this gap is not the same from source to source (e.g. the Reuters scores from the two methods are much closer to each other than those of the other sources). This suggests that some styles of journalism produce more divergent results from these unsupervised sentiment scorers than others, presumably because of word choice.
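One simple way to quantify the per-source gap between the two scorers is to average the difference between each article's two scores, grouped by source. The sketch below assumes each article is stored as a `(source, vader_score, textblob_score)` tuple; that layout and the `mean_gap_by_source` function are hypothetical, and the numbers are illustrative rather than results from the corpus.

```python
from collections import defaultdict
from statistics import mean


def mean_gap_by_source(records):
    """Average (VADER minus TextBlob) score difference for each source."""
    gaps = defaultdict(list)
    for source, vader_score, textblob_score in records:
        gaps[source].append(vader_score - textblob_score)
    return {source: mean(values) for source, values in gaps.items()}


# Illustrative values only.
records = [
    ("Reuters", 0.5, 0.25),
    ("Reuters", 0.25, 0.25),
    ("BBC", 0.75, 0.0),
    ("BBC", 0.5, 0.25),
]
gaps = mean_gap_by_source(records)
```

A source with a small mean gap is one on which the two methods largely agree.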

alextruesdale / bitcoin-news-sentiment-analysis