bitcoin-news-sentiment-analysis's Introduction

Bitcoin Sentiment in Large Publications

This repository holds code for a sentiment analysis of Bitcoin-related publications in traditional media (large, popular, print-based sources). The codebase consists of two tools: the web-scraping suite and the functional code which applies the sentiment analysis libraries to the corpus.

Code

Scraping Tool

The scraper has a basic terminal-based interactive component through which a user can choose a scraping source and keywords. The available sources are NYT, CNN, BBC and Reuters. The tool performs a keyword search on the source, collects article hyperlinks, and then extracts the article text from each linked webpage.
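
The link-collection and text-extraction steps could be sketched roughly as follows. This is a minimal illustration using only Python's standard-library `html.parser`, not the repository's actual scraper: the class names, function names, and the two inline HTML strings are hypothetical stand-ins for pages a real run would download, and real news sites would need source-specific handling.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags on a search-results page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements of an article page."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside a paragraph.
        if self._depth > 0 and data.strip():
            self.chunks.append(data.strip())


def collect_links(search_html):
    parser = LinkCollector()
    parser.feed(search_html)
    return parser.links


def extract_text(article_html):
    parser = ParagraphText()
    parser.feed(article_html)
    return " ".join(parser.chunks)


# Hypothetical stand-ins for downloaded pages (assumption, not real site markup).
SEARCH_PAGE = '<ul><li><a href="/news/a1">One</a></li><li><a href="/news/a2">Two</a></li></ul>'
ARTICLE_PAGE = "<html><body><h1>Title</h1><p>Bitcoin rose.</p><p>Analysts were surprised.</p></body></html>"

links = collect_links(SEARCH_PAGE)
text = extract_text(ARTICLE_PAGE)
```

Here `links` holds the two article paths and `text` holds the joined paragraph text, with the headline left out because it is not inside a `<p>` element.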

Sentiment Tool

To derive sentiment scores from the scraped text, two 'out-of-the-box', unsupervised methods are used: the VADER and TextBlob sentiment libraries. The two differ somewhat in approach but work similarly: each applies pre-trained polarity values to the words it recognises in an article. Both also offer basic handling of context, such as negation.
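
The general idea of word-level polarity scoring with negation handling can be illustrated with a toy scorer. This is a deliberately simplified stand-in and is neither VADER nor TextBlob: the six-word lexicon, its valence values, the negation list, and the `polarity` function are all invented for illustration, and the real libraries use much larger lexicons and additional rules.

```python
import re

# Hypothetical word valences (assumption); real lexicons are far larger.
LEXICON = {"good": 1.0, "great": 1.5, "gain": 1.0, "bad": -1.0, "crash": -1.5, "fraud": -2.0}
NEGATIONS = {"not", "no", "never"}


def polarity(text):
    """Mean valence of known words, flipping the sign directly after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            value = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                value = -value
            scores.append(value)
    # Text with no known words is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


positive = polarity("Bitcoin had a great week")
negated = polarity("This is not good")
neutral = polarity("The market opened")
```

Words outside the lexicon contribute nothing, which is one reason scores can depend heavily on a publication's word choice.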

Data

2053 articles with keyword = ‘bitcoin’

  • BBC (318 stories – avg. length 451 words)
  • NYT (402 stories – avg. length 1011 words)
  • CNN (720 stories – avg. length 379 words)
  • Reuters (602 stories – avg. length 544 words)


Time range from 2011 to May, 2019

  • BBC: June, 2011 – May, 2019
  • NYT: January, 2012 – May, 2019
  • CNN: August, 2012 – May, 2019
  • Reuters: April, 2012 – May, 2019

Output

Each article receives a sentiment polarity score. Scores are then aggregated in rolling time windows (monthly and fortnightly) to create smooth time series, which are plotted against Bitcoin prices. Visualisations of publishing frequency and comparison plots between VADER and TextBlob scores are also produced.
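
A rolling-window aggregation of this kind might look like the sketch below. It assumes a trailing window and a simple mean, which the description above does not specify; the `rolling_mean` function, the 14- and 30-day window lengths, and the sample scores are illustrative assumptions rather than the repository's code or data.

```python
from datetime import date, timedelta


def rolling_mean(scored, window_days):
    """Trailing mean of scores over the window_days ending at each article's date."""
    scored = sorted(scored)
    window = timedelta(days=window_days)
    result = []
    start = 0
    total = 0.0
    for end, (day, score) in enumerate(scored):
        total += score
        # Drop articles that have fallen out of the trailing window.
        while scored[start][0] <= day - window:
            total -= scored[start][1]
            start += 1
        result.append((day, total / (end - start + 1)))
    return result


# Made-up (date, polarity) pairs for illustration only.
articles = [
    (date(2019, 1, 1), 0.5),
    (date(2019, 1, 10), -0.5),
    (date(2019, 1, 20), 1.0),
    (date(2019, 2, 15), 0.0),
]
fortnightly = rolling_mean(articles, 14)
monthly = rolling_mean(articles, 30)
```

The longer window averages over more articles at each point, so the monthly series changes more gradually than the fortnightly one.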

Publishing Frequency Over Time

publication_frequency

Sentiment vs. BTC Price

Sentiment scores are calculated using both VADER and TextBlob polarity scoring. For each data source below, the first plot shows the VADER scores and the second shows those derived via TextBlob. Cursory Granger causality testing indicated that, amongst these selected sources, BTC price drives news sentiment more than news sentiment drives BTC price.
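
For readers unfamiliar with the test, the core of a Granger causality check can be sketched as an F-test comparing two regressions: one predicting a series from its own past, and one that also includes the past of the other series. The sketch below uses a single lag and plain-Python least squares on synthetic data. It is not the procedure or data used in this repository; all function names and the data-generating process are assumptions, and in practice a library routine such as `grangercausalitytests` from `statsmodels.tsa.stattools` would normally be preferred.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, target):
    """Residual sum of squares of an OLS fit of target on rows (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_f(cause, effect):
    """F statistic for 'past cause helps predict effect', using one lag."""
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    full = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - 3  # observations minus parameters in the full model
    return (rss_r - rss_f) / (rss_f / dof)


# Synthetic data (assumption): sentiment follows the previous period's price change.
random.seed(0)
price_change = [random.gauss(0, 1) for _ in range(200)]
sentiment = [0.0] + [0.8 * price_change[t - 1] + random.gauss(0, 0.1) for t in range(1, 200)]

f_price_to_sentiment = granger_f(price_change, sentiment)
f_sentiment_to_price = granger_f(sentiment, price_change)
```

On data built this way, the F statistic for the price-to-sentiment direction should be far larger than for the reverse direction, which is the kind of asymmetry the finding above describes. A real analysis would also need to check stationarity and choose the lag length.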

Aggregate

rolling_vader_btc_agg

rolling_blob_btc_agg

BBC

rolling_vader_btc_bbc

rolling_blob_btc_bbc

CNN

rolling_vader_btc_cnn

rolling_blob_btc_cnn

NYT

rolling_vader_btc_nyt

rolling_blob_btc_nyt

Reuters

rolling_vader_btc_reuters

rolling_blob_btc_reuters

VADER / TextBlob Comparison

The main difference between the two methods is that TextBlob produces consistently lower polarity scores than VADER. Interestingly, the size of this gap varies from source to source (e.g. Reuters scores from the two methods are much more similar than those from other sources). This suggests that some styles of journalism may produce more volatile results from these unsupervised sentiment scorers than others, presumably because of word choice.

Aggregate

vader_blob_agg

BBC

vader_blob_bbc

CNN

vader_blob_cnn

NYT

vader_blob_nyt

Reuters

vader_blob_reuters

Contributors

alextruesdale
