redditstockpredictions

Project

The goal of our project is to understand the impact of social media posts on the future prices of individual stocks. Using textual analysis techniques, we examined posts on reddit.com in investment-focused subreddits (forums).

The files here can be used to clean Reddit posts and comments, score their sentiment, and build regression models from this data. We also supply the code for the interactive web application that visualizes real-time sentiment and predictions for specific stock tickers.

Data and Analysis

All analysis and data-collection code is in CODE/Analysis.

Obtain and Clean Data

Obtain Reddit comments and posts from Google BigQuery: https://bigquery.cloud.google.com/dataset/fh-bigquery:reddit_posts?pli=1

Example queries for May 2018 are below. Each table covers one month; repeat the queries for every month you need.

SELECT * FROM `fh-bigquery.reddit_posts.2018_05` WHERE subreddit = 'wallstreetbets';
SELECT * FROM `fh-bigquery.reddit_comments.2018_05` WHERE subreddit = 'wallstreetbets';
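Because each table holds a single month, the query pair can be generated for a whole date range instead of edited by hand. This is a minimal Python sketch, not part of the repository: monthly_queries is a hypothetical helper, and it assumes the fh-bigquery tables keep the YYYY_MM naming and standard-SQL backtick quoting shown above.

```python
from datetime import date
from typing import List


def monthly_queries(start: date, end: date, subreddit: str = "wallstreetbets") -> List[str]:
    """Build one posts query and one comments query per month, start to end inclusive.

    Assumes monthly tables named YYYY_MM in fh-bigquery.reddit_posts and
    fh-bigquery.reddit_comments, as in the May 2018 examples.
    """
    queries = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        table = f"{year:04d}_{month:02d}"
        for dataset in ("reddit_posts", "reddit_comments"):
            queries.append(
                f"SELECT * FROM `fh-bigquery.{dataset}.{table}` "
                f"WHERE subreddit = '{subreddit}';"
            )
        # Advance one calendar month, rolling over to January of the next year.
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return queries


if __name__ == "__main__":
    for query in monthly_queries(date(2018, 5, 1), date(2018, 6, 1)):
        print(query)
```

Each generated statement can be pasted into the BigQuery console and its results exported as a .csv file.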

Export the query results from BigQuery as .csv files and place them in their respective folders.

Run the following files in order.

CleanData.rmd > RedditSentiment.rmd > getStockData.rmd > Python Sentiment/vaderSentiment.py > FinalModel.Rmd > FinalModel_2.Rmd
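The run order can be scripted so that a failing step halts the chain. This is a hedged Python sketch, not part of the repository: it assumes it is started from CODE/Analysis, that R and the rmarkdown package are installed so each .Rmd file can be rendered with Rscript, and that vaderSentiment.py runs without arguments. PIPELINE, step_command and run_pipeline are hypothetical names.

```python
import subprocess
import sys
from typing import List, Sequence

# Run order from the README; paths assumed relative to CODE/Analysis.
PIPELINE = (
    "CleanData.rmd",
    "RedditSentiment.rmd",
    "getStockData.rmd",
    "Python Sentiment/vaderSentiment.py",
    "FinalModel.Rmd",
    "FinalModel_2.Rmd",
)


def step_command(path: str) -> List[str]:
    """Return the argv that runs one pipeline step."""
    if path.lower().endswith(".rmd"):
        # Knit the R Markdown file; requires R plus the rmarkdown package.
        return ["Rscript", "-e", f"rmarkdown::render('{path}')"]
    if path.endswith(".py"):
        # Reuse the interpreter running this script.
        return [sys.executable, path]
    raise ValueError(f"unsupported step: {path}")


def run_pipeline(steps: Sequence[str] = PIPELINE) -> None:
    """Run each step in order; check=True raises on the first non-zero exit."""
    for path in steps:
        subprocess.run(step_command(path), check=True)


if __name__ == "__main__":
    run_pipeline()
```

Passing an argument list (rather than a shell string) keeps the space in "Python Sentiment" from splitting the path.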

MongoDB Cluster:

Scripts:

  1. Export the comments collection (before sentiment was calculated):
     mongoexport --db liztd -c comments --out comments.csv --type csv --fields "author_flair_css_class,distinguished,ups,subreddit,body,score_hidden,archived,name,author,author_flair_text,downs,created_utc,subreddit_id,link_id,parent_id,score,retrieved_on,controversiality,gilded,id"

  2. Export the submissions collection (before sentiment was calculated):
     mongoexport --db liztd -c submissions --out submissions.csv --type csv --fields "created_utc,subreddit,author,domain,url,num_comments,score,ups,downs,title,selftext,saved,id,from_kind,gilded,from,stickied,retrieved_on,over_18,thumbnail,subreddit_id,hide_score,link_flair_css_class,author_flair_css_class,archived,is_self,from_id,permalink,name,author_flair_text,quarantine,link_flair_text,distinguished"

  3. Export the aggregated sentiments collection:
     mongoexport --db liztd -c reddit_sentiments --out sentiments.csv --type csv --fields "date,ticker,sumCompound,count,close,pct2,pred"

  4. Import the files with sentiments into the local database:
     mongoimport --host mongodb://://dvafinalproject-anotq.mongodb.net/liztd -c reddit_submissions --type csv --headerline --file submissions_with_sentiments.csv
     mongoimport --db liztd -c reddit_comments --type csv --headerline --file comments_with_sentiments.csv

  5. Import into the cloud.mongodb.com shard:
     mongoimport --host dvafinalproject-shard-0/dvafinalproject-shard-00-00-anotq.mongodb.net:27017,dvafinalproject-shard-00-01-anotq.mongodb.net:27017,dvafinalproject-shard-00-02-anotq.mongodb.net:27017 --ssl --username arvnan52 --password --authenticationDatabase admin --db liztd --collection reddit_submissions --type csv --file submissions_with_sentiments.csv --headerline
     mongoimport --host dvafinalproject-shard-0/dvafinalproject-shard-00-00-anotq.mongodb.net:27017,dvafinalproject-shard-00-01-anotq.mongodb.net:27017,dvafinalproject-shard-00-02-anotq.mongodb.net:27017 --ssl --username arvnan52 --password --authenticationDatabase admin --db liztd --collection reddit_comments --type csv --file comments_with_sentiments.csv --headerline
     mongoimport --host dvafinalproject-shard-0/dvafinalproject-shard-00-00-anotq.mongodb.net:27017,dvafinalproject-shard-00-01-anotq.mongodb.net:27017,dvafinalproject-shard-00-02-anotq.mongodb.net:27017 --ssl --username arvnan52 --password --authenticationDatabase admin --db liztd --collection sentiments --type csv --file sentiments.csv --headerline
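The three shard imports differ only in the collection and the CSV file, so they can be driven from one table. This is a hedged Python sketch, assuming mongoimport is on the PATH; SHARD_HOST, IMPORTS and import_command are hypothetical names. As in the commands above, --password is given without a value.

```python
import subprocess
from typing import List

# Replica-set seed list copied from the shard import commands.
SHARD_HOST = (
    "dvafinalproject-shard-0/"
    "dvafinalproject-shard-00-00-anotq.mongodb.net:27017,"
    "dvafinalproject-shard-00-01-anotq.mongodb.net:27017,"
    "dvafinalproject-shard-00-02-anotq.mongodb.net:27017"
)

# Target collection -> CSV file to load into it.
IMPORTS = {
    "reddit_submissions": "submissions_with_sentiments.csv",
    "reddit_comments": "comments_with_sentiments.csv",
    "sentiments": "sentiments.csv",
}


def import_command(collection: str, csv_file: str, username: str = "arvnan52") -> List[str]:
    """Build the mongoimport argv for one collection (password left without a value)."""
    return [
        "mongoimport", "--host", SHARD_HOST, "--ssl",
        "--username", username, "--password",
        "--authenticationDatabase", "admin",
        "--db", "liztd", "--collection", collection,
        "--type", "csv", "--file", csv_file, "--headerline",
    ]


if __name__ == "__main__":
    for collection, csv_file in IMPORTS.items():
        # check=True stops if any import fails.
        subprocess.run(import_command(collection, csv_file), check=True)
```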

Database connection and details

  1. Connect to the cluster on cloud.mongodb.com (arvnan52/hp..).
  2. Connections are allowed from only two IP addresses:
     a. my laptop
     b. the DigitalOcean server

mongo "mongodb+srv://dvafinalproject-anotq.mongodb.net/liztd" --username arvnan52 --password

3. Collections:

a. reddit_submissions
    Index: db.reddit_submissions.createIndex({title: "text", selftext: "text", id: 1, created_utc: 1})
b. reddit_comments
    Index: db.reddit_comments.createIndex({parent_id: "text", body: "text", created_utc: 1, id: 1})
c. sentiments
    Combines stock prices with the aggregated Reddit sentiment and the final prediction (fields: date, ticker, sumCompound, count, close, pct2, pred).
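The field names suggest that each sentiments document summarizes one ticker on one date. The sketch below rests on that reading, which the repository does not confirm: it assumes sumCompound is the sum of VADER compound scores for the posts mentioning a ticker on a date and count is the number of those posts. aggregate_sentiment is a hypothetical helper; close, pct2 and pred come from the price and modeling steps and are not computed here.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def aggregate_sentiment(scored: Iterable[Tuple[str, str, float]]) -> List[Dict[str, Any]]:
    """Collapse (date, ticker, compound) rows into one record per date and ticker.

    Assumed meaning: sumCompound = sum of compound scores, count = number of rows.
    """
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for day, ticker, compound in scored:
        key = (day, ticker)
        sums[key] += compound
        counts[key] += 1
    # Sorted by (date, ticker) so output order is deterministic.
    return [
        {"date": day, "ticker": ticker, "sumCompound": sums[(day, ticker)], "count": counts[(day, ticker)]}
        for day, ticker in sorted(sums)
    ]
```

Keeping a sum and a count (rather than a mean) lets later steps derive an average or weight by volume as needed.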

DigitalOcean droplet:

Hostname: ubuntu-s-1vcpu-1gb-nyc1-01 (IP address 159.89.232.113)

Domain:

liztd.com

The following components were hosted on a single Ubuntu server on DigitalOcean.

Daily Load:

The Python script in CODE/liztd_python_load is set up as a cron job that runs every night.

Reddit Stream:

The Python script in the CODE/liztd_python_stream folder handles this. It keeps an open connection to the Reddit 'wallstreetbets' stream and uploads what it receives into the MongoDB database.
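A streaming loop of this kind can be sketched with PRAW and pymongo. This is an assumption about how such a script might work, not the code in CODE/liztd_python_stream: it stores comments only, keeps a subset of the comment fields from the export above, writes to the reddit_comments collection, and reads Reddit API credentials from the hypothetical environment variables REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET. comment_to_doc and stream_into are hypothetical names.

```python
import getpass
import os
from typing import Any, Dict, Iterable


def comment_to_doc(comment: Any) -> Dict[str, Any]:
    """Map a PRAW Comment to a plain dict using field names from the comments export."""
    return {
        "id": comment.id,
        # author is a Redditor object, or None when the account was deleted
        "author": str(comment.author) if comment.author else None,
        "body": comment.body,
        "created_utc": comment.created_utc,
        "link_id": comment.link_id,
        "parent_id": comment.parent_id,
        "score": comment.score,
        "subreddit": str(comment.subreddit),
    }


def stream_into(collection: Any, comments: Iterable[Any]) -> int:
    """Insert every comment and return how many were stored.

    A live PRAW stream never ends, so this only returns for finite input.
    """
    stored = 0
    for comment in comments:
        collection.insert_one(comment_to_doc(comment))
        stored += 1
    return stored


if __name__ == "__main__":
    import praw  # third-party: pip install praw
    from pymongo import MongoClient  # third-party: pip install pymongo

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="liztd-stream (hypothetical user agent)",
    )
    client = MongoClient(
        "mongodb+srv://dvafinalproject-anotq.mongodb.net/liztd",
        username="arvnan52",
        password=getpass.getpass("MongoDB password: "),
    )
    # skip_existing=True yields only comments posted after the stream starts.
    live = reddit.subreddit("wallstreetbets").stream.comments(skip_existing=True)
    stream_into(client["liztd"]["reddit_comments"], live)
```

Separating the mapping from the loop lets the insert logic be exercised with stand-in objects, without Reddit or MongoDB access.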

Tools:

PM2: a process management tool, set up to keep the data-collection jobs running at their scheduled times.

Web Application:

API:
    A Python web server built with Bottle (bottlepy) serves as the API between the database and the frontend. The project is in CODE/liztd_python_api.
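An endpoint of this API over the sentiments collection might look like the sketch below. The route /api/sentiments/<ticker>, the CHART_FIELDS selection, the upper-case ticker convention, and the helper names are all hypothetical; the real API in CODE/liztd_python_api may differ. Bottle is imported inside make_app so the data-shaping function works without it installed.

```python
import json
from typing import Any, Dict, Iterable, List

# Hypothetical subset of the sentiments fields a chart would need.
CHART_FIELDS = ("date", "sumCompound", "count", "close", "pred")


def shape_sentiment_rows(rows: Iterable[Dict[str, Any]], ticker: str) -> List[Dict[str, Any]]:
    """Keep one ticker's rows, trimmed to CHART_FIELDS and sorted by date."""
    wanted = ticker.upper()  # assumes tickers are stored in upper case
    out = [{k: r.get(k) for k in CHART_FIELDS} for r in rows if r.get("ticker") == wanted]
    return sorted(out, key=lambda r: r["date"])


def make_app(collection: Any):
    """Create a Bottle app that serves the sentiments collection as JSON."""
    from bottle import Bottle, response  # third-party: pip install bottle

    app = Bottle()

    @app.route("/api/sentiments/<ticker>")
    def sentiments(ticker: str) -> str:
        rows = collection.find({"ticker": ticker.upper()}, {"_id": 0})
        response.content_type = "application/json"
        # default=str covers values json cannot encode natively, such as datetimes
        return json.dumps(shape_sentiment_rows(rows, ticker), default=str)

    return app
```

Given a pymongo collection, make_app(collection).run(host="0.0.0.0", port=8080) would start Bottle's development server.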

UI:
    The UI is built with React, using the Evergreen library for UI components and Recharts for charts. The commands needed to run the web UI are in CODE/liztd_ui/readme.md.

Contributors

dmilmont
