
predicting-stocks-with-twitter-sentiment's Introduction

Predicting the Stock Market using the BERT Model and Sentiment Analysis of Live Tweets

Video Presentation on YouTube:

Video Presentation

Abstract

Predicting the best stocks to buy while day-trading the stock market is a dream for many seeking financial freedom. There are heaps of existing day-trading strategies, but many of them demand long hours of strict human focus on the numerical trends of stock prices and on the contextual trends from news sources that directly correlate to the future value of publicly traded companies. A subcategory of machine learning can now reduce this manual burden of analysis and automate tasks that predict trends quicker than any human. Enter Natural Language Processing (NLP) and the Transformer model for sentiment analysis. This article discusses an NLP pipeline that trains an altered BERT model on tweets from Twitter to predict whether their sentiment toward stocks is bullish or bearish. The pipeline then conducts trades based on a user’s predefined portfolio and desired risk level.

Usage

Twitter API

Before running this code, you will need to create a developer account on Twitter to get the API keys needed to communicate with its API.

Alpaca API

You will also need to create a stock trading account on Alpaca to get the API keys needed to communicate with its API. If you want to trade with fake money, be sure to create a "paper-trading" account.

Saving Models and Outputs for Re-use

After training the model, the program automatically saves the files listed below. My pre-trained model files were too large to post on GitHub, so you will need to generate these files yourself by training the model once: input "train_new_model" when prompted.

  • Neural network tensors: twitter_sentiment_model.pth
  • Model outputs and predictions: twitter_sentiment_outcomes.csv

Running the Code

Run all cells in the Jupyter notebook. After you have trained the model once, you may input "p" to use your automatically saved pre-trained model, or "train_new_model" to overwrite those files by training again with new parameters. Next, input whether you are running the code on Google Colab's GPU or locally on your own CPU. Google Colab's GPU is necessary for the initial training; subsequent runs can use your saved pre-trained model on your local CPU. Running on Colab's CPU (instead of its GPU) will throw an error, which can be handled by altering the code.
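The choice between "p" and "train_new_model" can be sketched as a small helper. This is only an illustration under stated assumptions, not the notebook's actual code: the function name `resolve_run_mode` and its return values are hypothetical, and only the two input strings and the model file name come from this README.

```python
from pathlib import Path

# File name taken from the "Saving Models and Outputs for Re-use" section.
MODEL_PATH = Path("twitter_sentiment_model.pth")


def resolve_run_mode(choice: str, model_path: Path = MODEL_PATH) -> str:
    """Hypothetical helper: map the user's input to "load" or "train".

    "p" reuses the saved model and requires that the file already exists;
    "train_new_model" trains from scratch and overwrites the saved files.
    """
    choice = choice.strip().lower()
    if choice == "train_new_model":
        return "train"
    if choice == "p":
        if not model_path.exists():
            raise FileNotFoundError(
                f"{model_path} not found; run once with 'train_new_model' first"
            )
        return "load"
    raise ValueError(f"expected 'p' or 'train_new_model', got {choice!r}")
```

Checking for the saved file up front gives a clear message when "p" is chosen before any model has been trained. Separately, if the CPU error mentioned above comes from loading tensors that were saved on a GPU (an assumption, since the error is not shown here), PyTorch's `torch.load` accepts a `map_location` argument such as `"cpu"` for that case.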

Directory Structure

The data folder is the only sub-directory. It holds the stock_data.csv file, which contains the 5790 tweets with labeled sentiment used for training and testing the model.
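As a rough sketch of how the labeled tweets could be read and divided into training and testing sets, the example below uses only the standard library. The column names `Text` and `Sentiment`, the integer label format, and the 80/20 split are assumptions for illustration; the README does not state the CSV layout or the split the notebook uses.

```python
import csv
import random
from typing import List, Tuple

Row = Tuple[str, int]  # (tweet text, sentiment label)


def load_labeled_tweets(
    path: str, text_col: str = "Text", label_col: str = "Sentiment"
) -> List[Row]:
    """Read (text, label) pairs from a CSV file with a header row.

    The default column names are assumed, not confirmed by the README.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [(row[text_col], int(row[label_col])) for row in reader]


def train_test_split(
    rows: List[Row], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A call such as `train, test = train_test_split(load_labeled_tweets("data/stock_data.csv"))` would then hold out one fifth of the tweets for testing.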

The stock_trading_decisions.txt and portfolio_performance.txt files exist in the folder to keep track of the stock trades that have occurred and the balance of the account.

The api_credentials.py file is where I kept my API credentials so that they stay out of the main code. Other users can place their own credentials in this file, or enter them directly in the code instead.
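One way to read credentials from such a file is sketched below. The variable names in `CREDENTIAL_NAMES` are hypothetical, since the README does not say what api_credentials.py defines, and the fallback to environment variables is an addition for illustration rather than something the project is known to do.

```python
import importlib
import os
from typing import Dict

# Hypothetical names; the real api_credentials.py may use different ones.
CREDENTIAL_NAMES = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "ALPACA_API_KEY",
    "ALPACA_SECRET_KEY",
)


def load_credentials(module_name: str = "api_credentials") -> Dict[str, str]:
    """Collect credentials from a local module, else from the environment."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        module = None  # no credentials file; rely on environment variables

    creds = {}
    for name in CREDENTIAL_NAMES:
        value = getattr(module, name, None) or os.environ.get(name)
        if not value:
            raise KeyError(f"missing credential: {name}")
        creds[name] = value
    return creds
```

Keeping the keys in a separate module or in the environment makes it easier to exclude them from version control than pasting them into the notebook.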

The requirements.txt file contains the command-line arguments needed to install all of the packages required by the main program. The main program reads this file automatically and installs any packages not already present on your computer, so you do not need to install them manually.
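The automatic install step might look roughly like the following. This is a sketch, not the notebook's own code: it assumes requirements.txt holds one pip requirement per line (for example `pandas>=1.0`), and all function names are hypothetical.

```python
import re
import subprocess
import sys
from importlib import metadata
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Return requirement strings, skipping blanks, comments, and pip options."""
    reqs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line and not line.startswith("-"):
            reqs.append(line)
    return reqs


def distribution_name(requirement: str) -> str:
    """Strip version specifiers and extras: 'pandas>=1.0' -> 'pandas'."""
    return re.split(r"[<>=!~\[; ]", requirement, maxsplit=1)[0]


def missing_requirements(reqs: List[str]) -> List[str]:
    """Return the requirements whose distribution is not installed."""
    missing = []
    for req in reqs:
        try:
            metadata.version(distribution_name(req))
        except metadata.PackageNotFoundError:
            missing.append(req)
    return missing


def install_missing(path: str = "requirements.txt") -> None:
    """Install only the missing packages with the current interpreter's pip."""
    with open(path, encoding="utf-8") as f:
        missing = missing_requirements(parse_requirements(f.read()))
    if missing:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
```

Invoking pip through `sys.executable` installs into the same environment the notebook kernel runs in. Note that this check only looks at whether a package is present, not whether its installed version satisfies the specifier.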

The sentiment_model.ipynb is the main file that contains the entire NLP pipeline. It contains approximately 700 lines of code.

predicting-stocks-with-twitter-sentiment's People

Contributors

ericzacharia
