Stock Prediction

A complete machine learning data pipeline for training TensorFlow models to forecast stock prices. Written in Python.

Goal: given stock data (opening price, closing price and indicators), predict the next day's adjusted closing price.

Predictions for MSFT (figure: the orange line shows predictions, the blue line shows actual prices):

Mean relative error: 11.53%

Running the pipeline

First, make sure dependencies are installed:

$ pip install -r requirements.txt

The pipeline is outlined in scripts/run.py. To run the complete pipeline to train a neural network model for each stock (i.e. fetch, preprocess, train, evaluate), run:

$ python scripts/run.py -f -p -nn

Running jobs separately

-f, --fetch: runs the data fetch job, which fetches stock data and financial indicators for each stock symbol, joins them together, then saves the data to a CSV file in output/raw.

-p, --preprocess: runs the preprocessing job. Data must already exist in output/raw. This job creates the label column and shifts it by one day. It then splits the data into 80% training and 20% testing sets. After that, a last observation carried forward (LOCF) procedure fills in the missing data. Finally, a scikit-learn MinMaxScaler is applied to each column to scale the dataset.

-nn, --neuralnetwork: trains a neural network model for each stock using TensorFlow, then runs a simple evaluation on the test set to calculate the relative error. Models are saved in output/models.

--evalnn: runs evaluation using the test data set. Gives MSE and relative error.
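The flags above could be wired up with argparse roughly as follows. This is a minimal sketch, not the contents of scripts/run.py: the `build_parser` and `selected_jobs` helpers and the help strings are hypothetical, and only the flag names come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four job flags listed in this README."""
    parser = argparse.ArgumentParser(description="Stock prediction pipeline (illustrative sketch)")
    parser.add_argument("-f", "--fetch", action="store_true",
                        help="fetch stock data and indicators into output/raw")
    parser.add_argument("-p", "--preprocess", action="store_true",
                        help="label, split, fill and scale the raw data")
    parser.add_argument("-nn", "--neuralnetwork", action="store_true",
                        help="train one neural network per stock")
    parser.add_argument("--evalnn", action="store_true",
                        help="evaluate trained networks on the test set")
    return parser


def selected_jobs(argv):
    """Return the names of the requested jobs, in pipeline order."""
    args = build_parser().parse_args(argv)
    order = ["fetch", "preprocess", "neuralnetwork", "evalnn"]
    return [name for name in order if getattr(args, name)]


print(selected_jobs(["-f", "-p", "-nn"]))  # ['fetch', 'preprocess', 'neuralnetwork']
```

Returning the jobs in a fixed order keeps the stages sequenced as fetch, preprocess, train, evaluate regardless of the order in which the flags are typed.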

Pipeline

The pipeline consists of the following stages:

  1. Data fetching from the AlphaVantage stock quotes API
  2. Data preprocessing - splitting, scaling/normalization, last observation carried forward and label shifting
  3. Training various supervised learning models; a separate model is trained for each stock
    • Neural Networks
    • AdaBoost regressors
    • Gradient boosting regressors
    • Random forest regressors
  4. Model evaluation - loss and relative error
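The two evaluation metrics named in the last stage can be sketched as below. The README does not spell out how relative error is defined, so this assumes the mean of |predicted - actual| / |actual|; the function names and sample values are hypothetical.

```python
import numpy as np


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((predicted - actual) ** 2))


def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))


# Made-up prices, only to show the call pattern.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mse(actual, predicted))                            # 250.0
print(f"{mean_relative_error(actual, predicted):.2%}")   # 10.00%
```

Relative error is only meaningful on unscaled prices: after min-max scaling, the smallest training value maps to 0 and the ratio is undefined there, so predictions would likely need to be inverse-transformed first.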

Structure of data

The stock data for S&P 500 companies includes the daily adjusted time series data as well as 51 financial indicators. The adjusted closing price is used as the label and shifted by one day.
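The preprocessing steps (label shift, 80/20 split, last observation carried forward, min-max scaling) can be sketched with pandas and scikit-learn. Several details here are assumptions rather than the project's code: the column name `adjusted_close`, the `preprocess` helper, the chronological (unshuffled) split, fitting the scaler on the training set only, and reading the shift as "the label for day t is the adjusted close of day t+1", which matches the stated goal.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def preprocess(raw: pd.DataFrame, label_source: str = "adjusted_close",
               train_fraction: float = 0.8):
    """Label, split, forward-fill and scale one stock's data (illustrative)."""
    df = raw.copy()
    # Assumed interpretation: the label for day t is day t+1's adjusted close.
    df["label"] = df[label_source].shift(-1)
    df = df.iloc[:-1]  # the final row has no next-day label

    # Chronological split (assumption: no shuffling), then LOCF within each part.
    split = int(len(df) * train_fraction)
    train = df.iloc[:split].ffill()
    test = df.iloc[split:].ffill()

    # Fit the scaler on training data only, then reuse it for the test set.
    scaler = MinMaxScaler()
    train_scaled = pd.DataFrame(scaler.fit_transform(train),
                                index=train.index, columns=train.columns)
    test_scaled = pd.DataFrame(scaler.transform(test),
                               index=test.index, columns=test.columns)
    return train_scaled, test_scaled, scaler


# Synthetic data with one missing indicator value (hypothetical column "sma").
raw = pd.DataFrame(
    {"adjusted_close": np.arange(100.0, 111.0),
     "sma": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]},
    index=pd.date_range("2020-01-01", periods=11),
)
train_scaled, test_scaled, scaler = preprocess(raw)
print(len(train_scaled), len(test_scaled))
```

Because the scaler is fitted on the training range, test values can fall outside [0, 1]; that is expected when later prices exceed anything seen during training.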

Models used

Neural Networks

A 5-layer TensorFlow neural network is used. The 3 hidden layers have 64, 32 and 16 neurons respectively, chosen to better fit the input dimensions. Rectified Linear Units (ReLU) are used as activation functions. Mean Squared Error is used as the loss function, and the AdamOptimizer is used to minimize it.
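To make the layer layout concrete, here is a framework-free NumPy sketch of the forward pass and loss for such a network. It is not the project's TensorFlow code and does no training (no Adam updates). The input width of 8, the He-style initialization and the linear output unit are assumptions chosen for illustration.

```python
import numpy as np

HIDDEN_SIZES = (64, 32, 16)  # hidden layer widths stated in this README


def init_params(n_features, rng):
    """Create (weights, bias) pairs for input -> 64 -> 32 -> 16 -> 1."""
    sizes = (n_features, *HIDDEN_SIZES, 1)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU on the hidden layers, linear output (assumed) for regression."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return (h @ w + b).ravel()


def mse_loss(params, x, y):
    """Mean squared error between network output and targets."""
    return float(np.mean((forward(params, x) - y) ** 2))


rng = np.random.default_rng(0)
params = init_params(n_features=8, rng=rng)  # 8 input features is hypothetical
x = rng.random((4, 8))                       # 4 samples of scaled features
y = rng.random(4)                            # 4 scaled labels
print([w.shape for w, _ in params])  # [(8, 64), (64, 32), (32, 16), (16, 1)]
print(mse_loss(params, x, y) >= 0.0)  # True
```

Counting the input and output layers alongside the three hidden ones gives the five layers mentioned above; an optimizer such as Adam would then adjust every `(weights, bias)` pair to reduce `mse_loss`.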

Boosting Regressor

Random Forest Regressor

Future Improvements

  • Sentiment analysis on Twitter and news

Contributors

jsun98, alextanjh
