Minute by Minute Trading Project

Building a Minute by Minute Trading Engine with Neural Networks

This is the repository for Project Kojack, my final project for Metis. Please read this file as a guide to the code in the repository.

stage1.ipynb: This notebook assumes that you have downloaded the folders from the xetra file in Deutsche Börse's Amazon S3 bucket, and that those folders are in the same directory as this notebook. This notebook reads the data into a dataframe and keeps the data for the 10 most frequently traded securities. These happen to all be company stocks.
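As a rough illustration of the filtering step, here is a minimal sketch in plain Python rather than the dataframe the notebook uses. The record layout, the 'security' and 'trades' keys, and the choice to rank by summed trade counts are all assumptions made for this example.

```python
from collections import Counter

def top_traded(records, n=10):
    """Return the n security identifiers with the most trades.

    `records` is an iterable of dicts with hypothetical keys
    'security' (identifier) and 'trades' (trade count for that row).
    """
    counts = Counter()
    for rec in records:
        counts[rec["security"]] += rec["trades"]
    return [sec for sec, _ in counts.most_common(n)]

def keep_top(records, n=10):
    """Keep only the rows that belong to the n most traded securities."""
    records = list(records)
    keep = set(top_traded(records, n))
    return [rec for rec in records if rec["security"] in keep]

# Toy data, not taken from the Xetra files.
sample = [
    {"security": "AAA", "trades": 5},
    {"security": "BBB", "trades": 2},
    {"security": "AAA", "trades": 3},
    {"security": "CCC", "trades": 1},
]
top2 = top_traded(sample, n=2)
filtered = keep_top(sample, n=2)
```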

stage2.ipynb: This notebook assumes that the pickle file generated from stage1.ipynb is in the same directory as this notebook. This notebook reads in the data from that pickle file, and fills in data for times when any given stock was not traded.
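One way to fill in untraded minutes is to carry the last known price forward with zero volume. The sketch below does that in plain Python. The fill rule, the (price, volume) layout, and the dates are assumptions for illustration, and the notebook may fill the gaps differently.

```python
from datetime import datetime, timedelta

def fill_missing_minutes(bars, start, end):
    """Return one (time, price, volume) row per minute from start to end, inclusive.

    `bars` maps a minute-resolution datetime to a (price, volume) tuple.
    Minutes without a trade reuse the last seen price with volume 0.
    Minutes before the first trade are skipped because no price is known yet.
    """
    filled = []
    last_price = None
    t = start
    while t <= end:
        if t in bars:
            price, volume = bars[t]
            last_price = price
            filled.append((t, price, volume))
        elif last_price is not None:
            filled.append((t, last_price, 0))
        t += timedelta(minutes=1)
    return filled

# Toy data: trades at 09:00 and 09:03 only.
day = datetime(2018, 1, 2, 9, 0)
bars = {day: (10.0, 100), day + timedelta(minutes=3): (10.2, 50)}
filled = fill_missing_minutes(bars, day, day + timedelta(minutes=4))
```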

stage3.ipynb: This notebook assumes that the pickle file generated from stage2.ipynb is in the same directory as this notebook. This notebook reads in the data from that pickle file, and adds lagged variables as predictors for the initial classification models. Note, after adding the lagged variables, the data gets too big for a single pickle file, so this notebook produces 10 pickle files.
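For readers unfamiliar with lagged variables, here is a minimal sketch in plain Python. The function name and the toy prices are made up for this example, and the notebook's actual choice of lags and columns is not shown here.

```python
def add_lags(values, n_lags):
    """Build rows of [x_t, x_(t-1), ..., x_(t-n_lags)] for every t with a full history."""
    rows = []
    for t in range(n_lags, len(values)):
        rows.append([values[t - k] for k in range(n_lags + 1)])
    return rows

# Toy price series.
prices = [10.0, 10.1, 10.3, 10.2, 10.4]
rows = add_lags(prices, n_lags=2)
```

Each lag adds one more column per original variable, so the table grows by a factor of roughly n_lags + 1 in width, which is why the lagged data no longer fits comfortably in one file.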

RegClass.ipynb: This notebook assumes that the pickle files generated from stage3.ipynb are in the same directory as this notebook. This notebook reads in the data from those pickle files and tries out a logistic regression predictor and a random forest predictor. They don't perform all that well. Note: these models were allowed to assume that 1) I can buy and sell stocks without changing the price and 2) there are no transaction costs. Even with those assumptions, they didn't perform well, so it would not be advisable to pursue using these models.
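To show what those two assumptions buy you, here is a hedged sketch of a frictionless evaluation: hold the stock for a period when the model predicts "up", stay out otherwise, and trade at the observed prices with no costs. The function and the toy numbers are hypothetical, and this is not necessarily how the notebook scores its models.

```python
import math

def frictionless_return(predictions, log_returns):
    """Total simple return of a long-or-flat strategy with no costs or price impact.

    predictions: 1 (hold the stock for the period) or 0 (stay out).
    log_returns: realized log price change over the same period.
    """
    total_log = sum(r for p, r in zip(predictions, log_returns) if p == 1)
    return math.exp(total_log) - 1.0

# Toy example: the model is in the market for periods 1 and 3.
realized = [math.log(1.01), math.log(0.99), math.log(1.02)]
result = frictionless_return([1, 0, 1], realized)  # 1.01 * 1.02 - 1
```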

stage3n.ipynb: This notebook assumes that the pickle file generated from stage2.ipynb is in the same directory as this notebook. This notebook reads in the data from that pickle file and prepares it for input into an LSTM neural network. Namely, it deletes most variables and introduces log differences in price, which are nice to work with because 1) they capture price movements in a way that turns multiplication into addition, 2) their sign is the same as the sign of the price movement, and 3) they capture relative price movements, as opposed to absolute price movements, i.e. log(300) - log(299) is a lot smaller than log(2) - log(1), even though 300 - 299 = 2 - 1. Additionally, neural networks like normalized data, and log differences in price at minute by minute granularity fall in a very small range.
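The three properties of log differences can be checked with a few lines of plain Python. This is only a sketch with made-up prices, not the notebook's code.

```python
import math

def log_diffs(prices):
    """Return log(p[i + 1]) - log(p[i]) for each consecutive pair of prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Relative, not absolute: both moves are one unit, but the log differences differ.
big_price_move = math.log(300) - math.log(299)   # about 0.0033
small_price_move = math.log(2) - math.log(1)     # about 0.6931

# Additive: the log differences sum to the log of the overall price ratio.
prices = [100.0, 101.0, 99.0, 102.0]
diffs = log_diffs(prices)
total = sum(diffs)                               # equals log(102 / 100)
```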

stage3n-withSpread.ipynb: This notebook assumes that the pickle file generated from stage2.ipynb is in the same directory as this notebook. This notebook reads the data from that pickle file and prepares it for input into an LSTM neural network. It differs from stage3n.ipynb in that it prepares the data in such a way that the neural network can take transaction costs into account. Namely, I'm assuming a bid-ask spread of .2 cents. I do this by adding .001 to all start prices and .001 to all end prices.
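To make the effect of a spread concrete, the sketch below charges half of an assumed .002 spread on each leg of a long trade: buy at the start price plus .001, sell at the end price minus .001. This is one common convention, written with hypothetical names and prices, and it is not necessarily the exact adjustment made in the notebook.

```python
import math

HALF_SPREAD = 0.001  # half of an assumed 0.002 spread, in price units

def long_log_return(start_price, end_price, half_spread=HALF_SPREAD):
    """Log return of buying at the ask and selling at the bid (assumed convention)."""
    buy = start_price + half_spread
    sell = end_price - half_spread
    return math.log(sell) - math.log(buy)

# Toy prices: a one-cent move with and without the spread.
gross = long_log_return(50.00, 50.01, half_spread=0.0)
net = long_log_return(50.00, 50.01)
```

Even a tiny spread removes a noticeable share of a one-minute price move, which is why results that account for the spread look so different from the frictionless ones.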

Kojack: This directory contains the scripts necessary for experimenting with LSTM neural networks in the scope of this project. Inside this directory:

picfix.ipynb: In order to utilize the gpu capabilities when building these LSTMs, I initialized 10 gpu EC2 instances with an AMI, shared with me by Julia Lintern, with the necessary installations to run scripts that use Keras while utilizing gpu capability. However, the environment that this AMI uses is a python 2 environment, and the stage3n.ipynb notebook runs in a python 3 environment. So, the python 3 pickle written by the stage3n.ipynb notebook needs to be converted into a python 2 pickle before it can be used by the LSTM scripts. This notebook makes that conversion, assuming the pickle file generated by the stage3n.ipynb notebook is in the same directory as this notebook.
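The core of such a conversion is re-saving the object with pickle protocol 2, which is the highest protocol that python 2 can read. Here is a minimal sketch to run under python 3. The function name and the file paths are placeholders, not the names used in the notebook.

```python
import pickle

def convert_to_py2_pickle(src_path, dst_path):
    """Re-save a pickle with protocol 2 so that python 2 can load it."""
    with open(src_path, "rb") as src:
        obj = pickle.load(src)
    with open(dst_path, "wb") as dst:
        pickle.dump(obj, dst, protocol=2)

# Example call with placeholder file names:
# convert_to_py2_pickle("py3_input.pkl", "py2_output.pkl")
```

Note that the protocol only covers the file format. If the pickled object is a dataframe or another library type, the library versions on both sides also need to be compatible.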

MscriptXY.py, where X is a number from 0 through 9 and Y is either 0 or 1 (20 of these total): These scripts contain the code necessary to build and test the neural network models. X here is the stock number, from 0 through 9. These are broken up into separate scripts because I ran them on separate g2.2xlarge EC2 instances in parallel. These scripts don't just try to predict price movements 1 minute ahead. They also try to predict price movements 5 minutes ahead, 10 minutes ahead, etc., for a total of 18 different forward periods. Scripts with a Y value of 0 test the first 9 forward periods, while scripts with a Y value of 1 test the last 9 forward periods. The results from each script are written to a pickle file titled 'stockXY.pkl', with X and Y corresponding to the X and Y of the python script file name. Note, each of these scripts assumes that there is a pickle file named 'check3x1.pkl', generated by the picfix.ipynb notebook, in the same directory. Each script takes about an hour and a half to run on a g2.2xlarge EC2 instance, assuming you're utilizing the gpu capabilities on the backend.
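Because log differences add up, the target for a longer forward period can be built by summing one-minute log differences. The sketch below shows the idea in plain Python. The function name, the toy prices, and the horizons used are illustrative assumptions, not the scripts' actual code.

```python
import math

def forward_log_return(log_diffs, t, horizon):
    """Log price change from minute t to minute t + horizon.

    log_diffs[i] is log(price[i + 1]) - log(price[i]).
    Returns None when the series ends before minute t + horizon.
    """
    if t + horizon > len(log_diffs):
        return None
    return sum(log_diffs[t:t + horizon])

# Toy minute-by-minute prices.
prices = [100.0, 101.0, 103.0, 102.0, 105.0]
diffs = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

three_ahead = forward_log_return(diffs, t=1, horizon=3)  # log(105 / 101)
too_far = forward_log_return(diffs, t=3, horizon=2)      # None: not enough data
```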

pickles: This directory contains the pickle files that were generated by the MscriptXY.py scripts when I ran them. These are python 2 pickles. In addition, this directory also contains:

EdaStage2.ipynb: This notebook assumes that all the pickle files generated by the MscriptXY.py scripts are in the same directory as this notebook. Here, I check out the results of the LSTM. The takeaways are that looking ahead 1 minute gets the best results, that I can get a 74% annualized rate of return using LSTMs, and that I got a positive rate of return for 9 out of 10 companies, suggesting that the success of this model is not due to random chance.
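For context on what "annualized" means, one standard way to annualize is to compound the return earned over a shorter period. The sketch below assumes that convention and uses made-up numbers. The notebook's exact annualization method may differ.

```python
def annualize(period_return, periods_per_year):
    """Compound a simple return earned over one period to a yearly rate."""
    return (1.0 + period_return) ** periods_per_year - 1.0

# Toy example: a 2% return in one month, compounded over 12 months.
monthly = 0.02
yearly = annualize(monthly, 12)  # about 0.268, i.e. roughly 26.8% per year
```

With only about a month of data to work from, any annualized figure extrapolates heavily from a short sample.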

KojackSpread: This directory contains the scripts necessary for experimenting with LSTM neural networks in the scope of this project, this time assuming a bid-ask spread of .2 cents a share. Inside this directory:

SpreadPicFix.ipynb: In order to utilize the gpu capabilities when building these LSTMs, I initialized 10 gpu EC2 instances with an AMI, shared with me by Julia Lintern, with the necessary installations to use gpu capabilities. However, this AMI uses a python 2 environment, and the stage3n-withSpread.ipynb notebook uses a python 3 environment and writes a python 3 pickle. Accordingly, the python 3 pickle needs to be converted to a python 2 pickle before it can be used by these scripts. This notebook makes that conversion, assuming the pickle file written by the stage3n-withSpread.ipynb notebook is in the same directory as this notebook.

MscriptX.py, where X is a number from 0 through 9: These scripts contain the code necessary to build and test the neural network models, this time assuming a bid-ask spread of .2 cents. X here is the stock number, from 0 through 9. These are broken up into separate scripts because I ran them on separate g2.2xlarge EC2 instances in parallel. These scripts don't just try to predict price movements 1 minute ahead. They also try to predict price movements 10 minutes ahead, etc., for a total of 12 different forward periods. The results from each script are written to a pickle file titled 'stockX.pkl', with X corresponding to the X of the python script name. Note, each of these scripts assumes that there is a pickle file named 'check3x1WSpread.pkl', generated by the SpreadPicFix.ipynb notebook, in the same directory. Each script takes about an hour and a half to run on a g2.2xlarge EC2 instance, assuming you're utilizing the gpu capabilities on the backend.

pickles: This directory contains the pickle files that were generated by the MscriptX.py scripts when I ran them. These are python 2 pickles. In addition, this directory also contains:

wSpread.ipynb: This notebook assumes that all the pickle files generated by the MscriptX.py scripts are in the same directory as this notebook. Here, I check out the results of the LSTM models, this time assuming a spread of .2 cents. The takeaways are that looking ahead 10 minutes gets the best results, and that I got a 24% annualized rate of return using LSTMs, accounting for a spread. Granted, this generated a negative rate of return for 5 out of 10 companies, but the rate of return was positive for almost every forward period, and for the one forward period with a negative rate of return, it was just barely negative, at -0.7% annualized. It would be interesting to investigate this further with more data. When I pulled the data from Deutsche Börse's public S3 bucket, only 1 month was available. If I ever check back and find there's more data available, I may investigate this further.
