heliphix / btc_data

This repository contains the code and datasets for creating the machine learning models in the research paper titled "Time-series forecasting of Bitcoin prices using high-dimensional features: a machine learning approach"

Home Page: https://doi.org/10.1007/s00521-020-05129-6

License: MIT License

Python 1.01% Jupyter Notebook 98.99%

bitcoin time-series blockchain prediction lstm svm cryptocurrency bitcoin-prices technical-indicators machine-learning

btc_data's Introduction

Time-series forecasting of Bitcoin prices

Models considered:

LSTM
SVM
SANN
ANN

The feature selection process is based on Pearson correlation, random forest importance, and the variance inflation factor (VIF).
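Of the three criteria, the Pearson correlation step is the simplest to illustrate. The following is a minimal sketch, not the repository's implementation: the helper names `pearson` and `drop_correlated`, the 0.9 threshold, and the toy feature values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (std_x * std_y)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with one already kept.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (assumed): "close_x2" is an exact multiple of "close".
features = {
    "close": [1.0, 2.0, 3.0, 4.0, 5.0],
    "close_x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "volume": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(features)
print(selected)
```

The redundant column is dropped because it carries the same information as one already kept. A VIF filter follows the same idea but tests each feature against all the others jointly rather than pairwise.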

New analysis is submitted to the dev and the main branches.

The original code is archived in the "research_paper" branch.

btc_data's Issues

Missing files?

Hello,

I have a question regarding the Jupyter notebooks. In "Training_LSTM_cls.ipynb", for example,
read functions are called on files that are not present in the repository (for instance "pca_75_clas.csv").
Is there a way to generate them by running another part of the code?

Thanks

LSTM model for regression

Hello,
I would like to ask: when you use the LSTM algorithm, do you use timestep=1 in the LSTM input shape? Shouldn't LSTM models use a larger timestep, given their memory state?
Secondly, for the n-th day prediction, shouldn't this timestep be n? For example, when we predict the 7th day, shouldn't we use timestep=7? What is the difference between timestep=1 for a one-day forecast and for a 7th-day forecast?

Thanks in advance!
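The two quantities in this question, the number of past days fed to the model (the timestep or look-back) and how far ahead the target lies (the forecast horizon), can be chosen independently. The sketch below only illustrates that distinction; `make_windows` and its parameters are hypothetical names and do not come from the repository's notebooks.

```python
def make_windows(series, timesteps, horizon):
    """Build (input window, target) pairs from a 1-D series.

    timesteps: number of consecutive past values in each input window
    horizon:   how many steps after the window's last value the target lies
    """
    X, y = [], []
    last_start = len(series) - timesteps - horizon
    for start in range(last_start + 1):
        end = start + timesteps
        X.append(series[start:end])          # look-back window
        y.append(series[end + horizon - 1])  # value `horizon` steps ahead
    return X, y


prices = list(range(10))  # stand-in for daily prices on day 0 .. day 9

# Same 7-day horizon, different look-back lengths.
X1, y1 = make_windows(prices, timesteps=1, horizon=7)
X3, y3 = make_windows(prices, timesteps=3, horizon=7)
print(X1, y1)
print(X3, y3)
```

With timesteps=1 each sample holds a single day and the target is the value seven days later. With timesteps=3 the target is the same distance ahead, but each sample carries three days of history. For a Keras LSTM, such windows would be reshaped to (samples, timesteps, features) before training.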

Fix for datacollector.py

Hi,

I found your manuscript for this repository to be really interesting - thanks for publishing! I'm now trying to independently recreate the results to better understand how LSTM and Keras work within Python.

It appears that a small fix for datacollector.py is required for scraping from bitinfocharts.com due to changes on the remote side. Line 100 should now be values=soup.find_all('script')[4].string when using Python 3.8 and BS4 >=4.9.3.

Thanks,
J.
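A fixed index such as [4] will break again whenever the remote page adds or removes a script tag. One alternative is to select the script block by its content. The sketch below uses only the standard library's html.parser rather than BeautifulSoup, so it is not a drop-in patch for datacollector.py. The marker string "drawChart(" and the sample page are assumptions for illustration, not what bitinfocharts.com actually serves.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data  # content may arrive in several chunks


def find_script(html, marker):
    """Return the first script body containing `marker`, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for body in collector.scripts:
        if marker in body:
            return body
    return None


# Hypothetical page: the data sits in the second script block.
page = (
    "<html><head><script>var a = 1;</script></head>"
    "<body><script>drawChart([[1, 2], [3, 4]]);</script></body></html>"
)
values = find_script(page, "drawChart(")
print(values)
```

The same idea applies with BeautifulSoup: iterate over soup.find_all('script') and keep the first element whose .string contains the marker.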

new complementary tool

I want to offer a new point of view, and my collaboration.

Why this stock prediction project?

Things this project offers that I did not find in other free projects are:

Testing with about 30 models. Multiple combinations of features and multiple selections of models (TensorFlow, XGBoost, and Sklearn)
Threshold and quality models evaluation
Use 1k technical indicators
Method of best features selection (technical indicators)
Categorical target (do buy, do sell and do nothing) simple and dynamic, instead of continuous target variable
Powerful open-market-real-time evaluation system
Versatile integration with: Twitter, Telegram and Mail
Train Machine Learning model with fresh stock data from today

https://github.com/Leci37/stocks-prediction-Machine-learning-RealTime-telegram/tree/develop

Testing on unseen data

In figure 9 of your paper, you show test results of forecasting prices after 31-12-2019.
I can find no code related to this in the notebooks. The biggest dataset I can find goes up until 2-2-2020, while the graph is until 5-2020.
Could you upload your remaining code?

Environment file available?

Hi,

I'm trying to reproduce your results as indicated in the Feature_Selection_reg notebook, but I'm getting slightly different results starting from X=cmns.drop_high_vif(df_reduced,thresh=5) on line 130, even though I'm using the same BTC_Data_736_features_raw.csv file that was available in commit b80f8913e0. My guess is that this comes from slightly different versions of Python (I'm running 3.8) and related packages compared to what was used for your manuscript.

Do you have an Anaconda environment (or other virtualenv) file from your original workflow that can be shared, so I can better understand how these discrepancies are arising?

Thanks in advance.
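When comparing results across machines, it helps to record the interpreter and package versions on each side. The following is a minimal sketch using only the standard library. The package list is an assumption about what the notebooks depend on and should be adjusted to the actual imports.

```python
import sys
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


# Assumed dependency list, for illustration only.
report = collect_versions(["numpy", "pandas", "scikit-learn", "statsmodels"])
for name, version in report.items():
    print(f"{name}=={version}")
```

Posting this output alongside a discrepancy makes it easier to tell whether a version difference is a plausible cause.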

Generating technical indicators for intervals and periods

Hi,

After creating the master BTC_Data.csv file, it needs to be broken down into the respective indicator files for the different intervals (1, 2, 3) and periods (1, 7, 30, 90 days etc). There seems to be a loose framework for the interval file generation in the Feature_Selection notebooks, but I just want to confirm the methodology before proceeding.

Do you already have this code in a loop that will generate each file automatically, or do the notebooks require manual editing for each iteration? If the latter, can you please clarify which lines need to be updated in Feature_Collection_reg.ipynb and Feature_Collection_cls.ipynb to generate all the different combinations of technical indicators on each run?

Thanks,
J.

'commons' package missing?

Hi,

Within the feature selection and training notebooks, the import statements specifically reference a commons package or file which doesn't appear to be available within the repo:

import commons as cmns

A quick search through the pip and conda libraries suggests this is a custom package. Can you please clarify, and supply it if available?

Thanks in advance.

Outlier Handling

Assalamualaikum Warahmatullahi Wabarakatuh and Hello,

I've read the paper associated with this GitHub repository, and it says "Removing about 10% of the outliers increased model performance for most of the ML models. A few models performed well despite the outliers". However, I'm unable to find the code for the outlier handling itself.

I'm currently writing an undergraduate thesis on the same topic. May I ask what exactly you did with the outliers? Or, better yet, may I ask for the code?

Thank you beforehand, wassalamualaikum warahmatullahi wabarakatuh