robertmartin8 / machinelearningstocks

Using python and scikit-learn to make stock predictions

License: MIT License

Python 100.00%

stock-prediction machine-learning scikit-learn python yahoo-finance stock historical-stock-fundamentals quantitative-finance algorithmic-trading trading

machinelearningstocks's Issues

More unit tests

Although I have written some unit tests, right now they really only check the datasets and some of the helper functions.

I would much appreciate any help in testing this project's core functionality.

When I first built this project, I retrieved my stock data from Quandl. However, this was about 2-3 years ago, so the API has changed considerably. As such, I strongly suspect that this code is broken (though I haven't tested it lately).

In any case, Quandl may not be the best datasource: I am keen to instead use pandas-datareader with the yahoo-finance fix (https://github.com/ranaroussi/fix-yahoo-finance).

intraQuarter/_KeyStats/ issue

Hi Robert, I noticed that the intraQuarter/_KeyStats/ data is used, but I cannot find it. Could you tell me where it is or how I can get it? Thanks so much!

Adaptation request

Hello, I luckily ended up on your project as I'm looking at scraping data from Yahoo Finance for a list of quotes (not only S&P500). I was wondering if there was a way to get a part of your script adapted to my needs?
Specifically, I have a list of quotes in a .txt file. I currently use the YahooFinancials Python API, but I realised that some key figures are missing, such as cash, debt, and levered free cash flow.
So far, I collect the data with a custom Python script and then dump it to a JSON file.
Would you be able to help me? Thanks :)

Question

Hi Robert,

This is another fine project together with your excellent portfolio optimisation work.
I hope you don't mind me asking a question as someone inexperienced in this area regarding the amount of fundamental data required to produce a viable model.

My local exchange is London (LSE/FTSE) and getting historic fundamentals is hard. I am able to extract these day by day but it will take some time to produce a significant amount.

So I was wondering, how many days would I need to have processed for a viable classification model? Say 6 months, 3 months etc. I have many fields but at present this goes back a week.

Thank you in advance

Fig

The CSV link isn't downloadable. Please check.

How to test for NSE index (india)

"from utils import data_string_to_float" import error

I could not import "from utils import data_string_to_float"; it seemed to work after replacing it with:

import utils

Using yfinance

Hello, thank you for all the great work done in this repo. I would suggest that you use the yfinance library, which gets the data from Yahoo Finance fairly easily and is much faster and more accommodating than pandas_datareader. It also has a load of other functions that might make your life easier. I would love to contribute to this repo.

Error in pytest -v

tests/test_datasets.py::test_forward_sample_dimensions PASSED [ 11%]
tests/test_datasets.py::test_forward_sample_data PASSED [ 22%]
tests/test_datasets.py::test_stock_prices_dataset PASSED [ 33%]
tests/test_datasets.py::test_stock_prediction_dataset PASSED [ 44%]
tests/test_utils.py::test_status_calc PASSED [ 55%]
tests/test_utils.py::test_data_string_to_float PASSED [ 66%]
tests/test_variables.py::test_statspath PASSED [ 77%]
tests/test_variables.py::test_features_same FAILED [ 88%]
tests/test_variables.py::test_outperformance PASSED [100%]

=================================================== FAILURES ====================================================
______________________________________________ test_features_same _______________________________________________

def test_features_same():
    # There are only four differences (intentionally)
    assert set(parsing_keystats.features) - set(current_data.features) == {'Net Income Avl to Common', 'Qtrly Earnings Growth',
                                                                           'Qtrly Revenue Growth', 'Shares Short (as of',
                                                                           'Shares Short (prior month)'}

E AssertionError: assert {'Net Income ...Short (as of'} == {'Net Income A...prior month)'}
E Extra items in the right set:
E 'Shares Short (prior month)'
E Full diff:
E {'Net Income Avl to Common',
E 'Qtrly Earnings Growth',
E 'Qtrly Revenue Growth',
E - 'Shares Short (as of'}...
E
E ...Full output truncated (5 lines hidden), use '-vv' to show

tests/test_variables.py:17: AssertionError
====================================== 1 failed, 8 passed in 11.06 seconds ======================================

SET (help wanted - can't find label )

Was really impressed with this file. Most just look at historical price, but love what you have done with including a deeper assessment of the company.

I live in Bangkok and follow the Thai market. I can already get Thai historical prices, but how do I change S&P500 to SET? On Yahoo I thought it would be SET.bk (as stocks use .bk), but that does not work.

Thanks

Add some unit testing!

In the interests of stability and best practice, it would probably be a good idea to add some basic unit tests.

Only problem is I have to find a way to write meaningful tests without actually having to download all of the data etc each time I run a test.
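One common way to approach this is to pass the downloader in as a parameter and swap it for a stub in tests. The following is only a minimal sketch of that idea: `fetch_prices` and `latest_close` are hypothetical names that do not exist in this project, and the price values are made up for illustration.

```python
from unittest import mock


def fetch_prices(ticker):
    """Hypothetical downloader; a real one would hit the network."""
    raise RuntimeError("network access is not allowed in unit tests")


def latest_close(ticker, fetch=None):
    """Return the most recent closing price using the supplied downloader."""
    fetch = fetch or fetch_prices
    rows = fetch(ticker)  # list of (date, close) tuples, oldest first
    return rows[-1][1]


def test_latest_close_uses_canned_data():
    # A Mock stands in for the downloader, so nothing is fetched.
    fake_fetch = mock.Mock(
        return_value=[("2017-01-03", 100.0), ("2017-01-04", 101.5)]  # made-up values
    )
    assert latest_close("SPY", fetch=fake_fetch) == 101.5
    fake_fetch.assert_called_once_with("SPY")
```

Because the test never calls the real downloader, it runs quickly and offline; the trade-off is that it only checks the logic around the download, not the download itself.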

Question

Hello. Is it possible to integrate this with NASDAQ/IQ Option?

More KeyStats

Hi! I just cloned your project and am messing around with it. Though I am an experienced software engineer, I am new to machine learning so feel free to tell me my insights are incorrect!

After reading the code I noticed that the prediction modeling relies heavily on the KeyStats, but that data is extremely limited. Would it not be SUPER beneficial to backfill this data with one record per quarter? (The provided data is very erratic, yet most 'feature' data points are provided by the company every quarter.)

In addition, a cron job or a simple get_missing_quartly_keystats.py script that can be invoked on demand to fill in new stats would support the longevity of this project, make its modeling more accurate (more data), and bring it closer to being a practical live-use tool.

Most of the historical quarterly features data points can be found directly or through calculations on https://www.macrotrends.net/. Example: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/financial-statements

There are many categories and subcategories that can most likely be scraped and parsed. For example, the full historical market cap chart served at https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/market-cap can be parsed, because the HTML contains a <script> tag that defines var chartData with all the values by date.
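A rough sketch of that parsing step, using only the standard library, might look like the following. The sample HTML and the field names `date` and `v1` are assumptions made for illustration; the actual page markup may differ and should be inspected before relying on this.

```python
import json
import re

# Hypothetical page fragment; the real markup may differ.
SAMPLE_HTML = """
<script>
var chartData = [{"date": "2019-01-02", "v1": 2.3}, {"date": "2019-01-03", "v1": 2.4}];
</script>
"""


def extract_chart_data(html):
    """Return the list assigned to `var chartData`, or [] if it is absent."""
    # Capture the JSON array between `var chartData =` and the closing `];`.
    match = re.search(r"var\s+chartData\s*=\s*(\[.*?\])\s*;", html, re.DOTALL)
    if match is None:
        return []
    return json.loads(match.group(1))


points = extract_chart_data(SAMPLE_HTML)
print(len(points), points[0]["date"])
```

This assumes the array is valid JSON; if the page uses JavaScript-only syntax (unquoted keys, trailing commas), `json.loads` would fail and a more tolerant parser would be needed.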

Between the balance sheets and financial records they provide, you may even find other influential data points to add to the ML portion of this script.

Let me know what you think, or if my logic is simply way off. If you think it is a good idea, I can help out with refactoring!

Data download missing data

Based on some feedback received and subsequent experiments, it seems that the data download is missing out a lot of tickers (and if it's missing out the SPY, there will be an error in parsing_keystats.py).
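One way to surface this early would be to compare the requested tickers against those actually downloaded and stop immediately if a required one such as SPY is absent. This is only a sketch: `find_missing_tickers` is a hypothetical helper, not a function in this repo, and the ticker lists in the usage line are made up.

```python
def find_missing_tickers(requested, downloaded, required=("SPY",)):
    """Return the requested tickers that were not downloaded.

    Raises ValueError if any ticker in `required` is missing, on the
    assumption that later parsing steps cannot run without it.
    """
    have = set(downloaded)
    missing_required = [t for t in required if t not in have]
    if missing_required:
        raise ValueError(f"required tickers missing: {missing_required}")
    return sorted(t for t in requested if t not in have)


# Illustrative usage with made-up ticker lists.
print(find_missing_tickers(["AAPL", "MSFT", "SPY"], ["AAPL", "SPY"]))
```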

This project downloads price data for free from Yahoo Finance, via pandas-datareader (and fix-yahoo-finance). However, I've noticed lately that the data is becoming a lot more inconsistent, and sometimes just fails completely. This is because Yahoo seems to be dropping their support for this API.

The data on yahoo is still there, it's just a problem of accessing it. In the past I wrote a blog post about downloading data from the linked source, but 'deprecated it' once I realised that pandas-datareader with fix-yahoo-finance did the same thing but much better. My method still works, but it won't be trivial to integrate it with the project (and anyway it's a very clunky solution). I suppose that the easiest solution is to find another data source, so suggestions would be welcome.

As a temporary fix, I have added the csv files (containing all the data) to this repo.

Requests.get() no longer working with Yahoo Finance.

For about the past year, it seems that 'requests.get()' has stopped working with Yahoo Finance.

@robertmartin8 could you please guide us on how we can get around this error?

Thanks

Error

There is an error in download_historical_prices.py:
zero-sized array to reduction operation maximum which has no identity

Stuck downloading historical prices

Hello, thank you for writing such an interesting repository. Could you help me with an issue running the command python download_historical_prices.py? It appears to be stuck at 80% and does not proceed. Thank you!

Cannot receive data

I ran this code and it doesn't return any data.

from pandas_datareader import data as pdr
import fix_yahoo_finance as yf

yf.pdr_override()
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")

I've got an error when I try to run stock_prediction.py

Hello,

I get the following error when I try to run stock_prediction.py. I have tried on both Linux CentOS 7 and Windows 10; my Python version is 3.6.5. I followed all the instructions step by step. The other files run fine.

[root@customiseta MachineLearningStocks]# python3.6 stock_prediction.py
Building dataset and predicting stocks...
Traceback (most recent call last):
  File "stock_prediction.py", line 55, in <module>
    predict_stocks()
  File "stock_prediction.py", line 42, in predict_stocks
    y_pred = clf.predict(X_test)
  File "/usr/lib64/python3.6/site-packages/sklearn/ensemble/forest.py", line 538, in predict
    proba = self.predict_proba(X)
  File "/usr/lib64/python3.6/site-packages/sklearn/ensemble/forest.py", line 578, in predict_proba
    X = self._validate_X_predict(X)
  File "/usr/lib64/python3.6/site-packages/sklearn/ensemble/forest.py", line 357, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/usr/lib64/python3.6/site-packages/sklearn/tree/tree.py", line 373, in _validate_X_predict
    X = check_array(X, dtype=DTYPE, accept_sparse="csr")
  File "/usr/lib64/python3.6/site-packages/sklearn/utils/validation.py", line 462, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 41)) while a minimum of 1 is required.

running problem

How do I actually run this code? I installed all the libraries, but it still gives an error. Could you please help me run it?

Train - test split (already seen samples)

Hello,

First of all great work Robert.

I found one big mistake (one that everyone makes) in backtesting.py, row 40: you are using shuffle=True (the default in train_test_split), so when you build i+1 or i+x targets, the data has already been seen during training. Because of that, you always get a different result when running backtesting.py. If you change to shuffle=False, you will get 45-50% fewer trades and the accuracy score will drop to 0.6-0.65 at most.
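To illustrate the idea of an unshuffled split, here is a minimal standard-library sketch that orders rows by date and holds out the most recent slice for testing. The `chronological_split` helper and the `date` field are assumptions for illustration and are not taken from backtesting.py; with scikit-learn, passing `shuffle=False` to `train_test_split` on date-ordered data gives a comparable split.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split rows into (train, test) by date, without shuffling.

    Rows are sorted by their ISO-formatted "date" string, and the most
    recent `test_fraction` of them become the test set.
    """
    ordered = sorted(rows, key=lambda r: r["date"])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Made-up sample rows, deliberately out of order.
rows = [
    {"date": "2015-03-01"},
    {"date": "2014-01-01"},
    {"date": "2016-06-01"},
    {"date": "2013-09-01"},
    {"date": "2017-02-01"},
]
train, test = chronological_split(rows, test_fraction=0.2)
```

Every training row then predates every test row, so the model cannot be evaluated on samples drawn from the same period it was trained on.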

Best

Naming conventions

Within the project there are many inconsistent naming conventions (just look at the top level python files!)

Fix this according to the holy laws of PEP8 and human decency.

running problem

Benchmark data

Hi Robert,

I was reviewing your most excellent work earlier and was wondering..

What index did you use to generate the sp500_index.csv data?

Was this the S&P 500 (^GSPC), and did you preprocess or scale this data?

The reason I ask is that the data in the 200-207 range looks on the low side.

Thanks!

Fig

syntax error - download_historical_prices.py

ubuntu 16.04

✘-1 ~/MachineLearningStocks [master|✚ 1…21968] 
05:02 $ python download_historical_prices.py
  File "download_historical_prices.py", line 35
    print(f"{len(missing_tickers)} tickers are missing: \n {missing_tickers} ")
                                                                             ^
SyntaxError: invalid syntax
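A likely cause, offered as an assumption since the interpreter version is not shown in the report: f-strings were introduced in Python 3.6, and an older interpreter reports them as a syntax error. On such a version, the same message can be built with str.format, as in this sketch (the ticker list is made up):

```python
import sys

missing_tickers = ["AAA", "BBB"]  # hypothetical example values

# str.format produces the same text and also works before Python 3.6.
message = "{} tickers are missing: \n {} ".format(len(missing_tickers), missing_tickers)
print(message)

# Quick check of whether the running interpreter supports f-strings.
print("f-strings supported:", sys.version_info >= (3, 6))
```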

Command-line interaction

Add `if __name__ == "__main__":` to most of the files to improve command-line access.
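For reference, the usual shape of this guard is sketched below. The `main` function and its return value are placeholders for illustration, not code from this project.

```python
def main():
    """Placeholder entry point; a real script would do its work here."""
    return "running as a script"


# This block runs only when the file is executed directly,
# not when it is imported as a module (e.g. by the test suite).
if __name__ == "__main__":
    print(main())
```

Keeping the work inside a function means other modules and tests can import the file without triggering downloads or training as a side effect.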


Refactor current_data.py

current_data.py extracts the current financials of a company by scraping yahoo finance.

However, if you look at the file you will see that it is a hard-coded mess, filled with code smell and repetition. In the spirit of python, this can and should be fixed. I do have a fix ready on one of my recent versions of this project, but I will have to backwards-integrate it.

New complementary tool

My name is Luis. I'm a big-data and machine-learning developer and a fan of your work, and I regularly check your updates.

I was afraid that my savings would be eaten by inflation, so I created a powerful tool based on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators).
It covers all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DeMark, Japanese candlesticks, Ichimoku, Fibonacci, Williams %R, balance of power, Murrey Math, etc.) and more than 200 others.

The tool creates prediction models of correct trading points (buy and sell signals, so that every stock is traded well in both timing and direction).
For this I used big-data tools such as pandas, and stock market libraries such as tablib, TAcharts, and pandas_ta for data collection and calculation,
along with powerful machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow, and Google TensorFlow LSTM.

With models trained on a selection of the best technical indicators, the tool can predict trading points (where to buy, where to sell) and send real-time alerts to Telegram or email. The points are calculated by learning from the correct trading points of the last 2 years (including the change to a bear market after the rate hike).

I think it could be useful to you. I would like to share it with you, and if you are interested in improving it and collaborating, I am willing; if not, just file it away.

Historical Fundamental Data

Robert, I just discovered your MachineLearningStocks. This is not an issue but a suggestion on fundamental data sources. The American Association of Individual Investors has a product (Stock Investor Pro) with a reasonable subscription fee of US $198/year after a membership fee of $29/year. A subscriber has access to both current and weekly non-survivorship-biased historical data back to 2004 for ~2000 fundamental factors for ~6000 equities. It takes significant effort to download the data and put it into a usable format. I have been using this data source in a personal Python-based stock backtester and screener for personal investing for 14+ years. Interestingly, I too am wading through Eremenko Krill's Machine Learning and Deep Learning and have just purchased a GPU card with the long-term intent of adding ML stock selection to my current system.

Backtesting issue

Interested in this project and possibly working on it more. Just starting out with ML but I was curious to try and figure out the issue with the backtesting. From what I can tell it is that you are training the model on future data but then making predictions for stocks in the past...

It seems like the solution would be to first randomly select the year you'd like to predict, and then ensure the split for both training and test is only run on years before that. Just wanted to check and see if I'm right about at least the issue. Feel free to drop me an email (on my profile) if you'd rather talk there, I know you said you want to let other people try and figure it out.
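A closely related variant of this idea, sketched here under assumptions, trains only on years strictly before the chosen year and tests on that year itself. The helper names and the `year` field are hypothetical and do not come from the project's backtest.

```python
def split_by_year(rows, predict_year):
    """Train on rows strictly before predict_year; test on that year only."""
    train = [r for r in rows if r["year"] < predict_year]
    test = [r for r in rows if r["year"] == predict_year]
    return train, test


def walk_forward(rows):
    """Yield (year, train, test) for each year that has earlier data."""
    years = sorted({r["year"] for r in rows})
    for year in years[1:]:
        train, test = split_by_year(rows, year)
        yield year, train, test


# Made-up rows, one or two per year.
rows = [{"year": y} for y in (2010, 2011, 2011, 2012, 2013)]
folds = list(walk_forward(rows))
```

Each fold then evaluates the model only on data that comes after everything it was trained on, which avoids using future data to predict the past.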

backtest

I would like to contribute to this project and have read through the readme in detail.

I have noticed you speak about a fatal flaw in the backtest, what is it? I can work on this and submit a PR.

What is the best way to get more updated key statistics data?

Hello,

I tried using the script, but the key statistics data provided by Sentdex is a bit outdated. Does anyone know an API or a URL where we can get fresher data? I am willing to submit a PR with this solution if someone provides me with enough info to implement it.

Thanks,
Aleksandar Serafimoski

Ticker List Confused

I do not see how to get the ticker list. There isn't really much documentation on it. I can get the prices for SPY, but the director does not work. Is this something I have to get by myself?

Please let me know.

keystats.csv

Hello, when I run the command python parsing_keystats.py, keystats.csv is created, but it's empty (except for the Date, unix, ticker, etc. columns). I know for sure that stock prices are downloaded and updated as I change the date to the present. Can you please help me? I have tried everything to solve this.

Improve documentation

Let's have some clear commenting and a much improved README so new users can understand exactly what's going on.

Test Failure

I get the below error when doing the pytest. I'm not sure why this is occurring.

pytest -vv
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.6.5, pytest-3.4.1, py-1.5.3, pluggy-0.6.0 -- /home/chris/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/chris/Documents/Stocks/MachineLearningStocks, inifile:
plugins: remotedata-0.2.1, openfiles-0.3.0, doctestplus-0.1.3, arraydiff-0.2
collected 9 items

======================================================================================= FAILURES =======================================================================================
__________________________________________________________________________________ test_features_same __________________________________________________________________________________

def test_features_same():
    # There are only four differences (intentionally)
    assert set(parsing_keystats.features) - set(current_data.features) == {'Qtrly Revenue Growth', 'Qtrly Earnings Growth',
                                                                           'Shares Short (as of', 'Net Income Avl to Common'}

E AssertionError: assert {'Net Income ...prior month)'} == {'Net Income A...Short (as of'}
E Extra items in the left set:
E 'Shares Short (prior month)'
E Full diff:
E {'Net Income Avl to Common',
E 'Qtrly Earnings Growth',
E 'Qtrly Revenue Growth',
E - 'Shares Short (as of',
E ? ^
E + 'Shares Short (as of'}
E ? ^
E - 'Shares Short (prior month)'}

tests/test_variables.py:17: AssertionError
========================================================================= 1 failed, 8 passed in 15.02 seconds ==========================================================================
