
machinelearningstocks's People

Contributors

ecbc1, robertmartin8


machinelearningstocks's Issues

More unit tests

Although I have written some unit tests, right now they really only check the datasets and some of the helper functions.

I would much appreciate any help in testing this project's core functionality.

Fix the stock data parser

When I first built this project, I retrieved my stock data from Quandl. However, this was about 2-3 years ago, so the API has changed considerably. As such, I strongly suspect that this code is broken (though I haven't tested it lately).

In any case, Quandl may not be the best datasource: I am keen to instead use pandas-datareader with the yahoo-finance fix (https://github.com/ranaroussi/fix-yahoo-finance).

intraQuarter/_KeyStats/ issue

Hi Robert, I noticed that the intraQuarter/_KeyStats/ data is used, but I cannot find it. Could you tell me where it is or how I can get it? Thanks so much!

Adaptation request

Hello, I luckily ended up on your project while looking into scraping data from Yahoo Finance for a list of quotes (not only the S&P 500). I was wondering if there was a way to get part of your script adapted to my needs?
I have a list of quotes available in a .txt file. I currently use the YahooFinancials Python API, but I realised that some key figures are missing, such as "Cash, Debt, Levered free cash flow...etc".
So far, I'm collecting the data using that custom Python script and then dumping it to a JSON file.
Would you be able to help me? Thanks :)

Question

Hi Robert,

This is another fine project together with your excellent portfolio optimisation work.
I hope you don't mind me asking a question as someone inexperienced in this area regarding the amount of fundamental data required to produce a viable model.

My local exchange is London (LSE/FTSE) and getting historic fundamentals is hard. I am able to extract these day by day but it will take some time to produce a significant amount.

So I was wondering, how many days would I need to have processed for a viable classification model? Say 6 months, 3 months etc. I have many fields but at present this goes back a week.

Thank you in advance

Using yfinance

Hello, thank you for all the great work done in this repo. I would suggest using the yfinance library, which gets data from Yahoo Finance fairly easily and is much faster and more accommodating than pandas_datareader. It also has a load of other functions that might make your life easier. I would love to contribute to this repo.

Error in pytest -v

tests/test_datasets.py::test_forward_sample_dimensions PASSED [ 11%]
tests/test_datasets.py::test_forward_sample_data PASSED [ 22%]
tests/test_datasets.py::test_stock_prices_dataset PASSED [ 33%]
tests/test_datasets.py::test_stock_prediction_dataset PASSED [ 44%]
tests/test_utils.py::test_status_calc PASSED [ 55%]
tests/test_utils.py::test_data_string_to_float PASSED [ 66%]
tests/test_variables.py::test_statspath PASSED [ 77%]
tests/test_variables.py::test_features_same FAILED [ 88%]
tests/test_variables.py::test_outperformance PASSED [100%]

=================================================== FAILURES ====================================================
______________________________________________ test_features_same _______________________________________________

def test_features_same():
    # There are only four differences (intentionally)
  assert set(parsing_keystats.features) - set(current_data.features) == {'Net Income Avl to Common', 'Qtrly Earnings Growth',
                                                                           'Qtrly Revenue Growth', 'Shares Short (as of',
                                                                           'Shares Short (prior month)'}

E AssertionError: assert {'Net Income ...Short (as of'} == {'Net Income A...prior month)'}
E Extra items in the right set:
E 'Shares Short (prior month)'
E Full diff:
E {'Net Income Avl to Common',
E 'Qtrly Earnings Growth',
E 'Qtrly Revenue Growth',
E - 'Shares Short (as of'}...
E
E ...Full output truncated (5 lines hidden), use '-vv' to show

tests/test_variables.py:17: AssertionError
====================================== 1 failed, 8 passed in 11.06 seconds ======================================

SET (help wanted - can't find label )

I was really impressed with this file. Most projects just look at historical prices, but I love what you have done by including a deeper assessment of the company.

I live in Bangkok and follow the Thai market. I can already get Thai historical prices, but how do I change the S&P 500 to the SET? On Yahoo I thought it would be SET.bk (as the stocks use .bk), but that does not work.

Thanks

Add some unit testing!

In the interests of stability and best practice, it would probably be a good idea to add some basic unit tests.

Only problem is I have to find a way to write meaningful tests without actually having to download all of the data etc each time I run a test.
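
One possible approach, sketched under assumptions: keep a tiny hand-written sample inside the test module and run the parsing logic against it, so nothing has to be downloaded. `parse_prices` and the sample values below are hypothetical stand-ins for illustration, not functions or data from this project.

```python
import csv
import io

# Tiny made-up sample standing in for a downloaded price file.
SAMPLE_CSV = """Date,SPY,AAPL
2017-01-03,100.0,50.0
2017-01-04,101.5,50.5
"""


def parse_prices(text):
    """Hypothetical parser: map each date to a {ticker: price} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Date"]: {k: float(v) for k, v in row.items() if k != "Date"}
        for row in reader
    }


def test_parse_prices():
    # Runs entirely in memory, with no network or disk access.
    prices = parse_prices(SAMPLE_CSV)
    assert set(prices) == {"2017-01-03", "2017-01-04"}
    assert prices["2017-01-03"]["SPY"] == 100.0
```

pytest would collect `test_parse_prices` like any other test. The same idea extends to checking small committed fixture files instead of inline strings.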

Question

Hello. Is it possible to integrate this with NASDAQ/IQ Option?

More KeyStats

Hi! I just cloned your project and am messing around with it. Though I am an experienced software engineer, I am new to machine learning so feel free to tell me my insights are incorrect!

After reading the code I noticed that the prediction modeling relies heavily on the KeyStats, but that data is extremely limited. Would it not be SUPER beneficial to backfill this data with a record per quarter? (The provided data is very erratic, yet most 'feature' data points are provided by the company every quarter.)

In addition to this, a cron job or a simple get_missing_quartly_keystats.py script that can be invoked on demand to fill in new stats would support the longevity of this project and keep it current. It would make the modeling more accurate (more data sets) and bring the project closer to becoming a practical live-use tool.

Most of the historical quarterly features data points can be found directly or through calculations on https://www.macrotrends.net/. Example: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/financial-statements

There are many categories with subcategories that can most likely be scraped and parsed. For example, the full historical market cap chart served here: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/market-cap
can be parsed out, because the HTML contains a <script> tag that defines var chartData with all the values by date.
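
As a rough sketch of that idea, the assignment could be pulled out with a regular expression and decoded as JSON. This assumes the value assigned to `chartData` is a JSON-compatible array literal terminated by a semicolon; the sample page fragment and its field names are made up, and `extract_chart_data` is a hypothetical helper.

```python
import json
import re


def extract_chart_data(html):
    """Return the list assigned to `var chartData` in the page, or None."""
    # Assumes an array literal followed by a semicolon, valid as JSON.
    match = re.search(r"var\s+chartData\s*=\s*(\[.*?\])\s*;", html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Made-up fragment; the real page layout and field names are assumptions.
sample_html = """
<script>
var chartData = [{"date": "2019-01-02", "v1": 2.3}, {"date": "2019-01-03", "v1": 2.4}];
</script>
"""

points = extract_chart_data(sample_html)
print(len(points))
```

If the real page uses a JavaScript literal that is not valid JSON (for example unquoted keys), `json.loads` would fail and a different parsing step would be needed.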

Between the balance sheets and financial records they provide, you may even find other influential data points to add to the ML portion of this script.

Let me know what you think, or if my logic is simply way off. If you think it is a good idea, I can help out with refactoring!

Data download missing data

Based on some feedback received and subsequent experiments, it seems that the data download is missing a lot of tickers (and if it's missing SPY, there will be an error in parsing_keystats.py).

This project downloads price data for free from Yahoo Finance, via pandas-datareader (and fix-yahoo-finance). However, I've noticed lately that the data is becoming a lot more inconsistent, and sometimes just fails completely. This is because Yahoo seems to be dropping their support for this API.

The data on Yahoo is still there; it's just a problem of accessing it. In the past I wrote a blog post about downloading data from the linked source, but 'deprecated it' once I realised that pandas-datareader with fix-yahoo-finance did the same thing but much better. My method still works, but it won't be trivial to integrate it with the project (and anyway it's a very clunky solution). I suppose that the easiest solution is to find another data source, so suggestions would be welcome.

As a temporary fix, I have added the csv files (containing all the data) to this repo.

Error

There is an error, "zero-sized array to reduction operation maximum which has no identity", in download_historical_prices.py.

Stuck downloading historical prices

Hello, thank you for writing such an interesting repository. Could you assist me with an issue running the command python download_historical_prices.py? It appears to be stuck at 80% and is not proceeding. Thank you!

Cannot receive data

I entered this code and it doesn't return any data.

from pandas_datareader import data as pdr

import fix_yahoo_finance as yf
yf.pdr_override()
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")

I've got an error when I try to run stock_prediction.py

Hello,

I get the following error when I try to run stock_prediction.py. I have already tried on Linux CentOS 7 and Windows 10; my Python version is 3.6.5. I followed all the instructions step by step. The other files run fine.

[root@customiseta MachineLearningStocks]# python3.6 stock_prediction.py
Building dataset and predicting stocks...
Traceback (most recent call last):
File "stock_prediction.py", line 55, in <module>
predict_stocks()
File "stock_prediction.py", line 42, in predict_stocks
y_pred = clf.predict(X_test)
File "/usr/lib64/python3.6/site-packages/sklearn/ensemble/forest.py", line 538, in predict
proba = self.predict_proba(X)
File "/usr/lib64/python3.6/site-packages/sklearn/ensemble/forest.py", line 578, in predict_proba
X = self._validate_X_predict(X)
File "/usr/lib64/python3.6/site-packages/sklearn/ensemble/forest.py", line 357, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "/usr/lib64/python3.6/site-packages/sklearn/tree/tree.py", line 373, in _validate_X_predict
X = check_array(X, dtype=DTYPE, accept_sparse="csr")
File "/usr/lib64/python3.6/site-packages/sklearn/utils/validation.py", line 462, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 41)) while a minimum of 1 is required.
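
The traceback shows `X_test` reaching `clf.predict` with zero rows (shape `(0, 41)`). As a hedged sketch, a small check before prediction could surface that earlier with a clearer message. `require_samples` is a hypothetical helper, not part of this project or scikit-learn, and it does not diagnose why the dataset is empty.

```python
def require_samples(X, name="X_test"):
    """Hypothetical guard: return X unchanged, or raise if it has no rows."""
    if len(X) == 0:
        raise ValueError(f"{name} is empty (0 rows); nothing to predict on.")
    return X


# Example with a made-up, non-empty feature matrix.
checked = require_samples([[0.1, 0.2], [0.3, 0.4]])
print(len(checked))
```

In the script this might be used as `clf.predict(require_samples(X_test))`, assuming `X_test` supports `len()`, as lists and NumPy arrays do.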

running problem

How do I actually run this code? I installed all the libraries, but it still gives an error. Could you please help me run it?

Train - test split (already seen samples)

Hello,

First of all great work Robert.

I found one big mistake (everyone makes it) in backtesting.py, row 40: you are using shuffle = True (it is the default in train_test_split), so when you use i+1 or i+x targets, that data has already been seen during training. Because of that, you always get a different result when running backtesting.py. If you change to shuffle = False you will get 45-50% fewer trades, and the accuracy score will drop to 0.6-0.65 at most.

Best
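
To illustrate the point about shuffling, here is a minimal sketch of an order-preserving split written in plain Python. It assumes each sample carries an ISO-formatted date string under a hypothetical `date` key; `chronological_split` and the sample rows are made up for illustration and are not taken from backtesting.py. In scikit-learn, passing `shuffle=False` to `train_test_split` keeps the input order in a similar way.

```python
def chronological_split(rows, test_fraction=0.33, date_key="date"):
    """Split rows so that no test row is dated earlier than any train row."""
    # ISO date strings (YYYY-MM-DD) sort chronologically as plain strings.
    ordered = sorted(rows, key=lambda row: row[date_key])
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


# Made-up samples in arbitrary order.
rows = [
    {"date": "2014-06-01", "x": 1.0},
    {"date": "2012-01-01", "x": 2.0},
    {"date": "2013-03-01", "x": 3.0},
    {"date": "2015-09-01", "x": 4.0},
]
train, test = chronological_split(rows, test_fraction=0.25)
print([r["date"] for r in train], [r["date"] for r in test])
```

Because the split point is fixed by the dates, repeated runs give the same train and test sets.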

Naming conventions

Within the project there are many inconsistent naming conventions (just look at the top level python files!)

Fix this according to the holy laws of PEP8 and human decency.

Benchmark data

Hi Robert,

I was reviewing your most excellent work earlier and was wondering..

What index did you use to generate the sp500_index.csv data?

Was this the S&P 500 (^GSPC), and did you preprocess or scale this data?

The reason I ask is that the data in the 200-207 range looks on the low side.

Thanks!

syntax error - download_historical_prices.py

ubuntu 16.04

✘-1 ~/MachineLearningStocks [master|✚ 1…21968] 
05:02 $ python download_historical_prices.py
  File "download_historical_prices.py", line 35
    print(f"{len(missing_tickers)} tickers are missing: \n {missing_tickers} ")
                                                                             ^
SyntaxError: invalid syntax
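
f-strings were introduced in Python 3.6, so an older interpreter reports exactly this kind of SyntaxError on line 35 (the default interpreters on Ubuntu 16.04 predate 3.6). Running the script with Python 3.6 or newer is the direct fix. Purely as an illustration, the same message can be built with `str.format`, which older versions accept; the ticker values below are made up.

```python
missing_tickers = ["AAA", "BBB"]  # made-up example values

# Same message as the f-string on line 35, written with str.format.
message = "{} tickers are missing: \n {} ".format(
    len(missing_tickers), missing_tickers
)
print(message)
```

Other files in the project may also use f-strings, so upgrading the interpreter is likely simpler than rewriting each one.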

Refactor current_data.py

current_data.py extracts the current financials of a company by scraping yahoo finance.

However, if you look at the file you will see that it is a hard-coded mess, filled with code smell and repetition. In the spirit of python, this can and should be fixed. I do have a fix ready on one of my recent versions of this project, but I will have to backwards-integrate it.

New complementary tool

My name is Luis, I'm a big-data machine-learning developer, I'm a fan of your work, and I usually check your updates.

I was afraid that my savings would be eaten by inflation, so I have created a powerful tool based on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators).
It covers all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DeMark, Japanese candlesticks, Ichimoku, Fibonacci, Williams %R, balance of power, Murrey Math, etc.) and more than 200 others.

The tool creates prediction models of correct trading points (buy signals and sell signals, so that each stock is traded well in both timing and direction).
For this I have used big data tools like pandas in Python and stock market libraries like tablib, TAcharts, and pandas_ta for data collection and calculation,
along with powerful machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow and Google TensorFlow LSTM.

With the models trained with the selection of the best technical indicators, the tool is able to predict trading points (where to buy, where to sell) and send real-time alerts to Telegram or Mail. The points are calculated based on the learning of the correct trading points of the last 2 years (including the change to bear market after the rate hike).

I think it could be useful to you, and I would like to share it with you. If you are interested in improving it and collaborating, I am willing; if not, feel free to disregard it.

Historical Fundamental Data

Robert, I just discovered your MachineLearningStocks. This is not an issue but a suggestion on fundamental data sources. The American Association of Individual Investors has a product (Stock Investor Pro) with a reasonable subscription fee of US $198/year after a membership fee of $29/year. A subscriber has access to both current and weekly non-survivorship-biased historical data back to 2004 for ~2000 fundamental factors for ~6000 equities. It takes a significant effort to download the data and put it into a usable format. I have been using this data source in a personal Python-based stock backtester and screener for personal investing for 14+ years. Interestingly, I too am wading through Eremenko Krill's Machine Learning and Deep Learning, and have just purchased a GPU card with the long-term intent of adding ML stock selection to my current system.

Backtesting issue

Interested in this project and possibly working on it more. Just starting out with ML but I was curious to try and figure out the issue with the backtesting. From what I can tell it is that you are training the model on future data but then making predictions for stocks in the past...

It seems like the solution would be to first randomly select the year you'd like to predict, and then ensure the split for both training and test is only run on years before that. I just wanted to check whether I'm right about at least the issue. Feel free to drop me an email (on my profile) if you'd rather talk there; I know you said you want to let other people try and figure it out.
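
One reading of this suggestion, as a sketch only: choose the year to predict, fit on rows from strictly earlier years, and evaluate on rows from the chosen year. `split_before_year`, the `year` key, and the sample rows are hypothetical and not taken from the project's backtest.

```python
def split_before_year(rows, predict_year, year_key="year"):
    """Train on rows strictly before predict_year; test on that year's rows."""
    train = [row for row in rows if row[year_key] < predict_year]
    test = [row for row in rows if row[year_key] == predict_year]
    return train, test


# Made-up samples, one per year.
rows = [{"year": y, "ticker": "AAA"} for y in (2012, 2013, 2014, 2015)]
train, test = split_before_year(rows, predict_year=2014)
print([r["year"] for r in train], [r["year"] for r in test])
```

Rows from later years (2015 here) are simply left out, so nothing from the future of the prediction year leaks into training.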

backtest

I would like to contribute to this project and have read through the readme in detail.

I noticed that you mention a fatal flaw in the backtest. What is it? I can work on this and submit a PR.

What is the best way to get more updated key statistics data?

Hello,

I tried using the script, but the key statistics data provided by Sentdex is a bit outdated. Does anyone know an API or a URL where we can get fresher data? I am willing to submit a PR with this solution if someone provides me with enough info to implement it.

Thanks,
Aleksandar Serafimoski

Ticker List Confused

I do not see how to get the ticker list. There isn't really much documentation on it. I can get the prices for SPY, but the directory does not work. Is this something I have to get by myself?

Please let me know.

keystats.csv

Hello, when I run the command python parsing_keystats.py, keystats.csv is created, but it's empty (except for the Date, unix, ticker, etc. columns). I know for sure that stock prices are downloaded and updated when I change the date to the present. Could you please help me? I have tried everything to solve this.

Improve documentation

Let's have some clear commenting and a much improved README so new users can understand exactly what's going on.

Test Failure

I get the error below when running pytest. I'm not sure why this is occurring.

pytest -vv
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.6.5, pytest-3.4.1, py-1.5.3, pluggy-0.6.0 -- /home/chris/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/chris/Documents/Stocks/MachineLearningStocks, inifile:
plugins: remotedata-0.2.1, openfiles-0.3.0, doctestplus-0.1.3, arraydiff-0.2
collected 9 items

tests/test_datasets.py::test_forward_sample_dimensions PASSED [ 11%]
tests/test_datasets.py::test_forward_sample_data PASSED [ 22%]
tests/test_datasets.py::test_stock_prices_dataset PASSED [ 33%]
tests/test_datasets.py::test_stock_prediction_dataset PASSED [ 44%]
tests/test_utils.py::test_status_calc PASSED [ 55%]
tests/test_utils.py::test_data_string_to_float PASSED [ 66%]
tests/test_variables.py::test_statspath PASSED [ 77%]
tests/test_variables.py::test_features_same FAILED [ 88%]
tests/test_variables.py::test_outperformance PASSED [100%]

======================================================================================= FAILURES =======================================================================================
__________________________________________________________________________________ test_features_same __________________________________________________________________________________

def test_features_same():
    # There are only four differences (intentionally)
  assert set(parsing_keystats.features) - set(current_data.features) == {'Qtrly Revenue Growth', 'Qtrly Earnings Growth',
                                                                           'Shares Short (as of', 'Net Income Avl to Common'}

E AssertionError: assert {'Net Income ...prior month)'} == {'Net Income A...Short (as of'}
E Extra items in the left set:
E 'Shares Short (prior month)'
E Full diff:
E {'Net Income Avl to Common',
E 'Qtrly Earnings Growth',
E 'Qtrly Revenue Growth',
E - 'Shares Short (as of',
E ? ^
E + 'Shares Short (as of'}
E ? ^
E - 'Shares Short (prior month)'}

tests/test_variables.py:17: AssertionError
========================================================================= 1 failed, 8 passed in 15.02 seconds ==========================================================================
