antoinecarme / pyaf
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
License: BSD 3-Clause "New" or "Revised" License
antoine@z600:~/dev/python/packages/pyaf$ ipython3 tests/bench/test_yahoo.py
ACQUIRED_YAHOO_LINKS 4818
YAHOO_DATA_LINK AAPL https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_AAPL.csv
YAHOO_DATA_LINK GOOG https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_GOOG.csv
load_yahoo_stock_prices my_test 2
BENCH_TYPE YAHOO_my_test OneDataFramePerSignal
BENCH_DATA YAHOO_my_test <pyaf.Bench.TS_datasets.cTimeSeriesDatasetSpec object at 0x7fdc9c5f7fd0>
TIME : Date N= 1246 H= 12 HEAD= ['2011-07-28T00:00:00.000000000' '2011-07-29T00:00:00.000000000'
'2011-08-01T00:00:00.000000000' '2011-08-02T00:00:00.000000000'
'2011-08-03T00:00:00.000000000'] TAIL= ['2016-07-05T00:00:00.000000000' '2016-07-06T00:00:00.000000000'
'2016-07-07T00:00:00.000000000' '2016-07-08T00:00:00.000000000'
'2016-07-11T00:00:00.000000000']
SIGNAL : GOOG N= 1246 H= 12 HEAD= [ 610.941019 603.691033 606.771021 592.40099 601.171059] TAIL= [ 694.950012 697.77002 695.359985 705.630005 715.090027]
GOOG Date
0 610.941019 2011-07-28
1 603.691033 2011-07-29
2 606.771021 2011-08-01
3 592.400990 2011-08-02
4 601.171059 2011-08-03
https://en.wikipedia.org/wiki/Variance-stabilizing_transformation
In applied statistics, a variance-stabilizing transformation is a data transformation that is specifically chosen either to simplify considerations in graphical exploratory data analysis or to allow the application of simple regression-based or analysis of variance techniques.
The aim behind the choice of a variance-stabilizing transformation is to find a simple function ƒ to apply to values x in a data set to create new values y = ƒ(x) such that the variability of the values y is not related to their mean value.
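For instance (plain numpy, not pyaf code), a log transform removes the dependence of the spread on the level for a signal with multiplicative noise:

```python
import numpy as np

rng = np.random.default_rng(0)
# two segments of a signal with multiplicative noise: the raw standard
# deviation grows with the mean level
lo = 10.0 * rng.lognormal(sigma=0.2, size=2000)
hi = 1000.0 * rng.lognormal(sigma=0.2, size=2000)

print(lo.std(), hi.std())                   # spread scales with the level
print(np.log(lo).std(), np.log(hi).std())   # roughly equal after the transform
```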
The README file says that it is necessary to clone the github repository. This is not necessary.
A simple pip install should be OK, like:
pip install git+git://github.com/antoinecarme/pyaf.git
A lot of imports should be updated accordingly (the fully qualified name of a class should start with pyaf., etc.).
Need to find a dataset reporting the day of each duck (these are numbered) served at "La Tour d'Argent".
https://en.wikipedia.org/wiki/RAML_%28software%29
RESTful API Modeling Language (RAML) is a YAML-based language for describing RESTful APIs.[2] It provides all the information necessary to describe RESTful or practically-RESTful APIs. Although designed with RESTful APIs in mind, RAML is capable of describing APIs that do not obey all constraints of REST (hence the description "practically-RESTful"). It encourages reuse, enables discovery and pattern-sharing, and aims for merit-based emergence of best practices.
There is already a log for 'fit/train' with
INFO:pyaf.std:END_TRAINING_TIME_IN_SECONDS 'Ozone' 3.291870594024658
Add the same kind of logging for 'predict/forecast' and 'plot'.
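A sketch of how the same timing log could wrap the forecast and plot calls (the helper name and label strings are assumptions, only the END_TRAINING_TIME_IN_SECONDS line exists today):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pyaf.std")

def timed_call(label, signal_name, func, *args, **kwargs):
    # hypothetical helper mirroring the existing training log line,
    # reusable for forecast() and plot()
    start = time.time()
    result = func(*args, **kwargs)
    elapsed = time.time() - start
    logger.info("END_%s_TIME_IN_SECONDS '%s' %s", label, signal_name, elapsed)
    return result

# usage sketch (lEngine.forecast as in the pyaf examples):
# forecast_df = timed_call("FORECASTING", "Ozone", lEngine.forecast, df, 12)
```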
On the Heroku platform, there are some very tiny differences in the dates generated by pyaf. This may impact the extraction of exogenous data.
According to https://github.com/ripienaar/free-for-dev
travis-ci.org — Free for public GitHub repositories
Goal:
Put in place a continuous integration process for pyaf.
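A minimal .travis.yml sketch for a public GitHub repository (the Python version and test script are illustrative assumptions):

```yaml
language: python
python:
  - "3.6"
install:
  - pip install git+git://github.com/antoinecarme/pyaf.git
script:
  - python tests/bench/test_yahoo.py
```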
Need a similar doc with detailed hierarchical and grouped time series examples.
The following test fails:
tests/artificial/transf_/trend_poly/cycle_7/ar_/test_artificial_1024__poly_7__20.py
It fails with the exception:
ValueError: shapes (1012,1030) and (1000,) not aligned: 1030 (dim 1) != 1000 (dim 0)
This seems to be a scikit-learn usage error.
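The mismatch can be reproduced with plain numpy; one plausible reading of the shapes is a design matrix built at prediction time with more columns (1030) than the coefficient vector fitted at training time (1000):

```python
import numpy as np

X = np.zeros((1012, 1030))  # feature matrix as built at prediction time
coef = np.zeros(1000)       # coefficients fitted on a 1000-feature training matrix
try:
    X.dot(coef)
except ValueError as e:
    print(e)  # shapes (1012,1030) and (1000,) not aligned
```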
Typing of columns is not optimized for the moment.
Avoid using diff and cat.
Compare files by parsing all the lines and allowing a 1e-10 error in the numerical columns of the pandas JSON output, for example.
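A sketch of such a tolerant comparison (the helper name is hypothetical):

```python
import io

import numpy as np
import pandas as pd

def same_outputs(json_a, json_b, tol=1e-10):
    # parse both JSON outputs and compare them column by column,
    # allowing a small absolute error on the numerical columns
    df_a = pd.read_json(io.StringIO(json_a))
    df_b = pd.read_json(io.StringIO(json_b))
    if list(df_a.columns) != list(df_b.columns) or df_a.shape != df_b.shape:
        return False
    for col in df_a.columns:
        if np.issubdtype(df_a[col].dtype, np.number):
            if not np.allclose(df_a[col], df_b[col], rtol=0.0, atol=tol):
                return False
        elif not df_a[col].equals(df_b[col]):
            return False
    return True
```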
In order to analyze issues accurately, it is necessary to have the exact version of each component used.
These versions should be stored in the model (along with the model training date, etc.).
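A sketch of the metadata that could be stored alongside a trained model (field names are assumptions):

```python
import datetime
import platform

import numpy as np
import pandas as pd

# hypothetical metadata block to be stored inside a trained model
model_metadata = {
    "training_date": datetime.datetime.now().isoformat(),
    "python_version": platform.python_version(),
    "numpy_version": np.__version__,
    "pandas_version": pd.__version__,
}
```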
This is not an issue with PyAF.
However, in case anyone else encounters read_csv() failures on macOS when running the examples, refer to
https://bugs.python.org/issue28150
or
In the API-generated plots, the forecast line color is not always the same. The same applies for other lines colors.
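One way to make the colors deterministic is a fixed component-to-color mapping (the component names and colors here are illustrative, not the current pyaf palette):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# hypothetical fixed palette: one stable color per plotted component
PALETTE = {"Signal": "black", "Forecast": "blue", "Residue": "red"}

fig, ax = plt.subplots()
for name, values in [("Signal", [1, 2, 3]), ("Forecast", [1.1, 1.9, 3.2])]:
    ax.plot(values, color=PALETTE[name], label=name)
ax.legend()
```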
At least check the possibility of using pyaf in this context.
pyaf is not aware of the data source type (time series database or web service, etc) as long as the dataset is stored in a pandas dataframe.
Is there a link with hierarchical models ?
A jupyter notebook is welcome with a real example.
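For example, a payload fetched from any source ends up as an ordinary dataframe (the sample values are taken from the GOOG log above):

```python
import io

import pandas as pd

# a CSV payload as it might come from a web service or a time series
# database; pyaf only ever sees the resulting dataframe
payload = "Date,Signal\n2016-07-08,705.63\n2016-07-11,715.09\n"
df = pd.read_csv(io.StringIO(payload), parse_dates=["Date"])
```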
instead of printing text messages on the standard output !!!
At least for debugging purposes.
Time should be a time (np.dtype is a date/time or numeric).
Signal should be numeric.
Issue error messages if the time/signal column is not found in the training dataset or does not have the correct type.
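A sketch of such a validation helper (the function name and messages are hypothetical):

```python
import numpy as np
import pandas as pd

def check_training_dataset(df, time_col, signal_col):
    # raise explicit errors early instead of failing later in training
    for col in (time_col, signal_col):
        if col not in df.columns:
            raise ValueError("Column '%s' not found in the training dataset" % col)
    lTimeType = df[time_col].dtype
    if not (np.issubdtype(lTimeType, np.datetime64)
            or np.issubdtype(lTimeType, np.number)):
        raise ValueError("Time column '%s' must be a date or numeric" % time_col)
    if not np.issubdtype(df[signal_col].dtype, np.number):
        raise ValueError("Signal column '%s' must be numeric" % signal_col)
```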
It may be interesting to test some neural network models among the competing models.
Use scikit-learn SVR models (the same way we already use Ridge linear regression for AR/ARX models).
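A sketch of what plugging SVR into an AR-style setup could look like (the lag construction and names are illustrative, not pyaf internals):

```python
import numpy as np
from sklearn.svm import SVR

# predict x_t from the p previous values, with SVR in place of Ridge
p = 3
x = np.sin(np.arange(100) / 5.0)
X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])  # lagged features
y = x[p:]
lModel = SVR(kernel="rbf").fit(X, y)
lForecast = lModel.predict(X[-1:])  # one-step-ahead prediction
```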
Need to test PyAF on all the datasets given in this package.
package:
https://cran.r-project.org/web/packages/expsmooth/index.html
datasets :
ausgdp dji gasprice partx utility bonds enplanements hospital ukcars vehicles canadagas fmsales jewelry unemp.cci visitors carparts freight mcopper usgdp xrates djiclose frexport msales usnetelec
To make error handling easier, PyAF API calls should all raise the same kind of exception (PyaF_Error) or an inherited form.
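A minimal sketch of such an exception hierarchy (only PyaF_Error is named in the issue; the subclass is hypothetical):

```python
class PyaF_Error(Exception):
    """Base class for all exceptions raised by PyAF API calls."""

class PyaF_TrainingError(PyaF_Error):
    """Hypothetical subclass for failures during train()."""
```

Callers can then catch PyaF_Error once instead of a mix of ValueError, KeyError, etc.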
This approach is still missing. Generate new columns with MO prefix based on base forecasts for all hierarchy levels.
According to
https://www.otexts.org/fpp/2/5
MAPE/SMAPE are not reliable performance measures for model selection.
The authors recommend MASE (mean absolute scaled error):
MASE = mean(|q_j|), where q_j is the scaled error.
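A minimal numpy sketch of MASE, using the textbook scaling by the in-sample error of the one-step naive forecast (the scaling choice follows the reference above, not pyaf code):

```python
import numpy as np

def mase(y_train, y_true, y_pred):
    # scale = in-sample MAE of the one-step naive forecast
    scale = np.mean(np.abs(np.diff(y_train)))
    q = (np.asarray(y_true) - np.asarray(y_pred)) / scale  # scaled errors q_j
    return np.mean(np.abs(q))
```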
Need to run a benchmarking process to review the current state of PyAF.
As a first step, we will see this as a sanity check (correct some bugs here and there ;).
As a second step, a report will be generated with performance figures.
Following #12, only Keras with the Theano backend was tested on a 24-core machine (HP Z600).
Try to configure TensorFlow (with or without an NVidia GPU) on this machine and perform the same tests.
After updating #6, the following test fails:
python3 tests/artificial/transf_log/trend_constant/cycle_12/ar_12/test_artificial_1024_log_constant_12_12_100.py
The code, as it is today, sometimes uses RMSE and sometimes MAPE.
There is an option,
self.mModelSelection_Criterion = "L2";
that should be used for model selection everywhere. By the way, it should be set to "MAPE" by default.
This is a very serious bug.
Some signal transformations need nothing; others need the signal to be positive (log = boxcox(0)).
Some perform a scale/translation to achieve invariance; this needs to be applicable to all transformations (new option).
Following #12, MLP and LSTM Keras models with the Theano backend were tested on a 24-core machine (HP Z600).
These models may need some tuning (RNN architecture improvement).
Some validation on artificial datasets is also needed.
Sometimes the signal is too small/easy to forecast. PyAF fails when the signal has only one row !!!
The goal here is to make pyaf as robust as possible against very small/bad datasets
PyAF should automatically produce reasonable/naive/trivial models in these cases.
It should not fail in any case (normal behavior expected, useful for M2M context)
PyAF has an API call, lEngine.standardPlots(). It gives some classical plots (signal against forecast, residues, trends, cycles, AR).
All the plots are generated with matplotlib
Document the plots generated.
The REST service (issue #20) also gives the same plots in png/base64 encoding; this is to be documented as well.
Plots that are generated for hierarchical models are too elementary.
Add more significant annotation for all the hierarchy nodes:
Need a document to describe the algorithmic aspects of time series forecasting in PyAF.
Need to review the software aspects of performance.
Heroku free dyno "only" has 512MB and low cpu.
A lot of pandas dataframes are created internally at each step of the signal decomposition process.
Try to get rid of unnecessary dataframes.
Some memory profiling is also welcome.
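For example, the stdlib tracemalloc module can bound the memory cost of one decomposition step (a sketch; the dataframe step shown is illustrative, not pyaf internals):

```python
import tracemalloc

import pandas as pd

tracemalloc.start()
# simulate one intermediate step of the decomposition pipeline
df = pd.DataFrame({"Signal": range(100000)})
df["Signal_diff"] = df["Signal"].diff()  # each step allocates new columns/frames
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print("current=%d bytes, peak=%d bytes" % (current, peak))
```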
download_all_stock_prices.py:
import pyaf.Bench.TS_datasets as tsds
tsds.download_yahoo_stock_prices()
PyAF protects these indicators against zero values in the signal :
def protect_small_values(self, signal, estimator):
    eps = 1.0e-13
    keepThis = (np.abs(signal) > eps)
    signal1 = signal[keepThis]
    estimator1 = estimator[keepThis]
    # self.dump_perf_data(signal, signal1)
    return (signal1, estimator1)
This is not necessary.
An approximation is better :
rel_error = abs(estimator - signal) / (abs(signal) + eps)
MAPE = mean(rel_error)
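A sketch of the suggested approximation (the helper name is hypothetical):

```python
import numpy as np

def mape_with_eps(signal, estimator, eps=1.0e-13):
    # epsilon in the denominator replaces the row-dropping logic
    signal = np.asarray(signal, dtype=float)
    estimator = np.asarray(estimator, dtype=float)
    rel_error = np.abs(estimator - signal) / (np.abs(signal) + eps)
    return np.mean(rel_error)
```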
We copy the dataset and use a loop over the whole dataset which is too baaaaad :
'''
def specific_invert(self, df):
    df_orig = df.copy()
    df_orig.iloc[0] = df.iloc[0]
    for i in range(1, df.shape[0]):
        df_orig.iloc[i] = df.iloc[i] - df.iloc[i - 1]
    return df_orig
'''
There is room for improvement.
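For example, the loop can be replaced by a vectorized first difference (a sketch assuming the loop semantics above: first row kept, every other row differenced):

```python
import pandas as pd

def specific_invert_vectorized(df):
    # same result as the loop, without copying row by row
    df_orig = df.diff()
    df_orig.iloc[0] = df.iloc[0]
    return df_orig
```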
A second group of failures. The following warning is raised :
logs/test_artificial_1024_cumsum_constant_5__20.log:UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
sample script :
tests/artificial/transf_cumsum/trend_constant/cycle_5/ar_/test_artificial_1024_cumsum_constant_5__20.py
At least a new feature to be exposed.
Following #12, only Keras with the Theano backend was tested on a 24-core machine (HP Z600).
Try to configure an NVidia gpu on this machine and perform the same tests.
According to :
https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error
A limitation to SMAPE is that if the actual value or forecast value is 0, the value of error will boom up to the upper-limit of error. (200% for the first formula and 100% for the second formula).
Provided the data are strictly positive, a better measure of relative accuracy can be obtained based on the log of the accuracy ratio: log(Ft / At) This measure is easier to analyse statistically, and has valuable symmetry and unbiasedness properties. When used in constructing forecasting models the resulting prediction corresponds to the geometric mean (Tofallis, 2015).
A logistic transformation is more appropriate when the signal is a proportion.
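The log accuracy ratio measure quoted above can be sketched as follows (the function name is hypothetical; it requires strictly positive data):

```python
import numpy as np

def mean_abs_log_accuracy_ratio(actual, forecast):
    # q = log(F_t / A_t); defined only for strictly positive values
    q = np.log(np.asarray(forecast, dtype=float) / np.asarray(actual, dtype=float))
    return np.mean(np.abs(q))
```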
Need a RESTful API for PyAF and a Heroku demo.