antoinecarme / pyaf

457 stars · 19 watchers · 73 forks · 288.43 MB

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.

License: BSD 3-Clause "New" or "Revised" License

Python 91.64% R 0.01% Shell 0.01% Makefile 8.36% Procfile 0.01%
scikit-learn pandas jupyter forecasting exogenous benchmark seasonal time-series horizon autoregressive


pyaf's Issues

Support Date types

antoine@z600:~/dev/python/packages/pyaf$ ipython3 tests/bench/test_yahoo.py
ACQUIRED_YAHOO_LINKS 4818
YAHOO_DATA_LINK AAPL https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_AAPL.csv
YAHOO_DATA_LINK GOOG https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/YahooFinance/nasdaq/yahoo_GOOG.csv
load_yahoo_stock_prices my_test 2
BENCH_TYPE YAHOO_my_test OneDataFramePerSignal
BENCH_DATA YAHOO_my_test <pyaf.Bench.TS_datasets.cTimeSeriesDatasetSpec object at 0x7fdc9c5f7fd0>
TIME : Date N= 1246 H= 12 HEAD= ['2011-07-28T00:00:00.000000000' '2011-07-29T00:00:00.000000000'
'2011-08-01T00:00:00.000000000' '2011-08-02T00:00:00.000000000'
'2011-08-03T00:00:00.000000000'] TAIL= ['2016-07-05T00:00:00.000000000' '2016-07-06T00:00:00.000000000'
'2016-07-07T00:00:00.000000000' '2016-07-08T00:00:00.000000000'
'2016-07-11T00:00:00.000000000']
SIGNAL : GOOG N= 1246 H= 12 HEAD= [ 610.941019 603.691033 606.771021 592.40099 601.171059] TAIL= [ 694.950012 697.77002 695.359985 705.630005 715.090027]
GOOG Date
0 610.941019 2011-07-28
1 603.691033 2011-07-29
2 606.771021 2011-08-01
3 592.400990 2011-08-02
4 601.171059 2011-08-03

Add Variance stabilizing transformations

https://en.wikipedia.org/wiki/Variance-stabilizing_transformation

In applied statistics, a variance-stabilizing transformation is a data transformation that is specifically chosen either to simplify considerations in graphical exploratory data analysis or to allow the application of simple regression-based or analysis of variance techniques.

The aim behind the choice of a variance-stabilizing transformation is to find a simple function ƒ to apply to values x in a data set to create new values y = ƒ(x) such that the variability of the values y is not related to their mean value.
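As a concrete illustration (not PyAF code): for a series whose noise is proportional to its level, a log transform stabilizes the variance.

```python
import numpy as np

# Two segments with multiplicative noise: the raw standard deviation grows
# with the level, while the log-transformed standard deviation does not.
rng = np.random.default_rng(0)
low = 10.0 * (1.0 + 0.05 * rng.standard_normal(500))     # low-level segment
high = 1000.0 * (1.0 + 0.05 * rng.standard_normal(500))  # high-level segment

raw_ratio = high.std() / low.std()                  # roughly 100: spread tracks the mean
log_ratio = np.log(high).std() / np.log(low).std()  # close to 1: variance stabilized

print(raw_ratio, log_ratio)
```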

Installation should be easier.

The README file says that it is necessary to clone the GitHub repository. This should not be necessary.

A simple pip install should be enough, like:

pip install git+git://github.com/antoinecarme/pyaf.git

Many imports will need to be updated (the fully qualified name of each class should start with pyaf., etc.).

Add a RAML doc for the RESTful API

https://en.wikipedia.org/wiki/RAML_%28software%29

RESTful API Modeling Language (RAML) is a YAML-based language for describing RESTful APIs.[2] It provides all the information necessary to describe RESTful or practically-RESTful APIs. Although designed with RESTful APIs in mind, RAML is capable of describing APIs that do not obey all constraints of REST (hence the description "practically-RESTful"). It encourages reuse, enables discovery and pattern-sharing, and aims for merit-based emergence of best practices.

Add Timing reports for all operations

There is already a log for 'fit/train' with

INFO:pyaf.std:END_TRAINING_TIME_IN_SECONDS 'Ozone' 3.291870594024658

Add the same kind of logging for 'predict/forecast' and 'plot'.
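A minimal sketch of how such a timing report could be produced, using a hypothetical context manager (cTimedOperation is not part of the existing PyAF API; the log format mimics the training line above):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pyaf.std")

class cTimedOperation:
    # Hypothetical helper: times a block of code and logs it in the same
    # style as END_TRAINING_TIME_IN_SECONDS.
    def __init__(self, operation, signal_name):
        self.mOperation = operation
        self.mSignalName = signal_name

    def __enter__(self):
        self.mStart = time.time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.mElapsed = time.time() - self.mStart
        logger.info("END_%s_TIME_IN_SECONDS '%s' %s",
                    self.mOperation, self.mSignalName, self.mElapsed)

with cTimedOperation("FORECAST", "Ozone") as lTimer:
    time.sleep(0.01)  # stands in for lEngine.forecast(...)
```

The same wrapper could be reused for 'plot' by changing the operation name.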

Improve numerical stability

On the Heroku platform, there are some very tiny differences in the dates generated by PyAF. This may impact the extraction of exogenous data.

Artificial dataset test failure

The following test fails:

tests/artificial/transf_/trend_poly/cycle_7/ar_/test_artificial_1024__poly_7__20.py

It fails with the exception:

ValueError: shapes (1012,1030) and (1000,) not aligned: 1030 (dim 1) != 1000 (dim 0)

This seems to be a scikit-learn usage error.

Store version information in Model metadata

In order to analyze issues accurately, it is necessary to know the exact version of each component used:

  • system (uname -a ??)
  • Python version
  • scikit-learn
  • pandas
  • NumPy
  • SciPy
  • matplotlib
  • SQLAlchemy

These versions should be stored in the model (along with the model training date, etc.).
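A sketch of how such metadata could be collected (collect_environment is a hypothetical helper name, not an existing PyAF function); modules that are not installed are reported as such:

```python
import json
import platform

def collect_environment():
    # Hypothetical sketch: gather the version of each component so that it
    # can be stored in the model metadata alongside the training date.
    lVersions = {"system": platform.platform(),
                 "python": platform.python_version()}
    for lName in ("sklearn", "pandas", "numpy", "scipy", "matplotlib", "sqlalchemy"):
        try:
            lModule = __import__(lName)
            lVersions[lName] = getattr(lModule, "__version__", "unknown")
        except ImportError:
            lVersions[lName] = "not installed"
    return lVersions

print(json.dumps(collect_environment(), indent=2))
```

Storing this dictionary in the model would make bug reports reproducible without asking the user for their environment.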

Investigate IoT Time Series Applications

At least check the feasibility of using PyAF in this context.
PyAF is not aware of the data source type (time series database, web service, etc.) as long as the dataset is stored in a pandas dataframe.
Is there a link with hierarchical models?
A Jupyter notebook with a real example is welcome.

Exception Handling

To make error handling easier, PyAF API calls should all raise the same kind of exception (PyaF_Error) or an inherited form.
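A minimal sketch of the proposed hierarchy (PyaF_Error is the name from this issue; the subclass and the check_horizon helper are hypothetical illustrations):

```python
class PyaF_Error(Exception):
    """Common base class for all PyAF errors (name taken from the issue text)."""

class PyaF_Error_Argument(PyaF_Error):
    """Hypothetical subclass for invalid API arguments."""

def check_horizon(iHorizon):
    # Hypothetical validation helper illustrating the pattern: callers can
    # catch PyaF_Error once instead of a mix of ValueError/TypeError/etc.
    if not isinstance(iHorizon, int) or iHorizon <= 0:
        raise PyaF_Error_Argument("horizon must be a positive integer, got %r" % (iHorizon,))
    return iHorizon

try:
    check_horizon(-1)
except PyaF_Error as lError:  # one except clause catches every PyAF error
    print("caught:", lError)
```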

Benchmarking Process

Need to run a benchmarking process to review the current state of PyAF.

As a first step, we will treat this as a sanity check (correcting some bugs here and there ;).

As a second step, a report will be generated with performance figures.

Investigate TensorFlow usage with PyAF

Following #12, only Keras with the Theano backend was tested on a 24-core machine (HP Z600).

Try to configure TensorFlow (with or without an NVIDIA GPU) on this machine and perform the same tests.

Failure in an artificial test

After updating #6, the following test fails:

python3 tests/artificial/transf_log/trend_constant/cycle_12/ar_12/test_artificial_1024_log_constant_12_12_100.py

Perform signal transformation in a uniform way

Some signal transformations need nothing; others need the signal to be positive (log = boxcox(0)).
Some apply a scale-translation invariance; this needs to be applicable to all transformations (as a new option).
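A sketch of what a uniform shift-to-positive pre-step could look like (shift_to_positive is a hypothetical name, not an existing PyAF function); the offset must be kept so the transformation stays invertible:

```python
import numpy as np

def shift_to_positive(signal, margin=1.0):
    # Hypothetical sketch of a uniform pre-step: translate the signal so that
    # transformations requiring positive values (e.g. log) become applicable.
    # The returned offset is needed to invert the translation afterwards.
    lMin = signal.min()
    lOffset = margin - lMin if lMin <= 0 else 0.0
    return signal + lOffset, lOffset

lSignal = np.array([-5.0, 0.0, 3.0])
lShifted, lOffset = shift_to_positive(lSignal)
print(lShifted, lOffset)  # lShifted is strictly positive
```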

Tuning Keras Models

Following #12, MLP and LSTM Keras models with the Theano backend were tested on a 24-core machine (HP Z600).

These models may need some tuning (RNN architecture improvement).

Some validation on artificial datasets is also needed.

Avoid unnecessary failures

Sometimes the signal is too small/easy to forecast. PyAF currently fails when the signal has only one row!

The goal here is to make PyAF as robust as possible against very small/bad datasets.
PyAF should automatically produce reasonable/naive/trivial models in these cases.
It should not fail in any case (normal behavior is expected; this is useful in an M2M context).
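A sketch of the kind of trivial fallback intended here (naive_forecast is a hypothetical illustration, not existing PyAF code): when the signal is too short to fit anything, repeat the last observed value over the horizon:

```python
import pandas as pd

def naive_forecast(df, signal_col, horizon):
    # Trivial fallback model: persist the last observed value. Always
    # succeeds, even on a one-row dataset.
    lLastValue = df[signal_col].iloc[-1]
    return pd.DataFrame({signal_col + "_Forecast": [lLastValue] * horizon})

lOneRow = pd.DataFrame({"Signal": [42.0]})
lForecast = naive_forecast(lOneRow, "Signal", 3)
print(lForecast)
```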

Add a document about plotting features of PyAF

PyAF has an API call, lEngine.standardPlots(), which produces the classical plots (signal vs. forecast, residues, trends, cycles, AR components).

All the plots are generated with matplotlib.

Document the plots generated.

The REST service (issue #20) also provides the same plots in a png/base64 encoding; this should be documented as well.

Improve Plots for Hierarchical Models

Plots that are generated for hierarchical models are too elementary.

Add more significant annotation for all the hierarchy nodes:

  1. MAPE for the node model.
  2. Top-Down average proportions for the edges.
  3. Use a specific color for each level of the hierarchy.
  4. Other annotations?

Document the Algorithmic Details of PyAF

Need a document to describe the algorithmic aspects of time series forecasting in PyAF.

  1. The overall algorithm
  2. The details of the signal decomposition
  3. The machine learning aspects
  4. Advanced usage/control of the algorithms
  5. Hierarchical forecasting

Try to reduce memory usage

A lot of pandas dataframes are created internally at each step of the signal decomposition process.
Try to get rid of unnecessary dataframes.

Some memory profiling is also welcome.

MAPE, sMAPE computations are slow

PyAF protects these indicators against zero values in the signal:

def protect_small_values(self, signal, estimator):
    # Drop the points where the signal is (almost) zero before computing
    # the relative errors.
    eps = 1.0e-13
    keepThis = (np.abs(signal) > eps)
    signal1 = signal[keepThis]
    estimator1 = estimator[keepThis]
    # self.dump_perf_data(signal, signal1)
    return (signal1, estimator1)

This filtering pass is not necessary.

An approximation is better:

rel_error = abs(estimator - signal) / (abs(signal) + eps)
MAPE = mean(rel_error)
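The approximation can be implemented as a single vectorized pass with no filtering step (a sketch, not the actual PyAF code):

```python
import numpy as np

def mape(signal, estimator, eps=1.0e-13):
    # Vectorized sketch of the approximation above: the eps added to the
    # denominator guards against zero actual values, so no points need to
    # be dropped beforehand.
    signal = np.asarray(signal, dtype=float)
    estimator = np.asarray(estimator, dtype=float)
    rel_error = np.abs(estimator - signal) / (np.abs(signal) + eps)
    return float(np.mean(rel_error))

print(mape([100.0, 200.0], [110.0, 180.0]))  # ≈ 0.1
print(mape([0.0], [0.0]))                    # zero signal handled without filtering
```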

Signal Transformation Computation is too slow

We copy the dataset and use a Python loop over the whole dataset, which is far too slow:

def specific_invert(self, df):
    # Keep element 0 and replace every later element with the difference
    # from its predecessor.
    df_orig = df.copy()
    df_orig.iloc[0] = df.iloc[0]
    for i in range(1, df.shape[0]):
        df_orig.iloc[i] = df.iloc[i] - df.iloc[i - 1]
    return df_orig

There is room for improvement.
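One possible improvement: the loop is a first difference, so it can be replaced by a single vectorized call. A sketch under a hypothetical name, to avoid confusion with the original method:

```python
import pandas as pd

def specific_invert_fast(df):
    # Vectorized equivalent of the loop above: element 0 is kept as-is and
    # every later element becomes the difference with its predecessor.
    df_orig = df.diff()
    df_orig.iloc[0] = df.iloc[0]
    return df_orig

lSeries = pd.Series([1.0, 3.0, 6.0, 10.0])
print(specific_invert_fast(lSeries).tolist())  # → [1.0, 2.0, 3.0, 4.0]
```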

Artificial dataset test failure - Warning about singular matrix

A second group of failures. The following warning is raised:

logs/test_artificial_1024_cumsum_constant_5__20.log:UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.

Sample script:

tests/artificial/transf_cumsum/trend_constant/cycle_5/ar_/test_artificial_1024_cumsum_constant_5__20.py

User Guide

At a minimum:

  1. API documentation in the Python code
  2. Some examples

Investigate GPU usage with PyAF

Following #12, only Keras with the Theano backend was tested on a 24-core machine (HP Z600).

Try to configure an NVIDIA GPU on this machine and perform the same tests.

Add LnQ Performance measure

According to:

https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error

A limitation of SMAPE is that if the actual value or forecast value is 0, the error blows up to its upper limit (200% for the first formula and 100% for the second formula).

Provided the data are strictly positive, a better measure of relative accuracy can be obtained based on the log of the accuracy ratio: log(Ft / At) This measure is easier to analyse statistically, and has valuable symmetry and unbiasedness properties. When used in constructing forecasting models the resulting prediction corresponds to the geometric mean (Tofallis, 2015).
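A sketch of such a measure based on the log of the accuracy ratio (the exact LnQ definition adopted by PyAF may differ, e.g. in how it aggregates or normalizes):

```python
import numpy as np

def lnq(actual, forecast):
    # Log-accuracy-ratio based measure for strictly positive data, following
    # the excerpt above: sum of squared log ratios log(F/A).
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    log_ratio = np.log(forecast / actual)
    return float(np.sum(log_ratio ** 2))

print(lnq([100.0, 200.0], [110.0, 180.0]))
# Unlike SMAPE, this measure is symmetric: over- and under-forecasting by
# the same ratio give the same penalty.
print(lnq([100.0], [110.0]), lnq([110.0], [100.0]))
```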

pyAF_introduction stalling during training?

The introductory notebook hangs when calling lEngine.train(ozone_dataframe, )

(screenshot: the Jupyter cell hanging during training)

I am investigating...

ipykernel==4.5.2
ipython==5.3.0
ipywidgets==6.0.0
jupyter==1.0.0
jupyter-client==5.0.0
jupyter-console==5.1.0
jupyter-core==4.3.0
Keras==1.2.2
matplotlib==2.0.0
