
fireTS

Documentation, FAQ

UPDATES

  • 5/31/2020 The forecast method is now available in NARX models! (DirectAutoRegressor is not suitable for forecasting, so it has no forecast method.) Here is a quick-start example; check "examples/Basic usage of NARX and DirectAutoregressor.ipynb" for more details. What is the difference between predict and forecast?
import numpy as np
from sklearn.linear_model import LinearRegression
from fireTS.models import NARX

x = np.random.randn(100, 1)
y = np.random.randn(100)
mdl = NARX(LinearRegression(), auto_order=2, exog_order=[2])
mdl.fit(x, y)
y_forecast = mdl.forecast(x, y, step=10, X_future=np.random.randn(9, 1))

Introduction

fireTS is a sklearn-style package for multivariate time-series prediction. Here is a simple code snippet showcasing the main features of the fireTS package.

from fireTS.models import NARX, DirectAutoRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
import numpy as np

# Random training data
x = np.random.randn(100, 2)
y = np.random.randn(100)

# Build a non-linear autoregression model with exogenous inputs
# using Random Forest regression as the base model
mdl1 = NARX(
    RandomForestRegressor(n_estimators=10),
    auto_order=2,
    exog_order=[2, 2],
    exog_delay=[1, 1])
mdl1.fit(x, y)
ypred1 = mdl1.predict(x, y, step=3)

# Build a general autoregression model and make multi-step prediction directly
# using XGBRegressor as the base model
mdl2 = DirectAutoRegressor(
    XGBRegressor(n_estimators=10),
    auto_order=2,
    exog_order=[2, 2],
    exog_delay=[1, 1],
    pred_step=3)
mdl2.fit(x, y)
ypred2 = mdl2.predict(x, y)
  • sklearn-style API. The package provides fit and predict methods, very similar to the sklearn API.
  • Plug-and-go. You can plug in any regression algorithm provided by the sklearn package and build a time-series forecasting model.
  • Creates the lag features for you once you specify the autoregression order auto_order, the exogenous input order exog_order, and the exogenous input delay exog_delay.
  • Supports multi-step prediction in two different ways: recursive and direct. NARX builds a one-step-ahead predictive model and applies it recursively to make multi-step predictions (future exogenous input information is needed). DirectAutoRegressor makes multi-step predictions directly (no future exogenous input information is needed) by specifying the prediction step in the constructor.
  • Supports grid search to tune the hyper-parameters of the base model (grid search over the orders and delays of the time-series model is not supported for now).

I developed this package while writing this paper. It is really handy for generating lag features and leveraging the various regression algorithms provided by sklearn to build non-linear multivariate time-series models. The API can also be used to build deep neural network models for time-series prediction; the paper used this package to build LSTM models and make multi-step predictions.

The documentation can be found here. The documentation provides the mathematical equations of each model. It is highly recommended to read the documentation before using the model.

Nonlinear AutoRegression with eXogenous (NARX) model

The fireTS.models.NARX model trains a one-step-ahead-prediction model and makes multi-step predictions recursively, given the future exogenous inputs.

Given the output time series to predict y(t) and the exogenous inputs X(t), the model generates targets and features as follows:

Target      Features
y(t + 1)    y(t), y(t - 1), ..., y(t - p + 1), X(t - d), X(t - d - 1), ..., X(t - d - q + 1)

where p is the autoregression order auto_order, q is the exogenous input order exog_order, and d is the exogenous delay exog_delay.

The NARX model can make any-step-ahead predictions given the future exogenous inputs. To make a multi-step prediction, set the step argument in the predict method.
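The lag-feature construction described above can be sketched in plain NumPy. This is a simplified, single-exogenous-input illustration, not the package's actual implementation:

```python
import numpy as np

def make_narx_features(y, x, p, q, d):
    """Build (features, target) pairs for a one-step-ahead NARX model.

    Feature row at time t: [y(t), ..., y(t - p + 1), x(t - d), ..., x(t - d - q + 1)]
    Target at time t:      y(t + 1)
    """
    start = max(p - 1, d + q - 1)   # earliest t with a full feature row
    rows, targets = [], []
    for t in range(start, len(y) - 1):
        auto = [y[t - i] for i in range(p)]        # p autoregressive lags
        exog = [x[t - d - j] for j in range(q)]    # q delayed exogenous lags
        rows.append(auto + exog)
        targets.append(y[t + 1])
    return np.array(rows), np.array(targets)

# Tiny deterministic series so the lag structure is easy to see.
y = np.arange(10, dtype=float)          # y(t) = t
x = np.arange(10, dtype=float) * 10     # x(t) = 10 t
X_feat, y_tgt = make_narx_features(y, x, p=2, q=2, d=1)
# First row (t = 2): [y(2), y(1), x(1), x(0)] = [2, 1, 10, 0], target y(3) = 3
```

With p=2, q=2, d=1, the first usable time index is t = 2, so a length-10 series yields 7 training rows.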

Direct Autoregressor

The fireTS.models.DirectAutoRegressor model trains a multi-step-ahead-prediction model directly. No future exogenous inputs are required to make the multi-step prediction.

Given the output time series to predict y(t) and the exogenous inputs X(t), the model generates targets and features as follows:

Target      Features
y(t + k)    y(t), y(t - 1), ..., y(t - p + 1), X(t - d), X(t - d - 1), ..., X(t - d - q + 1)

where p is the autoregression order auto_order, q is the exogenous input order exog_order, d is the exogenous delay exog_delay, and k is the prediction step pred_step.

The direct autoregressor does not require future exogenous input information to make multi-step predictions. Its predict method takes no step argument: the prediction step is fixed by pred_step at construction time.
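The only structural difference from the NARX table is the target shift. A minimal sketch (again a simplification, not the package's code):

```python
import numpy as np

def make_direct_target(y, p, q, d, k):
    """Direct autoregressor target: same feature rows as NARX,
    but the target is y(t + k) instead of y(t + 1)."""
    start = max(p - 1, d + q - 1)   # earliest t with a full feature row
    return np.array([y[t + k] for t in range(start, len(y) - k)])

y = np.arange(10, dtype=float)      # y(t) = t
targets = make_direct_target(y, p=2, q=2, d=1, k=3)
# Targets run from y(2 + 3) = 5 up to y(9): [5, 6, 7, 8, 9]
```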

Installation

NOTE: Only python3 is supported.

It is highly recommended to use pip to install fireTS; follow this link to install pip.

After pip is installed,

pip install fireTS

To get the latest development version,

git clone https://github.com/jxx123/fireTS.git
cd fireTS
pip install -e .

Quick Start

  • Use RandomForestRegressor as base model to build a NARX model
from fireTS.models import NARX
from sklearn.ensemble import RandomForestRegressor
import numpy as np

x = np.random.randn(100, 1)
y = np.random.randn(100)
mdl = NARX(RandomForestRegressor(), auto_order=2, exog_order=[2], exog_delay=[1])
mdl.fit(x, y)
ypred = mdl.predict(x, y, step=3)
  • Use RandomForestRegressor as base model to build a DirectAutoRegressor model
from fireTS.models import DirectAutoRegressor
from sklearn.ensemble import RandomForestRegressor
import numpy as np

x = np.random.randn(100, 1)
y = np.random.randn(100)
mdl = DirectAutoRegressor(RandomForestRegressor(), 
                          auto_order=2, 
                          exog_order=[2], 
                          exog_delay=[1], 
                          pred_step=3)
mdl.fit(x, y)
ypred = mdl.predict(x, y)
  • Usage of grid search
from fireTS.models import NARX
from sklearn.ensemble import RandomForestRegressor
import numpy as np

x = np.random.randn(100, 1)
y = np.random.randn(100)

# DirectAutoRegressor can do grid search as well
mdl = NARX(RandomForestRegressor(), auto_order=2, exog_order=[2], exog_delay=[1])

# Grid search
para_grid = {'n_estimators': [10, 30, 100]}
mdl.grid_search(x, y, para_grid, verbose=2)

# Best hyper-parameters are set after grid search, print the model to see the difference
print(mdl)

# Fit the model and make the prediction
mdl.fit(x, y)
ypred = mdl.predict(x, y, step=3)

The examples folder provides more realistic examples. example1 and example2 use data simulated by the simglucose package to fit time-series models and make multi-step predictions.

FAQ

  • What is the difference between predict and forecast?
    • For example, given a target time series y(0), y(1), ..., y(9) to predict and the exogenous input time series x(0), x(1), ..., x(9), build a NARX model NARX(RandomForestRegressor(), auto_order=1, exog_order=[1], exog_delay=[0]). The model can be represented by a function y(t + 1) = f(y(t), x(t)) + e(t).
    • predict(x, y, step=2) outputs a time series with the same length as the original y, containing the 2-step-ahead prediction at each step, i.e. nan, nan, y_hat(2), y_hat(3), ..., y_hat(9). Note that y_hat(2) is the 2-step-ahead prediction standing at time 0, y_hat(3) is the 2-step-ahead prediction standing at time 1, and so on. Another important note: the predicted value y_hat(2) = f(y_hat(1), x(1)) = f(f(y(0), x(0)), x(1)), so the prediction uses perfect future information x(1) (since you are standing at time 0).
    • When forecast(x, y, step=2) is called, the output has length 2, containing the predicted y over the next 2 steps, i.e. y_hat(10), y_hat(11). Here both y_hat(10) and y_hat(11) are predicted standing at time 9. However, forecast will NOT use any perfect future information about the exogenous input x by default; the default future exogenous inputs are assumed to be zero across the whole prediction horizon. You can provide your own future exogenous input values through the optional argument X_future (call forecast(x, y, step=2, X_future=your_X_future)).
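The recursive mechanics behind forecast can be illustrated with a toy hand-written model. The coefficients and helper below are invented for illustration; the real NARX model uses its fitted base regressor instead:

```python
import numpy as np

def toy_forecast(y_last, step, x_future=None):
    """Recursive multi-step forecast for a toy one-step model
    y(t + 1) = 0.5 * y(t) + 0.1 * x(t).
    Mirrors forecast's default: unknown future x is taken to be zero."""
    preds = []
    y_t = y_last
    for i in range(step):
        x_t = x_future[i] if x_future is not None else 0.0
        y_t = 0.5 * y_t + 0.1 * x_t   # feed the prediction back in
        preds.append(y_t)
    return np.array(preds)

# Standing at the last observed point y = 1.0, forecast 2 steps ahead
# with no future exogenous information (x defaults to zero).
out = toy_forecast(y_last=1.0, step=2)   # [0.5, 0.25]
```

Note how the output length equals step, unlike predict, whose output is aligned with the original series.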


Contributors

danielkentwood, jxx123


firets's Issues

Explanation of training model

Hi developer,

Thanks for developing this package.

May I ask how to inspect the details of the trained model? For example, I use an XGBoost model as the estimator and put it into the NARX model. After that I would like to use SHAP to get the related explanations. Can I do it like this?

BTW, I would like to confirm the meaning of exog_order. For example, with exog_order = np.tile(10, 6), does the 10 represent the lag order of each exogenous variable, and the 6 that there are 6 exogenous variables (matching the 6 columns of X_train)?

Plotting predict results

Hi Jinyu,

When we are plotting the prediction results when using DirectAutoRegressor, should we be only setting the first pred_step elements to be np.nan?

EDIT: I meant to say should we be plotting the results starting from index pred_step.

I was also wondering why you set the first pred_step + max(auto_order-1, max(exog_order+exog_delay)-1) elements to be np.nan.

Thanks!
Daniel

Paper link is broken

Hi,
I want to access the paper mentioned on this repository, but it seems the link is broken.

Regards

ModuleNotFoundError: No module named 'sklearn.metrics.regression'

In File "/root/miniconda3/envs/myconda/lib/python3.8/site-packages/fireTS/models.py", line 3.
from sklearn.metrics.regression import r2_score, mean_squared_error

This raises ModuleNotFoundError: No module named 'sklearn.metrics.regression'.

The environment is: Ubuntu 18.04, Python 3.8.12, scikit-learn 1.0.2, numpy 1.19.5, scipy 1.8.0.

correction to model.NARX.forecast method

Line 135 in the forecast method of models.py has this: deque(X[(-1 - d):(-1 - q):-1, i])

I believe the middle slice is missing a term; it should be deque(X[(-1 - d):(-1 - d - q):-1, i]), otherwise if the exogenous regressor(s) are given a delay that is >= the exog_order, this will return an empty deque and the forecast method will fail.
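The off-by-one can be checked directly with NumPy (illustrative values; i indexes the exogenous column):

```python
import numpy as np

X = np.arange(10).reshape(-1, 1)   # one exogenous column: 0, 1, ..., 9
d, q, i = 2, 2, 0                  # delay >= order triggers the bug

# Original slice: [-3:-3:-1] is empty whenever d >= q - 1 makes the
# endpoints collide or cross.
original = X[(-1 - d):(-1 - q):-1, i]

# Proposed fix: [-3:-5:-1] steps back q values starting d behind the end.
proposed = X[(-1 - d):(-1 - d - q):-1, i]   # [7, 6]
```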

problem when fitting with LSTM

When I am trying to fit an LSTM in the NARX architecture, it shows an error like 'lstm does not have set_parameters'.
Could you please add some example code for fitting an LSTM in your model?

target variable is in design matrix?

Thank you for sharing this code!

Apologies if I'm misunderstanding something, but in the "Basic usage" and "Grid Search" example notebooks, it appears that the CGM column has been defined as the target variable as well as one of the two features in the design matrix. This is true for both the training and test sets. Having CGM as a predictor for CGM will lead to very low error. :)

I would have fixed this myself, but in the "Basic Usage" notebook, the markdown says that the design matrix is supposed to be insulin and meal data as the exogenous inputs. It wasn't obvious which of the columns was supposed to be the meal data, and I couldn't get any clarity by looking at the code for simglucose (which looks awesome, btw).

Online training narx

Hi, the package looks fantastic for predicting time-series data. I was wondering if it is possible to use it for online training.

Combine fireTS library with neupy library for NARX network based on Levenberg Marquardt

Hi.
I want to create a NARX (Nonlinear AutoRegressive with eXogenous inputs) model based on the LM (Levenberg-Marquardt) method.

Since these two methods are not implemented in Keras, I searched for the libraries fireTS (for NARX) and neupy (for LM).

I'm using the sample code for both libraries:

fireTS (NARX): fireTS
neupy (LM): Neupy

and I combine them:

from fireTS.models import NARX
from sklearn.ensemble import RandomForestRegressor
import numpy as np
from neupy import algorithms
from neupy.layers import *

x = np.array([[1, 2], [3, 4]])
y = np.array([[1], [0]])

# y = np.ravel(y)  # just to avoid a warning (the error is the same either way)

network = Input(2) >> Sigmoid(3) >> Sigmoid(1)
optimizer = algorithms.LevenbergMarquardt(network)

mdl1 = NARX(
    optimizer,  # replaced RandomForestRegressor with LevenbergMarquardt
    auto_order=2,
    exog_order=[2, 2],
    exog_delay=[1, 1])

mdl1.fit(x, y)

ypred1 = mdl1.predict(x, y)

ypred1

But I'm having this error in .fit method:

ZeroDivisionError Traceback (most recent call last)

in ()
18 exog_delay=[1, 1])
19
---> 20 mdl1.fit(x, y)
21
22 ypred1 = mdl1.predict(x, y)

/usr/local/lib/python3.7/dist-packages/neupy/utils/iters.py in
count_minibatches(inputs, batch_size)
22
23 def count_minibatches(inputs, batch_size):
---> 24 return int(math.ceil(count_samples(inputs) / batch_size))
25
26

ZeroDivisionError: division by zero

Any solution?
