Comments (5)
Hi @Hussam1,
Forecasting with missing values is always a challenge. How to solve it depends a lot on the business case. Based on what you are explaining, it may make sense to propagate the value of the last business day.
You may also benefit from the weighted time series forecasting feature that skforecast offers.
https://www.cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values.html
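As a rough illustration of both suggestions, the sketch below forward-fills the non-trading days and builds a weight function that gives the filled rows zero weight. The data, the `imputed` mask and the `weight_func` name are made up for this example; it assumes the forecaster accepts a callable that maps an index to an array of weights, as described in the linked guide.

```python
import pandas as pd

# Hypothetical values observed on business days only (no weekend rows).
idx = pd.bdate_range("2023-01-02", "2023-01-13")  # Mon 2 Jan to Fri 13 Jan
y = pd.Series(range(100, 100 + len(idx)), index=idx, dtype=float)

# Reindex to calendar days; the weekend rows appear as NaN.
y_daily = y.asfreq("D")
imputed = y_daily.isna()  # remember which rows are about to be filled

# Propagate the last observed business-day value into the gaps.
y_filled = y_daily.ffill()


def weight_func(index):
    """Return weight 0 for imputed dates and 1 for observed ones (illustrative)."""
    observed = ~imputed.reindex(index, fill_value=False)
    return observed.astype(float).to_numpy()


print(y_filled)
print(weight_func(y_filled.index))
```

With weights like these, the filled rows keep the index regular without influencing the fit, provided the underlying regressor supports sample weights.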
from skforecast.
Hello @Hussam1,
Yes, you are right. One of the main limitations of an autoregressive model is that the series cannot be incomplete. Since the prediction for t+1 depends on its past values (lags), it does not make sense for the gap between lags to differ from one prediction to the next.
Along with @JoaquinAmatRodrigo's solutions, I think another one can be tried:
- Did you try your idea of hist = hist.asfreq("B")? Since the series then has a freq, the error should be gone. As you mention, this only makes sense for a series that stops on Friday and starts again on Monday. Caveat: the values on business days cannot be NaN.
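A small sketch of what asfreq("B") does and does not solve, using made-up data with one hypothetical market holiday (the dates and values are assumptions for illustration only). The index gets a business-day frequency, but the holiday shows up as a NaN row that still has to be handled.

```python
import pandas as pd

# Hypothetical trading days: business days minus one market holiday
# (Monday 16 Jan 2023 is used here purely as an example).
days = pd.bdate_range("2023-01-09", "2023-01-20").drop(pd.Timestamp("2023-01-16"))
hist = pd.Series(range(len(days)), index=days, dtype=float)
print(hist.index.freq)  # no frequency: the holiday breaks the regular spacing

# Conform the index to business days; the missing holiday is added as NaN.
hist_b = hist.asfreq("B")
print(hist_b.index.freq)
print(hist_b[hist_b.isna()])  # the rows that still need a value
```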
Thank you very much @JoaquinAmatRodrigo @JavierEscobarOrtiz for your answers and suggestions. I agree that for a company's sales (which is my real use case), propagating the last business day's sales might be a good option, in addition to resampling to a coarser frequency such as weekly or monthly.
I appreciate the efforts you put in this library, it is super helpful!
Hello @Hussam1,
This line of code is causing the problem: data = hist.dropna(). Dropping NaNs creates gaps in the series, and the series then loses its frequency. Check this issue.
Check how freq disappears (and the length is reduced):
import yfinance as yf
import datetime as dt
spxl = yf.Ticker("SPXL")
hist = spxl.history(start="2015-01-01")
hist = hist.asfreq("D")
print(hist.index)
DatetimeIndex(['2015-01-02 00:00:00-05:00', '2015-01-03 00:00:00-05:00',
'2015-01-04 00:00:00-05:00', '2015-01-05 00:00:00-05:00',
'2015-01-06 00:00:00-05:00', '2015-01-07 00:00:00-05:00',
'2015-01-08 00:00:00-05:00', '2015-01-09 00:00:00-05:00',
'2015-01-10 00:00:00-05:00', '2015-01-11 00:00:00-05:00',
...
'2023-01-04 00:00:00-05:00', '2023-01-05 00:00:00-05:00',
'2023-01-06 00:00:00-05:00', '2023-01-07 00:00:00-05:00',
'2023-01-08 00:00:00-05:00', '2023-01-09 00:00:00-05:00',
'2023-01-10 00:00:00-05:00', '2023-01-11 00:00:00-05:00',
'2023-01-12 00:00:00-05:00', '2023-01-13 00:00:00-05:00'],
dtype='datetime64[ns, America/New_York]', name='Date', length=2934, freq='D')
data = hist.dropna()
print(data.index)
DatetimeIndex(['2015-01-02 00:00:00-05:00', '2015-01-05 00:00:00-05:00',
'2015-01-06 00:00:00-05:00', '2015-01-07 00:00:00-05:00',
'2015-01-08 00:00:00-05:00', '2015-01-09 00:00:00-05:00',
'2015-01-12 00:00:00-05:00', '2015-01-13 00:00:00-05:00',
'2015-01-14 00:00:00-05:00', '2015-01-15 00:00:00-05:00',
...
'2022-12-30 00:00:00-05:00', '2023-01-03 00:00:00-05:00',
'2023-01-04 00:00:00-05:00', '2023-01-05 00:00:00-05:00',
'2023-01-06 00:00:00-05:00', '2023-01-09 00:00:00-05:00',
'2023-01-10 00:00:00-05:00', '2023-01-11 00:00:00-05:00',
'2023-01-12 00:00:00-05:00', '2023-01-13 00:00:00-05:00'],
dtype='datetime64[ns, America/New_York]', name='Date', length=2023, freq=None)
Thanks @JavierEscobarOrtiz for the response. The problem is that filling the "gap" is sometimes not optimal or correct from a business perspective. In this situation, for instance, trading happens only on business days, and assuming any values for the "gap" days would distort the analysis.
Even if we take hist = hist.asfreq("B"), we will still have the gap. Or do you mean it is OK to leave them as null as long as there are no gaps in the DatetimeIndex?
Edit:
I tested your answer. Yes, filling the gap with any figures makes the error go away, but how would you handle having to fill the gap when it does not make sense business-wise? For instance, what if a company does not have sales every day and you still need to model daily sales? Of course, you can always resample to a coarser frequency, but don't you think this is a limitation of the library?
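For the sales case, a minimal sketch of the two options mentioned in this thread, on made-up data: aggregating to weekly totals, or keeping the daily grain and treating days without a record as zero sales. The second option is only valid under the assumption that a missing day really means no sales rather than missing data.

```python
import pandas as pd

# Hypothetical sales, recorded only on days when something was sold.
sales = pd.Series(
    [120.0, 80.0, 200.0, 50.0, 95.0],
    index=pd.to_datetime(
        ["2023-01-02", "2023-01-05", "2023-01-11", "2023-01-17", "2023-01-20"]
    ),
)

# Option 1: aggregate to weekly totals (weeks ending on Sunday).
weekly = sales.resample("W").sum()
print(weekly)

# Option 2: keep daily data and treat unrecorded days as zero sales.
# Assumption: no record means no sales, not an unknown value.
daily = sales.asfreq("D", fill_value=0.0)
print(daily.index.freq, len(daily))
```

Both results have a regular index with a freq and no NaN values, which is what the forecaster needs.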