
Comments (5)

JoaquinAmatRodrigo commented on May 18, 2024

Hi @Hussam1,

Forecasting with missing values is always a challenge. How to solve it depends a lot on the business case. Based on what you are explaining, it may make sense to propagate the value of the last business day.

You may also benefit from the weighted time series forecasting feature that skforecast offers.

https://www.cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values.html

https://joaquinamatrodrigo.github.io/skforecast/0.6.0/faq/forecasting-time-series-with-missing-values.html
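To make the weighting idea concrete, here is a minimal sketch of a weight function that assigns weight 0 to weekend dates and 1 to everything else. It uses only pandas and NumPy. The name `custom_weights` and the weekend rule are assumptions for illustration; as far as I understand the linked FAQ, a function of this shape is what gets passed to the forecaster's `weight_func` argument, but check the documentation for your skforecast version.

```python
import numpy as np
import pandas as pd


def custom_weights(index: pd.DatetimeIndex) -> np.ndarray:
    """Return 0 for Saturdays/Sundays and 1 for all other dates.

    Hypothetical helper: the weekend rule stands in for whatever
    dates were imputed in your own series.
    """
    # dayofweek: Monday=0 ... Sunday=6
    return np.where(index.dayofweek >= 5, 0, 1)


# Thursday 2023-01-05 through Tuesday 2023-01-10
index = pd.date_range("2023-01-05", "2023-01-10", freq="D")
weights = custom_weights(index)
print(weights)
```

Observations with weight 0 would then be ignored when the regressor is fitted, so the imputed values keep the index regular without influencing training.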

from skforecast.

JavierEscobarOrtiz commented on May 18, 2024

Hello @Hussam1,

Yes, you are right. One of the main limitations of an autoregressive model is that the series cannot be incomplete. Since the prediction for t+1 depends on its past values (lags), it does not make sense for the spacing between lags to differ from one prediction to the next.

Along with @JoaquinAmatRodrigo's solutions, I think another one can be tried:

  • Did you try your idea of hist = hist.asfreq("B")? Once the index has a freq, the error should be gone. As you mention, this only makes sense for a series that stops on Friday and starts again on Monday. Disclaimer: the values on business days cannot be NaN.
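A minimal sketch of that suggestion, on made-up data rather than the SPXL history: a business-day series with one missing holiday is reindexed with asfreq("B"), and the remaining NaN is filled by propagating the last business day's value. The series name, dates and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical trading data: two weeks of business days with one holiday removed
idx = pd.bdate_range("2023-01-02", "2023-01-13")
idx = idx.drop(pd.Timestamp("2023-01-04"))  # pretend the market was closed
close = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Close")

# The index is irregular, so it carries no frequency
print(close.index.freq)

# Reindex to business-day frequency, then carry the last known value forward
filled = close.asfreq("B").ffill()
print(filled.index.freq)
print(filled.isna().sum())
```

Whether forward filling is appropriate depends on the business case; for prices it repeats the last close, which may or may not be what you want.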


Hussam1 commented on May 18, 2024

Thank you very much @JoaquinAmatRodrigo @JavierEscobarOrtiz for your answers and suggestions. I agree that for a company's sales business case (which is my real case), propagating the last business day's sales might be a good option, in addition to resampling to a lower frequency such as weekly or monthly.
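For the resampling option, a rough sketch with made-up numbers: daily sales recorded only on business days are aggregated to weekly totals, which gives a regular index without filling any individual day. The series name and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales: 1 unit on each business day over four weeks
idx = pd.bdate_range("2023-01-02", "2023-01-27")
sales = pd.Series(np.ones(len(idx)), index=idx, name="sales")

# Aggregate to weekly totals (weeks ending on Sunday by default)
weekly = sales.resample("W").sum()
print(weekly)
print(weekly.index.freq)
```

The trade-off is that the forecast is then weekly as well, so this only helps when daily granularity is not required.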

I appreciate the efforts you put in this library, it is super helpful!


JavierEscobarOrtiz commented on May 18, 2024

Hello @Hussam1,

The problem is caused by this line of code: data = hist.dropna(). Dropping the NaNs creates gaps in the series, so the index loses its frequency. Check this issue.

Check how freq disappears (and the length is reduced):

import yfinance as yf

# Download the daily price history for SPXL
spxl = yf.Ticker("SPXL")
hist = spxl.history(start="2015-01-01")

# Reindex to calendar-day frequency; non-trading days become NaN rows
hist = hist.asfreq("D")
print(hist.index)

DatetimeIndex(['2015-01-02 00:00:00-05:00', '2015-01-03 00:00:00-05:00',
'2015-01-04 00:00:00-05:00', '2015-01-05 00:00:00-05:00',
'2015-01-06 00:00:00-05:00', '2015-01-07 00:00:00-05:00',
'2015-01-08 00:00:00-05:00', '2015-01-09 00:00:00-05:00',
'2015-01-10 00:00:00-05:00', '2015-01-11 00:00:00-05:00',
...
'2023-01-04 00:00:00-05:00', '2023-01-05 00:00:00-05:00',
'2023-01-06 00:00:00-05:00', '2023-01-07 00:00:00-05:00',
'2023-01-08 00:00:00-05:00', '2023-01-09 00:00:00-05:00',
'2023-01-10 00:00:00-05:00', '2023-01-11 00:00:00-05:00',
'2023-01-12 00:00:00-05:00', '2023-01-13 00:00:00-05:00'],
dtype='datetime64[ns, America/New_York]', name='Date', length=2934, freq='D')

# Dropping the NaN rows removes the non-trading dates, so the index is no longer regular
data = hist.dropna()
print(data.index)

DatetimeIndex(['2015-01-02 00:00:00-05:00', '2015-01-05 00:00:00-05:00',
'2015-01-06 00:00:00-05:00', '2015-01-07 00:00:00-05:00',
'2015-01-08 00:00:00-05:00', '2015-01-09 00:00:00-05:00',
'2015-01-12 00:00:00-05:00', '2015-01-13 00:00:00-05:00',
'2015-01-14 00:00:00-05:00', '2015-01-15 00:00:00-05:00',
...
'2022-12-30 00:00:00-05:00', '2023-01-03 00:00:00-05:00',
'2023-01-04 00:00:00-05:00', '2023-01-05 00:00:00-05:00',
'2023-01-06 00:00:00-05:00', '2023-01-09 00:00:00-05:00',
'2023-01-10 00:00:00-05:00', '2023-01-11 00:00:00-05:00',
'2023-01-12 00:00:00-05:00', '2023-01-13 00:00:00-05:00'],
dtype='datetime64[ns, America/New_York]', name='Date', length=2023, freq=None)


Hussam1 commented on May 18, 2024

Thanks @JavierEscobarOrtiz for the response. The problem is that filling the "gap" is sometimes not optimal or correct from a business perspective. In this situation, for instance, trading happens only on business days, and assuming any results for the "gap" days would distort the purpose.

Even if we take hist = hist.asfreq("B") we will still have the gap. Or do you mean it is OK to leave the values as null as long as there are no gaps in the DatetimeIndex?

Edit:

I tested your answer. Yes, filling the gap with any values makes the error go away. But how would you handle a gap that makes no business sense to fill, for instance a company that doesn't have sales every day when you still need to model daily sales? Of course you can always resample to a lower frequency, but don't you think this is a limitation of the library?

