Giter VIP home page Giter VIP logo

Comments (8)

fkiraly avatar fkiraly commented on June 2, 2024 1

Maybe something like this (not tested) will solve it?

Yes, @yarnabrina, I think your test is the correct expectation. But it seems to me that it would be violated by many current forecasters, as @Vasudeva-bit has observed by accident.

Also, in case it is not the current behaviour, we need to run it through a deprecation period to move it to the one that seems more logical - people might have adapted to it downstream, and we do not want to break their code.

from sktime.

fkiraly avatar fkiraly commented on June 2, 2024 1

Could we return forecasts as pd.DataFrame always, irrespective of whether it is a univariate or multi-variate forecaster?

That is an interface decision we have already discussed a number of times, but then the question becomes, what do we do with other data containers, such as dask or np.ndarray - coerce to pd.DataFrame as well? If not, then only pd.Series gets special treatment here...

Not saying that it would be a bad choice, but it would be an exception from the "support multiple formats" that we currently have.

from sktime.

fkiraly avatar fkiraly commented on June 2, 2024 1

Having said that, the issue you highlighted originally seems to be an inconsistency we may like to fix - ahtough it requires deprecation.
https://www.sktime.net/en/stable/developer_guide/deprecation.html

Could you perhaps open a PR which has only your originally proposed fix (changes to test and arch), so we can use this as a starting point?

I would use that to look into the problem (domino effects on other estimators) and build a deprecation strategy around it.

from sktime.

Vasudeva-bit avatar Vasudeva-bit commented on June 2, 2024 1

I opened a PR #5384 with originally proposed fix.

from sktime.

yarnabrina avatar yarnabrina commented on June 2, 2024

I think this is related to #4709 and #4780.

@Vasudeva-bit I think the issue is y_train may not always have a name attribute that is non-empty. May be something like this (not tested) will solve it?

if isinstance(y_train, pd.Series) and y_train.name:
    assert (pred_cols == y_train.name).all()
elif isinstance(y_train, pd.Series):  # automatically implies empty name
    assert (pred_cols == pd.Index([0])).all()
else:
    assert (pred_cols == y_train.columns).all()

from sktime.

fkiraly avatar fkiraly commented on June 2, 2024

I think this is related to #4709 and #4780.

Hm, I think not as closely as one might suspect.

The two PRs and related design problems were restricted to interval and quantile forecasts.

I think it is more related to "should we use name as col or not if a conversion to a DataFrame-like type, from pd.Series`, happens", like the journey of suffering in these PR:
#3290
#3297
#3322
#3323
#4150
#4155
#4157

The root of evil is really that depending on how you convert or coerce pd.Series to pd.DataFrame, one of two things might happen: the name appears now as the single column name, or it gets purged and instead the integer 0 appears as single column name. It is one of the idiosyncrasies of pandas, and the "how" matters.

Unfortunately this also needs to be consistent in many places, conversions, back-conversions, etc. While the original problem - I hope - has been resolved, it seems to have reappeared with the skpro distribution output, which is strongly pd.DataFrame inspired, and has similar index and columns attributes.

from sktime.

Vasudeva-bit avatar Vasudeva-bit commented on June 2, 2024

@Vasudeva-bit I think the issue is y_train may not always have a name attribute that is non-empty. May be something like this (not tested) will solve it?

A few tests are expecting name to be preserved even if it is None, here. Thats why I didn't handle the empty case separately. But as suggested, handling empty name None separately seems reasonable.

Could we return forecasts as pd.DataFrame always, irrespective of whether it is a univariate or multi-variate forecaster?

This way we can test for only one single type. And, while returning the predicted forecasts from _base.py, we could convert pd.DataFrame to pd.Series for univariate forecasters. When a pd.DataFrame is converted to pd.Series, the name of pd.Series would be same as pd.DataFrame's name (attribute name, not columns).

from sktime.

Vasudeva-bit avatar Vasudeva-bit commented on June 2, 2024

Oh, I get it. It's better not to treat them specially.

from sktime.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.