Describe the bug All the tests in <code class="not

Could we return forecasts as pd.DataFrame</code

I opened a PR <a class="issue-link js-issue-link" data-error-text="Failed to load titl

I think this is related to <a class="issue-link js-issue-link" data-error-text="Failed

I think this is related to <a class="issue-link js-issue-link" data-error

<a class="user-mention notranslate" data-hovercard-type="user" data-hover

[BUG] `test_predict_proba` is expecting column name of `pred_dist` to be `0` always. about sktime HOT 8 CLOSED

Vasudeva-bit commented on June 2, 2024

[BUG] `test_predict_proba` is expecting column name of `pred_dist` to be `0` always.

from sktime.

Comments (8)

fkiraly commented on June 2, 2024 1

Maybe something like this (not tested) will solve it?

Yes, @yarnabrina, I think your test is the correct expectation. But it seems to me that it would be violated by many current forecasters, as @Vasudeva-bit has observed by accident.

Also, in case it is not the current behaviour, we need to run it through a deprecation period to move it to the one that seems more logical - people might have adapted to it downstream, and we do not want to break their code.

from sktime.

fkiraly commented on June 2, 2024 1

Could we return forecasts as pd.DataFrame always, irrespective of whether it is a univariate or multi-variate forecaster?

That is an interface decision we have already discussed a number of times, but then the question becomes, what do we do with other data containers, such as dask or np.ndarray - coerce to pd.DataFrame as well? If not, then only pd.Series gets special treatment here...

Not saying that it would be a bad choice, but it would be an exception from the "support multiple formats" that we currently have.

from sktime.

fkiraly commented on June 2, 2024 1

Having said that, the issue you highlighted originally seems to be an inconsistency we may like to fix - ahtough it requires deprecation.
https://www.sktime.net/en/stable/developer_guide/deprecation.html

Could you perhaps open a PR which has only your originally proposed fix (changes to test and arch), so we can use this as a starting point?

I would use that to look into the problem (domino effects on other estimators) and build a deprecation strategy around it.

from sktime.

Vasudeva-bit commented on June 2, 2024 1

I opened a PR #5384 with originally proposed fix.

from sktime.

yarnabrina commented on June 2, 2024

I think this is related to #4709 and #4780.

@Vasudeva-bit I think the issue is y_train may not always have a name attribute that is non-empty. May be something like this (not tested) will solve it?

if isinstance(y_train, pd.Series) and y_train.name:
    assert (pred_cols == y_train.name).all()
elif isinstance(y_train, pd.Series):  # automatically implies empty name
    assert (pred_cols == pd.Index([0])).all()
else:
    assert (pred_cols == y_train.columns).all()

from sktime.

fkiraly commented on June 2, 2024

I think this is related to #4709 and #4780.

Hm, I think not as closely as one might suspect.

The two PRs and related design problems were restricted to interval and quantile forecasts.

I think it is more related to "should we use name as col or not if a conversion to a DataFrame-like type, from pd.Series`, happens", like the journey of suffering in these PR:
#3290
#3297
#3322
#3323
#4150
#4155
#4157

The root of evil is really that depending on how you convert or coerce pd.Series to pd.DataFrame, one of two things might happen: the name appears now as the single column name, or it gets purged and instead the integer 0 appears as single column name. It is one of the idiosyncrasies of pandas, and the "how" matters.

Unfortunately this also needs to be consistent in many places, conversions, back-conversions, etc. While the original problem - I hope - has been resolved, it seems to have reappeared with the skpro distribution output, which is strongly pd.DataFrame inspired, and has similar index and columns attributes.

from sktime.

Vasudeva-bit commented on June 2, 2024

@Vasudeva-bit I think the issue is y_train may not always have a name attribute that is non-empty. May be something like this (not tested) will solve it?

A few tests are expecting name to be preserved even if it is None, here. Thats why I didn't handle the empty case separately. But as suggested, handling empty name None separately seems reasonable.

Could we return forecasts as pd.DataFrame always, irrespective of whether it is a univariate or multi-variate forecaster?

This way we can test for only one single type. And, while returning the predicted forecasts from _base.py, we could convert pd.DataFrame to pd.Series for univariate forecasters. When a pd.DataFrame is converted to pd.Series, the name of pd.Series would be same as pd.DataFrame's name (attribute name, not columns).

from sktime.

Vasudeva-bit commented on June 2, 2024

Oh, I get it. It's better not to treat them specially.

from sktime.

[BUG] `test_predict_proba` is expecting column name of `pred_dist` to be `0` always. about sktime HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent