Comments (8)
Maybe something like this (not tested) will solve it?
Yes, @yarnabrina, I think your test is the correct expectation. But it seems to me that it would be violated by many current forecasters, as @Vasudeva-bit has observed by accident.
Also, in case it is not the current behaviour, we need to run it through a deprecation period to move it to the one that seems more logical - people might have adapted to it downstream, and we do not want to break their code.
from sktime.
Could we return forecasts as
pd.DataFrame
always, irrespective of whether it is a univariate or multi-variate forecaster?
That is an interface decision we have already discussed a number of times, but then the question becomes, what do we do with other data containers, such as dask
or np.ndarray
- coerce to pd.DataFrame
as well? If not, then only pd.Series
gets special treatment here...
Not saying that it would be a bad choice, but it would be an exception from the "support multiple formats" that we currently have.
from sktime.
Having said that, the issue you highlighted originally seems to be an inconsistency we may like to fix - ahtough it requires deprecation.
https://www.sktime.net/en/stable/developer_guide/deprecation.html
Could you perhaps open a PR which has only your originally proposed fix (changes to test and arch), so we can use this as a starting point?
I would use that to look into the problem (domino effects on other estimators) and build a deprecation strategy around it.
from sktime.
I opened a PR #5384 with originally proposed fix.
from sktime.
I think this is related to #4709 and #4780.
@Vasudeva-bit I think the issue is y_train
may not always have a name
attribute that is non-empty. May be something like this (not tested) will solve it?
if isinstance(y_train, pd.Series) and y_train.name:
assert (pred_cols == y_train.name).all()
elif isinstance(y_train, pd.Series): # automatically implies empty name
assert (pred_cols == pd.Index([0])).all()
else:
assert (pred_cols == y_train.columns).all()
from sktime.
Hm, I think not as closely as one might suspect.
The two PRs and related design problems were restricted to interval and quantile forecasts.
I think it is more related to "should we use name
as col
or not if a conversion to a DataFrame
-like type, from pd.Series`, happens", like the journey of suffering in these PR:
#3290
#3297
#3322
#3323
#4150
#4155
#4157
The root of evil is really that depending on how you convert or coerce pd.Series
to pd.DataFrame
, one of two things might happen: the name
appears now as the single column name, or it gets purged and instead the integer 0
appears as single column name. It is one of the idiosyncrasies of pandas
, and the "how" matters.
Unfortunately this also needs to be consistent in many places, conversions, back-conversions, etc. While the original problem - I hope - has been resolved, it seems to have reappeared with the skpro
distribution output, which is strongly pd.DataFrame
inspired, and has similar index
and columns
attributes.
from sktime.
@Vasudeva-bit I think the issue is
y_train
may not always have aname
attribute that is non-empty. May be something like this (not tested) will solve it?
A few tests are expecting name to be preserved even if it is None
, here. Thats why I didn't handle the empty case separately. But as suggested, handling empty name None
separately seems reasonable.
Could we return forecasts as pd.DataFrame
always, irrespective of whether it is a univariate or multi-variate forecaster?
This way we can test for only one single type. And, while returning the predicted forecasts from _base.py
, we could convert pd.DataFrame
to pd.Series
for univariate forecasters. When a pd.DataFrame
is converted to pd.Series
, the name of pd.Series
would be same as pd.DataFrame
's name (attribute name
, not columns
).
from sktime.
Oh, I get it. It's better not to treat them specially.
from sktime.
Related Issues (20)
- [ENH] improvements to `BaggingForecaster`
- [MNT] Test failures on `main` due to `deep_equals` failing on `sklearn` `RandomForestRegressor.__len__` HOT 23
- [BUG] `TSCGridSearchCV.score` - `test_multioutput` fails as `predict` returned NaN values HOT 1
- Custom Cross-Validation Parameters in ForecastingGridSearchCV Cannot Be Tuned via param_grid HOT 6
- [ENH] behaviour of bootstrappers and series-to-panel transformers on panels
- [BUG] Global Forecasting with `ReconcilerForecaster` and `LightGBMRegressor `errors/help. HOT 4
- [ENH] convenient user-facing tag for native scitypes supported by forecasters or transformers HOT 1
- [BUG] `FeatureUnion` raises error with transformers with primitive output HOT 3
- [ENH] Adding a `test_mstl` module primarily to check if `transform` returns the correct components HOT 1
- [ENH] Update `MSTL` import in `sktime.transformations.series.detrend` HOT 7
- [BUG] make_reduction returns NaN on specific time series when using pooling = "local"' HOT 4
- [Question] Cudf Support for forecasting HOT 1
- [ENH] umbrella issue - support for data container types
- [ENH] Address Dask Dataframe Deprecation HOT 2
- [BUG] Tests failing on main after recent `tensorflow` and `mlflow` releases HOT 6
- [BUG] Exchange optimizers for Deep Learning Classifiers HOT 3
- [ENH] move and consolidate model serialization format file specification to `sktime` file specifications docs HOT 2
- [BUG] `estimator_overview_table.md` generated outside of source folder in local build of documentation HOT 3
- [ENH] Enhancement for building documentation locally HOT 2
- [ENH] umbrella issue - categorical feature support HOT 72
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sktime.