Comments (7)
I can see a use case, though, where you would want to perform operations first and then freeze the categories by casting from a Categorical to an `Enum`.
from polars.
It would be even nicer if the bins were correctly ordered when passed to `Enum`.
Yes, I agree with @markxwang here. Enum is an ordered datatype; for `cut` I would much rather return an `Enum` directly than a categorical. See #13038.
@c-peters I see you self-assigned. I have a working implementation; if you haven't started already, should I just submit a PR?
After some more thought: we could allow this on Series (materialized data), but not on expressions. The categories are part of our `Enum` datatype and are required for reasoning about the schema (correctness). If our Enum categories are dynamic, then we cannot reason up front about whether our query is correct, which is an important property of our LogicalPlan.
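To make the schema argument concrete, here is a minimal toy sketch in plain Python. It is not Polars internals: `EnumType` and `resolve_cast_dtype` are hypothetical names invented for illustration, and the only assumption is the one stated above, namely that the categories are part of the datatype.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EnumType:
    """Hypothetical stand-in for an Enum dtype: the categories are part of the type."""

    categories: Tuple[str, ...]


def resolve_cast_dtype(categories: Optional[Tuple[str, ...]]) -> EnumType:
    """Resolve the output dtype of a cast-to-Enum at planning time.

    `categories` is None when they would only be known after reading the data.
    """
    if categories is None:
        raise TypeError("Enum categories must be known before the query runs")
    return EnumType(categories)


# Static categories: the output schema is known without touching any data.
static_schema = {"size": resolve_cast_dtype(("small", "medium", "large"))}

# Dynamic categories: the planner cannot produce a schema, so it fails early.
try:
    resolve_cast_dtype(None)
    planning_failed = False
except TypeError:
    planning_failed = True
```

With materialized data the categories can simply be read off the values, which is why the restriction only matters in the expression (lazy) context.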
@c-peters I agree with you, and now I'm even hesitant to add this for `Series`, since the only reason is to work around existing API issues where a categorical is returned instead of an enum.
Also, if we add an `Enum` as a return dtype to (for example) `pl.cut`, then we are again going to have a really hard time (read: impossible) propagating the schema in an expression context. What if someone uses `pl.cut` in a when/then and provides alternative values to what `cut` returns? Let's keep it as categorical and reserve enums exclusively for static categoricals.
Do you agree here? I vote we just close, and I'll cancel my PR.
For `pl.cut` we do know the categories, as they are the `labels` provided by the user or a derivative of the `breaks` parameter. So the schema would still be known after the operation.
But I agree, let's keep Enums simple for now and use them exclusively for statics.
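The point about `cut` can be sketched in plain Python: the full category list follows from `breaks` (or `labels`) alone, before any data is seen. `cut_categories` is a hypothetical helper, not a Polars function, and the interval label format is an assumption chosen for illustration.

```python
from typing import List, Optional, Sequence


def cut_categories(
    breaks: Sequence[float], labels: Optional[Sequence[str]] = None
) -> List[str]:
    """Derive the ordered bin categories from the arguments only (no data needed)."""
    n_bins = len(breaks) + 1  # n break points split the number line into n + 1 bins
    if labels is not None:
        if len(labels) != n_bins:
            raise ValueError(f"expected {n_bins} labels, got {len(labels)}")
        return list(labels)
    # Assumed label format: right-closed intervals, padded with -inf and inf.
    edges = ["-inf", *map(str, breaks), "inf"]
    return [f"({lo}, {hi}]" for lo, hi in zip(edges, edges[1:])]


derived = cut_categories([1, 5])
custom = cut_categories([1, 5], labels=["low", "mid", "high"])
```

Because the list comes out in bin order, it would also give an ordering for the categories, which is what the earlier comment about correctly ordered bins asks for.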