I had previously checked to see if Polars had an `enumerate()` function (although in this case, I guess it would be `.list.enumerate()`).
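```python
import polars as pl

df = pl.DataFrame({"num": [1, 2], "val": [["a", "c", "e"], ["b", "d", "f"]]})

df.with_columns(
    pl.col("val").list.eval(
        pl.struct(
            # 1-based position within each list; replaces the no-argument
            # pl.cum_count(), which newer Polars releases deprecate
            index=pl.int_range(1, pl.len() + 1),
            value=pl.element(),
        )
    )
)
# shape: (2, 2)
# ┌─────┬─────────────────────────────┐
# │ num ┆ val                         │
# │ --- ┆ ---                         │
# │ i64 ┆ list[struct[2]]             │
# ╞═════╪═════════════════════════════╡
# │ 1   ┆ [{1,"a"}, {2,"c"}, {3,"e"}] │
# │ 2   ┆ [{1,"b"}, {2,"d"}, {3,"f"}] │
# └─────┴─────────────────────────────┘
```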
Just thought I'd mention it, as it could be useful as a general addition instead of special-casing `.explode`.
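In plain Python terms, the `list.eval` snippet above pairs every element of each row's list with its position, much like the builtin `enumerate`. The sketch below models that per-row result with ordinary lists and dicts; `enumerate_row` is a hypothetical helper, not a Polars API, and `start=1` mirrors the 1-based indices in the output shown.

```python
def enumerate_row(values: list, start: int = 0) -> list:
    """Model one row of an enumerated list column as a list of {index, value} dicts."""
    return [{"index": i, "value": v} for i, v in enumerate(values, start=start)]


# Same data as the example DataFrame: one list per row of "val".
rows = [["a", "c", "e"], ["b", "d", "f"]]
enumerated = [enumerate_row(r, start=1) for r in rows]
print(enumerated[0])
# [{'index': 1, 'value': 'a'}, {'index': 2, 'value': 'c'}, {'index': 3, 'value': 'e'}]
```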
@cmdlineluser: True, this is a lot like Python's `enumerate`. Probably also a more recognisable / discoverable name than "with ordinality", haha.
Thank you very much for the code snippet and the suggestion; I'm definitely on board with something more general and composable rather than special cases for only certain functions.
You inspired me to do a basic implementation!
In order to get a list of structs (rather than a struct of lists), `list.eval` also does the trick. That way both `enumerate`s return a struct.
Not really sold on which is better as a default (list of structs or struct of lists), though; will give it some more thought.
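Both candidate defaults carry the same information and differ only in nesting, so converting between them is a transpose. As a plain-Python illustration (dicts stand in for Polars structs; `to_list_of_structs` and `to_struct_of_lists` are hypothetical helpers, not Polars APIs), here is one row of the example data in each shape:

```python
def to_list_of_structs(struct_of_lists: dict) -> list:
    """{"index": [0, 1], "val": ["a", "c"]} -> [{"index": 0, "val": "a"}, ...]."""
    keys = list(struct_of_lists)
    lengths = {len(v) for v in struct_of_lists.values()}
    if len(lengths) > 1:
        raise ValueError("all fields must have the same length")
    # zip(*...) walks the field lists in lockstep, one element per output dict
    return [dict(zip(keys, vals)) for vals in zip(*struct_of_lists.values())]


def to_struct_of_lists(list_of_structs: list) -> dict:
    """[{"index": 0, "val": "a"}, ...] -> {"index": [0, ...], "val": ["a", ...]}."""
    if not list_of_structs:
        return {}
    keys = list(list_of_structs[0])
    return {k: [row[k] for row in list_of_structs] for k in keys}


# One row of the "val" column in the struct-of-lists shape.
struct_of_lists = {"index": [0, 1, 2], "val": ["a", "c", "e"]}
list_of_structs = to_list_of_structs(struct_of_lists)
print(list_of_structs)
# [{'index': 0, 'val': 'a'}, {'index': 1, 'val': 'c'}, {'index': 2, 'val': 'e'}]
print(to_struct_of_lists(list_of_structs) == struct_of_lists)
# True
```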
```python
import polars as pl


# Named expr_enumerate so the module doesn't shadow the builtin enumerate.
def expr_enumerate(self: pl.Expr, name: str = "index") -> pl.Expr:
    """Pair each value with its 0-based row position.

    Args:
        name (str, optional): Name of the index field. Defaults to "index".
    """
    return pl.struct(
        pl.int_range(pl.len()).alias(name),  # pl.len() replaces the deprecated pl.count()
        self,
    ).alias(self.meta.output_name())


def list_enumerate(self: pl.Expr, name: str = "index") -> pl.Expr:
    """Pair each list with a list of its 0-based positions (a struct of lists).

    Args:
        name (str, optional): Name of the index field. Defaults to "index".
    """
    return pl.struct(
        pl.int_ranges(0, self.list.len()).alias(name),
        self,
    ).alias(self.meta.output_name())


pl.Expr.enumerate = expr_enumerate
# Can't figure out how to monkey patch this onto the list namespace, but not the point
pl.Expr.list_enumerate = list_enumerate

df = pl.DataFrame({"num": [1, 2], "val": [["a", "c", "e"], ["b", "d", "f"]]})
print(
    df.select(
        "num",
        pl.col("val").enumerate().alias("plain_enumerate"),
        pl.col("val").list_enumerate().alias("list_enumerate"),
        pl.col("val").list.eval(pl.element().enumerate()).alias("eval_enumerate"),
    )
    # then to get the data into a completely flat format, do one of these
    # .unnest("list_enumerate").explode("index", "val")
    # the "val" col name is lost in the "eval_enumerate" because of `pl.element()` - will open an issue
    # .explode("eval_enumerate").unnest("eval_enumerate")
)
# shape: (2, 4)
# ┌─────┬─────────────────────┬─────────────────────────────┬─────────────────────────────┐
# │ num ┆ plain_enumerate     ┆ list_enumerate              ┆ eval_enumerate              │
# │ --- ┆ ---                 ┆ ---                         ┆ ---                         │
# │ i64 ┆ struct[2]           ┆ struct[2]                   ┆ list[struct[2]]             │
# ╞═════╪═════════════════════╪═════════════════════════════╪═════════════════════════════╡
# │ 1   ┆ {0,["a", "c", "e"]} ┆ {[0, 1, 2],["a", "c", "e"]} ┆ [{0,"a"}, {1,"c"}, {2,"e"}] │
# │ 2   ┆ {1,["b", "d", "f"]} ┆ {[0, 1, 2],["b", "d", "f"]} ┆ [{0,"b"}, {1,"d"}, {2,"f"}] │
# └─────┴─────────────────────┴─────────────────────────────┴─────────────────────────────┘
```
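For reference, the fully flat shape targeted by the commented-out `.unnest(...).explode(...)` and `.explode(...).unnest(...)` chains has one record per list element. The sketch below models that target shape in plain Python rather than Polars; it illustrates the result, not how Polars computes it.

```python
# Same data as the example DataFrame: one "num" per row, one list per "val".
data = {"num": [1, 2], "val": [["a", "c", "e"], ["b", "d", "f"]]}

# One output record per list element, carrying the row's "num" and the
# element's 0-based position within its own list (like list_enumerate).
flat = [
    {"num": num, "index": i, "val": v}
    for num, values in zip(data["num"], data["val"])
    for i, v in enumerate(values)
]
for record in flat:
    print(record)
```

The index restarts at 0 for every row, which is what enumerating inside the list buys over calling `with_row_index` after an `explode` (that would number all six records 0 to 5).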
Giving what I wrote earlier some more thought:
- like `df.with_row_index` and Python's builtin `enumerate`, it would be worthwhile adding an `offset` parameter to start at a number other than 0
- the plain `enumerate` may not really seem super useful on its own, but does offer good utility when applied inside `list.eval`
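An `offset` would presumably only need to shift the bounds of the generated range, e.g. `pl.int_ranges(offset, offset + self.list.len())` in `list_enumerate` instead of starting at 0. The plain-Python sketch below models that arithmetic; `offset`, `index_range` and `enumerate_list` are hypothetical names, and the builtin's equivalent parameter is called `start`.

```python
def index_range(length: int, offset: int = 0) -> range:
    """Indices for a list of `length` elements, starting at `offset`.

    Same bounds one would pass to an int range: start = offset,
    end = offset + length (end exclusive).
    """
    return range(offset, offset + length)


def enumerate_list(values: list, offset: int = 0) -> list:
    """Pair each element with its index; `offset` is a hypothetical parameter."""
    return list(zip(index_range(len(values), offset), values))


# Must agree with the builtin, whose equivalent parameter is called `start`.
vals = ["a", "c", "e"]
assert enumerate_list(vals, offset=1) == list(enumerate(vals, start=1))
print(enumerate_list(vals, offset=10))
# [(10, 'a'), (11, 'c'), (12, 'e')]
```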