
Comments (3)

cmdlineluser avatar cmdlineluser commented on September 24, 2024

I had previously checked to see if Polars had an enumerate() function.

(Although in this case, I guess it would be .list.enumerate())

df.with_columns(
   pl.col("val").list.eval(
      pl.struct(
         index = pl.cum_count(),
         value = pl.element()
      )
   )
)

# shape: (2, 2)
# ┌─────┬─────────────────────────────┐
# │ num ┆ val                         │
# │ --- ┆ ---                         │
# │ i64 ┆ list[struct[2]]             │
# ╞═════╪═════════════════════════════╡
# │ 1   ┆ [{1,"a"}, {2,"c"}, {3,"e"}] │
# │ 2   ┆ [{1,"b"}, {2,"d"}, {3,"f"}] │
# └─────┴─────────────────────────────┘

Just thought I'd mention it, as it could be useful as a general addition instead of special-casing .explode.

from polars.

henryharbeck avatar henryharbeck commented on September 24, 2024

@cmdlineluser - True, this is a lot like Python's enumerate. Probably also a more recognisable / discoverable name than "with ordinality", haha.

Thank you very much for the code snippet and the suggestion - definitely on board with more generalised and composable, rather than special cases for only certain functions.

You inspired me to do a basic implementation!

To get a list of structs (rather than a struct of lists), applying the plain enumerate inside list.eval also does the trick.
That way both enumerates return a struct.
I'm not yet sold on which is the better default (list of structs or struct of lists), so I'll give it some more thought.

import polars as pl


def enumerate(self, name: str = "index") -> pl.Expr:
    """Pair each value with its 0-based row index.

    Args:
        name (str, optional): Name of the index column. Defaults to "index".
    """
    return pl.struct(
        pl.int_range(pl.count()).alias(name),
        self,
    ).alias(self.meta.output_name())


def list_enumerate(self, name: str = "index") -> pl.Expr:
    return pl.struct(
        pl.int_ranges(0, self.list.len()).alias(name),
        self,
    ).alias(self.meta.output_name())


pl.Expr.enumerate = enumerate
# Can't figure out how to monkey patch this onto the list namespace, but not the point
pl.Expr.list_enumerate = list_enumerate


df = pl.DataFrame({"num": [1, 2], "val": [["a", "c", "e"], ["b", "d", "f"]]})

print(
    df.select(
        "num",
        pl.col("val").enumerate().alias("plain_enumerate"),
        pl.col("val").list_enumerate().alias("list_enumerate"),
        pl.col("val").list.eval(pl.element().enumerate()).alias("eval_enumerate"),
    )
    # then to get the data into a completely flat format, do one of these
    # .unnest("list_enumerate").explode("index", "val")
    # the "val" col name is lost in the "eval_enumerate" because of `pl.element()` - will open an issue
    # .explode("eval_enumerate").unnest("eval_enumerate")
)

# shape: (2, 4)
# ┌─────┬─────────────────────┬─────────────────────────────┬─────────────────────────────┐
# │ num ┆ plain_enumerate     ┆ list_enumerate              ┆ eval_enumerate              │
# │ --- ┆ ---                 ┆ ---                         ┆ ---                         │
# │ i64 ┆ struct[2]           ┆ struct[2]                   ┆ list[struct[2]]             │
# ╞═════╪═════════════════════╪═════════════════════════════╪═════════════════════════════╡
# │ 1   ┆ {0,["a", "c", "e"]} ┆ {[0, 1, 2],["a", "c", "e"]} ┆ [{0,"a"}, {1,"c"}, {2,"e"}] │
# │ 2   ┆ {1,["b", "d", "f"]} ┆ {[0, 1, 2],["b", "d", "f"]} ┆ [{0,"b"}, {1,"d"}, {2,"f"}] │
# └─────┴─────────────────────┴─────────────────────────────┴─────────────────────────────┘


henryharbeck avatar henryharbeck commented on September 24, 2024

Giving what I wrote earlier some more thought:

  • like df.with_row_index and Python's built-in enumerate, it would be worthwhile adding an offset parameter so the index can start at a number other than 0
  • the plain enumerate may not seem very useful on its own, but it offers good utility when applied inside list.eval
