Comments (9)
We will rename the parameter to reverse in top_k and bottom_k.
from polars.
Confusing in general. I would expect the descending parameter to change the sorting of the output, not what the "top" values are. We have a bottom_k() function for that already.
from polars.
not what the "top" values are. We have a bottom_k() function for that already.
You can pass multiple values to descending though, so bottom_k isn't a full replacement:
In [44]: df = pl.DataFrame({'a': [1, 2, 3], 'b': [6, 5, 4], 'c': [7,8,9]})
In [45]: df.top_k(k=2, by=['a', 'b'], descending=[True, False])
Out[45]:
shape: (2, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ 7 │
│ 2 ┆ 5 ┆ 8 │
└─────┴─────┴─────┘
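The per-column behaviour in the output above can be mimicked in plain Python. This is only a minimal sketch and not the polars implementation: the helper `top_k` and its `reverse` argument are hypothetical names, and it assumes numeric columns so that negating a value flips its sort order. A `True` flag makes that column prefer its smallest values, which is how `descending=[True, False]` behaves in the session above.

```python
def top_k(rows, k, by, reverse):
    """Return the k "top" rows; reverse[i] = True flips column by[i] to smallest-first.

    Hypothetical helper for illustration only, not a polars function.
    """
    def key(row):
        # Non-reversed columns are negated so that an ascending sort
        # places their largest values first.
        return tuple(row[col] if rev else -row[col] for col, rev in zip(by, reverse))

    return sorted(rows, key=key)[:k]


rows = [
    {"a": 1, "b": 6, "c": 7},
    {"a": 2, "b": 5, "c": 8},
    {"a": 3, "b": 4, "c": 9},
]

# Same flags as the session above: "a" reversed, "b" not.
result = top_k(rows, k=2, by=["a", "b"], reverse=[True, False])
print(result)
```

A single bottom_k call cannot express this mix, because it would flip every column at once.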
from polars.
I see. Seems like a single function called extrema would work better, but people will undoubtedly be searching for top_k and bottom_k since those are pretty prevalent in other libraries.
from polars.
Yup...which leaves open the question of "what to do with descending?". I don't know 😄
from polars.
A somewhat more explicit but more verbose option would be a sort_method parameter that's a list of "ascending" and "descending":
import polars as pl
df = pl.DataFrame({
"a": [1, 1, 2, 2, 3, 3],
"b": [2, 1, 2, 1, 2, 1],
})
# (a=3, b=2)
df.top_k(k=1, by=pl.all())
# (a=3, b=1)
df.top_k(k=1, by=pl.all(), sort_method=["ascending", "descending"])
# (a=1, b=1)
df.bottom_k(k=1, by=pl.all())
# (a=1, b=2)
df.bottom_k(k=1, by=pl.all(), sort_method=["descending", "ascending"])
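One possible reading of the sort_method proposal, sketched in plain Python rather than polars: sort the rows by the requested per-column methods and keep the last k. Under that assumption top_k would default to all "ascending" and bottom_k to all "descending". The helpers below are hypothetical and only meant to check that this reading reproduces the four results given in the comments of the snippet above.

```python
def sort_rows(rows, by, sort_method):
    """Multi-column sort with a per-column "ascending"/"descending" method."""
    out = list(rows)
    # Python's sort is stable, so sorting from the least significant
    # column to the most significant one yields a multi-key sort.
    for col, method in reversed(list(zip(by, sort_method))):
        out.sort(key=lambda row: row[col], reverse=(method == "descending"))
    return out


def top_k(rows, k, by, sort_method=None):
    # Assumed default: every column ascending, so the last rows are the largest.
    sort_method = sort_method or ["ascending"] * len(by)
    return sort_rows(rows, by, sort_method)[-k:]


def bottom_k(rows, k, by, sort_method=None):
    # Assumed default: every column descending, so the last rows are the smallest.
    sort_method = sort_method or ["descending"] * len(by)
    return sort_rows(rows, by, sort_method)[-k:]


rows = [{"a": a, "b": b} for a, b in zip([1, 1, 2, 2, 3, 3], [2, 1, 2, 1, 2, 1])]
cols = ["a", "b"]

print(top_k(rows, 1, cols))                                 # a=3, b=2
print(top_k(rows, 1, cols, ["ascending", "descending"]))    # a=3, b=1
print(bottom_k(rows, 1, cols))                              # a=1, b=1
print(bottom_k(rows, 1, cols, ["descending", "ascending"])) # a=1, b=2
```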
from polars.
And actually, just renaming descending to sort_descending would be more obvious. And then the bottom_k parameter could be sort_ascending.
from polars.
I don't think it can just be renamed, because then the output of sort_descending=False would actually be sorted descending.
I feel like suggesting find_smallest, so that code can just be updated without other changes?
Current behaviour:
In [46]: import polars as pl
...: df = pl.DataFrame({'a': [1,3,2]})
...: df.top_k(k=3, by='a', descending=False)
Out[46]:
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 3 │
│ 2 │
│ 1 │
└─────┘
Proposed new api:
In [46]: import polars as pl
...: df = pl.DataFrame({'a': [1,3,2]})
...: df.top_k(k=3, by='a', find_smallest=False)
Out[46]:
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 3 │
│ 2 │
│ 1 │
└─────┘
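For a single column, the intent of a find_smallest flag can be sketched with the standard library's heapq module. This is an illustration of the proposal only, not polars code; the function below is hypothetical, and the output order for find_smallest=True (smallest first) is an assumption the thread does not settle.

```python
import heapq


def top_k(values, k, find_smallest=False):
    """Return k extreme values; the flag selects which end, not the sort order name."""
    if find_smallest:
        # Assumed ordering: smallest first.
        return heapq.nsmallest(k, values)
    # Matches the output shown above: largest first.
    return heapq.nlargest(k, values)


largest = top_k([1, 3, 2], k=3)
smallest = top_k([1, 3, 2], k=2, find_smallest=True)
print(largest, smallest)
```

Naming the flag after what is selected avoids the ambiguity of descending, which could describe either the selection or the order of the output.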
from polars.
I ran across this yesterday in a different context - it looks to me like the underlying Rust code is doing it backwards. Here's where the Python code calls into the Rust, if I understand it correctly:
polars/py-polars/polars/lazyframe/frame.py
Line 1550 in 31df06d
To me it seems like the Python code reflects the Rust API and is faithfully passing descending through.
I don't like this behavior, but it feels like the Python and Rust code should be kept consistent, so my temptation would be to look into addressing the problem upstream of the Python bindings.
I'm personally not actually using Python at all except as an example (we're looking at the Scala JNI bindings to the Rust backend) but my coworkers would want to use the Python bindings. It would be excellent for us to be able to be on the same page with respect to the API shape.
from polars.