Can reproduce. 1000 seems to be the minimum number of rows needed for it to error on my system:
```python
import polars as pl

(
    pl.read_csv("sales_data.csv")
    .head(1000)
    .select("Country", "Profit")
    .group_by("Country")
    .agg(
        (pl.col("Profit") > 1000).alias("Profit > 1000")
    )
)
# ComputeError: returned aggregation is of different length: 125 than the groups length: 6
```
With `.head(999)` it works as expected.
There is a bug here, but I think there is also confusion about what you're trying to do.
Calling `.agg(pl.col("Profit"))` returns a list of all profits, one list per country:
```
>>> df.group_by("Country").agg(pl.col("Profit"))
shape: (6, 2)
┌────────────────┬─────────────────────┐
│ Country        ┆ Profit              │
│ ---            ┆ ---                 │
│ str            ┆ list[i64]           │
╞════════════════╪═════════════════════╡
│ Canada         ┆ [590, 590, … 630]   │
│ Germany        ┆ [160, 53, … 746]    │
│ Australia      ┆ [1366, 1188, … 655] │
│ United Kingdom ┆ [1053, 1053, … 112] │
│ France         ┆ [427, 427, … 655]   │
│ United States  ┆ [524, 407, … 542]   │
└────────────────┴─────────────────────┘
```
Applying `pl.col("Profit") > 1000` inside the aggregation then returns lists of True/False values:
```
>>> df.group_by("Country").agg(pl.col("Profit") > 1000)
shape: (6, 2)
┌────────────────┬─────────────────────────┐
│ Country        ┆ Profit                  │
│ ---            ┆ ---                     │
│ str            ┆ list[bool]              │
╞════════════════╪═════════════════════════╡
│ Germany        ┆ [false, false, … false] │
│ United States  ┆ [false, false, … false] │
│ United Kingdom ┆ [true, true, … false]   │
│ Canada         ┆ [false, false, … false] │
│ France         ┆ [false, false, … false] │
│ Australia      ┆ [true, true, … false]   │
└────────────────┴─────────────────────────┘
```
This is probably not what you intended (my guess is you meant `pl.col("Profit").sum() > 1000`). Regardless, here is where the alias fails:
```
>>> df.group_by("Country").agg((pl.col("Profit") > 1000).alias("test"))
polars.exceptions.ComputeError: returned aggregation is of different length: 193 than the groups length: 2
```
It appears to fail because the per-group lists have different lengths. The reported length (193 here) changes each time the command is rerun.
The error does not occur if you simply aggregate into a list; it only shows up once you apply another operation (here the comparison) and then alias the result.