Comments (5)
For the moment, I use this temporary fix before the group by:
first_timestamp = df["t"][0]
floored = floor_timestamp(timestamp=first_timestamp, offset=5)
prefix = pl.DataFrame(
    data={
        "t": pl.Series([floored]),
        "v": [0.0],
    }
)
df = prefix.vstack(df)
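floor_timestamp is not shown in the thread; a minimal stdlib sketch, assuming it truncates to midnight, adds the hour offset, and never returns a value after its input:

```python
from datetime import datetime, timedelta

def floor_timestamp(timestamp: datetime, offset: int) -> datetime:
    # Truncate to midnight, then shift by the daily offset in hours.
    floored = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    floored += timedelta(hours=offset)
    # If the input falls before the offset boundary, step back one day
    # so the returned timestamp is not after the input.
    if floored > timestamp:
        floored -= timedelta(days=1)
    return floored
```

This keeps the prefixed dummy row at (or before) the first real timestamp, so the first dynamic window starts early enough to cover it.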
from polars.
Alternatively, you could do:
resampled = df.group_by_dynamic(
    index_column="t", every="1d", offset=timedelta(hours=5-24)
).agg(
    [
        pl.sum("v").alias("v"),
    ]
)

print(resampled)
shape: (2, 2)
┌─────────────────────────┬──────┐
│ t ┆ v │
│ --- ┆ --- │
│ datetime[ms, UTC] ┆ i64 │
╞═════════════════════════╪══════╡
│ 2024-03-21 05:00:00 UTC ┆ 11 │
│ 2024-03-22 05:00:00 UTC ┆ 1100 │
└─────────────────────────┴──────┘
I'm wondering if Polars should do this for you. Currently the rule is
‘window’: Start by taking the earliest timestamp, truncating it with every, and then adding offset. Note that weekly windows start on Monday.
but maybe it would be more user-friendly to do
‘window’: Start by taking the earliest timestamp, truncating it with every, subtracting ‘every’, and then adding ‘offset’. Note that weekly windows start on Monday.
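To make the two rules concrete, here is the window-start arithmetic for a hypothetical earliest timestamp of 2024-03-21 03:00 with a daily every and a 5-hour offset (stdlib only, with the '1d' truncation written out by hand):

```python
from datetime import datetime, timedelta

earliest = datetime(2024, 3, 21, 3, 0)
every = timedelta(days=1)
offset = timedelta(hours=5)

# '1d' truncation: snap the earliest timestamp to midnight.
truncated = earliest.replace(hour=0, minute=0)

# Current rule: truncate, then add offset. The resulting start is *after*
# the earliest timestamp, so that row falls before the first window.
current_start = truncated + offset

# Proposed rule: truncate, subtract 'every', then add offset,
# so the first window still covers the earliest timestamp.
proposed_start = truncated - every + offset
```

Under the current rule the first window starts at 2024-03-21 05:00 and the 03:00 row is lost; under the proposed rule it starts at 2024-03-20 05:00 and covers it.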
Thanks for the solution! It solves my problem when every is also daily. But if every is monthly, this offsets the label of the windows.
I believe that there is an assumption, just from the name of that method, that every row is taken into account.
Nevertheless, if you consider the behavior to be as expected, I think the documentation should be extra clear about that. For the moment, it only warns that "Different from a normal group by is that a row can be member of multiple groups." It should also say that rows may not belong to any window and will be dropped.
You are right that the first window is computed as documented. But the major consequence, that some rows may not belong to any window, is implicit and cannot easily be deduced. A warning could also be added.
But if every is monthly, this offsets the label of the windows.
Could you show an example, please?
Of course:
from datetime import UTC, datetime, timedelta
import polars as pl
df = pl.DataFrame(
    data={
        "t": pl.Series(
            [
                datetime(2024, 2, 28, 3, 0, tzinfo=UTC),
                datetime(2024, 2, 29, 3, 0, tzinfo=UTC),
                datetime(2024, 3, 1, 3, 0, tzinfo=UTC),
                datetime(2024, 3, 2, 3, 0, tzinfo=UTC),
            ]
        ).dt.cast_time_unit("ms"),
        "v": [1, 10, 100, 1000],
    }
).set_sorted("t")
resampled = df.group_by_dynamic(
    index_column="t", every="1mo", offset=timedelta(hours=5-24)
).agg(
    [
        pl.sum("v").alias("v"),
    ]
)
print(resampled)
shows:
shape: (2, 2)
┌─────────────────────────┬──────┐
│ t ┆ v │
│ --- ┆ --- │
│ datetime[ms, UTC] ┆ i64 │
╞═════════════════════════╪══════╡
│ 2024-01-31 05:00:00 UTC ┆ 11 │
│ 2024-02-29 05:00:00 UTC ┆ 1100 │
└─────────────────────────┴──────┘
while I would expect:
shape: (2, 2)
┌─────────────────────────┬──────┐
│ t ┆ v │
│ --- ┆ --- │
│ datetime[ms, UTC] ┆ i64 │
╞═════════════════════════╪══════╡
│ 2024-02-01 05:00:00 UTC ┆ 111 │
│ 2024-03-01 05:00:00 UTC ┆ 1000 │
└─────────────────────────┴──────┘
I get the latter result when offset=timedelta(hours=5).
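The label shift follows from the documented rule: with a monthly every, the earliest timestamp (2024-02-28 03:00) truncates to 2024-02-01, and the negative offset pushes the window start, and thus its label, back into January. A stdlib sketch of the arithmetic:

```python
from datetime import datetime, timedelta

earliest = datetime(2024, 2, 28, 3, 0)

# '1mo' truncation snaps to the first of the month at midnight.
truncated = earliest.replace(day=1, hour=0, minute=0)

# offset=timedelta(hours=5-24) is -19h: the label lands in January.
label_with_negative_offset = truncated + timedelta(hours=5 - 24)

# offset=timedelta(hours=5) keeps the label on the first of February.
label_with_positive_offset = truncated + timedelta(hours=5)
```

The -19h offset is harmless for daily windows, where it only shifts the start by one day, but for monthly windows it relabels every window to the last day of the previous month.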