Comments (6)
We (currently) allow comparisons in the form of pl.col("") > | < | == | != literal. Would there be a problem with the first case, i.e. converting the date to a literal before comparing, like:
first.filter(pl.col("date") == dt.date(2024, 2, 1).strftime(...)).explain()
Any expression that might alter the value of the column (e.g. pl.col("date").str.to_date()) would significantly increase the complexity and overhead of hive partitioning, as we would need to run the expression for every file instead of comparing the value in the path directly to the literal.
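A minimal pure-Python sketch of that difference, assuming hive-style paths such as date=2024-02-01/0.parquet. The helper names (`partition_value`, `prune_literal`, `prune_with_expr`), the file listing, and the "%Y-%m-%d" format (the original elides the strftime argument) are all hypothetical and do not reflect Polars internals. The literal case does one string comparison per file; the expression case must first run a transform (here `date.fromisoformat`, standing in for str.to_date()) on every file's path value.

```python
import datetime as dt
import operator
from typing import Callable, List, Optional

# Hypothetical file listing of a dataset hive-partitioned by "date".
PATHS = [
    "data/date=2024-01-31/0.parquet",
    "data/date=2024-02-01/0.parquet",
    "data/date=2024-02-02/0.parquet",
]

# The comparison forms allowed against a literal: > | < | == | !=
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq, "!=": operator.ne}


def partition_value(path: str, key: str) -> Optional[str]:
    """Return the raw string value of `key` in a hive-style path, or None if absent."""
    for segment in path.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == key:
            return value
    return None


def prune_literal(paths: List[str], key: str, op: str, literal: str) -> List[str]:
    """Keep files whose path value satisfies `value <op> literal`: one string compare per file."""
    kept = []
    for p in paths:
        value = partition_value(p, key)
        if value is not None and OPS[op](value, literal):
            kept.append(p)
    return kept


def prune_with_expr(
    paths: List[str],
    key: str,
    transform: Callable[[str], dt.date],
    predicate: Callable[[dt.date], bool],
) -> List[str]:
    """Run `transform` on every file's path value before testing it (the costlier case)."""
    kept = []
    for p in paths:
        raw = partition_value(p, key)
        if raw is not None and predicate(transform(raw)):
            kept.append(p)
    return kept


# "First case": format the date once as a string literal (assumed "%Y-%m-%d"),
# then compare the path strings directly.
literal = dt.date(2024, 2, 1).strftime("%Y-%m-%d")
by_literal = prune_literal(PATHS, "date", "==", literal)

# Expression case: parse every path value to a date (standing in for str.to_date()),
# then apply is_between-like logic.
start, end = dt.date(2024, 2, 1), dt.date(2024, 2, 2)
by_expr = prune_with_expr(PATHS, "date", dt.date.fromisoformat, lambda d: start <= d <= end)
```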
It would be nice if we could provide feedback, such as a warning, when a scan of a hive-partitioned dataset ends up reading the whole dataset.
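The kind of feedback suggested here could look roughly like the following sketch. It is not a Polars API: `select_files` and its `keep` predicate are hypothetical, and passing None for `keep` models a filter that cannot be turned into a path comparison (for example one using str.to_date()), so every file has to be read.

```python
import warnings
from typing import Callable, List, Optional


def select_files(paths: List[str], keep: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Return the files a scan would read, warning when none of them can be skipped.

    `keep` is a hypothetical per-path predicate derived from the filter; None models
    a filter that cannot be evaluated against the partition values in the paths.
    """
    if keep is None:
        kept = list(paths)  # nothing can be pruned, so every file is read
    else:
        kept = [p for p in paths if keep(p)]
    if paths and len(kept) == len(paths):
        warnings.warn(
            f"hive partition filter did not prune any files; scanning all {len(paths)} files",
            UserWarning,
            stacklevel=2,
        )
    return kept
```

Warning whenever no file was skipped is a deliberately simple rule; it also fires when a prunable filter happens to match every partition.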
Thanks, I edited the read path from data/mw/ to ./
Ah, I guess I made the example slightly too small. What I am actually using, and what is most powerful (I think), is:
first.filter(pl.col("date").str.to_date().is_between(start, end)).explain()
Otherwise, to use the literal comparisons I need to loop over the required dates and combine them all together with &. I currently do this in my environment, but it is pretty hacky.
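The date-by-date workaround can be sketched in plain Python, assuming daily partitions whose values are ISO YYYY-MM-DD strings. `date_literals` is a hypothetical helper that produces one literal per day; a file is then kept if its partition value matches any of those literals.

```python
import datetime as dt
from typing import List


def date_literals(start: dt.date, end: dt.date) -> List[str]:
    """Return one ISO 'YYYY-MM-DD' string per day in [start, end], inclusive."""
    n_days = (end - start).days
    return [(start + dt.timedelta(days=i)).isoformat() for i in range(n_days + 1)]


# Each string would back one literal comparison such as pl.col("date") == "2024-02-01".
literals = date_literals(dt.date(2024, 1, 30), dt.date(2024, 2, 2))
# → ['2024-01-30', '2024-01-31', '2024-02-01', '2024-02-02']

# Keep a file if its partition value (hypothetical "date=..." first path segment)
# matches any of the literals.
wanted = set(literals)
paths = ["date=2024-01-29/0.parquet", "date=2024-02-01/0.parquet"]
kept = [p for p in paths if p.split("/")[0].split("=", 1)[1] in wanted]
```

The number of comparisons grows with the length of the range, which is part of why this approach feels hacky compared with a single is_between.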
I agree this should be fixed. We first need to do proper schema inference on hive partitions. Once that is in place, we can apply an architecture similar to the one we use for Parquet statistics pruning to hive partitions.
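One way to picture the two steps: first infer a type for a hive partition column from its string values, then prune files the way min/max statistics pruning does, treating each file's partition value as a range whose min and max are equal. This is a hedged sketch, not Polars' implementation; the inference order (integer, then ISO date, else string) and all helper names are assumptions.

```python
import datetime as dt
from typing import Any, Callable, List, Tuple


def infer_parser(values: List[str]) -> Callable[[str], Any]:
    """Pick the first parser that accepts every value: int, then ISO date, else keep str."""
    for parse in (int, dt.date.fromisoformat):
        try:
            for v in values:
                parse(v)
        except ValueError:
            continue
        return parse
    return str


def can_skip(value: Any, lo: Any, hi: Any) -> bool:
    """Stats-style test: a file whose min == max == value cannot satisfy lo <= x <= hi."""
    return value < lo or value > hi


def prune(files: List[Tuple[str, str]], lo: Any, hi: Any) -> List[str]:
    """`files` holds (path, raw partition value) pairs; keep files that may hold matching rows."""
    parse = infer_parser([raw for _, raw in files])
    return [path for path, raw in files if not can_skip(parse(raw), lo, hi)]


files = [
    ("date=2024-01-31/0.parquet", "2024-01-31"),
    ("date=2024-02-01/0.parquet", "2024-02-01"),
    ("date=2024-02-05/0.parquet", "2024-02-05"),
]
kept = prune(files, dt.date(2024, 2, 1), dt.date(2024, 2, 2))
```

Once values are typed, a range predicate such as is_between can be checked against each file without enumerating dates.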
Awesome. Is there an issue tracking the schema inference? I can follow along there.
Thanks!
Not yet, but I have now created an issue for hive partition schema inference (#14838).
Related Issues (20)
- `.struct.field()` after `shuffle()` seems to produce incorrect results
- Multi-output, multi-sink lazy polars
- Inconsistent Results Between Pandas and Polars using cut (and qcut)?
- ExprStringNameSpace replace / replace_all literal flag ignored for dataframes with multiple rows
- `dt.round()` slow/fast path use different rounding
- write_parquet with partition_by silently overwrites existing files
- rank() on a Series of just 1 null assigns rank=1 to the null value.
- read_ndjson ignores provided schema list inner types if values are inferred null
- Dependencies not recognized
- Inconsistent rolling results when using temporal windows
- Polars ignoring rows that are empty in Excel
- S3 credentials aren't loaded from `~/.aws/config` if equals aren't padded with spaces
- No non-strict creation of literals
- PanicException: index: 8449 out of bounds for len: 1 when using scan csv with schema and include_file_paths
- Fail to compile polars 0.42.0
- Should `str.to_titlecase()` capitalize the letter after an apostrophe?
- polars.read_database can not work with duckdb_engine connection.
- Build fail
- Improve decimal_comma error message
- Add Lateral Column Aliasing support for the SQL interface