Comments (3)
marked release blocker. not because join pushdown in SQL server is so important (it is not), but because of the engine side of the problem.
the planner apparently puts
n.nationkey = c.nationkey AND n.regionkey IS DISTINCT FROM c.custkey
as a join condition.
but then, sometimes after ReorderJoin
, the IS DISTINCT FROM
moves to FilterNode
above the JoinNode
.
the plan is equivalent, but
- some pushdowns may start or stop to be applied
- when pushdown doesn't happen, the join operator will produce more data, unless we put the filter back into JoinNode.
from trino.
The non-determinism is because the SQL server connector is returning stats non-deterministically and that the connector is handling pushdown inconsistently for filter(join(r, l), <f>)
vs join(r, l, <f>)
.
If I recall correctly, for the first execution, the connector doesn't return column stats. For the second one, it does. This results in a different join order and slightly different query plan that the connector refuses to push down.
from trino.
Yes, the plan are different.
When SQL Server connector is offered a join pushdown with n.nationkey = c.nationkey AND n.regionkey IS DISTINCT FROM c.custkey
condition, it rejects the offer (it does not support IS DISTINCT FROM predicates).
When ReorderJoins kicks in, the join condition becomes n.nationkey = c.nationkey
and the resst (n.regionkey IS DISTINCT FROM c.custkey
) becomes a filter above. When SQL Server connector is offered a join pushdown with n.nationkey = c.nationkey
, it can accept it (cost-based, or eaerly depending on configuration).
Both plans are obviously equivalent (produce same relations) and correct. The fact that engine sometimes presents more and sometimes fewer join conditions to the connector is something undesirable.
from trino.
Related Issues (20)
- Wanted to understand on hive.max-partitions-per-writers property HOT 1
- Trino Iceberg Timestamp with Timezone Query Rendering/Display Enhancment HOT 1
- Delta Lake S3 connector parallelism and network speed falls to 0
- `try_cast(parse_json(...))` incorrectly throws error in some cases HOT 3
- Add support for `BOOLEAN`, `TIMESTAMP`, and `VARBINARY` types for data column in Delta Lake connector `$partitions` system table
- PruneTableScanColumns throws an exception when column name contains only non-alphanumeric HOT 1
- Fix broken `pt (default, suite-databricks-unity-http-hms, )`
- Hive connector: SELECT * works, but SELECT on specific column doesn't - NullPointerException
- [Trino-420] Relevant Error message should be shown when clientTags is not passed while query is run
- Pinot connector has no way to pass query option along with query HOT 1
- Trino Query getting Hung after reading 400~600M rows (10~12GB data) of Parquet from object storage HOT 3
- Flaky test TestHiveTransactionalTable.testLargePartitionedUpdate: Hive TEZ failure: `Vertex did not succeed due to OWN_TASK_FAILURE`
- Invalid position 2 in block with 2 positions with multiple filters on array(varchar) HOT 3
- Metadata listing fails when Glue database (schema) dropped concurrently HOT 1
- Correctness issue for predicate pushdown on approx numeric column in Postgres
- Improve stats reporting for group by operator
- how to access mongo object?
- ColumnMask is enabled before aggregation, rather than after aggregation. HOT 1
- OAuth userinfo should be cached
- Trino postgres connector does not support modifying table rows HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from trino.