
opt: constant value projections prevent join reordering (cockroach issue, open)

morphace commented on August 26, 2024


Comments (7)

blathers-crl commented on August 26, 2024

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

@cockroachdb/sql-queries (found keywords: optimizer,vectorized,plan)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

morphace commented on August 26, 2024

I think the "real" issue might not be primarily the fact that it's doing FULL SCANs but that the rendering of the JSON seems to be performed on the unfiltered result of the joins...

michae2 commented on August 26, 2024

Hi @morphace, thanks for opening an issue. It will be difficult to diagnose the problem without the ability to reproduce it. Could you either (a) boil it down to the simplest reproduction steps, including CREATE TABLE and INSERT statements needed to populate the data to reproduce, or (b) open a support ticket with a statement diagnostics bundle?

morphace commented on August 26, 2024

Hi @michae2,

This is the simplest example I could create:

drop table if exists outer_table, outer_related_table, inner_table, inner_related_table;
create table outer_table
(
    id    string primary key,
    value int
);
create table outer_related_table
(
    id    string primary key,
    outer_sibling_id string,
    value int
);
create table inner_table
(
    id    string primary key,
    parent_id string,
    value int
);
create table inner_related_table
(
    id    string primary key,
    inner_sibling_id string,
    value int
);
create index idx_value on outer_table (value);
create index idx_outer_sibling_id on outer_related_table (outer_sibling_id);
create index idx_parent_id on inner_table (parent_id);
create index idx_inner_sibling_id on inner_related_table (inner_sibling_id);
alter table inner_table
    add constraint fk_inner_outer foreign key (parent_id) references outer_table (id);
alter table outer_related_table
    add constraint fk_inner_outer_sibling foreign key (outer_sibling_id) references outer_table (id);
alter table inner_related_table
    add constraint fk_inner_inner_sibling foreign key (inner_sibling_id) references inner_table (id);
insert into outer_table (id, value)
    (select uuid_v4(), random() * 10000 from generate_series(1, 100000));
insert into outer_related_table (id, outer_sibling_id, value)
    (select uuid_v4(), id, random() * 10000 from outer_table);
insert into inner_table (id, parent_id, value)
    (select uuid_v4(), id, random() * 10000 from outer_table);
insert into inner_table (id, parent_id, value)
    (select uuid_v4(), id, random() * 10000 from outer_table);
insert into inner_table (id, parent_id, value)
    (select uuid_v4(), id, random() * 10000 from outer_table);
insert into inner_related_table (id, inner_sibling_id, value)
    (select uuid_v4(), id, random() * 10000 from inner_table);
analyze outer_table;
analyze outer_related_table;
analyze inner_table;
analyze inner_related_table;

The query:

explain select
    (select json_agg(json_build_object('val', i.value))
     from inner_table i
     left join inner_related_table irt on irt.inner_sibling_id=i.id
     where parent_id=o.id)
from outer_table o
left join outer_related_table ort on ort.outer_sibling_id=o.id
where o.value > 5000;

morphace commented on August 26, 2024

When you look at the plan below, you'll see that the nested query is doing those full scans and probably rendering the JSON for all rows...?

distribution: local
vectorized: true

• render
│
└── • group (hash)
    │ estimated row count: 50,577
    │ group by: rownum
    │
    └── • hash join (right outer)
        │ estimated row count: 153,174
        │ equality: (parent_id) = (id)
        │
        ├── • render
        │   │
        │   └── • merge join (left outer)
        │       │ estimated row count: 300,000
        │       │ equality: (id) = (inner_sibling_id)
        │       │ left cols are key
        │       │
        │       ├── • scan
        │       │     estimated row count: 300,000 (100% of the table; stats collected 1 minute ago)
        │       │     table: inner_table@inner_table_pkey
        │       │     spans: FULL SCAN
        │       │
        │       └── • scan
        │             estimated row count: 300,000 (100% of the table; stats collected 1 minute ago)
        │             table: inner_related_table@idx_inner_sibling_id
        │             spans: FULL SCAN
        │
        └── • ordinality
            │ estimated row count: 50,577
            │
            └── • hash join (right outer)
                │ estimated row count: 50,577
                │ equality: (outer_sibling_id) = (id)
                │ right cols are key
                │
                ├── • scan
                │     estimated row count: 100,000 (100% of the table; stats collected 2 minutes ago)
                │     table: outer_related_table@idx_outer_sibling_id
                │     spans: FULL SCAN
                │
                └── • scan
                      estimated row count: 50,101 (50% of the table; stats collected 2 minutes ago)
                      table: outer_table@idx_value
                      spans: [/5001 - ]

michae2 commented on August 26, 2024

Thank you for the repro, @morphace. I've boiled it down a little further to:

CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE cd (c INT PRIMARY KEY, d INT);
CREATE TABLE ef (e INT PRIMARY KEY, f INT);

-- able to plan lookup joins
EXPLAIN SELECT (SELECT json_agg(f) FROM cd@{NO_FULL_SCAN} LEFT JOIN ef@{NO_FULL_SCAN} ON e = d WHERE c = b) FROM ab WHERE a > 5;

-- change json_agg to json_object_agg and we cannot produce a query plan without a full table scan (FTS)
EXPLAIN SELECT (SELECT json_object_agg('val', f) FROM cd@{NO_FULL_SCAN} LEFT JOIN ef@{NO_FULL_SCAN} ON e = d WHERE c = b) FROM ab WHERE a > 5;

-- remove the left join and we can plan a lookup join
EXPLAIN SELECT (SELECT json_object_agg('val', d) FROM cd@{NO_FULL_SCAN} WHERE c = b) FROM ab WHERE a > 5;

mgartner commented on August 26, 2024

In order to get a plan with two lookup-joins, we need to reorder the joins to be: (Join (Join ab cd) ef). This enables a plan where we scan ab first, then use the values in column b to lookup rows of cd where c = b, then use the values in column d to lookup rows of ef where e = d.

In the json_object_agg case, the 'val' literal requires a projection expression above the join of cd and ef, so the expression looks like (Join (Project (Join cd ef)) ab). The optimizer won't reorder these two joins because the Project sits between them, so they are not adjacent. Since the joins are not reordered, the lookup joins cannot be explored.
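The adjacency problem can be sketched with a toy plan tree. Scan, Project and Join below are hypothetical Python classes, and reorderable_inputs is an illustrative stand-in for how a join reorderer gathers the relations it may permute; it is not CockroachDB's actual join order builder. Only directly nested Joins are flattened into one group, so a Project acts as a barrier unless the reorderer is allowed to look through constant-only Projects.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    table: str


@dataclass
class Project:
    input: "Expr"
    constant_only: bool  # True if it only adds literal columns such as 'val'


@dataclass
class Join:
    left: "Expr"
    right: "Expr"


Expr = Union[Scan, Project, Join]


def reorderable_inputs(e: "Expr", look_through_consts: bool = False) -> List[str]:
    """Return the inputs a toy join reorderer could permute starting at e.

    Directly nested Joins are flattened into one reorderable group; any other
    operator becomes a single opaque input, unless look_through_consts is set
    and the operator is a constant-only Project.
    """
    if isinstance(e, Join):
        return (reorderable_inputs(e.left, look_through_consts)
                + reorderable_inputs(e.right, look_through_consts))
    if isinstance(e, Project) and e.constant_only and look_through_consts:
        return reorderable_inputs(e.input, look_through_consts)
    if isinstance(e, Scan):
        return [e.table]
    return ["<Project>"]  # opaque: the joins below it stay in their own group


# (Join (Project (Join cd ef)) ab), the shape described above.
expr = Join(Project(Join(Scan("cd"), Scan("ef")), constant_only=True), Scan("ab"))
print(reorderable_inputs(expr))        # ['<Project>', 'ab']
print(reorderable_inputs(expr, True))  # ['cd', 'ef', 'ab']
```

Treating every Project as a barrier is the conservative choice: when the Project is on the null-supplying side of an outer join, its constant column is expected to be NULL for unmatched rows, so looking through it may not be safe in general.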

If we change the 'val' literal to a column reference, the projection is omitted, and we get the desired plan:

CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE cd (c INT PRIMARY KEY, d INT);
CREATE TABLE ef (e INT PRIMARY KEY, f INT, s STRING);

EXPLAIN
SELECT (
  SELECT json_object_agg(s, f)
  FROM cd@{NO_FULL_SCAN}
  LEFT JOIN ef@{NO_FULL_SCAN}
  ON e = d WHERE c = b
)
FROM ab WHERE a > 5;
--                    info
-- ------------------------------------------
--   distribution: local
--   vectorized: true
-- 
--   • render
--   │
--   └── • group (streaming)
--       │ group by: a
--       │ ordered: +a
--       │
--       └── • lookup join (left outer)
--           │ table: ef@ef_pkey
--           │ equality: (d) = (e)
--           │ equality cols are key
--           │
--           └── • lookup join (left outer)
--               │ table: cd@cd_pkey
--               │ equality: (b) = (c)
--               │ equality cols are key
--               │
--               └── • scan
--                     missing stats
--                     table: ab@ab_pkey
--                     spans: [/6 - ]
-- (23 rows)

In the original example, I believe it is other literals, like 'f0', that cause the same problem.

Some potential solutions I can think of are:

Add an exploration rule that pulls Projects that produce a constant value up out of joins when it allows more joins to be adjacent. I'm not sure how generally applicable this is because I think there are likely cases where this would change the semantics of the query plan.
Allow the join reorderer to traverse Projects that produce a constant value. Again, not sure how generally applicable this is.
Could we somehow "curry" the json_object_agg('val', f) function into json_object_agg_curried(f) to eliminate the Project of the literal 'val'? It's an interesting idea, but I don't think it would work in the original example above where the literal values are part of a single argument expression, not the entire argument, e.g., json_build_object('f0', '' || material11_.id, 'f1', '' || material11_.availability_date, ...).
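The first idea, pulling a constant Project up out of a join, and the semantic worry that comes with it can be illustrated with a toy rewrite. The classes and pull_up_const_project are hypothetical and only model the plan shape; they are not CockroachDB's memo or normalization rules, and ON conditions are not modelled. The decorrelated subquery is written here as ab LEFT JOIN (Project ...), which corresponds to the right outer join with the render on the other side in the plan earlier in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Scan:
    table: str


@dataclass
class Project:
    input: "Expr"
    consts: Dict[str, str]  # constant columns this Project adds, e.g. {"key": "val"}


@dataclass
class Join:
    kind: str  # "inner" or "left" (left outer join: the left input is preserved)
    left: "Expr"
    right: "Expr"


Expr = Union[Scan, Project, Join]


def pull_up_const_project(j: Join) -> "Expr":
    """Toy rewrite: hoist a constant-only Project above its parent Join so the
    join below the Project becomes directly nested under the rewritten join.

    Allowed from the left input (preserved by both join kinds) or from either
    input of an inner join. Refused for the right input of a left join: there,
    unmatched rows must see NULL for the constant column, but a hoisted Project
    would fill in the constant instead. ON conditions are ignored in this toy.
    """
    if isinstance(j.left, Project):
        p = j.left
        return Project(Join(j.kind, p.input, j.right), p.consts)
    if isinstance(j.right, Project) and j.kind == "inner":
        p = j.right
        return Project(Join(j.kind, j.left, p.input), p.consts)
    return j  # not applicable, or could change the query's results


cd_ef = Join("left", Scan("cd"), Scan("ef"))
# Decorrelated subquery: ab is preserved, the Project is on the null-supplying side.
decorrelated = Join("left", Scan("ab"), Project(cd_ef, {"key": "val"}))
print(pull_up_const_project(decorrelated) is decorrelated)  # True: rewrite refused
inner = Join("inner", Scan("ab"), Project(cd_ef, {"key": "val"}))
print(pull_up_const_project(inner))
```

In this toy, the shape from this issue is exactly the case the conservative check refuses, which matches the concern that such a rule could change the semantics of some plans.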

The workaround, for now, is to rewrite the query with the correct join ordering and no subqueries:

CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE cd (c INT PRIMARY KEY, d INT);
CREATE TABLE ef (e INT PRIMARY KEY, f INT);

EXPLAIN
SELECT json_object_agg('val', f)
FROM ab
LEFT JOIN cd@{NO_FULL_SCAN} ON c = b
LEFT JOIN ef@{NO_FULL_SCAN} ON e = d
WHERE a > 5;
--                    info
-- ------------------------------------------
--   distribution: local
--   vectorized: true
-- 
--   • group (scalar)
--   │
--   └── • render
--       │
--       └── • lookup join (left outer)
--           │ table: ef@ef_pkey
--           │ equality: (d) = (e)
--           │ equality cols are key
--           │
--           └── • lookup join (left outer)
--               │ table: cd@cd_pkey
--               │ equality: (b) = (c)
--               │ equality cols are key
--               │
--               └── • scan
--                     missing stats
--                     table: ab@ab_pkey
--                     spans: [/6 - ]
-- (21 rows)
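One difference from the subquery form is worth checking before adopting this workaround: the rewrite has no GROUP BY (note the `group (scalar)` node), so it returns a single aggregate over all matching ab rows instead of one per ab row. The sketch below uses Python's built-in sqlite3 module (SQLite, not CockroachDB) with made-up rows, and group_concat stands in for json_object_agg; it compares row counts and shows that adding GROUP BY a restores one result per ab row. Whether CockroachDB keeps the lookup-join plan for the grouped variant is not shown here, and for an ab row with no matching cd row json_object_agg in the join form would receive a NULL value rather than no rows, so results can still differ for such rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ab (a INT PRIMARY KEY, b INT);
CREATE TABLE cd (c INT PRIMARY KEY, d INT);
CREATE TABLE ef (e INT PRIMARY KEY, f INT);
INSERT INTO ab VALUES (6, 1), (7, 2), (8, 3);
INSERT INTO cd VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO ef VALUES (10, 100), (20, 200), (30, 300);
""")

# Correlated subquery form: one aggregate per qualifying ab row.
per_row = conn.execute("""
SELECT (SELECT group_concat(f) FROM cd LEFT JOIN ef ON e = d WHERE c = b)
FROM ab WHERE a > 5 ORDER BY a
""").fetchall()

# Flattened form without GROUP BY: a single scalar aggregate over all rows.
flat = conn.execute("""
SELECT group_concat(f) FROM ab
LEFT JOIN cd ON c = b LEFT JOIN ef ON e = d
WHERE a > 5
""").fetchall()

# Flattened form with GROUP BY a: one aggregate per ab row again.
grouped = conn.execute("""
SELECT group_concat(f) FROM ab
LEFT JOIN cd ON c = b LEFT JOIN ef ON e = d
WHERE a > 5 GROUP BY a ORDER BY a
""").fetchall()

print(len(per_row), len(flat), len(grouped))  # 3 1 3
```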
