Comments (3)
@salted-yu Thanks for reporting this, we will look into it and get back to you when we have more information.
from neo4j.
Hi @salted-yu,
This is very similar to the problem in #13438:
Skewed data
The core of the problem is that we underestimate how often the pattern ()<-[:POSTS]-(u:User)-[:POSTS]->()
occurs. Query 1 expands this pattern late in the plan, which is cheaper, while query 2 expands it early on.
As before, we underestimate the pattern because the dataset is skewed. For :User
the estimate is far off (117 estimated rows against 1442388 actual rows at the Filter):
PROFILE
MATCH ()<-[:POSTS]-(u:User)-[:POSTS]->()
RETURN count(*)
==================================================
+-------------------+----+------------------------------+----------------+---------+
| Operator | Id | Details | Estimated Rows | Rows |
+-------------------+----+------------------------------+----------------+---------+
| +ProduceResults | 0 | `count(*)` | 1 | 1 |
| | +----+------------------------------+----------------+---------+
| +EagerAggregation | 1 | count(*) AS `count(*)` | 1 | 1 |
| | +----+------------------------------+----------------+---------+
| +Filter | 2 | NOT anon_2 = anon_1 | 117 | 1442388 |
| | +----+------------------------------+----------------+---------+
| +Expand(All) | 3 | (u)-[anon_1:POSTS]->(anon_0) | 118 | 1444534 |
| | +----+------------------------------+----------------+---------+
| +Expand(All) | 4 | (u)-[anon_2:POSTS]->(anon_3) | 2146 | 2146 |
| | +----+------------------------------+----------------+---------+
| +NodeByLabelScan | 5 | u:User | 38986 | 38986 |
+-------------------+----+------------------------------+----------------+---------+
When we restrict the pattern to the dominating :Me
node, the estimate is closer to the true result, although still about ten times too low:
PROFILE
MATCH ()<-[:POSTS]-(u:Me)-[:POSTS]->()
RETURN count(*)
==================================================
+-------------------+----+------------------------------+----------------+---------+
| Operator | Id | Details | Estimated Rows | Rows |
+-------------------+----+------------------------------+----------------+---------+
| +ProduceResults | 0 | `count(*)` | 1 | 1 |
| | +----+------------------------------+----------------+---------+
| +EagerAggregation | 1 | count(*) AS `count(*)` | 1 | 1 |
| | +----+------------------------------+----------------+---------+
| +Filter | 2 | NOT anon_2 = anon_1 | 142322 | 1436402 |
| | +----+------------------------------+----------------+---------+
| +Expand(All) | 3 | (u)-[anon_1:POSTS]->(anon_0) | 143760 | 1437601 |
| | +----+------------------------------+----------------+---------+
| +Expand(All) | 4 | (u)-[anon_2:POSTS]->(anon_3) | 1199 | 1199 |
| | +----+------------------------------+----------------+---------+
| +NodeByLabelScan | 5 | u:Me | 10 | 1 |
+-------------------+----+------------------------------+----------------+---------+
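The estimates in both profiles are consistent with a uniform-degree assumption: the rows after the first expand are multiplied by the average number of :POSTS relationships per start node. That this is how the planner computes them is an assumption made here for illustration, not something stated in this thread. A minimal Python sketch using only numbers from the two tables above (the function name is hypothetical):

```python
def uniform_two_hop_estimate(start_nodes: float, first_hop_rows: float) -> float:
    """Rows after a second expand if every start node had the same degree.

    Assumption: this mirrors the planner's estimate; the thread does not confirm it.
    """
    avg_degree = first_hop_rows / start_nodes
    return first_hop_rows * avg_degree


# :User profile: 38986 nodes from the label scan, 2146 rows after the first expand.
user_estimate = uniform_two_hop_estimate(38986, 2146)

# :Me profile: the label scan was estimated at 10 nodes, 1199 rows after the first expand.
me_estimate = uniform_two_hop_estimate(10, 1199)

# The single actual :Me node has 1199 :POSTS relationships, so before the
# uniqueness filter it alone produces 1199 * 1199 rows.
me_actual = 1199 * 1199

# Actual rows at the second expand in the :User profile, taken from the table.
user_actual = 1444534
underestimate_factor = user_actual / user_estimate

print(round(user_estimate), round(me_estimate), me_actual)
print(f"underestimated by a factor of about {underestimate_factor:.0f}")
```

The rounded values match the Estimated Rows column for the second Expand(All) in each profile (118 and 143760), and the 1437601 rows from the one :Me node account for almost all of the 1444534 actual rows in the :User profile. Averaging that node's degree over 38986 users is what hides the skew.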
Tweets make the difference
Why does the plan change when the label is added? Because we now have the option of using a label scan on :Tweet
as a starting point, which looks attractive given the bad cardinality estimate.
I hope this clarifies the situation.
Have a nice weekend!
Regards,
Arne
Thank you!
Have a nice weekend!