Comments (8)
@son2408 Would you be available for a call with myself and one of our Solution Architects so that we can review this with you in more detail?
from arangodb.
What are strengths of ArangoDB? with a simple graph from a nodes to millions edges takes 6.1s, compare with solr take 95ms.
Query
for v,e, path in 0..1 outbound 'nodes/2' edges
//filter e._to == 'nodes/td_0'
return v._key
Profile
Query String (93 chars, cacheable: false):
for v,e, path in 0..1 outbound 'nodes/2' edges
//filter e._to == 'nodes/td_0'
return v._key
Execution plan:
Id NodeType Calls Par Items Filtered Runtime [s] Comment
1 SingletonNode 1 - 1 0 0.00000 * ROOT
2 TraversalNode 1005 1004 1004011 0 6.07071 - FOR v /* vertex (projections: `_key`) */ IN 0..1 /* min..maxPathDepth */ OUTBOUND 'nodes/2' /* startnode */ edges /* order: dfs */
3 CalculationNode 1005 0 1004011 0 0.05797 - LET #4 = v.`_key` /* attribute expression */
4 ReturnNode 1005 - 1004011 0 0.00389 - RETURN #4
Indexes used:
By Name Type Collection Unique Sparse Cache Selectivity Fields Stored values Ranges
2 edge edge edges false false false 0.00 % [ `_from` ] [ ] base OUTBOUND
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter / Prune Conditions
2 0..1 edges uniqueVertices: none, uniqueEdges: path, order: dfs
Optimization rules applied:
Id Rule Name Id Rule Name
1 optimize-traversals 2 async-prefetch
Query Statistics:
Writes Exec Writes Ign Doc. Lookups Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 0 2008021 1 / 0 0 108199936 6.11645
Query Profile:
Query Stage Duration [s] Query Stage Duration [s] Query Stage Duration [s]
initializing 0.00000 loading collections 0.00001 instantiating executors 0.00004
parsing 0.00004 instantiating plan 0.00002 executing 6.11624
optimizing ast 0.00000 optimizing plan 0.00010 finalizing 0.00003
from arangodb.
Consider the following original query
FOR v, e, path IN 0..1 outbound 'nodes/2' edges
FILTER e._to == 'nodes/td_0'
RETURN v._key
This is a traversal that looks for all outgoing edges that have a _from
value of nodes/2
. In your example that happen to be around 1M edges. There is also an additional filter condition that filters on the _to
value of the connected edge. That brings down the total number of results to just 1.
How this query is executed is that, because it is an outbound traversal with a given start node, it will start with the index on the _from
attribute of the edges
collection and will look in the index for all edges that have a _from
value of nodes/2
. That index however has an abysmal selectivity, because most/all _from
values are identical. So it will find the 1M documents with that same _from
value and will then post-filter on the correct _to
value.
The index on the _to
attribute of the edges
collection would be much more selective here, I guess.
But as the query is a traversal with direction "outbound", it will always look for an index that contains the _from
attribute and will look up all matching edges for the given start vertex first.
If the query is rewritten in a way such that an arbitrary index can be used, then it should execute in less than a millisecond
FOR e IN edges
FILTER e._to == 'nodes/td_0' && e._from == 'nodes/2'
RETURN SUBSTRING(e._to, FIND_FIRST(e._to, '/') + 1)
Note that this query does not even need to look at the vertex value, simply because the _key
attribute of the vertex can be extract from the _to
value of the connected edge.
Of course it is possible to join the vertex collection later should it actually be needed. But if only the _key
is needed, it can be assumed from just extracting the key part of the _to
attribute.
from arangodb.
@jsteemann I cant filter as your suggest because I graph with max deep.
from arangodb.
for v,e in 0..5 outbound 'nodes/2' edges
filter e._to == 'nodes/td_0'
return v._key
from arangodb.
@jsteeman how to guide me the best way to optimize the graph with max deep. For detail, a node graph with max deept until final nodes. Count edges about 5 millions
from arangodb.
hi @Redmen2679 I work usecase to authorized users on tree data.
For detail, I have tree data as image. Now I authorized a user to parent path /data then user see all data in childern path of /data path. I want to filter by a contion in this all data.
Query
FOR v,e, p IN 0..3 OUTBOUND 'node/data'
FILTER v.Name == 'node_1_4'
RETURN v
Profile
Query String (88 chars, cacheable: false):
for v,e,p in 0..3 outbound 'nodes/data' edges
FILTER v.Name == 'node_1_4'
RETURN v
Execution plan:
Id NodeType Calls Par Items Filtered Runtime [s] Comment
1 SingletonNode 1 - 1 0 0.00000 * ROOT
2 TraversalNode 1 0 0 2000004 49.08274 - FOR v /* vertex */ IN 0..3 /* min..maxPathDepth */ OUTBOUND 'nodes/data' /* startnode */ edges /* order: dfs */
5 ReturnNode 1 - 0 0 0.00000 - RETURN v
Indexes used:
By Name Type Collection Unique Sparse Cache Selectivity Fields Stored values Ranges
2 edge edge edges false false false 0.00 % [ `_from` ] [ ] base OUTBOUND
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter / Prune Conditions
2 0..3 edges uniqueVertices: none, uniqueEdges: path, order: dfs FILTER (v.`Name` == "node_1_4")
Optimization rules applied:
Id Rule Name Id Rule Name
1 optimize-traversals 3 remove-unnecessary-calculations-2
2 remove-filter-covered-by-traversal 4 async-prefetch
Query Statistics:
Writes Exec Writes Ign Doc. Lookups Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 0 4000007 26 / 1999978 2000004 207912960 49.08297
Query Profile:
Query Stage Duration [s] Query Stage Duration [s] Query Stage Duration [s]
initializing 0.00000 loading collections 0.00001 instantiating executors 0.00003
parsing 0.00004 instantiating plan 0.00002 executing 49.08279
optimizing ast 0.00001 optimizing plan 0.00011 finalizing 0.00002
from arangodb.
I want to know testcase ArangoDB can travsersal max edges in a second? I see Arango traversal large number of edges about 1 ,2M too slow as the above example. Do you have a tip to improve speedup graph ? @jsteemann @Redmen2679.
from arangodb.
Related Issues (20)
- Graph traversal 1million edges too slowly HOT 5
- REGEX_REPLACE character classes not matching correctly HOT 3
- PRUNE GRAPH NOT WORK WHEN USE CONDITION USES NEGATION HOT 2
- How to determine if the query was served by a specific shard using AQL Explain? HOT 1
- Performance graph data with filter too slowly HOT 12
- Arangodb not getting deployed on AWS EKS cluster though worked on local K8s cluster HOT 8
- CentOS / RHEL installs of ArangoDB failing HOT 1
- Does arangodb support IPV6 if the dual-stack arangodb deployed in a cluster cannot be started in a dual-stack K8s environment and no obvious error is reported in the debug logs?
- GCC 13.2.0 build devel error HOT 2
- Vulnerability runc CVE-2024-21626 in ArangoDB HOT 1
- Cluster RemoteNode slowdowns query by 50 times comparing to single instance HOT 3
- An integer field cannot be incremented HOT 2
- FATAL [c81f6] {general} failed to bind to endpoint 'http+tcp://[::]:8529'. Please check whether another instance is already running using this endpoint and review your endpoints configuration. HOT 1
- Guix package HOT 1
- After updating to version 3.11.7 of Arango BD, the service is constantly restarting HOT 2
- Graph traversal with cluster from list nodes very slow HOT 1
- GEO_CONTAINS produces false results (inconsistent right-hand rule) HOT 2
- Execution time inconsistent with the use of an index with sorting
- What next after Pregel removal?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arangodb.