My Environment ArangoDB Version : 3.1

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Consider the following original query <div class="snippet-clipboard-content notran

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Algorithmics Graph travsersal ArangoDB with FILTER too slowly about arangodb HOT 8 OPEN

son2408 commented on June 20, 2024

Algorithmics Graph travsersal ArangoDB with FILTER too slowly

from arangodb.

Comments (8)

Redmen2679 commented on June 20, 2024 1

@son2408 Would you be available for a call with myself and one of our Solution Architects so that we can review this with you in more detail?

from arangodb.

son2408 commented on June 20, 2024

What are strengths of ArangoDB? with a simple graph from a nodes to millions edges takes 6.1s, compare with solr take 95ms.
Query

for v,e, path in 0..1 outbound 'nodes/2' edges
  //filter e._to == 'nodes/td_0'
return v._key

Profile

Query String (93 chars, cacheable: false):
 for v,e, path in 0..1 outbound 'nodes/2' edges
   //filter e._to == 'nodes/td_0'
 return v._key

Execution plan:
 Id   NodeType          Calls    Par     Items   Filtered   Runtime [s]   Comment
  1   SingletonNode         1      -         1          0       0.00000   * ROOT
  2   TraversalNode      1005   1004   1004011          0       6.07071     - FOR v  /* vertex (projections: `_key`) */ IN 0..1  /* min..maxPathDepth */ OUTBOUND 'nodes/2' /* startnode */  edges /* order: dfs */
  3   CalculationNode    1005      0   1004011          0       0.05797       - LET #4 = v.`_key`   /* attribute expression */
  4   ReturnNode         1005      -   1004011          0       0.00389       - RETURN #4

Indexes used:
 By   Name   Type   Collection   Unique   Sparse   Cache   Selectivity   Fields        Stored values   Ranges
  2   edge   edge   edges        false    false    false        0.00 %   [ `_from` ]   [  ]            base OUTBOUND

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                              Filter / Prune Conditions
 2   0..1                       edges             uniqueVertices: none, uniqueEdges: path, order: dfs  

Optimization rules applied:
 Id   Rule Name                   Id   Rule Name          
  1   optimize-traversals          2   async-prefetch     

Query Statistics:
 Writes Exec      Writes Ign      Doc. Lookups      Scan Full      Scan Index      Cache Hits/Misses      Filtered      Peak Mem [b]      Exec Time [s]
           0               0                 0              0         2008021                  1 / 0             0         108199936            6.11645

Query Profile:
 Query Stage                Duration [s]         Query Stage                Duration [s]         Query Stage                Duration [s]
 initializing                    0.00000         loading collections             0.00001         instantiating executors         0.00004
 parsing                         0.00004         instantiating plan              0.00002         executing                       6.11624
 optimizing ast                  0.00000         optimizing plan                 0.00010         finalizing                      0.00003

from arangodb.

jsteemann commented on June 20, 2024

Consider the following original query

FOR v, e, path IN 0..1 outbound 'nodes/2' edges
  FILTER e._to == 'nodes/td_0'
  RETURN v._key

This is a traversal that looks for all outgoing edges that have a _from value of nodes/2. In your example that happen to be around 1M edges. There is also an additional filter condition that filters on the _to value of the connected edge. That brings down the total number of results to just 1.
How this query is executed is that, because it is an outbound traversal with a given start node, it will start with the index on the _from attribute of the edges collection and will look in the index for all edges that have a _from value of nodes/2. That index however has an abysmal selectivity, because most/all _from values are identical. So it will find the 1M documents with that same _from value and will then post-filter on the correct _to value.

The index on the _to attribute of the edges collection would be much more selective here, I guess.
But as the query is a traversal with direction "outbound", it will always look for an index that contains the _from attribute and will look up all matching edges for the given start vertex first.

If the query is rewritten in a way such that an arbitrary index can be used, then it should execute in less than a millisecond

FOR e IN edges
  FILTER e._to == 'nodes/td_0' && e._from == 'nodes/2'
  RETURN SUBSTRING(e._to, FIND_FIRST(e._to, '/') + 1)

Note that this query does not even need to look at the vertex value, simply because the _key attribute of the vertex can be extract from the _to value of the connected edge.
Of course it is possible to join the vertex collection later should it actually be needed. But if only the _key is needed, it can be assumed from just extracting the key part of the _to attribute.

from arangodb.

son2408 commented on June 20, 2024

@jsteemann I cant filter as your suggest because I graph with max deep.

from arangodb.

son2408 commented on June 20, 2024

for v,e in 0..5 outbound 'nodes/2' edges
filter e._to == 'nodes/td_0'
return v._key

from arangodb.

son2408 commented on June 20, 2024

@jsteeman how to guide me the best way to optimize the graph with max deep. For detail, a node graph with max deept until final nodes. Count edges about 5 millions

from arangodb.

son2408 commented on June 20, 2024

hi @Redmen2679 I work usecase to authorized users on tree data.

For detail, I have tree data as image. Now I authorized a user to parent path /data then user see all data in childern path of /data path. I want to filter by a contion in this all data.
Query

FOR v,e, p IN 0..3 OUTBOUND 'node/data'
    FILTER v.Name == 'node_1_4'
RETURN v

Profile

Query String (88 chars, cacheable: false):
 for v,e,p in 0..3 outbound 'nodes/data' edges
     FILTER v.Name == 'node_1_4'
 RETURN v

Execution plan:
 Id   NodeType        Calls   Par   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode       1     -       1          0       0.00000   * ROOT
  2   TraversalNode       1     0       0    2000004      49.08274     - FOR v  /* vertex */ IN 0..3  /* min..maxPathDepth */ OUTBOUND 'nodes/data' /* startnode */  edges /* order: dfs */
  5   ReturnNode          1     -       0          0       0.00000       - RETURN v

Indexes used:
 By   Name   Type   Collection   Unique   Sparse   Cache   Selectivity   Fields        Stored values   Ranges
  2   edge   edge   edges        false    false    false        0.00 %   [ `_from` ]   [  ]            base OUTBOUND

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                              Filter / Prune Conditions
 2   0..3                       edges             uniqueVertices: none, uniqueEdges: path, order: dfs  FILTER (v.`Name` == "node_1_4")

Optimization rules applied:
 Id   Rule Name                                  Id   Rule Name                         
  1   optimize-traversals                         3   remove-unnecessary-calculations-2 
  2   remove-filter-covered-by-traversal          4   async-prefetch                    

Query Statistics:
 Writes Exec      Writes Ign      Doc. Lookups      Scan Full      Scan Index      Cache Hits/Misses      Filtered      Peak Mem [b]      Exec Time [s]
           0               0                 0              0         4000007           26 / 1999978       2000004         207912960           49.08297

Query Profile:
 Query Stage                Duration [s]         Query Stage                Duration [s]         Query Stage                Duration [s]
 initializing                    0.00000         loading collections             0.00001         instantiating executors         0.00003
 parsing                         0.00004         instantiating plan              0.00002         executing                      49.08279
 optimizing ast                  0.00001         optimizing plan                 0.00011         finalizing                      0.00002

from arangodb.

son2408 commented on June 20, 2024

I want to know testcase ArangoDB can travsersal max edges in a second? I see Arango traversal large number of edges about 1 ,2M too slow as the above example. Do you have a tip to improve speedup graph ? @jsteemann @Redmen2679.

from arangodb.

Algorithmics Graph travsersal ArangoDB with FILTER too slowly about arangodb HOT 8 OPEN

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent