Giter VIP home page Giter VIP logo

Comments (11)

jsteemann avatar jsteemann commented on May 28, 2024 2

For info, I created a PR that can reduce the number of plans generated by the optimizer rule "interchange-adjacent-enumerations" in some cases:
#20201
For the example query provided in this issue, the change would reduce the number of generated plans from 128 to 32. That should make plan optimization considerably faster than it is right now for such types of queries.
Maybe even more can be achieved by adjusting the optimizer even further. Will need to check.

from arangodb.

jsteemann avatar jsteemann commented on May 28, 2024 1

@matcho : I remember there were issues with index hints in some specific situations, which were fixed a while ago. I guess if you are running the queries on ArangoDB version 3.8, that the issues are still present. If the index hint issues still remain in recent versions of ArangoDB, they should be reported so we can fix them.
I think at least in theory an index hint could lead to the optimizer producing less plans, for example if multiple indexes could be used to execute the query, and the optimizer creates different plans for each candidate index.
I say "in theory" here because I don't know from the top of my head if this is what is actually happening. I think at least in some cases the optimizer will pick the index it finds most appropriate, and then goes ahead with it, without creating additional plans for other indexes. But this may be different in different optimizer rules.
It may also be that a different optimizer rules creates the additional plans here. That is not necessarily related to indexes and thus index hints.

from arangodb.

jsteemann avatar jsteemann commented on May 28, 2024 1

@matcho : thanks for reporting back.
I also checked our code and found that there are just 2 optimizer rules which will create additional plans: the one you mentioned and another one that can modify COLLECT statements (specialize-collect). The latter rule doesn't apply here.
I don't think that disabling the interchange-adjacent-enumerations optimizer rule globally is good, although it may help to limit the optimizer runtime of complex queries. There are at least some cases in which the rule can do good and can lead to better execution plans, although it is not guaranteed that it does. However, if it helps your specific case and you don't see any negative side effects of disabling the rule, why not. But this is not really future proof in case new queries are added or existing ones are changed. These may benefit from the optimizer rule running.
Maybe it would be a good idea to introduce additional limits or heuristics for plan creation. I need to look into it in more detail first though.

from arangodb.

jsteemann avatar jsteemann commented on May 28, 2024 1

As we don't want to change the defaults or behavior for the 3.11 stable branch too much, we will go ahead with the above change (#20201) only from 3.12 onwards. That change should reduce the number of query execution plans for this particular case and a few others.
Until then, it is still possible to adjust the maximum number of plans the optimizer generates, either by setting the per-query option (maxPlans) or by adjusting it globally via the startup option --query.optimizer-max-plans. There is also the option to turn off the optimizer rule "interchange-adjacent-enumerations" for certains queries or globally.
Though I won't recommend doing either of these things without testing the effects on query runtimes first.

from arangodb.

dothebart avatar dothebart commented on May 28, 2024

Hi,
the Est column of the table shows the number of documents the optimizer estimates to move from one node to the other, 857829800 at peak in your example.
If you use the later version, query profiling allows you to browse the actual figures of your query.
Resulting performance is a question of keeping these figures in range, and maybe optimize them by working with the query. Some of that may be available through the optimizer, some not.

from arangodb.

jsteemann avatar jsteemann commented on May 28, 2024

@matcho : It's a bit hard to tell from just the execution plan where the query spends most of its time.
Can you also attach the profile output of the query? The profile can be retrieved from the web UI or from the arangosh by running the command db._profileQuery({ query, bindVars }); in the respective database.
One thing that is striking from the explain output already is that the optimizer created 128 different query execution plans. This is the default maximum number of plans. So the optimizer may spend a lot of time trying to work through all the different execution plans, but this is currently a guess, because I don't know how much time was spent for query optimization and how much was spent for query execution. These numbers should also be included in the query profile, so hopefully it can shed at least a bit of light on the issue.

from arangodb.

matcho avatar matcho commented on May 28, 2024

Thank you very much Wilfried and Jan for your quick answers. @jsteemann here's the profile output. It seems that you're right : most of the execution time is consumed by "optimizing plan". Is there a workaround you could think of?

Query String (6748 chars - truncated..., cacheable: false):
 LET project_pn_key = "bdtgf_jfm_sync_v1"
     LET r = []
     
     FOR o IN observations
         FILTER o._key == '1003316854'
         LET taxo = (
             FOR t IN taxa
                 FILTER t.name == o.computed.current_name
                 FILTER project_pn_key IN t.sources
                 FOR sp IN t.species_list
                     FILTER sp.nameAccordingTo == project_pn_key
                     RETURN {
                         users_common_names: (
                             for ve in vernaculars
                                 filter ve.species == t.name
                                 filter ve.lang == 'fr'
                                 collect term = ve.term with count into c
                                 filter c > 1
                                 sort c desc
                                 return {term, c}
                         )
                     }
         )
         LET syno = (LENGTH(taxo)) ? [] : (
             FOR t in taxa
                 FILTER o.computed.cu...

Execution plan:
  Id   NodeType            Calls   Items   Runtime [s]   Comment
   1   SingletonNode           1       1       0.00001   * ROOT
 195   IndexNode               1       1       0.00019     - FOR o IN observations   /* primary index scan */    
 125   CalculationNode         1       1       0.00001       - LET #155 = o.`computed`.`votes`   /* attribute expression */   /* collections used: o : observations */
 231   SubqueryStartNode       1       2       0.00002       - LET partners_votes = ( /* subquery begin */
 126   EnumerateListNode       1       2       0.00002         - FOR v IN #155   /* list iteration */
 233   SubqueryStartNode       1       3       0.00001           - LET sp = ( /* subquery begin */
 206   IndexNode               1       3       0.00048             - FOR t IN taxa   /* hash index scan */    
 194   LimitNode               1       3       0.00001               - LIMIT 0, 1
 132   CalculationNode         1       3       0.00001               - LET #159 = t.`sources`   /* attribute expression */   /* collections used: t : taxa */
 237   SubqueryStartNode       1       4       0.00025               - LET #56 = ( /* subquery begin */
 133   EnumerateListNode       1      40       0.00004                 - FOR s IN #159   /* list iteration */
 208   IndexNode               1       4       0.00021                   - FOR p IN projects   /* hash index scan, projections: `_key` */    
 137   LimitNode               1       4       0.00000                     - LIMIT 0, 1
 193   LimitNode               1       4       0.00000                     - LIMIT 0, 1
 138   CalculationNode         1       4       0.00001                     - LET #163 = p.`_key`   /* attribute expression */   /* collections used: p : projects */
 238   SubqueryEndNode         1       3       0.00001                     - RETURN  #163 ) /* subquery end */
 235   SubqueryStartNode       1       4       0.00023               - LET #61 = ( /* subquery begin */
 207   IndexNode               1     100       0.00111                 - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
 147   CalculationNode         1     100       0.00004                   - LET #169 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
 148   CollectNode             1      13       0.00003                   - COLLECT term = #169 AGGREGATE c = LENGTH()   /* hash */
 149   CalculationNode         1      13       0.00001                   - LET #171 = (c > 1)   /* simple expression */
 150   FilterNode              1      11       0.00001                   - FILTER #171
 151   SortNode                1      11       0.00001                   - SORT c DESC   /* sorting strategy: standard */
 152   CalculationNode         1      11       0.00001                   - LET #173 = { "term" : term, "c" : c }   /* simple expression */
 236   SubqueryEndNode         1       3       0.00001                   - RETURN  #173 ) /* subquery end */
 155   CalculationNode         1       3       0.00001               - LET #175 = { "species" : { "project" : #56[0] }, "users_common_names" : #61 }   /* simple expression */
 234   SubqueryEndNode         1       2       0.00000               - RETURN  #175 ) /* subquery end */
 158   CalculationNode         1       2       0.00001           - LET #177 = { "obs_id" : o.`_key`, "val" : v.`name`, "sp" : sp[0], "plus" : v.`plus` }   /* simple expression */   /* collections used: o : observations */
 232   SubqueryEndNode         1       1       0.00001           - RETURN  #177 ) /* subquery end */
 162   CalculationNode         1       1       0.00000       - LET #179 = o.`computed`.`tags`   /* attribute expression */   /* collections used: o : observations */
 227   SubqueryStartNode       1       2       0.00000       - LET obs_tags = ( /* subquery begin */
 163   EnumerateListNode       1       1       0.00004         - FOR tag IN #179   /* list iteration */
 205   IndexNode               1       1       0.00005           - FOR tc IN tags_catalog   /* persistent index scan, projections: `public_display`, `target`, `values` */    FILTER ((tc.`target` == "observation") && (tc.`public_display` == true))   /* early pruning */   
 172   CalculationNode         1       1       0.00000             - LET #187 = tc.`values`   /* attribute expression */   /* collections used: tc : tags_catalog */
 229   SubqueryStartNode       1       1       0.00000             - LET #72 = ( /* subquery begin */
 173   FilterNode              1       1       0.00000               - FILTER #187
 175   EnumerateListNode       1       1       0.00006               - FOR tv IN #187   /* list iteration */
 176   CalculationNode         1       1       0.00001                 - LET #191 = (tv.`name` == tag.`value`)   /* simple expression */
 177   FilterNode              1       1       0.00000                 - FILTER #191
 190   LimitNode               1       1       0.00000                 - LIMIT 0, 1
 178   CalculationNode         1       1       0.00000                 - LET #193 = { "key" : tv.`name`, "label" : (tv.`description`.`fr` || tv.`description`.`en`) }   /* simple expression */
 230   SubqueryEndNode         1       1       0.00000                 - RETURN  #193 ) /* subquery end */
 181   CalculationNode         1       1       0.00000             - LET #195 = { "value" : #72[0] }   /* simple expression */
 228   SubqueryEndNode         1       1       0.00000             - RETURN  #195 ) /* subquery end */
 217   SubqueryStartNode       1       2       0.00000       - LET obs_votes = ( /* subquery begin */
 200   IndexNode               1       4       0.00018         - FOR v IN votes   /* persistent index scan */    
 225   SubqueryStartNode       1       7       0.00001           - LET #47 = ( /* subquery begin */
 204   IndexNode               2       7       0.00010             - FOR u IN users   /* primary index scan, projections: `email` */    
 191   LimitNode               2       7       0.00001               - LIMIT 0, 1
 118   CalculationNode         2       7       0.00001               - LET #151 = { "email" : u.`email` }   /* simple expression */   /* collections used: u : users */
 226   SubqueryEndNode         1       4       0.00005               - RETURN  #151 ) /* subquery end */
 219   SubqueryStartNode       1       7       0.00001           - LET #43 = ( /* subquery begin */
 201   IndexNode               1       5       0.00041             - FOR t IN taxa   /* hash index scan */    
  82   CalculationNode         1       5       0.00001               - LET #129 = t.`sources`   /* attribute expression */   /* collections used: t : taxa */
 223   SubqueryStartNode       1       6       0.00020               - LET #36 = ( /* subquery begin */
  83   EnumerateListNode       1      42       0.00005                 - FOR s IN #129   /* list iteration */
 203   IndexNode               1      43       0.00161                   - FOR p IN projects   /* hash index scan, projections: `_key`, `_pn_key`, `roles` */    
  85   CalculationNode         1      43       0.00003                     - LET #131 = ((p.`_pn_key` == s) && ((LENGTH(p.`roles`) == 0) || (LENGTH(p.`roles`[* FILTER false RETURN  #32_]) > 0)))   /* simple expression */   /* collections used: p : projects */
  86   FilterNode              1      41       0.00002                     - FILTER #131
  90   CalculationNode         1      41       0.00001                     - LET order = ((p.`_key` == "guyane") ? 0 : 1)   /* simple expression */   /* collections used: p : projects */
  91   SortNode                1       6       0.00002                     - SORT order ASC   /* sorting strategy: constrained heap */
  92   LimitNode               1       6       0.00001                     - LIMIT 0, 1
 192   LimitNode               1       6       0.00001                     - LIMIT 0, 1
  93   CalculationNode         1       6       0.00001                     - LET #135 = p.`_key`   /* attribute expression */   /* collections used: p : projects */
 224   SubqueryEndNode         1       5       0.00001                     - RETURN  #135 ) /* subquery end */
 221   SubqueryStartNode       1       6       0.00021               - LET #41 = ( /* subquery begin */
 202   IndexNode               1     102       0.00087                 - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
 102   CalculationNode         1     102       0.00003                   - LET #141 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
 103   CollectNode             1      15       0.00003                   - COLLECT term = #141 AGGREGATE c = LENGTH()   /* hash */
 104   CalculationNode         1      15       0.00001                   - LET #143 = (c > 1)   /* simple expression */
 105   FilterNode              1      13       0.00001                   - FILTER #143
 106   SortNode                1      13       0.00001                   - SORT c DESC   /* sorting strategy: standard */
 107   CalculationNode         1      13       0.00001                   - LET #145 = { "term" : term, "c" : c }   /* simple expression */
 222   SubqueryEndNode         1       5       0.00001                   - RETURN  #145 ) /* subquery end */
 110   CalculationNode         1       5       0.00001               - LET #147 = { "species" : { "project" : #36[0] }, "users_common_names" : #41 }   /* simple expression */
 220   SubqueryEndNode         1       4       0.00001               - RETURN  #147 ) /* subquery end */
 121   CalculationNode         1       4       0.00002           - LET #153 = { "obs_id" : v.`observation_id`, "img_id" : v.`image_id`, "user_id" : v.`user_id`, "user" : #47[0], "type" : v.`type`, "val" : v.`value`, "sp" : ((v.`type` == "determination") ? #43 : null)[0] }   /* simple expression */   /* collections used: v : votes */
 218   SubqueryEndNode         1       1       0.00001           - RETURN  #153 ) /* subquery end */
 213   SubqueryStartNode       1       2       0.00001       - LET #20 = ( /* subquery begin */
 198   IndexNode               1       1       0.00012         - FOR t IN taxa   /* hash index scan */    FILTER ("bdtgf_jfm_sync_v1" IN t.`sources`)   /* early pruning */   
 215   SubqueryStartNode       1       1       0.00001           - LET #18 = ( /* subquery begin */
 199   IndexNode               1       1       0.00005             - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
  52   CalculationNode         1       1       0.00000               - LET #113 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
  53   CollectNode             1       1       0.00001               - COLLECT term = #113 AGGREGATE c = LENGTH()   /* hash */
  54   CalculationNode         1       1       0.00001               - LET #115 = (c > 1)   /* simple expression */
  55   FilterNode              1       1       0.00000               - FILTER #115
  56   SortNode                1       1       0.00001               - SORT c DESC   /* sorting strategy: standard */
  57   CalculationNode         1       1       0.00000               - LET #117 = { "term" : term, "c" : c }   /* simple expression */
 216   SubqueryEndNode         1       1       0.00000               - RETURN  #117 ) /* subquery end */
  40   CalculationNode         1       1       0.00000           - LET #103 = t.`species_list`   /* attribute expression */   /* collections used: t : taxa */
  60   CalculationNode         1       1       0.00000           - LET #119 = { "users_common_names" : #18 }   /* simple expression */
  41   EnumerateListNode       1       1       0.00005           - FOR sp IN #103   /* list iteration */
  42   CalculationNode         1       1       0.00000             - LET #105 = ((sp.`nameAccordingTo` == "bdtgf_jfm_sync_v1") && (o.`computed`.`current_name` IN sp.`synonyms`))   /* simple expression */   /* collections used: o : observations */
  43   FilterNode              1       1       0.00000             - FILTER #105
 214   SubqueryEndNode         1       1       0.00000             - RETURN  #119 ) /* subquery end */
 209   SubqueryStartNode       1       2       0.00001       - LET taxo = ( /* subquery begin */
 196   IndexNode               1       1       0.00010         - FOR t IN taxa   /* hash index scan */    FILTER ("bdtgf_jfm_sync_v1" IN t.`sources`)   /* early pruning */   
 211   SubqueryStartNode       1       1       0.00000           - LET #8 = ( /* subquery begin */
 197   IndexNode               1       1       0.00006             - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
  23   CalculationNode         1       1       0.00000               - LET #91 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
  24   CollectNode             1       1       0.00001               - COLLECT term = #91 AGGREGATE c = LENGTH()   /* hash */
  25   CalculationNode         1       1       0.00000               - LET #93 = (c > 1)   /* simple expression */
  26   FilterNode              1       1       0.00000               - FILTER #93
  27   SortNode                1       1       0.00001               - SORT c DESC   /* sorting strategy: standard */
  28   CalculationNode         1       1       0.00000               - LET #95 = { "term" : term, "c" : c }   /* simple expression */
 212   SubqueryEndNode         1       1       0.00000               - RETURN  #95 ) /* subquery end */
  13   CalculationNode         1       1       0.00000           - LET #83 = t.`species_list`   /* attribute expression */   /* collections used: t : taxa */
  31   CalculationNode         1       1       0.00000           - LET #97 = { "users_common_names" : #8 }   /* simple expression */
  14   EnumerateListNode       1       1       0.00007           - FOR sp IN #83   /* list iteration */
  15   CalculationNode         1       1       0.00000             - LET #85 = (sp.`nameAccordingTo` == "bdtgf_jfm_sync_v1")   /* simple expression */
  16   FilterNode              1       1       0.00001             - FILTER #85
 210   SubqueryEndNode         1       1       0.00001             - RETURN  #97 ) /* subquery end */
  63   CalculationNode         1       1       0.00000       - LET syno = (LENGTH(taxo) ? [ ] : #20)   /* simple expression */
  64   CalculationNode         1       1       0.00001       - LET taxodata = (LENGTH(taxo) ? taxo[0] : syno[0])   /* simple expression */
 184   CalculationNode         1       1       0.00001       - LET #197 = { "taxo" : taxodata, "votes" : obs_votes, "partners_votes" : partners_votes, "obs_tags" : obs_tags }   /* simple expression */
 185   ReturnNode              1       1       0.00001       - RETURN #197

Indexes used:
  By   Name                      Type         Collection     Unique   Sparse   Selectivity   Fields                                                       Ranges
 195   primary                   primary      observations   true     false       100.00 %   [ `_key` ]                                                   (o.`_key` == "1003316854")
 206   idx_10219655942           hash         taxa           true     false       100.00 %   [ `name` ]                                                   (t.`name` == v.`name`)
 208   idx_49132                 hash         projects       false    false        97.06 %   [ `_pn_key` ]                                                (p.`_pn_key` == s)
 207   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))
 205   idx_1722945121587036160   persistent   tags_catalog   true     false       100.00 %   [ `name` ]                                                   (tc.`name` == tag.`name`)
 200   idx_1723108440400723968   persistent   votes          true     false       100.00 %   [ `observation_id`, `user_id`, `image_id`, `type`, `tag` ]   (v.`observation_id` == "1003316854")
 204   primary                   primary      users          true     false       100.00 %   [ `_key` ]                                                   (u.`_key` == v.`user_id`)
 201   idx_10219655942           hash         taxa           true     false       100.00 %   [ `name` ]                                                   (t.`name` == v.`value`)
 203   idx_49132                 hash         projects       false    false        97.06 %   [ `_pn_key` ]                                                (p.`_pn_key` == s)
 203   idx_49132                 hash         projects       false    false        97.06 %   [ `_pn_key` ]                                                (p.`_pn_key` == s)
 202   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))
 198   idx_49216                 hash         taxa           false    false       100.00 %   [ `synonyms[*]` ]                                            (o.`computed`.`current_name` IN t.`synonyms`)
 199   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))
 196   idx_10219655942           hash         taxa           true     false       100.00 %   [ `name` ]                                                   (t.`name` == o.`computed`.`current_name`)
 197   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   remove-redundant-calculations
  4   remove-unnecessary-calculations
  5   remove-redundant-sorts
  6   optimize-subqueries
  7   move-calculations-up-2
  8   move-filters-up-2
  9   propagate-constant-attributes
 10   use-indexes
 11   remove-filter-covered-by-index
 12   remove-unnecessary-calculations-2
 13   move-calculations-down
 14   fuse-filters
 15   sort-limit
 16   reduce-extraction-to-projection
 17   move-filters-into-enumerate
 18   splice-subqueries

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Peak Mem [b]   Exec Time [s]
           0            0           0          243          7       19365888         0.51132

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00035
 optimizing ast             0.00005
 loading collections        0.00002
 instantiating plan         0.00045
 optimizing plan            0.50284
 executing                  0.00760
 finalizing                 0.00017

from arangodb.

dothebart avatar dothebart commented on May 28, 2024

you can use i.e. index hints if you didn't already?

from arangodb.

jsteemann avatar jsteemann commented on May 28, 2024

@matcho : the number of execution plans can be limited globally by adjusting the server startup option --query.optimizer-max-plans. That startup option defaults to 128, which means that the optimizer is allowed to generated up to 128 query execution plans for each AQL query.
Adjusting the value of this option will affect all AQL queries.

Then there is also the option to override the maximum number of plans per query.
This can be done by passing the maxNumberOfPlans attribute when executing the query. In the arangosh, it would look like this (https://docs.arangodb.com/3.11/aql/how-to-invoke-aql/with-arangosh/#maxnumberofplans):

maxNumberOfPlans = 64; /* whatever value */
db._query(query, bindVars, { maxNumberOfPlans });

The option can also be passed via the HTTP API (https://docs.arangodb.com/3.11/develop/http-api/queries/aql-queries/#create-a-cursor_body_options_maxNumberOfPlans).

Potentially this option is also supported by client drivers, but I don't know which one is in use.

While adjusting the maximum number of plans option will at least lead to less plans being generated, the question remains why the optimizer creates that many plans in the first place.
The plan explosion must be caused by one or many of the applied optimizer rules

  1   move-calculations-up
  2   move-filters-up
  3   remove-redundant-calculations
  4   remove-unnecessary-calculations
  5   remove-redundant-sorts
  6   optimize-subqueries
  7   move-calculations-up-2
  8   move-filters-up-2
  9   propagate-constant-attributes
 10   use-indexes
 11   remove-filter-covered-by-index
 12   remove-unnecessary-calculations-2
 13   move-calculations-down
 14   fuse-filters
 15   sort-limit
 16   reduce-extraction-to-projection
 17   move-filters-into-enumerate
 18   splice-subqueries

Would be interesting to know which one it is, because maybe it can be tuned to generate less plans.

If you want to investigate, you can run the query with specific optimizer rules turned off, just to check if the number of plans goes down then.
With the arangosh, the way to do this would be:

ruleName = "fuse-filters"; /* whatever rule to exclude */
db._explain(query, bindVars, { optimizer: { rules: ["-" + ruleName] } });

Running that will explain the query with the specific rule turned off. It turning off any of the rules leads to substantially fewer plans being generated, then this is a good indicator. Note that turned off some of the rules will still be unwanted, e.g. "use-indexes". While turning off that rule may lead to fewer execution plans, it may also mean that indexes will be used in fewer cases, which can have detrimental effcets on the query runtime later.
So I suggest to turn off optimizer rules only for finding out which rules lead to the plan explosion.

from arangodb.

matcho avatar matcho commented on May 28, 2024

@dothebart do you mean that unsing indexHints could help the optimizer generate less plans? I tried to add an indexHint to every collection in the query, but the execution time did not change. Also tried adding forceIndexHint and it changes nothing. By the way, the indexHint was refused for 2 collections in the query (projects line 62 and tags_catalog line 139), although when running explain it clearly shows that those 2 indexes will be used anyway… strange.

from arangodb.

matcho avatar matcho commented on May 28, 2024

@jsteemann

First, limiting the number of plans indeed works:

db._query(q1, {}, { maxNumberOfPlans: 64 }) → 297ms
db._query(q1, {}, { maxNumberOfPlans: 32 }) → 123ms
db._query(q1, {}, { maxNumberOfPlans: 16 }) → 88ms
db._query(q1, {}, { maxNumberOfPlans: 1 }) → 20ms

Query result is valid in all cases above, even with 1 plan max.

Second, after investigating as you suggested, it appears that the rule that leads to a huge number of plans is interchange-adjacent-enumerations (found it in "Optimization rules with highest execution times" section of explain output). After disabling it, number of plans drops to 16 and query execution time to 70ms (explain below).

From the documentation :

interchange-adjacent-enumerations: Try out permutations of FOR statements in queries that contain multiple loops, which may enable further optimizations by other rules.

Makes sense that a higher number of loops leads to an exploding combination of permutations.

This is great 🥳 thank you very much !
Do you think that disabling interchange-adjacent-enumerations server-wide could have a negative impact on certain queries ?

PS: indexHint issue is still present in 3.11.4, will try to report it properly soon.

127.0.0.1:8529@plantnetv2> db._explain(q1, {}, { optimizer: { rules: [ "-interchange-adjacent-enumerations" ] } });
Query String (2797 chars, cacheable: true):
 LET project_pn_key = 'bdtgf_jfm_sync_v1' LET r = [] FOR o IN observations FILTER o._key == 
 '1003316854' LET taxo = ( FOR t IN taxa FILTER t.name == o.computed.current_name FILTER 
 project_pn_key IN t.sources FOR sp IN t.species_list FILTER sp.nameAccordingTo == project_pn_key 
 RETURN { users_common_names: ( for ve in vernaculars filter ve.species == t.name filter ve.lang == 
 'fr' collect term = ve.term with count into c filter c > 1 sort c desc return {term, c} ) } ) LET 
 syno = (LENGTH(taxo)) ? [] : ( FOR t in taxa FILTER o.computed.current_name IN t.synonyms FILTER 
 project_pn_key IN t.sources FOR sp IN t.species_list FILTER sp.nameAccordingTo == project_pn_key 
 FILTER o.computed.current_name IN sp.synonyms RETURN { users_common_names: ( for ve in vernaculars 
 filter ve.species == t.name filter ve.lang == 'fr' collect term = ve.term with count into c filter c 
 > 1 sort c desc return {term, c} ) } ) LET taxodata = (LENGTH(taxo) ? taxo[0] : syno[0]) LET 
 obs_votes = ( FOR v IN votes FILTER v.observ...

Execution plan:
  Id   NodeType                 Est.   Comment
   1   SingletonNode               1   * ROOT
 195   IndexNode                   1     - FOR o IN observations   /* primary index scan */    
 125   CalculationNode             1       - LET #155 = o.`computed`.`votes`   /* attribute expression */   /* collections used: o : observations */
 231   SubqueryStartNode           1       - LET partners_votes = ( /* subquery begin */
 126   EnumerateListNode         100         - FOR v IN #155   /* list iteration */
 233   SubqueryStartNode         100           - LET sp = ( /* subquery begin */
 206   IndexNode                 100             - FOR t IN taxa   /* hash index scan */    
 191   LimitNode                   1               - LIMIT 0, 1
 132   CalculationNode             1               - LET #159 = t.`sources`   /* attribute expression */   /* collections used: t : taxa */
 237   SubqueryStartNode           1               - LET #56 = ( /* subquery begin */
 133   EnumerateListNode         100                 - FOR s IN #159   /* list iteration */
 208   IndexNode                 100                   - FOR p IN projects   /* hash index scan, projections: `_key` */    
 137   LimitNode                   1                     - LIMIT 0, 1
 190   LimitNode                   1                     - LIMIT 0, 1
 138   CalculationNode             1                     - LET #163 = p.`_key`   /* attribute expression */   /* collections used: p : projects */
 238   SubqueryEndNode             1                     - RETURN  #163 ) /* subquery end */
 235   SubqueryStartNode           1               - LET #61 = ( /* subquery begin */
 207   IndexNode                   2                 - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
 147   CalculationNode             2                   - LET #169 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
 148   CollectNode                 2                   - COLLECT term = #169 AGGREGATE c = LENGTH()   /* hash */
 149   CalculationNode             2                   - LET #171 = (c > 1)   /* simple expression */
 150   FilterNode                  2                   - FILTER #171
 151   SortNode                    2                   - SORT c DESC   /* sorting strategy: standard */
 152   CalculationNode             2                   - LET #173 = { "term" : term, "c" : c }   /* simple expression */
 236   SubqueryEndNode             1                   - RETURN  #173 ) /* subquery end */
 155   CalculationNode             1               - LET #175 = { "species" : { "project" : #56[0] }, "users_common_names" : #61 }   /* simple expression */
 234   SubqueryEndNode           100               - RETURN  #175 ) /* subquery end */
 158   CalculationNode           100           - LET #177 = { "obs_id" : o.`_key`, "val" : v.`name`, "sp" : sp[0], "plus" : v.`plus` }   /* simple expression */   /* collections used: o : observations */
 232   SubqueryEndNode             1           - RETURN  #177 ) /* subquery end */
 162   CalculationNode             1       - LET #179 = o.`computed`.`tags`   /* attribute expression */   /* collections used: o : observations */
 227   SubqueryStartNode           1       - LET obs_tags = ( /* subquery begin */
 163   EnumerateListNode         100         - FOR tag IN #179   /* list iteration */
 205   IndexNode                 100           - FOR tc IN tags_catalog   /* persistent index scan, projections: `public_display`, `target`, `values` */    FILTER ((tc.`target` == "observation") && (tc.`public_display` == true))   /* early pruning */   
 172   CalculationNode           100             - LET #187 = tc.`values`   /* attribute expression */   /* collections used: tc : tags_catalog */
 229   SubqueryStartNode         100             - LET #72 = ( /* subquery begin */
 173   FilterNode                100               - FILTER #187
 175   EnumerateListNode       10000               - FOR tv IN #187   /* list iteration */
 176   CalculationNode         10000                 - LET #191 = (tv.`name` == tag.`value`)   /* simple expression */
 177   FilterNode              10000                 - FILTER #191
 192   LimitNode                   1                 - LIMIT 0, 1
 178   CalculationNode             1                 - LET #193 = { "key" : tv.`name`, "label" : (tv.`description`.`fr` || tv.`description`.`en`) }   /* simple expression */
 230   SubqueryEndNode           100                 - RETURN  #193 ) /* subquery end */
 181   CalculationNode           100             - LET #195 = { "value" : #72[0] }   /* simple expression */
 228   SubqueryEndNode             1             - RETURN  #195 ) /* subquery end */
 217   SubqueryStartNode           1       - LET obs_votes = ( /* subquery begin */
 200   IndexNode             4291240         - FOR v IN votes   /* persistent index scan */    
 225   SubqueryStartNode     4291240           - LET #47 = ( /* subquery begin */
 204   IndexNode             4291240             - FOR u IN users   /* primary index scan, projections: `email` */    
 193   LimitNode                   1               - LIMIT 0, 1
 118   CalculationNode             1               - LET #151 = { "email" : u.`email` }   /* simple expression */   /* collections used: u : users */
 226   SubqueryEndNode       4291240               - RETURN  #151 ) /* subquery end */
 219   SubqueryStartNode     4291240           - LET #43 = ( /* subquery begin */
 201   IndexNode             4291240             - FOR t IN taxa   /* hash index scan */    
  82   CalculationNode       4291240               - LET #129 = t.`sources`   /* attribute expression */   /* collections used: t : taxa */
 223   SubqueryStartNode     4291240               - LET #36 = ( /* subquery begin */
  83   EnumerateListNode   429124000                 - FOR s IN #129   /* list iteration */
 203   IndexNode           858248000                   - FOR p IN projects   /* hash index scan, projections: `_key`, `_pn_key`, `roles` */    
  85   CalculationNode     858248000                     - LET #131 = ((p.`_pn_key` == s) && ((LENGTH(p.`roles`) == 0) || (LENGTH(p.`roles`[* FILTER false RETURN  #32_]) > 0)))   /* simple expression */   /* collections used: p : projects */
  86   FilterNode          858248000                     - FILTER #131
  90   CalculationNode     858248000                     - LET order = ((p.`_key` == "guyane") ? 0 : 1)   /* simple expression */   /* collections used: p : projects */
  91   SortNode            858248000                     - SORT order ASC   /* sorting strategy: constrained heap */
  92   LimitNode                   1                     - LIMIT 0, 1
 194   LimitNode                   1                     - LIMIT 0, 1
  93   CalculationNode             1                     - LET #135 = p.`_key`   /* attribute expression */   /* collections used: p : projects */
 224   SubqueryEndNode       4291240                     - RETURN  #135 ) /* subquery end */
 221   SubqueryStartNode     4291240               - LET #41 = ( /* subquery begin */
 202   IndexNode             8582480                 - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
 102   CalculationNode       8582480                   - LET #141 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
 103   CollectNode           6865984                   - COLLECT term = #141 AGGREGATE c = LENGTH()   /* hash */
 104   CalculationNode       6865984                   - LET #143 = (c > 1)   /* simple expression */
 105   FilterNode            6865984                   - FILTER #143
 106   SortNode              6865984                   - SORT c DESC   /* sorting strategy: standard */
 107   CalculationNode       6865984                   - LET #145 = { "term" : term, "c" : c }   /* simple expression */
 222   SubqueryEndNode       4291240                   - RETURN  #145 ) /* subquery end */
 110   CalculationNode       4291240               - LET #147 = { "species" : { "project" : #36[0] }, "users_common_names" : #41 }   /* simple expression */
 220   SubqueryEndNode       4291240               - RETURN  #147 ) /* subquery end */
 121   CalculationNode       4291240           - LET #153 = { "obs_id" : v.`observation_id`, "img_id" : v.`image_id`, "user_id" : v.`user_id`, "user" : #47[0], "type" : v.`type`, "val" : v.`value`, "sp" : ((v.`type` == "determination") ? #43 : null)[0] }   /* simple expression */   /* collections used: v : votes */
 218   SubqueryEndNode             1           - RETURN  #153 ) /* subquery end */
 213   SubqueryStartNode           1       - LET #20 = ( /* subquery begin */
 198   IndexNode                   1         - FOR t IN taxa   /* hash index scan */    FILTER ("bdtgf_jfm_sync_v1" IN t.`sources`)   /* early pruning */   
 215   SubqueryStartNode           1           - LET #18 = ( /* subquery begin */
 199   IndexNode                   2             - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
  52   CalculationNode             2               - LET #113 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
  53   CollectNode                 2               - COLLECT term = #113 AGGREGATE c = LENGTH()   /* hash */
  54   CalculationNode             2               - LET #115 = (c > 1)   /* simple expression */
  55   FilterNode                  2               - FILTER #115
  56   SortNode                    2               - SORT c DESC   /* sorting strategy: standard */
  57   CalculationNode             2               - LET #117 = { "term" : term, "c" : c }   /* simple expression */
 216   SubqueryEndNode             1               - RETURN  #117 ) /* subquery end */
  40   CalculationNode             1           - LET #103 = t.`species_list`   /* attribute expression */   /* collections used: t : taxa */
  60   CalculationNode             1           - LET #119 = { "users_common_names" : #18 }   /* simple expression */
  41   EnumerateListNode         100           - FOR sp IN #103   /* list iteration */
  42   CalculationNode           100             - LET #105 = ((sp.`nameAccordingTo` == "bdtgf_jfm_sync_v1") && (o.`computed`.`current_name` IN sp.`synonyms`))   /* simple expression */   /* collections used: o : observations */
  43   FilterNode                100             - FILTER #105
 214   SubqueryEndNode             1             - RETURN  #119 ) /* subquery end */
 209   SubqueryStartNode           1       - LET taxo = ( /* subquery begin */
 196   IndexNode                   1         - FOR t IN taxa   /* hash index scan */    FILTER ("bdtgf_jfm_sync_v1" IN t.`sources`)   /* early pruning */   
 211   SubqueryStartNode           1           - LET #8 = ( /* subquery begin */
 197   IndexNode                   2             - FOR ve IN vernaculars   /* hash index scan, projections: `term` */    
  23   CalculationNode             2               - LET #91 = ve.`term`   /* attribute expression */   /* collections used: ve : vernaculars */
  24   CollectNode                 2               - COLLECT term = #91 AGGREGATE c = LENGTH()   /* hash */
  25   CalculationNode             2               - LET #93 = (c > 1)   /* simple expression */
  26   FilterNode                  2               - FILTER #93
  27   SortNode                    2               - SORT c DESC   /* sorting strategy: standard */
  28   CalculationNode             2               - LET #95 = { "term" : term, "c" : c }   /* simple expression */
 212   SubqueryEndNode             1               - RETURN  #95 ) /* subquery end */
  13   CalculationNode             1           - LET #83 = t.`species_list`   /* attribute expression */   /* collections used: t : taxa */
  31   CalculationNode             1           - LET #97 = { "users_common_names" : #8 }   /* simple expression */
  14   EnumerateListNode         100           - FOR sp IN #83   /* list iteration */
  15   CalculationNode           100             - LET #85 = (sp.`nameAccordingTo` == "bdtgf_jfm_sync_v1")   /* simple expression */
  16   FilterNode                100             - FILTER #85
 210   SubqueryEndNode             1             - RETURN  #97 ) /* subquery end */
  63   CalculationNode             1       - LET syno = (LENGTH(taxo) ? [ ] : #20)   /* simple expression */
  64   CalculationNode             1       - LET taxodata = (LENGTH(taxo) ? taxo[0] : syno[0])   /* simple expression */
 184   CalculationNode             1       - LET #197 = { "taxo" : taxodata, "votes" : obs_votes, "partners_votes" : partners_votes, "obs_tags" : obs_tags }   /* simple expression */
 185   ReturnNode                  1       - RETURN #197

Indexes used:
  By   Name                      Type         Collection     Unique   Sparse   Selectivity   Fields                                                       Ranges
 195   primary                   primary      observations   true     false       100.00 %   [ `_key` ]                                                   (o.`_key` == "1003316854")
 206   idx_10219655942           hash         taxa           true     false       100.00 %   [ `name` ]                                                   (t.`name` == v.`name`)
 208   idx_49132                 hash         projects       false    false        97.06 %   [ `_pn_key` ]                                                (p.`_pn_key` == s)
 207   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))
 205   idx_1722945121587036160   persistent   tags_catalog   true     false       100.00 %   [ `name` ]                                                   (tc.`name` == tag.`name`)
 200   idx_1723108440400723968   persistent   votes          true     false       100.00 %   [ `observation_id`, `user_id`, `image_id`, `type`, `tag` ]   (v.`observation_id` == "1003316854")
 204   primary                   primary      users          true     false       100.00 %   [ `_key` ]                                                   (u.`_key` == v.`user_id`)
 201   idx_10219655942           hash         taxa           true     false       100.00 %   [ `name` ]                                                   (t.`name` == v.`value`)
 203   idx_49132                 hash         projects       false    false        97.06 %   [ `_pn_key` ]                                                (p.`_pn_key` == s)
 203   idx_49132                 hash         projects       false    false        97.06 %   [ `_pn_key` ]                                                (p.`_pn_key` == s)
 202   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))
 198   idx_49216                 hash         taxa           false    false       100.00 %   [ `synonyms[*]` ]                                            (o.`computed`.`current_name` IN t.`synonyms`)
 199   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))
 196   idx_10219655942           hash         taxa           true     false       100.00 %   [ `name` ]                                                   (t.`name` == o.`computed`.`current_name`)
 197   idx_10856933578           hash         vernaculars    false    false        46.39 %   [ `species`, `lang` ]                                        ((ve.`species` == t.`name`) && (ve.`lang` == "fr"))

Functions used:
 Name     Deterministic   Cacheable   Uses V8
 LENGTH   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   remove-redundant-calculations
  4   remove-unnecessary-calculations
  5   remove-redundant-sorts
  6   optimize-subqueries
  7   move-calculations-up-2
  8   move-filters-up-2
  9   propagate-constant-attributes
 10   use-indexes
 11   remove-filter-covered-by-index
 12   remove-unnecessary-calculations-2
 13   move-calculations-down
 14   fuse-filters
 15   sort-limit
 16   reduce-extraction-to-projection
 17   move-filters-into-enumerate
 18   splice-subqueries

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 specialize-collect                               0.00893
 use-indexes                                      0.00507
 splice-subqueries                                0.00465
 move-calculations-up-2                           0.00458
 move-calculations-down                           0.00396

539 rule(s) executed, 16 plan(s) created

from arangodb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.