If you need to analyze the root cause of a query's failure to match some document, you

Add a root failure cause to Explanation [LUCENE-8019] about stargazers-migration-test HOT 16 CLOSED

mikemccand commented on June 3, 2024

Add a root failure cause to Explanation [LUCENE-8019]

from stargazers-migration-test.

Comments (16)

mikemccand commented on June 3, 2024

I don't like the idea of boolean query's logic being shoved into the explanation class. This should not be in explanation, it should be generic.

I don't think explanation class should be complicated by matching/non-matching stuff. If we want to improve the booleanweight impl (such as the strings it puts in there), that is one thing, but its not the responsibility of explanation to do this.

explain() is supposed to be about scoring, not matching:

Returns an Explanation that describes how doc scored against query.

We should keep it focused on that, without adding a lot of complexity related to "matching" which is not its job. it already has enough challenges trying to explain scores...

[Legacy Jira: Robert Muir (@rmuir) on Oct 26 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

The challenge with doing this by modifying booleanweight is that it really needs to be threaded all up and down the tree to be useful. To go that route, I think we'd have to add methods to weight?

[Legacy Jira: Michael Sokolov (@msokolov) on Oct 26 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

Weight already has such a method: Scorer.getChildren

[Legacy Jira: Robert Muir (@rmuir) on Oct 26 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

Ah, OK fair enough. Perhaps it's better to to implement this as a separate Weight visitor. I'll look at that.

[Legacy Jira: Michael Sokolov (@msokolov) on Oct 26 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

Yes perhaps we could instead add sugar to IndexSearcher to "debug" a query via getChildren? If we are going to keep the getChildren api, why not at least allow it to help us :). Scorer visitors already record the relationship in the graph and similar to explain() you can simply advance() a scorer to a doc and then visit the scorer tree to maybe assist in debugging?

Because you can always climb up a Scorer to its Weight via a getter method, it may be possible to incorporate score's explanation in the same output, but I think they may be separate use-cases.

In general I think explain should work for end-users and debugging at that low level is more of a developer task.

[Legacy Jira: Robert Muir (@rmuir) on Oct 26 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

By the way there exists a very simple example of such stuff in the unit tests: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/test/org/apache/lucene/search/TestBooleanQueryVisitSubscorers.java#L262

I don't think we need to bring Collector into it as such a debug() would take docID and only work on one doc, so that would be overkill. but you can see traversal and the kind of assertions in unit tests such as:

assertEquals(
          "ConjunctionScorer\n" +
          "    MUST ConstantScoreScorer\n" +
          "    MUST MinShouldMatchSumScorer\n" +
          "            SHOULD TermScorer body:crawler\n" +
          "            SHOULD TermScorer body:web\n" +
          "            SHOULD TermScorer body:nutch",
              summary);

Output can use child.docID() == doc to determine if something matched or not, and maybe it can include stuff such as freq() and score() which may be enough to help debug issues as to why something matched or didnt match. You can always climb up Scorer -> Weight -> Query to get additional metadata/toString for improved output or to re-arrange things differently. I think it could be generally useful if we invested a little time.

Otherwise its frustrating that we bear the cost of maintaining a scorer tree api but don't use it for anything.

[Legacy Jira: Robert Muir (@rmuir) on Oct 26 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

I have a WIP impl of this that walks the Scorer tree, advancing and checking docID(). The code is definitely cleaner and less intrusive than the previous patch. However, I ran into an issue since the Scorer tree doesn't really mirror the original Query tree. A simple example is a query like

\*:\* -foo:1

This gets scored using a ReqExclScorer which has only a single child corresponding to the MatchAllQuery and can't really be used to discover if a mis-match is due to matching the foo:1 term query.

So I think I need to walk the tree of Weights, as explain() does.

[Legacy Jira: Michael Sokolov (@msokolov) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

OK, I think thats expected: for such a use case its good to see the low-level scorers? I think it also relates to my comment about "climbing" to organize the output better, if you think of it as a network you can "climb" via scorer.getWeight() to have the weight nodes? And from there you can climb up to weight.getQuery() too if needed. The challenge is probably just organizing the output to be nice.

[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

I pursued that line of attack a bit further, got something that seemed to work, and then ran into a wall because I have to use a bunch of casts in order to avoid making changes to Weight, and sometimes the casts are wrong.

java.lang.ClassCastException: org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight cannot be cast to org.apache.lucene.search.BooleanWeight

so I think the only way forward here would be to add a new method to Weight, akin to explain(). In that case, the original idea seems like the least change.

[Legacy Jira: Michael Sokolov (@msokolov) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

Sorry I missed your comment there, Robert. Maybe this would be clearer if I posted the code, but the issue with walking the Scorer tree seemed like a blocker to me. ReqExclScorer.getChildren() is incomplete – it does not include the exclScorer.

[Legacy Jira: Michael Sokolov (@msokolov) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

Yes, maybe post the code: i don't think we should give up on doing the right thing just because of the impl of ReqExclScorer, these things could be changed.

I don't think we need to cast Weights to anything, it is probably enough to just organize scorers with the Query that generated them (just use .toString and .equals for de-duping and output, no casts needed).

It doesn't make sense to me to complicate Explanation with matching stuff, and definitely not to do that and at the same time keep Scorer.getChildren api (which is intended to do this kind of stuff). I'd rather fix the latter api

[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

I posted a patch file that has two failed attempts: one that walks over the Scorer tree, and the other that walks over the Weight tree. Beyond the problem I mentioned before, I wonder if I'm missing something about how to use Scorer.advance() (this is somewhat new to me), since it seems that when I call advance() on the BooleanWeight's scorer, say for a conjunction, it advances all the child scorers beyond the doc of interest, so it's impossible to tell if any of them matched.

[Legacy Jira: Michael Sokolov (@msokolov) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

Maybe we should add a DebugBooleanQuery, somewhere under test-framework, that is slow and stupid inefficient on how it does matching, but makes it easier to see why a query didn't match, get child scorers that are "on" the current doc if the doc matched, etc.?

[Legacy Jira: Michael McCandless (@mikemccand) on Nov 02 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

slow and stupid inefficient on how it does matching

So – are you suggesting it would score every sub-query independently?

[Legacy Jira: Michael Sokolov (@msokolov) on Nov 02 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

+1 for the DebugBooleanQuery idea. When I looked at the patches here it seems thats really what you want to do.

It can be separately useful for tests purposes.

[Legacy Jira: Robert Muir (@rmuir) on Nov 02 2017]

from stargazers-migration-test.

mikemccand commented on June 3, 2024

So – are you suggesting it would score every sub-query independently?

Yeah, I think so? This should ensure that if a child scorer is not on the current hit, that means it did not match the current hit. But hopefully it can compute the same scores that BQ's more advanced scorers do? Or maybe because of order-of-operation differences this is too hard or something?

[Legacy Jira: Michael McCandless (@mikemccand) on Nov 02 2017]

from stargazers-migration-test.

Add a root failure cause to Explanation [LUCENE-8019] about stargazers-migration-test HOT 16 CLOSED

Comments (16)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent