
Comments (8)

mikemccand commented on June 12, 2024

I opened this as a separate issue because we may want to backport it. Relevance pretty much falls apart in this case today with these sims (except Classic, which does not use a pivot and is therefore unchanged). Here are my results on the Bengali collection (I had it handy from working with the analyzer):

| Sim | Baseline MAP | DOCS_ONLY MAP (master) | DOCS_ONLY MAP (patch) |
|---|---|---|---|
| Classic | 0.2266 | 0.1231 | 0.1231 |
| BM25 | 0.2947 | 0.0531 | 0.1390 |
| I(ne)B2 | 0.3074 | 0.0534 | 0.1485 |
| I(ne)B1 | 0.2848 | 0.0529 | 0.1248 |
| PL2 | 0.2856 | 0.0148 | 0.1377 |
| LM(dirichlet) | 0.2982 | 0.0035 | 0.1803 |
| DFI(chisquare) | 0.2887 | 0.0035 | 0.1703 |

I can dig up some other datasets to confirm.

edit: updated table with 2 additional sims.

[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]

from stargazers-migration-test.

mikemccand commented on June 12, 2024

Patch: it falls back to the bogus value only if sumDocFreq is unavailable, which doesn't happen with any codec since Lucene 4 or so.

Note that for SimilarityBase it doesn't just correct avgdl but also numberOfFieldTokens, which was previously (bogusly) set to docFreq, as if the term being scored were the only one in the collection! I will update tests across more sims, such as LM and DFI, that are sensitive to this to see any improvement.
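
To make the fallback order concrete, here is a minimal sketch of the idea described above, not the actual patch. The class name, method names, and the `-1` "unavailable" marker are assumptions made for illustration; only the statistic names (sumTotalTermFreq, sumDocFreq, docFreq) come from the discussion.

```java
// Hypothetical illustration only; not Lucene's code.
final class DocsOnlyStatsSketch {
    // Assumed marker for "this statistic is not available".
    static final long UNAVAILABLE = -1;

    // Picks the field token count used for scoring:
    // 1. the real sumTotalTermFreq when frequencies were indexed,
    // 2. sumDocFreq for fields indexed without frequencies,
    // 3. docFreq only as the last-resort (bogus) value.
    static long numberOfFieldTokens(long sumTotalTermFreq, long sumDocFreq, long docFreq) {
        if (sumTotalTermFreq != UNAVAILABLE) {
            return sumTotalTermFreq;
        }
        if (sumDocFreq != UNAVAILABLE) {
            return sumDocFreq;
        }
        return docFreq;
    }

    // Average field length derived from the chosen token count.
    static double avgFieldLength(long numberOfFieldTokens, long docCount) {
        return (double) numberOfFieldTokens / docCount;
    }

    public static void main(String[] args) {
        // Made-up numbers: 1,000 documents, 5,000 postings in the field.
        long tokens = numberOfFieldTokens(UNAVAILABLE, 5_000, 10);
        System.out.println("numberOfFieldTokens = " + tokens);
        System.out.println("avgdl = " + avgFieldLength(tokens, 1_000));
    }
}
```

With frequencies omitted, each posting counts as one token, so sumDocFreq serves as a stand-in for the field's total length. That gives length-normalizing similarities a usable average instead of a value that depends on whichever term happens to be scored.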

[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]

mikemccand commented on June 12, 2024

I added DFI and LM as well; they are the worst case. They just fall apart completely today for omitTFAP, because the bogus sumTotalTF=docFreq makes them lose the ability to discriminate term importance too.
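
A small sketch of why the bogus statistic removes term discrimination for a language-model similarity. It assumes an add-one-smoothed collection probability of the form (totalTermFreq + 1) / (numberOfFieldTokens + 1) and assumes that a term's totalTermFreq equals its docFreq when frequencies are omitted. The class and method names and all numbers are hypothetical.

```java
// Hypothetical illustration only; not Lucene's code.
final class CollectionModelSketch {
    // Assumed add-one-smoothed probability of a term in the collection.
    static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }

    public static void main(String[] args) {
        long rareDocFreq = 10;        // made-up rare term
        long commonDocFreq = 5_000;   // made-up common term
        long sumDocFreq = 1_000_000;  // made-up field-wide posting count

        // Bogus case: numberOfFieldTokens is the term's own docFreq,
        // so numerator and denominator are identical for every term.
        System.out.println(collectionProbability(rareDocFreq, rareDocFreq));
        System.out.println(collectionProbability(commonDocFreq, commonDocFreq));

        // Corrected case: numberOfFieldTokens is sumDocFreq,
        // so rare and common terms get different probabilities.
        System.out.println(collectionProbability(rareDocFreq, sumDocFreq));
        System.out.println(collectionProbability(commonDocFreq, sumDocFreq));
    }
}
```

Under these assumptions every term looks equally (and maximally) common in the bogus case, which is consistent with the near-zero MAP reported for LM and DFI on master.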

[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]

mikemccand commented on June 12, 2024

+1

[Legacy Jira: Adrien Grand (@jpountz) on Oct 31 2017]

mikemccand commented on June 12, 2024

Commit 7495a9d75bb2efde2f76d68b376560ab86693cd9 in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7495a9d

LUCENE-8025: Use totalTermFreq=sumDocFreq when scoring DOCS_ONLY fields

[Legacy Jira: ASF subversion and git services on Nov 01 2017]

mikemccand commented on June 12, 2024

Commit 4e1ef13a1274a3beb17b2696d08318a241e4d86e in lucene-solr's branch refs/heads/branch_7x from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4e1ef13

LUCENE-8025: Use totalTermFreq=sumDocFreq when scoring DOCS_ONLY fields

[Legacy Jira: ASF subversion and git services on Nov 01 2017]

mikemccand commented on June 12, 2024

Commit 2658ff62c84e2cc8405a6b6ef988060be430f61a in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2658ff6

LUCENE-8025: fix changes entry, its sumTotalTermFreq

[Legacy Jira: ASF subversion and git services on Nov 01 2017]

mikemccand commented on June 12, 2024

Commit 7b7bdf39927ffd9a2654f002bf066cdd817315da in lucene-solr's branch refs/heads/branch_7x from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7b7bdf3

LUCENE-8025: fix changes entry, its sumTotalTermFreq

[Legacy Jira: ASF subversion and git services on Nov 01 2017]
