Comments (6)

mikemccand commented on June 12, 2024

But this seems tricky, today you can downgrade to DOCS_ONLY on the fly,

Maybe we should stop allowing this? I.e., throw an exception if a field's index options are downgraded.
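
A minimal sketch of what such a check could look like. The `IndexOptionsGuard` class, its `Options` enum, and the `register` method are hypothetical names made up for illustration and are not Lucene's actual API or the committed change; the sketch only assumes that index options can be ordered from least to most information.

```java
import java.util.HashMap;
import java.util.Map;

public class IndexOptionsGuard {
    // Hypothetical stand-in for an index-options enum, ordered from least to most information.
    public enum Options { NONE, DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

    // Options first seen for each field name.
    private final Map<String, Options> seen = new HashMap<>();

    /**
     * Records the options of a field the first time it is seen and throws
     * if a later request asks for fewer options (a downgrade).
     * How to treat upgrades is left open; this sketch simply keeps the first value.
     */
    public void register(String field, Options requested) {
        Options previous = seen.putIfAbsent(field, requested);
        if (previous != null && requested.compareTo(previous) < 0) {
            throw new IllegalArgumentException(
                "cannot downgrade index options for field \"" + field
                    + "\" from " + previous + " to " + requested);
        }
    }

    public static void main(String[] args) {
        IndexOptionsGuard guard = new IndexOptionsGuard();
        guard.register("body", Options.DOCS_AND_FREQS);
        try {
            guard.register("body", Options.DOCS); // downgrade attempt
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this would keep a field's options consistent for the lifetime of the index instead of silently changing them on the fly.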

[Legacy Jira: Michael McCandless (@mikemccand) on Nov 02 2017]

from stargazers-migration-test.

mikemccand commented on June 12, 2024

Here is a patch. I haven't improved the tests yet and haven't addressed downgrading at all, though.

I ran omitTF experiments measuring mean average precision on 3 test collections in different languages, with and without stopwords, and with different scoring systems.

english:

EnglishAnalyzer(CharArraySet.EMPTY_SET)

Sim             DOCS_AND_FREQS  DOCS (master)  DOCS (patch)  diff (patch vs. master)
Classic         0.3363          0.1465         0.2080        +42.0%
BM25            0.4492          0.2023         0.2746        +35.7%
I(ne)B2         0.4553          0.2151         0.2801        +30.2%
I(ne)B1         0.4231          0.1679         0.2539        +51.2%
PL2             0.3624          0.2006         0.2656        +32.4%
LM(dirichlet)   0.4408          0.2814         0.2851        +1.3%
DFI(chisquare)  0.4236          0.2493         0.2819        +13.1%
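
The diff column is consistent with the relative change of the DOCS (patch) score over the DOCS (master) score. A small sketch of that arithmetic, where the `RelativeDiff` class and its method names are illustrative and not part of the patch:

```java
import java.util.Locale;

public class RelativeDiff {
    /** Relative change of the patched score over the baseline score, in percent. */
    public static double percentChange(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    /** Formats the change with an explicit sign and one decimal, as in the tables. */
    public static String format(double baseline, double patched) {
        return String.format(Locale.ROOT, "%+.1f%%", percentChange(baseline, patched));
    }

    public static void main(String[] args) {
        // Classic row, English without stopword removal: master 0.1465, patch 0.2080.
        System.out.println(format(0.1465, 0.2080)); // prints "+42.0%"
    }
}
```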

EnglishAnalyzer()

Sim             DOCS_AND_FREQS  DOCS (master)  DOCS (patch)  diff (patch vs. master)
Classic         0.3478          0.1651         0.2052        +24.3%
BM25            0.4505          0.2269         0.2720        +19.9%
I(ne)B2         0.4563          0.2401         0.2785        +16.0%
I(ne)B1         0.4285          0.1992         0.2516        +26.3%
PL2             0.4438          0.2182         0.2617        +19.9%
LM(dirichlet)   0.4372          0.2827         0.2851        +0.8%
DFI(chisquare)  0.4380          0.2637         0.2858        +8.4%

bengali:

BengaliAnalyzer(CharArraySet.EMPTY_SET)

Sim             DOCS_AND_FREQS  DOCS (master)  DOCS (patch)  diff (patch vs. master)
Classic         0.2326          0.1211         0.1371        +13.2%
BM25            0.2989          0.1367         0.1673        +22.4%
I(ne)B2         0.3111          0.1469         0.1738        +18.3%
I(ne)B1         0.2886          0.1237         0.1520        +22.9%
PL2             0.2906          0.1372         0.1636        +19.2%
LM(dirichlet)   0.3007          0.1805         0.1829        +1.3%
DFI(chisquare)  0.2938          0.1678         0.1790        +6.7%

BengaliAnalyzer()

Sim             DOCS_AND_FREQS  DOCS (master)  DOCS (patch)  diff (patch vs. master)
Classic         0.2266          0.1231         0.1360        +10.5%
BM25            0.2947          0.1390         0.1649        +18.6%
I(ne)B2         0.3074          0.1485         0.1723        +16.0%
I(ne)B1         0.2848          0.1248         0.1486        +19.1%
PL2             0.2856          0.1377         0.1608        +16.8%
LM(dirichlet)   0.2982          0.1803         0.1836        +1.8%
DFI(chisquare)  0.2887          0.1703         0.1810        +6.3%

kurdish:

SoraniAnalyzer(CharArraySet.EMPTY_SET)

Sim             DOCS_AND_FREQS  DOCS (master)  DOCS (patch)  diff (patch vs. master)
Classic         0.2957          0.1625         0.1811        +11.4%
BM25            0.3207          0.1871         0.2087        +11.5%
I(ne)B2         0.3354          0.1937         0.2113        +9.1%
I(ne)B1         0.3263          0.1762         0.1992        +13.1%
PL2             0.3134          0.1738         0.2002        +15.2%
LM(dirichlet)   0.2877          0.2130         0.2149        +0.9%
DFI(chisquare)  0.3157          0.2014         0.2129        +5.7%

SoraniAnalyzer()

Sim             DOCS_AND_FREQS  DOCS (master)  DOCS (patch)  diff (patch vs. master)
Classic         0.2977          0.1654         0.1781        +7.7%
BM25            0.3205          0.1918         0.2077        +8.3%
I(ne)B2         0.3345          0.1979         0.2107        +6.5%
I(ne)B1         0.3266          0.1798         0.1970        +9.6%
PL2             0.3115          0.1761         0.1998        +13.5%
LM(dirichlet)   0.2815          0.2116         0.2144        +1.3%
DFI(chisquare)  0.3143          0.2022         0.2115        +4.6%

[Legacy Jira: Robert Muir (@rmuir) on Nov 06 2017]

mikemccand commented on June 12, 2024

+1 to disallow downgrading

[Legacy Jira: Adrien Grand (@jpountz) on Jan 15 2018]

mikemccand commented on June 12, 2024

Let's move forward with your change now that LUCENE-8134 is merged?

[Legacy Jira: Adrien Grand (@jpountz) on Feb 15 2018]

mikemccand commented on June 12, 2024

Commit 29e5b8abcee8a566cc057b862ab99c5ffef13a76 in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=29e5b8a

LUCENE-8031: DOCS_ONLY fields set incorrect length norm

[Legacy Jira: ASF subversion and git services on Feb 24 2018]

mikemccand commented on June 12, 2024

Thank you for doing the hard part, Adrien!

[Legacy Jira: Robert Muir (@rmuir) on Feb 24 2018]
