Comments (6)
But this seems tricky: today you can downgrade to DOCS_ONLY on the fly.
Maybe we should stop allowing this, i.e. throw an exception if the index options try to downgrade for a field?
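
A minimal sketch of the "throw on downgrade" idea, not the actual Lucene change. The enum below is a local stand-in for the indexed levels of Lucene's `IndexOptions` (where `DOCS` is the current name for what this thread calls DOCS_ONLY), and `IndexOptionsGuard` and `checkNoDowngrade` are hypothetical names used only for illustration.

```java
class IndexOptionsGuard {
    /** Local stand-in for Lucene's indexed IndexOptions levels, ordered weakest to strongest. */
    enum IndexOptions {
        DOCS,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    /**
     * Hypothetical check: reject a request that is weaker than what the field
     * already uses. Equal or stronger requests pass through untouched; how those
     * should be handled is outside the scope of this sketch.
     */
    static void checkNoDowngrade(String field, IndexOptions current, IndexOptions requested) {
        // Enum constants compare by declaration order, so "less than" means "weaker".
        if (requested.compareTo(current) < 0) {
            throw new IllegalArgumentException(
                "cannot downgrade index options for field \"" + field
                    + "\" from " + current + " to " + requested);
        }
    }

    public static void main(String[] args) {
        // Same options as before: accepted.
        checkNoDowngrade("body", IndexOptions.DOCS_AND_FREQS, IndexOptions.DOCS_AND_FREQS);
        try {
            // Dropping frequencies on an existing field: rejected.
            checkNoDowngrade("body", IndexOptions.DOCS_AND_FREQS, IndexOptions.DOCS);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this would keep a field's statistics consistent across documents, at the cost of rejecting input that is silently accepted today.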
[Legacy Jira: Michael McCandless (@mikemccand) on Nov 02 2017]
from stargazers-migration-test.
Here is a patch. I haven't improved the tests yet, and it doesn't address downgrading at all.
I ran omitTF experiments: mean average precision on 3 test collections, different languages, with/without stopwords, with different scoring systems.
english:
EnglishAnalyzer(CharArraySet.EMPTY_SET)
Sim | DOCS_AND_FREQS | DOCS (master) | DOCS (patch) | diff |
---|---|---|---|---|
Classic | 0.3363 | 0.1465 | 0.2080 | +42.0% |
BM25 | 0.4492 | 0.2023 | 0.2746 | +35.7% |
I(ne)B2 | 0.4553 | 0.2151 | 0.2801 | +30.2% |
I(ne)B1 | 0.4231 | 0.1679 | 0.2539 | +51.2% |
PL2 | 0.3624 | 0.2006 | 0.2656 | +32.4% |
LM(dirichlet) | 0.4408 | 0.2814 | 0.2851 | +1.3% |
DFI(chisquare) | 0.4236 | 0.2493 | 0.2819 | +13.1% |
EnglishAnalyzer()
Sim | DOCS_AND_FREQS | DOCS (master) | DOCS (patch) | diff |
---|---|---|---|---|
Classic | 0.3478 | 0.1651 | 0.2052 | +24.3% |
BM25 | 0.4505 | 0.2269 | 0.2720 | +19.9% |
I(ne)B2 | 0.4563 | 0.2401 | 0.2785 | +16.0% |
I(ne)B1 | 0.4285 | 0.1992 | 0.2516 | +26.3% |
PL2 | 0.4438 | 0.2182 | 0.2617 | +19.9% |
LM(dirichlet) | 0.4372 | 0.2827 | 0.2851 | +0.8% |
DFI(chisquare) | 0.4380 | 0.2637 | 0.2858 | +8.4% |
bengali:
BengaliAnalyzer(CharArraySet.EMPTY_SET)
Sim | DOCS_AND_FREQS | DOCS (master) | DOCS (patch) | diff |
---|---|---|---|---|
Classic | 0.2326 | 0.1211 | 0.1371 | +13.2% |
BM25 | 0.2989 | 0.1367 | 0.1673 | +22.4% |
I(ne)B2 | 0.3111 | 0.1469 | 0.1738 | +18.3% |
I(ne)B1 | 0.2886 | 0.1237 | 0.1520 | +22.9% |
PL2 | 0.2906 | 0.1372 | 0.1636 | +19.2% |
LM(dirichlet) | 0.3007 | 0.1805 | 0.1829 | +1.3% |
DFI(chisquare) | 0.2938 | 0.1678 | 0.1790 | +6.7% |
BengaliAnalyzer()
Sim | DOCS_AND_FREQS | DOCS (master) | DOCS (patch) | diff |
---|---|---|---|---|
Classic | 0.2266 | 0.1231 | 0.1360 | +10.5% |
BM25 | 0.2947 | 0.1390 | 0.1649 | +18.6% |
I(ne)B2 | 0.3074 | 0.1485 | 0.1723 | +16.0% |
I(ne)B1 | 0.2848 | 0.1248 | 0.1486 | +19.1% |
PL2 | 0.2856 | 0.1377 | 0.1608 | +16.8% |
LM(dirichlet) | 0.2982 | 0.1803 | 0.1836 | +1.8% |
DFI(chisquare) | 0.2887 | 0.1703 | 0.1810 | +6.3% |
kurdish:
SoraniAnalyzer(CharArraySet.EMPTY_SET)
Sim | DOCS_AND_FREQS | DOCS (master) | DOCS (patch) | diff |
---|---|---|---|---|
Classic | 0.2957 | 0.1625 | 0.1811 | +11.4% |
BM25 | 0.3207 | 0.1871 | 0.2087 | +11.5% |
I(ne)B2 | 0.3354 | 0.1937 | 0.2113 | +9.1% |
I(ne)B1 | 0.3263 | 0.1762 | 0.1992 | +13.1% |
PL2 | 0.3134 | 0.1738 | 0.2002 | +15.2% |
LM(dirichlet) | 0.2877 | 0.2130 | 0.2149 | +0.9% |
DFI(chisquare) | 0.3157 | 0.2014 | 0.2129 | +5.7% |
SoraniAnalyzer()
Sim | DOCS_AND_FREQS | DOCS (master) | DOCS (patch) | diff |
---|---|---|---|---|
Classic | 0.2977 | 0.1654 | 0.1781 | +7.7% |
BM25 | 0.3205 | 0.1918 | 0.2077 | +8.3% |
I(ne)B2 | 0.3345 | 0.1979 | 0.2107 | +6.5% |
I(ne)B1 | 0.3266 | 0.1798 | 0.1970 | +9.6% |
PL2 | 0.3115 | 0.1761 | 0.1998 | +13.5% |
LM(dirichlet) | 0.2815 | 0.2116 | 0.2144 | +1.3% |
DFI(chisquare) | 0.3143 | 0.2022 | 0.2115 | +4.6% |
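
For reading the tables above: the `diff` column appears to be the relative change in mean average precision from `DOCS (master)` to `DOCS (patch)`. That reading is an assumption, but it reproduces the listed values. A small sketch (the class and method names are hypothetical):

```java
import java.util.Locale;

class RelativeChange {
    /** Percent change of candidate relative to baseline. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // English, EnglishAnalyzer(CharArraySet.EMPTY_SET), BM25 row:
        // DOCS (master) = 0.2023, DOCS (patch) = 0.2746
        System.out.println(String.format(Locale.ROOT, "%+.1f%%", percentChange(0.2023, 0.2746)));
        // prints "+35.7%"
    }
}
```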
[Legacy Jira: Robert Muir (@rmuir) on Nov 06 2017]
+1 to disallow downgrading
[Legacy Jira: Adrien Grand (@jpountz) on Jan 15 2018]
Let's move forward with your change now that LUCENE-8134 is merged?
[Legacy Jira: Adrien Grand (@jpountz) on Feb 15 2018]
Commit 29e5b8abcee8a566cc057b862ab99c5ffef13a76 in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=29e5b8a
LUCENE-8031: DOCS_ONLY fields set incorrect length norm
[Legacy Jira: ASF subversion and git services on Feb 24 2018]
Thank you for doing the hard part, Adrien!
[Legacy Jira: Robert Muir (@rmuir) on Feb 24 2018]