Comments (8)
I opened this as a separate issue because we may want to backport it. Relevance pretty much falls apart in this case today with these sims (except Classic, which does not use a pivot and is hence unchanged). Here are my results on the Bengali collection (I had it handy from working on the analyzer):
| Sim | Baseline MAP | DOCS_ONLY MAP (master) | DOCS_ONLY MAP (patch) |
|---|---|---|---|
| Classic | 0.2266 | 0.1231 | 0.1231 |
| BM25 | 0.2947 | 0.0531 | 0.1390 |
| I(ne)B2 | 0.3074 | 0.0534 | 0.1485 |
| I(ne)B1 | 0.2848 | 0.0529 | 0.1248 |
| PL2 | 0.2856 | 0.0148 | 0.1377 |
| LM(dirichlet) | 0.2982 | 0.0035 | 0.1803 |
| DFI(chisquare) | 0.2887 | 0.0035 | 0.1703 |
I can dig up some other datasets to confirm.
edit: updated table with 2 additional sims.
[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]
from stargazers-migration-test.
Patch attached. It falls back to the bogus value only if sumDocFreq is unavailable, which doesn't happen with any codec since Lucene 4 or so.
Note that for SimilarityBase it doesn't just correct avgdl but also numberOfFieldTokens, which was previously (bogusly) set to docFreq, as if the term being scored was the only one in the collection! I will update tests across more sims, such as LM and DFI, that are sensitive to this to see any improvement.
[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]
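For readers following along, the fallback order described in the comment above can be sketched roughly as below. This is a minimal illustration, not the code from the patch: the class `DocsOnlyStats`, its method names, the use of a non-positive value to mean "statistic unavailable", and the last-resort average length of 1 are all assumptions made for the example.

```java
/**
 * Hypothetical sketch of choosing a token count for a field that may have
 * been indexed without term frequencies (DOCS_ONLY). Not Lucene's API.
 */
public final class DocsOnlyStats {
    private DocsOnlyStats() {}

    /**
     * Prefer sumTotalTermFreq; if it is unavailable (non-positive here, by
     * assumption), use sumDocFreq; only if that is missing too, fall back to
     * docFreq, the old bogus value.
     */
    public static long numberOfFieldTokens(long sumTotalTermFreq, long sumDocFreq, long docFreq) {
        if (sumTotalTermFreq > 0) {
            return sumTotalTermFreq;
        }
        if (sumDocFreq > 0) {
            return sumDocFreq;
        }
        return docFreq;
    }

    /**
     * Average field length derived from the same preference order. Returns 1
     * as an assumed placeholder when no usable statistic exists.
     */
    public static double avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        long tokens = sumTotalTermFreq > 0 ? sumTotalTermFreq : sumDocFreq;
        if (tokens <= 0 || docCount <= 0) {
            return 1.0;
        }
        return (double) tokens / docCount;
    }
}
```

With frequencies omitted, each document contributes one posting per distinct term, so sumDocFreq acts as a count of unique terms per document summed over the field, which is a much better length proxy than a constant.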
I added DFI and LM as well. They are the worst case: they just fall apart completely today for omitTFAP because the bogus sumTotalTF=docFreq makes them lose the ability to discriminate term importance too.
[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]
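To see why a token count equal to docFreq removes term discrimination, consider a small numeric sketch. It assumes a smoothed collection model of the form (totalTermFreq + 1) / (numberOfFieldTokens + 1) and assumes totalTermFreq is replaced by docFreq when frequencies are omitted; the class name, method name, and sample numbers are made up for illustration and are not taken from the patch.

```java
/**
 * Hypothetical illustration of a language-model style collection
 * probability under the old and new token counts. Not Lucene's API.
 */
public final class CollectionModelSketch {
    private CollectionModelSketch() {}

    /** Assumed collection model: add-one smoothed term share of the field. */
    public static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }

    public static void main(String[] args) {
        long rareDocFreq = 5;          // made-up rare term
        long commonDocFreq = 50_000;   // made-up common term
        long sumDocFreq = 1_000_000;   // made-up field-wide posting count

        // Old behavior: numberOfFieldTokens = docFreq, so every term looks
        // like the only term in the collection and both probabilities are 1.
        System.out.println(collectionProbability(rareDocFreq, rareDocFreq));
        System.out.println(collectionProbability(commonDocFreq, commonDocFreq));

        // With sumDocFreq as the token count, the rare term gets a much
        // smaller collection probability than the common one.
        System.out.println(collectionProbability(rareDocFreq, sumDocFreq));
        System.out.println(collectionProbability(commonDocFreq, sumDocFreq));
    }
}
```

Under the old token count the two terms are indistinguishable, which is consistent with the near-zero MAP reported for LM and DFI on master in the table above.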
+1
[Legacy Jira: Adrien Grand (@jpountz) on Oct 31 2017]
Commit 7495a9d75bb2efde2f76d68b376560ab86693cd9 in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7495a9d
LUCENE-8025: Use totalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
[Legacy Jira: ASF subversion and git services on Nov 01 2017]
Commit 4e1ef13a1274a3beb17b2696d08318a241e4d86e in lucene-solr's branch refs/heads/branch_7x from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4e1ef13
LUCENE-8025: Use totalTermFreq=sumDocFreq when scoring DOCS_ONLY fields
[Legacy Jira: ASF subversion and git services on Nov 01 2017]
Commit 2658ff62c84e2cc8405a6b6ef988060be430f61a in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2658ff6
LUCENE-8025: fix changes entry, its sumTotalTermFreq
[Legacy Jira: ASF subversion and git services on Nov 01 2017]
Commit 7b7bdf39927ffd9a2654f002bf066cdd817315da in lucene-solr's branch refs/heads/branch_7x from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7b7bdf3
LUCENE-8025: fix changes entry, its sumTotalTermFreq
[Legacy Jira: ASF subversion and git services on Nov 01 2017]