Comments (6)
I noticed this when trying to debug AfterEffectB in LUCENE-8015.
The formula should be: (F + 1) / (n * (tfn + 1))
but we currently use (F + 1 + 1) / ((n + 1) * (tfn + 1))
and I couldn't remember why we had this mess everywhere.
[Legacy Jira: Robert Muir (@rmuir) on Oct 28 2017]
from stargazers-migration-test.
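To make the difference between the two formulas above concrete, here is a minimal sketch that evaluates both side by side. It is not Lucene's actual AfterEffectB code: the class name `AfterEffectBSketch` and both method names are hypothetical, and reading `F` as the term's total frequency and `n` as its document frequency is an assumption based on the usual DFR notation. The last call shows one plausible reason the extra `+ 1` terms were there (this is a guess, not something stated in the thread): with `n = 0` the intended formula divides by zero.

```java
public final class AfterEffectBSketch {
    private AfterEffectBSketch() {}

    /** Intended formula from the comment: (F + 1) / (n * (tfn + 1)). */
    public static double intended(long F, long n, double tfn) {
        return (F + 1) / (n * (tfn + 1));
    }

    /** Formula described as currently in use: (F + 1 + 1) / ((n + 1) * (tfn + 1)). */
    public static double withExtraOnes(long F, long n, double tfn) {
        return (F + 1 + 1) / ((n + 1) * (tfn + 1));
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not taken from the issue.
        long F = 100;
        long n = 20;
        double tfn = 3.0;

        System.out.println("intended      = " + intended(F, n, tfn));
        System.out.println("withExtraOnes = " + withExtraOnes(F, n, tfn));

        // A term that does not exist (F = 0, n = 0): the intended formula
        // divides by zero in floating point, the padded one stays finite.
        System.out.println("intended(0,0)      = " + intended(0, 0, tfn));
        System.out.println("withExtraOnes(0,0) = " + withExtraOnes(0, 0, tfn));
    }
}
```

For ordinary terms the two variants give close but different scores; they only diverge sharply when the statistics describe a term that is absent from the index.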
Here is a patch; core tests pass. I haven't yet reverted all the similarity formula hacks, as that will take more time.
[Legacy Jira: Robert Muir (@rmuir) on Oct 28 2017]
Updated patch with javadocs improvements for Term/CollectionStatistics. Solr's distributed IDF code needed small tweaks, but they were easy because it already assumed this method might return null.
[Legacy Jira: Robert Muir (@rmuir) on Oct 28 2017]
I reviewed the callers of termStatistics and also found TermAutomatonQuery in sandbox (it scores like a phrase query but currently has the same issue as SpanOrQuery if some terms don't exist). I fixed it the same way and added unit tests. I think it's ready.
[Legacy Jira: Robert Muir (@rmuir) on Oct 28 2017]
Commit e0bde579815ae5ce2525bb659d04e908812f1605 in lucene-solr's branch refs/heads/master from @rmuir
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e0bde57
LUCENE-8020: don't force sim to score bogus terms (e.g. docfreq=0)
[Legacy Jira: ASF subversion and git services on Oct 31 2017]
I spun off LUCENE-8023 and LUCENE-8024 to simplify sim impls.
[Legacy Jira: Robert Muir (@rmuir) on Oct 31 2017]