Comments (24)
I looked into it, this similarity ends up doing something like that:
double tfn = // non-decreasing function of tf
return (tfn * C1) * (C2 / (tfn + 1)); // C1 and C2 are some constants
The issue is that even if tfn increases, the result might decrease if tfn * C1
is rounded down and/or C2/(tfn + 1)
is rounded up. One way to fix it that I can think of is to make the value of tfn more discrete by doing eg.
diff --git a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
index aacd246..554d12f 100644
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
@@ -108,7 +108,7 @@ public class DFRSimilarity extends SimilarityBase {
`@Override`
protected double score(BasicStats stats, double freq, double docLen) {
- double tfn = normalization.tfn(stats, freq, docLen);
+ double tfn = (float) normalization.tfn(stats, freq, docLen); // cast to float on purpose to introduce gaps between consecutive values and prevent double rounding errors to make the score decrease when tfn increases
return stats.getBoost() *
basicModel.score(stats, tfn) * afterEffect.score(stats, tfn);
}
Opinions?
[Legacy Jira: Adrien Grand (@jpountz) on Oct 26 2017]
from stargazers-migration-test.
lets take a step back first. which 3 DFR components are involved? Can you include the test output?
[Legacy Jira: Robert Muir (@rmuir) on Oct 26 2017]
from stargazers-migration-test.
[junit4] <JUnit4> says 你好! Master seed: 86E85958B1183E93
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(22203@localhost).
[junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelIne
[junit4] 1> 7.0E-45 = score(DFRSimilarity, doc=0, freq=0.99999994), computed from:
[junit4] 1> 1.4E-45 = boost
[junit4] 1> 3.46341352E16 = NormalizationH1, computed from:
[junit4] 1> 0.99999994 = tf
[junit4] 1> 2.09433728E9 = avgFieldLength
[junit4] 1> 26.0 = len
[junit4] 1> 1.03902406E17 = BasicModelIne, computed from:
[junit4] 1> 11.0 = numberOfDocuments
[junit4] 1> 1.0 = totalTermFreq
[junit4] 1> 4.3309873E-17 = AfterEffectB, computed from:
[junit4] 1> 3.46341352E16 = tfn
[junit4] 1> 1.0 = totalTermFreq
[junit4] 1> 1.0 = docFreq
[junit4] 1>
[junit4] 1> 5.6E-45 = score(DFRSimilarity, doc=0, freq=1.0), computed from:
[junit4] 1> 1.4E-45 = boost
[junit4] 1> 3.46341374E16 = NormalizationH1, computed from:
[junit4] 1> 1.0 = tf
[junit4] 1> 2.09433728E9 = avgFieldLength
[junit4] 1> 26.0 = len
[junit4] 1> 1.03902414E17 = BasicModelIne, computed from:
[junit4] 1> 11.0 = numberOfDocuments
[junit4] 1> 1.0 = totalTermFreq
[junit4] 1> 4.330987E-17 = AfterEffectB, computed from:
[junit4] 1> 3.46341374E16 = tfn
[junit4] 1> 1.0 = totalTermFreq
[junit4] 1> 1.0 = docFreq
[junit4] 1>
[junit4] 1> DFR I(ne)B1
[junit4] 1> field="field",maxDoc=16,docCount=11,sumTotalTermFreq=23037710092,sumDocFreq=1421016222
[junit4] 1> term="term",docFreq=1,totalTermFreq=1
[junit4] 1> norm=26 (doc length ~ 26)
[junit4] 1> freq=1.0
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBasicModelIne -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu -Dtests.asserts=true -Dtests.file.encoding=UTF8
[junit4] FAILURE 3.13s | TestBasicModelIne.testRandomScoring <<<
[junit4] > Throwable #1: java.lang.AssertionError: score(0.99999994)=7.0E-45 > score(1.0)=5.6E-45
[junit4] > at __randomizedtesting.SeedInfo.seed([86E85958B1183E93:D7700EAAB6FD899]:0)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {field=PostingsFormat(name=Memory)}, docValues:{}, maxPointsInLeafNode=285, maxMBSortInHeap=6.307483399953041, sim=RandomSimilarity(queryNorm=false): {field=IB LL-DZ(0.3)}, locale=vi-VN, timezone=Pacific/Tongatapu
[junit4] 2> NOTE: Linux 4.4.0-97-generic amd64/Oracle Corporation 1.8.0_102 (64-bit)/cpus=8,threads=1,free=241459528,total=344457216
[junit4] 2> NOTE: All tests run in this JVM: [TestBasicModelIne]
[junit4] Completed [1/1 (1!)] in 3.79s, 1 test, 1 failure <<< FAILURES!
[junit4]
[junit4]
[junit4] Tests with failures [seed: 86E85958B1183E93]:
[junit4] - org.apache.lucene.search.similarities.TestBasicModelIne.testRandomScoring
[junit4]
[junit4]
[junit4] JVM J0: 0.40 .. 4.74 = 4.34s
[junit4] Execution time total: 4.75 sec.
[junit4] Tests summary: 1 suite, 1 test, 1 failure
[Legacy Jira: Adrien Grand (@jpountz) on Oct 26 2017]
from stargazers-migration-test.
Thanks I will look in. Its hard to debug it specifically without fixing explains first (we really need to do that, then you can "see" what goes wrong from test fails like this). Separately the test is inefficient in that this only comes out with beasting many iterations. We should improve the test to more often enumerate edges (e.g. min/max values) that look like this so that its more efficient.
at a glance it looks like small collection with mostly super-huge docs but then one tiny doc. So it may stress some extremes in computations like dl/avgdl
type stuff, and expose a hazard in one of the components here. I have to look more...
[Legacy Jira: Robert Muir (@rmuir) on Oct 26 2017]
from stargazers-migration-test.
Maybe the issue is better fixed in after-effect B? Instead of (F+1)/(n * (tf + 1))
we can do (F+1)/n * 1/(tf+1)
. Keep in mind F and n are presumably large, as they are the term's totalTermFreq and docFreq although not in this particular failure.
[Legacy Jira: Robert Muir (@rmuir) on Oct 28 2017]
from stargazers-migration-test.
Two reproducing Jenkins failures, of a different test suite (*In
vs. *Ine
): first, from https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20789/:
Checking out Revision 39376cd8b5ef03b3338c2e8fa31dce732749bcd7 (refs/remotes/origin/master)
[...]
[junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelIn
[junit4] 1> 1.27634828E18 = score(DFRSimilarity, doc=0, freq=1.18869171E9), computed from:
[junit4] 1> 2.14748365E9 = boost
[junit4] 1> 1.18869171E9 = NormalizationZ, computed from:
[junit4] 1> 1.18869171E9 = tf
[junit4] 1> 6.0234362E8 = avgFieldLength
[junit4] 1> 76.0 = len
[junit4] 1> 1.18869171E9 = BasicModelIn, computed from:
[junit4] 1> 2.0 = numberOfDocuments
[junit4] 1> 1.0 = docFreq
[junit4] 1> 0.50000006 = AfterEffectB, computed from:
[junit4] 1> 1.18869171E9 = tfn
[junit4] 1> 1.18869184E9 = totalTermFreq
[junit4] 1> 1.0 = docFreq
[junit4] 1>
[junit4] 1> 1.27634814E18 = score(DFRSimilarity, doc=0, freq=1.18869184E9), computed from:
[junit4] 1> 2.14748365E9 = boost
[junit4] 1> 1.18869184E9 = NormalizationZ, computed from:
[junit4] 1> 1.18869184E9 = tf
[junit4] 1> 6.0234362E8 = avgFieldLength
[junit4] 1> 76.0 = len
[junit4] 1> 1.18869184E9 = BasicModelIn, computed from:
[junit4] 1> 2.0 = numberOfDocuments
[junit4] 1> 1.0 = docFreq
[junit4] 1> 0.5 = AfterEffectB, computed from:
[junit4] 1> 1.18869184E9 = tfn
[junit4] 1> 1.18869184E9 = totalTermFreq
[junit4] 1> 1.0 = docFreq
[junit4] 1>
[junit4] 1> DFR I👎BZ(1.4E-45)
[junit4] 1> field="field",maxDoc=2,docCount=2,sumTotalTermFreq=1204687257,sumDocFreq=2
[junit4] 1> term="term",docFreq=1,totalTermFreq=1188691903
[junit4] 1> norm=53 (doc length ~ 76)
[junit4] 1> freq=1.18869184E9
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBasicModelIn -Dtests.method=testRandomScoring -Dtests.seed=4210BC5FDD9E3841 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=pt -Dtests.timezone=AET -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] FAILURE 6.16s J1 | TestBasicModelIn.testRandomScoring <<<
[junit4] > Throwable #1: java.lang.AssertionError: score(1.18869171E9)=1.27634828E18 > score(1.18869184E9)=1.27634814E18
[junit4] > at __randomizedtesting.SeedInfo.seed([4210BC5FDD9E3841:C98FE5EDC7E9DE4B]:0)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
[junit4] > at java.lang.Thread.run(Thread.java:748)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {field=PostingsFormat(name=LuceneVarGapDocFreqInterval)}, docValues:{}, maxPointsInLeafNode=839, maxMBSortInHeap=6.659456353481144, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@19821e9), locale=pt, timezone=AET
[junit4] 2> NOTE: Linux 4.10.0-37-generic i386/Oracle Corporation 1.8.0_144 (32-bit)/cpus=8,threads=1,free=227959720,total=316669952
And from https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20744/ (output below is from my reproduction, since the job output is no longer accessible - git sha is 95d287e):
[junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelIn
[junit4] 1> 8.0517238E9 = score(DFRSimilarity, doc=0, freq=1.86950656E9), computed from:
[junit4] 1> 1.6103447E9 = boost
[junit4] 1> 2.6727952E22 = NormalizationH1, computed from:
[junit4] 1> 1.86950656E9 = tf
[junit4] 1> 1.4181463E9 = avgFieldLength
[junit4] 1> 213016.0 = len
[junit4] 1> 1.3363976E23 = BasicModelIn, computed from:
[junit4] 1> 79.0 = numberOfDocuments
[junit4] 1> 2.0 = docFreq
[junit4] 1> 3.7414016E-23 = AfterEffectL, computed from:
[junit4] 1> 2.6727952E22 = tfn
[junit4] 1>
[junit4] 1> 8.0517233E9 = score(DFRSimilarity, doc=0, freq=1.86950669E9), computed from:
[junit4] 1> 1.6103447E9 = boost
[junit4] 1> 2.6727954E22 = NormalizationH1, computed from:
[junit4] 1> 1.86950669E9 = tf
[junit4] 1> 1.4181463E9 = avgFieldLength
[junit4] 1> 213016.0 = len
[junit4] 1> 1.3363977E23 = BasicModelIn, computed from:
[junit4] 1> 79.0 = numberOfDocuments
[junit4] 1> 2.0 = docFreq
[junit4] 1> 3.7414013E-23 = AfterEffectL, computed from:
[junit4] 1> 2.6727954E22 = tfn
[junit4] 1>
[junit4] 1> DFR I👎L1
[junit4] 1> field="field",maxDoc=852,docCount=79,sumTotalTermFreq=112033561766,sumDocFreq=79
[junit4] 1> term="term",docFreq=2,totalTermFreq=2487761701
[junit4] 1> norm=149 (doc length ~ 213016)
[junit4] 1> freq=1.86950669E9
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBasicModelIn -Dtests.method=testRandomScoring -Dtests.seed=56DDF2F2BA9390E3 -Dtests.slow=true -Dtests.locale=cs -Dtests.timezone=Mexico/BajaNorte -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 3.91s | TestBasicModelIn.testRandomScoring <<<
[junit4] > Throwable #1: java.lang.AssertionError: score(1.86950656E9)=8.0517238E9 > score(1.86950669E9)=8.0517233E9
[junit4] > at __randomizedtesting.SeedInfo.seed([56DDF2F2BA9390E3:DD42AB40A0E476E9]:0)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
[junit4] > at java.lang.Thread.run(Thread.java:748)
[junit4] 2> NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=4442, maxDocsPerChunk=146, blockSize=5), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=4442, blockSize=5)), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@146b4f40), locale=cs, timezone=Mexico/BajaNorte
[Legacy Jira: Steven Rowe on Nov 01 2017]
from stargazers-migration-test.
Another reproducing failure, from my Jenkins; it's a different test suite, but looks similar enough to me to comment on this issue:
Checking out Revision b44497fdb721fb67c3c8f20dd0a781f6beaaa8a6 (refs/remotes/origin/master)
[...]
[junit4] Suite: org.apache.lucene.search.similarities.TestBasicModelG
[junit4] 1> 5.9448525E9 = score(DFRSimilarity, doc=0, freq=0.99999994), computed from:
[junit4] 1> 1.98161741E9 = boost
[junit4] 1> 1.49336593E16 = NormalizationH1, computed from:
[junit4] 1> 0.99999994 = tf
[junit4] 1> 1.05701216E9 = avgFieldLength
[junit4] 1> 152.0 = len
[junit4] 1> 4.4800976E16 = BasicModelG, computed from:
[junit4] 1> 12.0 = numberOfDocuments
[junit4] 1> 1.0 = totalTermFreq
[junit4] 1> 6.6962825E-17 = AfterEffectL, computed from:
[junit4] 1> 1.49336593E16 = tfn
[junit4] 1>
[junit4] 1> 5.944852E9 = score(DFRSimilarity, doc=0, freq=1.0), computed from:
[junit4] 1> 1.98161741E9 = boost
[junit4] 1> 1.49336603E16 = NormalizationH1, computed from:
[junit4] 1> 1.0 = tf
[junit4] 1> 1.05701216E9 = avgFieldLength
[junit4] 1> 152.0 = len
[junit4] 1> 4.480098E16 = BasicModelG, computed from:
[junit4] 1> 12.0 = numberOfDocuments
[junit4] 1> 1.0 = totalTermFreq
[junit4] 1> 6.696282E-17 = AfterEffectL, computed from:
[junit4] 1> 1.49336603E16 = tfn
[junit4] 1>
[junit4] 1> DFR GL1
[junit4] 1> field="field",maxDoc=50,docCount=12,sumTotalTermFreq=12684145308,sumDocFreq=12
[junit4] 1> term="term",docFreq=1,totalTermFreq=1
[junit4] 1> norm=64 (doc length ~ 152)
[junit4] 1> freq=1.0
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBasicModelG -Dtests.method=testRandomScoring -Dtests.seed=4B5C3926B202A201 -Dtests.slow=true -Dtests.locale=en-IE -Dtests.timezone=Pacific/Bougainville -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 7.31s J0 | TestBasicModelG.testRandomScoring <<<
[junit4] > Throwable #1: java.lang.AssertionError: score(0.99999994)=5.9448525E9 > score(1.0)=5.944852E9
[junit4] > at __randomizedtesting.SeedInfo.seed([4B5C3926B202A201:C0C36094A875440B]:0)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.doTestScoring(BaseSimilarityTestCase.java:405)
[junit4] > at org.apache.lucene.search.similarities.BaseSimilarityTestCase.testRandomScoring(BaseSimilarityTestCase.java:357)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {field=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=68, maxMBSortInHeap=6.052983739984725, sim=RandomSimilarity(queryNorm=false): {field=DFR I(ne)B3(800.0)}, locale=en-IE, timezone=Pacific/Bougainville
[junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=293394976,total=351797248
[Legacy Jira: Steven Rowe on Nov 01 2017]
from stargazers-migration-test.
Thanks Steve: I am not sure if the 3 failures represent just one bug, but its very relevant.
Adrien's suggested fix alone will fix #1 and #3 but not #2. #2 is very clearly the hazard in AfterEffectB that I described (you can see it from the explain). If you combine both of our suggested fixes, all 3 cases will pass.
We should first maybe make the test more efficient.
[Legacy Jira: Robert Muir (@rmuir) on Nov 01 2017]
from stargazers-migration-test.
I tested your last failure of GL2 (#4) and its also covered by adrien's fix.
[Legacy Jira: Robert Muir (@rmuir) on Nov 01 2017]
from stargazers-migration-test.
Here is a patch improving the test. It just tries to hit the edge cases with more probability.
It now seems to generally fail with only 1 or 2 rounds of
ant beast -Dbeast.iters=10 -Dtests.class="org.apache.lucene.search.similarities.Test*"
I'd rather it fail every time of course, but I think its still an improvement.
[Legacy Jira: Robert Muir (@rmuir) on Nov 01 2017]
from stargazers-migration-test.
I dug into the I\👎 and I(ne) failures here via the new test, their biggest problem is in the BasicModel itself.
These idf-like functions have the "log1p" trap due to the formulas in use. Note their formula is log2((maxDoc + 1) / (x + 0.5))
where x is docFreq for I\👎, expected docFreq for I(ne), and totalTermFreq for I(F). So the worst case (e.g. term in every doc) gets even worse as collection size increases, because we take log of values increasingly closer to 1.
BasicModel I(F) never fails because we added a floor in its log: we had to, since it would otherwise go negative when totalTermFreq exceeds maxDoc, which can easily happen. But we should fix the other two in the same way, I think. It does not change retrieval quality in my tests.
If I floor them to avoid this issue like this, it fixes all their fails here and they survive hundred rounds of beasting by my new test:
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIn.java
@@ -33,7 +33,7 @@ public class BasicModelIn extends BasicModel {
public final double score(BasicStats stats, double tfn) {
long N = stats.getNumberOfDocuments();
long n = stats.getDocFreq();
- return tfn * log2((N + 1) / (n + 0.5));
+ return tfn * log2(1 + (N + 1) / (n + 0.5));
}
`@Override`
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIne.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/BasicModelIne.java
@@ -34,7 +34,7 @@ public class BasicModelIne extends BasicModel {
long N = stats.getNumberOfDocuments();
long F = stats.getTotalTermFreq();
double ne = N * (1 - Math.pow((N - 1) / (double)N, F));
- return tfn * log2((N + 1) / (ne + 0.5));
+ return tfn * log2(1 + (N + 1) / (ne + 0.5));
}
Model G failures are separate, I have not looked into it yet.
[Legacy Jira: Robert Muir (@rmuir) on Nov 01 2017]
from stargazers-migration-test.
+1 to giving it a try
[Legacy Jira: Adrien Grand (@jpountz) on Nov 02 2017]
from stargazers-migration-test.
I have been looking into the following model G failure.
7.0E-45 = score(DFRSimilarity, doc=0, freq=0.99999994), computed from:
1.4E-45 = boost
3.09640771E16 = NormalizationH1, computed from:
0.99999994 = tf
1.61490253E9 = avgFieldLength
112.0 = len
9.2892231E16 = BasicModelG, computed from:
12.0 = numberOfDocuments
1.0 = totalTermFreq
4.8443234E-17 = AfterEffectB, computed from:
3.09640771E16 = tfn
1.0 = totalTermFreq
1.0 = docFreq
5.6E-45 = score(DFRSimilarity, doc=0, freq=1.0), computed from:
1.4E-45 = boost
3.09640792E16 = NormalizationH1, computed from:
1.0 = tf
1.61490253E9 = avgFieldLength
112.0 = len
9.289224E16 = BasicModelG, computed from:
12.0 = numberOfDocuments
1.0 = totalTermFreq
4.844323E-17 = AfterEffectB, computed from:
3.09640792E16 = tfn
1.0 = totalTermFreq
1.0 = docFreq
DFR GB1
field="field",maxDoc=46519,docCount=12,sumTotalTermFreq=19378830951,sumDocFreq=19378830951
term="term",docFreq=1,totalTermFreq=1
norm=59 (doc length ~ 112)
freq=1.0
NOTE: reproduce with: ant test -Dtestcase=TestBasicModelG -Dtests.method=testRandomScoring -Dtests.seed=3C22B051C61EEC84 -Dtests.locale=cs-CZ -Dtests.timezone=Atlantic/Madeira -Dtests.asserts=true -Dtests.file.encoding=UTF-8
In short, the scoring formula here looks like (A + B * tfn) * (C / (tfn + 1))
where A, B and C are constants. This function increases when tfn increases when B > A, which is always the case. The problem is that tfn is so large (ulp(tfn) = 4) , that tfn+1
always returns tfn
and A + B * tfn
always returns the same as B * tfn
. So when tfn gets high, the formula is effectively (B * tfn) * (C / tfn)
. This is a constant, but since we compute the left and right parts independently, this might decrease when tfn increases about half the time.
Even though I triggered it with BasicModelG, I suspect it affects almost all DFRSimilarity impls.
[Legacy Jira: Adrien Grand (@jpountz) on Dec 04 2017]
from stargazers-migration-test.
thanks for the analysis! I still don't even really want to commit the "floor" modifications for In and Ine because i dont like it: really a scoring formula should be able to return a tiny tiny value for a stopword, that should be ok. It shouldnt have to be a number between 1 and 43 or whatever to work with lucene.
For model IF its justifiable, just like its justifiable in the BM25 case, because the formula is fundamentally broken you know, i mean we dont want negative scores for stopwords.
But your analysis suggests maybe we can look at a more surgical fix, like a nextUp/nextDown somewhere?
[Legacy Jira: Robert Muir (@rmuir) on Dec 04 2017]
from stargazers-migration-test.
I don't think we can fix this with a nextUp/nextDown? One way we could fix it for sure would be by implementing the basic model and the after effect in a single method. For instance (A + B * tfn) * (C / (tfn + 1))
could be rewritten as (A - B + B * (1 + tfn))) * C / (tfn + 1) = (A - B) * C / (tfn + 1) + B * C
. Since there is only one occurrence of tfn in the latter, it would be guaranteed to be non-decreasing when tfn increases. Fixing it in the general case looks challenging however?
Maybe one reasonable way to avoid this issue would be to bound the values that tfn may take? This isn't nice but it wouldn't affect the general case, only when freq, avgdl, or some other stats have extreme values?
[Legacy Jira: Adrien Grand (@jpountz) on Dec 04 2017]
from stargazers-migration-test.
Given that this bug is not easy to reproduce due to how we finally cast double scores to floats, which often returns the same value for consecutive values of freq, I tried to hack the test framework to compare the produced doubles (see attached patch - note this is for testing purpose only, I don't plan/want to merge it). My assumption is that if we can reproduce the issue with doubles, it means it can happen after a float cast as well, the scorer just needs to produce a value that is close enough from the boundary between two floats so that both values would round to different floats. And indeed tests fail systematically with this patch. The bad news is that I can't think of a way to fix the formula. Even if I put quite severe restrictions on the values that tfn may take, there are still some special freq values that manage to prove the score is not monotonic. Good news is that it doesn't make some other SimilarityBase impls fail like the axiomatic ones.
[Legacy Jira: Adrien Grand (@jpountz) on Dec 05 2017]
from stargazers-migration-test.
I think our best option is to specialize some combinations. We should be able to specialize basic models G, IF, I👎 and I(ne) with after effects B, L and NoAfterEffect and make them pass tests. For instance, I tested out this specialization of model G and after effect L to make sure it actually passes the tests:
/** BasicModel G + AfterEffect L */
public class DFRSimilarityGL extends SimilarityBase {
private final Normalization normalization;
public DFRSimilarityGL(Normalization normalization) {
this.normalization = Objects.requireNonNull(normalization);
}
`@Override`
protected double score(BasicStats stats, double freq, double docLen) {
double tfn = normalization.tfn(stats, freq, docLen);
// approximation only holds true when F << N, so we use lambda = F / (N + F)
double F = stats.getTotalTermFreq() + 1;
double N = stats.getNumberOfDocuments();
double lambda = F / (N + F);
// -log(1 / (lambda + 1)) -> log(lambda + 1)
double A = log2(lambda + 1);
double B = log2((1 + lambda) / lambda);
// basic model G uses (A + B * tfn)
// after effect L takes the result and divides it by (1 + tfn)
// so in the end we have (A + B * tfn) / (1 + tfn)
// which we change to B - (B - A) / (1 + tfn) to reduce floating-point accuracy issues
// (since tfn appears only once it is guaranteed to be non decreasing with tfn
return B - (B - A) / (1 + tfn);
}
`@Override`
public String toString() {
return "DFR GL" + normalization.toString();
}
}
[Legacy Jira: Adrien Grand (@jpountz) on Dec 05 2017]
from stargazers-migration-test.
Adrien it should reproduce every time with the test changes i made on this issue? Its just a bug in the test that it doesn't explicitly test the extremes but instead relies on randomness.
[Legacy Jira: Robert Muir (@rmuir) on Dec 05 2017]
from stargazers-migration-test.
I think I like the proposed solution. Lets drop NoAfterEffect though, i'm not sure its even theoretical: I don't see it in the DFR paper (http://theses.gla.ac.uk/1570/1/2003amatiphd.pdf). That would yield 8 solid combinations which seems manageable. There are also some "+1"'s that maybe are no longer necessary (I don't know if it makes this task easier, just mentioning it: LUCENE-8023)
[Legacy Jira: Robert Muir (@rmuir) on Dec 05 2017]
from stargazers-migration-test.
Adrien it should reproduce every time with the test changes i made on this issue?
It doesn't because the fact we always compute scores as doubles then cast to a float hides the issue: even if score the score of Math.nextDown(freq) is more than the score of freq, the float cast rounds both values to the same float almost all the time.
[Legacy Jira: Adrien Grand (@jpountz) on Dec 05 2017]
from stargazers-migration-test.
If we are fine with removing support for NoAfterEffect
, maybe we could change the contract of AfterEffect to return the product of the after effect with (1+tfn), which has the nice property of not depending on tfn for both B and L. It makes the internal API less nice but has the benefit of keeping the basic model and after effect plugged in separately, similarly to the Terrier API.
The attached patch implements this idea and removes after effects BE, D and P, which I couldn't fix to produce scores that do not decrease when tfn increases. Tests pass for all combinations of the DFRSimilarity with this patch.
[Legacy Jira: Adrien Grand (@jpountz) on Dec 06 2017]
from stargazers-migration-test.
Took a glance, I am good with this approach, thank you! I would like to combine your patch with my test patch (attached to this issue) though, because it makes the test much better for all sims not just this particular case by exercising the extremes explicitly.
[Legacy Jira: Robert Muir (@rmuir) on Dec 06 2017]
from stargazers-migration-test.
Commit 63b63c573487fe6b054afb6073c057a88a15288f in lucene-solr's branch refs/heads/master from @jpountz
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=63b63c5
LUCENE-8015: Fixed DFR similarities' scores to not decrease when tfn increases.
[Legacy Jira: ASF subversion and git services on Dec 06 2017]
from stargazers-migration-test.
Done, I combined both patches and beasting didn't find any failures so I merged. Thank you!
[Legacy Jira: Adrien Grand (@jpountz) on Dec 06 2017]
from stargazers-migration-test.
Related Issues (20)
- Update javadocs to reflect experimental status of Kuromoji DictionaryBuilder [LUCENE-8981] HOT 3
- Make NativeUnixDirectory pure java now that direct IO is possible [LUCENE-8982] HOT 31
- PhraseWildcardQuery - new query to control and optimize wildcard expansions in phrase [LUCENE-8983] HOT 13
- MoreLikeThis MLT is biased for uncommon fields [LUCENE-8984] HOT 11
- SynonymGraphFilter cannot handle input stream with tokens filtered. [LUCENE-8985] HOT 12
- Add asf.yaml to our git repo [LUCENE-8986] HOT 7
- Move Lucene web site from svn to git [LUCENE-8987] HOT 58
- Maximal -- Minimum Based Early Termination For TopFieldCollector [LUCENE-8988]
- IndexSearcher Should Handle Rejection of Concurrent Task [LUCENE-8989] HOT 10
- IndexOrDocValuesQuery can take a bad decision for range queries if field has many values per document [LUCENE-8990] HOT 8
- disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399 [LUCENE-8991] HOT 13
- Share minimum score across segments in concurrent search [LUCENE-8992] HOT 7
- Change Maven POM repository URLs to https [LUCENE-8993] HOT 15
- Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). [LUCENE-8994] HOT 5
- TopSuggestDocsCollector#collect should be able to signal rejection [LUCENE-8995] HOT 1
- maxScore is sometimes missing from distributed grouped responses [LUCENE-8996] HOT 45
- Add type of triangle info to ShapeField encoding [LUCENE-8997] HOT 4
- OverviewImplTest.testIsOptimized reproducible failure [LUCENE-8998] HOT 5
- expectThrows doesn't play nicely with "assume" failures [LUCENE-8999] HOT 12
- Cannot resolve classes from org.apache.lucene.core plugin and others [LUCENE-9000] HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from stargazers-migration-test.