mikemccand / luceneutil
Various utility scripts for running Lucene performance tests
License: Apache License 2.0
When we (Amazon product search) upgraded to Lucene 8.5.1, which includes newly added block compression for BINARY doc values, we saw a sizable (~30%) reduction in our red-line QPS (throughput). This might be unique to how we are using BINARY doc values. Other applications would likely not be so heavily affected.
However, it looks like Lucene's nightly benchmarks failed to even detect this performance impact.
There were benchmarks run as the change was being developed, but for some reason they did not seem to show any impact at all, which is surprising.
Let's add something to luceneutil to better test BINARY doc values query-time performance.
While setting up the benchmarking for apache/lucene#177 (and previous ones), I noticed that the benchmarking tool uses a strange default to find the Java executable, while on the other hand it uses "gradlew" to build. This leads to inconsistencies.
Because Gradle itself does not work with recent Java versions, we have two environment variables, which are used by Gradle:
I propose to change the defaults to work like this:
Gradle also uses TEST_JVM_ARGS to get additional parameters when executing Java. It would be good if those were added to the beginning of the java command line. I use this to pass custom modules on the JVM command line, since otherwise the new code won't run: --add-modules jdk.incubator.foreign
I worked around this in my localconfig.py, but this should really be part of the defaults: RUNTIME_JAVA_HOME, JAVA_HOME, TEST_JVM_ARGS.
This is my current workaround. My Python knowledge is limited, so I can't provide a PR with all the features mentioned above:
RUNTIME_JAVA_HOME = os.environ.get('RUNTIME_JAVA_HOME')
JAVA_EXE = '%s/bin/java' % RUNTIME_JAVA_HOME
JAVAC_EXE = '%s/bin/javac' % RUNTIME_JAVA_HOME
#JAVA_COMMAND = '%s --add-modules jdk.incubator.foreign -server -Xms2g -Xmx2g -XX:-TieredCompilation -XX:+HeapDumpOnOutOfMemoryError -Xbatch' % JAVA
JAVA_COMMAND = '%s --add-modules jdk.incubator.foreign -server -Xms2g -Xmx2g -XX:+HeapDumpOnOutOfMemoryError' % JAVA_EXE
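A minimal sketch of what the proposed defaults could look like (a hypothetical helper, not luceneutil's actual config code): resolve RUNTIME_JAVA_HOME first, fall back to JAVA_HOME, else rely on java from the PATH, and prepend TEST_JVM_ARGS:

```python
import os

def resolve_java_command():
    """Resolve the java executable the way the proposal describes:
    RUNTIME_JAVA_HOME first, then JAVA_HOME, else rely on the PATH,
    and prepend TEST_JVM_ARGS (e.g. --add-modules jdk.incubator.foreign)."""
    java_home = os.environ.get('RUNTIME_JAVA_HOME') or os.environ.get('JAVA_HOME')
    java_exe = '%s/bin/java' % java_home if java_home else 'java'
    parts = [java_exe]
    test_jvm_args = os.environ.get('TEST_JVM_ARGS')
    if test_jvm_args:
        parts.append(test_jvm_args)  # must come first, before other JVM flags
    # sane defaults only: no -Xbatch, no -XX:-TieredCompilation
    parts.append('-server -Xms2g -Xmx2g -XX:+HeapDumpOnOutOfMemoryError')
    return ' '.join(parts)
```

This mirrors the workaround above but adds the JAVA_HOME/PATH fallbacks and the TEST_JVM_ARGS prepending that the proposal asks for.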
The PKLookupTask appears to be an unsound benchmark. It chooses some number of IDs purely at random. Much of the time, those IDs will fall entirely before or after the actual IDs in the indexed field, and will likely not match anything at all. This is not realistic!
Furthermore, there is some bug that insists PKLookup is run even when the Competition is configured with task patterns that match other tasks. The expectation is that only the tasks matching the configured patterns are run.
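One way to make the task realistic would be to draw lookup keys from the IDs that were actually indexed, so every lookup matches a document. A small illustrative sketch (the names here are hypothetical, not the real task's API):

```python
import random

def pick_lookup_ids(indexed_ids, count, seed=42):
    """Sample PK-lookup keys from the IDs that actually exist in the
    index, instead of drawing them uniformly from the whole key space
    (where most keys would match nothing)."""
    rng = random.Random(seed)  # fixed seed keeps benchmark runs repeatable
    return [rng.choice(indexed_ids) for _ in range(count)]
```

The real fix would live in PKLookupTask itself; this only shows the sampling idea.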
A change in the API for LatLonShape has broken the GeoBenchmarks.
I ran six re-indexing benchmarks using wikimediumall, and each time got a different total number of documents indexed:
[mike@beast3 trunk]$ grep "indexing done" /l/logs/trunk?.txt
/l/logs/trunk1.txt:Indexer: indexing done (89114 msec); total 27624170 docs
/l/logs/trunk2.txt:Indexer: indexing done (89974 msec); total 27624192 docs
/l/logs/trunk3.txt:Indexer: indexing done (90614 msec); total 27624409 docs
[mike@beast3 trunk]$ grep "indexing done" /l/logs/base?.log
/l/logs/base1.log:Indexer: indexing done (89271 msec); total 27623915 docs
/l/logs/base2.log:Indexer: indexing done (91676 msec); total 27624107 docs
/l/logs/base3.log:Indexer: indexing done (93120 msec); total 27624268 docs
Why is that :) It should be exactly the same document count every time!
While implementing the change in PR apache/lucene-solr#1733, we saw gains in browse throughput and possibly some gains in merge time. While we were able to measure the throughput changes, we did not have a way to benchmark merge time.
Unless tens/hundreds of thousands of searches are run by the benchmark, P99.9 should not be reported.
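The reason is simple order statistics: the nearest-rank P99.9 of N samples sits at rank ceil(0.999 * N), so with fewer than roughly a thousand searches it degenerates to the single slowest measurement. A quick illustration (pure Python, integer arithmetic to avoid float edge cases):

```python
def p999(latencies):
    """P99.9 by the nearest-rank method: rank = ceil(0.999 * N), 1-based."""
    s = sorted(latencies)
    rank = (999 * len(s) + 999) // 1000  # integer ceil of 999*N/1000
    return s[rank - 1]

# With 500 samples, "P99.9" is just the max; with 100k it is a real tail stat.
```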
In https://issues.apache.org/jira/browse/LUCENE-9674 we are (we think) accelerating retrieval of vector values using their (DISI) iterator API, but we don't have good benchmarks for it. And it turns out that the thing we implemented a while back is broken. This issue is for fixing that:
In SearchTask we added vector retrieval for tasks (https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312), but the way it is done retrieves the vector for the wrong doc, since it uses the docid to retrieve from the ordinal API.
I guess we might also want a separate issue to enable such retrieval in some tasks for nightly benchmarks to track, but as things stand today it is only really straightforward to enable for all tasks, which doesn't seem ideal, so let's track that separately.
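The core of the bug can be shown with a toy model (this is deliberately not Lucene's real API): vectors are stored densely and indexed by ordinal, and only some documents have one, so a docid must be mapped to its ordinal before reading:

```python
# Toy model: only even docids carry a vector; storage is dense, keyed by ord.
docs_with_vectors = [0, 2, 4, 6, 8]                 # docids, in index order
vectors = [[float(d)] for d in docs_with_vectors]   # vectors[ord]
docid_to_ord = {d: i for i, d in enumerate(docs_with_vectors)}

def vector_for_doc_buggy(docid):
    # BUG: treats the docid as if it were the dense ordinal
    return vectors[docid]

def vector_for_doc_fixed(docid):
    # Correct: map docid -> ord first (in Lucene, the iterator does this)
    return vectors[docid_to_ord[docid]]
```

Depending on the docid, the buggy version either silently returns another document's vector or reads past the end of the dense storage.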
@msokolov had a great idea to make it easier to understand WTFs that suddenly appear in our nightly charts, e.g. https://home.apache.org/~mikemccand/lucenebench/2020.12.21.00.03.10.html
Where we currently show the git hash for that night's run, we could make it a link to a changeset of all commits since the previous data point.
The main part of my localrun.py:
index1 = comp.newIndex("lucene_baseline", sourceData,
analyzer='StandardAnalyzerNoStopWords',
postingsFormat='Lucene90',
idFieldPostingsFormat='Lucene90',
mergePolicy='TieredMergePolicy',
facets = (('taxonomy:Date', 'Date'),
('taxonomy:Month', 'Month'),
('taxonomy:DayOfYear', 'DayOfYear'),
('sortedset:Month', 'Month'),
('sortedset:DayOfYear', 'DayOfYear')),
useCMS=True,
numThreads=4,
maxConcurrentMerges=12,
rearrange=555,
addDVFields=True)
index2 = comp.newIndex("lucene_candidate", sourceData,
analyzer='StandardAnalyzerNoStopWords',
postingsFormat='Lucene90',
idFieldPostingsFormat='Lucene90',
mergePolicy='TieredMergePolicy',
facets = (('taxonomy:Date', 'Date'),
('taxonomy:Month', 'Month'),
('taxonomy:DayOfYear', 'DayOfYear'),
('sortedset:Month', 'Month'),
('sortedset:DayOfYear', 'DayOfYear')),
useCMS=True,
numThreads=4,
maxConcurrentMerges=12,
rearrange=555,
addDVFields=True)
comp.competitor('baseline', 'lucene_baseline',
index = index1, concurrentSearches = False)
comp.competitor('my_modified_version', 'lucene_candidate',
index = index2, concurrentSearches = False)
And the Lucene sources are identical (since they're just symlinks), but I'm getting the errors below when running python src/python/localrun.py -source wikimedium10k:
WARNING: cat=OrHighHigh: hit counts differ: 1387+ vs 1379+
Traceback (most recent call last):
File "src/python/localrun.py", line 95, in <module>
comp.benchmark("baseline_vs_patch")
File "/Users/haoyzhai/Documents/lucene-home/luceneutil/src/python/competition.py", line 457, in benchmark
searchBench.run(id, base, challenger,
File "/Users/haoyzhai/Documents/lucene-home/luceneutil/src/python/searchBench.py", line 196, in run
raise RuntimeError('errors occurred: %s' % str(cmpDiffs))
RuntimeError: errors occurred: ([], ["query=body:act filter=None sort=None groupField=None hitCount=136: hit 1 has wrong id/s ([909, 1226], '3.3034647') vs ([909, 1227], '3.3034647')", 'query=body:"jpg thumb"~4 filter=None sort=None groupField=None hitCount=299: hit 4 has wrong id/s ([1095, 3859, 4759], \'3.3647363\') vs ([1093, 3859, 4759], \'3.3647363\')', "query=+body:can +body:companies filter=None sort=None groupField=None hitCount=6: hit 0 has wrong id/s ([2412], '3.627793') vs ([2417], '3.627793')", 'query=body:"1 ref"~4 filter=None sort=None groupField=None hitCount=377: hit 5 has wrong id/s ([2513, 7085], \'1.2669312\') vs ([2513, 7083], \'1.2669312\')', "query=body:1890s~1 filter=None sort=None groupField=None hitCount=203: hit 4 has wrong id/s ([1203, 4008, 4952], '3.5986476') vs ([1203, 4007, 4950], '3.5986476')", 'query=body:"needed date" filter=None sort=None groupField=None hitCount=317: hit 1 has wrong id/s ([9297], \'3.883874\') vs ([9299], \'3.883874\')', "query=body:MAXWIDTH/10(ORDERED(2011,ref)) filter=None sort=None groupField=None hitCount=624: hit 0 has wrong id/s ([5867], '0.8888889') vs ([5868], '0.8888889')", "query=+body:after +body:may filter=None sort=None groupField=None hitCount=139: hit 6 has wrong id/s ([8672], '2.43158') vs ([8673], '2.43158')", "query=body:2009 body:listened filter=None sort=None groupField=None hitCount=1014+: hit 4 has wrong id/s ([1572], '1.9808738') vs ([1570], '1.9808738')", 'query=body:year body:since filter=None sort=None groupField=None hitCount=1387+: wrong hitCount: 1387+ vs 1379+', 'query=body:"make up"~4 filter=None sort=None groupField=None hitCount=37: hit 1 has wrong id/s ([3421, 9840], \'2.9576194\') vs ([3421, 9842], \'2.9576194\')', "query=spanNear([body:book_result, body:ct], 10, true) filter=None sort=None groupField=None hitCount=39: hit 0 has wrong id/s ([8729], '7.6768456') vs ([8730], '7.6768456')", "query=body:bad filter=None sort=None groupField=None hitCount=48: hit 0 has wrong id/s 
([9342], '4.06048') vs ([9344], '4.06048')", "query=body:2006 filter=None sort=DayOfYear groupField=None hitCount=769: hit 0 has wrong id/s ([9812], '52') vs ([9814], '52')", 'query=body:"left thumb" filter=None sort=None groupField=None hitCount=35: hit 0 has wrong id/s ([3636], \'3.0522845\') vs ([3637], \'3.0522845\')', "query=spanNear([body:soon, body:after], 10, true) filter=None sort=None groupField=None hitCount=22: hit 0 has wrong id/s ([8275], '3.235054') vs ([8274], '3.235054')", "query=body:8 body:different filter=None sort=None groupField=None hitCount=1022+: hit 6 has wrong id/s ([5364, 8200], '2.7650805') vs ([5318, 8199], '2.7650805')", "query=body:marek~2 filter=None sort=None groupField=None hitCount=1004+: hit 3 has wrong id/s ([8037], '2.1000693') vs ([8038], '2.1000693')", 'query=body:"per year" filter=None sort=None groupField=None hitCount=40: hit 1 has wrong id/s ([3736], \'3.3056784\') vs ([3737], \'3.3056784\')', "query=spanNear([body:new, body:york], 10, true) filter=None sort=None groupField=None hitCount=574: hit 0 has wrong id/s ([1214], '3.8572712') vs ([1215], '3.8572712')"], 1.0)
Notice that in most of the errors the id is off by one, for example: "query=spanNear([body:new, body:york], 10, true) filter=None sort=None groupField=None hitCount=574: hit 0 has wrong id/s ([1214], '3.8572712') vs ([1215], '3.8572712')"
I tried to debug a bit by making rearrange use only one thread (so that the segment order is fixed as well) and printing out stats of the id field (min, max, number of unique ids), and did not find anything abnormal. In brief, the segment doc counts are 1810, 1800 * 4, 180 * 5, 18 * 5, and each segment contains the correct ids.
While working on apache/lucene#177, I had to fight against bad defaults for Java's command line parameters.
Please don't set any default command line options like -Xbatch, or disable tiered compilation with -XX:-TieredCompilation. Both options behave very badly for long-running processes like those doing real-world benchmarking. In fact, without tiered compilation the JVM took about 6 seconds (out of 50 seconds total runtime) until the code was fully optimized, because the JVM wasn't able to compile in parallel (due to -Xbatch) and only had one chance to optimize at all (non-tiered). During that time, the new Panama Java API produced so many garbage instances on the heap that for the 15 seconds following those 6 seconds, the garbage collector was going crazy cleaning up.
In addition, using -Xbatch and disabling tiered compilation does not reflect reality. Nobody would run the JDK like this, except for very short-running processes that need no optimization (like unit tests).
On top of that, with tiered compilation the benchmark runs 20% faster (aside from the GC problems introduced by Project Panama). A single benchmark round on the main branch with the MappedByteBuffer directory ran 57 seconds with -XX:-TieredCompilation and -Xbatch, but took <40 seconds with all Java defaults. With the GC problems mentioned before, it was 75 seconds total.
There are some ancient files, such as pom.xml, that are misleading and not used anymore. We should clean those up.
I got surprising numbers on https://issues.apache.org/jira/browse/LUCENE-7239: -polyMedium becomes super fast, but -poly N is ... meh.
I did some rough profiling: over a third of CPU time is spent generating the polygons-to-be-benchmarked. I will see if I can pull this out.
Hi there,
Minor nit, but since people.apache.org moved to home.apache.org, the test datasets do not download.
e.g. http://people.apache.org/~mikemccand/enwiki-20120502-lines-1k.txt.lzma
Can we put them up on home.apache.org?
Since switching to beast3, occasionally a java process will drop to 0% CPU and not finish. When I look with jstack, threads are reported as RUNNABLE, yet, looking again 30 seconds later, the thread stacks are all the same:
[mike@beast3 trunk]$ jstack 372752
2020-11-21 19:45:10
Full thread dump OpenJDK 64-Bit Server VM (12.0.2+10 mixed mode, sharing):
Threads class SMR info:
_java_thread_list=0x00007f0cf4000c30, length=17, elements={
0x00007f0d48011800, 0x00007f0d4809e000, 0x00007f0d480a0000, 0x00007f0d480a6000,
0x00007f0d480a8000, 0x00007f0d480aa000, 0x00007f0d480ac000, 0x00007f0d480e8000,
0x00007f0d480f1800, 0x00007f0d482bb000, 0x00007f0d48469000, 0x00007f0d4843a000,
0x00007f0d4846b000, 0x00007f0d4846f000, 0x00007f0d48470800, 0x00007f0d48472800,
0x00007f0cf4001000
}
"main" #1 prio=5 os_prio=0 cpu=405.22ms elapsed=152342.45s tid=0x00007f0d48011800 nid=0x5b011 waiting on condition [0x00007f0d4df05000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@12.0.2/Native Method)
- parking to wait for <0x00000007625fa0a0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(java.base@12.0.2/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@12.0.2/AbstractQueuedSynchronizer.java:885)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@12.0.2/AbstractQueuedSynchronizer.java:1039)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@12.0.2/AbstractQueuedSynchronizer.java:1345)
at java.util.concurrent.CountDownLatch.await(java.base@12.0.2/CountDownLatch.java:232)
at perf.TaskThreads.finish(TaskThreads.java:46)
at perf.SearchPerfTest._main(SearchPerfTest.java:543)
at perf.SearchPerfTest.main(SearchPerfTest.java:132)
"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.36ms elapsed=152342.43s tid=0x00007f0d4809e000 nid=0x5b014 waiting on condition [0x00007f0d0a076000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@12.0.2/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@12.0.2/Reference.java:241)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@12.0.2/Reference.java:213)
"Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.25ms elapsed=152342.43s tid=0x00007f0d480a0000 nid=0x5b015 in Object.wait() [0x00007f0d09f75000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@12.0.2/Native Method)
- waiting on <0x0000000600019088> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base@12.0.2/ReferenceQueue.java:155)
- locked <0x0000000600019088> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base@12.0.2/ReferenceQueue.java:176)
at java.lang.ref.Finalizer$FinalizerThread.run(java.base@12.0.2/Finalizer.java:170)
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.37ms elapsed=152342.43s tid=0x00007f0d480a6000 nid=0x5b016 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 cpu=835.64ms elapsed=152342.43s tid=0x00007f0d480a8000 nid=0x5b017 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"C1 CompilerThread0" #19 daemon prio=9 os_prio=0 cpu=798.64ms elapsed=152342.43s tid=0x00007f0d480aa000 nid=0x5b018 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"Sweeper thread" #26 daemon prio=9 os_prio=0 cpu=6.75ms elapsed=152342.43s tid=0x00007f0d480ac000 nid=0x5b019 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" #27 daemon prio=9 os_prio=0 cpu=0.58ms elapsed=152342.40s tid=0x00007f0d480e8000 nid=0x5b01b runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Common-Cleaner" #28 daemon prio=8 os_prio=0 cpu=94.60ms elapsed=152342.40s tid=0x00007f0d480f1800 nid=0x5b01d in Object.wait() [0x00007f0d09485000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@12.0.2/Native Method)
- waiting on <no object reference available>
at java.lang.ref.ReferenceQueue.remove(java.base@12.0.2/ReferenceQueue.java:155)
- locked <0x00000006000115a8> (a java.lang.ref.ReferenceQueue$Lock)
at jdk.internal.ref.CleanerImpl.run(java.base@12.0.2/CleanerImpl.java:148)
at java.lang.Thread.run(java.base@12.0.2/Thread.java:835)
at jdk.internal.misc.InnocuousThread.run(java.base@12.0.2/InnocuousThread.java:134)
"pool-1-thread-1" #29 prio=5 os_prio=0 cpu=1.90ms elapsed=152342.27s tid=0x00007f0d482bb000 nid=0x5b072 waiting on condition [0x00007f0baf5eb000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@12.0.2/Native Method)
- parking to wait for <0x00000007557b9b18> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@12.0.2/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@12.0.2/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@12.0.2/ScheduledThreadPoolExecutor.java:1170)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@12.0.2/ScheduledThreadPoolExecutor.java:899)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@12.0.2/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@12.0.2/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@12.0.2/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@12.0.2/Thread.java:835)
"Thread-0" #31 prio=5 os_prio=0 cpu=7.77ms elapsed=152342.01s tid=0x00007f0d48469000 nid=0x5b080 in Object.wait() [0x00007f08d740f000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.index.TermsEnum.<clinit>(TermsEnum.java:191)
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:171)
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:313)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:756)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:527)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:504)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:515)
at perf.SearchTask.go(SearchTask.java:262)
at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
"Thread-1" #32 prio=5 os_prio=0 cpu=6.02ms elapsed=152342.01s tid=0x00007f0d4843a000 nid=0x5b081 in Object.wait() [0x00007f08d730f000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.FieldReader.iterator(FieldReader.java:148)
at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:123)
at org.apache.lucene.index.TermStates.get(TermStates.java:187)
at org.apache.lucene.search.TermQuery$TermWeight.getTermsEnum(TermQuery.java:125)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:100)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:169)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:738)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:638)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:559)
at perf.SearchTask.go(SearchTask.java:267)
at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
"Thread-2" #33 prio=5 os_prio=0 cpu=3.70ms elapsed=152342.01s tid=0x00007f0d4846b000 nid=0x5b082 in Object.wait() [0x00007f08d720e000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:180)
at org.apache.lucene.util.automaton.CompiledAutomaton.getTermsEnum(CompiledAutomaton.java:343)
at org.apache.lucene.search.AutomatonQuery.getTermsEnum(AutomatonQuery.java:117)
at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:303)
at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:141)
at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.bulkScorer(MultiTermQueryConstantScoreWrapper.java:193)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:738)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:504)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:515)
at perf.SearchTask.go(SearchTask.java:262)
at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
"Thread-3" #34 prio=5 os_prio=0 cpu=2.77ms elapsed=152342.01s tid=0x00007f0d4846f000 nid=0x5b083 in Object.wait() [0x00007f08d710d000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:180)
at org.apache.lucene.util.automaton.CompiledAutomaton.getTermsEnum(CompiledAutomaton.java:343)
at org.apache.lucene.search.AutomatonQuery.getTermsEnum(AutomatonQuery.java:117)
at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:303)
at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:141)
at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.bulkScorer(MultiTermQueryConstantScoreWrapper.java:193)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:738)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:504)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:515)
at perf.SearchTask.go(SearchTask.java:262)
at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
"Thread-4" #35 prio=5 os_prio=0 cpu=5.02ms elapsed=152342.01s tid=0x00007f0d48470800 nid=0x5b084 in Object.wait() [0x00007f08d700c000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.FieldReader.iterator(FieldReader.java:148)
at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:123)
at org.apache.lucene.index.TermStates.build(TermStates.java:109)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:190)
at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:849)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
at org.apache.lucene.facet.FacetsCollector.doSearch(FacetsCollector.java:238)
at org.apache.lucene.facet.FacetsCollector.search(FacetsCollector.java:160)
at perf.SearchTask.go(SearchTask.java:222)
at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
"Thread-5" #36 prio=5 os_prio=0 cpu=6.74ms elapsed=152342.01s tid=0x00007f0d48472800 nid=0x5b085 in Object.wait() [0x00007f08d6f0b000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.blocktree.FieldReader.iterator(FieldReader.java:148)
at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:123)
at org.apache.lucene.index.TermStates.get(TermStates.java:187)
at org.apache.lucene.search.TermQuery$TermWeight.getTermsEnum(TermQuery.java:125)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:100)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:169)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:738)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:638)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:559)
at perf.SearchTask.go(SearchTask.java:267)
at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
"Attach Listener" #37 daemon prio=9 os_prio=0 cpu=0.39ms elapsed=123608.38s tid=0x00007f0cf4001000 nid=0x5b0cd waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"VM Thread" os_prio=0 cpu=3113.34ms elapsed=152342.43s tid=0x00007f0d4809a800 nid=0x5b013 runnable
"ParGC Thread#0" os_prio=0 cpu=0.26ms elapsed=152342.44s tid=0x00007f0d48039000 nid=0x5b012 runnable
"ParGC Thread#1" os_prio=0 cpu=1.16ms elapsed=152342.35s tid=0x00007f0d04001000 nid=0x5b01f runnable
"ParGC Thread#2" os_prio=0 cpu=1.10ms elapsed=152342.35s tid=0x00007f0d04002800 nid=0x5b020 runnable
"ParGC Thread#3" os_prio=0 cpu=1.08ms elapsed=152342.35s tid=0x00007f0d04004800 nid=0x5b021 runnable
"ParGC Thread#4" os_prio=0 cpu=1.93ms elapsed=152342.35s tid=0x00007f0d04006800 nid=0x5b022 runnable
"ParGC Thread#5" os_prio=0 cpu=0.98ms elapsed=152342.35s tid=0x00007f0d04008000 nid=0x5b023 runnable
"ParGC Thread#6" os_prio=0 cpu=1.89ms elapsed=152342.35s tid=0x00007f0d0400a000 nid=0x5b024 runnable
"ParGC Thread#7" os_prio=0 cpu=2.84ms elapsed=152342.35s tid=0x00007f0d0400b800 nid=0x5b025 runnable
"ParGC Thread#8" os_prio=0 cpu=0.32ms elapsed=152342.35s tid=0x00007f0d0400d800 nid=0x5b026 runnable
"ParGC Thread#9" os_prio=0 cpu=0.94ms elapsed=152342.35s tid=0x00007f0d0400f000 nid=0x5b027 runnable
"ParGC Thread#10" os_prio=0 cpu=0.35ms elapsed=152342.35s tid=0x00007f0d04011000 nid=0x5b028 runnable
"ParGC Thread#11" os_prio=0 cpu=0.30ms elapsed=152342.35s tid=0x00007f0d04012800 nid=0x5b029 runnable
"ParGC Thread#12" os_prio=0 cpu=0.35ms elapsed=152342.35s tid=0x00007f0d04014800 nid=0x5b02a runnable
"ParGC Thread#13" os_prio=0 cpu=0.25ms elapsed=152342.35s tid=0x00007f0d04016800 nid=0x5b02b runnable
"ParGC Thread#14" os_prio=0 cpu=0.30ms elapsed=152342.35s tid=0x00007f0d04018000 nid=0x5b02c runnable
"ParGC Thread#15" os_prio=0 cpu=0.39ms elapsed=152342.35s tid=0x00007f0d0401a000 nid=0x5b02d runnable
"ParGC Thread#16" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d0401b800 nid=0x5b02e runnable
"ParGC Thread#17" os_prio=0 cpu=0.31ms elapsed=152342.35s tid=0x00007f0d0401d000 nid=0x5b02f runnable
"ParGC Thread#18" os_prio=0 cpu=0.29ms elapsed=152342.35s tid=0x00007f0d0401e800 nid=0x5b030 runnable
"ParGC Thread#19" os_prio=0 cpu=0.26ms elapsed=152342.35s tid=0x00007f0d04020800 nid=0x5b031 runnable
"ParGC Thread#20" os_prio=0 cpu=0.35ms elapsed=152342.35s tid=0x00007f0d04022800 nid=0x5b032 runnable
"ParGC Thread#21" os_prio=0 cpu=0.26ms elapsed=152342.35s tid=0x00007f0d04024000 nid=0x5b033 runnable
"ParGC Thread#22" os_prio=0 cpu=0.27ms elapsed=152342.35s tid=0x00007f0d04026000 nid=0x5b034 runnable
"ParGC Thread#23" os_prio=0 cpu=0.26ms elapsed=152342.35s tid=0x00007f0d04027800 nid=0x5b035 runnable
"ParGC Thread#24" os_prio=0 cpu=0.31ms elapsed=152342.35s tid=0x00007f0d04029800 nid=0x5b036 runnable
"ParGC Thread#25" os_prio=0 cpu=0.26ms elapsed=152342.35s tid=0x00007f0d0402b000 nid=0x5b037 runnable
"ParGC Thread#26" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d0402d000 nid=0x5b038 runnable
"ParGC Thread#27" os_prio=0 cpu=0.24ms elapsed=152342.35s tid=0x00007f0d0402f000 nid=0x5b039 runnable
"ParGC Thread#28" os_prio=0 cpu=0.24ms elapsed=152342.35s tid=0x00007f0d04030800 nid=0x5b03a runnable
"ParGC Thread#29" os_prio=0 cpu=1.51ms elapsed=152342.35s tid=0x00007f0d04032800 nid=0x5b03b runnable
"ParGC Thread#30" os_prio=0 cpu=0.29ms elapsed=152342.35s tid=0x00007f0d04034000 nid=0x5b03c runnable
"ParGC Thread#31" os_prio=0 cpu=0.23ms elapsed=152342.35s tid=0x00007f0d04036000 nid=0x5b03d runnable
"ParGC Thread#32" os_prio=0 cpu=1.25ms elapsed=152342.35s tid=0x00007f0d04037800 nid=0x5b03e runnable
"ParGC Thread#33" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d04039800 nid=0x5b03f runnable
"ParGC Thread#34" os_prio=0 cpu=2.23ms elapsed=152342.35s tid=0x00007f0d0403b800 nid=0x5b040 runnable
"ParGC Thread#35" os_prio=0 cpu=1.06ms elapsed=152342.35s tid=0x00007f0d0403d000 nid=0x5b041 runnable
"ParGC Thread#36" os_prio=0 cpu=0.31ms elapsed=152342.35s tid=0x00007f0d0403f000 nid=0x5b042 runnable
"ParGC Thread#37" os_prio=0 cpu=0.89ms elapsed=152342.35s tid=0x00007f0d04040800 nid=0x5b043 runnable
"ParGC Thread#38" os_prio=0 cpu=1.02ms elapsed=152342.35s tid=0x00007f0d04042800 nid=0x5b044 runnable
"ParGC Thread#39" os_prio=0 cpu=0.98ms elapsed=152342.35s tid=0x00007f0d04044000 nid=0x5b045 runnable
"ParGC Thread#40" os_prio=0 cpu=0.20ms elapsed=152342.35s tid=0x00007f0d04046000 nid=0x5b046 runnable
"ParGC Thread#41" os_prio=0 cpu=0.27ms elapsed=152342.35s tid=0x00007f0d04048000 nid=0x5b047 runnable
"ParGC Thread#42" os_prio=0 cpu=0.30ms elapsed=152342.35s tid=0x00007f0d04049800 nid=0x5b048 runnable
"ParGC Thread#43" os_prio=0 cpu=1.25ms elapsed=152342.35s tid=0x00007f0d0404b800 nid=0x5b049 runnable
"ParGC Thread#44" os_prio=0 cpu=1.16ms elapsed=152342.35s tid=0x00007f0d0404d000 nid=0x5b04a runnable
"ParGC Thread#45" os_prio=0 cpu=1.10ms elapsed=152342.35s tid=0x00007f0d0404f000 nid=0x5b04b runnable
"ParGC Thread#46" os_prio=0 cpu=0.32ms elapsed=152342.35s tid=0x00007f0d04050800 nid=0x5b04c runnable
"ParGC Thread#47" os_prio=0 cpu=2.16ms elapsed=152342.35s tid=0x00007f0d04052800 nid=0x5b04d runnable
"ParGC Thread#48" os_prio=0 cpu=0.27ms elapsed=152342.35s tid=0x00007f0d04054800 nid=0x5b04e runnable
"ParGC Thread#49" os_prio=0 cpu=0.30ms elapsed=152342.35s tid=0x00007f0d04056000 nid=0x5b04f runnable
"ParGC Thread#50" os_prio=0 cpu=1.22ms elapsed=152342.35s tid=0x00007f0d04058000 nid=0x5b050 runnable
"ParGC Thread#51" os_prio=0 cpu=0.24ms elapsed=152342.35s tid=0x00007f0d04059800 nid=0x5b051 runnable
"ParGC Thread#52" os_prio=0 cpu=0.34ms elapsed=152342.35s tid=0x00007f0d0405b800 nid=0x5b052 runnable
"ParGC Thread#53" os_prio=0 cpu=0.28ms elapsed=152342.35s tid=0x00007f0d0405d000 nid=0x5b053 runnable
"ParGC Thread#54" os_prio=0 cpu=0.17ms elapsed=152342.35s tid=0x00007f0d0405f000 nid=0x5b054 runnable
"ParGC Thread#55" os_prio=0 cpu=0.28ms elapsed=152342.35s tid=0x00007f0d04061000 nid=0x5b055 runnable
"ParGC Thread#56" os_prio=0 cpu=0.27ms elapsed=152342.35s tid=0x00007f0d04062800 nid=0x5b056 runnable
"ParGC Thread#57" os_prio=0 cpu=0.98ms elapsed=152342.35s tid=0x00007f0d04064800 nid=0x5b057 runnable
"ParGC Thread#58" os_prio=0 cpu=0.81ms elapsed=152342.35s tid=0x00007f0d04066000 nid=0x5b058 runnable
"ParGC Thread#59" os_prio=0 cpu=0.82ms elapsed=152342.35s tid=0x00007f0d04068000 nid=0x5b059 runnable
"ParGC Thread#60" os_prio=0 cpu=0.32ms elapsed=152342.35s tid=0x00007f0d04069800 nid=0x5b05a runnable
"ParGC Thread#61" os_prio=0 cpu=0.24ms elapsed=152342.35s tid=0x00007f0d0406b800 nid=0x5b05b runnable
"ParGC Thread#62" os_prio=0 cpu=0.16ms elapsed=152342.35s tid=0x00007f0d0406d800 nid=0x5b05c runnable
"ParGC Thread#63" os_prio=0 cpu=1.02ms elapsed=152342.35s tid=0x00007f0d0406f000 nid=0x5b05d runnable
"ParGC Thread#64" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d04071000 nid=0x5b05e runnable
"ParGC Thread#65" os_prio=0 cpu=0.20ms elapsed=152342.35s tid=0x00007f0d04072800 nid=0x5b05f runnable
"ParGC Thread#66" os_prio=0 cpu=0.17ms elapsed=152342.35s tid=0x00007f0d04074800 nid=0x5b060 runnable
"ParGC Thread#67" os_prio=0 cpu=0.12ms elapsed=152342.35s tid=0x00007f0d04076000 nid=0x5b061 runnable
"ParGC Thread#68" os_prio=0 cpu=0.78ms elapsed=152342.35s tid=0x00007f0d04078000 nid=0x5b062 runnable
"ParGC Thread#69" os_prio=0 cpu=0.15ms elapsed=152342.35s tid=0x00007f0d0407a000 nid=0x5b063 runnable
"ParGC Thread#70" os_prio=0 cpu=0.34ms elapsed=152342.35s tid=0x00007f0d0407b800 nid=0x5b064 runnable
"ParGC Thread#71" os_prio=0 cpu=2.34ms elapsed=152342.35s tid=0x00007f0d0407d800 nid=0x5b065 runnable
"ParGC Thread#72" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d0407f000 nid=0x5b066 runnable
"ParGC Thread#73" os_prio=0 cpu=2.08ms elapsed=152342.35s tid=0x00007f0d04081000 nid=0x5b067 runnable
"ParGC Thread#74" os_prio=0 cpu=0.23ms elapsed=152342.35s tid=0x00007f0d04082800 nid=0x5b068 runnable
"ParGC Thread#75" os_prio=0 cpu=0.26ms elapsed=152342.35s tid=0x00007f0d04084800 nid=0x5b069 runnable
"ParGC Thread#76" os_prio=0 cpu=0.24ms elapsed=152342.35s tid=0x00007f0d04086000 nid=0x5b06a runnable
"ParGC Thread#77" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d04088000 nid=0x5b06b runnable
"ParGC Thread#78" os_prio=0 cpu=0.29ms elapsed=152342.35s tid=0x00007f0d0408a000 nid=0x5b06c runnable
"ParGC Thread#79" os_prio=0 cpu=0.28ms elapsed=152342.35s tid=0x00007f0d0408b800 nid=0x5b06d runnable
"ParGC Thread#80" os_prio=0 cpu=0.25ms elapsed=152342.35s tid=0x00007f0d0408d800 nid=0x5b06e runnable
"ParGC Thread#81" os_prio=0 cpu=0.22ms elapsed=152342.35s tid=0x00007f0d0408f000 nid=0x5b06f runnable
"ParGC Thread#82" os_prio=0 cpu=0.93ms elapsed=152342.35s tid=0x00007f0d04091000 nid=0x5b070 runnable
"VM Periodic Task Thread" os_prio=0 cpu=70815.15ms elapsed=152342.40s tid=0x00007f0d480ea000 nid=0x5b01c waiting on condition
JNI global refs: 34, weak refs: 0
I have largely trusted that enabling JFR does not harm performance much, as long as you don't configure overly aggressive sampling.
But I've seen internal evidence (Amazon product search, closed source) that JFR might hurt red-line QPS non-trivially, even when using the same .jfc configuration as we use here in luceneutil.
Let's try enabling/disabling JFR and confirm the overhead is acceptable (<= 1%?).
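A first pass could simply run the benchmark once with JFR enabled and once without, and compare red-line QPS. A tiny helper for that comparison (function name is my own, not part of luceneutil):

```python
def jfr_overhead_pct(qps_without_jfr: float, qps_with_jfr: float) -> float:
    """Percent throughput lost when JFR is enabled (positive = JFR is slower)."""
    if qps_without_jfr <= 0:
        raise ValueError("baseline QPS must be positive")
    return 100.0 * (qps_without_jfr - qps_with_jfr) / qps_without_jfr

# 1000 QPS without JFR vs 992 with it -> 0.8% overhead, under a 1% budget
overhead = jfr_overhead_pct(1000.0, 992.0)
```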
Since the nightly benchmarks print detailed profiler results to the daily log file, we should not also print them to console output.
I am trying to execute the SearchPerfTest class as:
java -Xms2g -Xmx2g -classpath $LUCENE_PATH perf.SearchPerfTest -dirImpl MMapDirectory -indexPath index -analyzer StandardAnalyzerNoStopWords -taskSource test.tasks -searchThreadCount 2 -field body -topN 10 -staticSeed 0 -seed 0 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log search.log -nrt -indexThreadCount 1 -docsPerSecPerThread 10 -reopenEverySec 5 -postingsFormat Lucene50 -idFieldPostingsFormat Lucene50 -taskRepeatCount 1000 -tasksPerCat 5 -lineDocsFile <>
I am getting the following error:
Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in NRTCachingDirectory(MMapDirectory@***/lucene-7.3.0/index/ lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a3ffec; maxCacheMB=400.0 maxMergeSizeMB=20.0): files: []
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:670)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:627)
at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434)
at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:260)
at perf.PerfUtils.findCommitPoint(PerfUtils.java:38)
at perf.SearchPerfTest._main(SearchPerfTest.java:303)
at perf.SearchPerfTest.main(SearchPerfTest.java:128)
Is this due to some files missing during index creation?
Also, I wanted to understand the usage of a few parameters:
What does lineDocsFile point to?
What does the commit parameter do when we pass multi?
What does taskSource refer to?
What combination of indexThreadCount and searchThreadCount gives the best results?
Running python src/python/localrun.py -source wikimedium10k
produces the following error for me:
Compile:
compile lucene:core...
[RUN: ./gradlew lucene:core:jar, cwd=/.../lucene_bench_home/patch]
/.../lucene_bench_home/patch/lucene/benchmark
Traceback (most recent call last):
File "src/python/localrun.py", line 54, in <module>
comp.benchmark("trunk_vs_patch")
File "/.../lucene_bench_home/util/src/python/competition.py", line 389, in benchmark
taskPatterns = (self.onlyTaskPatterns, self.notTaskPatterns), randomSeed = self.randomSeed)
File "/.../lucene_bench_home/util/src/python/searchBench.py", line 46, in run
r.compile(c)
File "/.../lucene_bench_home/util/src/python/benchUtil.py", line 1017, in compile
cp = self.classPathToString(self.getClassPath(competitor.checkout))
File "/.../lucene_bench_home/util/src/python/benchUtil.py", line 935, in getClassPath
return self.getAntClassPath(checkout)
File "/.../lucene_bench_home/util/src/python/benchUtil.py", line 893, in getAntClassPath
for filename in os.listdir('%s/lucene/build/core' % path):
OSError: [Errno 2] No such file or directory: '/.../lucene_bench_home/patch/lucene/build/core'
Commenting out these lines in src/python/benchUtil.py seems to fix the error for me:
def getClassPath(self, checkout):
path = checkoutToPath(checkout)
+ #if not os.path.exists(os.path.join(checkout, 'build.gradle')):
+ # return self.getAntClassPath(checkout)
@msokolov I am wondering if you can suggest a proper way to fix this issue 🙏
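Rather than commenting the fallback out, the check could decide between the Gradle and Ant classpaths based on whether the checkout actually has a Gradle build. A minimal sketch of that decision (helper name is hypothetical, not the actual benchUtil.py code):

```python
import os

def uses_gradle(checkout_path: str) -> bool:
    """Prefer the Gradle classpath when the checkout has a build.gradle;
    only fall back to the old Ant layout (lucene/build/core) otherwise."""
    return os.path.exists(os.path.join(checkout_path, "build.gradle"))
```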
Currently this feature (retrieving a vector value for each search result) is disabled, and the only way to enable it is by changing this line https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/TaskParser.java#L150 and passing vectorField to the SearchTask constructor instead of null. But this would enable it for all tasks, so we should find a way to enable it for only some tasks, and then create a chart.
LUCENE-8440 has provided the possibility of indexing points using the new LatLonShape approach. It would be good to add this to the GeoBench.
I think the benchmark results would be more useful if we could see some basic analysis of the code under load. We could imagine summary tables, graphs, flame charts, whatever is needed to knock out the hotspots and keep them under control.
Instead of seeing just that performance dropped or improved, we might be able to reason about why; it could make things better for developers. And maybe, just maybe, we have some giant frogs that have been boiling for years without anybody noticing: you know, easy wins to improve perf that went undetected.
As a start, I imagine breaking the problem down into two profiling mechanisms:
StatisticsHelper.java generates information about the garbage collector during a benchmark task.
Here's a snippet that winds up at the end of a ".stdout" log file:
Garbage Generated in Young Generation: 48913.992 MiB
Garbage Generated in Survivor Generation: 69.45353 MiB
Garbage Generated in Old Generation: 66.00384 MiB
Due to a bug, those numbers, if collected today, will all be zero. StatisticsHelper.run calls Thread.currentThread().setDaemon(true);
which looked wrong to me, and sure enough was causing an IllegalStateException on the thread (or something like that, if I recall). A Runnable being invoked by an Executor ought not to mess with the state of its thread. And I think it's unnecessary to try to make these threads daemons, since stopStatistics() will cancel and then shut it down.
Not quite related: I also thought it prudent to explicitly call run() from stopStatistics so we get the very latest information.
Another aside: it's suspicious to see all these volatile fields, and furthermore "+=" operations on them. I don't think there are thread-safety concerns; they can all be regular fields. I don't think there needs to be synchronization in this class either: calling executor.shutdown() establishes a happens-before relationship with the state modified by the task thread.
I could submit a PR if desired.
CC @shaie whom I suspect wrote StatisticsHelper originally
I can't believe we don't already have this, but we seem not to plot the size of the (enwiki) nightly Lucene index!
We do for the geospatial (NYC taxis) benchmarks ... let's add it for enwiki too?
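Measuring the number itself is cheap: a Lucene index is just a flat directory of segment files, so summing their sizes after each nightly run gives the data point to plot. A sketch (function name is my own):

```python
import os

def index_size_bytes(index_dir: str) -> int:
    """Total on-disk size of a Lucene index: the sum of all files in the
    index directory (segments_N, .cfs, .doc, .dvd, etc.)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```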
benchmarks failed with:
WARNING: search result differences: []
Traceback (most recent call last):
  File "/l/util.nightly/src/python/nightlyBench.py", line 1745, in <module>
    run()
  File "/l/util.nightly/src/python/nightlyBench.py", line 967, in run
    raise RuntimeError('search result differences: %s' % str(errors))
RuntimeError: search result differences: ["query=<vector:knn:<golf>[-0.07267512,...]> filter=None sort=None groupField=None hitCount=150: hit 0 has wrong field/score value ([1152765], '0.6784564') vs ([26172611], '0.6658321')", "query=<vector:knn:<many geografia>[0.02625591,...]> filter=None sort=None groupField=None hitCount=150: hit 0 has wrong field/score value ([27478947], '0.5582471') vs ([24551622], '0.5419257')", "query=<vector:knn:<this school>[0.013859561,...]> filter=None sort=None groupField=None hitCount=150: hit 0 has wrong field/score value ([2504080], '0.9584719') vs ([18644213], '0.9588711')", "query=<vector:knn:<such 2007>[-0.020270307,...]> filter=None sort=None groupField=None hitCount=150: hit 0 has wrong field/score value ([26667661], '0.9178002') vs ([26389782], '0.9066777')", "query=<vector:knn:<http>[-0.12073054,...]> filter=None sort=None groupField=None hitCount=150: hit 0 has wrong field/score value ([25418691], '0.59050024') vs ([27431973], '0.67712307')"]
Currently the data being searched fits in RAM, so the benchmark is cheating in a way. For example, adding or removing random accesses probably wouldn't even register a blip here, yet it hurts performance for users who have terabytes of data.
We could use something like https://github.com/mikemccand/luceneutil/blob/master/src/python/ramhog.c to mlock a ton of RAM, reducing the amount of memory available, so we can measure this fairly. There are probably more sophisticated ways too, but this is the easiest to understand.
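For reference, the same "hog" trick can be sketched without C, using ctypes to call mlock(2) directly; this is a hypothetical alternative to ramhog.c, not part of luceneutil, and it is Linux-oriented (locking large buffers needs a raised RLIMIT_MEMLOCK or CAP_IPC_LOCK):

```python
import ctypes
import ctypes.util

def lock_ram(num_bytes: int) -> bytearray:
    """Allocate and mlock() a buffer so the kernel cannot reclaim those
    pages, shrinking the page cache available to the benchmark process.
    The caller must keep a reference to the returned buffer."""
    libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
    buf = bytearray(num_bytes)
    addr = (ctypes.c_char * num_bytes).from_buffer(buf)
    if libc.mlock(addr, num_bytes) != 0:
        raise OSError(ctypes.get_errno(), "mlock failed; check `ulimit -l`")
    return buf
```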
LUCENE-9694 has introduced a new tool that lets us construct a deterministic index concurrently.
Currently we achieve determinism by indexing single-threaded, which is far slower when more CPU cores are available. We should switch to the new tool instead.
Benchmarks are not running anymore now that the minimum JDK version is 11:
https://issues.apache.org/jira/browse/LUCENE-9478 introduced an accidental memory leak into 8.6.0 ...
The leak affects full flush, so our nightly NRT benchmark should've seen a massive jump in heap usage.
On Java 11, I see the following errors when compiling master:
import org.apache.lucene.queries.intervals.Intervals;
^
/local/home/atris/latestgreatest_luceneutil/util/src/main/perf/TaskParser.java:40: error: package org.apache.lucene.queries.intervals does not exist
import org.apache.lucene.queries.intervals.IntervalQuery;
^
/local/home/atris/latestgreatest_luceneutil/util/src/main/perf/RandomQuery.java:26: error: cannot find symbol
import org.apache.lucene.search.QueryVisitor;
^
symbol: class QueryVisitor
location: package org.apache.lucene.search
/local/home/atris/latestgreatest_luceneutil/util/src/main/perf/RandomQuery.java:99: error: cannot find symbol
public void visit(QueryVisitor visitor) {
^
symbol: class QueryVisitor
location: class RandomQuery
/local/home/atris/latestgreatest_luceneutil/util/src/main/perf/TaskParser.java:203: error: cannot find symbol
query = new IntervalQuery(fieldName,
^
symbol: class IntervalQuery
location: class TaskParser
/local/home/atris/latestgreatest_luceneutil/util/src/main/perf/TaskParser.java:204: error: cannot find symbol
Am I the only one who is seeing it?
The enwiki files we use for luceneutil benchmarks, including the nightly benchmarks, are very old by now, almost a decade: /l/data/enwiki-20110115-lines-1k-fixed.bin.
Also, how these files were created is not exactly clear.
Let's 1) upgrade to the latest enwiki export, and 2) clarify the tools used to create the binary block form of this documents source file.
Hi there, I was setting up luceneutil to benchmark some ideas I have been thinking about w.r.t. integer compression in posting lists.
As part of setting it up, I wired up an A/A test where both competitors are exactly the same version from git.
I seem to get results that nearly always have the first competitor winning slightly.
Task QPS A StdDev QPS B StdDev Pct diff
HighPhrase 741.01 (15.5%) 702.61 (14.4%) -5.2% ( -30% - 29%)
IntNRQ 1759.47 (13.8%) 1670.37 (18.8%) -5.1% ( -33% - 31%)
AndHighHigh 1082.02 (15.7%) 1040.52 (15.0%) -3.8% ( -29% - 31%)
AndHighLow 2048.39 (12.4%) 1974.62 (16.1%) -3.6% ( -28% - 28%)
LowSpanNear 1595.02 (11.9%) 1547.22 (19.6%) -3.0% ( -30% - 32%)
Respell 361.26 (15.9%) 351.29 (11.5%) -2.8% ( -26% - 29%)
PKLookup 270.92 (13.4%) 263.87 (18.1%) -2.6% ( -30% - 33%)
MedTerm 3937.97 (13.0%) 3862.01 (18.5%) -1.9% ( -29% - 34%)
LowTerm 3756.76 (9.4%) 3685.07 (18.2%) -1.9% ( -26% - 28%)
OrHighMed 1175.71 (14.4%) 1154.50 (14.9%) -1.8% ( -27% - 32%)
HighSloppyPhrase 534.14 (13.4%) 525.54 (15.4%) -1.6% ( -26% - 31%)
MedPhrase 1176.03 (9.7%) 1159.14 (15.7%) -1.4% ( -24% - 26%)
Prefix3 484.72 (11.9%) 478.04 (17.4%) -1.4% ( -27% - 31%)
Fuzzy2 43.52 (14.7%) 43.08 (18.3%) -1.0% ( -29% - 37%)
HighSpanNear 500.88 (20.4%) 502.61 (29.8%) 0.3% ( -41% - 63%)
AndHighMed 1372.91 (16.2%) 1382.71 (20.6%) 0.7% ( -31% - 44%)
LowPhrase 666.81 (15.4%) 673.52 (17.5%) 1.0% ( -27% - 40%)
Wildcard 394.91 (13.3%) 398.97 (18.6%) 1.0% ( -27% - 37%)
MedSloppyPhrase 761.00 (12.8%) 772.69 (19.2%) 1.5% ( -26% - 38%)
HighTerm 3362.84 (10.5%) 3438.57 (14.2%) 2.3% ( -20% - 30%)
Fuzzy1 324.88 (12.0%) 333.20 (15.9%) 2.6% ( -22% - 34%)
OrHighLow 1786.50 (13.1%) 1844.04 (14.3%) 3.2% ( -21% - 35%)
LowSloppyPhrase 874.38 (10.8%) 904.91 (21.0%) 3.5% ( -25% - 39%)
MedSpanNear 415.60 (21.4%) 434.18 (20.1%) 4.5% ( -30% - 58%)
OrHighHigh 1075.23 (15.3%) 1155.18 (21.9%) 7.4% ( -25% - 52%)
With the competitors reversed
Task QPS B StdDev QPS A StdDev Pct diff
MedTerm 3199.44 (10.6%) 2931.05 (13.2%) -8.4% ( -29% - 17%)
LowSloppyPhrase 1233.67 (9.2%) 1134.70 (7.8%) -8.0% ( -22% - 9%)
MedPhrase 932.71 (8.5%) 866.26 (9.9%) -7.1% ( -23% - 12%)
Wildcard 398.51 (8.6%) 375.41 (10.2%) -5.8% ( -22% - 14%)
AndHighLow 2292.24 (10.2%) 2201.04 (18.7%) -4.0% ( -29% - 27%)
HighSloppyPhrase 544.45 (16.8%) 528.25 (13.6%) -3.0% ( -28% - 32%)
AndHighMed 1340.34 (10.0%) 1301.05 (13.2%) -2.9% ( -23% - 22%)
HighSpanNear 519.63 (21.7%) 505.05 (20.9%) -2.8% ( -37% - 50%)
OrHighMed 1168.76 (11.6%) 1145.84 (14.6%) -2.0% ( -25% - 27%)
Respell 234.98 (10.9%) 230.39 (12.6%) -2.0% ( -22% - 24%)
AndHighHigh 945.69 (17.2%) 929.94 (11.4%) -1.7% ( -25% - 32%)
IntNRQ 1607.76 (14.0%) 1582.74 (11.7%) -1.6% ( -23% - 28%)
OrHighHigh 1026.79 (9.2%) 1012.06 (10.6%) -1.4% ( -19% - 20%)
OrHighLow 1573.45 (14.1%) 1556.98 (15.6%) -1.0% ( -26% - 33%)
Fuzzy2 53.03 (13.9%) 52.59 (13.4%) -0.8% ( -24% - 30%)
Prefix3 456.26 (9.1%) 453.52 (7.0%) -0.6% ( -15% - 17%)
LowSpanNear 1009.28 (10.5%) 1003.82 (12.2%) -0.5% ( -21% - 24%)
HighTerm 3539.41 (13.6%) 3531.60 (13.2%) -0.2% ( -23% - 30%)
MedSpanNear 345.51 (23.4%) 347.01 (18.8%) 0.4% ( -33% - 55%)
HighPhrase 559.54 (12.7%) 562.42 (11.3%) 0.5% ( -20% - 28%)
LowPhrase 1080.39 (12.6%) 1103.94 (13.5%) 2.2% ( -21% - 32%)
Fuzzy1 212.41 (13.2%) 217.09 (13.7%) 2.2% ( -21% - 33%)
MedSloppyPhrase 872.23 (13.6%) 892.19 (10.8%) 2.3% ( -19% - 30%)
PKLookup 274.07 (10.3%) 286.41 (7.5%) 4.5% ( -12% - 24%)
LowTerm 3817.47 (9.9%) 4074.73 (14.8%) 6.7% ( -16% - 34%)
These swings seem a bit wild; any thoughts as to what could cause this?
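One sanity check: the bracketed ranges in the report already tell you whether a per-task difference is distinguishable from noise. My reconstruction of how such a range can be derived (this is a sketch, not the exact luceneutil code): let each side swing by its stddev and take the worst/best-case ratio. In an A/A run, nearly every range should straddle 0, as it does here.

```python
def pct_diff_range(qps_a: float, std_a_frac: float,
                   qps_b: float, std_b_frac: float):
    """Worst/best-case percent difference of B vs A when each side can
    swing by its (fractional) stddev. A result is consistent with pure
    noise when the returned (lo, hi) range straddles 0."""
    lo = 100.0 * ((qps_b * (1 - std_b_frac)) / (qps_a * (1 + std_a_frac)) - 1)
    hi = 100.0 * ((qps_b * (1 + std_b_frac)) / (qps_a * (1 - std_a_frac)) - 1)
    return lo, hi

# HighPhrase row from the first table: 741.01 (15.5%) vs 702.61 (14.4%)
lo, hi = pct_diff_range(741.01, 0.155, 702.61, 0.144)
```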
I am working on LUCENE-8374 and I would like independent verification of my findings. I have looked through luceneutil and it does not seem to have a test for my case. I could try to add a fitting test and make a pull request, but I fear that I might be doctoring it to fit my view on the issue, so I am asking for input.
What I have found is a performance regression introduced in Lucene 7+ with the sequential API for DocValues. The regression is (not surprisingly) for random access of DocValues fields and gets progressively worse as segment size (measured in #documents) grows. The simplest real-world case is document retrieval as part of a search, with fields that are not stored but which have DocValues enabled.
Compounding factors include segment size (maxDoc is a usable indicator). The pathological case is a 100M+ doc index merged down to 1 segment (which we have in abundance), but the effect is measurable even with segments of 2M documents with randomized searches and small result sets.
Does this sound like something that is covered by any of the tests in luceneutil? Maybe the taxi corpus? I looked, but it seems to use floating-point numbers, which aren't as affected.
I noticed that when running multiple tasks with multiple search threads (i.e. the IndexSearcher uses an executor to search multiple segment slices concurrently), the performance results from one task influence the others. Basically, all tasks take as long as the slowest task, which dominates the execution time and starves the other tasks, making them take longer (in wall time) even though they presumably use less CPU time.
I will push an interim fix that sorts tasks by category, but this isn't ideal since it causes the HotSpot compiler to optimize in a skewed way, probably based on the first task rather than a mix of them. Also, there can still be pollution at the task boundaries, when the test harness switches over from running one task to the next. Maybe we ought to track and report per-thread CPU time rather than wall-clock time? The change would be tricky, though, probably requiring a custom Executor to do the tracking.
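The wall-vs-CPU distinction described above is easy to demonstrate: a task that is merely waiting on a starved executor accrues wall time but almost no CPU time. A minimal illustration (Python stand-in for the Java harness; `timed` is my own name):

```python
import time

def timed(fn):
    """Measure both wall-clock and per-thread CPU time for one task.
    Under a contended executor the wall time inflates while the CPU
    time stays honest, which is why per-thread CPU time is the more
    robust metric to report."""
    wall0, cpu0 = time.perf_counter(), time.thread_time()
    fn()
    return time.perf_counter() - wall0, time.thread_time() - cpu0

# Sleeping (i.e. waiting) burns wall time but essentially no CPU time:
wall, cpu = timed(lambda: time.sleep(0.05))
```

The Java equivalent would query `ThreadMXBean.getCurrentThreadCpuTime()` inside each search thread.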
Currently the default codec is not what is being benchmarked. Instead, all kinds of options are passed in, configuring things in a non-default way.
For example, we aren't testing Lucene's default LZ4-based stored fields compression ... looking at the history, the code change to use deflate-based stored fields wasn't even intentional!
So I think it's gotten a bit too crazy (both the options on the Lucene side, and the benchmark's use of them). Many of these options don't even need to exist (e.g. we should compress the SortedDocValues terms dict, always; it's a no-brainer), and sometimes it seems bad choices were made for defaults (e.g. compressing BinaryDocValues seems wrong). Sadly, I feel responsible for it, having added the stored fields BEST_COMPRESSION option in the first place.
Can we take a first step towards health and change things to just benchmark Lucene90Codec?
Current codec:
final Codec codec = new Lucene90Codec(Lucene90Codec.Mode.BEST_COMPRESSION) {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return PostingsFormat.forName(field.equals("id") ?
                                  idFieldPostingsFormat : defaultPostingsFormat);
  }

  private final DocValuesFormat facetsDVFormat = new Lucene90DocValuesFormat(Lucene90DocValuesFormat.Mode.BEST_COMPRESSION);
  //private final DocValuesFormat lucene42DVFormat = DocValuesFormat.forName("Lucene42");
  //private final DocValuesFormat diskDVFormat = DocValuesFormat.forName("Disk");
  //private final DocValuesFormat lucene45DVFormat = DocValuesFormat.forName("Lucene45");
  //private final DocValuesFormat directDVFormat = DocValuesFormat.forName("Direct");

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    if (facetFields.contains(field)) {
      return facetsDVFormat;
    } else {
      // Use default DVFormat for all else:
      // System.out.println("DV: field=" + field + " format=" + super.getDocValuesFormatForField(field));
      return super.getDocValuesFormatForField(field);
    }
  }
};
Nightly charts now zoom in by default to the past 1 or 2 years, but it is not clear how we can then zoom out to see the full history.
Nightly benchmarks died due to a full disk, and it looks like the rearranger is not cleaning up its temp space?
beast3:tmp$ du -shc *
21G rearrange12281786844594601579
21G rearrange12730148586185328932
22G rearrange13157388191989446289
22G rearrange1340491668446817616
21G rearrange14481089764614984310
22G rearrange17217387980835183119
22G rearrange4960206459366128658
21G rearrange5673520091188199499
21G rearrange673948730419081822
22G rearrange7876107730909355863
21G rearrange8412748926503092058
22G rearrange9360351473498560918
22G rearrange9805293399343436604
271G total
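Until the rearranger cleans up after itself, a cron-style sweep of those scratch directories would keep the disk from filling. A sketch (function name is my own; this is not existing luceneutil code):

```python
import os
import shutil

def purge_rearrange_tmp(tmp_dir: str) -> int:
    """Delete leftover rearrange* scratch directories under tmp_dir,
    returning the number of directories removed."""
    removed = 0
    for name in os.listdir(tmp_dir):
        path = os.path.join(tmp_dir, name)
        if name.startswith("rearrange") and os.path.isdir(path):
            shutil.rmtree(path)
            removed += 1
    return removed
```

In production one would also check mtime so a directory belonging to a still-running rearrange is not deleted out from under it.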
Therefore the instructions in https://github.com/mikemccand/luceneutil#preparing-the-benchmark-candidates no longer make sense.
Can you adapt your setup script so that it automatically checks out two user-given git branches from the Lucene git repo and compares their performance?
This benchmark seems to have been failing for a while now:
https://home.apache.org/~mikemccand/geobench.html#search-sort
I had a look, and it seems the nightlies were unstable around that time, so it is not easy to narrow down the change that causes the failure.
I believe the issue comes from the following change: apache/lucene-solr#1856. The issue is that the performance run for this benchmark has the following check:
if (hits.totalHits.relation != TotalHits.Relation.EQUAL_TO) {
// IndexSearcher can never optimize top-hits collection in that case,
// se we should get accurate hit counts
throw new AssertionError();
}
I guess the comment is no longer true, and we can now optimize and skip counting documents?
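In other words, the benchmark's sanity check hard-codes the old assumption that the hit count for this query shape is always exact; once the searcher may return only a lower bound, the check trips. A Python transliteration of the check (the enum mirrors Lucene's TotalHits.Relation; this is an illustration, not the benchmark's code):

```python
from enum import Enum

class Relation(Enum):
    EQUAL_TO = 1                  # hit count is exact
    GREATER_THAN_OR_EQUAL_TO = 2  # hit count is a lower bound (counting was skipped)

def check_exact_count(relation: Relation) -> None:
    """The benchmark's assertion: it assumed the searcher could never
    optimize top-hits collection for this query, so counts were exact."""
    if relation is not Relation.EQUAL_TO:
        raise AssertionError("hit count is only a lower bound")
```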
We use this StatisticsHelper.java class to watch JIT/GC metrics and print a summary to console output at the end of each indexing and searching run, but today we only produce a chart for the JIT/GC during indexing (scroll to the bottom of that page).
It's a useful chart since it changes quite a bit from JVM to JVM; e.g. the young GC time dropped substantially recently with the upgrade to OpenJDK 12/13.
It turns out the SearchPerfTest runs already run StatisticsHelper, so all we need to do here is fix the makeGraphs method in nightlyBench.py to parse/aggregate the metrics and write a new chart ... shouldn't be too hard.
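The parsing half is straightforward, since StatisticsHelper's summary lines have a fixed shape (see the "Time in JIT compilation: 2840 ms" lines in the .stdout excerpt further down this page). A sketch of the extraction step (my own helper, not the existing makeGraphs code):

```python
import re

# Matches the millisecond totals StatisticsHelper prints at the end of a run.
STAT_RE = re.compile(
    r"Time in (JIT compilation|Young Generation GC|Old Generation GC): (\d+) ms")

def parse_stats(stdout_text: str) -> dict:
    """Pull the JIT/GC millisecond totals out of a .stdout log,
    ready to be aggregated into a nightly chart."""
    return {name: int(ms) for name, ms in STAT_RE.findall(stdout_text)}
```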
./perf/RandomQuery.java:42: error: method does not override or implement a method from a supertype
@OverRide
^
./perf/SearchTask.java:144: error: cannot find symbol
final BlockGroupingCollector c = new BlockGroupingCollector(Sort.RELEVANCE, 10, true, searcher.createWeight(searcher.rewrite(state.groupEndQuery), ScoreMode.COMPLETE_NO_SCORES, 1));
^
symbol: variable COMPLETE_NO_SCORES
location: class ScoreMode
Facing the above issue during compilation.
We have a fun nightly analyzer benchmark, but it only tests a few analyzers.
Recently there was an accidental ~6X slowdown in Kuromoji analysis throughput that was only caught and root-caused much later (thanks @jpountz!), which means at least the 8.4.x and 8.5.x releases have degraded Kuromoji performance.
We should add Kuromoji into the nightly analysis benchmarks!
We have noticed a big drop in faceting throughput for all Day of Year facets. One possible explanation is the change that added two modes to the Lucene codec: BEST_SPEED and BEST_COMPRESSION.
It is possible that using the BEST_COMPRESSION mode caused the drop in faceting throughput. This issue is to confirm whether FACET_FIELD_DV_FORMAT_DEFAULT='Lucene80' uses BEST_COMPRESSION by default and, if so, whether changing it to BEST_SPEED restores the previous throughput.
On running
python src/python/localrun.py -source wikimedium10k
I get the following compilation error:
/Users/amisaha/Code/Apache/luceneutil/src/main/perf/RandomQuery.java:42: error: <anonymous perf.RandomQuery$1> is not abstract and does not override abstract method isCacheable(LeafReaderContext) in SegmentCacheable
return new ConstantScoreWeight(this, boost) {
Here are the details printed at the start:
JAVA:
java version "9.0.1"
Java(TM) SE Runtime Environment (build 9.0.1+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
OS:
Darwin ********* 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
precommit does all sorts of good checks, but, it does them slowly -- let's add a nightly chart so we can at least catch regressions going forward.
Generated the index successfully, and then this: Exception in thread "main" java.lang.IllegalArgumentException: facetDim Date was not indexed
Search:
iter 0
my_modified_version:
log: /Users/michael/workspace/logs/trunk_vs_patch.my_modified_version.0 + stdout
run: java -server -Xms2g -Xmx2g -XX:-TieredCompilation -XX:+HeapDumpOnOutOfMemoryError -Xbatch -classpath /Users/michael/workspace/lucene-solr-mbraun/lucene/build/core/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/core/classes/test:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/sandbox/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/misc/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/facet/classes/java:/home/mike/src/lucene-c-boost/dist/luceneCBoost-SNAPSHOT.jar:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/analysis/common/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/analysis/icu/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/queryparser/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/grouping/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/suggest/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/highlighter/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/codecs/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/build/queries/classes/java:/Users/michael/workspace/lucene-solr-mbraun/lucene/facet/lib/hppc-0.7.3.jar:/Users/michael/workspace/luceneutil/lib/HdrHistogram.jar:/Users/michael/workspace/luceneutil/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /Users/michael/workspace/indices/wikimedium10k.lucene-solr-mbraun2.Lucene50.Memory.nd0.01M -analyzer StandardAnalyzer -taskSource /Users/michael/workspace/luceneutil/20346.tasks -searchThreadCount 2 -taskRepeatCount 20 -field body -tasksPerCat 1 -staticSeed -3048916 -seed -4311098 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log /Users/michael/workspace/logs/trunk_vs_patch.my_modified_version.0 -topN 10 -pk
SearchPerfTest FAILED:
========================================
Statistics Started at Sun Feb 11 18:11:10 EST 2018
Operative System: Mac OS X 10.13.3 x86_64
JVM : Oracle Corporation Java HotSpot(TM) 64-Bit Server VM runtime 25.162-b12 1.8.0_162-b12
Processors: 4
System Memory: 60.940506% used of 8.0 GiB
Used Heap Size: 10.63401 MiB
Max Heap Size: 1963.0 MiB
Young Generation Heap Size: 597.5 MiB
- - - - - - - - - - - - - - - - - - - -
Using dir impl org.apache.lucene.store.MMapDirectory
Analyzer StandardAnalyzer
Similarity BM25Similarity
Search thread count 2
topN 10
JVM is 64bit
Pointer is 4 bytes
Opening searcher on commit=multi
found commit=multi
maxDoc=10000 numDocs=10000 %tg deletes=100.0
473 msec to init searcher/NRT
Searcher: numDocs=10000 maxDoc=10000: IndexSearcher(StandardDirectoryReader(segments_2:1236 _1y(8.0.0):C1152 _3x(8.0.0):C1152 _5w(8.0.0):C1152 _7v(8.0.0):C1152 _9u(8.0.0):C1152 _bt(8.0.0):C1152 _ds(8.0.0):C1152 _fr(8.0.0):C1152 _gw(8.0.0):C666 _gx(8.0.0):C18 _gy(8.0.0):C18 _gz(8.0.0):C18 _h0(8.0.0):C18 _h1(8.0.0):C18 _h2(8.0.0):C18 _h3(8.0.0):C10); executor=null)
- - - - - - - - - - - - - - - - - - - -
Statistics Ended at Sun Feb 11 18:11:14 EST 2018
Elapsed time: 4606 ms
Time in JIT compilation: 2840 ms
Time in Young Generation GC: 0 ms (0 collections)
Time in Old Generation GC: 0 ms (0 collections)
Garbage Generated in Young Generation: 0.0 MiB
Garbage Generated in Survivor Generation: 0.0 MiB
Garbage Generated in Old Generation: 0.0 MiB
Average CPU Load: 98.01696/400
----------------------------------------
Exception in thread "main" java.lang.IllegalArgumentException: facetDim Date was not indexed
at perf.TaskParser.parseOneTask(TaskParser.java:143)
at perf.LocalTaskSource.loadTasks(LocalTaskSource.java:149)
at perf.LocalTaskSource.<init>(LocalTaskSource.java:46)
at perf.SearchPerfTest._main(SearchPerfTest.java:525)
at perf.SearchPerfTest.main(SearchPerfTest.java:128)
Traceback (most recent call last):
File "src/python/localrun.py", line 39, in <module>
comp.benchmark("trunk_vs_patch")
File "/Users/michael/workspace/luceneutil/src/python/competition.py", line 379, in benchmark
verifyScores = self.verifyScores, taskPatterns = (self.onlyTaskPatterns, self.notTaskPatterns), randomSeed = self.randomSeed)
File "/Users/michael/workspace/luceneutil/src/python/searchBench.py", line 176, in run
filter=None, taskPatterns=taskPatterns)
File "/Users/michael/workspace/luceneutil/src/python/benchUtil.py", line 1067, in runSimpleSearchBench
raise RuntimeError('SearchPerfTest failed; see log %s.stdout' % logFile)
RuntimeError: SearchPerfTest failed; see log /Users/michael/workspace/logs/trunk_vs_patch.my_modified_version.0.stdout
Currently, when a new Java file is added to src/main/perf and is supposed to be used with the Lucene benchmark tools, we need to manually add it to a file list hardcoded in competition.py.
We might want to distribute the files inside src/main/perf into different directories, leaving only the files that need to be compiled in this directory. Then we wouldn't need to add a file entry to the Python script each time; we could safely assume all files under the directory are meant to be compiled.
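With that layout convention in place, the hardcoded list in competition.py could collapse to a glob. A sketch of the discovery step (helper name is mine, not existing code):

```python
import glob
import os

def perf_sources(perf_dir: str) -> list:
    """Discover every .java file under src/main/perf (recursively),
    replacing the hardcoded file list in competition.py."""
    pattern = os.path.join(perf_dir, "**", "*.java")
    return sorted(glob.glob(pattern, recursive=True))
```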