Comments (14)
Patch looks good to me. I'll just wait some time to give other people the opportunity to take a look.
a system with about 1000 active searchers.
Just in case, if you are talking about searchers over the same directory, you could actually share them: IndexSearcher is thread-safe.
[Legacy Jira: Adrien Grand (@jpountz) on Oct 26 2017]
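Adrien's point that one searcher can be shared by many threads can be illustrated with a minimal, Lucene-free sketch. `SnapshotSearcher` and `SharedSearchDemo` are hypothetical stand-ins, not Lucene classes. The only assumption is that the snapshot is immutable after construction, so concurrent reads need no locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a point-in-time searcher (not a Lucene class).
// Its state never changes after construction, so threads can share it freely.
final class SnapshotSearcher {
    private final List<String> docs;

    SnapshotSearcher(List<String> docs) {
        // Defensive, read-only copy: later changes to the caller's list are not visible.
        this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    // Counts documents containing the given term (a toy "query").
    int count(String term) {
        int n = 0;
        for (String d : docs) {
            if (d.contains(term)) {
                n++;
            }
        }
        return n;
    }
}

final class SharedSearchDemo {
    // Runs all queries from a thread pool against ONE shared snapshot,
    // rather than creating one searcher per thread. Results keep query order.
    static List<Integer> runConcurrently(SnapshotSearcher shared, List<String> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> shared.count(q)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single shared instance, the number of open searchers depends on how many distinct point-in-time views are needed, not on the number of threads.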
from stargazers-migration-test.
Hi Adrien,
Thanks for your fast response. We use searchers from multiple threads, but we have a lot of them for consistent pagination. Ours is really an unusual workload for Lucene, I guess.
[Legacy Jira: Julian Vassev on Oct 26 2017]
By the way, I noticed that the FieldInfo[] parameter is always sorted by field number (when read from segments).
What do you think: does it make sense to check whether it is sorted and skip the TreeMap creation altogether? Or to add an isSorted parameter to the constructor?
Thanks,
Julian
[Legacy Jira: Julian Vassev on Oct 26 2017]
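A minimal sketch of the check Julian suggests, assuming a hypothetical `FieldStub` in place of Lucene's `FieldInfo` (only the field number matters here). When the numbers are already strictly increasing, no TreeMap is allocated; otherwise the input is ordered through a TreeMap. This illustrates the idea only and is not the code that was committed.

```java
import java.util.TreeMap;

// Hypothetical stand-in for Lucene's FieldInfo; only the number is used here.
final class FieldStub {
    final String name;
    final int number;

    FieldStub(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

final class FieldsByNumber {
    // True if field numbers are strictly increasing (sorted, no duplicates).
    static boolean isSortedByNumber(FieldStub[] infos) {
        for (int i = 1; i < infos.length; i++) {
            if (infos[i - 1].number >= infos[i].number) {
                return false;
            }
        }
        return true;
    }

    // Returns the fields ordered by number, allocating a TreeMap only
    // when the input is not already sorted.
    static FieldStub[] orderedByNumber(FieldStub[] infos) {
        if (isSortedByNumber(infos)) {
            return infos.clone(); // fast path: no TreeMap needed
        }
        TreeMap<Integer, FieldStub> byNumber = new TreeMap<>();
        for (FieldStub fi : infos) {
            if (byNumber.put(fi.number, fi) != null) {
                throw new IllegalArgumentException("duplicate field number: " + fi.number);
            }
        }
        return byNumber.values().toArray(new FieldStub[0]);
    }
}
```

The fast path costs one linear scan, so the saving is only the TreeMap allocation when segments are read.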
That could work, but I suspect this is never a problem for any workload, so we should keep things simple?
[Legacy Jira: Adrien Grand (@jpountz) on Oct 27 2017]
Commit 85f8216b7bd972e276396f022057bf8aa9aa1c1f in lucene-solr's branch refs/heads/branch_7x from @jpountz
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85f8216
LUCENE-8018: FieldInfos retains garbage if non-sparse.
[Legacy Jira: ASF subversion and git services on Oct 27 2017]
Commit 401dda7e064b6f621cba405985143724d79620c4 in lucene-solr's branch refs/heads/master from @jpountz
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=401dda7
LUCENE-8018: FieldInfos retains garbage if non-sparse.
[Legacy Jira: ASF subversion and git services on Oct 27 2017]
Thank you Julian!
[Legacy Jira: Adrien Grand (@jpountz) on Oct 27 2017]
I am concerned about the complexity introduced here by the "millions of fields" abuse case.
I pushed back against this from the very beginning, and it appears the nightmare I predicted has become reality: I can't tell what any of this FieldInfos code is even doing here, much less this patch.
I definitely don't think we should be adding any optimization here. Instead we should be carefully undoing existing optimizations to try to manage the technical debt here.
[Legacy Jira: Robert Muir (@rmuir) on Oct 27 2017]
I agree there is some complexity here, but to me it is unrelated to this patch. The patch actually makes things more consistent: the comment about the memory threshold makes no sense if we retain a reference to the TreeMap in any case. Moreover, it feels wrong to penalize users who have dense fields compared to those who have sparse fields.
undoing existing optimizations to try to manage the technical debt here
I'd be ok with going with the array approach all the time.
[Legacy Jira: Adrien Grand (@jpountz) on Oct 27 2017]
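The "array approach all the time" can be sketched as follows, assuming fields are looked up by a non-negative number. `FieldEntry` and `DenseFieldTable` are hypothetical names. The trade-off shows up in the constructor: the array has one slot per number up to the largest one, so memory grows with the highest field number rather than with the number of fields, which matters mostly in the sparse, "millions of fields" case.

```java
import java.util.List;

// Hypothetical field record: a name plus a non-negative field number.
final class FieldEntry {
    final String name;
    final int number;

    FieldEntry(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

// Number -> field lookup that always uses a plain array indexed by number,
// with null slots for numbers that are not in use.
final class DenseFieldTable {
    private final FieldEntry[] byNumber;
    private final int size;

    DenseFieldTable(List<FieldEntry> fields) {
        int max = -1;
        for (FieldEntry f : fields) {
            if (f.number < 0) {
                throw new IllegalArgumentException("negative field number: " + f.number);
            }
            max = Math.max(max, f.number);
        }
        // One slot per number up to the largest: memory follows the highest
        // field number, not the count of fields.
        byNumber = new FieldEntry[max + 1];
        for (FieldEntry f : fields) {
            if (byNumber[f.number] != null) {
                throw new IllegalArgumentException("duplicate field number: " + f.number);
            }
            byNumber[f.number] = f;
        }
        size = fields.size();
    }

    // Returns the field with this number, or null if there is none.
    FieldEntry get(int number) {
        if (number < 0) {
            throw new IllegalArgumentException("negative field number: " + number);
        }
        return number < byNumber.length ? byNumber[number] : null;
    }

    int size() {
        return size;
    }

    int capacity() {
        return byNumber.length;
    }
}
```

Lookups are a bounds check plus an array read, and no TreeMap is retained after construction.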
I accidentally hit some keys on my keyboard while looking at this jira and it assigned it to me! Sorry for the noise; I reverted it.
[Legacy Jira: David Smiley (@dsmiley) on Oct 27 2017]
We use searchers from multiple threads, but we have a lot of them for consistent pagination. Ours is really an unusual workload for Lucene, I guess.
Wow! Impressive :) Lucene's point-in-time searchers make it possible to have consistent pagination; I'm glad you're using it that way, and I wouldn't say it's unusual.
But how does that result in 1000s of open IndexSearchers at once? Seems like once an IndexSearcher is no longer the latest one, and is no longer involved in an active user session, it can be closed? Do you have a very long timeout in the user session before you consider the user done searching?
[Legacy Jira: Michael McCandless (@mikemccand) on Nov 01 2017]
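Closing a searcher once it is neither the latest one nor pinned by an active session amounts to reference counting. The sketch below is a hypothetical, stdlib-only model (`RefCountedSnapshot` and `SnapshotManager` are illustrative names); in Lucene itself the comparable building blocks are `IndexReader.incRef()`/`decRef()` and `SearcherManager.acquire()`/`release()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical point-in-time snapshot with a reference count.
// The manager holds one reference while the snapshot is current;
// each user session paging through it holds one more.
final class RefCountedSnapshot {
    final long version;
    private final AtomicInteger refs = new AtomicInteger(1); // the manager's reference
    private volatile boolean closed;

    RefCountedSnapshot(long version) {
        this.version = version;
    }

    // Adds a reference unless the snapshot has already been closed.
    boolean tryIncRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    // Drops a reference; the last one closes the snapshot
    // (a real implementation would release files/readers here).
    void decRef() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            closed = true;
        } else if (n < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }
}

final class SnapshotManager {
    private RefCountedSnapshot current;

    SnapshotManager(long initialVersion) {
        current = new RefCountedSnapshot(initialVersion);
    }

    // A new session pins the latest snapshot; it must call decRef() when done.
    synchronized RefCountedSnapshot acquireCurrent() {
        if (!current.tryIncRef()) {
            throw new IllegalStateException("current snapshot unexpectedly closed");
        }
        return current;
    }

    // Publishes a newer snapshot and drops the manager's reference to the old one,
    // which closes as soon as no session still pins it.
    synchronized void refresh(long newVersion) {
        RefCountedSnapshot old = current;
        current = new RefCountedSnapshot(newVersion);
        old.decRef();
    }
}
```

An old snapshot stays open exactly as long as some session still holds it, so the number of open snapshots is driven by session lifetime rather than by how often the index is refreshed.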
Hi Michael,
Thank you for your interest in this matter.
Yes, the default session timeout is 30 minutes. As new documents are indexed almost every 10 seconds, every new session creates a searcher. This also prevents efficient merging: during a synthetic test I can observe the segment file count grow to as much as 2.5x the number of documents.
I tried using NRTCachingDirectory, but it seems to make no difference.
[Legacy Jira: Julian Vassev on Nov 01 2017]
Hi @jvassev, hmm, it should not prevent merging, but rather prevent the deletion of index files that are still in use by old searchers, even if they have been merged away in the latest index. I.e., if you print the latest searcher you should see a "contained" number of segments in it.
Also, if you refresh every 10 seconds, and every such searcher is used (i.e. a new search always happens within the 10 seconds), then shouldn't you at worst ever have 30 * 6 = 180 live searchers?
Do you use SearcherLifetimeManager to track all these searchers?
[Legacy Jira: Michael McCandless (@mikemccand) on Nov 01 2017]
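One way to keep the number of live snapshots bounded is to record each one under a version token and prune by age, which is roughly the idea behind Lucene's `SearcherLifetimeManager` (record a searcher, acquire it again by version, prune old ones). The sketch below is not that class. `SnapshotRegistry` is a hypothetical, stdlib-only model that takes the current time as a parameter so its behavior is deterministic, and it measures age from when a snapshot was recorded (a real system might measure from last use instead).

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical registry of point-in-time snapshots keyed by a version token
// (illustrative names, not Lucene's SearcherLifetimeManager API).
final class SnapshotRegistry<S> {
    private static final class Recorded<T> {
        final T snapshot;
        final long recordedAtMillis;

        Recorded(T snapshot, long recordedAtMillis) {
            this.snapshot = snapshot;
            this.recordedAtMillis = recordedAtMillis;
        }
    }

    private final Map<Long, Recorded<S>> byVersion = new HashMap<>();

    // Remembers a snapshot; the returned token goes into the client's
    // pagination cursor so later pages hit the same snapshot.
    synchronized long record(long version, S snapshot, long nowMillis) {
        byVersion.putIfAbsent(version, new Recorded<>(snapshot, nowMillis));
        return version;
    }

    // Returns the snapshot for a token, or null if it was pruned
    // (the client would then restart paging on a newer snapshot).
    synchronized S acquire(long version) {
        Recorded<S> r = byVersion.get(version);
        return r == null ? null : r.snapshot;
    }

    // Drops snapshots recorded maxAgeMillis or more ago; returns how many were dropped.
    synchronized int pruneOlderThan(long maxAgeMillis, long nowMillis) {
        int dropped = 0;
        Iterator<Recorded<S>> it = byVersion.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().recordedAtMillis >= maxAgeMillis) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }

    synchronized int size() {
        return byVersion.size();
    }
}
```

Simulating one refresh every 10 seconds with a 30-minute age cap, the registry peaks at 180 live entries, in line with the 30 * 6 = 180 estimate.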
Instead we should be carefully undoing existing optimizations to try to manage the technical debt here.
OK, I opened LUCENE-8018 to discuss this.
[Legacy Jira: Adrien Grand (@jpountz) on Nov 02 2017]
Related Issues (20)
- Code Cleanup: Use entryset for map iteration wherever possible - part 2 possible. [LUCENE-8979]
- Optimise SegmentTermsEnum.seekExact performance [LUCENE-8980]
- Update javadocs to reflect experimental status of Kuromoji DictionaryBuilder [LUCENE-8981]
- Make NativeUnixDirectory pure java now that direct IO is possible [LUCENE-8982]
- PhraseWildcardQuery - new query to control and optimize wildcard expansions in phrase [LUCENE-8983]
- SynonymGraphFilter cannot handle input stream with tokens filtered. [LUCENE-8985]
- Add asf.yaml to our git repo [LUCENE-8986]
- Move Lucene web site from svn to git [LUCENE-8987]
- Maximal -- Minimum Based Early Termination For TopFieldCollector [LUCENE-8988]
- IndexSearcher Should Handle Rejection of Concurrent Task [LUCENE-8989]
- IndexOrDocValuesQuery can take a bad decision for range queries if field has many values per document [LUCENE-8990]
- disable java.util.HashMap assertions to avoid spurious vailures due to JDK-8205399 [LUCENE-8991]
- Share minimum score across segments in concurrent search [LUCENE-8992]
- Change Maven POM repository URLs to https [LUCENE-8993]
- Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). [LUCENE-8994]
- TopSuggestDocsCollector#collect should be able to signal rejection [LUCENE-8995]
- Add type of triangle info to ShapeField encoding [LUCENE-8997]
- OverviewImplTest.testIsOptimized reproducible failure [LUCENE-8998]
- expectThrows doesn't play nicely with "assume" failures [LUCENE-8999]
- Cannot resolve classes from org.apache.lucene.core plugin and others [LUCENE-9000]