
Comments (14)

mikemccand avatar mikemccand commented on June 3, 2024

Patch looks good to me. I'll just wait some time to give other people the opportunity to take a look.

> a system with about ~1000 active searchers.

Just in case, if you are talking about searchers over the same directory, you could actually share them: IndexSearcher is thread-safe.

[Legacy Jira: Adrien Grand (@jpountz) on Oct 26 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 3, 2024

Hi Adrien,

Thanks for your fast response. We use searchers from multiple threads but we have a lot of them for consistent pagination. Ours is really an unusual workload for Lucene I guess.

[Legacy Jira: Julian Vassev on Oct 26 2017]


mikemccand avatar mikemccand commented on June 3, 2024

Btw I noticed the FieldInfo[] parameter is always sorted by number (when read from segments).

What do you think: does it make sense to check whether the array is already sorted and skip the TreeMap creation altogether? Or to add an isSorted parameter to the constructor?
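To make the suggestion concrete, here is a minimal sketch of the sorted check in plain Java. It is not the actual FieldInfos code: `Field` and `SortedFieldsSketch` are hypothetical stand-ins, and the sort step merely stands in for whatever ordering structure the constructor builds today.

```java
import java.util.Arrays;

/** Hypothetical stand-in for Lucene's FieldInfo: only the name and number matter here. */
final class Field {
    final String name;
    final int number;

    Field(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

final class SortedFieldsSketch {
    /** True if field numbers are strictly increasing, i.e. already sorted with no duplicates. */
    static boolean isSortedByNumber(Field[] fields) {
        for (int i = 1; i < fields.length; i++) {
            if (fields[i - 1].number >= fields[i].number) {
                return false;
            }
        }
        return true;
    }

    /** Returns the fields ordered by number, copying and sorting only when needed. */
    static Field[] orderedByNumber(Field[] fields) {
        if (isSortedByNumber(fields)) {
            return fields; // fast path: input is already ordered, nothing extra is allocated
        }
        Field[] copy = fields.clone();
        Arrays.sort(copy, (a, b) -> Integer.compare(a.number, b.number));
        return copy;
    }
}
```

The check is a single linear pass, so it costs little even when the input turns out to be unsorted.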

Thanks,
Julian

[Legacy Jira: Julian Vassev on Oct 26 2017]


mikemccand avatar mikemccand commented on June 3, 2024

That could work, but I suspect this is never a problem for any workload, so we should keep things simple?

[Legacy Jira: Adrien Grand (@jpountz) on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

Commit 85f8216b7bd972e276396f022057bf8aa9aa1c1f in lucene-solr's branch refs/heads/branch_7x from @jpountz
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85f8216

LUCENE-8018: FieldInfos retains garbage if non-sparse.

[Legacy Jira: ASF subversion and git services on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

Commit 401dda7e064b6f621cba405985143724d79620c4 in lucene-solr's branch refs/heads/master from @jpountz
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=401dda7

LUCENE-8018: FieldInfos retains garbage if non-sparse.

[Legacy Jira: ASF subversion and git services on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

Thank you Julian!

[Legacy Jira: Adrien Grand (@jpountz) on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

I am concerned about the complexity introduced here by the "millions of fields" abuse case.

I pushed back against this from the very beginning, and it appears the nightmare I predicted has become reality: I can't tell what any of this FieldInfos code is even doing here, much less this patch.

I definitely don't think we should be adding any optimization here. Instead we should be carefully undoing existing optimizations to try to manage the technical debt here.

[Legacy Jira: Robert Muir (@rmuir) on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

I agree there is some complexity here, but to me it is unrelated to this patch. The patch actually makes things more consistent: the comment about the memory threshold makes no sense if we are retaining a reference to the TreeMap in any case. Moreover, it feels wrong to penalize users who have dense fields vs. sparse fields.

> undoing existing optimizations to try to manage the technical debt here

I'd be ok with going with the array approach all the time.

[Legacy Jira: Adrien Grand (@jpountz) on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

I accidentally hit some keys on my keyboard while looking at this jira and it assigned it to me! Sorry for the noise; I reverted it.

[Legacy Jira: David Smiley (@dsmiley) on Oct 27 2017]


mikemccand avatar mikemccand commented on June 3, 2024

> We use searchers from multiple threads but we have a lot of them for consistent pagination. Ours is really an unusual workload for Lucene I guess.

Wow! Impressive :) Lucene's point-in-time searchers make it possible to have consistent pagination; I'm glad you're using it that way, and I wouldn't say it's unusual.

But how does that result in 1000s of open IndexSearchers at once? Seems like once an IndexSearcher is no longer the latest one, and is no longer involved in an active user session, it can be closed? Do you have a very long timeout in the user session before you consider the user done searching?
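To illustrate what I mean by closing searchers that are neither the latest nor part of an active session, here is a rough plain-Java sketch. `SnapshotRegistry` and all of its methods are hypothetical names invented for this example; it is not how Lucene tracks searchers internally, and a real version would also have to release the underlying reader when an entry is dropped.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of point-in-time snapshots keyed by a version token. */
final class SnapshotRegistry<S> {
    private static final class Tracked<S> {
        final S snapshot;
        volatile long lastUsedMillis;

        Tracked(S snapshot, long nowMillis) {
            this.snapshot = snapshot;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<Long, Tracked<S>> entries = new ConcurrentHashMap<>();

    /** Registers a snapshot under the given version token (first registration wins). */
    void record(long version, S snapshot, long nowMillis) {
        entries.putIfAbsent(version, new Tracked<>(snapshot, nowMillis));
    }

    /** Returns the snapshot for a token and refreshes its last-used time, or null if it was pruned. */
    S acquire(long version, long nowMillis) {
        Tracked<S> tracked = entries.get(version);
        if (tracked == null) {
            return null;
        }
        tracked.lastUsedMillis = nowMillis;
        return tracked.snapshot;
    }

    /** Drops snapshots idle for longer than maxIdleMillis, always keeping the latest version. */
    int prune(long latestVersion, long maxIdleMillis, long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Tracked<S>>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Tracked<S>> entry = it.next();
            boolean idle = nowMillis - entry.getValue().lastUsedMillis > maxIdleMillis;
            if (idle && entry.getKey() != latestVersion) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

A follow-up page request would call `acquire` with the token returned to the client on the first page; a null result means the session expired and the search has to restart on the latest snapshot.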

[Legacy Jira: Michael McCandless (@mikemccand) on Nov 01 2017]


mikemccand avatar mikemccand commented on June 3, 2024

Hi Michael,
Thank you for your interest in this matter.

Yes, the default session timeout is 30 minutes. As new documents are indexed almost every 10 seconds, every new session creates a searcher. This also prevents efficient merging, and during a synthetic test I can observe the segment file count grow to as much as 2.5x the number of documents.

I tried using NRTCachingDirectory, but it seems to make no difference.

[Legacy Jira: Julian Vassev on Nov 01 2017]


mikemccand avatar mikemccand commented on June 3, 2024

Hi @jvassev, hmm, it should not prevent merging, but rather prevent deletion of index files that are still in use by old searchers, even if they have been merged away in the latest index. I.e. if you print the latest searcher you should see a "contained" number of segments in it.

Also, if you refresh every 10 seconds, and every such searcher is used (i.e. a new search always happens within the 10 seconds), then shouldn't you at worst ever have 30 * 6 = 180 live searchers?
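Spelling out that arithmetic as a tiny sketch: the class and method names are made up for illustration, and it assumes each refresh produces exactly one searcher that stays alive for one full session timeout and no longer.

```java
final class LiveSearcherBound {
    /**
     * Upper bound on simultaneously live searchers: one per refresh that
     * falls inside a single session timeout window (rounded up).
     */
    static long maxLiveSearchers(long sessionTimeoutSeconds, long refreshIntervalSeconds) {
        if (sessionTimeoutSeconds < 0 || refreshIntervalSeconds <= 0) {
            throw new IllegalArgumentException("timeout must be >= 0 and interval > 0");
        }
        // Ceiling division without floating point.
        return (sessionTimeoutSeconds + refreshIntervalSeconds - 1) / refreshIntervalSeconds;
    }

    public static void main(String[] args) {
        // 30 minute timeout, refresh every 10 seconds: 30 * 6 = 180.
        System.out.println(maxLiveSearchers(30 * 60, 10));
    }
}
```

If the observed count is well above this bound, something is probably holding searchers past the session timeout.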

Do you use SearcherLifetimeManager to track all these searchers?

[Legacy Jira: Michael McCandless (@mikemccand) on Nov 01 2017]


mikemccand avatar mikemccand commented on June 3, 2024

> Instead we should be carefully undoing existing optimizations to try to manage the technical debt here.

OK, I opened LUCENE-8018 to discuss this.

[Legacy Jira: Adrien Grand (@jpountz) on Nov 02 2017]

