Comments (12)

mykmelez commented on June 15, 2024

Note https://www.openldap.org/its/index.cgi/Software%20Bugs?id=8975, which fixes a (rare) error/crash when calling mdb_env_set_mapsize() on Windows if MDB_WRITEMAP is set. The fix has already landed on the mdb.RE/0.9 branch, but a new version of LMDB that includes it hasn't been released yet.

from rkv.

zen0wu commented on June 15, 2024

On a related matter, it seems that mdb_env_set_mapsize should be exposed so that the default size can be changed. I went back to a lower-level library because of this limitation. Not sure if there's an existing issue on that.

mykmelez commented on June 15, 2024

@shivawu You can change the map size using rkv. See #82 for tests that demonstrate how to do that.

zen0wu commented on June 15, 2024

Ah I see, somehow I missed it. Thanks for pointing that out. Probably the local doc does not work that well for me. Also, btw, the online doc is missing for the project (https://docs.rs/rkv/0.5.1/rkv/). Is that expected?

mykmelez commented on June 15, 2024

> Also btw, the online doc is missing for the project (https://docs.rs/rkv/0.5.1/rkv/), expected?

Not expected, but known. It's #81, where you can read all the details (tl;dr: docs.rs is using an outdated rustc compiler).

ncloudioj commented on June 15, 2024

Another way for us to deal with MAP_FULL is to check the environment's space usage via its stats API (which reports the max map size, the number of pages in use, the number of pages in the freelist, etc.) and then increase the map size if the disk usage goes over a predefined high watermark.

Specifically, we can do this either with a dedicated maintenance worker or in the consumers themselves. Either way, some coordination is required, since resizing can only be done on a free environment (only one writer without any readers).

This preventive resizing appears preferable to resizing only when you hit the map-full error, in two respects:

  • Write transactions are less likely to be interrupted.
  • It is easier to coordinate: preparing the resize condition is hard when there are active readers/writers.
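
A minimal sketch of the high-watermark check described above, in plain Rust with no LMDB binding. The `EnvUsage` struct, its field names, and the doubling growth step are assumptions made for illustration and are not rkv's API; LMDB's `mdb_env_info` and `mdb_env_stat` report comparable values (`me_mapsize`, `me_last_pgno`, `ms_psize`).

```rust
/// Hypothetical snapshot of environment usage. The field names are
/// illustrative only; they are not rkv's or LMDB's actual types.
#[derive(Debug, Clone, Copy)]
pub struct EnvUsage {
    /// Current maximum map size, in bytes.
    pub map_size: u64,
    /// Page size, in bytes.
    pub page_size: u64,
    /// Number of the last page in use (zero-based).
    pub last_page_no: u64,
}

impl EnvUsage {
    /// Bytes covered by all pages up to and including the last used page.
    pub fn used_bytes(&self) -> u64 {
        (self.last_page_no + 1).saturating_mul(self.page_size)
    }
}

/// Returns the proposed new map size if usage is at or above
/// `high_watermark` (a fraction between 0.0 and 1.0), or `None` if no
/// resize is needed. Doubling is an assumed growth step.
pub fn preventive_resize(usage: &EnvUsage, high_watermark: f64) -> Option<u64> {
    let ratio = usage.used_bytes() as f64 / usage.map_size as f64;
    if ratio >= high_watermark {
        Some(usage.map_size.saturating_mul(2))
    } else {
        None
    }
}
```

Because this measure counts every page up to the last one used, it ignores reusable pages in the freelist and so may overestimate how full the environment is. A real check might subtract freelist pages before comparing against the watermark.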

@mykmelez What do you think?

mykmelez commented on June 15, 2024

@ncloudioj It seems helpful to periodically check stats and preemptively increase the map size if the database is approaching its limit (as well as to periodically call mdb_reader_check to clear stale readers). And that'll make it less likely for a write to fail with MAP_FULL. But it seems like we'll still occasionally encounter that error, so we'll still need to handle it somehow.

Perhaps kvstore (or the consumer) can catch the MAP_FULL error and trigger a resize (and stale reader check) operation at that point, after which it can retry the write?

That operation could be identical to the one we run periodically. And perhaps it could be invoked with a flag to indicate that it's being invoked manually and should definitely increase the map size, even if the environment doesn't look too full, since a large-enough write could fail in an environment that is mostly empty.
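
The catch-and-retry idea could look roughly like the sketch below. Everything here is hypothetical: `StoreError`, `write_with_resize`, and the `maintain` callback are illustrative names, not rkv or kvstore APIs. The `bool` passed to `maintain` stands in for the "invoked manually, definitely increase the map size" flag.

```rust
/// Hypothetical error type; `MapFull` stands in for LMDB's map-full error.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    MapFull,
    Other(String),
}

/// Runs `write`. If it fails with `MapFull`, runs the maintenance
/// operation with `force = true` (resize even if the environment does not
/// look full) and retries, up to `max_retries` times. Any other result,
/// or a `MapFull` after the retries are used up, is returned as is.
pub fn write_with_resize<T, W, M>(
    mut write: W,
    mut maintain: M,
    max_retries: usize,
) -> Result<T, StoreError>
where
    W: FnMut() -> Result<T, StoreError>,
    M: FnMut(bool) -> Result<(), StoreError>,
{
    let mut retries = 0;
    loop {
        match write() {
            Err(StoreError::MapFull) if retries < max_retries => {
                // Forced maintenance: stale reader check plus resize.
                maintain(true)?;
                retries += 1;
            }
            other => return other,
        }
    }
}
```

Bounding the retries keeps a write that can never fit from looping forever. The periodic job would call the same maintenance operation with `false`.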

As for coordination, I suppose that whatever is invoking the resize operation could await existing readers while clearing stale ones and blocking new ones, then resize once existing readers are cleared, and finally unblock the new readers (after which it can return control to whoever called it, kvstore or the consumer, to retry the write).
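
For a single process, that await/block/resize/unblock sequence can be sketched with a mutex and a condition variable from the standard library. `ResizeGate` and its methods are hypothetical names, and the sketch assumes readers always call `end_read`; it does not clear stale readers or coordinate across processes.

```rust
use std::sync::{Condvar, Mutex};

#[derive(Default)]
struct GateState {
    active_readers: usize,
    resize_pending: bool,
}

/// Hypothetical in-process gate that coordinates readers with a resize.
#[derive(Default)]
pub struct ResizeGate {
    state: Mutex<GateState>,
    changed: Condvar,
}

impl ResizeGate {
    pub fn new() -> Self {
        Self::default()
    }

    /// Blocks while a resize is pending, then registers a reader.
    pub fn begin_read(&self) {
        let mut s = self.state.lock().unwrap();
        while s.resize_pending {
            s = self.changed.wait(s).unwrap();
        }
        s.active_readers += 1;
    }

    /// Unregisters a reader and wakes a resizer that may be waiting.
    pub fn end_read(&self) {
        let mut s = self.state.lock().unwrap();
        s.active_readers -= 1;
        self.changed.notify_all();
    }

    /// Blocks new readers, waits for existing readers to finish, runs
    /// `resize`, then lets blocked readers proceed.
    pub fn resize_with<R, F: FnOnce() -> R>(&self, resize: F) -> R {
        let mut s = self.state.lock().unwrap();
        // Allow only one resize at a time.
        while s.resize_pending {
            s = self.changed.wait(s).unwrap();
        }
        s.resize_pending = true;
        while s.active_readers > 0 {
            s = self.changed.wait(s).unwrap();
        }
        // The lock is held here, so no reader can register mid-resize.
        let result = resize();
        s.resize_pending = false;
        drop(s);
        self.changed.notify_all();
        result
    }

    pub fn active_readers(&self) -> usize {
        self.state.lock().unwrap().active_readers
    }
}
```

A caller would wrap each read transaction in `begin_read`/`end_read` and pass the actual map-size change as the closure to `resize_with`. A guard type that calls `end_read` on drop would be a natural refinement.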

ncloudioj commented on June 15, 2024

Agreed. Looks like kvstore is the perfect candidate to do this bookkeeping. Once this particular error gets handled properly, it'll be a big step forward for us to roll out rkv in production at scale!

Speaking of maintenance, LMDB also has a set of APIs for cloning the environment (as mentioned in #12), which could be useful for avoiding the fragmentation problem (like VACUUM). A frequently updated store could consume many more pages than a fresh clone.

I believe a stale reader collector, a resize utility, and a live copy utility would make a solid maintenance lineup for kvstore.

mykmelez commented on June 15, 2024

This morning it occurred to me that we might implement this in rkv instead of kvstore, in which case rkv consumers more generally would gain the benefit of this automatic resizing. And doing that would be consistent with the goal of rkv to provide a "simple, humane" interface to LMDB. Unsure if there's any reason not to implement this in rkv, but I'll think about it some more.

@ncloudioj Do you have any thoughts about implementing this in kvstore vs. rkv?

ncloudioj commented on June 15, 2024

My reasoning was that the consumer knows better than rkv when and how to do the resize :)

Though I agree with you that letting rkv handle it for the consumers would be nice in most cases. If we were to implement it in rkv, I think we would also need to consider the following cases:

  • rkv consumers might want to handle MAP_FULL by themselves. For example, they might want a fixed-size store, or might not want it to grow without bound. Perhaps leave auto-resize as an option?
  • Off the top of my head, coordination between readers and the writer for resizing would be hard because rkv doesn't know anything about the active readers, such as when they will end the underlying transactions. I just realized that kvstore actually faces the same challenge, though. The key point is how we can efficiently send this need-to-resize message to all the rkv readers and writers.
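
One way the "auto-resize as an option" idea in the first bullet could be expressed is a small policy type. `ResizePolicy` and `next_map_size` are hypothetical names for illustration, and doubling up to a cap is an assumed growth rule, not something rkv implements.

```rust
/// Hypothetical resize policy a consumer could choose when opening a store.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ResizePolicy {
    /// Never grow; the map-full error is surfaced to the consumer.
    Fixed,
    /// Grow by doubling, but never beyond `max_bytes`.
    AutoUpTo { max_bytes: u64 },
}

/// Returns the next map size to try, or `None` if the policy does not
/// allow the store to grow any further.
pub fn next_map_size(policy: ResizePolicy, current: u64) -> Option<u64> {
    match policy {
        ResizePolicy::Fixed => None,
        ResizePolicy::AutoUpTo { max_bytes } if current < max_bytes => {
            Some(current.saturating_mul(2).min(max_bytes))
        }
        ResizePolicy::AutoUpTo { .. } => None,
    }
}
```

When `next_map_size` returns `None`, the caller would hand the map-full error back to the consumer, which covers both the fixed-size and the bounded-growth cases.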

mykmelez commented on June 15, 2024

It occurred to me recently that a reason to do this in rkv rather than kvstore is that even in Firefox there are consumers who are using rkv directly, such as the one in bug 1429796.

> rkv consumers might want to handle MAP_FULL by themselves. For example, they might either want a fixed size store, or do not want it to grow without bound. Perhaps leave auto-resize as an option?

As with some other issues, like #109, a consumer might want more control over their interaction with LMDB under certain circumstances while appreciating the value of rkv's higher-level abstractions in other cases. And there's a tension between wanting to support those use cases while keeping the API simple for the common case.

I do want to support those use cases, and I agree that it should be possible for an rkv consumer to "opt-out" of auto-resize. I would just want to make sure we do so in a way that imposes the least cognitive burden on consumers in the common case.

> Off the top of my mind, coordination between readers and writer for resizing would be hard because rkv doesn't know anything about the active readers, such as when they will end the underlying transactions. I just realize that kvstore actually face the same challenge though. The key point is how can we efficiently send this need-to-resize message to all the rkv readers&writers.

I suspect that rkv will have to track and manage active readers somehow, such that it can await them, while blocking new ones, when a resize is pending.

ncloudioj commented on June 15, 2024

Indeed, there are pros and cons to both mechanisms. It looks like equipping rkv with auto-resize works better in more cases at the moment. So yes, I am convinced, and let's get started with that design.

To keep the initial work within a reasonable scope, I'd also like to withdraw my prior suggestion of making auto-resize optional. We can revisit such a feature if we find it useful in the future.

> I suspect that rkv will have to track and manage active readers somehow, such that it can await them, while blocking new ones, when a resize is pending.

Yes, it's definitely doable in a single-process scenario, but it will be more challenging when active readers reside in different processes. Perhaps we can have an internal db (e.g. __meta__) to store all the meta information in order to facilitate the coordination.

With this change, readers could also be blocked by the resize coordination. Though this should be easier to handle as we can return an error (such as RESIZE_IN_PROGRESS) in the reader constructor.
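
A sketch of the non-blocking variant, where creating a reader fails with an error while a resize is pending. `ReaderRegistry`, `ReadError::ResizeInProgress`, and the method names are hypothetical; only the error name echoes the RESIZE_IN_PROGRESS suggestion above. It covers a single process only.

```rust
use std::sync::Mutex;

/// Hypothetical error returned when a reader is requested mid-resize.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    ResizeInProgress,
}

#[derive(Default)]
struct Registry {
    active_readers: usize,
    resize_pending: bool,
}

/// Hypothetical tracker of active readers within one process.
#[derive(Default)]
pub struct ReaderRegistry {
    inner: Mutex<Registry>,
}

impl ReaderRegistry {
    /// Registers a reader, or fails immediately if a resize is pending.
    /// The check and the increment happen under one lock, so a reader
    /// cannot slip in after the resize has been requested.
    pub fn try_begin_read(&self) -> Result<(), ReadError> {
        let mut r = self.inner.lock().unwrap();
        if r.resize_pending {
            return Err(ReadError::ResizeInProgress);
        }
        r.active_readers += 1;
        Ok(())
    }

    /// Unregisters a reader.
    pub fn end_read(&self) {
        let mut r = self.inner.lock().unwrap();
        r.active_readers = r.active_readers.saturating_sub(1);
    }

    /// Marks a resize as pending and returns how many readers are still
    /// active; the caller resizes once that count reaches zero.
    pub fn request_resize(&self) -> usize {
        let mut r = self.inner.lock().unwrap();
        r.resize_pending = true;
        r.active_readers
    }

    /// Clears the pending flag so that new readers are accepted again.
    pub fn finish_resize(&self) {
        self.inner.lock().unwrap().resize_pending = false;
    }
}
```

Returning an error rather than blocking leaves the retry decision to the consumer, at the cost of one more error case to handle in the reader path.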
