Comments (12)
Note https://www.openldap.org/its/index.cgi/Software%20Bugs?id=8975, which fixes a (rare) error/crash when calling mdb_env_set_mapsize() on Windows if MDB_WRITEMAP is set. The fix has already landed on the mdb.RE/0.9 branch, but a new version of LMDB hasn't been released with it yet.
from rkv.
On a related matter, it seems that mdb_env_set_mapsize should be exposed so that the default size can be changed. I went back to the lower-level library because of this limitation. Not sure if there's an existing issue for that.
from rkv.
@shivawu You can change the map size using rkv. See #82 for tests that demonstrate how to do that.
from rkv.
Ah I see, somehow I missed it. Thanks for pointing that out. Probably the local docs don't work that well for me. Also, btw, the online docs are missing for the project (https://docs.rs/rkv/0.5.1/rkv/); is that expected?
from rkv.
> Also, btw, the online docs are missing for the project (https://docs.rs/rkv/0.5.1/rkv/); is that expected?
Not expected, but known. It's #81, where you can read all the details (tl;dr: docs.rs is using an outdated rustc compiler).
from rkv.
Another way for us to deal with MAP_FULL is to check the environment's space usage via its stats API (which reports the max map size, the total number of pages in use, the total number of pages in the freelist, etc.) and then increase the map size if usage goes over a predefined high watermark.
Specifically, we can do this either with a dedicated maintenance worker or in the consumers themselves. Either way, some coordination is required, since resizing can only be done on a free environment (only one writer without any readers).
This preventive resizing seems preferable to resizing when you hit the map-full error, in two respects:
- Write transactions are less likely to be interrupted
- Easier to coordinate: preparing the resize condition is hard when there are active readers/writers
@mykmelez What do you think?
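The high-watermark check described above could be sketched roughly as follows. This is only an illustration, not rkv or LMDB API: `EnvUsage`, `load_factor`, and `next_map_size` are hypothetical names, and the fields are assumed to be filled in from whatever stat/info calls the LMDB binding exposes.

```rust
/// Hypothetical snapshot of environment usage (all names are assumptions;
/// populate it from the stats/info calls your LMDB binding provides).
#[derive(Debug, Clone, Copy)]
pub struct EnvUsage {
    /// Current maximum map size, in bytes.
    pub map_size: usize,
    /// Size of one database page, in bytes.
    pub page_size: usize,
    /// Total number of pages allocated so far.
    pub pages_in_use: usize,
    /// Number of pages sitting in the freelist (reusable).
    pub free_pages: usize,
}

impl EnvUsage {
    /// Fraction of the map occupied by pages that are not reusable.
    pub fn load_factor(&self) -> f64 {
        let live_pages = self.pages_in_use.saturating_sub(self.free_pages);
        (live_pages * self.page_size) as f64 / self.map_size as f64
    }
}

/// Returns the map size to grow to if usage has reached the high
/// watermark, or `None` if no resize is needed yet.
pub fn next_map_size(
    usage: &EnvUsage,
    high_watermark: f64,
    growth_factor: usize,
) -> Option<usize> {
    if usage.load_factor() >= high_watermark {
        Some(usage.map_size.saturating_mul(growth_factor))
    } else {
        None
    }
}
```

A maintenance worker (or the consumer) would call `next_map_size` periodically and, when it returns `Some(size)`, coordinate with readers and writers before applying the new size. Multiplying by an integer factor keeps the new size a multiple of the old one, which seems like the simplest way to stay aligned to the page size.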
from rkv.
@ncloudioj It seems helpful to periodically check stats and preemptively increase the map size if the database is approaching its limit (as well as to periodically call mdb_reader_check to clear stale readers). And that'll make it less likely for a write to fail with MAP_FULL. But it seems like we'll still occasionally encounter that error, so we'll still need to handle it somehow.
Perhaps kvstore (or the consumer) can catch the MAP_FULL error and trigger a resize (and stale reader check) operation at that point, after which it can retry the write?
That operation could be identical to the one we run periodically. And perhaps it could be invoked with a flag to indicate that it's being invoked manually and should definitely increase the map size, even if the environment doesn't look too full, since a large-enough write could fail in an environment that is mostly empty.
As for coordination, I suppose that whatever invokes the resize operation could await existing readers while clearing stale ones and blocking new ones, then resize once existing readers are cleared, and finally unblock new readers (after which it can return control to whoever called it, kvstore or the consumer, to retry the write).
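A minimal sketch of the catch-resize-retry flow suggested above, under stated assumptions: `WriteError` and `write_with_resize` are hypothetical names rather than rkv API, the write and the resize operation are passed in as closures, and the `bool` passed to `resize` stands for the "definitely increase the map size" flag.

```rust
/// Hypothetical error type; `MapFull` stands in for LMDB's map-full error.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    MapFull,
    Other(String),
}

/// Runs `write`. If it fails with `MapFull`, invokes `resize(true)`
/// (a forced resize) and retries, at most `max_retries` times.
/// Any other outcome, success or failure, is returned unchanged.
pub fn write_with_resize<T, W, R>(
    mut write: W,
    mut resize: R,
    max_retries: usize,
) -> Result<T, WriteError>
where
    W: FnMut() -> Result<T, WriteError>,
    R: FnMut(bool) -> Result<(), WriteError>,
{
    let mut attempts = 0;
    loop {
        match write() {
            Err(WriteError::MapFull) if attempts < max_retries => {
                attempts += 1;
                // `true` = force growth even if the environment does
                // not look full, since one large write can still fail.
                resize(true)?;
            }
            other => return other,
        }
    }
}
```

Bounding the retries matters for consumers that do not want the store to grow without limit: once `max_retries` is exhausted, the map-full error is surfaced to the caller instead of triggering another resize.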
from rkv.
Agreed. Looks like kvstore is the perfect candidate to do this bookkeeping. Once this particular error gets handled properly, it'll be a big step forward for us to roll out rkv in production at scale!
Speaking of maintenance, LMDB also has a set of APIs for copying the environment (as mentioned in #12), which could be useful for avoiding the fragmentation problem (like VACUUM). A frequently updated store could consume many more pages than a fresh copy.
A stale reader collector, a resize utility, and a live copy utility would make a solid maintenance lineup for kvstore.
from rkv.
This morning it occurred to me that we might implement this in rkv instead of kvstore, in which case rkv consumers more generally would gain the benefit of this automatic resizing. And doing that would be consistent with the goal of rkv to provide a "simple, humane" interface to LMDB. Unsure if there's any reason not to implement this in rkv, but I'll think about it some more.
@ncloudioj Do you have any thoughts about implementing this in kvstore vs. rkv?
from rkv.
My reasoning was that the consumer knows better than rkv when and how to do the resize :)
Though I agree with you that letting rkv handle it for the consumers would be nice in most cases. If we were to implement it in rkv, I think we'd also need to consider the following cases:
- rkv consumers might want to handle MAP_FULL themselves. For example, they might want a fixed-size store, or not want it to grow without bound. Perhaps leave auto-resize as an option?
- Off the top of my head, coordination between readers and the writer for resizing would be hard because rkv doesn't know anything about the active readers, such as when they will end their underlying transactions. I just realized that kvstore actually faces the same challenge, though. The key question is how we can efficiently send this need-to-resize message to all the rkv readers and writers.
from rkv.
It occurred to me recently that a reason to do this in rkv rather than kvstore is that even in Firefox there are consumers who are using rkv directly, such as the one in bug 1429796.
> rkv consumers might want to handle MAP_FULL by themselves. For example, they might either want a fixed size store, or do not want it to grow without bound. Perhaps leave auto-resize as an option?
As with some other issues, like #109, a consumer might want more control over their interaction with LMDB under certain circumstances while appreciating the value of rkv's higher-level abstractions in other cases. And there's a tension between wanting to support those use cases while keeping the API simple for the common case.
I do want to support those use cases, and I agree that it should be possible for an rkv consumer to "opt-out" of auto-resize. I would just want to make sure we do so in a way that imposes the least cognitive burden on consumers in the common case.
> Off the top of my mind, coordination between readers and writer for resizing would be hard because rkv doesn't know anything about the active readers, such as when they will end the underlying transactions. I just realize that kvstore actually face the same challenge though. The key point is how can we efficiently send this need-to-resize message to all the rkv readers&writers.
I suspect that rkv will have to track and manage active readers somehow, such that it can await them, while blocking new ones, when a resize is pending.
from rkv.
Indeed, there are pros and cons to both mechanisms. It looks like equipping rkv with auto-resize works better in more cases at the moment. So yes, I am convinced; let's get started with that design.
To keep the initial work within a reasonable scope, I'd also like to withdraw my earlier suggestion of making auto-resize optional. We can revisit such a feature if we find it useful in the future.
> I suspect that rkv will have to track and manage active readers somehow, such that it can await them, while blocking new ones, when a resize is pending.
Yes, it's definitely doable in a single-process scenario, but it will be more challenging when active readers reside in different processes. Perhaps we can have an internal db (e.g. __meta__) to store all the meta information in order to facilitate the coordination.
With this change, readers could also be blocked by the resize coordination. This should be easier to handle, though, as we can return an error (such as RESIZE_IN_PROGRESS) in the reader constructor.
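For the single-process case, the reader tracking discussed above might look something like the sketch below. It is an assumption-laden illustration, not rkv's design: `ResizeGate`, `ReadGuard`, and `GateError::ResizeInProgress` are hypothetical names, and cross-process coordination (the harder problem mentioned above) is not addressed.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical error returned by the reader constructor.
#[derive(Debug, PartialEq)]
pub enum GateError {
    ResizeInProgress,
}

#[derive(Default)]
struct GateState {
    active_readers: usize,
    // A count rather than a flag, so overlapping resize requests
    // keep new readers blocked until the last one has finished.
    pending_resizes: usize,
}

/// Tracks active readers and rejects new ones while a resize is pending.
#[derive(Default)]
pub struct ResizeGate {
    state: Mutex<GateState>,
    readers_done: Condvar,
}

/// Held for the lifetime of a read; deregisters the reader on drop.
pub struct ReadGuard<'a> {
    gate: &'a ResizeGate,
}

impl ResizeGate {
    /// Registers a reader, or fails fast if a resize is pending.
    pub fn begin_read(&self) -> Result<ReadGuard<'_>, GateError> {
        let mut state = self.state.lock().unwrap();
        if state.pending_resizes > 0 {
            return Err(GateError::ResizeInProgress);
        }
        state.active_readers += 1;
        Ok(ReadGuard { gate: self })
    }

    /// Blocks new readers, waits for existing ones to finish, then runs
    /// `do_resize`. The lock is held during the resize itself, so a
    /// reader arriving at that moment waits briefly instead of erroring.
    pub fn resize_with<F: FnOnce()>(&self, do_resize: F) {
        let mut state = self.state.lock().unwrap();
        state.pending_resizes += 1;
        while state.active_readers > 0 {
            state = self.readers_done.wait(state).unwrap();
        }
        do_resize();
        state.pending_resizes -= 1;
    }
}

impl Drop for ReadGuard<'_> {
    fn drop(&mut self) {
        let mut state = self.gate.state.lock().unwrap();
        state.active_readers -= 1;
        if state.active_readers == 0 {
            self.gate.readers_done.notify_all();
        }
    }
}
```

In this sketch a reader would hold a `ReadGuard` for as long as its underlying transaction is open, so the resizer only proceeds once every guard has been dropped. Clearing stale readers is left out; it would presumably run before the wait loop.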
from rkv.