
rkv's Introduction

rkv

CI Build Status Documentation Crate

The rkv Rust crate is a simple, humane, typed key-value storage solution. It supports multiple backend engines with varying guarantees, such as LMDB for performance, or "SafeMode" for reliability.

⚠️ Warning ⚠️

To use rkv in production/release environments at Mozilla, you may do so with the "SafeMode" backend, for example:

use rkv::{Manager, Rkv};
use rkv::backend::{SafeMode, SafeModeEnvironment};

let mut manager = Manager::<SafeModeEnvironment>::singleton().write().unwrap();
let shared_rkv = manager.get_or_create(path, Rkv::new::<SafeMode>).unwrap();

...

The "SafeMode" backend performs well, with two caveats: the entire database is stored in memory, and write transactions are synchronously written to disk (only on commit).

In the future, it will be advisable to switch to a different backend with better performance guarantees. We're working on either fixing some LMDB crashes, or offering more choices of backend engines (e.g. SQLite).

Use

Comprehensive information about using rkv is available in its online documentation, which can also be generated for local consumption:

cargo doc --open

Build

Build this project as you would build other Rust crates:

cargo build

Features

There are several features that you can opt in to and out of when using rkv:

By default, the db-dup-sort and db-int-key features offer higher-level database APIs that allow multiple values per key and optimize for integer-based keys, respectively. To disable them, opt out of these default features when specifying the rkv dependency in your Cargo.toml file; doing so avoids a certain amount of overhead required to support them.
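For example, to opt out of the default features in Cargo.toml (the version number below is a placeholder):

[dependencies]
rkv = { version = "x.y", default-features = false }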

To aid fuzzing efforts, with-asan, with-fuzzer, and with-fuzzer-no-link configure the build scripts responsible for compiling the underlying backing engines (e.g. LMDB) with these LLVM features enabled. Please refer to the official LLVM/Clang documentation for more information. These features are also disabled by default.

Test

Test this project as you would test other Rust crates:

cargo test

The project includes unit and doc tests embedded in the src/ files, integration tests in the tests/ subdirectory, and usage examples in the examples/ subdirectory. To ensure your changes don't break examples, also run them via the run-all-examples.sh shell script:

./run-all-examples.sh

Note: the test fixtures in the tests/envs/ subdirectory aren't included in the package published to crates.io, so you must clone this repository in order to run the tests that depend on those fixtures or use the rand and dump executables to recreate them.

Contribute

Of the various open source archetypes described in A Framework for Purposeful Open Source, the rkv project most closely resembles the Specialty Library, and we welcome contributions. Please report problems or ask questions using this repo's GitHub issue tracker and submit pull requests for code and documentation changes.

rkv relies on the latest rustfmt for code formatting, so please make sure your pull request passes rustfmt before submitting it for review. See rustfmt's quick start for installation details.

We follow Mozilla's Community Participation Guidelines while contributing to this project.

License

The rkv source code is licensed under the Apache License, Version 2.0, as described in the LICENSE file.

rkv's People

Contributors

asyade, badboy, eijebong, emilio, glandium, mbrubeck, mozilla-github-standards, mykmelez, mythmon, ncloudioj, ordian, piatra, rnewman, rrichardson, saschanaz, simonsapin, upsuper, victorporof


rkv's Issues

Placeholder: Firefox developer tooling

(Placeholder because this eventually belongs in Bugzilla.)

It would be useful for developers to be able to interact with rkv files via the Firefox developer tools, as well as via the JS API in the console.

order "create" consistently in functions that get_or_create/create_or_open

When retrieving or creating a new environment handle, the name of the Manager method is get_or_create (the word "create" appears second); but when retrieving or creating a new store handle, the name of the Rkv method is create_or_open (the word "create" appears first).

We should make these functions (and others that either get or create a thing) use a consistent ordering of those words.

(We might also want to use get/open consistently, although I'm open to arguments that environments and stores are different, and it makes sense to get the former and open the latter.)

figure out what to do about stale readers

LMDB > Caveats notes:

A broken lockfile can cause sync issues. Stale reader transactions left behind by an aborted program cause further writes to grow the database quickly…

Fix: Check for stale readers periodically, using the mdb_reader_check function or the mdb_stat tool.

We should figure out what to do about stale readers: whether to clear them out periodically ourselves or make this the responsibility of the consumer (exposing an API for them to do so).

Incorrect description about reading the uncommitted writes in the same transaction

A write transaction also supports reading, but the version of the store that it reads doesn't include changes it has made.

This does not seem correct; at least, running the code snippets shows the opposite result. Within the same write transaction, all writes should be immediately visible, regardless of the commit state of that transaction.

Thanks @piatra for pointing this out by noticing the discrepancy between the documentation and the actual code behavior!
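A minimal sketch of the behavior actually observed, assuming a SingleStore named store and the current put/get API:

let mut writer = env.write()?;
store.put(&mut writer, "foo", &Value::I64(1234))?;
// Before commit, the same write transaction already sees the value:
assert_eq!(store.get(&writer, "foo")?, Some(Value::I64(1234)));
writer.commit()?;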

Deadlock: trying to open a store while a writer is in progress

Ran into this when I was testing #58; the following snippet reproduces the deadlock.

let store = rkv.open_or_create("store");
let writer = store.write();
let another_store = rkv.open_or_create("another_store");

The deadlock is caused by the fact that LMDB doesn't allow two write transactions to run at the same time. In this case, the writer is a wrapper around RwTransaction. Under the hood, open_or_create creates another write transaction to open a store, and it will hang forever, since writer never gets a chance to abort or commit. So the solution is either to open all the stores before spawning any writer, or to commit the writer and then open other stores.

We should at least document this for Store to avoid this kind of deadlock.
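A sketch of the workaround described above, in the same style as the snippet that reproduces the deadlock:

// Either open every store you need before starting a write transaction...
let store = rkv.open_or_create("store");
let another_store = rkv.open_or_create("another_store");
let writer = store.write();
// ...or commit/abort the writer before calling open_or_create again.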

Question about locks

Hi, quick question.

Why are read and write locks taken on the Rkv environment instead of on the store? It seems that although there can be multiple stores within the same environment, the locks are global. Does that mean I cannot have two threads writing to different databases?

Live backup

LMDB supports a safe atomic backup operation. We should expose this functionality.

figure out what to do about a stale writer

LMDB > Caveats notes:

A broken lockfile can cause sync issues… stale locks can block further operation.

Stale writers will be cleared automatically on some systems:

  • Windows - automatic
  • Linux, systems using POSIX mutexes with Robust option - automatic
  • not on BSD, systems using POSIX semaphores. Otherwise just make all programs using the database close it; the lockfile is always reset on first open of the environment.

We should figure out what to do about a stale writer on systems that don't clear it automatically.

C FFI

In order for rkv to be usable within Gecko and from Swift and Android, it needs a C FFI on which C++/Swift/Java APIs can be built. This issue tracks defining those FFIs.

provide mechanism for consumers to migrate DB files between 32- and 64-bit builds

rkv should provide some mechanism for consumers to migrate database files between 32-bit and 64-bit builds, since LMDB 0.9 files are bit-width-dependent, and users sometimes switch between 32-bit and 64-bit builds of software that uses rkv or copy database files from a 32-bit to a 64-bit system.

This blog post about The LMDB file format notes that it's possible to compile a 32-bit build of mdb_dump and use that on a 64-bit system to dump a 32-bit database file to a portable format that can be reloaded into a new file. It also references this Lua reimplementation of the mdb_dump utility that can read 32-bit database files on a 64-bit system (and presumably vice-versa).

Note that the database format on the LMDB master branch (i.e. the development version that is slated to become LMDB 1.0) is bit-width-independent, so this issue won't exist there. But if we upgrade rkv to LMDB master/1.0 in the future, then we'll have to deal with database migration from the 0.9 format to the 1.0 format. So we can't avoid this issue by upgrading the version of LMDB we embed.

consider making Value optional

Currently Rkv provides four types of stores: Single, Multi, Integer, and MultiInteger. All of them require the value to be of type Value, which imposes a certain overhead, since values must be encoded and decoded (and copied). This can be undesirable if the user only uses the Blob type for values.

In general, it feels like this type (Value) and its encoding logic (or compression) should be specific to each user, and I don't quite understand why it is here. I could use lmdb-rkv directly, but it would be nice not to deal with UNC paths on Windows and restrictions like one environment per process per path. Consider adding methods to deal with &[u8] values instead of Value.

Observer notifications

We should design and implement a system for watching particular keys, or stores as a whole. Most likely we should not notify values — the recipient can read directly from the store.

See mozilla/mentat#551 for guidance.

prevent applications from opening both named databases and default database

LMDB stores key-value pairs for named databases in the default database, which makes it dangerous for an application to open both named databases and the default database within the same environment using rkv, as the default database will contain pairs they didn't add, and those pairs cannot be read by rkv (because they aren't formatted the way rkv expects, i.e. by using bincode to serialize Rust values to bytes).

Thus we should prevent applications from opening both named databases and the default database within the same environment via a compile time (ideally) or runtime error.

Documentation: tuning and modeling guide

As a relatively flexible piece of infrastructure, rkv/LMDB will benefit from a tuning/usage guide. This will cover the following (and more):

  • Versioning parts of your data (see also #7).
  • Configuring store sizes and sharing between components.
  • Managing key options (dupes, key max sizes, key types, integer keys).
  • Avoiding copies where necessary. (Rather than decoding the value, return &[u8].)
  • Batching writes, transactions, etc.
  • Iterating over keys in order.
  • Controlling warming and liveness via background threads.
  • WRITEMAP, fsyncs, and other durability/performance/safety tradeoffs.
  • Synchronous, asynchronous, and background access.
  • Correct use from multiple threads and processes.

committing transaction returns lmdb::Error instead of StoreError

The recent changes in #101 directly expose the lmdb::RoTransaction and lmdb::RwTransaction types instead of wrapping them in Reader and Writer types when calling Rkv.read() and Rkv.write().

This has the side-effect of also exposing the lmdb::Error type when calling R[o|w]Transaction.commit(), which was previously converted into a StoreError by the Reader.commit() and Writer.commit() functions.

And that's inconsistent with most of the other functions in the public API, including Environment.begin_r[o|w]_transaction() and the various Store::[Single|Multi|etc.]Store functions, which all wrap an lmdb::Error in a StoreError::LmdbError.

It's also obviously inconsistent with any other function that returns another type of StoreError, and it means that consumers of rkv need to handle both the StoreError type and the underlying lmdb::Error type.

We should ensure that the public API returns StoreError consistently to indicate failure.
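One way to restore the previous behavior is to wrap the transactions again and convert the error on commit. A rough sketch only (the variant name follows the issue text, and the rest is not a verified implementation):

use lmdb::Transaction;

pub struct Writer<'env>(lmdb::RwTransaction<'env>);

impl<'env> Writer<'env> {
    pub fn commit(self) -> Result<(), StoreError> {
        // Convert lmdb::Error into the crate's own error type on the way out.
        self.0.commit().map_err(StoreError::LmdbError)
    }
}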

Consider replacing '&Store' with 'Store' in the function calls

Since type Store is essentially Copy-able, all its occurrences in the function calls could be passed by value instead of by reference.

That would be consistent with the API definitions in LMDB. Clippy also suggests this change for efficiency reasons.

Documentation: selection guidance

We should write some helpful words about why you might use this system, why you might not, and point into the tuning docs (#4) to illuminate the space of the former.

integrate clippy in some fashion

Over in #65, @ncloudioj fixed some clippy nits. We should consider integrating clippy in some fashion to reduce the risk of introducing more such nits.

Per https://github.com/rust-lang-nursery/rust-clippy, "Since this is a tool for helping the developer of a library or application write better code, it is recommended not to include Clippy as a hard dependency. Options include using it as an optional dependency, as a cargo subcommand, or as an included feature during build. These options are detailed below."

I'm unsure which of these options is best.

determine how to access multiple stores within a single transaction

Over in #42, I noted that "it's unclear if/how it's possible to open multiple stores within a single transaction, which LMDB itself supports," and @ncloudioj responded:

The current rkv::Store abstraction doesn't support that because it wraps the transaction into the Reader/Writer. To support multi-store reads/writes, it needs to take the transaction out of the store, perhaps something like:

let txn = rkv.write();
let store_foo = rkv.create_or_open("foo");
let store_bar = rkv.create_or_open("bar");
store_foo.write(txn, "key0", "value0");
store_bar.write(txn, "key1", "value1");
txn.commit();

The downside is that users can't use the Reader/Writer any more. Another potential approach, which reuses the design of Writer/Reader, is to introduce a MultiStore so that multiple stores can be get_or_created at the same time in a single transaction. Its read/write API will be slightly different:

let store_names = vec!["foo", "bar", "baz"];
let mega_store = rkv.create_or_open(&store_names);
let writer = mega_store.write();
writer.write("foo", "key0", "value0"); // it takes a store name here
writer.write("bar", "key1", "value1");
writer.commit();

This is a tough problem. The latter approach feels a bit more intuitive and is also likely to be more compact, provided store names are short; whereas the former grows a line for each store involved in the transaction.

On the other hand, the former approach has the advantage of being more strongly typed, because stores are referenced by handle after creation, so it isn't possible to compile code that opens the "foo" and "bar" stores and then writes to the "baz" store; whereas the latter approach will happily compile that code (and then fail at runtime).

Also note #29, although #28 (comment) suggests that I didn't actually understand LMDB database handles when I filed it, and it's the wrong thing to do.

I'm also puzzling over the requirement that LMDB database handles be opened with reference to a specific transaction but can then be reused by any other transaction, as described in the docs for mdb_dbi_open, which additionally notes:

The database handle will be private to the current transaction until the transaction is successfully committed. If the transaction is aborted the handle will be closed automatically. After a successful commit the handle will reside in the shared environment, and may be used by other transactions.

This function must not be called from multiple concurrent transactions in the same process. A transaction that uses this function must finish (either commit or abort) before any other transaction in the process may use this function.

However, lmdb-rs appears to manage those constraints by acquiring a mutex and creating/committing a throwaway transaction in Environment::create_db, so that shouldn't be an issue.

Regardless, from browsing the LMDB docs, it seems like the intent is for handles to stores to be long-lived, so perhaps the former approach is better, even though it doesn't let you use Reader/Writer, as it requires you to explicitly create the handles, which also enables you to reuse them.

Or perhaps even better is a related approach in which Rkv::write returns a non-store-specific Writer rather than an lmdb::RwTransaction (ditto for Rkv::read; it returns a Reader), and it has a put method that takes a store handle rather than a store name, i.e. something like:

let store_foo = rkv.create_or_open("foo");
let store_bar = rkv.create_or_open("bar");
let writer = rkv.write();
writer.put(store_foo, "key0", "value0");
writer.put(store_bar, "key1", "value1");
writer.commit();
// store_foo and store_bar can be reused to read
let reader = rkv.read();
reader.get(store_foo, "key0");
reader.get(store_bar, "key1");

(This has the added advantage that we no longer return the low-level RoTransaction and RwTransaction lmdb-rs structs from rkv methods.)

@ncloudioj What do you think?

Documentation: errors

We should check that our error hierarchy is clear and understandable, and make sure it's documented well.

Adding mechanisms to gracefully handle "map is full" error

Each LMDB store has a predefined size (10 MB by default). If a store runs out of free space because

  • the store was filled with data, or
  • orphan transactions prevented LMDB from reclaiming unused pages,

then all subsequent inserts will be rejected by LMDB with an MDB_MAP_FULL error.

We will have to provide some bailout mechanism for this particular issue to avoid write downtime.

  • Resizing the store: this requires all users to terminate their transactions (read and write) first. Also, once the size has been increased, there is no way to shrink it short of recreating the store and copying all the data over (see the sketch after this list).
  • Letting LMDB reclaim the unused pages: we need to ensure there are no orphans holding locks in LMDB's transaction table. LMDB has an API (mdb_reader_check) to clean up those zombie transactions; it looks like this API is not exposed by lmdb-rs, so we might have to add that upstream first.
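A sketch of the resize bailout, assuming the underlying lmdb crate's Environment::set_map_size is reachable and that all other transactions have finished before it is called:

fn grow_map(env: &lmdb::Environment, current_size: usize) -> Result<(), lmdb::Error> {
    // Double the map size; the new size should be a multiple of the OS page size.
    env.set_map_size(current_size * 2)
}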

lifetimes restricting get->set from a write txn

I have a function where I'd like to fetch a value from one key, then use that retrieved value to delete another key. I'm sure there is a way around this, but I can't find it.
The problem seems to be that the value I retrieve has a narrower lifetime than the writer, and I can't seem to convince it otherwise:

    pub fn del_by_txn(&self,
                      writer: &mut Writer<&str>,
                      store: Store,
                      name: &str,
                      key: &str) -> Result<(), MegadexError> {
        let idstore = self.indices.get(name).ok_or(MegadexError::IndexUndefined(name.into()))?;
        match writer.get(idstore, key)? {
            Some(Value::Str(ref id)) => writer.delete(&self.main, id).map_err(|e| e.into()),
            None => return Ok(()),
            e => return Err(MegadexError::InvalidType("Str".into(), format!("{:?}", e))),
        }
    }

results in:

error[E0623]: lifetime mismatch
   --> megadex/src/lib.rs:142:67
    |
136 |    writer: &mut Writer<&str>,
    |                 ------------ these two types are declared with different lifetimes...
...
142 |             Some(Value::Str(ref id)) => writer.delete(&self.main, id).map_err(|e| e.into()),
    |                                 ^^ ...but data from `writer` flows into `writer` here

I've tried making the function generic over K: AsRef<[u8]> like the prototype of Writer and setting key to type K, but that fails because I can't make the retrieved id equal type K. If I am explicit about the types, per the above, I can't seem to make data from writer flow into writer.
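One workaround (a sketch only, not verified against this codebase) is to copy the retrieved id into an owned String, so the borrow of writer ends before the mutable delete call:

let id = match writer.get(idstore, key)? {
    Some(Value::Str(id)) => id.to_owned(), // copy out to end the borrow of `writer`
    None => return Ok(()),
    e => return Err(MegadexError::InvalidType("Str".into(), format!("{:?}", e))),
};
writer.delete(&self.main, &id).map_err(|e| e.into())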

Thorough support for versioning

We will have at least three different kinds of version.

  • The disk format itself, which will be tied to the version of LMDB. Failures here will signal MDB_VERSION_MISMATCH.
  • A storage format version, which will infrequently change along with rkv itself. A change in how we represent values, in how we store metadata, or capabilities (e.g., encryption, file locking) might require us to lock out old clients who are still able to read the disk format.
  • One or more domain versions. These will be managed by consumers: they're analogous to SQLite's PRAGMA user_version. Typical uses will be to document and alter the assumptions of consuming code, to track migrations, and to lock out buggy clients. One can imagine multiple consumers using the same database file, each with their own key space and version number.

All three of these will be present in the API and in documentation.

many test failures when using Windows Subsystem for Linux

On Windows, when using Windows Subsystem for Linux (with the Ubuntu distro), I see a bunch of test failures:

$ cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 5.96s
     Running target/debug/deps/rkv-5955ce6badfae226

running 22 tests
test env::tests::test_concurrent_read_transactions_prohibited ... ok
test env::tests::test_blob ... FAILED
test env::tests::test_delete_value ... FAILED
test env::tests::test_isolation ... FAILED
test env::tests::test_iter ... FAILED
test env::tests::test_iter_from_key_greater_than_existing ... FAILED
test env::tests::test_multiple_store_iter ... FAILED
test env::tests::test_multiple_store_read_write ... FAILED
test env::tests::test_open ... FAILED
test env::tests::test_open_a_missing_store ... ok
test env::tests::test_open_fail_with_badrslot ... ok
test env::tests::test_open_fails ... ok
test env::tests::test_open_from_env ... FAILED
test env::tests::test_open_store_for_read ... FAILED
test env::tests::test_open_with_capacity ... FAILED
test env::tests::test_read_before_write_num ... FAILED
test env::tests::test_read_before_write_str ... FAILED
test env::tests::test_round_trip_and_transactions ... FAILED
test integer::tests::test_integer_keys ... FAILED
test manager::tests::test_same ... ok
test manager::tests::test_same_with_capacity ... ok
test env::tests::test_store_multiple_thread ... FAILED
thread '<unnamed>' panicked at 'written: LmdbError(Corrupted)', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5
thread '<unnamed>' panicked at 'rkv: "PoisonError { inner: .. }"', libcore/result.rs:945:5

failures:

---- env::tests::test_blob stdout ----
thread 'env::tests::test_blob' panicked at 'read: LmdbError(BadTxn)', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

---- env::tests::test_delete_value stdout ----
thread 'env::tests::test_delete_value' panicked at 'wrote: LmdbError(BadTxn)', libcore/result.rs:945:5
note: Panic did not include expected string 'not yet implemented'
---- env::tests::test_isolation stdout ----
thread 'env::tests::test_isolation' panicked at 'wrote: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_iter stdout ----
thread 'env::tests::test_iter' panicked at 'Unexpected LMDB error BadTxn.', /home/myk/.cargo/registry/src/github.com-1ecc6299db9ec823/lmdb-rkv-0.8.2/src/cursor.rs:263:17

---- env::tests::test_iter_from_key_greater_than_existing stdout ----
thread 'env::tests::test_iter_from_key_greater_than_existing' panicked at 'wrote: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_multiple_store_iter stdout ----
thread 'env::tests::test_multiple_store_iter' panicked at 'opened: LmdbError(Corrupted)', libcore/result.rs:945:5

---- env::tests::test_multiple_store_read_write stdout ----
thread 'env::tests::test_multiple_store_read_write' panicked at 'opened: LmdbError(Corrupted)', libcore/result.rs:945:5

---- env::tests::test_open stdout ----
Root path: "/tmp/test_openBkPsg4"
thread 'env::tests::test_open' panicked at 'success but no value: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_open_from_env stdout ----
Root path: "/tmp/test_open_from_envuAftlz"
thread 'env::tests::test_open_from_env' panicked at 'success but no value: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_open_store_for_read stdout ----
thread 'env::tests::test_open_store_for_read' panicked at 'write: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_open_with_capacity stdout ----
Root path: "/tmp/test_open_with_capacityvY2MVw"
thread 'env::tests::test_open_with_capacity' panicked at 'success but no value: LmdbError(BadTxn)', libcore/result.rs:945:5
note: Panic did not include expected string 'opened: LmdbError(DbsFull)'
---- env::tests::test_read_before_write_num stdout ----
thread 'env::tests::test_read_before_write_num' panicked at 'read: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_read_before_write_str stdout ----
thread 'env::tests::test_read_before_write_str' panicked at 'read: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_round_trip_and_transactions stdout ----
thread 'env::tests::test_round_trip_and_transactions' panicked at 'wrote: LmdbError(BadTxn)', libcore/result.rs:945:5

---- integer::tests::test_integer_keys stdout ----
thread 'integer::tests::test_integer_keys' panicked at 'write: LmdbError(BadTxn)', libcore/result.rs:945:5

---- env::tests::test_store_multiple_thread stdout ----
thread 'env::tests::test_store_multiple_thread' panicked at 'joined: Any', libcore/result.rs:945:5

failures:
    env::tests::test_blob
    env::tests::test_delete_value
    env::tests::test_isolation
    env::tests::test_iter
    env::tests::test_iter_from_key_greater_than_existing
    env::tests::test_multiple_store_iter
    env::tests::test_multiple_store_read_write
    env::tests::test_open
    env::tests::test_open_from_env
    env::tests::test_open_store_for_read
    env::tests::test_open_with_capacity
    env::tests::test_read_before_write_num
    env::tests::test_read_before_write_str
    env::tests::test_round_trip_and_transactions
    env::tests::test_store_multiple_thread
    integer::tests::test_integer_keys

test result: FAILED. 6 passed; 16 failed; 0 ignored; 0 measured; 0 filtered out

I don't see these on Windows outside of WSL, however; nor on Ubuntu running outside of Windows (in a virtual machine on a macOS host).

Command-line tooling

Likely drawing on #2, it would be useful for developers to be able to interact with rkv files directly.

Wiki changes

FYI: The following changes were made to this repository's wiki:

These were made as the result of a recent automated defacement of publicly writable wikis.

Bincode::serialize generates much bigger results on String types

Noticed this while investigating this TODO item. The current serialization mechanism (serializing a two-element tuple, i.e. (type, value)) seems to introduce a significant amount of overhead on String-typed Values.

Here are some examples:

serialize(&(1u8, true)).len() -> 2 // actual size: 2
serialize(&(2u8, 1e+9)).len() -> 9 // actual size: 9 (1 + 8)
serialize(&(3u8, "hello world".to_string())).len() -> 20 // actual size: 12 (1 + 11)
serialize(&(4u8, "4dd69e99-07e7-c040-a514-ccde0cfd4781".to_string())).len() -> 45 // actual: 37 (1 + 36)

I'm unsure whether this is caused by padding or by the serialization itself, but I think it's worth further investigation.

Alternatively, we can just write the Type and Value directly to a buffer, then pass the result to the put function. For big Values, we can avoid the double allocation by leveraging the "MDB_RESERVE" feature, which reserves enough space for the value and returns a buffer that the user can populate afterwards. The following snippet illustrates the basic idea:

// Pseudocode illustrating the idea; error handling and the exact lmdb
// signatures are elided, and `tag` stands for the one-byte type tag.
fn put(&mut self, key: &[u8], tag: u8, value: &[u8]) {
    // say BIG_VALUE_THRESHOLD = 32
    let length = value.len() + 1; // value size + type-tag size

    if length < BIG_VALUE_THRESHOLD {
        // Small value: assemble tag + value on the stack, then copy into LMDB.
        let mut buf = [0u8; BIG_VALUE_THRESHOLD];
        buf[0] = tag;
        buf[1..length].copy_from_slice(value);
        self.txn.put(&key, &buf[..length]);
    } else {
        // Big value: use MDB_RESERVE to get a buffer inside LMDB's map and
        // write into it directly, avoiding a second allocation and copy.
        let reserved_buf = self.txn.reserve(&key, length);
        reserved_buf[0] = tag;
        reserved_buf[1..].copy_from_slice(value);
    }
}

Support lmdb write flags (HOWTO)

Lmdb supports a wide range of write flags to change the default behavior when issuing writes to the store. Currently, rkv::readwrite::Writer passes the default write flag to its put function, which simply overwrites the value if the key is already in the store.

One solution could be to simply expose all of LMDB's write flags and let developers decide which ones to use. The upside is that this offers a great deal of flexibility; the downside is that developers would need to know all the store types and their corresponding write flags in LMDB. Misusing them may cause undesired behavior or, even worse, corrupt the store.

The other way to handle the write flags is to abstract them away by providing a few store types instead, each with its own semantics for put/get/delete/cursor. For example:

  • Store: just a plain k/v store, as you'd expect in JS, Rust, or Python
  • DupStore: supports duplicate keys; aside from the APIs in Store, it might also have mput and mget to insert or get multiple values for the same key

The advantage is that developers do not need to know the underlying details of LMDB and can treat these as plain persistent k/v stores. Obviously, they lose fine-grained control over the store, and perhaps suffer some performance loss.

Some design decisions need to be made before taking action to implement this. Given that one of rkv's design goals is to smooth out LMDB's rough edges, I am more inclined toward the second plan.

@mykmelez thoughts?

Unify `*Reader` and `*Writer`

Presently there are 2 transaction types (plus the proposed Multi*) for both Read and Write transactions.

There doesn't have to be, though. In fact, it's causing problems when I want to execute a transaction over different types of Stores.

I see two options:
The easiest option is to pull the functions from Integer* and Multi* into Reader and Writer, suffixing them with _int and _multi. They would take IntStore and MultiStore as parameters, respectively.

The harder, but perhaps cleaner, option is a significant chunk of refactoring: move the accessor functions into the *Store structs themselves, and make the functions accept either a Reader or a Writer. This way we'll have more ergonomic and intuitive methods like:

IntStore::put<I: PrimitiveInt>(txn: Writer, id: I, val: Value) -> Result<(), StoreError>
MultiStore::get(txn: Reader, id: K) -> Result<Iter<Value>, StoreError>

For maximum modularity, I would make a new trait, ReadTransaction, which can be implemented by both Reader and Writer, since you can fetch values in write transactions as well.
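A rough sketch of that trait idea (the name and signature are illustrative, not the current API); both Reader and Writer would implement it, so the *Store accessors could accept any &impl ReadTransaction:

pub trait ReadTransaction {
    // Look up a key in the given store within this (read or write) transaction.
    fn get<K: AsRef<[u8]>>(&self, store: Store, k: K) -> Result<Option<Value>, StoreError>;
}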

order "path" parameter consistently in Rkv static constructors

The Rkv static constructor functions new, with_capacity, and from_env all take a path parameter, but they don't order it consistently: it's the first parameter for new and with_capacity and the second parameter for from_env. We should make its order consistent across all three functions, which presumably means making it the first parameter for from_env as well.

hide lmdb::RoCursor type from consumers of Reader/Writer structs

The Readable trait in readwrite.rs currently leaks the lmdb::RoCursor type, and we generally don't want to show consumers the types from the lmdb crate dependency (preferring to encapsulate them in higher-level rkv types). We should figure out a way to hide that type from consumers of the Reader/Writer structs that implement the Readable trait.

support iterators and ranged lookups

In order to be able to iterate keys, and do so from an arbitrary point in the key space, rkv should expose LMDB's support for iterators and ranged lookups (behind humane abstractions as appropriate).
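rkv has since grown iterator support on its stores; a minimal sketch of ranged iteration using iter_from, assuming the current API in which each item is a (key, value) pair (the exact item type has varied across rkv versions):

let reader = env.read()?;
// Iterate keys starting from "k" in key order.
for item in store.iter_from(&reader, "k")? {
    let (key, value) = item?;
    println!("{:?} => {:?}", key, value);
}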

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please see Mozilla-GitHub-Standards or email [email protected].

(Message COC001)

Implement clear/drop on stores

LMDB provides mdb_drop with two flavors:

  • Clear all the kv pairs in the given store and keep the store
  • Drop the store, which truncates the store and also deletes it in the environment

lmdb-rs has two separate functions for this, txn.clear_db and txn.drop_db, respectively. The latter is marked as unsafe, because the underlying store will be unsafe to use after the call. Rkv can't enforce that, and it's all up to the consumers.

@mykmelez For kvstore, shall we focus on the common case "clear" for now?

document various limitations of LMDB

We might want to document various limitations of LMDB in order to offer the "least surprise" to rkv users. Off the top of my head, LMDB has the following limitations (a configuration sketch follows the list):

  • Max key size (MDB_MAXKEYSIZE = 511 bytes); this also applies to the values of a dupsort store. Note that it's a compile-time configuration and can't be changed at runtime.
  • Max environment size (10 MB by default); once the environment is full, subsequent writes will fail.
  • Max number of databases (5 by default); hosting a moderate number (say, up to a few dozen) of databases in a single environment is fine, but hosting too many has both memory and performance impacts.
  • Max readers (126 by default): the maximum number of readers/threads allowed to access an LMDB environment. Exceeding this limit causes reader creation to fail.
  • Too many writes in a single transaction. Unsure exactly what the maximum is, but LMDB may complain while conducting a bulk-load write in a single transaction. To work around this, we can periodically commit the transaction and create a new one for the remaining writes.
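Several of these defaults are configurable when the environment is created. A sketch using the underlying lmdb crate's EnvironmentBuilder, assuming the consumer creates the environment directly rather than through Manager (the path is a placeholder):

use lmdb::Environment;
use std::path::Path;

let env = Environment::new()
    .set_map_size(100 * 1024 * 1024) // raise the 10 MB default map size
    .set_max_dbs(20)                 // raise the default of 5 named databases
    .set_max_readers(256)            // raise the default of 126 readers
    .open(Path::new("/path/to/env"))?;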

enable management of environments with non-default configurations

There are currently three Rkv methods that create environments: new, with_capacity, and from_env. If one consumer opens an environment using Rkv::new (which uses the default capacity of 5 databases), and then a second consumer tries to open the same environment with Rkv::with_capacity(10), then the Manager's RwLock will serialize those calls, but what should the result of the second call be? If we return the cached environment, then we're returning an environment with a different capacity than the consumer requested.

At the moment, we actually avoid this problem, since Manager::get_or_create only accepts an Rkv::* callback that takes a single parameter, which means it only accepts Rkv::new, since Rkv::with_capacity and Rkv::from_env both take two parameters. But that raises a new issue: how do you use the Manager to protect an environment with a non-default configuration?

We should rethink the way we manage environments to enable managing environments with non-default configurations. In the process, we'll need to figure out what to return when a consumer tries to get_or_create an environment with a different configuration than an already-cached environment for the same path.

Blob support

Just about anything that we can get as u8s is something we can store…

This would be an incompatible version bump to add the type signature.

Add a JSON export API

It would be useful for debugging and testing to be able to dump an entire database as JSON.

This should be relatively easy to implement: all of our types work with Serde, so we would 'just' need to make an iterator over keys and values serialize as a container.
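A rough sketch of the idea, assuming iteration yields (key, value) pairs and that Value serializes via serde (per the note above); the use of serde_json here is only for illustration:

let reader = env.read()?;
let mut map = serde_json::Map::new();
for item in store.iter_start(&reader)? {
    let (key, value) = item?;
    // Keys are raw bytes in LMDB; treat them as (lossy) UTF-8 for the dump.
    map.insert(
        String::from_utf8_lossy(key).into_owned(),
        serde_json::to_value(&value)?,
    );
}
println!("{}", serde_json::to_string_pretty(&map)?);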

Finish manager interface

In order to maintain LMDB's requirement that each database is opened only once at a time in each process, we have a manager that canonicalizes paths and maintains a set of open databases.

This interface needs to be finished:

  • We need a mechanism for closing stores when we're done with them. I don't think it's enough to simply remove the Arc from the map: that would allow duplicate opens. It might be enough to use Weak instead of Arc, which would automatically close a database once it isn't referenced by any consumer (see the sketch after this list).
  • The manager doesn't expose the same builder API that direct opening supports. Instead one supplies a closure to do the work. We might be able to smooth this out a little.
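A sketch of the Weak-based idea from the first bullet (Rkv here stands in for whatever environment type the map ends up holding):

use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock, Weak};

struct Manager {
    // Weak handles mean an environment is dropped, and can safely be
    // reopened, once no consumer holds an Arc to it any more.
    environments: HashMap<PathBuf, Weak<RwLock<Rkv>>>,
}

impl Manager {
    fn get(&self, path: &Path) -> Option<Arc<RwLock<Rkv>>> {
        // upgrade() returns None once every strong reference has been dropped.
        self.environments.get(path).and_then(Weak::upgrade)
    }
}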
