

pmemkv


⚠️ Discontinuation of the project

The pmemkv project will no longer be maintained by Intel.

  • Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
  • Intel no longer accepts patches to this project.
  • If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
  • You will find more information here.

Introduction

pmemkv is a local/embedded key-value datastore optimized for persistent memory. Rather than being tied to a single language or backing implementation, pmemkv provides different options for language bindings and storage engines.

For more information, including the C API and C++ API, see https://pmem.io/pmemkv. Documentation is available for every branch/release; the most recent documentation always tracks the master branch.

Latest releases can be found on the "releases" tab.

There is also a small helper library pmemkv_json_config provided. See its manual for details.

Table of contents

  1. Installation
  2. Language Bindings
  3. Storage Engines
  4. Benchmarks
  5. Contact us

Installation

The installation guide provides detailed instructions on how to build and install pmemkv from sources and how to build rpm and deb packages, and it explains the usage of experimental engines and pool sets.

Language Bindings

pmemkv is written in C/C++ and can be used from other languages as well: Java, Node.js, Python, and Ruby.

(pmemkv-bindings architecture diagram)

C/C++ Examples

Examples for C and C++ can be found within this repository in the examples directory.

Other Languages

The above-mentioned bindings are maintained in separate GitHub repositories, but are still kept in sync with the main pmemkv distribution.

Storage Engines

pmemkv provides multiple storage engines that share a common API, so every engine can be used with all language bindings and utilities. Engines are loaded by name at runtime.

| Engine Name | Description                                 | Experimental | Concurrent | Sorted | Persistent |
|-------------|---------------------------------------------|--------------|------------|--------|------------|
| cmap        | Concurrent hash map                         | No           | Yes        | No     | Yes        |
| vsmap       | Volatile sorted hash map                    | No           | No         | Yes    | No         |
| vcmap       | Volatile concurrent hash map                | No           | Yes        | No     | No         |
| csmap       | Concurrent sorted map                       | Yes          | Yes        | Yes    | Yes        |
| radix       | Radix tree                                  | Yes          | No         | Yes    | Yes        |
| tree3       | Persistent B+ tree                          | Yes          | No         | No     | Yes        |
| stree       | Sorted persistent B+ tree                   | Yes          | No         | Yes    | Yes        |
| robinhood   | Persistent hash map with Robin Hood hashing | Yes          | Yes        | No     | Yes        |

The production-quality engines are described in the libpmemkv(7) manual and the experimental ones are described in the ENGINES-experimental.md file.
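For illustration, opening an engine by name from the C++ API looks roughly like this. This is a minimal sketch using libpmemkv.hpp; the pool path and size are placeholders, and the exact creation flag differs between pmemkv releases ("force_create" in older ones, "create_if_missing" in newer ones):

#include <libpmemkv.hpp>
#include <string>

int main() {
    pmem::kv::config cfg;
    cfg.put_string("path", "/daxfs/kvfile");          // placeholder pool path
    cfg.put_uint64("size", 1024ULL * 1024 * 1024);    // 1 GiB, used only when creating the pool
    cfg.put_uint64("force_create", 1);                // "create_if_missing" in newer releases

    pmem::kv::db kv;
    // The engine is selected by name at runtime ("cmap" here).
    if (kv.open("cmap", std::move(cfg)) != pmem::kv::status::OK)
        return 1;

    kv.put("key1", "value1");
    std::string value;
    kv.get("key1", &value);
    kv.close();
    return 0;
}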

pmemkv also provides testing engines, which may be used in unit tests or for benchmarking application overhead:

| Engine Name | Description                                          | Experimental | Concurrent | Sorted | Persistent |
|-------------|------------------------------------------------------|--------------|------------|--------|------------|
| blackhole   | Accepts everything, returns nothing                  | No           | Yes        | No     | No         |
| dram_vcmap  | Volatile concurrent hash map placed entirely in DRAM | Yes          | Yes        | No     | No         |

Contributing a new engine is easy, so feel encouraged!

Benchmarks

Experimental benchmark based on leveldb's db_bench to measure pmemkv's performance is available here: https://github.com/pmem/pmemkv-bench (previously pmemkv-tools).

Contact us

If you have read the blog post and still have questions (especially about the discontinuation of the project), please contact us at the dedicated e-mail address: [email protected].


pmemkv's Issues

Batched updates

RocksDB has a Write operation where a batch can be passed in:

WriteBatch batch;
batch.Delete("key1");
batch.Put("key2", "value2");
s = db->Write(WriteOptions(), &batch);

Should we provide something similar? Where's the transaction boundary?
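For comparison, with the current pmemkv C++ API each operation is applied individually, so there is no single batch/transaction boundary yet. A minimal sketch, assuming an already-open pmem::kv::db:

#include <libpmemkv.hpp>

// Sketch only: each call below is its own atomic operation; nothing groups
// them into one failure-atomic batch, which is what this issue asks about.
pmem::kv::status apply_updates(pmem::kv::db &kv) {
    pmem::kv::status s = kv.remove("key1");
    if (s != pmem::kv::status::OK && s != pmem::kv::status::NOT_FOUND)
        return s;
    return kv.put("key2", "value2");
}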

Fast truncate operation

Efficient way to clear all persisted keys & their values

More important when using device DAX (#60), since formatting the entire device takes a long time. Truncating can be much faster since only the used portion of the device needs to be reset.

Use ASSERT_EQ over ASSERT_TRUE

We should be using ASSERT_EQ whenever it's relevant to see the values when a test fails. (That's not to say we shouldn't use ASSERT_TRUE at all, but be mindful that it doesn't provide much context on failure.)
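A small illustration of the difference (kv and expected are placeholder names), comparing what gtest prints when each form fails:

// On failure this only reports "Actual: false / Expected: true",
// with no hint of what Count() actually returned:
ASSERT_TRUE(kv->Count() == expected);

// On failure this reports both the expected and the actual value:
ASSERT_EQ(expected, kv->Count());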

pmkv_test should use ASSERT_TRUE, not assert

I'm guessing this was a copy/paste error by yours truly, but all gtests should be using ASSERT_TRUE for test assertions. A failed assert causes the entire test program to stop at that point, rather than just reporting that the single test failed and completing the rest of the test suite.
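The distinction, sketched with an illustrative test (kv and OK as used elsewhere in these issues):

TEST(KVTest, PutSmoke) {
    // A failed assert() calls abort() and kills the whole pmemkv_test binary:
    // assert(kv->Put("key1", "value1") == OK);

    // A failed ASSERT_TRUE marks only this test as failed and returns from it,
    // so the rest of the suite still runs:
    ASSERT_TRUE(kv->Put("key1", "value1") == OK);
}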

Expose recover method

Inner node rebalancing is not yet implemented, so is there a temporary workaround that could be made available? One idea is to expose the initial recovery method so that the inner nodes could be periodically rebuilt, but this is basically the same as closing and re-opening the database. Is there a cleaner way to handle this until rebalancing is automatic?

Inner node rebalancing

This should be done incrementally (or by a background thread) in place of whatever temporary mechanism is introduced by #9.

Rename MultiGet to GetList

The MultiGet method is patterned after RocksDB...but our API may grow to support different types of batched reads down the road, so Get, GetList, GetMap, etc. is probably a better convention.

Exists operation

Currently this is done by doing a get where the value is ignored, but this has the overhead of copying the value string. A dedicated operation that works only against the keys would avoid that. It could be implemented with a Bloom filter (which would be KeyMayExist) or by checking the inner/leaf nodes (which would be KeyExists).
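For reference, later pmemkv releases expose exactly this kind of key-only check as db::exists; a minimal sketch (kv is an open pmem::kv::db):

#include <libpmemkv.hpp>

bool key_exists(pmem::kv::db &kv, pmem::kv::string_view key) {
    // status::OK if the key is present, status::NOT_FOUND otherwise;
    // no value string is allocated or copied.
    return kv.exists(key) == pmem::kv::status::OK;
}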

Free key/value memory when marking slot inactive

Currently a delete sets the hash for a slot to zero but the actual key/value strings stored in persistent memory aren't freed. This was intentional for the early stage prototype (to benchmark the smallest transaction, which is flipping a single byte) -- and using an asynchronous thread to free persistent memory might have unintended consequences. Let's start with a simple version that doesn't leak (ie. freeing key/value as an inline part of the delete operation) and go from there.

Rename metadata method to analyze

Feedback on #34 in favor of renaming KVTree::Metadata to KVTree::Analyze:

  • 'Metadata' is not a verb, but all the other KVTree methods are actionable verbs
  • 'Analyze' better informs that this could be a long-running/blocking action

Add metadata method to public API

Currently KVTree has GetPath and GetSize methods to return metadata about the datastore, but this pattern will pollute the API if we add a bunch of these down the road. Instead let's replace those current methods with a single Metadata method that populates a struct with all those details.

struct KVTreeMetadata {                              // add
    string path;
    size_t size;
    size_t leaves;
    size_t nodes;
};

class KVTree {
  public:
    const string& GetPath();                         // remove
    const size_t GetSize();                          // remove
    void Metadata(KVTreeMetadata& metadata);         // add (populates the struct)
};

This will be very helpful for changes like #10 that affect internal structure and are easier to validate if more internal state can be exposed easily.
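A usage sketch of the proposed call (hypothetical, matching the declaration above):

KVTreeMetadata md;
kv->Metadata(md);    // populates all fields in one call instead of GetPath()/GetSize()
std::cout << md.path << ": " << md.size << " bytes, "
          << md.leaves << " leaves, " << md.nodes << " inner nodes\n";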

Power fail safety testing

Set up a long-running test that breaks the database in random places and verifies that all data is recovered properly afterward.

Versioning strategy

What if internal format ever changes? Do we need a version flag stashed somewhere?
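One possible approach, purely as a sketch with made-up names: keep a small header with a magic number and layout version at the start of the pool's root object and verify it whenever a store is opened:

#include <cstdint>
#include <stdexcept>

// Hypothetical layout header kept at the start of the persistent root object.
struct KVRootHeader {
    uint64_t magic;           // constant identifying a pmemkv pool
    uint32_t layout_version;  // bumped whenever the on-media format changes
};

constexpr uint64_t KV_MAGIC = 0x706D656D6B763031ULL;  // "pmemkv01"
constexpr uint32_t KV_LAYOUT_VERSION = 1;

void check_layout(const KVRootHeader &hdr) {
    if (hdr.magic != KV_MAGIC)
        throw std::runtime_error("not a pmemkv pool");
    if (hdr.layout_version != KV_LAYOUT_VERSION)
        throw std::runtime_error("unsupported pmemkv layout version");
}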

Add random insert/update/delete tests

We're testing the Get method for both sequential and random access, so let's do the same with the other operations. It's also worth checking some of our base assumptions (like inner node size) in the context of random operations. Also test missed get (worst-case) performance.

Convert build to produce/use shared library

The current build system compiles pmemkv sources directly into example, stress, and test programs, and links those programs with NVML shared libraries (libpmem and libpmemobj). Here's the current output:

224K pmemkv_example
229K pmemkv_stress
764K pmemkv_test (includes gtest)

Better to emit a shared library (libpmemkv.so) that bundles up our sources along with libpmem and libpmemobj. This will reduce the size of our test programs and mean that a developer using pmemkv only needs to include and manage libpmemkv as a dependency.

Both make and cmake builds should be converted to this style.

Here's the original build output for comparison:

radickin@radware-ubuntu:~/work/pmemkv$ make clean example
rm -rf /dev/shm/pmemkv /tmp/pmemkv pmemkv_example pmemkv_stress pmemkv_test
g++  src/pmemkv.cc src/pmemkv_example.cc -o pmemkv_example \
3rdparty/nvml/lib/libpmemobj.a 3rdparty/nvml/lib/libpmem.a -I3rdparty/nvml/src/include \
-O2 -std=c++11 -ldl -lpthread -lrt -std=c++11 -DOS_LINUX -fno-builtin-memcmp -march=native

radickin@radware-ubuntu:~/work/pmemkv$ make clean stress
rm -rf /dev/shm/pmemkv /tmp/pmemkv pmemkv_example pmemkv_stress pmemkv_test
g++  src/pmemkv.cc src/pmemkv_stress.cc -o pmemkv_stress \
3rdparty/nvml/lib/libpmemobj.a 3rdparty/nvml/lib/libpmem.a -I3rdparty/nvml/src/include \
-DNDEBUG -O2 -std=c++11 -ldl -lpthread -lrt -std=c++11 -DOS_LINUX -fno-builtin-memcmp -march=native

radickin@radware-ubuntu:~/work/pmemkv$ make clean test
rm -rf /dev/shm/pmemkv /tmp/pmemkv pmemkv_example pmemkv_stress pmemkv_test
g++  src/pmemkv.cc src/pmemkv_test.cc -o pmemkv_test \
3rdparty/nvml/lib/libpmemobj.a 3rdparty/nvml/lib/libpmem.a -I3rdparty/nvml/src/include \
3rdparty/gtest/src/gtest-all.cc -I3rdparty/gtest/include -I3rdparty/gtest \
-O2 -std=c++11 -ldl -lpthread -lrt -std=c++11 -DOS_LINUX -fno-builtin-memcmp -march=native

Miscellaneous cleanup

There are a few things held over from the original prototype that I'd hoped to clean up before this goes out to anybody externally.

Prefix compression

There are two levels possible -- at the leaf node, and at the inner nodes.

Process terminates if persistent allocation fails

If we run out of space, a special status code should be returned for the offending operation, rather than crashing the process.

radickin@radware-ubuntu:~/work/pmemkv$ make stress
g++  src/pmemkv.cc src/pmemkv_stress.cc -o pmemkv_stress \
3rdparty/nvml/lib/libpmemobj.a 3rdparty/nvml/lib/libpmem.a -I3rdparty/nvml/src/include \
-DNDEBUG -O2 -std=c++11 -ldl -lpthread -lrt -std=c++11 -DOS_LINUX -fno-builtin-memcmp -march=native
rm -rf /dev/shm/pmemkv
PMEM_IS_PMEM_FORCE=1 ./pmemkv_stress

Opening for writes
   in 18 ms
Inserting 1000000 values
terminate called after throwing an instance of 'nvml::transaction_alloc_error'
  what():  failed to allocate persistent memory array
Aborted (core dumped)
Makefile:24: recipe for target 'stress' failed
make: *** [stress] Error 134
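A rough sketch of the direction this could take: catch the allocation exception shown in the log above at the operation boundary and translate it into a status code. KVStatus/FAILED are illustrative names, and later libpmemobj-cpp releases spell the exception pmem::transaction_alloc_error:

KVStatus KVTree::Put(const string& key, const string& value) {
    try {
        // ... existing transactional insert of key/value ...
        return OK;
    } catch (const nvml::transaction_alloc_error&) {
        // The pool is out of space: report a status code to the caller
        // instead of letting the unhandled exception abort the process.
        return FAILED;
    }
}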

Make the library and tests less dependent on a specific revision of pmemobj

Relying on a specific version of pmemobj results in constructs like this in pmemkv_test:

const int LARGE_LIMIT = 6012299;

Since the cmake build already uses whatever version of pmemobj it finds via pkg-config, this can be a problem. Having the "wrong" version of pmemobj after doing a cmake build can result in errors such as:

$ ./pmemkv_test --gtest_filter=KVTest.LargeAscendingTest
Note: Google Test filter = KVTest.LargeAscendingTest
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from KVTest
[ RUN      ] KVTest.LargeAscendingTest
/home/tej/code/pmemkv/src/pmemkv_test.cc:747: Failure
Value of: kv->Put(istr, (istr + "!")) == OK
  Actual: false
Expected: true
out of memory
[  FAILED  ] KVTest.LargeAscendingTest (26001 ms)

Multi-language API proposal

What would the pmemkv API look like if it were designed for use by high-level languages? (Node, Java, Ruby, Python, etc) Let's do some prototyping to give a sense of what's possible, and end up with a concrete proposal.

Resolve overlap between cmake/make

The build system currently has duplicate logic for downloading 3rd party libraries and building the shared library and test programs. This logic appears in both CMakeLists.txt and the Makefile (specifically the thirdparty and sharedlib targets).

The installation section of the README (https://github.com/pmem/pmemkv#installation) should be updated accordingly to fit the new approach.

Options for caching of leaf attributes

Currently both keys and hashes are cached in the volatile leaf nodes, which gives the fastest performance but also leads to the highest DRAM usage. Options to avoid caching keys, or to avoid caching both keys and hashes (as per FPTree), would be useful to have in addition to the current default policy. Should this be done with a conditional define (to make it easy to seek out the best options to lock in) or with a configuration parameter that leaves more choice to the customer?

Download dependencies at build time

Currently pmemkv has two library dependencies -- gtest and nvml. We'd like to use stable versions of these rather than the latest build, and not have to include any code from those upstream projects in the pmemkv repo. Currently gtest code is included and the build relies on whatever version of NVML is installed on the system (which may not be a stable version).

Persistent pool size should be set via parameter

Size of the persistent pool was hardcoded in the initial prototype, so let's make this configurable.

  • How to specify? (parameter to constructor or other means?)
  • What is the valid range of sizes?
  • What if a different size is specified when opening an existing store?
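One way to answer the first question, sketched with a hypothetical signature (the default size and the note about existing pools are illustrative choices, not decisions recorded in this issue):

class KVTree {
  public:
    // Pool size becomes an explicit constructor parameter with a default,
    // instead of a hardcoded constant; when opening an existing store the
    // pool keeps its original size.
    explicit KVTree(const string& path,
                    size_t pool_size = 8ULL * 1024 * 1024 * 1024);   // 8 GiB
};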

Support binary-safe keys

Although pmemkv doesn't prevent binary data in key strings, we're using strcmp internally to compare keys, which gives incorrect results if a null character appears in the middle of a key.
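A minimal sketch of the kind of change implied here (the function name is illustrative): compare keys by explicit length with memcmp instead of strcmp, so an embedded null byte no longer truncates the comparison:

#include <cstring>
#include <string>

// Returns <0, 0 or >0 like strcmp, but is safe for keys containing '\0'.
int key_compare(const std::string& a, const std::string& b) {
    const size_t min_len = a.size() < b.size() ? a.size() : b.size();
    int r = std::memcmp(a.data(), b.data(), min_len);
    if (r != 0) return r;
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}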

Remove p<> wrappers for key/value strings

Since KVString manages memory internally (using pmemobj_tx_add_range_direct), we don't need the p<> wrappers for these fields in KVLeaf. This would eliminate all the get_ro and get_rw calls we're making.

The only tricky point is that we'll need to implement a KVString::swap method, something like this?

void KVString::swap(KVString* target) {
    if (target->str) {                                      // target is long string
        if (str) {                                          // local is long too
            target->str.swap(str);
        } else if (sso[0] == 0) {                           // local is empty
            str = target->str;
            target->str = nullptr;
            pmemobj_tx_add_range_direct(target->sso, 1);
            target->sso[0] = 0;
        } else {                                            // local is short
            str = target->str;
            target->str = nullptr;
            pmemobj_tx_add_range_direct(target->sso, SSO_SIZE);
            strcpy(target->sso, sso);                       // move short string to target
            pmemobj_tx_add_range_direct(sso, 1);
            sso[0] = 0;
        }
    } else {                                                // target is short string
        if (str) {                                          // local is long
            target->str = str;
            str = nullptr;
            pmemobj_tx_add_range_direct(sso, SSO_SIZE);
            strcpy(sso, target->sso);                       // move short string to local
            pmemobj_tx_add_range_direct(target->sso, 1);
            target->sso[0] = 0;
        } else if (sso[0] == 0) {                           // local is empty
            pmemobj_tx_add_range_direct(sso, SSO_SIZE);
            strcpy(sso, target->sso);                       // move short string to local
            pmemobj_tx_add_range_direct(target->sso, 1);
            target->sso[0] = 0;
        } else {                                            // local is short too
            char temp[SSO_SIZE];
            strcpy(temp, sso);                              // save local short string
            pmemobj_tx_add_range_direct(sso, SSO_SIZE);
            strcpy(sso, target->sso);
            pmemobj_tx_add_range_direct(target->sso, SSO_SIZE);
            strcpy(target->sso, temp);
        }
    }
}

Add decent README

See screedb README for inspiration

  • Pre-release software statement
  • Downloading and installing
  • Running tests
  • Related work (FPTree, nvml containers, pmse)
  • Architecture diagram?
  • Contributing (coding standards)

Not freeing nodes during shutdown

This leaks memory, but apparently not enough to crash the unit tests. It should be possible for an application to open/close multiple times without leaking any memory due to orphaned nodes. (Original benchmarks only opened the database once, so this didn't crop up early)

Zero-copy key rename operation

Since pmemkv uses a zero-copy strategy for splitting persistent leaves, can we do the same for key rename? Many other kv-stores implement rename as a remove followed by a put, but this is painful if we're copying large values to do so.

Known issues in CMake build

CMake is partially supported, with a few issues that would have to be resolved to use it officially:

  • The CMake build does not download third-party libraries (it is missing the make thirdparty logic)
  • Binaries and shared libraries from CMake build are larger than those produced by make

Document operational procedures

  • Using pmempool utilities
  • Taking a snapshot
  • Backing up a database / restoring from backup
  • Moving a database to a new machine
  • Migrating between versions

Thread-safe implementation

  • Lock eliding using TSX? (assuming that key collisions will be rare)
  • Background maintenance thread?
  • Lots of new tests, obviously

KVEmptyTest.SizeofTest failure

I hit one unit test failure in the latest pmemkv (actually, I noticed this failure a long time ago).
The size of KVInnerNode on my machine is indeed 112, since the struct is not packed; I'm not sure why the test expects 232.

sizeof(KVInnerNode) = 8 + 8 + 8 + 5*8 + 6*8 = 112

Error message:

[ RUN      ] KVEmptyTest.SizeofTest
pmemkv_test.cc:140: Failure
Value of: 232
Expected: sizeof(KVInnerNode)
Which is: 112
[  FAILED  ] KVEmptyTest.SizeofTest (0 ms)

Setup:

  - i7-4770 CPU
  - CentOS 7 with kernel 3.10.0-514.2.2.el7.x86_64
  - gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) 

Are opened/closed counts necessary?

These are being maintained but not used except to log their values. If there isn't any special logic that actually triggers on a count mismatch, then maybe they should be removed?
