
recipe's People

Contributors

g4197, pyrito, sekwonlee, vijay03


recipe's Issues

Search non-existing keys in P-Masstree

Hi, when I invoke a get for non-existing keys in P-Masstree, "should not enter here....." is printed.

code:

  masstree::masstree *tree = new masstree::masstree();
  auto t = tree->getThreadInfo();
  char *str = "helloworld";
  tree->get(str, t);

printed information:

should not enter here
fkey = rowolleh, key = 7522537965574647666, searched key = 0, key index = -1

Read committed

The current implementations only ensure the lowest level of isolation (Read Uncommitted) for some read operations such as scan, negative lookup, and verification of value existence, since these are based on normal CASs or temporal stores coupled with cache-line flush instructions. However, this is not a fundamental limitation of RECIPE conversions. You can extend them, following the RECIPE conversions, to guarantee a higher level of isolation (Read Committed). For lock-based implementations, including P-CLHT, P-HOT, P-ART, and P-Masstree, replace each final commit store (such as a pointer swap) coupled with cache-line flushes with a non-temporal store coupled with a memory fence. For lock-free implementations such as P-BwTree, you can either add additional flushes after loads of final commit stores, or replace volatile CASs coupled with cache-line flush instructions with alternative software-based atomic-persistent primitives such as Link-and-Persist (paper, code) or PSwCAS (paper, code).

Port RECIPE data structures to libpmem

This issue involves porting the RECIPE data structures to work on libpmem. For example, converting P-CLHT to a form that uses the libpmem pointers and allocation routines.

Crash consistency issue after acquiring bucket locks

Bug

Exposed by crashing after acquiring a lock from clht_put.

static inline int
lock_acq_chk_resize(clht_lock_t* lock, clht_hashtable_t* h)
{
    char once = 1;
    clht_lock_t l;
    while ((l = CAS_U8(lock, LOCK_FREE, LOCK_UPDATE)) == LOCK_UPDATE)
    {
        /* ... */

  • Crashing after line 311 here causes the lock never to be released, so the restarted example waits indefinitely

Steps to reproduce

gdb --args ./example 20 1
> break clht_lb_res.h:311
> run
> next
> p *lock
# should print "$1 = 1 '\001'"
> quit
# Then, re-run
./example 20 1

The second execution should run indefinitely, waiting on acquiring the lock.

Comments

I see your comments here about locking assumptions:

// Although our current implementation does not provide post-crash mechanism,
// the locks should be released after a crash (Please refer to the function clht_lock_initialization())
clht_lock_t lock;

Does this mean this is a known issue, or does clht_lock_initialization just need to be added to clht_create? I ask because it seems that clht_lock_initialization is called in other places, just not in the recovery procedure.

Segmentation fault on YCSB with CLHT, after using libvmmalloc

I first tried running the YCSB CLHT benchmark on DRAM, which worked. Then I ran it with libvmmalloc as LD_PRELOAD and observed an occasional, nondeterministic segmentation fault with both 16 and 32 threads. I ran the YCSB workloads with two input configurations, recordcount=operationcount=64000000 and recordcount=operationcount=1000000, both of which hit the segfault:

RECIPE# LD_PRELOAD="../pmdk/src/nondebug/libvmmalloc.so.1" ./build/ycsb clht a randint uniform 32
Loaded 1000001 keys
Segmentation fault (core dumped)

Machine config:
CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
DRAM: 8*16G DDR4
DRAM emulated persistent memory: 64G, mounted with ext4-dax file system
OS: Ubuntu 18.04.3 LTS, with linux-5.1.0+ kernel

scripts/set_vmmalloc.sh:

export VMMALLOC_POOL_SIZE=$((60*1024*1024*1024))
export VMMALLOC_POOL_DIR="/mnt/pmem/test"

GDB backtrace on the core file

warning: Error reading shared library list entry at 0x7f7154c39b00
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./build/ycsb clht a randint uniform 32'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055a0f68ab202 in ssmem_mem_reclaim ()
[Current thread is 1 (Thread 0x7f71545ff700 (LWP 20200))]
(gdb) bt
#0  0x000055a0f68ab202 in ssmem_mem_reclaim ()
#1  0x000055a0f68aa72e in clht_gc_release ()
#2  0x000055a0f68a9ba0 in ht_resize_pes ()
#3  0x000055a0f68a94dc in ht_status ()
#4  0x000055a0f68a98de in clht_put ()
#5  0x000055a0f6840dc8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<ycsb_load_run_randint(int, int, int, int, int, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&)::{lambda()#9}> > >::_M_run() ()
#6  0x00007f7d5662166f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f7d568f46db in start_thread (arg=0x7f71545ff700) at pthread_create.c:463
#8  0x00007f7d55cde88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

some questions about the usage of Optane DC

Excuse me, I have read your paper, and I have some questions about your tests:

  1. Did you configure App Direct mode with interleaved modules?
  2. How did you measure such high throughput (77.64 Mops/s for CCEH) on Optane DC?
  3. Did you test the scalability of your P-CLHT? The scalability of other common data structures I tested is not good, and I wonder whether it is limited by the Optane DC.

I hope you can take the time to answer during your busy schedule. Thank you!

P-Masstree not Working Correctly with String Keys

I've read your code a bit and tried modifying example.cpp to use string keys.
I use the function void masstree::put(char *key, uint64_t value) to insert string keys into P-Masstree.
However, it does not work correctly if a key overlaps a previously inserted key.
For example, if we first insert key1 abcdefghijklmnopqrstuvwxyz and then key2 abcdefghijklmnopqrstuvwxy, a subsequent lookup of key1 via void *masstree::get(char *key) returns an empty value.

Crash-consistency bug in P-CLHT `clht_gc_collect_cond`

The new value of hashtable->ht_oldest is never persisted after the free, meaning that a post-crash execution can read the previous value and perform a double free.

RECIPE/P-CLHT/src/clht_gc.c

Lines 183 to 196 in 05a49d7

    clht_hashtable_t* cur = hashtable->ht_oldest;
    while (cur != NULL && cur->version < version_min)
    {
        gced_num++;
        clht_hashtable_t* nxt = cur->table_new;
        /* printf("[GCOLLE-%02d] gc_free version: %6zu | current version: %6zu\n", GET_ID(collect_not_referenced_only), */
        /*        cur->version, hashtable->ht->version); */
        nxt->table_prev = NULL;
        clht_gc_free(cur);
        cur = nxt;
    }
    hashtable->version_min = cur->version;
    hashtable->ht_oldest = cur;

Segmentation fault in ycsb

It looks like there is a segmentation fault caused by `./build/ycsb art a randint uniform 4`.
I followed the build and configuration procedure described in README.md, up to the 'Persistent Memory environment' step.

Machine config:
CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
DRAM: 8*16G DDR4
DRAM emulated persistent memory: 50G, mounted with ext4-dax file system
OS: Ubuntu 18.04.3 LTS, with linux-5.1.0+ kernel

root@RECIPE# cat ./scripts/set_vmmalloc.sh
export VMMALLOC_POOL_SIZE=$((16*1024*1024*1024))
export VMMALLOC_POOL_DIR="/mnt/pmem"

root@RECIPE# source ./scripts/set_vmmalloc.sh
root@RECIPE# LD_PRELOAD="../pmdk/src/nondebug/libvmmalloc.so.1" ./build/ycsb art a randint uniform 4
art, workloada, randint, uniform, threads 4
Loaded 0 keys
Segmentation fault (core dumped)

root@RECIPE# gdb ./build/ycsb core
warning: Error reading shared library list entry at 0x7f2ecdc39b00
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./build/ycsb art a randint uniform 4'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005629a6dc8967 in ART_ROWEX::N4::change(unsigned char, ART_ROWEX::N*) ()
[Current thread is 1 (Thread 0x7f32d0800800 (LWP 108852))]
(gdb) bt
#0  0x00005629a6dc8967 in ART_ROWEX::N4::change(unsigned char, ART_ROWEX::N*) ()
#1  0x00005629a6dcd717 in ART_ROWEX::Tree::insert(Key const*, ART::ThreadInfo&) ()
#2  0x00005629a6d73cc0 in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned long>, ycsb_load_run_randint(int, int, int, int, int, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&)::{lambda(tbb::blocked_range<unsigned long> const&)#1}, tbb::auto_partitioner const>::execute() ()
#3  0x00007f32cff9bb46 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#4  0x00007f32cff98790 in ?? () from /usr/lib/x86_64-linux-gnu/libtbb.so.2
#5  0x00005629a6d82db6 in ycsb_load_run_randint(int, int, int, int, int, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) ()
#6  0x00005629a6d700c6 in main ()

Scalability Issues in Optane

Hello, we are trying to reproduce your results. Even though we have successfully plugged your work into the Intel Optane of our system by following exactly the procedure you indicate, we do not observe significant scalability as we increase the number of threads, in contrast to the DRAM execution, where the scalability is clear. Is this a known issue? Do these indexes scale on both DRAM and Intel Optane? Is there any known reason why scalability fails on Optane?
Thank you :)

Crash consistency bug in clht_gc_free

Bug

Exposed by crashing after freeing the hash table in clht_gc_free.

RECIPE/P-CLHT/src/clht_gc.c

Lines 239 to 242 in fc508dd

PMEMoid table_oid = {pool_uuid, hashtable->table_off};
pmemobj_free(&table_oid);
PMEMoid ht_oid = pmemobj_oid((void *)hashtable);
pmemobj_free(&ht_oid);

  • pmemobj_free sets the PMEMoid object to NULL when freeing objects.
  • With the current design of storing the offset in hashtable->table_off, the offset is never set to null, and so a crash can cause a double-free to occur.

Steps to reproduce

gdb --args ./example 20 20
> break clht_gc.c:241
> run
> quit
# Then, re-run
./example 20 0

Will output something like:

Simple Example of P-CLHT
operation,n,ops/s
Throughput: load, inf ,ops/us
Throughput: run, inf ,ops/us
<libpmemobj>: <1> [palloc.c:295 palloc_heap_action_exec] assertion failure: 0

FAST_FAIR Range Bug

In linear_search_range of /third-party/FAST_FAIR/btree.h, count() should be current->count().

some questions about CLFLUSH_OPT/CLWB

Excuse me, I have read your paper and code; very interesting. But I have a question about CLFLUSH_OPT/CLWB.
Below is your implementation

inline void clflush(char *data, int len, bool front, bool back)
{
    volatile char *ptr = (char *)((unsigned long)data & ~(cache_line_size - 1));
    if (front)
        mfence();
    for (; ptr < data + len; ptr += cache_line_size) {
        unsigned long etsc = read_tsc() +
            (unsigned long)(write_latency_in_ns * cpu_freq_mhz / 1000);
#ifdef CLFLUSH
        asm volatile("clflush %0" : "+m" (*(volatile char *)ptr));
#elif CLFLUSH_OPT
        asm volatile(".byte 0x66; clflush %0" : "+m" (*(volatile char *)ptr));
#elif CLWB
        asm volatile(".byte 0x66; xsaveopt %0" : "+m" (*(volatile char *)ptr));
#endif
        while (read_tsc() < etsc)
            cpu_pause();
    }
    if (back)
        mfence();
}

In fact, CLFLUSH_OPT/CLWB can be reordered. In some indexes, should mfence() be added between cache lines instead of only at the end? For example, the FAST & FAIR paper explicitly mentions the need for mfence() at cache-line boundaries. The original implementation of FAST & FAIR used only CLFLUSH, which does not cause problems, because CLFLUSH is not reordered.
So, will using CLFLUSH_OPT/CLWB cause a bug?
