
fusee's Introduction

FUSEE: A Fully Memory-Disaggregated Key-Value Store

This is the implementation repository of our FAST'23 paper: FUSEE: A Fully Memory-Disaggregated Key-Value Store.

Description

We propose FUSEE, a FUlly memory-diSaggrEgated KV StorE that brings disaggregation to metadata management. FUSEE replicates metadata, i.e., the index and memory management information, on memory nodes, manages them directly on the client side, and handles complex failures under the DM architecture. To scalably replicate the index on clients, FUSEE proposes a client-centric replication protocol that allows clients to concurrently access and modify the replicated index. To efficiently manage disaggregated memory, FUSEE adopts a two-level memory management scheme that splits the memory management duty among clients and memory nodes. Finally, to handle metadata corruption under client failures, FUSEE leverages an embedded operation log scheme to repair metadata with low log maintenance overhead.

Environment

  • For hardware, each machine should be equipped with one 8-core Intel processor (e.g., Intel Xeon E5-2450), 16 GB of DRAM, and one RDMA NIC (e.g., Mellanox ConnectX-3). Each RNIC should be connected to an InfiniBand or Ethernet switch (e.g., Mellanox SX6036G). All machines are separated into memory nodes and compute nodes. At most 5 memory nodes and 17 compute nodes are used in the experiments in our paper. If you do not have such a testbed, consider using CloudLab.

  • For software, Ubuntu 18.04 is recommended for each machine. Our experiments require 7168 HugePages of 2 MB each to be allocated on every memory node and 2048 on every compute node. You can set this up with echo 7168 > /proc/sys/vm/nr_hugepages on memory nodes and echo 2048 > /proc/sys/vm/nr_hugepages on compute nodes.
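
After changing the setting, a quick sanity check (a minimal sketch; run it on each node and expect 7168 on memory nodes and 2048 on compute nodes):

grep HugePages_Total /proc/meminfo
cat /proc/sys/vm/nr_hugepages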

Configurations

Configuration files for servers and clients must be provided to the program. Two example configuration files are shown below.

1. Server configuration

For each memory node, you should provide a configuration file server_config.json where you can flexibly configure the server:

{
    "role": "SERVER",
    "conn_type": "IB",
    "server_id": 0,
    "udp_port": 2333,
    "memory_num": 3,
    "memory_ips": [
        "10.10.10.1",
        "10.10.10.2",
        "10.10.10.3"
    ],
    "ib_dev_id": 0,
    "ib_port_id": 1,
    "ib_gid_idx": 0,

    "server_base_addr":  "0x10000000",
    "server_data_len":   15032385536,
    "block_size":        67108864,
    "subblock_size":     256,
    "client_local_size": 1073741824,

    "num_replication": 3,

    "main_core_id": 0,
    "poll_core_id": 1,
    "bg_core_id": 2,
    "gc_core_id": 3
}
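
A sanity check on these numbers (our observation, not stated in the original README): server_data_len (15032385536 bytes = 14 GiB) exactly matches the 7168 × 2 MB HugePages pool allocated on each memory node, and block_size (67108864 bytes = 64 MB) divides it into 224 blocks:

python3 -c "print(7168 * 2 * 2**20)"          # 15032385536, equals server_data_len
python3 -c "print(15032385536 // 67108864)"   # 224 blocks of 64 MB each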

For brevity, we refer to each memory node as "server i" (i = 0, 1, ...).

2. Client configuration

For each compute node, you should provide a configuration file client_config.json where you can flexibly configure the client:

{
    "role": "CLIENT",
    "conn_type": "IB",
    "server_id": 2,
    "udp_port": 2333,
    "memory_num": 2,
    "memory_ips": [
        "128.110.96.102",
        "128.110.96.81"
    ],
    "ib_dev_id": 0,
    "ib_port_id": 1,
    "ib_gid_idx": 0,

    "server_base_addr":  "0x10000000",
    "server_data_len":   15032385536,
    "block_size":        67108864,
    "subblock_size":     1024,
    "client_local_size": 1073741824,

    "num_replication": 2,
    "num_idx_rep": 1,
    "num_coroutines": 10,
    "miss_rate_threash": 0.1,
    "workload_run_time": 10,
    "micro_workload_num": 10000,

    "main_core_id": 0,
    "poll_core_id": 1,
    "bg_core_id": 2,
    "gc_core_id": 3
}

For brevity, we refer to each compute node as "client i" (i = 0, 1, 2, ...).
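
Both files are plain JSON, so a quick syntax check before launching (assuming jq is installed) catches stray commas and brackets:

jq . server_config.json
jq . client_config.json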

Note that the server_id parameter of client i should be set to 2 + i*8. For example, the server_id values of the first three clients are 2, 10, and 18, respectively.
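
One plausible reading of this rule (our inference, not stated in the original README): IDs 0 and 1 are taken by the memory nodes, and each compute node runs up to 8 client threads, so client-node IDs advance in strides of 8. A one-liner to compute them:

for i in 0 1 2; do echo "client $i -> server_id $((2 + i * 8))"; done
# client 0 -> server_id 2
# client 1 -> server_id 10
# client 2 -> server_id 18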

Experiments

For each node, execute the following commands to compile the entire program:

mkdir build && cd build
cmake ..
make -j
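
The build assumes a working RDMA userspace stack. On Ubuntu 18.04, something like the following installs the usual prerequisites (the package list is our assumption, not from the original README):

sudo apt-get install -y cmake g++ numactl libibverbs-dev librdmacm-dev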

We test FUSEE with a micro-benchmark and the YCSB benchmarks. For each experiment, you should put server_config.json in the ./build directory, and then use the following command on the memory nodes to set up the servers:

numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server [SERVER_NUM]

[SERVER_NUM] should be the serial number of this memory node, counting from 0.
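
For example, with the three memory nodes from the server configuration above, you would run (each command on its own machine):

numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server 0   # on server 0
numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server 1   # on server 1
numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server 2   # on server 2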

1. Micro-benchmark

  • Latency

    To evaluate the latency of each operation, we use a single client to execute each operation (INSERT, DELETE, UPDATE, and SEARCH) 10,000 times.

    Enter ./build/micro-test and run the following command on client 0:

    numactl -N 0 -m 0 ./latency_test_client [PATH_TO_CLIENT_CONFIG]

    Test results will be saved in ./build/micro-test/results.

  • Throughput

    To evaluate the throughput of each operation, each client first iteratively INSERTs different keys for 0.5 seconds. UPDATE and SEARCH operations are then executed on these keys for 10 seconds. Finally, each client executes DELETE for 0.5 seconds.

    Enter ./build/micro-test and execute the following command on all client nodes at the same time:

    numactl -N 0 -m 0 ./micro_test_multi_client [PATH_TO_CLIENT_CONFIG] 8

    The trailing 8 indicates that there are 8 client threads on each client node. You will need to press the space key on every client node at the same time to start each operation test synchronously.

    Test results will be displayed on each client terminal.

2. YCSB benchmarks

  • Workload preparation

    First, download all the test workloads by running sh download_workload.sh in the ./setup directory, and unpack the workloads you want into ./build/ycsb-test/workloads.

    Here is the description of the YCSB workloads:

    Workload   SEARCH   UPDATE   INSERT
    A          0.5      0.5      0
    B          0.95     0.05     0
    C          1        0        0
    D          0.95     0        0.05
    upd[X]     1-[X]%   [X]%     0

    For instance, upd50 consists of 50% SEARCH and 50% UPDATE.

    Then, execute the following command in ./build/ycsb-test to split the workloads into N parts (N is the total number of client threads):

    python split-workload.py [N]
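
    For example, with 2 compute nodes running 8 client threads each, N = 16:

    python split-workload.py 16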

    Then we can start testing FUSEE with the YCSB benchmarks.

  • Throughput

    To show the scalability of FUSEE, we can test its throughput with different numbers of client nodes. In addition, we can evaluate the read-write performance of FUSEE by measuring its throughput under workloads with different search-update ratios X. Here is the command to test the throughput of FUSEE:

    numactl -N 0 -m 0 ./ycsb_test_multi_client [PATH_TO_CLIENT_CONFIG] [WORKLOAD-NAME] 8

    Execute the command on all client nodes at the same time. [WORKLOAD-NAME] can be chosen from workloada ~ workloadd or workloadupd0 ~ workloadupd100 (indicating different search-update ratios). The trailing 8 indicates that there are 8 client threads on each client node. As before, press the space key on every client node at the same time to start each operation test synchronously.
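
    For example, to run a 50% SEARCH / 50% UPDATE workload with 8 threads per node (a usage sketch, assuming client_config.json is in ./build and the upd[X] naming from the table above):

    numactl -N 0 -m 0 ./ycsb_test_multi_client ../client_config.json workloadupd50 8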

    Test results will be displayed on each client terminal.

fusee's People

Contributors

bernardshen

fusee's Issues

Does the FUSEE code contain a SPLIT operation?

As far as I know, RACE implements a SPLIT operation to support extendible hashing; however, I can only find the SEARCH, UPDATE, INSERT, and DELETE operations in the FUSEE code, not SPLIT.
I have read the code for a long time and still have not found it, so I hope to get your answer. Thank you very much!

Workload specs in `ycsb-small/` are not provided

Hello authors, I am interested in FUSEE and am trying to reproduce the YCSB benchmark on our servers.

However, I have a question regarding split-workload.py. In particular, line 6 of the script references a ycsb-small/ directory containing small workload specifications, but I was unable to locate this directory in the repository.

Would it be possible for you to provide me with this directory or help me locate it?

Thank you in advance for your time and assistance.

wc.status==5 error, work request flushed error

Hi, I recently tried to run FUSEE on two nodes, with node A as the CN and node B as the MN. After configuring the IPs and other settings in config.json, I compiled and ran the code. On node B I ran numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server 0 to start the MN service, and on node A I ran numactl -N 0 -m 0 ./micro_test_multi_client ../client_config.json 8 to test throughput. However, the code currently fails at line 132 of client.cc, i.e., while (!init_is_finished());. When init_is_finished() executes ret = nm_->nm_rdma_read_from_sid(local_buf_, local_buf_mr_->lkey, sizeof(uint64_t), remote_global_meta_addr_, server_mr_info_map_[0]->rkey, 0);, the status code returned in wc is 5, i.e., Work Request Flushed Error. Could you help me solve this problem, and explain why this error occurs? Thanks.

How to know the next pointer when using slab allocation?

The local allocation should fit the size class, and different size classes have different free lists. When allocating block 'a', we do not know the size of the next allocation, and therefore do not know which free-list header the next pointer should point to.

How to run micro-test?

Hello, I am interested in FUSEE, which is admirable work! Recently I tried to run the source code but met some errors and hope to get your help. My experimental environment is 3 servers with Mellanox CX-5 RNICs using the RoCE v2 protocol; two serve as memory nodes and one serves as a compute node.
Server 1's server_config.json is placed in the /FUSEE/build/ directory and is as follows:
{
    "role": "SERVER",
    "conn_type": "ENS7F1",
    "server_id": 0,
    "udp_port": 2333,
    "memory_num": 2,
    "memory_ips": [
        "192.168.0.1"
    ],
    "ib_dev_id": 1,
    "ib_port_id": 1,
    "ib_gid_idx": 0,

    "server_base_addr":  "0x10000000",
    "server_data_len":   15032385536,
    "block_size":        67108864,
    "subblock_size":     256,
    "client_local_size": 1073741824,

    "num_replication": 2,

    "main_core_id": 0,
    "poll_core_id": 1,
    "bg_core_id": 2,
    "gc_core_id": 3
}

Server 2's server_config.json is placed in the /FUSEE/build/ directory and is as follows:
{
    "role": "SERVER",
    "conn_type": "ENS7F1",
    "server_id": 1,
    "udp_port": 2333,
    "memory_num": 2,
    "memory_ips": [
        "192.168.0.2"
    ],
    "ib_dev_id": 1,
    "ib_port_id": 1,
    "ib_gid_idx": 0,

    "server_base_addr":  "0x10000000",
    "server_data_len":   15032385536,
    "block_size":        67108864,
    "subblock_size":     256,
    "client_local_size": 1073741824,

    "num_replication": 2,

    "main_core_id": 0,
    "poll_core_id": 1,
    "bg_core_id": 2,
    "gc_core_id": 3
}

Server 3 serves as the compute node; its client_config.json is placed in the /FUSEE/build/ directory and is as follows:
{
    "role": "CLIENT",
    "conn_type": "ENS7F1",
    "server_id": 0,
    "udp_port": 2333,
    "memory_num": 2,
    "memory_ips": [
        "192.168.0.1",
        "192.168.0.2"
    ],
    "ib_dev_id": 1,
    "ib_port_id": 1,
    "ib_gid_idx": 0,

    "server_base_addr":  "0x10000000",
    "server_data_len":   15032385536,
    "block_size":        67108864,
    "subblock_size":     256,
    "client_local_size": 1073741824,

    "num_replication": 2,
    "num_idx_rep": 1,
    "num_coroutines": 10,
    "miss_rate_threash": 0.1,
    "workload_run_time": 10,
    "micro_workload_num": 10000,

    "main_core_id": 0,
    "poll_core_id": 1,
    "bg_core_id": 2,
    "gc_core_id": 3
}

Firstly, I ran the memory node on server 1 as described in README.md:

cd /FUSEE-master/build
numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server 0
===== Starting Server 0 ====
kv_area_addr: 18000000, block_size: 4000000
my_sid_: 0, num_memory_:2
num_rep_blocks: 26, num_blovks: 26, limit: 90000000
press to exit
==== Ending Server 0 ====

Secondly, I ran the memory node on server 2 as described in README.md:

cd /FUSEE-master/build
numactl -N 0 -m 0 ./ycsb-test/ycsb_test_server 1
===== Starting Server 1 ====
kv_area_addr: 18000000, block_size: 4000000
my_sid_: 1, num_memory_:2
num_rep_blocks: 26, num_blovks: 26, limit: 90000000
press to exit
==== Ending Server 1 ====

Thirdly, I ran the compute node on server 3 as described in README.md:

cd /FUSEE-master/build/micro-test
numactl -N 0 -m 0 ./latency_test_client ../client_config.json
main process running on core: 0
server_kv_area_addr: 28000000 26
num_rep_blocks: 26
Cannot have allocated 28000001
The error is "Cannot have allocated 28000001".
How can I solve it? Could you also tell me what each parameter in server_config.json and client_config.json is used for? Thank you very much!
