
vasto's Introduction

Vasto


A distributed high-performance key-value store. On disk. Eventually consistent. Highly available. Able to grow or shrink without service interruption.

Vasto scales embedded RocksDB into a distributed key-value store, adding sharding, replication, and supporting operations to (a shell sketch follows this list):

  1. create a new keyspace
  2. delete an existing keyspace
  3. grow a keyspace
  4. shrink a keyspace
  5. replace a node in a keyspace
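
These operations can be driven from the vasto shell. A minimal sketch, reusing only the shell commands quoted later on this page; the exact syntax for growing, shrinking, or replacing a node is not shown there, so this illustrates just the create/describe/delete flow:

    // start one vasto shell
    vasto shell

    // inside the shell: create keyspace ks1 in data center dc1,
    // with 3 shards and 1 copy of the data
    create cluster ks1 dc1 3 1

    // inspect the keyspace
    use keyspace ks1
    desc ks1 dc1

    // remove the keyspace when it is no longer needed
    delete cluster ks1 dc1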

Why

A key-value store is often re-invented. Why is there another one?

Vasto enables developers to set up a distributed key-value store as easily as creating a map object.

Operations such as creating/deleting the store, partitioning, replication, and seamlessly adding/removing servers are managed with a few commands. Client connection configurations are managed automatically.

In a sense, Vasto is an in-house cloud providing distributed key-value stores as a service, minus the need to trade off performance against cloud service costs, plus consistent low latency.

Architecture

There is one Vasto master and N Vasto stores, plus Vasto clients or Vasto proxies/gateways.

  1. The Vasto stores are essentially thin wrappers around RocksDB.
  2. The Vasto master manages all the Vasto stores and Vasto clients.
  3. Vasto clients rely on the master to connect to the right Vasto stores.
  4. Vasto gateways use Vasto client libraries to support different APIs.

The master is the brain of the system. Vasto does not use gossip protocols or other consensus algorithms. Instead, it uses a single master for simple setup, fast failure detection, fast topology changes, and precise coordination. The master holds only soft state and is needed only when the topology changes, so even if it ever crashes, a simple restart will recover everything.

The Vasto stores simply pass get/put/delete/scan requests to RocksDB. One Vasto store can host multiple db instances.

Go applications can use the client library directly.

Applications in other languages can talk to a Vasto gateway, which uses the client library and reverse-proxies requests to the Vasto stores. The number of Vasto gateways is unlimited. They can be installed on application machines to save one network hop, or run on dedicated machines to reduce the number of connections to the Vasto stores when both the number of stores and the number of clients are very high.

Life cycle

One Vasto cluster has one master and multiple Vasto stores. When a store joins the cluster, it is empty.

When the master receives a request to create a keyspace with x shards and a replication factor of y, the master will

  1. find x stores that meet the requirements and assign them to the keyspace
  2. ask the stores to create the shards, including replicas
  3. inform the clients of the store locations

When the master receives a request to resize the keyspace from m shards to n shards, the master will (see the client sketch after this list)

  1. if the size increases, find n-m additional stores that meet the requirements and assign them to the keyspace
  2. ask the stores to create the shards, including replicas
  3. prepare the data on the new stores
  4. direct client traffic to the new stores
  5. remove the retiring shards
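
From the client's point of view, both lifecycle operations are single calls on the Go client shown in the Example section below. A minimal sketch, assuming a master at localhost:8278 and the call signatures used in that example:

    // connect to the master and create keyspace "ks1" with
    // 1 shard and 1 copy of the data ("admin" is an arbitrary client name)
    vc := vs.NewVastoClient(context.Background(), "admin", "localhost:8278")
    vc.CreateCluster("ks1", 1, 1)

    // later, grow the keyspace to 3 shards; the master finds the extra
    // stores, prepares the data, and redirects client traffic
    vc.ResizeCluster("ks1", 3)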

Hashing algorithm

Vasto uses Jump Consistent Hash to allocate data (a reference sketch in Go follows the list below). This algorithm

  1. requires no storage. The master only needs soft state to manage all store servers. It is OK to restart the master.
  2. evenly distributes the data into buckets.
  3. still divides the workload evenly when the number of buckets changes.
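
For reference, this is the published Jump Consistent Hash algorithm (Lamping and Veach), sketched in Go; Vasto's internal implementation may differ in details such as how a user key is first hashed to 64 bits:

    // jumpHash maps a 64-bit key to a bucket in [0, numBuckets).
    // Reference sketch only; the key is assumed to already be a
    // 64-bit hash of the user key.
    func jumpHash(key uint64, numBuckets int32) int32 {
        var b, j int64 = -1, 0
        for j < int64(numBuckets) {
            b = j
            key = key*2862933555777941757 + 1
            j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
        }
        return int32(b)
    }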

With this jump hash, cluster resizing is rather simple, flexible, and efficient:

  1. The cluster can resize up or down freely.
  2. Resizing is well coordinated.
  3. Data can be moved via the most efficient SSTable writes.
  4. Clients are aware of the cluster change and redirect traffic only when the new servers are fully ready.

Eventual Consistency and Active-Active Replication

All Vasto stores can serve both reads and writes. Changes are propagated to the other replicas within a few milliseconds. Only the primary replica participates in normal operations; the other replicas take over when the primary replica is down, or serve traffic in a different data center.

Vasto assumes the data already carries an event time, which should be the time when the event actually happened, not the time when the data is fed into the Vasto system. If the system fails over to a replica partition and there are multiple changes to one key, the change with the latest event time wins.
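
A minimal sketch of this last-write-wins rule, using a hypothetical Versioned struct (not Vasto's internal representation) and the standard time package:

    // Versioned pairs a value with its application-supplied event time.
    // Hypothetical type for illustration only; requires the "time" package.
    type Versioned struct {
        Value     []byte
        EventTime time.Time // when the event really happened, not when it was ingested
    }

    // resolve picks the winner when replicas diverge on the same key:
    // the version with the latest event time wins.
    func resolve(a, b Versioned) Versioned {
        if b.EventTime.After(a.EventTime) {
            return b
        }
        return a
    }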

Client APIs

See https://godoc.org/github.com/chrislusf/vasto/goclient/vs

Example

    // create a vasto client talking to master at localhost:8278
    vc := vs.NewVastoClient(context.Background(), "client_name", "localhost:8278")
    
    // create a cluster for keyspace ks1, with one server, and one copy of data.
    vc.CreateCluster("ks1", 1, 1)
    
    // get a cluster client for ks1
    cc := vc.NewClusterClient("ks1")

    // operate with the cluster client
    var key, value []byte
    cc.Put(key, value)    
    cc.Get(vs.Key(key))
    ...

    // change cluster size to 3 servers
    vc.ResizeCluster("ks1", 3)

    // operate with the existing cluster client
    cc.Put(key, value)    
    cc.Get(vs.Key(key))
    ...

Currently only the basic Go client library is provided. The gateway is not ready yet.

vasto's People

Contributors

chrislusf

vasto's Issues

go build error

My environment is Ubuntu, Go 1.10.4.
Vasto is at the current master branch.
go build shows the message below:

google.golang.org/genproto/googleapis/rpc/status

../../../google.golang.org/genproto/googleapis/rpc/status/status.pb.go:111:28: undefined: proto.InternalMessageInfo

Crashed with nil pointer when putting a key to a cluster of 2 replicas and one replica is down. (not reproducible)

I0129 14:47:31.514926   23375 master_event_processor.go:56] [master] + client  from 127.0.0.1:62432 keyspace(clst)
E0129 14:50:26.104690   23375 retry.go:22] shard clst.1.0 normal follow 0.0 failed: pull changes: rpc error: code = Unavailable desc = transport is closing
E0129 14:50:26.105056   23375 retry.go:19] shard clst.1.1 normal follow 0.1 still failed: pull changes: rpc error: code = Unavailable desc = transport is closing
I0129 14:50:26.105249   23375 master_event_processor.go:61] [master] - client [store@:8379] from 127.0.0.1:62477 keyspace(clst)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x44642af]

goroutine 218 [running]:
github.com/chrislusf/vasto/cmd/store.(*storeServer).processPut(0xc000169500, 0xc00020ca00, 0xc000454820, 0x0)
	/vasto/cmd/store/process_put.go:31 +0x15f
github.com/chrislusf/vasto/cmd/store.(*storeServer).processRequest(0xc000169500, 0xc000328208, 0x4, 0xc0002f4f60, 0xc0002f4f30)
	/vasto/cmd/store/store_tcp_server.go:153 +0x2ab
github.com/chrislusf/vasto/cmd/store.(*storeServer).handleInputOutput(0xc000169500, 0xc0000ee440, 0x1e, 0x1e, 0x1e, 0x0, 0x0, 0x6c1e838, 0x0)
	/vasto/cmd/store/store_tcp_server.go:91 +0x121
github.com/chrislusf/vasto/cmd/store.(*storeServer).handleRequest(0xc000169500, 0x46a2440, 0xc000191980, 0x6c1e838, 0xc000196020, 0x0, 0x0)
	/vasto/cmd/store/store_tcp_server.go:71 +0x117
github.com/chrislusf/vasto/cmd/store.(*storeServer).handleConnection(0xc000169500, 0x46abbc0, 0xc000196020)
	/vasto/cmd/store/store_tcp_server.go:47 +0xea
github.com/chrislusf/vasto/cmd/store.(*storeServer).serveTcp.func1(0x46abbc0, 0xc000196020, 0xc000304050, 0xc000169500)
	/vasto/cmd/store/store_tcp_server.go:36 +0xf7
created by github.com/chrislusf/vasto/cmd/store.(*storeServer).serveTcp
	/vasto/cmd/store/store_tcp_server.go:27 +0x14b

vasto vs cassandra

@chrislusf
I would be obliged if you could compare Vasto and Cassandra:
-- why would Vasto be needed when we can run Cassandra with RocksDB as the storage engine?

Rocks-less version?

Hi, any plans to replace RocksDB with a pure Go engine like Badger, Bolt, Bitcask, ...?

Distributed key-value store over RDMA

Recently I have been thinking about what to do with my Master's degree graduation project. Today, I came across this idea as mentioned in the title. Is it valuable? If so, is it feasible?

vasto bench fail

I tried to use vasto bench to test it. First I opened one terminal for the vasto master or server, then opened another for vasto bench, but vasto bench always shows the output below and never makes any progress:

benchmarking on cluster with master localhost:8278
put : --- [--------------------------------------------------------------------] 0% NaN ops/sec

The vasto master or server side just printed messages such as:
I1013 10:05:21.397870 8502 master_server.go:41] Vasto master starts on :8278
I1013 10:05:30.134602 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54859
E1013 10:05:30.135152 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54859
I1013 10:05:30.135119 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54861
E1013 10:05:30.152217 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54861
I1013 10:06:31.644583 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54859
I1013 10:06:31.645002 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54861
I1013 10:06:39.134168 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54864
E1013 10:06:39.134886 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54864
I1013 10:06:53.839102 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54864

I tried the vasto bench -c1 parameter, but it did not change anything in the bench result.
Are there any settings I am missing? To my knowledge, the vasto server includes both the vasto store and the master, so it should be fine to start one vasto server for vasto bench to test against.
I just want to benchmark Vasto's performance; could you provide detailed instructions to benchmark Vasto?

Vasto primary/replica failure restore problem

I tried to test Vasto's primary/replica failure-restore latency with create cluster ks1 dc1 2 2.
However, the primary and replica shards were both created on the same store/process even though I had added more than one store to the cluster, so both the primary and replica shards will shut down in a machine-failure situation.
Is this an intentional setting?
In that case, how can I test the primary/replica failure-restore latency?

I managed to read the code; it seems that the Vasto client only sends its RocksDB requests to the primary shard by default.
In case of a primary failure, Vasto will update the primary/replica assignment using the PeriodicTask to synchronize the new RocksDB requests and the replica routing information to the client.
So the minimum failure-restore latency will be the time for the primary/replica sync plus the PeriodicTask interval?
Could you explain the primary/replica sync mechanism in case I am missing something?

Understanding rebalancing of keys when new node is added or existing node is removed

This is a question about understanding data movement when adding or removing a node in the cluster.

In my understanding, jump consistent hashing does not hash nodes onto a ring; it seems there is no ring concept. The output of the hash is a number in the [0, nodes) range which maps to a node.

If a new node is added to the cluster, how will we know which keys need to move to this node?
Will we have to go to all nodes, read all keys, and see if they belong to this new node?

Can you point me to the code which does this?

In ring-based consistent hashing, the new node will be on the ring and will copy keys from the next node to itself, so it may require going over all keys on only one node. However, in the real world one node maps to many buckets on the ring, so we may have to go to multiple nodes in ring-based consistent hashing as well.

Hash sharding scale strategy question

Using consistent hashing for sharding will take more time to migrate data between shards, and range key queries would be hard to implement.
Would you consider key-range sharding as another sharding strategy?

some wiki guide mistakes, benchmark result, possibilities of other KV engine replacement?

// start one vasto shell
vasto shell
create cluster ks1 dc1 3 1
use keyspace ks1
desc ks1 dc1

delete cluster ks1 dc1

the last line should be delete

I got my benchmark on an i7 7550/16G laptop:
time ./vasto bench --keyspace=ks1 -n 1024000 -c 16
benchmarking on cluster with master localhost:8278
put : 35s [===================================================================>] 100% 29145.12 ops/sec
put,get : 29058.5 op/s
Microseconds per op:
Count: 1024000 Average: 527.6445 StdDev: 371.17
Min: 0.0000 Median: 464.7338 Max: 26443.0000
put : 35s [====================================================================] 100% 29142.38 ops/sec
get : 31s [===================================================================>] 100% 32221.78 ops/sec
put,get : 32110.0 op/s
Microseconds per op:
Count: 1024000 Average: 470.7078 StdDev: 201.14
Min: 0.0000 Median: 464.2912 Max: 15578.0000
put : 35s [====================================================================] 100% 29142.38 ops/sec
get : 31s [====================================================================] 100% 32218.41 ops/sec
real 1m7.219s
user 0m43.063s
sys 1m17.438s

By the way, is it possible to replace the embedded RocksDB with another KV engine?
Most other KV engines have no equivalent of RocksDB's Merge operation, only simple put/get/delete/iterate/batch-write operations.

go get vasto&chrislusf/gorocksdb fail

OS: Ubuntu, Go version 1.10.4
go get github.com/chrislusf/vasto

github.com/chrislusf/gorocksdb

gopath/src/github.com/chrislusf/gorocksdb/options.go:1045:2: could not determine kind of name for C.rocksdb_options_set_allow_ingest_behind

I can successfully go get and build tecbot/gorocksdb;
however, I fail to go get chrislusf/gorocksdb and chrislusf/vasto.

compile error: how can I compile and benchmark vasto?

github.com/chrislusf/gorocksdb

exec: "gcc": executable file not found in %PATH%

github.com/chrislusf/vasto/cmd/master

cmd\master\master_server.go:56: cannot use ms (type *masterServer) as type pb.VastoMasterServer in argument to pb.RegisterVastoMasterServer:
*masterServer does not implement pb.VastoMasterServer (wrong type for CompactCluster method)
have CompactCluster("context".Context, *pb.CompactClusterRequest) (*pb.CompactClusterResponse, error)
want CompactCluster("golang.org/x/net/context".Context, *pb.CompactClusterRequest) (*pb.CompactClusterResponse, error)

./go.test.sh error

OS: Ubuntu; gcc version 7.3.0; Go 1.10.4; RocksDB master branch with the shared lib librocksdb.so.5.17.
After export LD_LIBRARY_PATH=/usr/local/lib, go.test.sh runs now,
but still fails with the message below:

ok github.com/chrislusf/vasto/storage/binlog 0.097s coverage: 82.4% of statements
ok github.com/chrislusf/vasto/storage/codec 0.027s coverage: 94.4% of statements
free(): invalid pointer
FAIL github.com/chrislusf/vasto/storage/rocks 0.083s

vasto start server or store fail

go.test.sh runs successfully now.
When starting the vasto server or store, I encounter the errors below:

E1013 00:06:29.768819 8411 store_in_cluster.go:30] read file /tmp/go-build169079065/cluster.config: open /tmp/go-build169079065/cluster.config: no such file or directory

Does it need a certain cluster.config file?
