
vasto's Introduction

Vasto


A distributed high-performance key-value store. On disk. Eventually consistent. Highly available. Able to grow or shrink without service interruption.

Vasto scales embedded RocksDB into a distributed key-value store, adding sharding, replication, and supporting operations to (a shell sketch follows this list):

  1. create a new keyspace
  2. delete an existing keyspace
  3. grow a keyspace
  4. shrink a keyspace
  5. replace a node in a keyspace
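
These operations can be driven from the vasto shell. A minimal sketch, reusing only the shell commands quoted later on this page; the exact syntax for growing, shrinking, or replacing a node is not shown there, so this illustrates just the create/describe/delete flow:

    // start one vasto shell
    vasto shell

    // inside the shell: create keyspace ks1 in data center dc1,
    // with 3 shards and 1 copy of the data
    create cluster ks1 dc1 3 1

    // inspect the keyspace
    use keyspace ks1
    desc ks1 dc1

    // remove the keyspace when it is no longer needed
    delete cluster ks1 dc1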

Why

A key-value store is often re-invented. Why is there another one?

Vasto enables developers to set up a distributed key-value store as easily as creating a map object.

Operations such as creating/deleting the store, partitioning, replication, and seamlessly adding/removing servers are managed with a few commands. Client connection configurations are managed automatically.

In a sense, Vasto is an in-house cloud providing distributed key-value stores as a service, minus the need to trade off performance against cloud service costs, plus consistent low latency.

Architecture

There is one Vasto master and N Vasto stores, plus Vasto clients or Vasto proxies/gateways.

  1. The Vasto stores are essentially thin wrappers around RocksDB.
  2. The Vasto master manages all the Vasto stores and Vasto clients.
  3. Vasto clients rely on the master to connect to the right Vasto stores.
  4. Vasto gateways use Vasto client libraries to support different APIs.

The master is the brain of the system. Vasto does not use gossip protocols or other consensus algorithms. Instead, it uses a single master for simple setup, fast failure detection, fast topology changes, and precise coordination. The master holds only soft state and is needed only when the topology changes, so even if it ever crashes, a simple restart will recover everything.

The Vasto stores simply pass get/put/delete/scan requests to RocksDB. One Vasto store can host multiple db instances.

Go applications can use the client library directly.

Applications in other languages can talk to a Vasto gateway, which uses the client library and reverse-proxies requests to the Vasto stores. The number of Vasto gateways is unlimited. They can be installed on application machines to save one network hop, or run on dedicated machines to reduce the number of connections to the Vasto stores when both the number of stores and the number of clients are very high.

Life cycle

One Vasto cluster has one master and multiple Vasto stores. When a store joins the cluster, it is empty.

When the master receives a request to create a keyspace with x shards and a replication factor of y, the master will

  1. find x stores that meet the requirements and assign them to the keyspace
  2. ask the stores to create the shards, including replicas
  3. inform the clients of the store locations

When the master receives a request to resize the keyspace from m shards to n shards, the master will (see the client sketch after this list)

  1. if the size increases, find n-m additional stores that meet the requirements and assign them to the keyspace
  2. ask the stores to create the shards, including replicas
  3. prepare the data on the new stores
  4. direct client traffic to the new stores
  5. remove the retiring shards
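
From the client's point of view, both lifecycle operations are single calls on the Go client shown in the Example section below. A minimal sketch, assuming a master at localhost:8278 and the call signatures used in that example:

    // connect to the master and create keyspace "ks1" with
    // 1 shard and 1 copy of the data ("admin" is an arbitrary client name)
    vc := vs.NewVastoClient(context.Background(), "admin", "localhost:8278")
    vc.CreateCluster("ks1", 1, 1)

    // later, grow the keyspace to 3 shards; the master finds the extra
    // stores, prepares the data, and redirects client traffic
    vc.ResizeCluster("ks1", 3)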

Hashing algorithm

Vasto uses Jump Consistent Hash to allocate data (a reference sketch in Go follows the list below). This algorithm

  1. requires no storage. The master only needs soft state to manage all store servers. It is OK to restart the master.
  2. evenly distributes the data into buckets.
  3. still divides the workload evenly when the number of buckets changes.
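
For reference, this is the published Jump Consistent Hash algorithm (Lamping and Veach), sketched in Go; Vasto's internal implementation may differ in details such as how a user key is first hashed to 64 bits:

    // jumpHash maps a 64-bit key to a bucket in [0, numBuckets).
    // Reference sketch only; the key is assumed to already be a
    // 64-bit hash of the user key.
    func jumpHash(key uint64, numBuckets int32) int32 {
        var b, j int64 = -1, 0
        for j < int64(numBuckets) {
            b = j
            key = key*2862933555777941757 + 1
            j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
        }
        return int32(b)
    }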

With this jump hash, cluster resizing is rather simple, flexible, and efficient:

  1. The cluster can resize up or down freely.
  2. Resizing is well coordinated.
  3. Data can be moved via the most efficient SSTable writes.
  4. Clients are aware of the cluster change and redirect traffic only when the new servers are fully ready.

Eventual Consistency and Active-Active Replication

All Vasto stores can serve both reads and writes. Changes are propagated to the other replicas within a few milliseconds. Only the primary replica participates in normal operations; the other replicas take over when the primary replica is down, or serve traffic in a different data center.

Vasto assumes the data already carries an event time, which should be the time when the event actually happened, not the time when the data is fed into the Vasto system. If the system fails over to a replica partition and there are multiple changes to one key, the change with the latest event time wins.
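
A minimal sketch of this last-write-wins rule, using a hypothetical Versioned struct (not Vasto's internal representation) and the standard time package:

    // Versioned pairs a value with its application-supplied event time.
    // Hypothetical type for illustration only; requires the "time" package.
    type Versioned struct {
        Value     []byte
        EventTime time.Time // when the event really happened, not when it was ingested
    }

    // resolve picks the winner when replicas diverge on the same key:
    // the version with the latest event time wins.
    func resolve(a, b Versioned) Versioned {
        if b.EventTime.After(a.EventTime) {
            return b
        }
        return a
    }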

Client APIs

See https://godoc.org/github.com/chrislusf/vasto/goclient/vs

Example

    // create a vasto client talking to master at localhost:8278
    vc := vs.NewVastoClient(context.Background(), "client_name", "localhost:8278")
    
    // create a cluster for keyspace ks1, with one server, and one copy of data.
    vc.CreateCluster("ks1", 1, 1)
    
    // get a cluster client for ks1
    cc := vc.NewClusterClient("ks1")

    // operate with the cluster client
    var key, value []byte
    cc.Put(key, value)    
    cc.Get(vs.Key(key))
    ...

    // change cluster size to 3 servers
    vc.ResizeCluster("ks1", 3)

    // operate with the existing cluster client
    cc.Put(key, value)    
    cc.Get(vs.Key(key))
    ...

Currently only the basic Go client library is provided. The gateway is not ready yet.

vasto's People

Contributors

chrislusf

vasto's Issues

go build error

My environment is Ubuntu, Go 1.10.4.
Vasto is at the current master branch.
go build shows the message below:

google.golang.org/genproto/googleapis/rpc/status

../../../google.golang.org/genproto/googleapis/rpc/status/status.pb.go:111:28: undefined: proto.InternalMessageInfo

Crashed with nil pointer when putting a key to a cluster of 2 replicas and one replica is down. (not reproducible)

I0129 14:47:31.514926   23375 master_event_processor.go:56] [master] + client  from 127.0.0.1:62432 keyspace(clst)
E0129 14:50:26.104690   23375 retry.go:22] shard clst.1.0 normal follow 0.0 failed: pull changes: rpc error: code = Unavailable desc = transport is closing
E0129 14:50:26.105056   23375 retry.go:19] shard clst.1.1 normal follow 0.1 still failed: pull changes: rpc error: code = Unavailable desc = transport is closing
I0129 14:50:26.105249   23375 master_event_processor.go:61] [master] - client [store@:8379] from 127.0.0.1:62477 keyspace(clst)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x44642af]

goroutine 218 [running]:
github.com/chrislusf/vasto/cmd/store.(*storeServer).processPut(0xc000169500, 0xc00020ca00, 0xc000454820, 0x0)
	/vasto/cmd/store/process_put.go:31 +0x15f
github.com/chrislusf/vasto/cmd/store.(*storeServer).processRequest(0xc000169500, 0xc000328208, 0x4, 0xc0002f4f60, 0xc0002f4f30)
	/vasto/cmd/store/store_tcp_server.go:153 +0x2ab
github.com/chrislusf/vasto/cmd/store.(*storeServer).handleInputOutput(0xc000169500, 0xc0000ee440, 0x1e, 0x1e, 0x1e, 0x0, 0x0, 0x6c1e838, 0x0)
	/vasto/cmd/store/store_tcp_server.go:91 +0x121
github.com/chrislusf/vasto/cmd/store.(*storeServer).handleRequest(0xc000169500, 0x46a2440, 0xc000191980, 0x6c1e838, 0xc000196020, 0x0, 0x0)
	/vasto/cmd/store/store_tcp_server.go:71 +0x117
github.com/chrislusf/vasto/cmd/store.(*storeServer).handleConnection(0xc000169500, 0x46abbc0, 0xc000196020)
	/vasto/cmd/store/store_tcp_server.go:47 +0xea
github.com/chrislusf/vasto/cmd/store.(*storeServer).serveTcp.func1(0x46abbc0, 0xc000196020, 0xc000304050, 0xc000169500)
	/vasto/cmd/store/store_tcp_server.go:36 +0xf7
created by github.com/chrislusf/vasto/cmd/store.(*storeServer).serveTcp
	/vasto/cmd/store/store_tcp_server.go:27 +0x14b

vasto vs cassandra

@chrislusf
I would be obliged if you could compare Vasto and Cassandra:
-- why would Vasto be needed when we can run Cassandra with RocksDB as the storage engine?

Rocks-less version?

Hi, any plans to replace RocksDB with a pure Go engine like Badger, Bolt, Bitcask, ...?

Distributed key-value store over RDMA

Recently I have been thinking about what to do with my Master's degree graduation project. Today, I came across this idea as mentioned in the title. Is it valuable? If so, is it feasible?

vasto bench fail

I tried to use vasto bench to test it. First I opened one terminal for the vasto master or server, then opened another for vasto bench, but vasto bench always shows the output below and never makes any progress:

benchmarking on cluster with master localhost:8278
put : --- [--------------------------------------------------------------------] 0% NaN ops/sec

The vasto master or server side just printed messages such as:
I1013 10:05:21.397870 8502 master_server.go:41] Vasto master starts on :8278
I1013 10:05:30.134602 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54859
E1013 10:05:30.135152 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54859
I1013 10:05:30.135119 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54861
E1013 10:05:30.152217 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54861
I1013 10:06:31.644583 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54859
I1013 10:06:31.645002 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54861
I1013 10:06:39.134168 8502 master_grpc_server_for_client.go:30] + client 127.0.0.1:54864
E1013 10:06:39.134886 8502 master_grpc_server_for_client.go:96] master add client benchmarker: client key is already in use: benchmark:dc1:127.0.0.1:54864
I1013 10:06:53.839102 8502 master_grpc_server_for_client.go:41] - client 127.0.0.1:54864

I tried the vasto bench -c1 parameter, but it did not change anything in the bench result.
Are there any settings I am missing? To my knowledge, the vasto server includes both the vasto store and the master, so it should be fine to start one vasto server for vasto bench to test against.
I just want to benchmark Vasto's performance; could you provide detailed instructions to benchmark Vasto?

Vasto primary/replica failure restore problem

I tried to test Vasto's primary/replica failure-restore latency with create cluster ks1 dc1 2 2.
However, the primary and replica shards were both created on the same store/process even though I had added more than one store to the cluster, so both the primary and replica shards will shut down in a machine-failure situation.
Is this an intentional setting?
In that case, how can I test the primary/replica failure-restore latency?

I managed to read the code; it seems that the Vasto client only sends its RocksDB requests to the primary shard by default.
In case of a primary failure, Vasto will update the primary/replica assignment using the PeriodicTask to synchronize the new RocksDB requests and the replica routing information to the client.
So the minimum failure-restore latency will be the time for the primary/replica sync plus the PeriodicTask interval?
Could you explain the primary/replica sync mechanism in case I am missing something?

Understanding rebalancing of keys when new node is added or existing node is removed

This is a question about understanding data movement when adding or removing a node in the cluster.

In my understanding, jump consistent hashing does not hash nodes onto a ring; it seems there is no ring concept. The output of the hash is a number in the [0, nodes) range which maps to a node.

If a new node is added to the cluster, how will we know which keys need to move to this node?
Will we have to go to all nodes, read all keys, and see if they belong to this new node?

Can you point me to the code which does this?

In ring-based consistent hashing, the new node will be on the ring and will copy keys from the next node to itself, so it may require going over all keys on only one node. However, in the real world one node maps to many buckets on the ring, so we may have to go to multiple nodes in ring-based consistent hashing as well.

Hash sharding scale strategy question

Using consistent hashing for sharding will take more time to migrate data between shards, and range key queries would be hard to implement.
Would you consider key-range sharding as another sharding strategy?

some wiki guide mistakes, benchmark result, possibilities of other KV engine replacement?

// start one vasto shell
vasto shell
create cluster ks1 dc1 3 1
use keyspace ks1
desc ks1 dc1

delete cluster ks1 dc1

the last line should be delete

I got my benchmark on an i7 7550/16G laptop:
time ./vasto bench --keyspace=ks1 -n 1024000 -c 16
benchmarking on cluster with master localhost:8278
put : 35s [===================================================================>] 100% 29145.12 ops/sec
put,get : 29058.5 op/s
Microseconds per op:
Count: 1024000 Average: 527.6445 StdDev: 371.17
Min: 0.0000 Median: 464.7338 Max: 26443.0000
put : 35s [====================================================================] 100% 29142.38 ops/sec
get : 31s [===================================================================>] 100% 32221.78 ops/sec
put,get : 32110.0 op/s
Microseconds per op:
Count: 1024000 Average: 470.7078 StdDev: 201.14
Min: 0.0000 Median: 464.2912 Max: 15578.0000
put : 35s [====================================================================] 100% 29142.38 ops/sec
get : 31s [====================================================================] 100% 32218.41 ops/sec
real 1m7.219s
user 0m43.063s
sys 1m17.438s

By the way, is it possible to replace the embedded RocksDB with another KV engine?
Most other KV engines have no equivalent of RocksDB's Merge operation, only simple put/get/delete/iterate/batch-write operations.

go get vasto&chrislusf/gorocksdb fail

OS: Ubuntu, Go version 1.10.4
go get github.com/chrislusf/vasto

github.com/chrislusf/gorocksdb

gopath/src/github.com/chrislusf/gorocksdb/options.go:1045:2: could not determine kind of name for C.rocksdb_options_set_allow_ingest_behind

I can successfully go get and build tecbot/gorocksdb;
however, I fail to go get chrislusf/gorocksdb and chrislusf/vasto.

compile error: how can I compile and benchmark vasto?

github.com/chrislusf/gorocksdb

exec: "gcc": executable file not found in %PATH%

github.com/chrislusf/vasto/cmd/master

cmd\master\master_server.go:56: cannot use ms (type *masterServer) as type pb.VastoMasterServer in argument to pb.RegisterVastoMasterServer:
*masterServer does not implement pb.VastoMasterServer (wrong type for CompactCluster method)
have CompactCluster("context".Context, *pb.CompactClusterRequest) (*pb.CompactClusterResponse, error)
want CompactCluster("golang.org/x/net/context".Context, *pb.CompactClusterRequest) (*pb.CompactClusterResponse, error)

./go.test.sh error

OS: Ubuntu; gcc version 7.3.0; Go 1.10.4; RocksDB master branch with the shared lib librocksdb.so.5.17.
After export LD_LIBRARY_PATH=/usr/local/lib, go.test.sh runs now,
but still fails with the message below:

ok github.com/chrislusf/vasto/storage/binlog 0.097s coverage: 82.4% of statements
ok github.com/chrislusf/vasto/storage/codec 0.027s coverage: 94.4% of statements
free(): invalid pointer
FAIL github.com/chrislusf/vasto/storage/rocks 0.083s

vasto start server or store fail

go.test.sh runs successfully now.
When starting the vasto server or store, I encounter the errors below:

E1013 00:06:29.768819 8411 store_in_cluster.go:30] read file /tmp/go-build169079065/cluster.config: open /tmp/go-build169079065/cluster.config: no such file or directory

Does it need a certain cluster.config file?
