Giter VIP home page Giter VIP logo

tikv's Introduction

TiKV is a distributed KV database powered by Rust.

Build Status

TiKV is a Distributed Key-Value Database which mainly refers to the design of Google Spanner and HBase, but much simpler (Don't depend on any distributed file system). We've implemented the Raft consensus algorithm in Rust and stored consensus state in RocksDB. It not only guarantees consistency for data but also makes use of placement driver to implement sharding (split && merge) and data migration automatically. The transaction model is similar to Google's Percolator, and with some performance improvements. In fact, we provide snapshot isolation (SI) and serializable snapshot isolation (SSI), and then externally consistent reads and writes in distributed transactions. Some details can see Tikv-server software stack. And some primary features list as follow.

  • Geo-Replication
    TiKV uses Raft to support Geo-Replication. We have ported [etcd's raft] (https://github.com/coreos/etcd/tree/master/raft) implementation to Rust. By using raft, Key-Value regions always stay consistent. Placement driver (or pd) is also important for Geo-Replication. Pd collects state information from heartbeats of all Raft groups' leaders regularly. According to load feedback and isolation judgment, pd can rebalance data automatically and smoothly. So Geo-Replication can be achieved, and makes TiKV always stable and balanced across regions.

  • Horizontal scalability
    By carefully designing Raft groups and placement driver, we enable horizontal scalability in TiKV. A region, similar to spanner’s directory, is the base unit of data movement in TiKV. A region is controlled by one Raft group and its size is limited to 64 MBs . To achieve horizontal scalability, a region can split into two regions when the data size exceeds the limit automatically. Placement driver keeps the metadata of all regions. To ensure horizontal scalability, we ensure the metadata of each region is only 100s of bytes. Together with other techniques, TiKV can easily scale up to hold 100s of TBs of data.

  • Consistent distributed transactions
    Similar to Google's Spanner, TiKV supports externally-consistent distributed transactions. Placement driver plugs in a set of TS system, therefore timestamp oracle will be allocated in global. When TiKV client send transaction request to TiKV server, server will check relavant meta from the pd's etcd, and then TS generate two valid timestamps in two separated handling phase implying start timestamps and commit timestamps. Before transaction ends, the start timestamp will be carrying on util the transaction is committed. Once transaction is committed, externally-consistent transactions succeed.

  • Coprocessor support
    TiKV will be accessiable and any application can develop on TiKV. We tend to provide a simple deployment solution soon after.

  • Working with TiDB
    TiKV implements a set horizontal scalable and externally-consistent transactions-supported distributed Key-Value Database. And TiDB is focus on supporting both traditional RDBMS and NoSQL. TiDB prodive semantic translation in accessing Database. TiKV will be the best performance accessed database system thanks to internal optimization.

Required rust version

This project requires rust nightly, otherwise project build will fail. We choose Rust instead of Go (with which we've developed TiDB ) to develop this project, because there are two critical facts: firstly, despite Go's excellent GC, it still introduces too much of latency time and instability; secondly, cgo's costs introduces another instable performance. Performance is our primary consideration, then development efficiency. So Rust comes to our sight.

Tikv-server software stack

This figure represents tikv-server software stack.

image

  • Placement driver: With tikv-server, Placement driver is the most important part which merges placement driver and zonemaster in google's Spanner. pd maintains metas of all regions via etcd. A Timestamp system plugs in pd, which provide time oracle in global.
  • Node:A physical node in cluster. Node id must be unique in global.
  • Store:A node has one or some stores. Generally a store involves one disk. Each store maps to different paths, and store id must be unique in global either. Multiple stores primarily support a plurality of disks in one node.
  • Region:Region is a logical concept. Key-Value datas are grouped by region. Region is the smallest unit of data movement, that's geographic-replication unit. Every region is supported by a raft group and region id must be unique in global.
  • Peer: Peer is a logical concept. It stands for a raft-worked participant in a raft group. A peer in raft-worked group maybe turn up three roles, which are candidate, leader and follower.

When node starts, the ids of node, store and region must be registered into pd as well as their metas. Leaders of raft groups regularly report region state to pd. Pd control split/merge between regions.

A store starts up a rocksdb. Store also record its meta and raft group information in local as well as pd has managed them all via etcd.

Contributing

See CONTRIBUTING for details on submitting patches and the contribution workflow.

License

TiKV is under the Apache 2.0 license. See the LICENSE file for details.

Acknowledgments

  • Thanks etcd for providing some great open source tools.
  • Thanks RocksDB for their powerful storage engines.
  • Thanks mio for providing metal IO library for Rust.
  • Thanks rust-clippy. We do love the great project.

tikv's People

Contributors

busyjay avatar c4pt0r avatar disksing avatar jonboulle avatar morefreeze avatar ngaut avatar overvenus avatar paladintyrion avatar qiuyesuifeng avatar shenli avatar siddontang avatar tiancaiamao avatar tigerzhang avatar xiang90 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.