
Suuchi's Introduction


Suuchi - सूचि

Inspired by tools like Uber's Ringpop and a strong desire to understand how distributed systems work, Suuchi was born.

Suuchi is a toolkit for building distributed data systems that uses gRPC under the hood as its communication medium. The overall goal of this project is to build pluggable components that a developer can easily compose to build a data system with the desired characteristics.

This project is of beta quality, and it currently powers a couple of systems in a production setting at @indix. We welcome all kinds of feedback to help improve the library.

Read the Documentation at http://ashwanthkumar.github.io/suuchi.

Suuchi in Sanskrit means an index.

Presentations

The following presentations / videos explain the motivation behind Suuchi.

Notes

If you're getting a ClassNotFound exception, please run mvn clean compile once to generate the Java classes from the protoc files. Also, if you're using IntelliJ, it helps to close the project when running the above command; IntelliJ seems to auto-detect sources in target/ at startup but not afterwards.

Release workflow

Suuchi and its modules follow a git-commit-message based release workflow. Use the script make-release.sh to push an empty commit to the repository, which triggers a release workflow on travis-ci. More information can be found in the docs.

License

https://www.apache.org/licenses/LICENSE-2.0

suuchi's People

Contributors

ashwanthkumar, brewkode, dependabot[bot], gsriram7


suuchi's Issues

Parallel Write Replication

As part of #27 and #23 we got a SequentialReplication strategy; we should look to improve it to support parallel writes on the nodes.
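A minimal sketch of the idea, assuming nothing about Suuchi's actual replicator API: fan the write out to all replica nodes concurrently with Futures instead of writing to them one after another, then wait for every ack. `Node`, `writeTo` and `parallelReplicate` are illustrative names only.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch: replicate a write to all nodes in parallel.
case class Node(id: Int)

def writeTo(node: Node, key: Array[Byte], value: Array[Byte]): Future[Boolean] =
  Future {
    // a real implementation would make a gRPC call to the node here
    true
  }

def parallelReplicate(nodes: List[Node], key: Array[Byte], value: Array[Byte]): Boolean = {
  // all replica writes are in flight at once; succeed only if every ack arrives
  val acks = Future.sequence(nodes.map(n => writeTo(n, key, value)))
  Await.result(acks, 5.seconds).forall(identity)
}
```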

Documentation Setup using mkdocs

Some PRs have really nice commit messages and discussions (#27 being my favourite). We should start documenting the internal decisions and workings of the system in markdown somewhere, so we can generate documentation from it at a later point in time.

We should surface the examples much more prominently, as recipes that folks can copy-paste and take from there. Also, with #12 we should write scaladocs for external-facing APIs.

A quickstart guide for general users of the library would also be great. We've already made an attempt as part of #19, but we could do better.


  • Basic Documentation Infra
  • Add examples as recipes
  • Quick Start
  • Getting Started

Provide abstraction on ConsistentHashRing

Today we have a ConsistentHashRing which supports find, add and remove. While this is great, it would really help if we could build abstractions on top that let us visualise it as a HashRing with token ranges between the nodes in the ring.

This would help us when we try out rebalancing of data among the nodes.
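A sketch of what such an abstraction might look like, with illustrative names (`TokenRange`, `ranges` are not Suuchi APIs): each node's token closes the range that starts at the previous node's token, and the last range wraps around the ring.

```scala
// Hypothetical sketch: view a consistent-hash ring as token ranges.
case class TokenRange(start: Int, end: Int, owner: String)

// tokens: (token -> node) positions on the ring, in any order
def ranges(tokens: List[(Int, String)]): List[TokenRange] = {
  val sorted = tokens.sortBy(_._1)
  // pair each token with its successor; the last token wraps to the first
  sorted.zip(sorted.tail :+ sorted.head).map {
    case ((start, _), (end, owner)) => TokenRange(start, end, owner)
  }
}
```

With the ring expressed as ranges, a rebalancer can reason about which ranges (and hence which keys) move when a node is added or removed.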

Get basic Cluster membership working

  • Membership
    • Ability to register a set of nodes onto a cluster
    • Tests with members going up & down
    • Ability to query any node and check for the available members

Upgrade to Scala 2.11.x

We're still on Scala 2.10.4, while Scala 2.11.x is nearing its end by the end of this year. We should probably upgrade to pick up a lot of compiler improvements.

Integrate Cluster

  • Integrate Cluster into Server abstraction built as part of #13
  • Support for Shard related information to be used for replication / rebalancing (#4)
  • Support dynamic rebalancing of data

Compiler errors on setup

Hi! I imported this project into IntelliJ as an sbt project and ran sbt compile. I get a lot of compiler errors across the project. Please help me with the steps to set up this repo. Thanks!

Optimize scans in VersionedStore

Today, as part of scanner and versionScanner, we do a full store scan and then filter for data keys or version keys. This is very inefficient on very large stores and takes a long time. Instead we can push the VERSION_KEY or DATA_KEY prefix down to the underlying store, thereby reducing the search space of the scans.
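The difference can be illustrated with a toy sorted store (a TreeMap standing in for RocksDB; `VERSION_PREFIX`, `scanAndFilter` and `prefixScan` are illustrative names): a prefix pushdown seeks straight to the prefix and stops at its upper bound, instead of touching every key.

```scala
// Toy sorted "store" mixing data keys (D_) and version keys (V_).
val store = scala.collection.immutable.TreeMap(
  "D_k1" -> "data1", "D_k2" -> "data2", "V_k1" -> "v1", "V_k2" -> "v2"
)

val VERSION_PREFIX = "V_"

// Full scan + filter: iterates over every key in the store.
def scanAndFilter(prefix: String): List[String] =
  store.keys.filter(_.startsWith(prefix)).toList

// Prefix pushdown: only the keys within [prefix, prefix + maxChar) are visited.
def prefixScan(prefix: String): List[String] =
  store.range(prefix, prefix + "\uffff").keys.toList
```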

Services must know whether a request is a replication request or the original request

When an RPC service is routed via ReplicationHandler, the service must be able to know whether it's handling the replication invocation or the original invocation. We can use Context to store and retrieve that information.

This is especially useful when we send a metric for that replication request; you don't want to end up double-counting it.
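A sketch of the idea, with a `DynamicVariable` standing in for gRPC's Context (the real implementation would attach a key to io.grpc.Context instead; all names here are illustrative): downstream code checks the flag and skips metrics for replica invocations.

```scala
import scala.util.DynamicVariable

// Stand-in for a gRPC Context key marking "this is a replication request".
val isReplicationRequest = new DynamicVariable[Boolean](false)

var writes = 0

// Emit the write metric only for the original invocation, not replicas.
def countWriteMetric(): Unit =
  if (!isReplicationRequest.value) writes += 1

def handleOriginal(): Unit = countWriteMetric()

def handleReplica(): Unit =
  // the replication handler would set the flag before invoking the service
  isReplicationRequest.withValue(true) { countWriteMetric() }
```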

Improve / Add ScalaDoc for public facing APIs

Today we don't have many ScalaDocs written as part of our code. One of the things we may have to do for wider adoption is to write sensible documentation on what each method does, so it's easy for developers to consume them.

Node Abstraction

Following needs to be done to get a working cluster with membership in place.

  • Build a Node abstraction
  • Compose membership, partitioner & services as part of it.
  • When the node is started, it should expose a listen port & should be able to handle GET / PUT requests.

Data Rebalancing

Add ability for the cluster to

  • scale out - addition of new nodes
  • scale down - nodes going down.
  • Anti Entropy - Refer #50 for more details

Tasks

  • It's relatively easy to do data rebalancing with ConsistentHashRing, but given the generic nature of RoutingStrategy, we need to decide on the interactions of the Rebalancer with the RoutingStrategy
  • Implement / integrate membership with the Server. Once we have Membership, think about how it would integrate with Partitioner / RoutingStrategy for maintaining the list of nodes.

Dependencies

  • While re-balancing we need to know what keys to migrate which needs an Anti-entropy implementation #50

Setup Release workflow on Travis

Now that we've moved away from SnapCI, we need a way to perform automatic releases from Travis. With SnapCI it was as easy as a single click on a stage in the pipeline; we need a different mechanism to handle this in the Travis world.

Look at Error handling

Today we don't do any try .. catch anywhere in the project. gRPC seems to throw a lot of RuntimeExceptions in different places. We need to track each of them and address them. This is more of an epic that will be ongoing for a while.

For any PRs or commits that have discussions related to this, please tag them with this issue so they're automatically tracked.

Completely Consistent Reads

With #27 (and #23) we now have pluggable replication in place. And since we have special interceptors for writes, it should be easy to do the same for reads as well.

For a start, we can do a digest query on all the nodes to give out a fully consistent response.
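A sketch of the digest-read idea under illustrative names (`digest`, `consistentRead` are not Suuchi APIs): ask every replica for a cheap digest of its value, and only return an answer when all digests agree.

```scala
import java.security.MessageDigest

// Cheap fingerprint of a value; Seq[Byte] gives usable equality.
def digest(value: Array[Byte]): Seq[Byte] =
  MessageDigest.getInstance("MD5").digest(value).toSeq

// Return a value only when every replica agrees on its digest.
def consistentRead(replicaValues: List[Array[Byte]]): Option[Array[Byte]] = {
  val digests = replicaValues.map(digest)
  if (digests.distinct.size == 1) replicaValues.headOption else None
}
```

When the digests disagree, a real implementation would fall back to a repair path (e.g. read the full values and reconcile) rather than returning nothing.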

ToDos

  • Membership using Atomix
  • Implement a simple HandleOrForward using consistent hashing - #2 and #11
  • Build replication
  • Build rebalancing (and dynamic scale out)

Fix the VRecord.key in VersionedStore

Today we set the key of VRecord as V_. This is a little useless, because it makes it impossible to get the data back from versionScanner().scan. We need to make sure we return the actual key in the VRecord along with the list of versions for that key.

Chained Write Replication

As part of #27 and #23 we got a SequentialReplication strategy; we should look to improve it to support chained writes across the replica nodes.

Related - #30

Make the replicator pluggable

As part of #34 we now have synchronous ParallelReplication support, but withReplication on Server still has only SequentialReplicator hard-coded. We have to make it pluggable.

Namespacing & Scoping

Ensure that all classes have appropriate access control for us to split them later into modules.

HandleOrForward functionality.

Any operation (get or put) on the cluster should translate to a HandleOrForward operation.
Based on the key, the node should be able to decide whether it can handle the request locally or should forward the request to an appropriate node.

@ashwanthkumar below is my stab at the contract. Thoughts?
def handleOrForward(key: K): Node

PS: Is this a good way to write down issues?
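One possible shape for that contract, as a sketch only: route the key with a trivial hash over the node list, and let the caller check whether the owner is the local node. `Node`, `Router` and the modulo routing are illustrative, not Suuchi's implementation.

```scala
// Hypothetical sketch of handleOrForward with a trivial hash-based router.
case class Node(id: Int, isSelf: Boolean)

class Router(nodes: Vector[Node]) {
  // pick the owning node for a key; a real router would use the hash ring
  def handleOrForward(key: Array[Byte]): Node =
    nodes(math.abs(java.util.Arrays.hashCode(key)) % nodes.size)

  // handle locally when the owner is this node, otherwise forward to it
  def canHandleLocally(key: Array[Byte]): Boolean = handleOrForward(key).isSelf
}
```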

Partitioner implementation

trait Partitioner {
   def shard(r: Request): Array[Byte]
   def find(key: Array[Byte], replicaCount: Int): List[NodeInfo]
   def find(key: Array[Byte]) = find(key, 1)
}
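A minimal concrete sketch of the find side of the trait above (shard and Request are omitted here; `NodeInfo`'s fields and the modulo hash are illustrative assumptions): route a key to replicaCount consecutive nodes, so replicas land on distinct nodes.

```scala
// Hypothetical sketch of a Partitioner implementation.
case class NodeInfo(host: String, port: Int)

trait Partitioner {
  def find(key: Array[Byte], replicaCount: Int): List[NodeInfo]
  def find(key: Array[Byte]): List[NodeInfo] = find(key, 1)
}

class ModuloPartitioner(nodes: Vector[NodeInfo]) extends Partitioner {
  def find(key: Array[Byte], replicaCount: Int): List[NodeInfo] = {
    // primary node by hash, replicas on the following nodes around the "ring"
    val start = math.abs(java.util.Arrays.hashCode(key)) % nodes.size
    (0 until math.min(replicaCount, nodes.size))
      .map(i => nodes((start + i) % nodes.size)).toList
  }
}
```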

Implement in-memory store

  • Implement an in-memory store that support the operations of the Store trait.
trait Store {
  def get(key: Array[Byte]) : Array[Byte]
  def put(key: Array[Byte], data: Array[Byte]) : Unit 
}
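A minimal in-memory implementation of the Store trait above could look like this sketch; byte arrays are wrapped in Seq so they get value equality as map keys (the null-on-miss behaviour is an assumption, since the trait doesn't specify it).

```scala
trait Store {
  def get(key: Array[Byte]): Array[Byte]
  def put(key: Array[Byte], data: Array[Byte]): Unit
}

// Sketch of an in-memory Store backed by a mutable map.
class InMemoryStore extends Store {
  // Array[Byte] uses reference equality, so keys are stored as Seq[Byte]
  private val map = scala.collection.mutable.Map[Seq[Byte], Array[Byte]]()
  def get(key: Array[Byte]): Array[Byte] = map.getOrElse(key.toSeq, null)
  def put(key: Array[Byte], data: Array[Byte]): Unit = map(key.toSeq) = data
}
```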

Utility to convert Store -> ShardedStore

Provide a utility for users that have been using Store, a single store instance (like RocksDB), and want to migrate to ShardedStore (introduced as part of #53), since ShardedStore lets you effectively parallelize writes across multiple stores on the same node.
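A sketch of what such a migration utility might do, under illustrative names (string keys, a `scan` method and `MapStore` are assumptions, not Suuchi's API): scan every record out of the old single store and re-put it through a sharded store that routes each key to one of several underlying stores.

```scala
// Hypothetical Store with a scan, plus a toy map-backed implementation.
trait Store {
  def put(key: String, value: String): Unit
  def scan(): Iterator[(String, String)]
}

class MapStore extends Store {
  val data = scala.collection.mutable.LinkedHashMap[String, String]()
  def put(key: String, value: String): Unit = data(key) = value
  def scan(): Iterator[(String, String)] = data.iterator
}

// Routes each key to one of the underlying shards by hash.
class ShardedStore(shards: Vector[Store]) extends Store {
  private def shardFor(key: String) = shards(math.abs(key.hashCode) % shards.size)
  def put(key: String, value: String): Unit = shardFor(key).put(key, value)
  def scan(): Iterator[(String, String)] = shards.iterator.flatMap(_.scan())
}

// The migration utility: replay every record from the old store.
def migrate(from: Store, to: ShardedStore): Unit =
  from.scan().foreach { case (k, v) => to.put(k, v) }
```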
