Greenstack

Project Greenstack is intended to build a unified protocol that all components of Couchbase may use to communicate with and within the cluster. The protocol is designed to address the shortcomings of the memcached binary protocol while remaining simple and efficient. See the one-pager for a full description of the rationale behind creating a new protocol.

High level description

The protocol is full duplex, meaning that both parties may send and receive packets at all times. This differs from the memcached binary protocol where we had a notion of a client and a server, where the client would send requests to the server and the server would only send responses to the requests. Being able to send notifications from memcached to clients/ns_server is something we’ve missed from the binary protocol. Examples for use cases could be:

  • I'm starting to run out of memory, please slow down. Today we just accept data until we hit a threshold and then start refusing it; we could have told the client to back off earlier.
  • I got a new vbucket configuration map..
  • I've initiated a shutdown of the bucket.. expect it to go away..
  • I'm currently doing warmup, I’m done doing warmup
  • Send messages back to ns_server for things to pop up in the UI (ex: we’ve had n incorrect logins in the last minute; is the app misconfigured or are we under “attack”?)

Note: This means that client authors need to be prepared to receive packets other than a response to their request when they fetch the next frame off the network, and to handle the command (which could be nothing more than replying with not supported/unknown command).

Another difference from the memcached binary protocol is that the end receiving the commands may process and send responses out of order (as long as the fence bit isn’t set).
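A minimal sketch of what this means for a client, assuming requests are tracked by their opaque value. The class name, method names and the returned tuples are hypothetical and only illustrate the bookkeeping; multi-frame responses (the More flag) are not handled here.

```python
class PendingRequests:
    """Tracks outstanding requests so responses may arrive in any order."""

    def __init__(self):
        self._next_opaque = 0
        self._pending = {}  # opaque -> opcode of the request we sent

    def register(self, opcode):
        """Remember a request we are about to send and return its opaque."""
        opaque = self._next_opaque
        self._next_opaque = (self._next_opaque + 1) & 0xFFFFFFFF
        self._pending[opaque] = opcode
        return opaque

    def on_frame(self, is_response, opaque, opcode):
        """Classify an incoming frame.

        A response is matched to its request through the opaque value.
        Anything else is a command sent by the peer, which the caller
        has to handle (possibly by answering "not supported").
        """
        if is_response:
            requested = self._pending.pop(opaque, None)
            if requested is None:
                return ("unexpected-response", opaque)
            return ("response", requested)
        return ("peer-request", opcode)
```

The point of the sketch is that the receive path never assumes the next frame answers the most recent request: it looks at the type bit first and then at the opaque.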

On the wire description

Everything that flows on the wire belongs to a frame, which is built up of a header, an optional extra header (referred to as the flex header) and a body. The first header describes the packet and contains all of the information a simple proxy should have to look at. The mandatory header also signals the presence of the optional flex header, which may be used to build extended features or carry extra information. Finally there is a body where the payload for the command should go. We made a design decision to keep the header and the flex header in our own format, and to use Google FlatBuffers/Protobuffers only in the payload. This allows proxies or command dispatchers to work with the frame transparently without having to decode the payload.

All values in the protocol are specified in network byte order.

Frame

All data visible on the network belongs to a frame with the following layout:

# bytes Description
4 Frame length. This is the number of bytes in the frame body (specified as n in the table).
n Frame body
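The length-prefixed framing above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the only facts taken from the text are the 4-byte length field and network byte order.

```python
import struct


def encode_frame(body: bytes) -> bytes:
    """Prefix a frame body with its 4-byte length in network byte order."""
    return struct.pack("!I", len(body)) + body


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete frame bodies.

    Returns (bodies, remainder), where remainder holds the bytes of a
    frame that has not fully arrived yet.
    """
    bodies = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # the rest of this frame is still on the network
        start = offset + 4
        bodies.append(buffer[start:start + length])
        offset = start + length
    return bodies, buffer[offset:]
```

A reader would keep the returned remainder and prepend it to the next chunk read from the socket.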

Frame body

The frame body consists of the following layout.

# bytes Description
4 Opaque
2 Opcode
1*n One or more flag bytes. See description below for the definition of the values (and how to determine the amount of flag bytes).
[ 2 ] Status code (see flag description)
[ 4 ] Flex header length (see flag description)
[ n ] Flex header (see flag description)
Rest Command payload

The minimum size of a packet is 7 bytes for a request and 9 bytes for a response (11 bytes and 13 including the frame header).
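The layout and the minimum sizes can be checked with a small encoder/decoder. This is a sketch under assumptions: "bit 0" of the flag byte is taken to be the least significant bit, the opaque is handled as an integer for convenience, only a single flag byte is supported, and all names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Assumption: bit 0 of the flag byte is the least significant bit.
FLAG_RESPONSE = 0x01  # bit 0: status code follows the flag section
FLAG_FLEX = 0x02      # bit 1: flex header length and flex header follow
FLAG_NEXT = 0x80      # bit 7: another flag byte follows (none defined)


@dataclass
class FrameBody:
    opaque: int
    opcode: int
    flags: int
    status: Optional[int]
    flex: bytes
    payload: bytes


def encode_body(opaque, opcode, payload=b"", status=None, flex=b"", extra_flags=0):
    """Build a frame body; passing a status makes it a response."""
    flags = extra_flags & ~(FLAG_RESPONSE | FLAG_FLEX | FLAG_NEXT)
    if status is not None:
        flags |= FLAG_RESPONSE
    if flex:
        flags |= FLAG_FLEX
    out = struct.pack("!IHB", opaque, opcode, flags)
    if status is not None:
        out += struct.pack("!H", status)
    if flex:
        out += struct.pack("!I", len(flex)) + flex
    return out + payload


def decode_body(body: bytes) -> FrameBody:
    """Parse a frame body that uses a single flag byte."""
    opaque, opcode, flags = struct.unpack_from("!IHB", body, 0)
    offset = 7
    if flags & FLAG_NEXT:
        raise ValueError("additional flag bytes are not defined yet")
    status = None
    if flags & FLAG_RESPONSE:
        (status,) = struct.unpack_from("!H", body, offset)
        offset += 2
    flex = b""
    if flags & FLAG_FLEX:
        (flex_len,) = struct.unpack_from("!I", body, offset)
        offset += 4
        flex = body[offset:offset + flex_len]
        if len(flex) != flex_len:
            raise ValueError("truncated flex header")
        offset += flex_len
    return FrameBody(opaque, opcode, flags, status, flex, body[offset:])
```

An empty request encodes to 7 bytes (4 + 2 + 1) and an empty response to 9 bytes (the extra 2 bytes are the status code), which matches the minimum sizes stated above.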

Opaque

The opaque field is an array of 4 bytes the “client” in a request may use as a personal reference to identify the request in the response. The “server” for a request must provide the same value in the response.

Opcode

The opcode is the actual action requested. It is defined per component:

Start Stop Component
0 1023 Generic
1024 2047 Memcached
2048 3071 Clients
3072 65k Unassigned

All opcodes should be defined in a document with a description of the opcode and its stability tag (volatile, uncommitted, committed). See http://www.lehman.cuny.edu/cgi-bin/man-cgi?attributes+5 for a description of volatile, uncommitted and committed (TODO we need to adapt those terms to our own definition and update the document with that)
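The opcode ranges from the table can be expressed as a simple lookup. This is an illustrative sketch: the function name is hypothetical, and "65k" is read as 65535 because the opcode field is two bytes wide.

```python
# (start, stop, component), both bounds inclusive, taken from the table above.
OPCODE_RANGES = [
    (0, 1023, "Generic"),
    (1024, 2047, "Memcached"),
    (2048, 3071, "Clients"),
    (3072, 0xFFFF, "Unassigned"),  # assumption: "65k" means 65535
]


def component_for(opcode: int) -> str:
    """Return the component that owns the given opcode."""
    if not 0 <= opcode <= 0xFFFF:
        raise ValueError("opcode does not fit in two bytes")
    for start, stop, name in OPCODE_RANGES:
        if start <= opcode <= stop:
            return name
    raise AssertionError("ranges cover the whole 16-bit space")
```

For example, HELLO (0x0001) falls in the Generic range and SELECT BUCKET (0x0400, decimal 1024) is the first opcode of the Memcached range.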

Description of flags

The flag section of the frame contains fields that are needed by protocol parsers that otherwise don’t need to decode the flex header. The flag section is defined in a future-extensible way by allowing additional flag bytes to be defined. There should however be a justification for adding features as flags rather than adding them to the flex header.

First flag byte

Bit Description
0 Type - If cleared this is a request, if set this is a response packet. For response packets a status code is present following the flag section
1 Presence of a flex header
2 Fence - All operations sent in the same lane (see flex header) prior to this command must be completed before the response for this packet is sent. Do not start processing more commands until this command is completed.
3 More - There will be more frames for this logical unit
4 Quiet - Do not send a response for this packet unless an error occurs
5 Unassigned
6 Unassigned
7 Presence of a next flag byte (none is currently defined)
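The flag table can be sketched as a set of bit masks plus a reader for the flag section. Two assumptions are made here that the text does not confirm: bit 0 is the least significant bit, and any future flag byte would use its own bit 7 to announce yet another byte. The names are hypothetical.

```python
import enum


class Flag(enum.IntFlag):
    """Bits of the first flag byte (assuming bit 0 is the lowest bit)."""
    RESPONSE = 1 << 0
    FLEX_HEADER = 1 << 1
    FENCE = 1 << 2
    MORE = 1 << 3
    QUIET = 1 << 4
    NEXT = 1 << 7


def read_flag_bytes(body: bytes, offset: int = 6):
    """Read the flag section of a frame body.

    The default offset of 6 skips the opaque (4 bytes) and the opcode
    (2 bytes). Returns (flag_bytes, offset_after_flags).
    """
    flags = []
    while True:
        if offset >= len(body):
            raise ValueError("frame body ends inside the flag section")
        byte = body[offset]
        offset += 1
        flags.append(byte)
        if not byte & Flag.NEXT:
            return flags, offset
```

A parser that only understands the first flag byte can still use the returned offset to skip unknown flag bytes and find the status code or flex header length.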

Status code

The status code is a two byte value. It is defined per component with the following range:

Start Stop Component
0 1023 Generic
1024 2047 Memcached
2048 3071 Clients
3072 65k Unassigned

The current set of status codes is defined in include/libgreenstack/Status.h

Flex header length

The flex header length field is only present if the bit for the presence of a flex header is set in the flags section. The flex header length contains the number of bytes in the flex header.

Flex header

The flex header allows for a future extensible way to pass arbitrary information to each command.

Format

Each entry in the flex header contains the following three mandatory attributes (the value may however be of 0 length).

Key (2 bytes) Length (2 bytes) Value (length bytes)
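Because every entry carries its own length, the flex header can be walked without knowing what any key means. A minimal sketch, with hypothetical function names; keeping the last value when a key repeats is an assumption, since the text does not say whether keys may repeat.

```python
import struct


def build_flex_header(entries: dict) -> bytes:
    """Serialize {key: value_bytes} as key/length/value entries."""
    out = b""
    for key, value in entries.items():
        out += struct.pack("!HH", key, len(value)) + value
    return out


def parse_flex_header(flex: bytes) -> dict:
    """Parse a flex header into {key: value_bytes}.

    Assumption: if a key occurs more than once, the last value wins.
    """
    entries = {}
    offset = 0
    while offset < len(flex):
        if len(flex) - offset < 4:
            raise ValueError("truncated flex entry header")
        key, length = struct.unpack_from("!HH", flex, offset)
        offset += 4
        value = flex[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated flex entry value")
        entries[key] = value
        offset += length
    return entries
```

As a usage example, a header carrying key 0x0007 with the two-byte value 42 followed by key 0x0005 with the one-byte value 1 is 11 bytes long: two 4-byte entry headers plus 3 bytes of values.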

Keys

The following keys are currently defined. The length field in the table below defines the legal value for the length and must be present even if it is specified as a fixed width. No knowledge of the keys should be necessary in order to parse the flex header to pick out a certain field.

NOTE: We won't implement all of these initially. They were added here as we thought of them, and they may be dropped or changed.

Value Key Length Description
0x0000 Lane ID Variable Specifies a logical channel (this information shall be present in a response). A logical channel shares the authentication context with the "root channel" and inherits all of the other properties from the root channel at creation time (but may change them to its own private values, like switching buckets within a memcached connection). A barrier bit applies to the lane, and there is no way to synchronize lanes apart from setting the barrier bit on all of them and waiting for all of the responses; you have no control over the order in which you receive the responses for the barrier.
0x0004 TXID Variable A transaction identifier
0x0005 Priority 1 The priority for the request. Lower is better
0x0006 DCP-ID Variable
0x0007 VBucket ID 2 The vbucket the document belongs to
0x0008 Hash 4 The raw hash value used to map the request to the vbucket id. This is used when you want to co-locate multiple related documents in the same vbucket. In these cases you’d hash with a common key, and this field should contain the calculated hash value.
0x0009 Ignore unless executed before 4 Ignore the command unless it is executed before the specified time (@@@ todo spec this properly @@@)
0x000a Command timings Variable Ignore the command unless it is executed before the specified time (@@@ todo spec this properly @@@)

Connection lifecycle

After connecting to the advertised port, the actor connecting to the port must start by sending the HELLO command to the other end to identify itself. Note that you may receive commands from the other end before (or instead of) the HELLO reply if the other end has other information it needs to notify you about (e.g. out of resources, not ready to accept clients at this time, etc.).

After a successful HELLO exchange you should normally authenticate to the other end if applicable.

Memcached connections

After you've identified yourself to memcached with the HELLO command you're not connected to any bucket, and have to run SELECT BUCKET in order to associate the connection with a bucket. By default you only have access to the "default" bucket, but if you authenticate to the server you may gain access to more buckets. This differs from the memcached binary protocol, where running SASL AUTH both authenticates and selects the bucket.
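The order of commands for a memcached connection can be written down as a short sketch. The opcodes are the ones listed in the command tables of this document; the function name and the `authenticate` parameter are made up for illustration, and encoding of the FlatBuffers payloads is left out.

```python
# Opcodes as listed in the command tables of this document.
HELLO = 0x0001
SASL_AUTH = 0x0002
SELECT_BUCKET = 0x0400


def handshake_opcodes(authenticate: bool) -> list:
    """Return the opcodes a client sends, in order, before using a bucket.

    HELLO always comes first. Authentication is optional (without it
    only the "default" bucket is accessible). SELECT BUCKET is sent
    last because authenticating no longer selects a bucket.
    """
    steps = [HELLO]
    if authenticate:
        steps.append(SASL_AUTH)
    steps.append(SELECT_BUCKET)
    return steps
```

A real client also has to cope with unsolicited frames from the other end arriving between any of these steps.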

Generic Commands

The following section defines all commands that are considered generic and may be implemented by multiple components.

I'll be using the terms client and server in the following chapters. A client is the party that initiates the connection, and the server is the party that the client connects to. It may very well be two servers communicating with each other.

HELLO

Attribute Value
Opcode 0x0001
Request payload payload/HelloRequest.fbs
Response payload payload/HelloResponse.fbs
Visibility Internal and External
Interface stability Volatile
Privileged No

SASL Auth

Attribute Value
Opcode 0x0002
Request payload payload/SaslAuthRequest.fbs
Response payload payload/SaslAuthResponse.fbs
Visibility Internal and External
Interface stability Volatile
Privileged No

Memcached Commands

The following section defines all commands memcached provides.

Mutation

Mutations in Greenstack differ from the memcached binary protocol in that they're all implemented through a single "mutation" command with an extra field in the command specifying the actual operation to perform. The motivation for doing this is that they all share the exact same code path within the memcached core, except for when the object is inserted into the underlying hash table. It is easier to add new kinds of mutations if we only have to update one location rather than updating the entire state machinery with a new opcode etc.

Subcommand Description
Add Add this document. Fail if it already exists (cas must be set to 0)
Set Store this document unconditionally
Replace Store this document only if a document with the same identifier already exists
Append Append the content of this document to the existing document.
Prepend Prepend the content of this document to the existing document.
Patch Apply the attached patch to the existing document

Attribute Value
Opcode 0x0405
Request payload payload/MutationRequest.fbs
Response payload payload/MutationResponse.fbs
Visibility Internal and External
Interface stability Volatile
Privileged No
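The idea of one mutation command with a subcommand can be sketched against a plain in-memory dictionary. This is not how memcached implements it: the names are hypothetical, CAS handling is left out, the wire values of the subcommands (defined in payload/MutationRequest.fbs) are not reproduced, and failing Append/Prepend on a missing document is an assumption drawn from the wording "existing document".

```python
import enum


class Subcommand(enum.Enum):
    ADD = "Add"
    SET = "Set"
    REPLACE = "Replace"
    APPEND = "Append"
    PREPEND = "Prepend"
    PATCH = "Patch"


def apply_mutation(store: dict, op: Subcommand, key: str, value: bytes) -> None:
    """Apply one mutation to an in-memory store through a single code path."""
    exists = key in store
    if op is Subcommand.ADD:
        if exists:
            raise KeyError("document already exists")
        store[key] = value
    elif op is Subcommand.SET:
        store[key] = value
    elif op in (Subcommand.REPLACE, Subcommand.APPEND, Subcommand.PREPEND):
        if not exists:
            raise KeyError("document does not exist")
        if op is Subcommand.REPLACE:
            store[key] = value
        elif op is Subcommand.APPEND:
            store[key] = store[key] + value
        else:
            store[key] = value + store[key]
    else:
        # The patch format is not specified in this document.
        raise NotImplementedError("Patch is not covered by this sketch")
```

Adding a new kind of mutation in this shape means adding one enum member and one branch, which mirrors the motivation given above.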

SELECT BUCKET

Attribute Value
Opcode 0x0400
Request payload payload/SelectBucketRequest.fbs
Response payload None
Visibility Internal and External
Interface stability Volatile
Privileged No

LIST BUCKETS

Attribute Value
Opcode 0x0401
Request payload None
Response payload payload/ListBucketsResponse.fbs
Visibility Internal and External
Interface stability Volatile
Privileged No

List buckets will only list the buckets you have access to.

Milestones

In order to track progress and make it easier for external parties to integrate with Greenstack, the development of the server follows the plan below.

It is a bit hard to set dates for some of the milestones at this time. As part of moving to Greenstack we'll be creating detailed documentation of the new commands, and we may have to change the engine API and write unit tests. A rough estimate would be 1 1/2 days per command on average. When I've added support for a few of them it's easier to predict the future (and the work involved).

Milestone Date Content
1 20150601 Minimal support for Greenstack. Clients may connect and authenticate, and select buckets on the server. This milestone creates the _infrastructure_ in memcached used by the following milestones.
2 20150701 Allow for storing and retrieving data
3 20150801 Support all commands specified in "Normal client access" profile.
4 TBD Support all admin commands
5 TBD Support DCP
6 TBD Support out of order replies (with barrier bits for the lanes)
7 TBD Performance measurement and optimizations

Development plan

Minimal support for Greenstack protocol in the server

Allow for configuration of a dedicated port (done)

Greenstack is enabled by using protocol=greenstack for the interface entry in memcached.json.

Extend ns_server to enable greenstack protocol

ns_server needs to enable the Greenstack protocol for new ports (plain and SSL). This is targeted for Milestone 3.

Add support for parsing Greenstack frames in memcached (milestone 1)

try_read_command needs to be aware of the Greenstack protocol and dispatch the opcodes to the right underlying protocol handler. Initially we don't bother to try to be smart with respect to buffer handling (that's planned for milestone 7, with a potential move to bufferevents in libevent).

We need to refactor the current executor pattern in the memcached core so that both protocols reuse the same internal functions to implement commands.

More to come...

TODOs

  • For encode I should allow for an iovector (so I don't have to copy the payload twice)

How to build

Unix

mkdir build
cd build
cmake ..
gmake all test install

Windows

mkdir build
cd build
cmake -G "NMake Makefiles" ..
nmake all test install

Feedback

Please send feedback to [email protected]
