Project Greenstack aims to build a unified protocol that all Couchbase components may use to communicate with and within the cluster. The protocol is designed to address the shortcomings of the memcached binary protocol while remaining simple and efficient. See the one-pager for a full description of the rationale behind creating a new protocol.
The protocol is full duplex, meaning that both parties may send and receive packets at all times. This differs from the memcached binary protocol where we had a notion of a client and a server, where the client would send requests to the server and the server would only send responses to the requests. Being able to send notifications from memcached to clients/ns_server is something we’ve missed from the binary protocol. Examples for use cases could be:
- I'm starting to run out of memory, please slow down.. Today we're just accepting data until we hit a threshold and we start refusing stuff... we could have told the client earlier to back off..
- I got a new vbucket configuration map..
- I've initiated a shutdown of the bucket.. expect it to go away..
- I'm currently doing warmup, I’m done doing warmup
- Send messages back to ns_server for things to pop up in the UI (e.g. we’ve had n incorrect logins in the last minute; is the app misconfigured or are we under “attack”?)
Note: This means that client authors need to be prepared to receive packets other than the response to their request when they fetch the next frame off the network, and to handle the command (which could be nothing more than replying with not supported/unknown command).
Another difference from the memcached binary protocol is that the end receiving the commands may process and send responses out of order (as long as the fence bit isn’t set).
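A minimal sketch of what this means for a client read loop, in Python. It assumes the header layout defined later in this document (4 byte opaque, 2 byte opcode, then a flag byte) and that bit 0 is the least significant bit of the flag byte. The `Dispatcher` class and its method names are hypothetical, and the sketch ignores the More flag.

```python
import struct
from typing import Callable, Dict

FLAG_RESPONSE = 0x01  # assumption: bit 0 is the least significant bit


class Dispatcher:
    """Route incoming frame bodies: responses by opaque, requests to a handler."""

    def __init__(self, on_command: Callable[[int, bytes], None]) -> None:
        self.pending: Dict[bytes, Callable[[bytes], None]] = {}
        self.on_command = on_command

    def expect(self, opaque: bytes, callback: Callable[[bytes], None]) -> None:
        """Remember who is waiting for the response carrying this opaque."""
        self.pending[opaque] = callback

    def feed(self, body: bytes) -> None:
        """Handle one frame body (the bytes following the 4 byte frame length)."""
        opaque = body[:4]
        opcode, flags = struct.unpack_from("!HB", body, 4)
        if flags & FLAG_RESPONSE:
            # Responses may arrive in any order, so match on opaque only.
            callback = self.pending.pop(opaque, None)
            if callback is not None:
                callback(body)
        else:
            # The other end sent us a command; the handler may simply
            # answer with "not supported".
            self.on_command(opcode, body)
```

The point of the design is that the reader never assumes the next frame is the answer to the last request it sent.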
Everything that flows on the wire belongs to a frame, which consists of a header, an optional extra header (referred to as the flex header) and a body. The mandatory header describes the packet and contains all of the information a simple proxy should need to look at. The optional flex header may be used to build extended features or carry extra information. Finally, the body carries the payload for the command. It is a deliberate design decision to keep the header and the flex header in our own format, and to use Google FlatBuffers/Protobuffers only in the payload. This allows proxies or command dispatchers to work with the frame transparently without having to decode the payload.
All values in the protocol are specified in network byte order.
All data visible on the network belongs to a frame with the following layout:
# bytes | Description |
---|---|
4 | Frame length. This is the number of bytes in the frame body (specified as n in the table). |
n | Frame body |
The frame body has the following layout:
# bytes | Description |
---|---|
4 | Opaque |
2 | Opcode |
1*n | One or more flag bytes. See description below for the definition of the values (and how to determine the amount of flag bytes). |
[ 2 ] | Status code (see flag description) |
[ 4 ] | Flex header length (see flag description) |
[ n ] | Flex header (see flag description) |
Rest | Command payload |
The minimum size of a packet is 7 bytes for a request and 9 bytes for a response (11 bytes and 13 including the frame header).
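The layout above can be sketched in Python as follows. The function names are hypothetical, the flag values assume that bit 0 is the least significant bit of the flag byte, and only a single flag byte is produced. The flex header and payload are treated as already-encoded bytes.

```python
import struct
from typing import Optional, Tuple

FLAG_RESPONSE = 0x01     # bit 0 (assumed to be the least significant bit)
FLAG_FLEX_HEADER = 0x02  # bit 1


def encode_frame(opaque: int, opcode: int, payload: bytes = b"",
                 status: Optional[int] = None, flex: bytes = b"") -> bytes:
    """Build a complete frame; passing a status makes it a response."""
    flags = 0
    if status is not None:
        flags |= FLAG_RESPONSE
    if flex:
        flags |= FLAG_FLEX_HEADER
    # "!" selects network byte order without padding.
    body = struct.pack("!IHB", opaque, opcode, flags)
    if status is not None:
        body += struct.pack("!H", status)
    if flex:
        body += struct.pack("!I", len(flex)) + flex
    body += payload
    return struct.pack("!I", len(body)) + body


def read_frame(buffer: bytes) -> Tuple[Optional[bytes], bytes]:
    """Return (frame body, remaining bytes), or (None, buffer) if incomplete."""
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack_from("!I", buffer, 0)
    if len(buffer) < 4 + length:
        return None, buffer
    return buffer[4:4 + length], buffer[4 + length:]
```

An empty request encodes to 11 bytes and an empty response to 13 bytes, which matches the minimum sizes stated above.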
The opaque field is an array of 4 bytes the “client” in a request may use as a personal reference to identify the request in the response. The “server” for a request must provide the same value in the response.
The opcode is the actual action requested. It is defined per component:
Start | Stop | Component |
---|---|---|
0 | 1023 | Generic |
1024 | 2047 | Memcached |
2048 | 3071 | Clients |
3072 | 65k | Unassigned |
All opcodes should be defined in a document with a description of the opcode and its stability tag (volatile, uncommitted, committed). See http://www.lehman.cuny.edu/cgi-bin/man-cgi?attributes+5 for a description of volatile, uncommitted and committed (TODO we need to adapt those terms to our own definition and update the document with that)
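As an illustration, the range table can be turned into a small lookup. This is only a sketch: the names are hypothetical, and "65k" is interpreted as 0xFFFF, the largest value that fits in the two byte field. The status code ranges later in this document use the same boundaries.

```python
COMPONENT_RANGES = [
    (0, 1023, "Generic"),
    (1024, 2047, "Memcached"),
    (2048, 3071, "Clients"),
    (3072, 0xFFFF, "Unassigned"),  # assumption: "65k" means 0xFFFF
]


def component_for(code: int) -> str:
    """Return the component that owns the given opcode."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("opcode does not fit in two bytes: %r" % (code,))
    for start, stop, name in COMPONENT_RANGES:
        if start <= code <= stop:
            return name
    raise AssertionError("ranges cover the whole two byte space")
```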
The flag section of the frame contains fields that are needed by protocol parsers that otherwise don’t need to decode the flex header. The flag section is extensible: additional flag bytes may be defined in the future. There should, however, be a justification for adding a feature as a flag rather than adding it to the flex header.
Bit | Description |
---|---|
0 | Type - If cleared this is a request, if set this is a response packet. For response packets a status code is present following the flag section |
1 | Presence of a flex header |
2 | Fence - All operations sent in the same lane (see flex header) prior to the presence of this command must be completed before the response for this packet is sent. Do not start processing more commands until this command is completed. |
3 | More - There will be more frames for this logical unit |
4 | Quiet - Do not send a response for this packet unless an error occurs |
5 | Unassigned |
6 | Unassigned |
7 | Presence of a next flag byte (none is currently defined) |
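A possible way to decode the flag section in Python is sketched below. It assumes that bit 0 is the least significant bit of the flag byte (the text does not say which end the numbering starts from), and the names `Flags` and `parse_flags` are hypothetical. Since no second flag byte is defined yet, any extra flag bytes are skipped rather than interpreted.

```python
from typing import NamedTuple, Tuple


class Flags(NamedTuple):
    response: bool
    flex_header: bool
    fence: bool
    more: bool
    quiet: bool


def parse_flags(body: bytes, offset: int = 6) -> Tuple[Flags, int]:
    """Parse the flag bytes of a frame body.

    The default offset skips the opaque (4 bytes) and the opcode (2 bytes).
    Returns the decoded flags and the offset of the first byte after the
    flag section. Raises IndexError if the body is truncated.
    """
    first = body[offset]
    current = first
    offset += 1
    while current & 0x80:  # bit 7: another flag byte follows
        current = body[offset]
        offset += 1
    flags = Flags(
        response=bool(first & 0x01),
        flex_header=bool(first & 0x02),
        fence=bool(first & 0x04),
        more=bool(first & 0x08),
        quiet=bool(first & 0x10),
    )
    return flags, offset
```

The returned offset tells the caller where the status code (for responses) or the flex header length starts.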
The status code is a two byte value. It is defined per component with the following range:
Start | Stop | Component |
---|---|---|
0 | 1023 | Generic |
1024 | 2047 | Memcached |
2048 | 3071 | Clients |
3072 | 65k | Unassigned |
The current set of status codes is defined in include/libgreenstack/Status.h
The flex header length field is only present if the bit for the presence of a flex header is set in the flags section. The flex header length contains the number of bytes in the flex header.
The flex header allows for a future extensible way to pass arbitrary information to each command.
Each entry in the flex header contains the following three mandatory attributes (the value may however be of 0 length).
Key (2 bytes) | Length (2 bytes) | Value (length bytes) |
---|---|---|
The following keys are currently defined. The length field in the table below defines the legal value for the length and must be present even if it is specified as a fixed width. No knowledge of the keys should be necessary in order to parse the flex header to pick out a certain field.
NOTE: We won't implement all of these initially. They were added here as we thought of them, and they may be dropped or changed.
Value | Key | Length | Description |
---|---|---|---|
0x0000 | Lane ID | Variable | Specifies a logical channel (this information shall be present in a response). A logical channel shares the authentication context with the "root channel" and inherits all of the other properties from the root channel at creation time, but may change them to its own private values (like switching buckets within a memcached connection). A barrier (fence) bit applies to the lane only, and there is no way to synchronize lanes, apart from setting the barrier bit on all of them and waiting for all of the responses; you have no control over the order in which you receive the responses for the barrier. |
0x0004 | TXID | Variable | A transaction identifier |
0x0005 | Priority | 1 | The priority for the request. Lower is better |
0x0006 | DCP-ID | Variable | |
0x0007 | VBucket ID | 2 | The vbucket the document belongs to |
0x0008 | Hash | 4 | The raw hash value used to map the request to the vbucket id. This is used when you want to co-locate multiple related documents in the same vbucket. In these cases you’d hash with a common key, and this field should contain the calculated hash value. |
0x0009 | Ignore unless executed before | 4 | Ignore the command unless it is executed before the specified time (@@@ todo spec this properly @@@) |
0x000a | Command timings | variable | Ignore the command unless it is executed before the specified time (@@@ todo spec this properly @@@) |
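Because every entry carries its own length, the flex header can be walked without knowing any of the keys. A minimal Python sketch, with hypothetical function names:

```python
import struct
from typing import List, Tuple


def encode_flex_entry(key: int, value: bytes) -> bytes:
    """Encode one entry: 2 byte key, 2 byte length, then the value."""
    return struct.pack("!HH", key, len(value)) + value


def parse_flex_header(flex: bytes) -> List[Tuple[int, bytes]]:
    """Split a flex header into (key, value) pairs, keeping their order."""
    entries = []
    offset = 0
    while offset < len(flex):
        if len(flex) - offset < 4:
            raise ValueError("truncated flex header entry")
        key, length = struct.unpack_from("!HH", flex, offset)
        offset += 4
        if len(flex) - offset < length:
            raise ValueError("flex header value runs past the end")
        entries.append((key, flex[offset:offset + length]))
        offset += length
    return entries
```

A proxy that only cares about, say, the VBucket ID (key 0x0007) can pick that entry out of the list and pass every other entry through untouched.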
After connecting to the advertised port, the connecting party must start by sending the `HELLO` command to the other end to identify itself. Note that you may receive commands from the other end before (or instead of) the `HELLO` reply if the other end has other information it needs to notify you about (e.g. out of resources, not ready to accept clients at this time, etc.).
After a successful `HELLO` exchange you should normally authenticate to the other end, if applicable.
After you've identified yourself to memcached with the `HELLO` command you're not connected to any bucket, and have to run `SELECT BUCKET` in order to associate the connection with a bucket. By default you only have access to the "default" bucket, but if you authenticate to the server you may gain access to more buckets. This differs from the memcached binary protocol, where running `SASL AUTH` both authenticates and selects the bucket.
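The order of the requests can be sketched as below, using the opcodes listed in the command tables later in this document. The function names are hypothetical, and the payloads are assumed to be FlatBuffers that were serialized elsewhere; here they are just opaque bytes. The sketch only builds the frames in order. A real client would wait for each response before sending the next request.

```python
import struct
from typing import List, Optional

OP_HELLO = 0x0001
OP_SASL_AUTH = 0x0002
OP_SELECT_BUCKET = 0x0400


def request(opaque: int, opcode: int, payload: bytes = b"") -> bytes:
    """Build a request frame with a single, all-clear flag byte."""
    body = struct.pack("!IHB", opaque, opcode, 0) + payload
    return struct.pack("!I", len(body)) + body


def connection_setup(hello: bytes, select_bucket: bytes,
                     sasl_auth: Optional[bytes] = None) -> List[bytes]:
    """Return the frames a client sends after connecting, in order."""
    steps = [(OP_HELLO, hello)]
    if sasl_auth is not None:
        # Authentication is optional; without it only the "default"
        # bucket is accessible.
        steps.append((OP_SASL_AUTH, sasl_auth))
    steps.append((OP_SELECT_BUCKET, select_bucket))
    return [request(opaque, opcode, payload)
            for opaque, (opcode, payload) in enumerate(steps, start=1)]
```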
The following section defines all commands that are considered generic and may be implemented by multiple components.
I'll be using the terms client and server in the following chapters. A client is the party that initiates the connection, and the server is the party that the client connects to. It may very well be two servers communicating with each other.
Attribute | Value |
---|---|
Opcode | 0x0001 |
Request payload | payload/HelloRequest.fbs |
Response payload | payload/HelloResponse.fbs |
Visibility | Internal and External |
Interface stability | Volatile |
Privileged | No |
Attribute | Value |
---|---|
Opcode | 0x0002 |
Request payload | payload/SaslAuthRequest.fbs |
Response payload | payload/SaslAuthResponse.fbs |
Visibility | Internal and External |
Interface stability | Volatile |
Privileged | No |
The following section defines all commands memcached provides.
Mutations in Greenstack differ from the memcached binary protocol in that they're all implemented through a single "mutation" command with an extra field in the command specifying the actual operation to perform. The motivation is that they all share the exact same code path within the memcached core, except for when the object is inserted into the underlying hash table. Adding a new kind of mutation is easier if we only have to update one location rather than the entire state machinery with a new opcode etc.
Subcommand | Description |
---|---|
Add | Add this document. Fail if it already exists (cas must be set to 0) |
Set | Store this document unconditionally |
Replace | Store this document only if a document with the same identifier already exists |
Append | Append the content of this document to the existing document. |
Prepend | Prepend the content of this document to the existing document. |
Patch | Apply the attached patch to the existing document |
Attribute | Value |
---|---|
Opcode | 0x0405 |
Request payload | payload/MutationRequest.fbs |
Response payload | payload/MutationResponse.fbs |
Visibility | Internal and External |
Interface stability | Volatile |
Privileged | No |
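To illustrate the idea of one mutation command with a subcommand field, here is a toy Python sketch against an in-memory dictionary. It is not how memcached implements it. The enum values, `MutationError` and `mutate` are hypothetical, CAS handling is left out, and it assumes that Append, Prepend and Patch fail when the document does not exist.

```python
from enum import Enum, auto
from typing import Dict


class Subcommand(Enum):
    # The numeric values are placeholders; the real ones live in the
    # MutationRequest schema.
    ADD = auto()
    SET = auto()
    REPLACE = auto()
    APPEND = auto()
    PREPEND = auto()
    PATCH = auto()


class MutationError(Exception):
    pass


NEEDS_EXISTING = (Subcommand.REPLACE, Subcommand.APPEND,
                  Subcommand.PREPEND, Subcommand.PATCH)


def mutate(store: Dict[str, bytes], op: Subcommand, key: str,
           value: bytes) -> None:
    """Single entry point shared by all mutation subcommands."""
    exists = key in store
    if op is Subcommand.ADD and exists:
        raise MutationError("document already exists")
    if op in NEEDS_EXISTING and not exists:
        raise MutationError("document does not exist")
    # Only this step differs between the subcommands.
    if op is Subcommand.APPEND:
        value = store[key] + value
    elif op is Subcommand.PREPEND:
        value = value + store[key]
    elif op is Subcommand.PATCH:
        raise NotImplementedError("the patch format is not specified here")
    store[key] = value
```

Adding a new kind of mutation in this shape means adding one enum member and one branch, without touching the rest of the path.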
Attribute | Value |
---|---|
Opcode | 0x0400 |
Request payload | payload/SelectBucketRequest.fbs |
Response payload | None |
Visibility | Internal and External |
Interface stability | Volatile |
Privileged | No |
Attribute | Value |
---|---|
Opcode | 0x0401 |
Request payload | None |
Response payload | payload/ListBucketsResponse.fbs |
Visibility | Internal and External |
Interface stability | Volatile |
Privileged | No |
List buckets will only list the buckets you have access to.
In order to track progress and make it easier for external parties to integrate with Greenstack, development of the server follows the plan below.
It is a bit hard to set dates for some of the milestones at this time. As part of moving to Greenstack we'll be creating detailed documentation of the new commands, and may have to change the engine API and write unit tests. A rough estimate is 1 1/2 days per command on average. Once I've added support for a few of them it will be easier to predict the work involved.
Milestone | Date | Content |
---|---|---|
1 | 20150601 | Minimal support for Greenstack. Clients may connect and authenticate, and select buckets on the server. This milestone creates the _infrastructure_ in memcached used by the following milestones. |
2 | 20150701 | Allow for storing and retrieving data |
3 | 20150801 | Support all commands specified in "Normal client access" profile. |
4 | TBD | Support all admin commands |
5 | TBD | Support DCP |
6 | TBD | Support out of order replies (with barrier bits for the lanes) |
7 | TBD | Performance measurement and optimizations |
Greenstack is enabled by using `protocol=greenstack` for the interface entry in `memcached.json`.
ns_server needs to enable the Greenstack protocol on new ports (plain and SSL). This is targeted for Milestone 3.
`try_read_command` needs to be aware of the Greenstack protocol and dispatch the opcodes to the right underlying protocol handler. Initially we won't try to be smart with respect to buffer handling (that's planned for milestone 7, with a potential move to bufferevents in libevent).
We need to refactor the current executor pattern in the memcached core so that both protocols reuse the same internal functions to implement commands.
- For encode I should allow for an iovector (so I don't have to copy the payload twice)
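The idea behind the iovector item can be illustrated in Python (the library itself is not written in Python, so this only shows the shape of the API). Instead of copying the payload into one contiguous buffer, the encoder returns the packed header and the untouched payload as separate buffers. The name `encode_iov` is hypothetical and the sketch only handles a plain request without a flex header.

```python
import struct
from typing import List


def encode_iov(opaque: int, opcode: int, payload: bytes) -> List[bytes]:
    """Return [header, payload] without copying the payload."""
    # Frame length (7 byte minimum body + payload), opaque, opcode, flags.
    header = struct.pack("!IIHB", 7 + len(payload), opaque, opcode, 0)
    return [header, payload]
```

On platforms where Python provides `socket.sendmsg`, the list can be handed to it directly, e.g. `sock.sendmsg(encode_iov(1, 0x0405, payload))`. Note that `sendmsg` may send fewer bytes than requested, so a real sender has to handle partial writes.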
mkdir build
cd build
cmake ..
gmake all test install
mkdir build
cd build
cmake -G "NMake Makefiles" ..
nmake all test install
Please send feedback to [email protected]