
gdss-brick's Introduction

Welcome to Hibari

A Distributed, Consistent, Ordered Key-Value Store

Hibari is a distributed, ordered key-value store with strong consistency guarantees. Hibari is written in Erlang and designed to be:

  • Fast, Read Optimized: Hibari serves read and write requests with short, predictable latency. Hibari has excellent performance, especially for reads and large-value operations

  • High Bandwidth: Batch and lock-less operations help to achieve high throughput while ensuring data consistency and durability

  • Big Data: Can store petabytes of data by automatically distributing it across servers. The largest production Hibari cluster spans 100s of servers

  • Reliable: High fault tolerance by replicating data between servers. Data is repaired automatically after a server failure

Hibari is able to deliver scalable high performance that is competitive with leading open source NOSQL (Not Only SQL) storage systems, while also providing the data durability and strong consistency that many systems lack. Hibari's performance relative to other NOSQL systems is particularly strong for reads and for large value (> 200KB) operations.

As one example of real-world performance, in a multi-million user webmail deployment equipped with traditional HDDs (non SSDs), Hibari is processing about 2,200 transactions per second, with read latencies averaging between 1 and 20 milliseconds and write latencies averaging between 20 and 80 milliseconds.

Distinct Features

Unlike many other distributed databases, Hibari uses the "chain replication" methodology, which gives it several distinct features.

  • Ordered Key-Values: Data is distributed across "chains" by key prefixes, and keys within a chain are sorted in lexicographic order

  • Always Guarantees Strong Consistency: This simplifies creation of robust client applications

    • Compare and Swap (CAS): a key timestamping mechanism that facilitates "test-and-set" style operations
    • Micro-Transaction: multi-key atomic transactions, within range limits (see the sketch after this list)
  • Custom Metadata: per-key custom metadata

  • TTL (Time To Live): per-key expiration times
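
As a quick illustration of the last two consistency features, here is a minimal sketch using the native Erlang client. It assumes brick_simple:set/4 accepts a {testset, TS} flag and that brick_simple:do/2 takes brick_server:make_* operation constructors; check the Hibari Application Developer Guide for the exact signatures.

%% Compare-and-swap: the second write succeeds only while the key's
%% timestamp still matches TS.
> {ok, TS} = brick_simple:add(tab1, <<"foo/1">>, <<"v1">>).
> brick_simple:set(tab1, <<"foo/1">>, <<"v2">>, [{testset, TS}]).

%% Micro-transaction: all-or-nothing, for keys stored on the same chain.
> brick_simple:do(tab1, [brick_server:make_txn(),
                         brick_server:make_delete(<<"foo/1">>),
                         brick_server:make_add(<<"foo/2">>, <<"v2">>)]).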

Travis CI Status

http://travis-ci.org/hibari/hibari-ci-wrapper

Branch                              Erlang/OTP Versions     Remarks
master                              17.5, R16B03-1
dev                                 18.1, 17.5, R16B03-1
hibari-gh54-thrift-api              18.1, 17.5, R16B03-1
gbrick-gh17-redesign-disk-storage   18.1, 17.5              no tests, compile only

News

  • Apr 5, 2015 - Hibari v0.1.11 Released. Release Notes

    • Update for Erlang/OTP 17 and R16. (Note: Erlang/OTP releases prior to R16 are no longer supported)
    • Update external libraries such as UBF to the latest versions
    • Enhanced client API: server side rename and server side timestamp
    • New logging format. Introduce Basho Lager for more traditional logging that plays nicely with Unix logging tools like logrotate and syslog
  • Feb 4, 2013 - Hibari v0.1.10 Released. Release Notes

    • A bug fix in Python EBF Client
    • Update for Erlang/OTP R15
    • Support for building on Ubuntu, including ARMv7 architecture
    • Remove S3 and JSON-RPC components from Hibari distribution. They will become separate projects
  • Older News

Quick Start

Please read the Getting Started section of the Hibari Application Developer Guide.

Hibari Documentation

The documents are a bit outdated -- sorry, but a documentation rework is planned for Hibari v0.6.

Mailing Lists

Hibari Clients

As of Hibari v0.1 (since 2010), only the native Erlang client is used in production. All other client APIs (Thrift, JSON-RPC, UBF, and S3) are still at the proof-of-concept stage and implement only basic operations.

If you need a client library for another programming language, please feel free to post a request on the Hibari mailing list.

Supported Platforms

Hibari is written in pure Erlang/OTP and runs on many Unix/Linux platforms.

Please see the Supported Platforms page in Hibari Wiki for details.

Roadmap

Please see the Roadmap page in Hibari Wiki for the planned features for Hibari v0.3, v0.5, and v0.6.

Hibari's Origins

Hibari was originally written by Cloudian, Inc., formerly Gemini Mobile Technologies, to support mobile messaging and email services. Hibari was open-sourced under the Apache Public License version 2.0 in July 2010.

Hibari has been deployed by multiple telecom carriers in Asia and Europe. Hibari may lack some features such as monitoring, event and alarm management, and other "production environment" support services. Since each telecom operator has its own data center support infrastructure, Hibari's development has not included many services that would be redundant in a carrier environment.

We hope that Hibari's release to the open source community will close those functional gaps as Hibari spreads outside of carrier data centers.

What does Hibari mean?

The word "Hibari" means skylark in Japanese; the Kanji characters stand for "cloud bird".

License

Copyright (c) 2005-2017 Hibari developers.  All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Note for License

The Hibari project has decided to display "Hibari developers" as the copyright holder name in the source code files and manuals. Actual copyright holder names (contributors) are listed in the AUTHORS file.


gdss-brick's People

Contributors

kinogmt, norton, tatsuya6502


gdss-brick's Issues

Add attrib and exp_time directive flags to rename/6

The rename/6 operation introduced by #2 does not preserve the custom properties (metadata) and expiration time of the old key-value. This won't fit all use cases, and I personally feel more comfortable preserving such info by default.

Add operation flag(s) to rename/6 so that the user can choose whether to preserve the metadata and/or expiration time.

For example:

  • {metadata_directive, keep} -- default
  • {metadata_directive, replace}
  • {exp_directive, keep} -- default
  • {exp_directive, replace}
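
A call preserving the metadata but replacing the expiration time might then look like this (a sketch only; the brick_simple:rename/6 wrapper and its (Tab, OldKey, NewKey, ExpTime, Flags, Timeout) argument order are assumptions):

%% Hypothetical usage of the proposed directive flags.
> brick_simple:rename(tab1, <<"mbox/old">>, <<"mbox/new">>, 0,
                      [{metadata_directive, keep}, {exp_directive, replace}],
                      5000).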

Target milestone: v0.3

(Edit: Mar 16) Change the subject from "Add metadata directive ..." to "Add attrib and exp_time directive ..."

set and replace txn doesn't fail (as expected) ...

Generally speaking, the expectation for Hibari microtransactions is to be all or nothing. If one of the operations fails, the remaining operations should have no effect. Unfortunately, it has been confirmed that set and replace operations do not follow this expectation with certain types of timestamp inputs.

A newly added (manual) unit test illustrates the unexpected behavior:

https://github.com/hibari/gdss-admin/blob/dev/test/eunit/brick_txn_tests.erl

Further analysis is required to better understand the possible impacts to Hibari's application callers.

Experimental brick server alternative with on-disk metadata DB

Create an experimental, alternative brick server module to brick_ets that does not use an in-memory ETS table (ctab) to store the metadata part of key-values, but instead uses the on-disk (HyperLevelDB) metadata DB introduced by GH17 (redesign disk storage).

The current brick_ets only reads from HyperLevelDB during start-up and stores copies of all keys in an in-memory ETS table. This alternative module will not use the ETS table and will read from HyperLevelDB for each read request from a client. It may cache some frequently used metadata in a 2Q cache (a scan-resistant refinement of an LRU cache), available as an Erlang NIF: arekinath/e2qc.

I have a project that requires a low memory footprint and does not demand super-low read latency or in-memory key scans. A couple of months ago, I tried to load 50 million key-values into a single-node Hibari, but the top command showed it took 9.149 GB of RES memory, which was not acceptable. One reason for that was that the keys were big (53 bytes), so I will need to unload all or the majority of the metadata from memory.

I believe the ultimate goal of redesigning the disk storage will be having tiered storage per Hibari table, because we can use micro-transactions within a brick. But for now, I want to quickly put together an on-disk metadata storage based brick server module.
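
For reference, the read path through e2qc could look roughly like this (a sketch only; brick_md_cache and metadata_db_fetch/2 are hypothetical names):

%% Read-through 2Q cache in front of the on-disk metadata DB.
%% e2qc:cache/3 runs the fun on a cache miss and stores the result.
get_metadata(BrickName, Key) ->
    e2qc:cache(brick_md_cache, {BrickName, Key},
               fun() -> metadata_db_fetch(BrickName, Key) end).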

Store small values in store-tuples instead of blob hunk logs

For both in-memory and disk space efficiency, Hibari v0.3 should store small values (less than 64 bytes or so) in in-memory store-tuples instead of blob hunk logs.

Here are some comparisons in R16B03, 64 bit Erlang VM:

Hibari v0.3

storage location record #w:
e.g.

#w{wal_seqnum=1000, wal_hunk_pos=1000000000,
   private_seqnum=1000, private_hunk_pos=1000000000, 
   val_offset=50, val_len=200}
  • 64 bytes (in-memory) [*1]
  • 31 bytes (on-disk) [*2]

storage location record #p:
e.g.

#p{seqnum=1000, hunk_pos=1000000000,
   val_offset=50, val_len=200}
  • 48 bytes (in-memory)
  • 21 bytes (on-disk)

blob hunk:

  • byte_size(Value) + hunk overhead
  • hunk overhead varies because Hibari v0.3 will try to store multiple blob values in one hunk (minimum 4KB or so).
  • The overhead of one hunk (type <<"p">>) is (30 + 4 * number of blobs) bytes (including 16 bytes md5 hash)
  • e.g. value blob size = 4 bytes, one hunk holds 500 values.
    • hunk overhead per blob = (30 + 4 * 500) / 500 = 4.06 bytes. (101.5%)
  • e.g. value blob size = 64 bytes, one hunk holds 60 values.
    • hunk overhead per blob = (30 + 4 * 60) / 60 = 4.5 bytes. (7.0%)

(FYI) Hibari v0.1.x

storage location tuple:
e.g. {1000,1000000000}

  • 24 bytes (in-memory)
  • 13 bytes (on-disk)

blob hunk:

  • byte_size(Value) + 37 bytes as hunk overhead (including 16 bytes md5 hash)

Footnotes:

*1: Calculated by the following:

> Wal = #w{wal_seqnum=1000, wal_hunk_pos=1000000000,
           private_seqnum=1000, private_hunk_pos=1000000000,
           val_offset=50, val_len=200}.
> erts_debug:size(Wal) * erlang:system_info(wordsize).
64

*2: Calculated by the following:

> byte_size(term_to_binary(Wal)).
31
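
The per-blob hunk overhead figures above can be reproduced the same way:

> PerBlobOverhead = fun(NumBlobs) -> (30 + 4 * NumBlobs) / NumBlobs end.
> PerBlobOverhead(500).
4.06
> PerBlobOverhead(60).
4.5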

CL29834 - squidflash_prime1 could crash hlog common when do_sync is false

squidflash_prime1 always calls bigdata_dir_get_val with do_sync=true. However, bigdata_dir_get_val can get eof when the table is configured with do_sync=false. When this happens, squidflash_prime1 calls sequence_file_is_bad, which crashes the hlog common server.

Unit test brick_test0:t1 sometimes fails due to this bug.

brick_server new client API - copy/6

To help a client application implement a "copy on write" snapshot feature, add the following lightweight server-side operations. These operations should be very efficient because they are metadata-only operations and do not have to read/copy the value on disk.

  • copy/6 - copy the value and user metadata to other key(s).
    • copy(table(), From::key(), To::key(), exp_time(), [do_op_flag()], timeout())
    • copy(table(), From::key(), To::[key()], exp_time(), [do_op_flag()], timeout())

Limitation:

  • From::key() and To::key() must have the same key prefix (they must be stored in the same chain.)

Design Notes:

  • To be a lightweight operation, use the in-memory operation ?KEY_SWITCHAROO to copy only the pointer to the value.
  • When an hlog file containing the From key is frozen, do not use ?KEY_SWITCHAROO. Instead, do a disk-based operation: load the value of the From key from an hlog on disk and put the value to the To key(s). More details on hibari/hibari#33 (comment)
  • Maybe we want to remove the rename/6 operation from brick_server? The same thing can be done with a micro-transaction: do([txn, copy(OldKey, NewKey), delete(OldKey)]). Note that the current implementation of rename/6 is not an equivalent of the micro-transaction but a simple do operation: do([copy(OldKey, NewKey), delete(OldKey)]). One good reason to keep rename/6 is that it has its own operation counter for the brick operation statistics, but that does not seem to be a must-have feature.

Refactor #state record in brick_ets

From Hibari Contributor's Guide - Draft
https://github.com/hibari/hibari-doc/blob/master/src/hibari/hibari-contributor-guide.en.txt#L729

NOTE: The #state record is probably too big. Profiling suggests
that a substantial amount of CPU time is being spent in
erlang:setelement(); I'm guessing that's related to #state record
updates, but I'm not 100% certain. There are about 43 members in that
record, so refactoring by moving some items (e.g. operation counts
like n_add) to the process dictionary or ETS is likely a good idea.
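
One possible direction, sketched below under the assumption that the counters do not need to live in #state at all: keep the hot operation counters in a private ETS table, so bumping them no longer rewrites the whole record via erlang:setelement/3 (the table and function names are hypothetical).

%% Hypothetical sketch: operation counters outside the #state record.
init_op_counters() ->
    Tid = ets:new(brick_op_counters, [set, private]),
    ets:insert(Tid, [{n_add, 0}, {n_get, 0}, {n_set, 0}, {n_delete, 0}]),
    Tid.

bump_op_counter(Tid, Counter) ->
    ets:update_counter(Tid, Counter, 1).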

gmt_hlog families should track each hlog's location (short-term v.s. long-term)

During code review, I found that gmt_hlog_common and gmt_hlog_local don't track each hlog's location (whether it is in short-term or long-term storage), so gmt_hlog:open_log_file/2 has to call file:open/2 twice if the hlog has been moved to long-term storage.

https://github.com/hibari/gdss-brick/blob/master/src/gmt_hlog.erl#L1104

open_log_file(Dir, N, Options) ->
    Path = log_file_path(Dir, N),
    case file:open(Path, [binary, raw|Options]) of
        {error, enoent} when N > 0 ->
            %% Retry: perhaps this file was moved to long-term storage?
            ?DBG_TLOGx({open_log_file, Dir, N, Options, Path, retry}),
            open_log_file(Dir, -N, Options);
        Res ->
            ?DBG_TLOGx({open_log_file, Dir, N, Options, Path, Res}),
            Res
    end.

This is not an optimal solution. Although moving an hlog is done by a separate process and this kind of code block can't be removed completely, these modules should remember the locations so they can do better next time.
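
A minimal sketch of what "remembering the location" could look like (hypothetical function names; the negative-N convention for long-term files is taken from the retry branch above):

%% Track each hlog's location so the file can be opened on the first try.
note_location(SeqNum, Location, LocDict)
  when Location =:= short_term; Location =:= long_term ->
    orddict:store(SeqNum, Location, LocDict).

open_log_file_tracked(Dir, N, Options, LocDict) ->
    case orddict:find(N, LocDict) of
        {ok, long_term} -> open_log_file(Dir, -N, Options);
        _               -> open_log_file(Dir, N, Options)
    end.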

brick_server new client API - rename

Rename an OldKey/Value pair to a Key/Value pair in a brick, failing if OldKey does not already exist or if Key already exists. Similar to get_many, the rename API only works for keys on the same chain.

valid return formats for get_many

The spec for get_many indicates the valid success return formats as:

{ok, {[{key(), ts()}], boolean()}} |
{ok, {[{key(), ts(), flags_list()}], boolean()}} |
{ok, {[{key(), ts(), val(), time_t(), flags_list()}], boolean()}} |

However, it seems like there should also be:

{ok, {[{key(), ts(), val()}], boolean()}} |

This would be the format if the request used neither the witness flag nor the get_all_attribs flag.
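
For example, a get_many request without the witness or get_all_attribs flags (sketched here with the brick_simple wrapper; keys, timestamps, and values are illustrative) would be expected to return that shape:

> brick_simple:get_many(tab1, <<>>, 2).
{ok,{[{<<"foo/1">>,1271542959131192,<<"v1">>},
      {<<"foo/2">>,1271542959131200,<<"v2">>}],
     false}}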

brick_server new client API - sync_wal (per chain)

I have not figured out what the best API set for this would be, but I am thinking of adding a sync_wal operation, which is the analogue of fsync in a file system. (wal: write-ahead log, a.k.a. the common hlog in Hibari)

As of v0.1.x, Hibari only has an option to enable or disable group commit at the table level. A sync_wal operation (and maybe a sync_wal do flag for write operations too) will make it possible to flush the OS's dirty write buffer of the common hlog to stable storage (HDD or SSD) only when necessary. (A usage sketch follows the list below.)

They might look like:

  • sync_wal/3
    • flush the dirty write buffer of the common hlog to disk, on each chain of bricks assigned to the binary key prefix(es).
    • sync_wal(table(), key_prefix()|[key_prefix()], timeout())
  • do_op_flag sync_wal | {sync_wal, boolean()}
    • flag for a do batch operation.
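
A hypothetical usage sketch of the proposal (both calls below illustrate proposed, not existing, API):

%% Flush everything written under the <<"user1/">> prefix to stable storage.
> brick_simple:sync_wal(tab1, <<"user1/">>, 5000).

%% Or request a synced write for a single batch via the proposed do flag.
> brick_simple:set(tab1, <<"user1/mbox">>, <<"hello">>, [sync_wal]).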

migration sweep key boundary code review

I'm currently reviewing some code around the migration sweep keys. I've noticed some inconsistency for 2 particular edge cases below. As far as I can see, they are not critical issues and don't cause bad behavior.

  1. brick_ets.erl (lines 1050 to 1062)

    {_, {SweepA, SweepZ}} ->
        %% Hint: we know Rs /= [].
        %%
        %% Now we need to check ... if the first key is =< than SweepA,
        %% and if the last key is greater than SweepZ, then we need to
        %% filter out all keys >= SweepZ.  This is case 'F', see above.
        Ks = [element(1, Tuple) || Tuple <- Rs],
        case {hd(Ks) =< SweepA, lists:last(Ks) >= SweepZ} of
            {true, true} ->
                Rs2 = lists:takewhile(
                        fun(T) when element(1, T) =< SweepZ -> true;
                           (_)                              -> false
                        end, Rs),
    

The comment and the code are different. Note that SweepA refers to the last key that has been fully migrated and SweepZ is the last key in the range of keys currently being swept. I.e., keys being swept are those greater than SweepA and less than or equal to SweepZ.

With this in mind, I think the comment should read:
- %% filter out all keys >= SweepZ. This is case 'F', see above.
+ %% filter out all keys > SweepZ. This is case 'F', see above.

And the code should be changed as follows
- case {hd(Ks) =< SweepA, lists:last(Ks) >= SweepZ} of
+ case {hd(Ks) =< SweepA, lists:last(Ks) > SweepZ} of

  2. brick_server.erl (lines 4273 to 4284)

    chain_all_keys_outside_sweep_zone(RdWr_Keys, S) ->
        if not is_record(S#state.sweepstate, sweep_r) ->
                ?DBG_MIGRATEx({chain_all_keys_outside_sweep_zone, S#state.name,
                               not_sweeping, RdWr_Keys}),
                {true, RdWr_Keys};
           true ->
                {SweepA, SweepZ} = get_sweep_start_and_end(S),
                ?DBG_MIGRATEx({chain_all_keys_outside_sweep_zone, S#state.name,
                               SweepA, RdWr_Keys, SweepZ}),
                case [x || {_RdWr, K} <- RdWr_Keys, SweepA =< K, K =< SweepZ] of
                    [] ->
                        {true, RdWr_Keys};
This code should generate the empty list in the case list comprehension at 4282 if all the keys are outside of the sweep range.

I think the below change is more accurate.

-            case [x || {_RdWr, K} <- RdWr_Keys, SweepA =< K, K =< SweepZ] of
+           case [x || {_RdWr, K} <- RdWr_Keys, SweepA < K, K =< SweepZ] of

Note: neither of these cases seems to cause any problems as far as I can see; i.e., no data loss is going to happen. But someone else should also take a look.

Tom.

Add attrib and exp_time directive flags to replace/6 and set/6

Like issue #13 for rename/6 operation, add operation flags {attrib_directive, keep | replace} and {exp_time_directive, keep | replace} to replace/6 and set/6 operations.

Unlike rename/6, make the default value replace for those operations. I feel replace is more natural than keep here, and it also preserves backward compatibility.

Redesign disk storage and checkpoint/scavenger processes

Redesign and re-implement disk storage and maintenance processes to address the issues Hibari is having right now (v0.3RC1).

Issues

  • Rename operation broke the scavenger process (hibari#33)
  • The scavenger process has major performance impact to read/write operations
  • metadata for all key-values has to be kept in memory

Disk Storage

  • common-log files (write-ahead-log)
  • local log files (brick private metadata store)
  • long-term log files (value store)
  • live-hunk location files (scavenger work files)

Maintenance Processes

  • common-log write-back process
  • checkpoint process
  • scavenger

Add an API function to Chain Monitor to get the current best brick of a chain

This item is related to hibari/hibari-doc#13

There are times when we want to know the current best brick of a chain for system administration purposes. To achieve this from Erlang code, the code has to call brick_chainmon:calculate_best_first_brick(#state{}). However, this function is not exported and also requires knowledge of the Chain Monitor's internal #state record.

Instead of simply exposing that function, create another function which doesn't require knowledge of the internal #state record.

brick_chainmon:calculate_best_brick(chain_name())

Once this function is added, we could add a hibari-admin command to get the best brick.
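
A sketch of the proposed wrapper (get_chainmon_state/1 is a hypothetical helper; the real implementation would look the state up however Chain Monitor stores it):

%% Proposed public API: hide the internal #state record from callers.
calculate_best_brick(ChainName) ->
    State = get_chainmon_state(ChainName),
    calculate_best_first_brick(State).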

brick_default_data_dir does not take an absolute path

Hibari v0.3 and earlier

The brick server won't start if brick_default_data_dir is not a relative path from Hibari's installation directory (e.g. /data/brick instead of data/brick).

sys.config

 {gdss_brick,
  [{brick_default_data_dir, "/data/brick"},

hibari-admin bootstrap

./tmp/hibari/bin/hibari-admin bootstrap
RPC(bootstrap) to '[email protected]' failed: {'EXIT',
                                              {{badmatch,
                                                {error,
                                                 {[],
                                                  [{bootstrap_copy1,
                                                    '[email protected]'}]}}},
                                               [{brick_admin,bootstrap1,9,
                                                 [{file,"src/brick_admin.erl"},
                                                  {line,2078}]},
                                                {rpc,
                                                 '-handle_call_call/6-fun-0-',
                                                 5,
                                                 [{file,"rpc.erl"},
                                                  {line,203}]}]}}

console.log

2013-03-24 18:54:40.965 [error] <0.611.0> CRASH REPORT Process commonLogServer_hlog with 0 neighbours exited with reason: no match of right hand value {error,enoent} in gmt_hlog:write_permanent_config_maybe/2 line 629 in gen_server:init_it/6 line 328
2013-03-24 18:54:40.965 [error] <0.610.0>@gmt_hlog_common:init:233 init error: error {badmatch,{error,{{badmatch,{error,enoent}},[{gmt_hlog,write_permanent_config_maybe,2,[{file,"src/gmt_hlog.erl"},{line,629}]},{gmt_hlog,init,1,[{file,"src/gmt_hlog.erl"},{line,493}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}} at [{gmt_hlog_common,init,1,[{file,"src/gmt_hlog_common.erl"},{line,196}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]
2013-03-24 18:54:40.978 [info] <0.491.0> Application gdss_brick started on node '[email protected]'
2013-03-24 18:54:40.978 [info] <0.491.0> Application gdss_client started on node '[email protected]'
2013-03-24 18:54:40.979 [info] <0.491.0> Application gdss_ubf_proto started on node '[email protected]'
2013-03-24 18:54:40.979 [info] <0.639.0>@brick_admin:start:305 normal start, Args = []
2013-03-24 18:54:41.021 [info] <0.491.0> Application gdss_admin started on node '[email protected]'
2013-03-24 18:54:50.098 [info] <0.666.0>@brick_server:init:1429 brick_server:init preprocess [#Fun<brick_server.3.7001373>]
2013-03-24 18:54:50.098 [info] <0.666.0>@brick_ets:init:283 top of init: bootstrap_copy1, [{implementation_module,brick_ets},{default_data_dir,"/data/brick"}]
2013-03-24 18:54:50.099 [info] <0.669.0>@gmt_hlog_local:get_or_start_common_log:497 Trying to start commonLogServer
2013-03-24 18:54:50.252 [error] <0.670.0>@gmt_hlog_common:init:233 init error: error {badmatch,{error,{{badmatch,{error,enoent}},[{gmt_hlog,write_permanent_config_maybe,2,[{file,"src/gmt_hlog.erl"},{line,629}]},{gmt_hlog,init,1,[{file,"src/gmt_hlog.erl"},{line,493}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}}} at [{gmt_hlog_common,init,1,[{file,"src/gmt_hlog_common.erl"},{line,196}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]
2013-03-24 18:54:50.252 [error] <0.671.0> CRASH REPORT Process commonLogServer_hlog with 0 neighbours exited with reason: no match of right hand value {error,enoent} in gmt_hlog:write_permanent_config_maybe/2 line 629 in gen_server:init_it/6 line 328
2013-03-24 18:54:50.257 [error] <0.669.0> CRASH REPORT Process bootstrap_copy1_hlog with 0 neighbours exited with reason: no match of right hand value {error,enoent} in brick_server:replace_file_sync/3 line 5517 in gen_server:init_it/6 line 328
2013-03-24 18:54:50.259 [error] <0.666.0> CRASH REPORT Process bootstrap_copy1 with 0 neighbours exited with reason: no match of right hand value {error,{{badmatch,{error,enoent}},[{brick_server,replace_file_sync,3,[{file,"src/brick_server.erl"},{line,5517}]},{gmt_hlog_local,do_advance_seqnum,2,[{file,"src/gmt_hlog_local.erl"},{line,474}]},{gmt_hlog_local,init,1,[{file,"src/gmt_hlog_local.erl"},{line,320}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}} in brick_ets:init/1 line 321 in gen_server:init_it/6 line 328
2013-03-24 18:54:58.340 [error] <0.663.0>@brick_admin:poll_brick_status2:2200 poll_brick_status2: bootstrap_copy1 '[email protected]' -> {'EXIT',{noproc,{gen_server,call,[{bootstrap_copy1,'[email protected]'},{status},100]}}}
2013-03-24 18:54:58.345 [info] <0.663.0>@brick_admin:start_standalone_brick:1038 brick bootstrap_copy1 '[email protected]' started: {error,{'EXIT',{noproc,{...}}}}
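
One possible shape of a fix, sketched under the assumption that the configured data dir is currently always joined to the node's working directory (resolve_data_dir/1 is a hypothetical helper, not the actual patch):

%% Accept both absolute and relative brick_default_data_dir values.
resolve_data_dir(Dir) ->
    case filename:pathtype(Dir) of
        absolute -> Dir;
        _        -> {ok, Cwd} = file:get_cwd(),
                    filename:join(Cwd, Dir)
    end.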

brick_server new client API - asynchronously increment counter

Add set_counter/5 and increment_counter/5 to brick_server.

increment_counter/5 is an asynchronous, write-only operation for good performance. It does not return an updated counter value but only a timestamp (because it does not read the current value from disk). You can use a separate get/4 or get_many/5 to get the value, which will trigger a disk read. (A usage sketch follows the design notes below.)

  • set_counter/5
    • set_counter(table(), key(), Value::integer()|packed_integer(), [must_exist|must_not_exist|do_op_flag()], timeout()) -> {ok, timestamp()} | {error, reason()}
  • increment_counter/5
    • increment_counter(table(), key(), Increment::integer()|packed_integer(), [must_exist|must_not_exist|do_op_flag()], timeout()) -> {ok, timestamp()} | {error, incompatible_type|reason()}
  • -type packed_integer():: fixed_length_integer_array()

Notes:

  • If the key has a non-integer / non-packed-integer type, it returns an incompatible_type error.
  • packed_integer will have a certain size limit. brick_server will not provide partial read/update of a packed_integer, so having a packed_integer that is too big will have a performance impact.

Design Notes:

  • The key needs metadata to indicate its type: integer() or packed_integer()
  • If the key has the value_in_ram option, increment the value in place.
  • If not, only keep a diff from the value at the latest timestamp in RAM
    (of course, each increment operation writes an entry to the WAL for recovery). Then, some time later, apply the diff to the value (e.g. on a get/5, or wait for a certain period of time (eviction) to collate a series of increments).
  • Try to make packed_integer a sparse array to save storage space.
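
Hypothetical usage of the proposed calls, following the signatures listed above (the brick_simple wrappers are assumptions):

> brick_simple:set_counter(tab1, <<"stats/pageviews">>, 0, [], 5000).
> brick_simple:increment_counter(tab1, <<"stats/pageviews">>, 1,
                                 [must_exist], 5000).
%% increment_counter/5 returns only a timestamp; read the collated value
%% separately, which triggers a disk read.
> brick_simple:get(tab1, <<"stats/pageviews">>).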

Hibari's quota roots preprocessor doesn't support zero-length values properly

Hibari's quotas_preprocess plugin cannot handle empty values (i.e. <<>>) properly. This appears to be an original defect with Hibari's quota root implementation.

A temporary workaround is to disable all preprocess methods via configuration:

%%
%% GDSS Brick config
%%
{gdss_brick,
 [{brick_default_data_dir, "data/brick"},
  {brick_preprocess_method, "none"}     %% << HERE
 ]},

The next Hibari release will have "none" and/or "ssf_only" as the default value. This quota root implementation is already a target to be deprecated.

brick_server client API enhancement - get/4 with byte range option

Add new operation flag range to get/4 operation:

  • {range, Start::non_negative_integer(), Length::integer()}

Start is a zero-based offset into the key's value and Length is the length of that part. (More details will be decided later.)

One of the major benefits of issue #17 (Redesign disk storage and checkpoint/scavenger processes) is that a value blob can now be read without deserializing a log hunk (no need to apply binary_to_term/1 to it), so this feature can be easily implemented and will have no performance overhead.
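
A hypothetical call with the proposed flag, reading only the first kilobyte of a large value:

> brick_simple:get(tab1, <<"bigvalue">>, [{range, 0, 1024}]).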

Add new configuration brick_min_log_size_mb

The check for large blobs should be moved from the server to the client side (a sketch follows the test below).

A simple two-line test:

OneHundredMB = crypto:rand_bytes(1024*1024*100).

brick_simple:set(tab1, <<"100MB">>, erlang:binary_part(OneHundredMB, {0,1024*1024*100})).
** exception exit: {{{badmatch,{hunk_too_big,104857824}},
                     [{brick_ets,log_mods2_b,3},
                      {brick_ets,handle_call,3},
                      {brick_server,handle_call_via_impl,3},
                      {gen_server,handle_msg,5},
                      {proc_lib,init_p_do_apply,3}]},
                    {gen_server,call,
                     [{tab1_ch2_b1,'[email protected]'},
                      {do,{1302,620849,34417},
                       [{set,<<"100MB">>,1302620849034393,
                         <<232,202,76,214,196,53,55,31,26,15,92,36,
                           191,82,...>>,
                         0,[]}],
                       []},
                      15000]}}
     in function gen_server:call/3
     in call from brick_server:do/5
     in call from brick_simple:set/6
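
A minimal client-side sketch of such a check (the limit macro and function below are hypothetical; the real limit would presumably be derived from the brick_min_log_size_mb setting named in this issue's title):

%% Reject oversized values before they reach the brick.
-define(ASSUMED_MAX_BLOB_BYTES, 64 * 1024 * 1024).   %% hypothetical limit

set_checked(Tab, Key, Value) when byte_size(Value) =< ?ASSUMED_MAX_BLOB_BYTES ->
    brick_simple:set(Tab, Key, Value);
set_checked(_Tab, _Key, Value) ->
    {error, {value_too_big, byte_size(Value)}}.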

Apply server side timestamp and rename features to the dev branch

Review the prototype implementation of the server side timestamp and server side rename features, and merge them into the dev branch for Hibari v0.5.

For details of server side rename implementation, see: #2

  1. hibari-doc: review API changes in hibari-app-developer-guide.en.txt and apply the same change to hibari-app-developer-guide.ja.txt.
  2. Review the prototype implementation and test cases on norton_server_rename branches in the following projects. If they are ready, merge them to the dev branches.
  • gdss_brick
  • gdss_client
  • gdss_admin
  • gdss_ubf_proto
  • hibari_doc
