
nuraft's Introduction

NuRaft


Raft implementation derived from the cornerstone project, a lightweight C++ implementation with minimal dependencies, originally written by Andy Chen.

New features that are not described in the original paper, but are required for real-world use cases at eBay, have been added. We believe these features are useful for others outside eBay as well.

Features

In the original cornerstone

  • Core Raft algorithm
    • Log replication & compaction
    • Leader election
    • Snapshot
    • Dynamic membership & configuration change
  • Group commit & pipelined write
  • User-defined log store & state machine support

New features added in this project

How to Build

1. Install cmake and openssl:

  • Ubuntu
$ sudo apt-get install cmake openssl libssl-dev libz-dev
  • OSX
$ brew install cmake
$ brew install openssl
  • Windows
    • Download and install CMake.
    • Currently, we do not support SSL for Windows.

2. Fetch Asio library:

Using git submodule
$ git submodule update --init
Other ways to fetch:
  • Linux & OSX: using the bash script
$ ./prepare.sh
  • Windows: doing it manually
Clone the Asio asio-1-24-0 branch into the project directory.
C:\NuRaft> git clone https://github.com/chriskohlhoff/asio -b asio-1-24-0

3. Build static library, tests, and examples:

  • Linux & OSX
$ mkdir build
$ cd build
build$ cmake ../
build$ make

Run unit tests

build$ ./runtests.sh
  • Windows:
C:\NuRaft> mkdir build
C:\NuRaft> cd build
C:\NuRaft\build> cmake -G "NMake Makefiles" ..\
C:\NuRaft\build> nmake

You may need to run the vcvars script first in your build directory. For example (depending on how you installed MSVC):

C:\NuRaft\build> c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat

How to Use

Please refer to this document.
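As a quick orientation before reading that document, below is a minimal, hedged sketch of bootstrapping a single server with raft_launcher. It assumes user-provided state_machine, state_mgr, and logger implementations (my_state_machine, my_state_mgr, and my_logger are placeholders; the in-memory versions under examples/ can be substituted), so it is a sketch rather than a complete program:

#include <libnuraft/nuraft.hxx>

using namespace nuraft;

int main() {
    // Placeholders: supply your own implementations (see examples/ for
    // in-memory versions of the state machine, state manager, and logger).
    ptr<state_machine> sm   = cs_new<my_state_machine>();
    ptr<state_mgr>     smgr = cs_new<my_state_mgr>(1, "localhost:12345");
    ptr<logger>        lg   = cs_new<my_logger>();

    asio_service::options asio_opt;   // networking options (thread pool size, SSL, ...)
    raft_params params;               // Raft tuning (timeouts, batching, snapshots, ...)
    params.election_timeout_lower_bound_ = 200;
    params.election_timeout_upper_bound_ = 400;

    raft_launcher launcher;
    ptr<raft_server> server = launcher.init(sm, smgr, lg, 12345, asio_opt, params);
    if (!server) return 1;

    // Once a leader exists, replicate a payload to the cluster.
    ptr<buffer> msg = buffer::alloc(sizeof(int32_t));
    buffer_serializer bs(msg);
    bs.put_i32(42);
    server->append_entries({msg});

    launcher.shutdown();
    return 0;
}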

Example Implementation

Please refer to examples.

Benchmark

Please refer to tests/bench.

Quick Benchmark Results

Supported Platforms

  • Ubuntu (tested on 14.04 -- 20.04)
  • Centos (tested on 7)
  • OSX (tested on 10.13 -- 12.3)
  • Windows (built using MSVC 2019, not thoroughly tested)

Contributing to This Project

We welcome contributions. If you find any bugs, potential flaws, edge cases, possible improvements, or new feature suggestions, or want to start a discussion, please submit issues or pull requests.

Contact

License Information

Copyright 2017-present eBay Inc.

Author/Developer: Jung-Sang Ahn

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

3rd Party Code

  1. URL: https://github.com/datatechnology/cornerstone
    License: https://github.com/datatechnology/cornerstone/blob/master/LICENSE
    Originally licensed under the Apache 2.0 license.

  2. URL: https://github.com/stbrumme/crc32
    Original Copyright 2011-2016 Stephan Brumme
    See Original ZLib License: https://github.com/stbrumme/crc32/blob/master/LICENSE

  3. URL: https://github.com/greensky00/simple_logger
    License: https://github.com/greensky00/simple_logger/blob/master/LICENSE
    Originally licensed under the MIT license.

  4. URL: https://github.com/greensky00/testsuite
    License: https://github.com/greensky00/testsuite/blob/master/LICENSE
    Originally licensed under the MIT license.

  5. URL: https://github.com/greensky00/latency-collector
    License: https://github.com/greensky00/latency-collector/blob/master/LICENSE
    Originally licensed under the MIT license.

  6. URL: https://github.com/eriwen/lcov-to-cobertura-xml/blob/master/lcov_cobertura/lcov_cobertura.py
    License: https://github.com/eriwen/lcov-to-cobertura-xml/blob/master/LICENSE
    Copyright 2011-2012 Eric Wendelin
    Originally licensed under the Apache 2.0 license.

  7. URL: https://github.com/bilke/cmake-modules
    License: https://github.com/bilke/cmake-modules/blob/master/LICENSE_1_0.txt
    Copyright 2012-2017 Lars Bilke
    Originally licensed under the BSD license.

nuraft's People

Contributors

4kangjc, alesapin, antonio2368, byronhe, cbucher, chenrui333, dependabot[bot], enmk, fletcherj1, greensky00, hkadayam, jackywoo, josiahwi, kexianda, leic-ss, lucasgonze, myrrc, nicelulu, ray-eldath, sheepgrass, shigui1989, sldr, songenjie, szmyd, tcwzxx, tianjian600526, woonhak, yfinkelstein, yong-li, zouyonghao


nuraft's Issues

Please help: cannot add server, error code -7

The same code is deployed on 3 machines: {1, 2, 3}. I call add_srv on machine 1 and can add only one more machine (either 2 or 3); when adding the remaining one, it returns error -7. The code is identical, so why does adding 2 succeed while adding the remaining machine (3) fails?
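For context when reproducing this, here is a rough sketch (not an official recipe) of the usual one-change-at-a-time pattern: add servers strictly one by one and wait until each membership change is committed before calling add_srv() again. The retry loop and the use of get_srv_config() as the "committed" check are illustrative assumptions:

#include <libnuraft/nuraft.hxx>
#include <chrono>
#include <thread>

using namespace nuraft;

// Returns true once `id` appears in the committed cluster configuration.
bool add_member(ptr<raft_server> leader, int id, const std::string& endpoint) {
    srv_config new_srv(id, endpoint);
    ptr< cmd_result< ptr<buffer> > > ret = leader->add_srv(new_srv);
    if (!ret->get_accepted()) {
        // A negative result code (e.g. -7) typically means the previous
        // membership change has not finished yet; retry later.
        return false;
    }
    for (int i = 0; i < 100; ++i) {
        if (leader->get_srv_config(id)) return true;   // config change committed
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return false;
}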

Will NuRaft do batching?

Hi, @greensky00
Just want to confirm: when followers receive more than one request from the leader, or when the leader has more than one request to send to followers, will they be sent one by one, or batched into a single request?
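For reference, the leader-side batch size is bounded by a raft_params knob; a minimal sketch (max_append_size_ is the existing parameter, the value 200 is only an example):

// Limit on how many log entries the leader packs into one append_entries request.
nuraft::raft_params params;
params.max_append_size_ = 200;   // default is 100
// pass `params` to raft_launcher::init() / the raft_server context as usual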

An example use case - Multiple Raft clusters within a server cluster

I asked the same question in the cornerstone GitHub repo.
I just wanted to get your input on how we can implement the following scenario.
Consider I have 5 servers (S1, S2, S3, S4, S5) and 7 tasks running on these servers. Each task needs 2 replicas.
For task T1 I can have S1, S2, S3 where S1 is the leader.
For task T2 I can have S2, S1, S3 where S2 is the leader.
And so on. It's possible that one of the servers goes down; then I should be able to substitute one of the existing servers, e.g. if S1 goes down, I should be able to use S4 instead.
Any guidelines or pointers would be really appreciated.

Startup with fixed multinode configuration

Hi!

As far as I understand from the examples and issues, NuRaft cluster startup should look like this:

  1. Start raft_server on all nodes.
  2. Choose one node (S_l) to add other nodes according to whatever rules.
  3. If we are S_l, call add_srv for all other nodes.
  4. If we are not S_l, just wait until we are added to the cluster by S_l.

But this algorithm has several complexities on the user side. For example, what should we do if S_l crashes after it has been chosen to add the other nodes? Probably we should use some timeouts and try to choose another node to be S_l'. Or what should we do if one of the follower nodes doesn't respond to add_srv? Probably start without it, but try to add it in the background.

It would be much simpler if there were a way to start all nodes with some fixed configuration, leaving all leader/follower communication to the Raft side. This approach is described in the original paper:

When servers start up, they begin as followers. A server remains in follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats (AppendEntries RPCs that carry no log entries) to all followers in order to maintain their authority. If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.

Are there any fundamental reasons why it's not implemented?
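One workaround that is often used in practice is sketched below, under the assumption that every node's state_mgr returns the same initial configuration on first boot, so no explicit add_srv calls are needed:

#include <libnuraft/nuraft.hxx>
using namespace nuraft;

// Build the fixed 3-node configuration; every node constructs the same list.
ptr<cluster_config> make_initial_config() {
    ptr<cluster_config> cfg = cs_new<cluster_config>();
    cfg->get_servers().push_back(cs_new<srv_config>(1, "node1:10001"));
    cfg->get_servers().push_back(cs_new<srv_config>(2, "node2:10001"));
    cfg->get_servers().push_back(cs_new<srv_config>(3, "node3:10001"));
    return cfg;
}
// A custom state_mgr would return this from load_config() on first start and
// the persisted config afterwards; leader election then proceeds on the Raft side.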

Joint Consensus

Hi, does NuRaft support configuration change using joint consensus? Or is there a plan to support it in the future?

Difference between echo_server and bench

I made a rough diff between echo_server and bench.

  1. I notice that while doing init_raft in echo_server, asio_opt.thread_pool_size_ = 4, but in bench, asio_opt.thread_pool_size_ = 32. Will the thread_pool_size cause a great difference in performance? (See the sketch after this list.)
  2. In the bench program, an asio_listener_ is created, but in echo_server there is not. So what is asio_listener_ used for? Why doesn't echo_server need it?
  3. How big is the performance difference between echo_server and bench? I feel echo_server is easy to understand, and it seems its logic is just "append the log + print the log". If I remove the "print the log", will its performance be comparable to the bench program? (I want to do some perf tests with distributed clients, rather than coupling clients and leader into one single program. I feel echo_server is easier to tailor and understand than the bench program, so I am wondering whether echo_server can also serve as a bench program.)
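Regarding the first point, the thread pool size is just an asio_service option; a small sketch (the values 4 and 32 are taken from the two examples mentioned above):

nuraft::asio_service::options asio_opt;
asio_opt.thread_pool_size_ = 32;   // bench uses 32, echo_server uses 4
// pass `asio_opt` to raft_launcher::init() or the asio_service constructor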

Callback about commit result on follower

Is there any way to subscribe to commit events in the state_machine? As far as I see there is no callback for it: https://github.com/eBay/NuRaft/blob/master/include/libnuraft/callback.hxx. StateMachineExecution seems suitable, but I need to get the result of the commit (from the state_machine) in my callback.

Also, I'm not sure: is it safe to make such a callback directly from the state_machine's commit method? According to https://github.com/ClickHouse-Extras/NuRaft/blob/master/src/handle_commit.cxx#L270-L272 it seems like yes, but it would be much better if you could confirm this.

Leader election priority test crashes

I've just built NuRaft on OS X 11.3. The election priority test fails as follows:

[ .... ] leader election priority test
   === TEST MESSAGE (BEGIN) ===

        time: 2021-04-28 01:07:50.099266
      thread: b79a
          in: leader_election_priority_test()
          at: /Users/Ben/dev/NuRaft/tests/unit/raft_server_test.cxx:943
    value of: s2.raftServer->is_leader()
    expected: true
      actual: false
[ FAIL ] leader election priority test (178.1 ms)
 [01:07:50.100 570] [tid b79a] [FATL] [logger.cc:634, flushAllLoggers()]
Abort
 [01:07:50.100 898] [tid b79a] [ERRO] [logger.cc:634, flushAllLoggers()]
 === Critical info (given by user): 0 bytes ===
 [01:07:50.100 970] [tid b79a] [ERRO] [logger.cc:634, flushAllLoggers()]
will not explore other threads (disabled by user)
 [01:07:51.270 971] [tid b79a] [ERRO] [logger.cc:634, flushAllLoggers()]

Thread b79a (crashed here)

#0  0x000000010f7f2c10 in SimpleLoggerMgr::logStackBacktrace(unsigned long) at logger.cc:390
#1  0x000000010f7f32b4 in SimpleLoggerMgr::handleSegAbort(int) at logger.cc:451
#2  0x00007fff2072ad7d in _sigtramp() at libsystem_platform.dylib
#3  0x0000000114416298 in 0x0() at ???
#4  0x00007fff2063a411 in abort() at libsystem_c.dylib
#5  0x000000010f7e7bd0 in TestSuite::reportTestResult(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) at test_common.h:1358
#6  0x000000010f7e1572 in main at raft_server_test.cxx:2389
#7  0x00007fff20700f3d in start() at libdyld.dylib
#8  0x0000000000000002 in 0x0() at ???

[ABORT] Flushed all logs safely.
./runtests.sh: line 9:   892 Abort trap: 6           ./tests/raft_server_test --abort-on-failure

how to implement log_store interface?

I use sqlite3 to store log entries, but I do not understand how to implement the log_store interface. My issues include:

  • How does the compact function work? inmem_log_store.h implements it by removing data from a map; my implementation is:

private:
    static ptr<log_entry> make_clone(const ptr<log_entry> &entry);

    // Dummy entry for index 0.
    ptr<log_entry> dummy_entry_;
    db_entry_helper entry_helper;  // helper for database operations
    std::atomic<ulong> start_idx_; // the first log index
    std::atomic<ulong> last_idx_;  // the last log index

bool indb_log_store::compact(ulong last_log_index) {
    info("compact log from [%lu] to [%lu], cur last_log_index: [%lu]",
         start_idx_.load(), last_log_index, last_idx_.load());

    entry_helper.removeRange("id", start_idx_, last_log_index);

    // WARNING:
    //   Even though nothing has been erased,
    //   we should set `start_idx_` to the new index.
    start_idx_ = last_log_index + 1;
    auto last = entry_helper.selectLast("id");
    if (!last->empty()) {
        last_idx_ = last->at(0).id_;
    } else {
        last_idx_ = last_log_index + 1;
    }
    info("start_idx: [%lu], last_idx: [%lu]", start_idx_.load(), last_idx_.load());
    return true;
}
  • The operation logs are as follows:
Node 3 (leader) log:
2019-11-12T18:51:13.792_873+08:00 [c313] [INFO] trying to sync snapshot with last index 15 to peer 1, its last log idx 9	[handle_snapshot_sync.cxx:111, create_sync_snapshot_req()]
2019-11-12T18:51:13.794_468+08:00 [7067] [WARN] peer 1 declined snapshot: p->get_next_log_idx(): 10, log_store_->next_slot(): 17	[handle_snapshot_sync.cxx:317, handle_install_snapshot_resp()]
2019-11-12T18:51:13.795_047+08:00 [a3ce] [WARN] declined append: peer 1, prev next log idx 12, resp next 12, new next log idx 11	[handle_append_entries.cxx:676, handle_append_entries_resp()]
2019-11-12T18:51:13.795_317+08:00 [a3ce] [ERRO] bad log_idx 10 for retrieving the term value, will ignore this log req	[raft_server.cxx:1063, term_for_log()]
2019-11-12T18:51:13.795_322+08:00 [a3ce] [ERRO] last snapshot 0x7ff0ec003fe0, log_idx 10, snapshot last_log_idx 15	[raft_server.cxx:1066, term_for_log()]
2019-11-12T18:51:13.795_325+08:00 [a3ce] [ERRO] log_store_->start_index() 11	[raft_server.cxx:1068, term_for_log()]
2019-11-12T18:51:13.795_412+08:00 [c313] [WARN] declined append: peer 1, prev next log idx 11, resp next 12, new next log idx 10



// Node 1 (follower) log:
2019-11-12T18:51:13.814_681+08:00 [42d1] [WARN] [LOG XX] req log idx: 11, req log term: 1, my last log idx: 11, my log (11) term: 0	[handle_append_entries.cxx:391, handle_append_entries()]
2019-11-12T18:51:13.814_683+08:00 [42d1] [WARN] deny, req term 1, my term 1, req log idx 11, my log idx 11	[handle_append_entries.cxx:398, handle_append_entries()]
2019-11-12T18:51:13.814_685+08:00 [42d1] [WARN] snp idx 15 term 1	[handle_append_entries.cxx:402, handle_append_entries()]
2019-11-12T18:51:13.814_845+08:00 [8b66] [ERRO] bad log_idx 10 for retrieving the term value, will ignore this log req	[raft_server.cxx:1063, term_for_log()]
2019-11-12T18:51:13.814_848+08:00 [8b66] [ERRO] last snapshot 0x7f213c0013c0, log_idx 10, snapshot last_log_idx 15	[raft_server.cxx:1066, term_for_log()]
2019-11-12T18:51:13.814_850+08:00 [8b66] [ERRO] log_store_->start_index() 16	[raft_server.cxx:1068, term_for_log()]
  • My questions:
  1. In the compact function, how should start_idx_ and last_idx_ be updated?
  2. In the compact function, should log entries be removed from the DB or not?

Use unique_ptr for buffers

Another performance improvement is to use unique_ptr instead of shared_ptr for buffers; it may be worth porting as well, check this.

Potential segfault on server startup

Hi! Recently I got the following error https://gist.github.com/alesapin/0dbbc2616a6b88d6a14467fd78dc922a on server startup. The specific error itself is not important, but the idea of starting threads from a partially initialized object doesn't look good:
https://github.com/eBay/NuRaft/blob/master/src/raft_server.cxx#L252-L257

In my case the background threads were fast enough to do some work and call my callback, which checks whether the current node is the leader, and all of this happened before the raft_server constructor had finished. Such behavior makes the raft_server object quite dangerous, not only for user callbacks but also for its own code running in background threads. Maybe we can introduce some kind of startup() method which starts all background threads only after the object is fully initialized?

Async append_entries handler sometimes not called

Hi, and first of all thank you for the very cool library. We've connected nuraft with rocksdb as the WAL/state machine backend. It seems to work well and is pretty fast, but we're occasionally running into some issues, which are likely our fault but we're having trouble debugging them.

After a few tens of thousands of calls of append_entries in async mode, there's an instance where the provided callback is never called. As far as we can tell the data is committed, but since we never receive a response we can't really make safe assumptions about the state of the system anymore.

We call append_entries like this:

void PeshkaCtrl::asyncAppendEntries(
    const std::shared_ptr<nuraft::buffer> &logs,
    cutils::handler_t<void(cutils::ResponseCode &rc)> handler) const
{
    LOGTRACE(m_Logger, "asyncAppendEntries start");

    m_RaftServer->append_entries({logs})->when_ready(
        [this, handler=std::move(handler)]
        (const nuraft::cmd_result<std::shared_ptr<nuraft::buffer>> &res,
         const std::shared_ptr<std::exception> &ex) mutable
        {
            if (res.get_accepted())
            {
                LOGTRACE(m_Logger, "asyncAppendEntries accepted");
                cutils::post(handler, cutils::ResponseCode());
            }
            else
            {
                std::string message = ex ? ex->what() : res.get_result_str();
                LOGTRACE(m_Logger, "asyncAppendEntries error ", message);
                cutils::post(handler, cutils::ResponseCode(enums::Response::ERROR_PERMANENT, message));
            }
        }
    );
}

I'm attaching some (trace-level) log messages of an example where this has happened: logs.zip

You will observe that the last call to asyncAppendEntries (shown above) never receives a matching "done" or "error" message. I'm not sure if the nuraft trace logs reveal where the problem is, I've stared at them for hours and I just can't see it.

It could be a bug in our state machine or WAL implementations, but as far as our unit tests go we think those are ok. We believe the last transaction is actually committed, as when we restart the leader and a new one is elected things appear to progress as expected.

Please let us know if you can spot any irregularities in those nuraft log messages. We're happy to provide any other information or code that may be useful. Your help is very much appreciated!

Edit: I should add that we build Nuraft against boost::asio, which we use quite extensively in our code base.
Edit: boost version 1.71, nuraft 1.2

The safety of dynamically adjustable Custom Quorum

Basically, it's easy to understand that the flexible quorum is safe if it's a static config.
But is the Leader Completeness guaranteed when Qc and Qe are dynamically adjustable?

For example, I have a 5-node cluster with Qc(3) and Qe(3) (the default algorithm) and then change the config to Qc(5) and Qe(1). How can the cluster always elect a valid leader, given that there may be up to 2 nodes with incomplete Raft logs, yet a leader can be established by its own vote?
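For readers of this issue, the knobs in question are raft_params fields; a sketch of the Qc(5)/Qe(1) configuration described above (custom_commit_quorum_size_ and custom_election_quorum_size_ are the flexible-quorum parameters):

nuraft::raft_params params;
params.custom_commit_quorum_size_   = 5;  // Qc: all 5 nodes must acknowledge a commit
params.custom_election_quorum_size_ = 1;  // Qe: a single vote can elect a leader
// As the question points out, changing these at runtime needs care:
// up to 2 nodes may lack committed entries yet still win an election.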

move skip_initial_election_timeout_ option to raft_params

Currently, the skip_initial_election_timeout_ option is in the init_options struct, which can only be configured through the raft_server constructor. It's not possible to configure it through raft_launcher. It would be better to move the skip_initial_election_timeout_ option to raft_params.

Please review my PR #142 together with some other changes
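For reference, a sketch of the current situation described above: the flag lives in raft_server::init_options, so today it can only be set when constructing raft_server directly rather than via raft_launcher:

nuraft::raft_server::init_options init_opt;
init_opt.skip_initial_election_timeout_ = true;
// ptr<raft_server> server = cs_new<raft_server>(ctx, init_opt);
// (ctx is a fully prepared nuraft::context*, omitted here.)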

Strange situation after leader outage

I have a cluster (auto_forwarding_ enabled) with 3 nodes: node1 (priority 3), node2 (priority 2), node3 (priority 1). When I simulate a network outage for node1, node2 and node3 elect a new leader quite quickly and everything continues to work well. But very rarely something strange happens and they cannot make any progress for about 5-10 minutes.

Logs from node2: https://gist.github.com/alesapin/502b6abea98d54cf83eed3b87e7e1aa7 and from node3: https://gist.github.com/alesapin/f0abe77e6f323f9068138d5b26084af8. It seems node3 asks node2 to append the same log entry (idx 152) again and again; node2 successfully processes this entry and responds to node3, yet node3 receives the response and makes the same request again.

Just trying to understand what is going on. Why does it resolve itself without any help after 10 minutes? It happens very rarely; maybe it's related to auto_forwarding_? I'll try to reproduce without it.

Rollback of committed entries

Hi! I'm testing my ZooKeeper-like system based on NuRaft using the Jepsen framework. Recently I found very strange behavior where NuRaft decided to roll back already-committed log entries:

2021.03.24 11:23:01.057713 [ 388 ] {} <Information> RaftInstance: rollback logs: 509 - 530, commit idx req 400, quick 530, sm 530, num log entries 1, current count 0                                                                          
2021.03.24 11:23:01.057726 [ 388 ] {} <Warning> RaftInstance: rollback quick commit index from 530 to 508                                                                                                                                      
2021.03.24 11:23:01.057740 [ 388 ] {} <Warning> RaftInstance: rollback sm commit index from 530 to 508                                                                                                                                  
2021.03.24 11:23:01.057751 [ 388 ] {} <Information> RaftInstance: rollback log 530                                                                                                                                                             
2021.03.24 11:23:01.057766 [ 388 ] {} <Information> RaftInstance: rollback log 529                                                                                                                                                             
2021.03.24 11:23:01.057780 [ 388 ] {} <Information> RaftInstance: rollback log 528                                                                                                                                                             
2021.03.24 11:23:01.057788 [ 388 ] {} <Information> RaftInstance: rollback log 527                                                                                                                                                             
2021.03.24 11:23:01.057794 [ 388 ] {} <Information> RaftInstance: rollback log 526                                                                                                                                                             
2021.03.24 11:23:01.057801 [ 388 ] {} <Information> RaftInstance: rollback log 525                                                                                                                                                             
2021.03.24 11:23:01.057807 [ 388 ] {} <Information> RaftInstance: rollback log 524                                                                                                                                                             
2021.03.24 11:23:01.057813 [ 388 ] {} <Information> RaftInstance: rollback log 523                                                                                                                                                             
2021.03.24 11:23:01.057828 [ 388 ] {} <Information> RaftInstance: rollback log 522
2021.03.24 11:23:01.057860 [ 388 ] {} <Information> RaftInstance: rollback log 521
2021.03.24 11:23:01.057867 [ 388 ] {} <Information> RaftInstance: rollback log 520
2021.03.24 11:23:01.057873 [ 388 ] {} <Information> RaftInstance: rollback log 519
2021.03.24 11:23:01.057879 [ 388 ] {} <Information> RaftInstance: rollback log 518
2021.03.24 11:23:01.057885 [ 388 ] {} <Information> RaftInstance: rollback log 517
2021.03.24 11:23:01.057891 [ 388 ] {} <Information> RaftInstance: rollback log 516
2021.03.24 11:23:01.057897 [ 388 ] {} <Information> RaftInstance: rollback log 515
2021.03.24 11:23:01.057903 [ 388 ] {} <Information> RaftInstance: rollback log 514
2021.03.24 11:23:01.057909 [ 388 ] {} <Information> RaftInstance: rollback log 513 
2021.03.24 11:23:01.057915 [ 388 ] {} <Information> RaftInstance: rollback log 512
2021.03.24 11:23:01.057920 [ 388 ] {} <Information> RaftInstance: rollback log 511
2021.03.24 11:23:01.057926 [ 388 ] {} <Information> RaftInstance: rollback log 510
2021.03.24 11:23:01.057931 [ 388 ] {} <Information> RaftInstance: rollback log 509
2021.03.24 11:23:01.057937 [ 388 ] {} <Information> RaftInstance: overwrite at 509
2021.03.24 11:23:01.209754 [ 388 ] {} <Information> RaftInstance: receive a config change from leader at 509
2021.03.24 11:23:01.209816 [ 388 ] {} <Debug> RaftInstance: [after OVWR] log_idx: 510, count: 1

As far as I understand, this should never happen in Raft, and the comment clearly says so: https://github.com/ebay/NuRaft/blob/master/src/handle_append_entries.cxx#L631-L645. So isn't it a bug?

Also, I've found this interesting comment:
https://github.com/ebay/NuRaft/blob/master/src/handle_append_entries.cxx#L696-L717.

Full trace logs from all three nodes:

  1. Node which rolled back: https://gist.github.com/alesapin/5bcd85abd390f723cc30e0a63f0abd44
  2. Other node1: https://gist.github.com/alesapin/f6d8754f7a2f749e376c8a0522bf6c43
  3. Other node2: https://gist.github.com/alesapin/fa0e5af16152415f93942ed77bb7819a

Related ClickHouse/ClickHouse#21677

multiple_config_change_test and remove_node_test test cases occasionally failed

[ .... ] multiple config change test
   === TEST MESSAGE (BEGIN) ===

        time: 2020-10-22 16:36:46.954628
      thread: c858
          in: multiple_config_change_test()
          at: /home/development/Public/NuRaft/tests/unit/raft_server_test.cxx:788
    value of: configs_out.size()
    expected: 3
      actual: 2
[ FAIL ] multiple config change test (728.2 ms)
 [16:36:46.955 825] [tid c858] [FATL] [logger.cc:634, flushAllLoggers()]
Abort
 [16:36:46.956 037] [tid c858] [ERRO] [logger.cc:634, flushAllLoggers()]
 === Critical info (given by user): 0 bytes ===
 [16:36:46.956 167] [tid c858] [ERRO] [logger.cc:634, flushAllLoggers()]
will not explore other threads (disabled by user)
 [16:36:47.196 321] [tid c858] [ERRO] [logger.cc:634, flushAllLoggers()]
[ .... ] remove node test
   === TEST MESSAGE (BEGIN) ===

        time: 2020-10-22 16:48:42.305966
      thread: 1d15
          in: remove_node_test()
          at: /home/development/Public/NuRaft/tests/unit/raft_server_test.cxx:481
    value of: configs.size()
    expected: 2
      actual: 3
        info: id = 1
[ FAIL ] remove node test (522.2 ms)
 [16:48:42.306 822] [tid 1d15] [FATL] [logger.cc:634, flushAllLoggers()]
Abort
 [16:48:42.307 030] [tid 1d15] [ERRO] [logger.cc:634, flushAllLoggers()]
 === Critical info (given by user): 0 bytes ===
 [16:48:42.307 167] [tid 1d15] [ERRO] [logger.cc:634, flushAllLoggers()]
will not explore other threads (disabled by user)
 [16:48:42.535 961] [tid 1d15] [ERRO] [logger.cc:634, flushAllLoggers()]

It seems both cases are related to remove_srv(), and the operation succeeds or fails depending on timing.

Unnecessary rollback of `sm_commit_index_` on adding server

To remediate a node, we clone data and logs to a new node and add it to the leader.

However, if there are uncommitted logs, the new node (to be added) first commits those logs and then joins the existing cluster.

In handle_join_cluster_req(), we reset sm_commit_index_ to initial_commit_index_, which rolls back sm_commit_index_ and causes duplicate commits of the same log. That may be OK for a state machine that allows idempotent writes, but not for others. We should do it selectively.

is append_entries thread-safe or not?

I looked at raft_bench.cxx and noticed that worker_func calls append_entries; however, the lock is only used to guard numops. When we spawn multiple threads, will they call append_entries concurrently? And in that case, is append_entries thread-safe (I guess so)?

Question about peer catch up action in raft_server::handle_append_entries_resp()

For the code segment https://github.com/eBay/NuRaft/blob/master/src/handle_append_entries.cxx#L856-L862 in raft_server::handle_append_entries_resp(), I have a question about the else case (i.e. p->get_next_log_idx() <= resp.get_next_idx()). Why is it required to move one log backward (p->set_next_log_idx(p->get_next_log_idx() - 1)) instead of doing the same thing (p->set_next_log_idx(resp.get_next_idx())) as in the if case?

munmap_chunk(): invalid pointer, "nuraft_w_0" received signal SIGABRT, Aborted

Getting the following crash randomly on Ubuntu 18.04. Any fix or mitigation?

munmap_chunk(): invalid pointer

Thread 21 "nuraft_w_0" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff74ae8700 (LWP 21466)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Sun Mar 28 06:37:01 UTC 2021
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff5d4c921 in __GI_abort () at abort.c:79
#2  0x00007ffff5d95967 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff5ec2b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff5d9c9da in malloc_printerr (str=str@entry=0x7ffff5ec4720 "munmap_chunk(): invalid pointer") at malloc.c:5342
#4  0x00007ffff5da3fbc in munmap_chunk (p=0x7fff480018e0) at malloc.c:2846
#5  __GI___libc_free (mem=0x7fff480018f0) at malloc.c:3127
#6  0x00005555565fa90e in OPENSSL_free (orig_ptr=0x7fff480018f8) at external/boringssl/src/crypto/mem.c:154
#7  0x000055555657e70d in bio_free (bio=0x7fff0c0014e8) at external/boringssl/src/crypto/bio/pair.c:144
#8  0x000055555657c3a9 in BIO_free (bio=0x7fff0c0014e8) at external/boringssl/src/crypto/bio/bio.c:103
#9  0x00005555566c610c in asio::ssl::detail::engine::~engine (this=0x555558680e70, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/ssl/detail/impl/engine.ipp:66
#10 asio::ssl::detail::stream_core::~stream_core (this=0x555558680e70, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/ssl/detail/stream_core.hpp:54
#11 0x00005555566c621a in asio::ssl::stream<asio::basic_stream_socket<asio::ip::tcp>&>::~stream (this=0x555558680e68, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/ssl/strea$.hpp:120
#12 nuraft::asio_rpc_client::~asio_rpc_client (this=0x555558680e10, __in_chrg=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/src/asio_service.cxx:854
#13 0x0000555556714b3e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555558680e00) at /usr/include/c++/7/bits/shared_ptr_base.h:154
#14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684
#15 std::__shared_ptr<nuraft::rpc_client, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123
#16 std::__shared_ptr<nuraft::rpc_client, (__gnu_cxx::_Lock_policy)2>::operator= (__r=..., this=0x5555586adc50) at /usr/include/c++/7/bits/shared_ptr_base.h:1213
#17 std::shared_ptr<nuraft::rpc_client>::operator= (__r=..., this=0x5555586adc50) at /usr/include/c++/7/bits/shared_ptr.h:319
#18 nuraft::peer::recreate_rpc (this=0x5555586adc30, config=std::shared_ptr<nuraft::srv_config> (use count 3, weak count 0) = {...}, ctx=...) at /home/azureuser/barrel/thirdparty/NuRaft/src/peer.cxx:205
#19 0x000055555670eb85 in nuraft::raft_server::request_prevote (this=this@entry=0x55555872b490) at /home/azureuser/barrel/thirdparty/NuRaft/src/handle_vote.cxx:80
#20 0x000055555670a4db in nuraft::raft_server::handle_election_timeout (this=0x55555872b490) at /home/azureuser/barrel/thirdparty/NuRaft/src/handle_timeout.cxx:288
#21 0x00005555566c1034 in std::__invoke_impl<void, void (*&)(std::shared_ptr<nuraft::delayed_task>&, std::error_code), std::shared_ptr<nuraft::delayed_task>&, std::error_code const&> (__f=<optimized out>) at /usr/include/$++/7/bits/invoke.h:60
#22 std::__invoke<void (*&)(std::shared_ptr<nuraft::delayed_task>&, std::error_code), std::shared_ptr<nuraft::delayed_task>&, std::error_code const&> (__fn=@0x7fff74abcef0: 0x5555566b3990 <_timer_handler_(std::shared_ptr<$uraft::delayed_task>&, std::error_code)>) at /usr/include/c++/7/bits/invoke.h:95
#23 std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>::__call<void, std::error_code const&, 0ul, 1ul>(std::tuple<std::error_code con$t&>&&, std::_Index_tuple<0ul, 1ul>) (__args=..., this=0x7fff74abcef0) at /usr/include/c++/7/functional:467
#24 std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>::operator()<std::error_code const&, void>(std::error_code const&) (this=0x7fff$4abcef0) at /usr/include/c++/7/functional:551
#25 asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>::operator()() (this=0x7fff74abcef0) at
/home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/bind_handler.hpp:64
#26 asio::asio_handler_invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::deta$l::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, ...) (function=...) at /home/azureuser/barrel/third$arty/NuRaft/asio/asio/include/asio/handler_invoke_hook.hpp:68
#27 asio_handler_invoke_helpers::invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>, st
d::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std
::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std
::error_code)>&) (context=..., function=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#28 asio::detail::handler_work<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, asio::system_executor>::complete<asio::detail::bind
er1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nu
raft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nur
aft::delayed_task>&, std::error_code)>&) (this=<synthetic pointer>, handler=..., function=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/handler_work.hpp:81
#29 asio::detail::wait_handler<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)> >::do_complete(void*, asio::detail::scheduler_operat
ion*, std::error_code const&, unsigned long) (owner=0x55555860be90, base=0x7fff080133a0) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/wait_handler.hpp:71
#30 0x00005555566bed25 in asio::detail::scheduler_operation::complete (bytes_transferred=<optimized out>, ec=..., owner=0x55555860be90, this=<optimized out>) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/as
io/detail/scheduler_operation.hpp:39
#31 asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=..., this=0x55555860be90) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/impl/scheduler.ipp:400
#32 asio::detail::scheduler::run (this=0x55555860be90, ec=...) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/detail/impl/scheduler.ipp:153
#33 0x00005555566b3dc5 in asio::io_context::run (this=0x55555860bc20) at /home/azureuser/barrel/thirdparty/NuRaft/asio/asio/include/asio/impl/io_context.ipp:61
#34 nuraft::asio_service_impl::worker_entry (this=0x55555860bc20) at /home/azureuser/barrel/thirdparty/NuRaft/src/asio_service.cxx:1563
#35 0x00007ffff74e26df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#36 0x00007ffff7bbb6db in start_thread (arg=0x7fff74ae8700) at pthread_create.c:463
#37 0x00007ffff5e2d71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95`
(gdb) frame 7
#7  0x000055555657e70d in bio_free (bio=0x7fff0c0014e8) at external/boringssl/src/crypto/bio/pair.c:144
144     external/boringssl/src/crypto/bio/pair.c: No such file or directory.
(gdb) p *bio
$7 = {method = 0x555556f911e0 <methods_biop>, init = 0, shutdown = 1, flags = 0, retry_reason = 0, num = 0, references = 0, ptr = 0x7fff480018f8, next_bio = 0x0, num_read = 0, num_write = 0}
(gdb) frame 6
#6  0x00005555565fa90e in OPENSSL_free (orig_ptr=0x7fff480018f8) at external/boringssl/src/crypto/mem.c:154
154     external/boringssl/src/crypto/mem.c: No such file or directory.
(gdb) p orig_ptr
$8 = (void *) 0x7fff480018f8
(gdb) p (struct bio_bio_st *)orig_ptr
$9 = (struct bio_bio_st *) 0x7fff480018f8
(gdb) p *(struct bio_bio_st *)orig_ptr
$10 = {peer = 0x0, closed = 0, len = 0, offset = 0, size = 0, buf = 0x0, request = 0}
(gdb) info local
ptr = 0x7fff480018f0
size = 56
(gdb) p *(struct bio_bio_st *)0x7fff480018f0
$11 = {peer = 0x0, closed = 0, len = 0, offset = 0, size = 0, buf = 0x0, request = 0}

Question about auto_adjust_quorum_for_small_cluster_ option

Hi @greensky00

I am considering the possibility of using NuRaft for a two-node cluster and see that there is a special option, auto_adjust_quorum_for_small_cluster_, which changes both the commit and election quorum sizes to 1 once a node is offline.

I tried this option, but once the offline node comes back online, if there is a network partition between the two nodes, each node will consider itself the leader.

May I know whether eBay uses a two-node cluster with this option, and how you would handle the network partition case? Thanks~
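For readers, the option under discussion is a raft_params flag; a minimal sketch of enabling it for a two-node cluster:

nuraft::raft_params params;
// When one of the two members is unreachable, shrink the commit/election
// quorum to 1 so the surviving node can keep making progress.
params.auto_adjust_quorum_for_small_cluster_ = true;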

Auto adding followers to leader

Hi,

I see that in the calculator example, after bringing up 3 nodes, from node 1 we run add-server for 2 and 3. This makes node 1 the leader. In this case, if the remote server is not up, add-server will fail. Once the cluster forms, even if a node goes down and comes back, it will be added back to the cluster automatically.

Is there any way to provide the list of remote nodes' details to each node and, once they come up, have them try to add the others automatically until a node either accepts another node as leader or becomes the leader itself? Similar to the phase in the previous case, where all the nodes have formed the cluster and a node goes down, comes back, and is added back to the cluster automatically.

Thanks

Wrong election timer config may cause never-ending leader election

Let's say we have 3 servers: S1, S2, and S3. Suppose that we don't have a leader now. S1's and S2's priorities are higher than S3's.

However, let's assume that we have a wrong config: the election timeout for S1 and S2 is much longer than S3's. Below is what will happen in such a case:

  1. After some timeouts, S3's target priority is properly lowered.
  2. S3 will initiate pre-vote.
  3. S1 and S2 will accept the pre-vote.
  4. S3 will initiate actual vote.
  5. Since S1 and S2 haven't encountered election timeout, they reject the vote due to their target priority.
  6. S1 and S2 update their term; so reset their election timer.

Problem: S1 and S2 will never reach the election timeout, so they never reduce their target priority.
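A sketch of the obvious mitigation: keep the election timeout range identical on every server (these raft_params fields already exist; the values below are only an example):

nuraft::raft_params params;
params.heart_beat_interval_          = 100;  // ms
params.election_timeout_lower_bound_ = 200;  // ms, same on S1, S2, and S3
params.election_timeout_upper_bound_ = 400;  // ms, same on S1, S2, and S3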

Concurrent `save_logical_snp_obj` and `create_snapshot`.

Hi! It seems possible for NuRaft to run both methods concurrently (gdb trace https://gist.github.com/alesapin/973313f4d89f76634b4b1dd7e653e8ff). My code was not ready for this and I got a deadlock :)

Is this expected behavior? Of course, I'll fix my code, but maybe it would be simpler to manage this on the NuRaft side? Both the create and save snapshot methods write data to disk. In my case it's not a problem, but I can imagine cases where it could be painful.

Leader expiration

There is an optimization in cornerstone that introduces leader expiration to prevent a node from considering itself the leader forever when it's isolated (see this). Do you want to port it?

Race condition between `create_snapshot` and `read_logical_snp_obj`

Hi!

In my state machine implementation, I'm always using the single latest snapshot. I've encountered the following case:

2021.05.05 21:35:46.899607 [ 1861 ] {} <Information> RaftInstance: creating a snapshot for index 1100
2021.05.05 21:35:46.899616 [ 1861 ] {} <Information> RaftInstance: create snapshot idx 1100 log_term 1
2021.05.05 21:35:46.899634 [ 1861 ] {} <Debug> KeeperStateMachine: Creating snapshot 1100
2021.05.05 21:35:46.899668 [ 1861 ] {} <Debug> KeeperStateMachine: In memory snapshot 1100 created, queueing task to flash to disk
2021.05.05 21:35:46.900040 [ 1861 ] {} <Information> RaftInstance: create snapshot idx 1100 log_term 1 done: 411 us elapsed
2021.05.05 21:35:46.922391 [ 1862 ] {} <Debug> RaftInstance: send snapshot peer 3, peer log idx: 465, my starting idx: 991, my log idx: 1104, last_snapshot_log_idx: 1000
2021.05.05 21:35:46.922406 [ 1862 ] {} <Debug> RaftInstance: previous sync_ctx exists 0x7f9eccdcf418, offset 1, snp idx 1000, user_ctx (nil)
2021.05.05 21:35:46.922417 [ 1862 ] {} <Debug> RaftInstance: peer: 3, obj_idx: 1, user_snp_ctx (nil)
2021.05.05 21:35:46.922428 [ 1862 ] {} <Debug> KeeperStateMachine: Reading snapshot 1000 obj_id 1
2021.05.05 21:35:46.947817 [ 1855 ] {} <Debug> KeeperStateMachine: Created persistent snapshot 1100 with path /home/robot-clickhouse/db/coordination/snapshots/snapshot_1100.bin
DB::Exception Required to read snapshot with last log index 1000, but our last log index is 1100, Stack trace (when copying this message, always include the lines below):

Thus, the leader had already finished creating the 1100-idx snapshot but was asked to read the 1000-idx snapshot. What can we do in this case? It seems the return code of read_logical_snp_obj is ignored in the library code.

Question: Reads from followers

What is the recommended way to execute read-only requests on followers?

It seems unreliable to check get_committed_log_idx() == get_leader_committed_log_idx()? Is there any way to directly ask the leader for its committed_log_idx?

Auto forwarding cmd_result get_accepted bug

When using auto-forwarding for appending entries, the accepted flag doesn't seem to get updated, whereas without auto-forwarding (current node = leader) the accepted flag is updated. This leads my code to think there is an error, because the accepted flag is false when everything is actually fine.

I think it just needs presult->accept() to be called when checking whether the original request was accepted, so that the result's accept flag is also updated.

if (resp->get_accepted()) {
    resp_ctx = resp->get_ctx();
}

changed to:

if (resp->get_accepted()) {
     resp_ctx = resp->get_ctx();
     presult->accept();
}

It could also be that my understanding of auto-forwarding is wrong and this is expected behaviour?

Thanks,
James.

socket is already in use, race happened on connection to node1:44444

Hi, thank you for the great library!

I'm experimenting with NuRaft in the ClickHouse DBMS. I've implemented a ZooKeeper-like prototype on top of NuRaft and am now performing simple tests. In one of them we have three NuRaft servers and consistently block them from each other using iptables. After removing the iptables rules, the server aborted in debug mode with the following error:

2021.01.28 13:51:58.114048 [ 55 ] {} <Fatal> RaftInstance: socket is already in use, race happened on connection to node1:44444
2021.01.28 13:51:58.115156 [ 121 ] {} <Fatal> BaseDaemon: ########################################
2021.01.28 13:51:58.115315 [ 121 ] {} <Fatal> BaseDaemon: (version 21.2.1.1, build id: 0EE2AC6B59408A6EC7E3DA283C8FE6471D28ADFA) (from thread 55) (no query) Received signal Aborted (6)
2021.01.28 13:51:58.115397 [ 121 ] {} <Fatal> BaseDaemon: 
2021.01.28 13:51:58.115495 [ 121 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f5aadad418b 0x7f5aadab3859 0x7f5aadab3729 0x7f5aadac4f36 0x1eb92f74 0x1eb8e029 0x1eb9b1db 0x1eba01d0 0x1eba0074 0x1eb9bfb1 0x1eb9ba3b 0x1eb9e388 0x1eb9e345 0x1eb9e322 0x1eb9e2e6 0x1eb9e2b0 0x1eb9e25d 0x1eb9e120 0x1eb9e0c9 0x1eb9ddc4 0x1eb9da0f 0x1eb7caf5 0x1eb9cc02 0x1eb7caf5 0x1eb7bf82 0x1eb7ba8e 0x1eb760ae 0x1eb72937
2021.01.28 13:51:58.115725 [ 121 ] {} <Fatal> BaseDaemon: 4. raise @ 0x4618b in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.115810 [ 121 ] {} <Fatal> BaseDaemon: 5. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.115870 [ 121 ] {} <Fatal> BaseDaemon: 6. ? @ 0x25729 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.115929 [ 121 ] {} <Fatal> BaseDaemon: 7. ? @ 0x36f36 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
2021.01.28 13:51:58.162884 [ 121 ] {} <Fatal> BaseDaemon: 8. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1110: nuraft::asio_rpc_client::set_busy_flag(bool) @ 0x1eb92f74 in /usr/bin/clickhouse
2021.01.28 13:51:58.208419 [ 121 ] {} <Fatal> BaseDaemon: 9. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1016: nuraft::asio_rpc_client::send(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&) @ 0x1eb8e029 in /usr/bin/clickhouse
2021.01.28 13:51:58.254000 [ 121 ] {} <Fatal> BaseDaemon: 10. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1162: nuraft::asio_rpc_client::connected(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>) @ 0x1eb9b1db in /usr/bin/clickhouse
2021.01.28 13:51:58.299755 [ 121 ] {} <Fatal> BaseDaemon: 11. /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/type_traits:3617: decltype(*(std::__1::forward<std::__1::shared_ptr<nuraft::asio_rpc_client>&>(fp0)).*fp(std::__1::forward<std::__1::shared_ptr<nuraft::req_msg>&>(fp1), std::__1::forward<std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&>(fp1), std::__1::forward<boost::system::error_code const&>(fp1), std::__1::forward<boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>(fp1))) std::__1::__invoke<void (nuraft::asio_rpc_client::*&)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client>&, std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&, void>(void (nuraft::asio_rpc_client::*&)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client>&, std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&) @ 0x1eba01d0 in /usr/bin/clickhouse
2021.01.28 13:51:58.344898 [ 121 ] {} <Fatal> BaseDaemon: 12. /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/functional:2857: std::__1::__bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>, __is_valid_bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&> >::value>::type std::__1::__apply_functor<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, 0ul, 1ul, 2ul, 3ul, 4ul, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&> >(void (nuraft::asio_rpc_client::*&)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >&, std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul, 4ul>, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>&&) @ 0x1eba0074 in /usr/bin/clickhouse
2021.01.28 13:51:58.390103 [ 121 ] {} <Fatal> BaseDaemon: 13. /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/functional:2890: std::__1::__bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>, __is_valid_bind_return<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::tuple<std::__1::shared_ptr<nuraft::asio_rpc_client>, std::__1::shared_ptr<nuraft::req_msg>, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>, std::__1::placeholders::__ph<1>, std::__1::placeholders::__ph<2> >, std::__1::tuple<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&> >::value>::type std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&>::operator()<boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&>(boost::system::error_code const&, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp> const&) @ 0x1eb9bfb1 in /usr/bin/clickhouse
2021.01.28 13:51:58.435141 [ 121 ] {} <Fatal> BaseDaemon: 14. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/impl/connect.hpp:565: boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >::operator()(boost::system::error_code, int) @ 0x1eb9ba3b in /usr/bin/clickhouse
2021.01.28 13:51:58.481436 [ 121 ] {} <Fatal> BaseDaemon: 15. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/bind_handler.hpp:66: boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>::operator()() @ 0x1eb9e388 in /usr/bin/clickhouse
2021.01.28 13:51:58.530046 [ 121 ] {} <Fatal> BaseDaemon: 16. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/handler_invoke_hook.hpp:70: void boost::asio::asio_handler_invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, ...) @ 0x1eb9e345 in /usr/bin/clickhouse
2021.01.28 13:51:58.575332 [ 121 ] {} <Fatal> BaseDaemon: 17. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_invoke_helpers.hpp:39: void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&>&) @ 0x1eb9e322 in /usr/bin/clickhouse
2021.01.28 13:51:58.620160 [ 121 ] {} <Fatal> BaseDaemon: 18. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/impl/connect.hpp:614: void boost::asio::detail::asio_handler_invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::executor, boost::asio::ip::tcp, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >*) @ 0x1eb9e2e6 in /usr/bin/clickhouse
2021.01.28 13:51:58.664685 [ 121 ] {} <Fatal> BaseDaemon: 19. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_invoke_helpers.hpp:39: void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> > >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >&) @ 0x1eb9e2b0 in /usr/bin/clickhouse
2021.01.28 13:51:58.709550 [ 121 ] {} <Fatal> BaseDaemon: 20. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/bind_handler.hpp:108: void boost::asio::detail::asio_handler_invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, 
boost::system::error_code>*) @ 0x1eb9e25d in /usr/bin/clickhouse
2021.01.28 13:51:58.754456 [ 121 ] {} <Fatal> BaseDaemon: 21. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_invoke_helpers.hpp:39: void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, 
std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&) @ 0x1eb9e120 in /usr/bin/clickhouse
2021.01.28 13:51:58.802236 [ 121 ] {} <Fatal> BaseDaemon: 22. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/io_object_executor.hpp:120: void boost::asio::detail::io_object_executor<boost::asio::executor>::dispatch<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>, std::__1::allocator<void> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&&, std::__1::allocator<void> const&) const @ 0x1eb9e0c9 in /usr/bin/clickhouse
2021.01.28 13:51:58.847350 [ 121 ] {} <Fatal> BaseDaemon: 23. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/handler_work.hpp:74: void boost::asio::detail::handler_work<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::asio::detail::io_object_executor<boost::asio::executor>, boost::asio::detail::io_object_executor<boost::asio::executor> >::complete<boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code> >(boost::asio::detail::binder1<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::system::error_code>&, boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, 
std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >&) @ 0x1eb9ddc4 in /usr/bin/clickhouse
2021.01.28 13:51:58.893035 [ 121 ] {} <Fatal> BaseDaemon: 24. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/reactive_socket_connect_op.hpp:102: boost::asio::detail::reactive_socket_connect_op<boost::asio::detail::iterator_connect_op<boost::asio::ip::tcp, boost::asio::executor, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>, boost::asio::detail::default_connect_condition, std::__1::__bind<void (nuraft::asio_rpc_client::*)(std::__1::shared_ptr<nuraft::req_msg>&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)>&, std::__1::error_code, boost::asio::ip::basic_resolver_iterator<boost::asio::ip::tcp>), std::__1::shared_ptr<nuraft::asio_rpc_client> const&, std::__1::shared_ptr<nuraft::req_msg> const&, std::__1::function<void (std::__1::shared_ptr<nuraft::resp_msg>&, std::__1::shared_ptr<nuraft::rpc_exception>&)> const&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&> >, boost::asio::detail::io_object_executor<boost::asio::executor> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) @ 0x1eb9da0f in /usr/bin/clickhouse
2021.01.28 13:51:58.936458 [ 121 ] {} <Fatal> BaseDaemon: 25. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/scheduler_operation.hpp:41: boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long) @ 0x1eb7caf5 in /usr/bin/clickhouse
2021.01.28 13:51:58.980078 [ 121 ] {} <Fatal> BaseDaemon: 26. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/impl/epoll_reactor.ipp:778: boost::asio::detail::epoll_reactor::descriptor_state::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) @ 0x1eb9cc02 in /usr/bin/clickhouse
2021.01.28 13:51:59.025735 [ 121 ] {} <Fatal> BaseDaemon: 27. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/scheduler_operation.hpp:41: boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long) @ 0x1eb7caf5 in /usr/bin/clickhouse
2021.01.28 13:51:59.072552 [ 121 ] {} <Fatal> BaseDaemon: 28. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/impl/scheduler.ipp:447: boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) @ 0x1eb7bf82 in /usr/bin/clickhouse
2021.01.28 13:51:59.116394 [ 121 ] {} <Fatal> BaseDaemon: 29. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/detail/impl/scheduler.ipp:200: boost::asio::detail::scheduler::run(boost::system::error_code&) @ 0x1eb7ba8e in /usr/bin/clickhouse
2021.01.28 13:51:59.160091 [ 121 ] {} <Fatal> BaseDaemon: 30. /home/alesap/code/cpp/ClickHouse/contrib/boost/boost/asio/impl/io_context.ipp:63: boost::asio::io_context::run() @ 0x1eb760ae in /usr/bin/clickhouse
2021.01.28 13:51:59.203132 [ 121 ] {} <Fatal> BaseDaemon: 31. /home/alesap/code/cpp/ClickHouse/contrib/NuRaft/src/asio_service.cxx:1557: nuraft::asio_service_impl::worker_entry() @ 0x1eb72937 in /usr/bin/clickhouse
2021.01.28 13:52:00.008916 [ 121 ] {} <Fatal> BaseDaemon: Calculated checksum of the binary: F8D26D56372D0BFB6D3F28EA91D86317. There is no information about the reference checksum.
2021.01.28 13:52:08.255602 [ 1 ] {} <Fatal> Application: Child process was terminated by signal 6.

Port 44444 is passed to launcher.init. The error is not stable and reproduces rarely. We also found some thread sanitizer alerts:
https://gist.github.com/alesapin/560f5efadbf0e622d725cb7cdca7b7c2 and https://gist.github.com/alesapin/7e68b299a678489a04590f975e08753e.

Is it normal that the leader ID becomes -1 when only one node is alive?

I have two nodes in the cluster {1, 2}, and the leader is 1. Now 2 is disconnected, and an error message like this is shown (on 1):
Error: raft_server.cxx:check_leadership_validity:856: 1 nodes (out of 2, 2 including learners) are not responding longer than 2500 ms, at least 2 nodes (including leader) should be alive to proceed commit
Error: raft_server.cxx:check_leadership_validity:858: will yield the leadership of this node
Is it by design that Raft cannot work with only one node?
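
For reference, a minimal sketch of the majority-quorum arithmetic behind this behavior (standard Raft, not anything NuRaft-specific): with N voting members, a commit needs floor(N/2) + 1 acknowledgements, so a 2-node cluster needs both nodes alive.

```cpp
#include <initializer_list>
#include <iostream>

// Standard Raft majority quorum: floor(n / 2) + 1 voting members
// must acknowledge an entry before it can be committed.
int majority_quorum(int num_voting_members) {
    return num_voting_members / 2 + 1;
}

int main() {
    for (int n : {1, 2, 3, 5}) {
        std::cout << n << " node(s) -> quorum " << majority_quorum(n) << "\n";
    }
    // 2 nodes -> quorum 2: losing either node blocks commits,
    // which is why the leader in the log above yields leadership.
    return 0;
}
```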

Does NuRaft support dynamic peer discovery?

Let's say we have 2 servers: S1 and S2,
where S1 is up and is the leader. S2 is not up yet (it is a new server).
We add S2 to the system while S2 is still down.
When S2 comes up, how does S1 know that S2 is up, or how does S2 know that S1 is up?

Thanks
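
For the scenario above, a rough sketch of how adding S2 from the leader typically looks, assuming the usual srv_config / add_srv API; the server ID and endpoint below are made-up examples. Once S2 is in the cluster configuration, the leader keeps retrying the RPC connection and heartbeats to that endpoint, so it picks S2 up whenever S2 finally starts.

```cpp
#include <libnuraft/nuraft.hxx>

using namespace nuraft;

// Sketch only: called on S1 (the current leader) to add S2 to the cluster.
// The id (2) and endpoint are illustrative, not taken from the project.
void add_second_server(ptr<raft_server> s1) {
    srv_config s2_conf(2, "localhost:10002");
    ptr< cmd_result< ptr<buffer> > > ret = s1->add_srv(s2_conf);
    if (!ret->get_accepted()) {
        // e.g. this node is not the leader, or a previous
        // configuration change is still in progress; retry later.
    }
}
```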

How to make cluster config persistent?

Hello, how can I make the cluster config persistent? For example, suppose I have 3 server instances online that are already part of a cluster and all of them go offline at the same time. Is there a way to make them reconnect to the same cluster, so that the first one to come back becomes the leader and the others become followers?
To achieve that, is the only thing I need to do to implement my own version of the in-memory state manager (state_mgr.hxx) so that it saves to disk instead of keeping everything in memory?
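
Roughly, yes: the in-memory state manager keeps the config and server state only in RAM, so surviving a full restart means implementing your own state_mgr that writes them to disk. Below is a minimal sketch of the cluster-config part only, assuming the usual serialize/deserialize helpers; the file path is made up, and a complete state_mgr must also persist srv_state (term/vote) and provide a durable log_store.

```cpp
#include <libnuraft/nuraft.hxx>

#include <cstring>
#include <fstream>
#include <iterator>
#include <string>

using namespace nuraft;

// Sketch: disk-backed save/load for the cluster config only.
// "/tmp/cluster_config.bin" is an arbitrary example path.
void save_config_to_disk(const cluster_config& config) {
    ptr<buffer> buf = config.serialize();
    std::ofstream out("/tmp/cluster_config.bin", std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(buf->data_begin()), buf->size());
}

ptr<cluster_config> load_config_from_disk() {
    std::ifstream in("/tmp/cluster_config.bin", std::ios::binary);
    if (!in) return nullptr; // first start: nothing saved yet
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    ptr<buffer> buf = buffer::alloc(bytes.size());
    std::memcpy(buf->data_begin(), bytes.data(), bytes.size());
    return cluster_config::deserialize(*buf);
}
```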

test benchmark result question

In the benchmark result file, it is suggested that as the payload size increases, the replication throughput increases as well. But I noticed that the unit becomes MB/s instead of the earlier ops/s. Can you explain how this value is calculated? Thank you.
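
Without re-deriving the exact numbers in tests/bench, the usual relationship between the two units is simply operations per second multiplied by the payload size per operation; the sketch below shows that conversion only and is not taken from the benchmark code itself.

```cpp
#include <iostream>

// Illustrative conversion only: if each replicated operation carries
// `payload_bytes` and the bench sustains `ops_per_sec` operations,
//   MB/s = ops_per_sec * payload_bytes / 2^20
double ops_to_mbps(double ops_per_sec, double payload_bytes) {
    return ops_per_sec * payload_bytes / (1024.0 * 1024.0);
}

int main() {
    // Example: 10,000 ops/s with 64 KB payloads ~= 625 MB/s of replicated data.
    std::cout << ops_to_mbps(10000, 64 * 1024) << " MB/s\n";
    return 0;
}
```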

Coredump when running calc_server

#0 0x0000000000418961 in SimpleLoggerMgr::logStackBacktrace(unsigned long) at /home/xmly/NuRaft/examples/logger.cc:390
#1 0x0000000000418f74 in SimpleLoggerMgr::handleSegAbort(int) at /home/xmly/NuRaft/examples/logger.cc:451
#2 0x00000000000366d0 in __restore_rt() at sigaction.c:?
#3 0x00007f381686164b in gsignal() at ??:0
#4 0x00007f3816863450 in abort() at ??:0
#5 0x00007f38173da055 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv() at ??:0
#6 0x000000000008fc46 in __cxxabiv1::__terminate(void (*)()) at /usr/src/debug/gcc-7.1.1-20170622/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/libsupc++/../../../../libstdc++-v3/libsupc++/eh_terminate.cc:51
#7 0x000000000008fc91 in std::terminate() at ??:?
#8 0x000000000008fed4 in __cxa_throw() at ??:?
#9 0x000000000043b1ae in void asio::detail::throw_exception<std::system_error>(std::system_error const&) at ??:?
#10 0x000000000043b482 in asio::detail::do_throw_error(std::error_code const&, char const*) at /home/xmly/NuRaft/asio/asio/include/asio/detail/impl/throw_error.ipp:49
#11 0x00000000004341c3 in asio::detail::throw_error(std::error_code const&, char const*) at /home/xmly/NuRaft/asio/asio/include/asio/detail/throw_error.hpp:41
#12 0x0000000000423aba in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_swap(std::__shared_count<(__gnu_cxx::_Lock_policy)2>&) at /usr/include/c++/7/bits/shared_ptr_base.h:710
#13 0x000000000040adc7 in calc_server::init_raft(std::shared_ptr<nuraft::state_machine>) at /home/xmly/NuRaft/examples/example_common.hxx:183
#14 0x0000000000409148 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() at /usr/include/c++/7/bits/shared_ptr_base.h:681
#15 0x00007f381684b4da in __libc_start_main() at ??:0
#16 0x000000000040977a in _start() at ??:?

Way to reject client request

Is there a way to reject a client's request? For example, if they try to perform an operation that is invalid or wrong?
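
One common pattern (not a NuRaft-specific API) is to validate the request at the application layer before it is handed to the Raft server, and only replicate commands that passed validation; invalid operations are answered with an error without ever touching the log. A rough sketch under that assumption, with `is_valid_operation` standing in for your own rules; alternatively, the state machine can encode an application error in the buffer it returns from commit.

```cpp
#include <libnuraft/nuraft.hxx>

using namespace nuraft;

// Hypothetical application-level check; replace with real validation rules.
bool is_valid_operation(const ptr<buffer>& req) {
    return req && req->size() > 0;
}

// Reject invalid requests before they reach Raft; only valid commands are
// passed to append_entries() and therefore replicated and logged.
bool handle_client_request(ptr<raft_server> server, ptr<buffer> req) {
    if (!is_valid_operation(req)) {
        return false; // answer the client with an error, nothing is replicated
    }
    ptr< cmd_result< ptr<buffer> > > ret = server->append_entries({req});
    return ret->get_accepted();
}
```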

API for tracking cluster membership

I see that the list command lists the configured servers in the cluster, but they are not necessarily up at that time.
How can I list only the servers that are currently up?

Is there any API to get the details of down followers and track cluster membership?
How can I add a call-back function to be called when any follower goes down or comes up?
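
If your NuRaft version exposes raft_server::get_peer_info_all() (check raft_server.hxx; the field names below are assumptions based on that struct), the leader can poll it and treat a peer whose last successful response is older than some threshold as down. This is a polling sketch, not an answer to the callback question.

```cpp
#include <libnuraft/nuraft.hxx>

#include <cstdint>
#include <iostream>

using namespace nuraft;

// Assumes raft_server::get_peer_info_all() and the peer_info fields below
// exist in your NuRaft version; only meaningful on the current leader.
void print_peer_liveness(ptr<raft_server> server, uint64_t down_threshold_us) {
    for (const raft_server::peer_info& pi : server->get_peer_info_all()) {
        bool likely_down = pi.last_succ_resp_us_ > down_threshold_us;
        std::cout << "peer " << pi.id_
                  << (likely_down ? " looks down" : " looks up")
                  << " (last successful response " << pi.last_succ_resp_us_
                  << " us ago)\n";
    }
}
```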
