baidu / braft
An industrial-grade C++ implementation of the RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
License: Apache License 2.0
While loading a massive snapshot through braft, I saw the server get killed. There is only one node in the raft group.
W0924 05:53:46.894353 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:53:49.880662 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:53:59.887006 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:12.745045 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:30.896412 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:45.414910 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:54.228179 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:55:20.424473 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:55:30.647188 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:56:18.754196 8056 external/com_github_pdu_brpc/src/brpc/server.cpp:325] UpdateDerivedVars is too busy!
W0924 05:56:19.068273 8049 external/com_github_pdu_brpc/src/brpc/global.cpp:207] GlobalUpdate is too busy!
Killed
Is it possible to add peers after the cluster starts? The documentation does not mention that.
const int64_t prev = _value.fetch_add(detal_value, butil::memory_order_relaxed); // detal_value = 7
Does butil::memory_order_relaxed here mean the value reflects the real-time data in the log?
Why does the result of the accumulation show the previous value?
192.168.109.128:8100:0 value=0 latency=1922 // after the first accumulation this should be 7
192.168.109.128:8100:0 value=7 latency=4504 // after the second accumulation this should be 14
warning: core file may not match specified executable file.
[New LWP 12604]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./demo_server.bin'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f1d2ae23c16 in pthread_mutex_lock_impl (mutex=0x11dfdc0) at src/bthread/mutex.cpp:545
545 return sys_pthread_mutex_lock(mutex);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 gflags-2.1.1-6.el7.x86_64 glibc-2.17-196.el7_4.2.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-12.el7_2.x86_64 leveldb-1.12.0-11.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 libuuid-2.23.2-26.el7_2.2.x86_64 openssl-libs-1.0.1e-51.el7_2.7.x86_64 pcre-8.32-15.el7_2.1.x86_64 protobuf-2.5.0-8.el7.x86_64 protobuf-compiler-2.5.0-8.el7.x86_64 snappy-1.1.0-3.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) where
#0 0x00007f1d2ae23c16 in pthread_mutex_lock_impl (mutex=0x11dfdc0) at src/bthread/mutex.cpp:545
#1 pthread_mutex_lock (__mutex=0x11dfdc0) at src/bthread/mutex.cpp:803
The docs mention a node-tiering feature:
https://github.com/brpc/braft/blob/master/docs/cn/raft_protocol.md
I couldn't find the corresponding settings in the conf or the code. Is there a plan to open-source this feature?
A note up front: I'm not familiar with cmake. gflags, protobuf and leveldb were built from source rather than installed via rpm, and the build seems to be missing the protobuf include and library directories at link time.
The comment below seems unclear:
// Bootstrap a non-empty raft node,
int bootstrap(const BootstrapOptions& options);
W0607 19:49:05.888779 32357 node.cpp:1276] node 0e3382b2-7c31-45df-b50c-98cebdb90986:172.24.156.86:8011:0 request PreVote from 172.24.156.86:8012:0 error: [E22][0.0.0.0:8012][E22]Invalid argument
Data is currently sent to peers batch by batch. Why introduce the WaitMeta concept instead of having the peer replicator send one batch after another directly, which seems more intuitive? Or is there some other design consideration?
Please help me figure out what causes the following error and how to handle it.
I0621 11:22:57.423517 21025 /home/zhangxf/github/braft/src/braft/node.cpp:846] node licenserpc:192.168.109.128:8100:0 shutdown, current_term 3 state LEADER
I0621 11:22:57.423598 21025 /home/zhangxf/github/braft/src/braft/replicator.cpp:1152] Fail to find the next candidate
I0621 11:22:57.423629 21025 src/brpc/server.cpp:1033] Server[braft::RaftStatImpl+indigo::LicenseServiceImpl+braft::FileServiceImpl+braft::RaftServiceImpl+braft::CliServiceImpl] is going to quit
W0621 11:22:57.423711 21036 src/brpc/policy/baidu_rpc_protocol.cpp:257] Fail to write into fd=11 [email protected]:33055@8100: Got EOF
W0621 11:22:57.423711 21034 src/brpc/policy/baidu_rpc_protocol.cpp:257] Fail to write into fd=12 [email protected]:33060@8100: Got EOF
W0621 11:22:57.424027 21036 src/brpc/policy/baidu_rpc_protocol.cpp:257] Fail to write into fd=13 [email protected]:33062@8100: Got EOF
W0621 11:22:57.424027 21034 src/brpc/policy/baidu_rpc_protocol.cpp:257] Fail to write into fd=14 [email protected]:33065@8100: Got EOF
braft and brpc share the same service port. Under a heavy brpc request load, does this affect braft elections and other operations, and could it make data transfer between raft nodes too slow? Would using separate ports avoid this?
When I run the example counter, I got a problem like "". I think it is caused by an undefined conf. So how do I define the conf? Please tell me, thank you.
DEFINE_string(conf, "", "Initial configuration of the replication group");
the error :
node Counter:127.0.0.1:8100:0 term 1 start pre_vote
W0320 16:40:10.782916 13281 /home/robin/github/braft/src/braft/node.cpp:1305] node Counter:127.0.0.1:8100:0 can't do pre_vote as it is not in 192.168.109.128:8100:0,192.168.109.128:8101:0,192.168.109.128:8102:0
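The warning means the node's own address (127.0.0.1:8100:0) is not a member of the configuration it was given, so it refuses to pre-vote. The initial configuration passed via --conf has to list the address each node actually binds to. A hypothetical invocation for a local 3-node group (flag names taken from the counter example; binary name and addresses are illustrative):

```shell
./counter_server --port=8100 \
    --conf="127.0.0.1:8100:0,127.0.0.1:8101:0,127.0.0.1:8102:0"
```

Each of the three servers gets the same --conf string, differing only in --port.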
A question about the following test scenario:
1. Start with a single node; it is the leader and has written a lot of data.
2. Add a node; the leader syncs data to it.
3. Meanwhile, writes to the leader continue.
If reads only require weak consistency, how can we tell when the new node has synced all the historical data and can start serving?
Started 3 nodes; leader election worked (128 became leader). Then stopped 128, leaving two nodes (129, 130). After restarting 128, one of the nodes printed the following errors:
I0613 14:25:00.753657 9132 /home/zhangxf/github/braft/src/braft/replicator.cpp:623] node licenserpc:192.168.109.130:8100:0 send InstallSnapshotRequest to 192.168.109.128:8100:0 term 10 last_included_term 10 last_included_index 3 uri remote://192.168.109.130:8100/-19242672944717
I0613 14:25:00.753845 9132 /home/zhangxf/github/braft/src/braft/replicator.cpp:668] received InstallSnapshotResponse from licenserpc:192.168.109.128:8100:0 last_included_index 3 last_included_term 10 error: [E112]Not connected to 192.168.109.128:8100 yet
W0607 19:48:16.614872 31763 node.cpp:1119] node 0e3382b2-7c31-45df-b50c-98cebdb90986:172.24.156.86:8012:0 got error={type=StateMachineError, error_code=10002, error_text=`StateMachine meet critical error when applying one or more tasks since index=6417014, none'}
Has remove_group not been implemented???
Fail to define ESTOP(-20) which is already defined as `未知的错误 -20', abort.
Please help with how to handle this, and where to start checking for this error.
Our development environment has already upgraded to protobuf3, but building brpc+braft requires protobuf2, which is a hassle to juggle.
With bazel, the source version can be pinned, which would be more convenient.
Running the counter server on two machines reports errors.
First VM, ip 192.168.109.129, error message:
node Counter:127.0.0.1:8100:0 term 1 start pre_vote
W0321 15:14:56.005779 13732 /home/zhang/github/braft/src/braft/node.cpp:1305] node Counter:127.0.0.1:8100:0 can't do pre_vote as it is not in 192.168.109.128:8100:0
Second VM, ip 192.168.109.128, error message:
node Counter:127.0.0.1:8100:0 term 1 start pre_vote
W0321 15:15:02.566098 12401 /home/zhang/github/braft/src/braft/node.cpp:1305] node Counter:127.0.0.1:8100:0 can't do pre_vote as it is not in 192.168.109.129:8100:0
run_server.sh :
DEFINE_string crash_on_fatal 'true' 'Crash on fatal log'
DEFINE_integer bthread_concurrency '18' 'Number of worker pthreads'
DEFINE_string sync 'true' 'fsync each time'
DEFINE_string valgrind 'false' 'Run in valgrind'
DEFINE_integer max_segment_size '8388608' 'Max segment size'
DEFINE_integer server_num '1' 'Number of servers'
DEFINE_boolean clean 1 'Remove old "runtime" dir before running'
DEFINE_integer port 8100 "Port of the first server"
#parse the command-line
FLAGS "$@" || exit 1
eval set -- "${FLAGS_ARGV}"
#The alias for printing to stderr
alias error=">&2 echo counter: "
#hostname prefers ipv6
IP=$(hostname -i | awk '{print $NF}')
#IP=127.0.0.1
if [ "$FLAGS_valgrind" == "true" ] && [ $(which valgrind) ] ; then
VALGRIND="valgrind --tool=memcheck --leak-check=full"
fi
IP2=192.168.109.128
raft_peers=""
for ((i=0; i<$FLAGS_server_num; ++i)); do
raft_peers="${raft_peers}${IP2}:$((${FLAGS_port}+i)):0,"
done
Hello
Is there any particular reason for using protobuf instead of flatbuffers or capnproto?
If I want to switch to flatbuffers or capnproto, how would you suggest going about updating the code (i.e. step by step, so I can test at each step)? Any suggestions are welcome.
1. There is an extra space after atomic_test, which should cause an error when run:
c/exec "./atomic_test "
2. The atomic_set!/get!/cas! functions all execute the actual binary on the db node, so during network-partition tests the client itself is also affected. Would running them on the control node be more reasonable?
For example, when using braft in k8s, a stateful application can only keep its hostname stable; the IP changes on every restart.
In my usage scenario, I use braft to synchronize a multi-node file database.
I want to record the last applied index instead of taking a snapshot. When a node
starts, it should resume from the last applied index, but I don't know how to start
a braft node from the last applied index, nor how to obtain that index.
I looked for an SDK API for both, but I can't find one.
The current Replicator::_send_entries sends the next message from the callback of the previous send, which makes all sends fully serial. Is pipelined processing being considered? Or was such an optimization tried internally and dropped because it didn't help?
In the current jepsen test, atomic get returns 0 when it reads nil, while the jepsen test uses the model :model (model/cas-register), whose read of nil returns nil:
(defn cas-register
"A compare-and-set register"
([] (cas-register nil))
([value] (CASRegister. value)))
So in the jepsen test, if the first operation is a read, it returns 0, which disagrees with the model's output of nil.
Hi, searching the whole codebase, StepdownTimer is started in become_leader() and then never seems to be handled. Adding a log line in StepdownTimer::run() shows that after the node becomes leader, StepdownTimer timeouts keep firing. Is this logic just not finished yet, or something else?
The log output is below:
I0712 19:56:23.524367 52422 /root/data/braft-master/example/block/server.cpp:398] Node becomes leader
E0712 19:56:28.523725 52432 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
E0712 19:56:33.525770 52430 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
E0712 19:56:38.525977 52432 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
E0712 19:56:43.526221 52419 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
I0712 19:56:48.370575 52428 /root/data/braft-master/example/block/server.cpp:337] Saving snapshot to ./data/snapshot/temp/data
I0712 19:56:48.375188 52428 /root/data/braft-master/src/braft/snapshot.cpp:589] Deleting ./data/snapshot/snapshot_00000000000000000001
I0712 19:56:48.375222 52428 /root/data/braft-master/src/braft/snapshot.cpp:595] Renaming ./data/snapshot/temp to ./data/snapshot/snapshot_00000000000000000001
I0712 19:56:48.375242 52428 /root/data/braft-master/src/braft/snapshot_executor.cpp:210] snapshot_save_done, last_included_index=1 last_included_term=2
E0712 19:56:48.526525 52419 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
E0712 19:56:53.526775 52419 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
E0712 19:56:59.222747 52428 /root/data/braft-master/src/braft/node.cpp:2475] StepdownTimer timeout 5000
The code change:
void StepdownTimer::run() {
+ LOG(ERROR) << "StepdownTimer timeout" << " " << _timeout_ms;
_node->handle_stepdown_timeout();
}
(_timeout_ms was changed to a public member for the log line.)
The design doc describes it as:
the leader steps down if it fails to write to a majority within an ElectionTimeout, implemented via a logic-clock check (one ElectionTimeout contains 10 heartbeats).
So StepdownTimer should be about deciding, after writes fail for a long time, whether the peers are still alive or whether the node itself has been isolated, and then whether to step down, but I don't see that logic anywhere in the code?
In SegmentLogStorage, the index of a log entry can actually be derived from the suffix of the log file name, so why does first_log_index still have to be saved explicitly in the log_meta file? It seems unnecessary, or is there a special design consideration?
#define BRAFT_SEGMENT_OPEN_PATTERN "log_inprogress_%020ld"
#define BRAFT_SEGMENT_CLOSED_PATTERN "log_%020ld_%020ld"
In the counter example, after a snapshot is created, does braft automatically perform log compaction?
If so, how is the snapshot kept consistent with the log in time? on_apply may involve CPU-heavy operations, so the applied state can lag the log by several minutes.
Also, is it enough for only the leader to create snapshots?
https://github.com/brpc/braft/blob/master/example/counter/server.cpp#L242
If execution is forced to reach this line, it crashes. If sleep(3) is added at the end of on_snapshot_save, it doesn't. Is the ::braft::Closure* done passed in deleted as soon as the function returns?
Also, when calling snapshot(::braft::Closure* done) to trigger a snapshot manually, that done is not the same object as the one inside on_snapshot_save. Could the same Closure be used here? That would let the caller of snapshot() learn the final status of the snapshot.
The process to reproduce the issue:
It's quite easy to hit this case if you deploy the raft groups in a k8s statefulset and use AWS spot instances, because of the network isolation.
The wanted behavior is:
I've hacked a quick fix which can pass the above use case, but I am not sure if it will cause other side effects. Please help to check about the code here at https://github.com/pdu/braft/blob/master/src/braft/node.cpp#L482
zhou@ubuntu:~/brpc/braft/build$ cmake .. && make
CMake Error at CMakeLists.txt:58 (message):
Fail to find brpc
-- Configuring incomplete, errors occurred!
See also "/home/zhou/brpc/braft/build/CMakeFiles/CMakeOutput.log".
See also "/home/zhou/brpc/braft/build/CMakeFiles/CMakeError.log".
zhou@ubuntu:~/brpc/braft/build$
In https://github.com/brpc/braft/blob/master/docs/cn/benchmark.md you mention improving performance by "shrinking lock critical sections as much as possible and using lock-free/wait-free algorithms on the critical path". But in the NodeImpl implementation, almost every major raft protocol step begins by taking _mutex. That means two client requests cannot execute in parallel; they are serialized by contending for the lock. I see node_manager is used to manage multiple raft nodes, presumably to achieve parallelism across nodes, but there is no related example. So how many raft nodes should be deployed on one server with N cores: N/4? N/2? N? 2N? With many raft nodes, would frequent thread scheduling become a problem?
I pulled the code. brpc is not installed in a shared directory, but I pointed braft's cmake at the brpc directory and the build succeeded. However, both ./run_client.sh and ./run_server.sh fail with: ERROR: unknown command line flag 'crash_on_fatal_log'
brpc can already be built on Mac; braft should also support building on Mac.
The 3 examples shipped with braft all default snapshot_interval to 30s, which is frequent compared with other raft libraries.
Raft performance drops noticeably while a snapshot is in progress. In production, what snapshot interval is braft configured with? Is it really as frequent as 30s?
I'd like to understand braft's reasoning here.
As the title asks, or is there some mechanism by which it can repair itself?
The two functions do completely different things, and one apply even has to call the other. Why give them the same name... Why must humans hurt each other...
The test steps were:
1. Start one node and make it the leader.
2. Insert 1000 records, 26KB each; the logs show 3 to 4 log segments created.
3. Take a snapshot.
4. Insert another 1000 records of 26KB each; another 3 to 4 log segments appear.
5. Start a new node and add it to the raft group.
6. The leader syncs data to the follower entirely from these log segments, without using the snapshot.
Two questions:
1. When does the leader use a snapshot to sync data to a follower? For the test above, snapshot + log segments feels like what should have happened.
2. After the snapshot was taken, why were the old log segments not deleted? What is the deletion mechanism?
In the current braft implementation, a cold start replays all log entries not covered by a snapshot. In some scenarios this is not very sensible: for example, when combining braft with rocksdb to build a simple NoSQL store, entries that have already been applied don't need replaying. Could a get_checkpoint virtual function be added to the state machine class, so that when the node starts and loads, it uses the checkpoint value together with the snapshot's apply point to determine where log replay starts? That would shorten cold-start time.
How can BRAFT_VLOG logs be printed? Does CMakeLists.txt need to be modified?
Steps to reproduce:
Cause:
https://github.com/brpc/braft/blob/master/src/braft/configuration.h#L61
int parse(const std::string& str) {
reset();
char ip_str[64];
if (2 > sscanf(str.c_str(), "%[^:]%*[:]%d%*[:]%d", ip_str, &addr.port, &idx)) {
reset();
return -1;
}
if (0 != butil::str2ip(ip_str, &addr.ip)) {
reset();
return -1;
}
return 0;
}
sscanf mishandles the ":0:0" case, and the unittest also lacks a corresponding case: https://github.com/brpc/braft/blob/master/test/test_configuration.cpp#L30
BTW: should I send a PR to fix this? If so, what is the process?
Problem description:
Say there are 3 nodes, 1 leader (a) and 2 followers (b, c). For the same log entry, a and b apply it normally, but c errors while applying it (disk full, or some other reason). How should this be handled? Does the node need to be removed from the group? The examples call set_error_and_rollback, but that doesn't seem to solve the problem. The raft write-ups I've read don't cover this part either.
1. Environment:
ubuntu version: 14.04
gcc/g++: 4.9.4
Make: 3.81
libprotoc: 3.6.0
2. Problem description
Following the official instructions:
1. Build brpc: hit a problem; after working around it the build succeeds. See the appendix below.
2. Build braft: fine.
3. Build the braft example example/counter: building counter fails with the following errors:
$ cd counter && cmake . && make
-- Configuring done
-- Generating done
-- Build files have been written to: /home/liujg/dev/lib/braft-master/example/counter
[ 16%] Running C++ protocol buffer compiler on counter.proto
Scanning dependencies of target counter_client
[ 33%] Building CXX object CMakeFiles/counter_client.dir/client.cpp.o
[ 50%] Building CXX object CMakeFiles/counter_client.dir/counter.pb.cc.o
Linking CXX executable counter_client
/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../lib/libbrpc.a(span.o): In function brpc::SpanDB::Open()': /home/liujg/dev/lib/brpc/src/brpc/span.cpp:470: undefined reference to
leveldb::Options::Options()'
/home/liujg/dev/lib/brpc/src/brpc/span.cpp:486: undefined reference to `leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)'
How can this be resolved?
Appendix: notes on the brpc build:
(1) protobuf 3.6.0 was already installed, while libprotobuf-dev is 2.5.0, so after running
$sudo apt-get install libgflags-dev libprotobuf-dev libprotoc-dev protobuf-compiler libleveldb-dev
libprotobuf-dev, libprotoc-dev and protobuf-compiler were removed again.
(2) Building src/idl_options.pb.cc errors out:
$make
Compiling src/brpc/policy/weighted_round_robin_load_balancer.o
Packing libbrpc.a
Linking protoc-gen-mcpack
In file included from /usr/include/c++/4.9/mutex:35:0,
from /usr/include/google/protobuf/stubs/mutex.h:33,
from /usr/include/google/protobuf/stubs/common.h:52,
from src/idl_options.pb.h:9,
from src/idl_options.pb.cc:4:
/usr/include/c++/4.9/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
Fix: modify the Makefile to add -std=c++11:
protoc-gen-mcpack: src/idl_options.pb.cc src/mcpack2pb/generator.o libbrpc.a
@echo "Linking $@"
ifeq ($(SYSTEM),Linux)
@$(CXX) -o
Running make again then succeeds.
Snapshots are currently triggered automatically by time (and can also be triggered externally), and use applied_index as the snapshot's end index. In our raft setup, applied_index advances as soon as two of the three replicas have synced, so if one node syncs noticeably more slowly than the other two, the leader will later need install_snapshot for it.
This creates a problem: even a node that lags only slightly behind the other two may be forced into snapshot installation, because truncate_prefix after a snapshot makes the earlier indexes unavailable, so the full snapshot has to be resent. That cost is very high and may occur frequently in practice.
Could the snapshot instead retain a configurable number of log entries, so users can tune log size and snapshot frequency to their workload and avoid the frequent log fetches caused by replication lag?
When a raft node crashes, the elected leader is empty. Why does no follower node become the leader?
Silent mode
The leader-to-follower heartbeat interval in RAFT is usually small, around 100ms, so when there are many replica instances the number of heartbeat packets grows rapidly. Since a replication group rarely needs to switch leaders, the active leader election feature can be turned off so that no leader-lease heartbeats need to be maintained. The replication group then relies on the business Master to passively trigger leader election, which only needs to happen when the leader node goes down, reducing the total heartbeat count from the number of replica instances to the number of nodes. Another solution in the community is Multi-Raft, which merges the heartbeats between replication groups into heartbeats between nodes.
Does "turning off active leader election" here mean using reset_election_timeout_ms to lengthen the timeout? If so, what are the negative effects?
Does "the Master passively triggering leader election" mean resetting the election timeout to start an election?
When triggering on leader failure, how is the leader node's failure detected, via the business Master? Does the business Master need a leader on every node? And if the business Master's own leader drifts, how is that handled?