Comments (7)
Thanks for reporting, @yfei-z. I added more test coverage for view changes during the election, but this scenario might still be an issue. I'll run a few tests to confirm it and open a PR with a fix if necessary.
from jgroups-raft.
There is another situation: if the coordinator itself leaves the group before it sends the elected message, the cluster will have no leader for this term, and the new coordinator won't start the election process for the next term.
I think view changes during the election process will cause a lot of problems. Another one: if 2/3 of the nodes are electing and, before the coordinator sets itself as the leader, the view changes to 1/3 with only the coordinator online, then the cluster is working without a majority. Although `stopVotingThread` will wait at most 100ms before setting the leader to null in the lost case, I think it's still possible.
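The concern above is a check-then-act race: the election thread decides it has a majority, and a view change removes that majority before the leader is actually set. A minimal sketch of one way to close that window, assuming a hypothetical `MajorityGuard` class (not part of jgroups-raft): the view-change handler and the election thread share one monitor, so deciding and recording leadership is atomic with respect to view changes, and no timed wait is involved.

```java
/**
 * Hypothetical sketch, not jgroups-raft code: the view-change handler and the
 * election thread share one monitor, so "check majority, then become leader"
 * cannot interleave with a view change that drops below a majority.
 */
public class MajorityGuard {
    private final int clusterSize; // configured number of Raft members
    private int viewSize;          // members in the latest view, guarded by this
    private boolean leader;        // guarded by this

    public MajorityGuard(int clusterSize, int initialViewSize) {
        this.clusterSize = clusterSize;
        this.viewSize = initialViewSize;
    }

    /** Majority of the configured cluster, e.g. 2 for 3 members. */
    public int majority() {
        return clusterSize / 2 + 1;
    }

    /** Called by the view-change handler; steps down at once if the majority is gone. */
    public synchronized void onViewChange(int newViewSize) {
        viewSize = newViewSize;
        if (viewSize < majority()) {
            leader = false;
        }
    }

    /** Called by the election thread after counting votes; decides and records leadership atomically. */
    public synchronized boolean tryBecomeLeader(int votes) {
        leader = votes >= majority() && viewSize >= majority();
        return leader;
    }

    public synchronized boolean isLeader() {
        return leader;
    }
}
```

Whichever of the two methods runs first, the other sees its effect, so a coordinator left alone in a 1/3 view either never becomes leader or steps down immediately.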
How about letting the election thread handle view change events serially?
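To make the "handle view changes serially" proposal concrete, here is a minimal sketch, assuming a hypothetical `SerialElectionLoop` class (not jgroups-raft code) in which view changes and election rounds are queued on one single-threaded `ExecutorService`. Every election round then sees exactly the view changes submitted before it and none submitted after.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: view changes and election rounds are queued on one
 * single-threaded executor, so they always run one at a time, in order.
 */
public class SerialElectionLoop {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());
    private int viewSize; // only read and written on the executor thread

    public void onViewChange(int newViewSize) {
        executor.execute(() -> {
            viewSize = newViewSize;
            log.add("view:" + newViewSize);
        });
    }

    public void runElectionRound(int clusterSize) {
        executor.execute(() -> {
            // Sees the effect of every view change queued before it, and no later one.
            boolean hasMajority = viewSize >= clusterSize / 2 + 1;
            log.add(hasMajority ? "elect" : "skip");
        });
    }

    /** Drains the queue, stops the thread and returns what happened, in order. */
    public List<String> shutdownAndGetLog() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(log);
    }
}
```

The cost of this design is that a slow election round delays handling of the next view change, since both share one thread.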
I created tests for all the examples you mentioned, and they all resulted in liveness problems, not safety issues, even the one with the 1/3 and 2/3 split. I believe we don't need to go all in on handling the changes serially. Here is some more context and some possible solutions.
For the first scenario, we can synchronize starting and stopping the voting thread. Start/stop already uses the intrinsic lock. Maybe we'll need to add a synchronized block before sending the message and stopping, to check the thread's interrupt state. Another option is, if the view result is `leader_lost`, to stop and restart the voting thread.
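A minimal sketch of the first option, assuming a hypothetical `VotingThreadControl` class (its method names are illustrative, not the jgroups-raft API). Start, stop, and the "send elected message" step all take the object's intrinsic lock, and the send re-checks the interrupt flag and thread identity inside that lock, so a stop can never slip in between the check and the send.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: start/stop and the "send elected message" step share
 * the object's intrinsic lock, so a stop can never slip in between the
 * interrupt check and the send.
 */
public class VotingThreadControl {
    private Thread votingThread;                          // guarded by this
    private final List<String> sent = new ArrayList<>(); // guarded by this

    public synchronized void start(Runnable body) {
        stop(); // restarting is just stop + start
        votingThread = new Thread(body, "voting-thread");
        votingThread.start();
    }

    public synchronized void stop() {
        if (votingThread != null) {
            votingThread.interrupt();
            votingThread = null;
        }
    }

    /** Called from the voting thread; returns false if that thread was stopped first. */
    public synchronized boolean sendElected(String msg) {
        Thread self = Thread.currentThread();
        if (self.isInterrupted() || self != votingThread) {
            return false;
        }
        sent.add(msg); // stand-in for actually sending the message
        return true;
    }

    public synchronized List<String> sent() {
        return new ArrayList<>(sent);
    }
}
```

Because `start()` calls `stop()` first, the second option (stop and restart the voting thread on a `leader_lost` result) amounts to calling `start()` again from the view handler.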
The other two scenarios you mentioned happen because `ELECTION` does not consider who the previous coordinator was. As a result, both the network partition case and the coordinator-leaving case produce a `no_change` result.
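To illustrate why ignoring the previous coordinator yields `no_change`, here is a minimal sketch of a view classifier that also compares the old and new coordinator. The `ViewClassifier` class, its `Result` names, and the rule for when to start an election are hypothetical, not the actual `ELECTION` or `ELECTION2` logic; the only JGroups convention relied on is that the coordinator is the first member of a view.

```java
import java.util.List;

public final class ViewClassifier {
    /** Hypothetical outcomes, loosely modelled on the names used in this thread. */
    public enum Result { NO_CHANGE, LEADER_LOST, MAJORITY_LOST, START_ELECTION }

    private ViewClassifier() {}

    /**
     * @param oldView     members of the previous view, coordinator first
     * @param newView     members of the new view, coordinator first
     * @param leader      current leader, or null if none has been elected for this term
     * @param clusterSize configured number of Raft members
     */
    public static Result classify(List<String> oldView, List<String> newView,
                                  String leader, int clusterSize) {
        int majority = clusterSize / 2 + 1;
        boolean hadMajority = oldView.size() >= majority;
        boolean hasMajority = newView.size() >= majority;

        if (!hasMajority) {
            return hadMajority ? Result.MAJORITY_LOST : Result.NO_CHANGE;
        }
        if (leader != null && !newView.contains(leader)) {
            return Result.LEADER_LOST;
        }
        // The extra check: no leader yet and the coordinator changed, e.g. it left
        // or was partitioned away before announcing the result of its election.
        String oldCoord = oldView.isEmpty() ? null : oldView.get(0);
        boolean coordChanged = !newView.get(0).equals(oldCoord);
        if (!hadMajority || (leader == null && coordChanged)) {
            return Result.START_ELECTION;
        }
        return Result.NO_CHANGE;
    }
}
```

With the old view [A, B, C] and no leader, losing A yields `START_ELECTION` on the B, C side and `MAJORITY_LOST` on the A side, whereas a classifier that only compares the majority before and after would report no change for B and C.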
In the network partition, you don't end up with two working partitions. Instead, you end up with two leaderless partitions that don't accept commands, even though one of them has a majority. `ELECTION2`, on the other hand, also takes the previous coordinator into account. However, it has an additional pre-voting phase before the election.
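The pre-voting phase mentioned above can be illustrated with the generic pre-vote idea from the Raft literature. This is a minimal sketch under that assumption, with a hypothetical `PreVote.shouldStartElection` helper; it is not `ELECTION2`'s code. The candidate asks the other members whether they would vote for it without incrementing its term, and starts a real election only if a majority (counting itself) says yes.

```java
import java.util.List;

/**
 * Hypothetical sketch of a Raft-style pre-vote check, not ELECTION2's code.
 * The candidate asks the other members whether they would vote for it without
 * incrementing its term; only if a majority says yes does it start a real election.
 */
public final class PreVote {
    private PreVote() {}

    /**
     * @param peerReplies one entry per other member that answered: true if it would grant its vote
     * @param clusterSize configured number of Raft members
     */
    public static boolean shouldStartElection(List<Boolean> peerReplies, int clusterSize) {
        long yes = 1; // the candidate's own vote
        for (Boolean reply : peerReplies) {
            if (Boolean.TRUE.equals(reply)) {
                yes++;
            }
        }
        return yes >= clusterSize / 2 + 1;
    }
}
```

A node isolated on the 1/3 side collects at most its own vote, so it never starts an election or bumps its term; the cost is an extra round of messages before every election.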
I have the tests in place, so I'll look for a simpler way to solve it, and also take the chance to see whether I can think of other problems.
I'll release a new version this week.
Related Issues (20)
- Remove AsyncCounter and SyncCounter interfaces
- withOption method should create a new instance
- Add membership operations to RaftHandle
- Provide quorum reads for `ReplicatedStateMachine`
- `Client` and `ClientStub` issues during membership operations
- Liveness issue with ELECTION
- can't remove leader by method of RaftHandle's removeServer
- this.raftHandle.channel().disconnect() and reconnect trigger error "not found in retransmission table"
- Questions
- Ensure a single leader per term
- Configurable class loader for ReplicatedStateMachine block
- CounterTest.testIgnoreReturnValue test failure
- Remove ant and use only maven
- Restarting node after membership change
- Leader stepping down with membership change
- Fix longest log check during election
- Liveness issue joining during election thread execution
- Concurrency issue leads to nodes voting twice in same term
- Election problem