Starting Leader Elections
- Every process periodically communicates with its coordinator to ensure the coordinator is alive.
- If the coordinator does not respond within a timeout, the node starts a Leader Election
- Multiple nodes may detect the failure concurrently, and each of them may start a Leader Election
- An election may also be initiated when a node joins the group
- Leader Election algorithms exchange multiple rounds of messages
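The detection step can be sketched as follows, assuming each node records when its coordinator last answered a periodic ping and suspects a failure after a fixed timeout. The class name `CoordinatorMonitor`, the `timeout` value, and the explicit `now` timestamps (used instead of a real clock so the example is deterministic) are illustrative assumptions, not part of the notes.

```python
from dataclasses import dataclass


@dataclass
class CoordinatorMonitor:
    """Hypothetical per-node monitor of the current coordinator."""
    node_id: int
    timeout: float            # assumed: seconds without a reply before suspecting failure
    last_reply: float = 0.0   # time of the coordinator's last reply to a ping
    electing: bool = False    # True once this node has started an election

    def on_reply(self, now: float) -> None:
        # The coordinator answered the periodic ping, so it is presumed alive.
        self.last_reply = now

    def tick(self, now: float) -> bool:
        """Called periodically; returns True if this call starts an election."""
        if not self.electing and now - self.last_reply > self.timeout:
            self.electing = True
            return True
        return False


# Several nodes may detect the same coordinator failure independently.
monitors = [CoordinatorMonitor(node_id=i, timeout=3.0) for i in (2, 3, 4)]
for m in monitors:
    m.on_reply(now=10.0)      # last successful ping at t=10
starters = [m.node_id for m in monitors if m.tick(now=14.0)]
print(starters)  # [2, 3, 4]
```

Because every node runs the same check, all three nodes start an election at t=14, which is why an election algorithm has to tolerate several concurrent initiators.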
Ending Leader Elections
- If there are no more failures, that is, the system state is stable, then the leaders are stable, and there is only one leader per connected group of nodes
- Even in a stable state, the system may be partitioned, and each partition then has its own unique leader.
- When the system becomes unstable again (for example, a node fails or joins), leader election starts again.
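To make "one leader per connected group" concrete, the sketch below computes the connected components of a network and the leader each component would settle on, assuming the lowest-Id-wins rule used by the Bully Algorithm below. The adjacency representation and the function name `leaders_per_partition` are assumptions for illustration.

```python
from collections import deque


def leaders_per_partition(nodes, links):
    """Return {leader_id: sorted member ids} for each connected group.

    nodes: iterable of node ids; links: iterable of (a, b) pairs that can
    communicate. Assumes the lowest id in each group becomes its leader.
    """
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, result = set(), {}
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects one connected group (partition).
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result[min(group)] = sorted(group)
    return result


# No link connects {1, 2} with {3, 4, 5}, so each group has its own leader.
print(leaders_per_partition([1, 2, 3, 4, 5], [(1, 2), (3, 4), (4, 5)]))
# {1: [1, 2], 3: [3, 4, 5]}
```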
The Bully Algorithm applies under two assumptions
- Every node can compute the winner of a comparison between two nodes using the same invariant, e.g. the node Id
- There are only server failures in the system, and no communication failures
- These failures can be detected using timeouts
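The first assumption is what lets nodes agree without a central arbiter: if every node ranks candidates with the same total order, any two nodes that see the same set of Ids pick the same winner, regardless of the order in which messages arrive. A minimal sketch, assuming the invariant is "lower node Id wins" as in the steps below; `beats` and `winner` are hypothetical helper names.

```python
from functools import reduce
from itertools import permutations


def beats(a: int, b: int) -> bool:
    """Shared invariant: the node with the lower id wins a comparison."""
    return a < b


def winner(ids):
    """Fold pairwise comparisons over the (non-empty) ids a node has heard from."""
    return reduce(lambda x, y: x if beats(x, y) else y, ids)


# Every arrival order of the same ids yields the same winner.
ids = [7, 3, 9, 5]
print({winner(order) for order in permutations(ids)})  # {3}
```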
- When a node in the cluster detects that its leader is down, it does the following:
- Sends a broadcast <ELECT,Id> message to every other node
- Each node, on receiving the message,
- Responds with an <OK,Id> message
- Starts the Leader Election step itself, if it has not already done so
- The node that broadcast <ELECT,Id>, once it has received the replies, knows whether it is the leader:
- If it has the lowest Id, it broadcasts a <LEADER,Id> message to each of the cohorts
- If it does not have the lowest Id, it withdraws from the election silently
(If there are no more failures, the node with the lowest Id wins the election.)
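The decision in the last step can be sketched as a pure function. It assumes the replies carry the responders' Ids, as in <OK,Id>, and that the node waited long enough (a timeout, per the failure assumptions) for all live nodes to answer; `decide` and the tuple message format are hypothetical.

```python
def decide(my_id, ok_replies):
    """Return the message a node sends after collecting <OK,Id> replies.

    ok_replies: list of ("OK", id) tuples received before the timeout.
    Returns ("LEADER", my_id) if this node has the lowest id, else None,
    meaning the node withdraws silently.
    """
    reply_ids = [rid for kind, rid in ok_replies if kind == "OK"]
    # With no replies at all, the node is the only live node and wins.
    if all(my_id < rid for rid in reply_ids):
        return ("LEADER", my_id)
    return None


print(decide(2, [("OK", 4), ("OK", 5)]))  # ('LEADER', 2)
print(decide(4, [("OK", 2), ("OK", 5)]))  # None
```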
An example of the Bully Algorithm
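A small, single-process simulation of the steps above, assuming reliable message delivery among live nodes, crashed nodes that simply never reply, and the lowest-Id-wins rule. Messages are modeled as tuples in a queue; `simulate_bully` and the message layout are hypothetical names chosen for illustration.

```python
from collections import deque


def simulate_bully(all_ids, alive, initiators):
    """Simulate one election; return the ids that broadcast <LEADER,Id>.

    all_ids: every node in the cluster; alive: ids that have not crashed;
    initiators: live nodes that detected the failed leader.
    """
    queue = deque()                   # pending (kind, sender, receiver)
    replies = {n: [] for n in alive}  # OK ids collected by each node
    started = set()

    def start_election(node):
        started.add(node)
        for other in all_ids:         # broadcast <ELECT,Id> to every node
            if other != node:
                queue.append(("ELECT", node, other))

    for node in initiators:
        start_election(node)

    while queue:
        kind, sender, receiver = queue.popleft()
        if receiver not in alive:
            continue                  # crashed nodes never reply
        if kind == "ELECT":
            queue.append(("OK", receiver, sender))  # respond with <OK,Id>
            if receiver not in started:
                start_election(receiver)            # join the election
        elif kind == "OK":
            replies[receiver].append(sender)

    # Each participant compares its own id with the ids in its replies.
    return sorted(n for n in started
                  if all(n < rid for rid in replies[n]))


# Leader 1 has crashed; nodes 3 and 5 notice it first.
print(simulate_bully([1, 2, 3, 4, 5], {2, 3, 4, 5}, [3, 5]))  # [2]
```

Because every live node ends up broadcasting <ELECT,Id>, each one collects the Ids of all other live nodes, so exactly one node (here node 2) finds that it has the lowest Id.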
The Invite Algorithm extends the "Bully Algorithm" to tolerate communication failures. Because communication can fail, nodes may become partitioned into separate groups that later have to rejoin.
The algorithm has four components
As part of the CHECK operation,
- Each node sends a CHECK message to the other nodes
- On receiving this message, every other node responds with <id,is_coord>: its Id and whether it is currently a coordinator
If the node is not a coordinator and no other coordinator is discovered,
- The node promotes itself to coordinator and executes INVITE(neighbors)
If the node is already a coordinator and other coordinators are discovered,
- The node executes INVITE(coordinators)
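The two CHECK rules above can be sketched as a function over the <id,is_coord> responses. It assumes "neighbors" means the nodes that answered the CHECK, and it takes no action in the cases the notes do not cover (for example, a non-coordinator that finds an existing coordinator); `on_check_results` is a hypothetical name.

```python
def on_check_results(my_id, is_coord, responses):
    """Return (is_coord_after, invite_targets) for one CHECK round.

    responses: (id, is_coord) pairs from the nodes that answered CHECK.
    invite_targets is None when no INVITE needs to be sent.
    """
    others = [(rid, c) for rid, c in responses if rid != my_id]
    coordinators = sorted(rid for rid, c in others if c)
    if not is_coord and not coordinators:
        # No coordinator reachable: promote self and INVITE(neighbors).
        return True, sorted(rid for rid, _ in others)
    if is_coord and coordinators:
        # Other coordinators found (e.g. a partition healed): INVITE(coordinators).
        return True, coordinators
    return is_coord, None


print(on_check_results(4, False, [(5, False), (6, False)]))  # (True, [5, 6])
print(on_check_results(1, True, [(2, False), (3, True)]))    # (True, [3])
print(on_check_results(5, False, [(1, True)]))               # (False, None)
```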
Invite at Non-Coordinators
At a non-coordinator node that receives an invitation,
- The node verifies whether it can join the inviting group
(The group-leadership invariant must be obeyed: the leader has a lower Id than the other members.)
- If so, the node responds with an ACCEPT(id)
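A minimal sketch of the check at a non-coordinator, assuming the invariant is read as "a node may only join a group whose leader has a lower Id than its own" and that the Id in ACCEPT(id) is the accepting node's own Id. The function name and tuple message format are hypothetical.

```python
def on_invite_non_coordinator(my_id, inviter_id):
    """Handle INVITE(inviter_id) at a node that is not a coordinator.

    Assumed invariant: a group's leader has a lower id than its members,
    so the node may join only a group led by a lower id.
    Returns ("ACCEPT", my_id) to join, or None to ignore the invitation.
    """
    if inviter_id < my_id:
        return ("ACCEPT", my_id)
    return None


print(on_invite_non_coordinator(5, inviter_id=2))  # ('ACCEPT', 5)
print(on_invite_non_coordinator(5, inviter_id=7))  # None
```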
Invite at Coordinators
At a coordinator node that receives an invitation,
- The coordinator verifies whether it can join the inviting coordinator's group
- If so, it sends a message to the members of its own group telling them that their group membership should change
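The coordinator's side can be sketched in the same style. It assumes the same lower-Id invariant as at non-coordinators and models only the step described in the notes: when the merge is allowed, the coordinator produces one membership-change message per member of its own group. `Group`, `on_invite_coordinator`, and the `CHANGE_GROUP` message name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical view of one group: its coordinator and other members."""
    coordinator: int
    members: set = field(default_factory=set)


def on_invite_coordinator(group, inviter_id):
    """Handle INVITE(inviter_id) at a coordinator.

    If the invariant (leader has a lower id) allows joining the inviter's
    group, return (kind, receiver, new_coordinator) messages telling each
    member of this group that its membership changes. Otherwise return [].
    """
    if inviter_id >= group.coordinator:
        return []
    return [("CHANGE_GROUP", member, inviter_id)
            for member in sorted(group.members)]


group = Group(coordinator=4, members={5, 6})
print(on_invite_coordinator(group, 2))  # [('CHANGE_GROUP', 5, 2), ('CHANGE_GROUP', 6, 2)]
print(on_invite_coordinator(group, 7))  # []
```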