Crisp Reading Notes on Latest Technology Trends and Basics

References

Overview

Starting Leader elections

  1. Every process periodically communicates with its coordinator to ensure the coordinator is alive.
  2. When the coordinator does not respond, the node starts a Leader Election
  3. Note that multiple nodes could detect this, and start the Leader Election
  4. It may also be initiated when a node joins the group
  5. Leader Election algorithms exchange multiple rounds of messages
  6. If there are no more failures – that is, the system state is stable,

Ending Leader Elections

  • then the leaders are stable, and there is only one leader per connected group of nodes
  1. Under stable state, the system may partition, and each partition may have a unique leader.
  2. Once there is more instability in the system, leader election starts again.

Bully Algorithm

Applicability

Bully Algorithm applies under two situations

  1. Each node is able to compute the winner of a comparison using the same node invariant – Eg nodeid
  2. There are only server failures in the system, and no communication failures.
  • These failures can be detected using Timeouts

The Algorithm

  1. Each node in the cluster, when it detects that its leader is down does the following
  2. Send a broadcast <ELECT,Id> message to each of the nodes
  3. Each node, when it receives the message,
    1.  Responds with an <OK,Id> message
    2. Starts the Leader Election step, if it has not already done so.
  4. The sender of the message in Step (2), when it receives a reply will know whether it is a leader
    1. If it has the lowest Id, then it broadcasts a <LEADER,Id> message to each of the cohorts
    2. If it does not have the lowest Id, then it withdraws from the election silently.

(If there are no more failures, the node with the lower-id wins the election)

An example of Bully Algorithm

Invite Algorithm

Overview

The Invite Algorithm is an enhancement to the “Bully Algorithm” in the presence of communication failures. Because of this, communication nodes may get partitioned, and may have to rejoin.

The Algorithm

The algorithm has four components

Find Coordinators

As part of this operation,

  1. Each node sends a CHECK message
  2. On receiving this message, every other node responds with <id,is_coord>

Trigger Election

If the node is not a coordinator, and there are no other coordinators

  1. The node promotes itself to a coordinator, and executes INVITE(neighbors)

Merge Groups

If the node is already a coordinator, and there is other coordinators discovered

  1. The node executes INVITE(coordinators)

Invite at non Coordinators

At the non coordinator nodes,

  1. The node verifies whether the current node can join the specified group

(The invariants of group leadership need to be obeyed – leader has a lower Id )

  1. If so, the node responds with an ACCEPT(id)

Invite at Coordinators

At the coordinator nodes,

  1. The coordinator verifies if it can join the existing nodes
  2. If so, the node sends a message to its group, that their group membership should change

Advertisements

Comments on: "Leader Election in Distributed Systems – First cut" (2)

  1. Chitta Haty said:

    I have not seen the very popular leader election paper (PASOX) from Microsoft. http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf ZooKeeper’s architecture is influenced by this paper.

    • Very good comment.
      I classified Paxos as a “Distributed Consensus Algorithm”, since it has far more applications.
      I am in the process of doing my writeup on Paxos. Stay tuned.

      Thanks for the feedback,
      – Ranjan

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Tag Cloud

%d bloggers like this: