Crisp Reading Notes on Latest Technology Trends and Basics

Overview

What is Zookeeper

Zookeeper is a coordination service from the Hadoop suite of products that behaves like a “Name Server”. It has the following characteristics.

  1. Names form a hierarchical name-space, much like paths in a file system.
  2. It provides operations to create, read, update and delete names.
  3. It sends updates to registered listeners on different machines, in the same order in which it received them.

The last two features enable it to be used for co-ordination and synchronization.
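The three characteristics above can be sketched with a small in-memory model. This is only a toy, not the Zookeeper API: the class name `ToyNameTree` and its methods are made up for illustration, and its listeners stay registered and match on a path prefix, which is a simplification.

```python
class ToyNameTree:
    """In-memory toy model of a hierarchical name-space (not the Zookeeper API)."""

    def __init__(self):
        self.nodes = {"/": b""}   # full path -> data stored at that name
        self.listeners = []       # (path prefix, callback) pairs

    def watch(self, prefix, callback):
        """Register a listener for every change at or below `prefix`."""
        self.listeners.append((prefix, callback))

    def _notify(self, event, path):
        # Listeners are called in the order in which changes are applied.
        for prefix, callback in self.listeners:
            if path.startswith(prefix):
                callback(event, path)

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent does not exist: " + parent)
        if path in self.nodes:
            raise KeyError("node already exists: " + path)
        self.nodes[path] = data
        self._notify("created", path)

    def read(self, path):
        return self.nodes[path]

    def update(self, path, data):
        if path not in self.nodes:
            raise KeyError("no such node: " + path)
        self.nodes[path] = data
        self._notify("changed", path)

    def delete(self, path):
        if any(p.startswith(path + "/") for p in self.nodes):
            raise ValueError("node has children: " + path)
        del self.nodes[path]
        self._notify("deleted", path)
```

A listener registered with `tree.watch("/services", callback)` is then called once for each create, update and delete under `/services`, in the order the changes were made.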

Simple Use Case

External Monitoring Service

Suppose we wanted to use an external monitoring service like Munin.

  1. The external program registers a “Zookeeper Watch” so that it is informed whenever there is a change at a given location in the tree.
  2. The existing services, such as Apache, each register a node.

For example, the node below represents an Apache server at www32.mydomain.com at port 80.

/services/www/www32.mydomain.com/80

  1. The Munin system can periodically get a dump of all services under /services/www and load them into its configuration file, munin.conf.
  2. Once this is done, that particular web server can be monitored by the monitoring system.
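As a rough sketch of the first step, the function below turns a dump of registered paths into host entries for munin.conf. The function name `munin_conf_from_paths` is hypothetical, the input list stands in for whatever the dump returns, and the three-line entry layout is the common minimal host entry, assumed here to be sufficient.

```python
def munin_conf_from_paths(paths, root="/services/www"):
    """Build munin.conf host entries from paths shaped like <root>/<host>/<port>."""
    hosts = set()
    for path in paths:
        if not path.startswith(root + "/"):
            continue                      # not under the watched location
        parts = path[len(root) + 1:].split("/")
        if len(parts) != 2:
            continue                      # expect exactly <host>/<port>
        host, _service_port = parts       # the service port is not written out
        hosts.add(host)

    lines = []
    for host in sorted(hosts):
        lines.append("[" + host + "]")
        lines.append("    address " + host)
        lines.append("    use_node_name yes")
        lines.append("")
    return "\n".join(lines)


print(munin_conf_from_paths(["/services/www/www32.mydomain.com/80"]))
```

The port in the path is the port of the monitored service (80 for Apache here), so this sketch leaves it out of the host entry.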

Why not use a Database

Zookeeper is a better interface than a database for this purpose, because of the ordering guarantees it makes.

  1. Watch notifications are ordered with respect to other events, other watches, and other asynchronous replies. The client library delivers all events in the right order.
  2. The client sees the event for a change to a node before it sees the new value of that node.
  3. The order in which the client sees events is the same as the order in which the Zookeeper service sees them.

Usage in Hadoop

Managing Configuration Changes

  • When there are hundreds or thousands of nodes in a cluster, it becomes difficult to push configuration changes to every machine.
  • Zookeeper enables the configuration changes to be pushed to all of them from one place.

Implementing Reliable Messaging

  • With Zookeeper, we can implement reliable producer-consumer queues that keep working even if a few consumers and some Zookeeper servers fail.
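One common way to build such a queue is to give each item a name with an increasing sequence number and let consumers take the lowest one. The sketch below models that idea in memory only. `ToyQueue`, `peek` and `ack` are hypothetical names, and the zero-padded 10-digit suffix mimics the counter that Zookeeper appends to sequential node names.

```python
class ToyQueue:
    """In-memory toy model of a queue built from sequentially named items."""

    def __init__(self):
        self.items = {}      # item name -> payload
        self.next_seq = 0    # counter used to build the next item name

    def put(self, payload):
        """Producer side: store the payload under the next sequential name."""
        name = "item-%010d" % self.next_seq
        self.next_seq += 1
        self.items[name] = payload
        return name

    def peek(self):
        """Consumer side: look at the oldest item without removing it."""
        if not self.items:
            return None
        name = min(self.items)   # zero-padded names sort in creation order
        return name, self.items[name]

    def ack(self, name):
        """Remove an item once processed; False if someone else already did."""
        if name in self.items:
            del self.items[name]
            return True
        return False
```

Because an item is removed only by `ack`, a consumer that fails after `peek` leaves the item in place for another consumer to pick up.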

Implement Redundant Services

  • Several identical nodes may provide a service.
  • One of these is elected as the leader (using a leader election algorithm), and starts providing the service.
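A widely used election scheme has every candidate register a node with a sequence number; the lowest number leads, and each other candidate watches the one just ahead of it. The function below sketches only that bookkeeping. The name `election_roles` and the `n-<sequence>` naming are assumptions made for illustration.

```python
def election_roles(children):
    """Given candidate names like 'n-0000000007', return (leader, watch_map).

    watch_map maps each candidate to the candidate it should watch:
    None for the leader, otherwise the candidate with the next lower sequence.
    """
    ordered = sorted(children, key=lambda name: int(name.rsplit("-", 1)[1]))
    watch_map = {}
    for index, name in enumerate(ordered):
        watch_map[name] = None if index == 0 else ordered[index - 1]
    return ordered[0], watch_map


candidates = ["n-0000000003", "n-0000000001", "n-0000000002"]
leader, watch_map = election_roles(candidates)
print(leader, watch_map)
```

Watching only the immediate predecessor means that when the leader disappears, just one candidate is notified and takes over, instead of every candidate reacting at once.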

Synchronize Process Execution

  • Multiple nodes can coordinate the start and end of a process or calculation.
  • This ensures that any follow-up processing is done only after all nodes have finished their calculations.
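The idea of holding back follow-up work until every node has finished can be shown with an in-process analogy. The sketch below uses threads and Python's standard `threading.Barrier` in place of machines and Zookeeper nodes; `run_with_barrier` is a hypothetical name, and a distributed version would need the nodes to rendezvous through Zookeeper instead.

```python
import threading


def run_with_barrier(worker_count):
    """Run workers that all finish 'computed' before any starts 'follow-up'."""
    barrier = threading.Barrier(worker_count)
    log = []
    lock = threading.Lock()

    def worker(worker_id):
        with lock:
            log.append(("computed", worker_id))    # the calculation step
        barrier.wait()                             # block until all have arrived
        with lock:
            log.append(("follow-up", worker_id))   # runs only after everyone

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

In the returned log, every "computed" entry appears before the first "follow-up" entry, whatever order the threads happen to run in.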

Usage in a Data-Center

Complex Ad-Serving environment

Zookeeper is also useful in a complex Data-Center environment.

  • Let us consider the case of a complex Ad-Serving system. It consists of several components:
    • Database for Campaign data and Fiscal transactions
    • Ad-serving engines for serving the best Advertisements for the customers
    • Campaign planners for advertisers to run campaigns and simulations
    • Log collection engines for Data Warehousing, and data planning.
    • Data analytics and modeling systems
    • Fraud detection systems
    • Beacons and fault management systems
    • Failover servers

Bootstrap Configuration

One of the most important uses of Zookeeper in these cases is as a “Bootstrap Server”.

  • It holds the information needed to contact the “Services”, even when not all of the services are running.
  • It can store the primary and secondary configurations.

Distributed Service Locator

The Distributed Service Locator gives services a way to find and access other services.

  • Services may come up, and use “Leader Election” to decide the configurations.
  • They can store their status, which can then be queried.

Distributed System State

Zookeeper is usually used to maintain top-level system state, so that

  • An up-to-date directory of which machine is running which service can be maintained.
  • This directory can be used by the Monitoring Software to decide which machines should be monitored and how.

Configuration Push

To make configuration changes, and push them to the servers that use them,

  • Zookeeper can proactively notify the servers when a new configuration is created.
  • It can also be used to push software onto each of the clusters.
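A push of this kind is usually a notification followed by a re-read: each server reads the configuration, leaves a watch behind, and reads again when the watch fires. The sketch below models that loop in memory, with watches that fire only once, so each server registers again on every read. `ToyConfigNode` and `ToyServer` are hypothetical names, not part of any Zookeeper client.

```python
class ToyConfigNode:
    """In-memory stand-in for a node that holds the current configuration."""

    def __init__(self, value):
        self.value = value
        self.one_shot_watchers = []

    def get(self, watcher=None):
        """Return the value; optionally leave a watch that fires once."""
        if watcher is not None:
            self.one_shot_watchers.append(watcher)
        return self.value

    def set(self, value):
        """Store a new value and fire, then forget, all pending watches."""
        self.value = value
        fired, self.one_shot_watchers = self.one_shot_watchers, []
        for watcher in fired:
            watcher()


class ToyServer:
    """A server that keeps its local configuration in step with the node."""

    def __init__(self, node):
        self.node = node
        self.config = None
        self.reload()

    def reload(self):
        # Reading registers the watch again, so the next change is seen too.
        self.config = self.node.get(watcher=self.reload)


node = ToyConfigNode("v1")
servers = [ToyServer(node) for _ in range(3)]
node.set("v2")
print([server.config for server in servers])
```

A single `node.set(...)` call reaches every server, which is the effect the bullets above describe as a push.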