SLIDE 1

CO-ORDINATION WITH ZOOKEEPER

PRESENTED BY: 1. PRATAP CHANDRA DAS 2. SHORAJ TOMER 3. SOUGATA BHATTACHARYA

SLIDE 2

CONTENT

  • Distributed Computing : A brief Introduction
  • Problems of manageability in distributed computing
  • Solution: Apache Zookeeper
  • What does Zookeeper do?
  • A brief History
  • Framework of Zookeeper
  • Data Model and Hierarchical namespace of Zookeeper
  • Different modes for znodes
  • Zookeeper Quorums
  • Zookeeper Sessions, Requests and Transactions
  • Zab - ZooKeeper Atomic Broadcast protocol
  • Zookeeper Snapshots
  • Leader and Follower Protocol
  • Projects that use ZooKeeper
  • Tutorial – Installation, Setup, znode (creating, editing, deleting), subnode
SLIDE 3

DISTRIBUTED COMPUTING : INTRODUCTION


SLIDE 4

INTRODUCTION

• A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.

SLIDE 5

WHY ‘DISTRIBUTED’?

• The word distributed means data being spread out over more than one computer in a network.

SLIDE 6

SINGLE VS DISTRIBUTED

Single Machine
• Simple architecture
• Takes hours to complete a huge, complex task
• All the processes stop if the system crashes

Distributed Application
• Complex architecture
• Huge tasks can be done within minutes
• If one system crashes, other systems keep running and may take over the faulty process

SLIDE 7

A DISTRIBUTED APPLICATION

• Client Software
• Server Software

SLIDE 8

CLUSTER AND NODE

• Cluster: A group of systems
• Node: Each system in a cluster

SLIDE 9

ADVANTAGES

• Scalability: The system can easily be expanded by adding more machines as needed.
• Redundancy: Several machines can provide the same services, so if one is unavailable, work does not stop.
• Ease of development and maintenance
• Coordination of autonomous actions

SLIDE 10

DISADVANTAGES

• Complexity: More complex than centralized systems.
• Network reliance: Messages can be lost in the communication network.
• Security: More susceptible to external attacks.
• Multiple points of failure: Much more prone to error due to the huge number of machines.
• Manageability: More effort is required for system management.

SLIDE 11

MAJOR ISSUES ON MANAGING DISTRIBUTED SYSTEMS

• Race condition: Two or more operations run at the same time, and the result depends on their unpredictable ordering.
• Deadlock: Two or more machines each wait for shared resources held by the others, so none of them can proceed.
• Partial failure of a process: Leads to inconsistency of data.

SLIDE 12

SOLUTION : ZOOKEEPER

SLIDE 13

SOLVING THE MANAGEABILITY ISSUES

• Race condition: Serialization property of Zookeeper
• Deadlock: Synchronization property of Zookeeper
• Partial failure of a process: Handled through atomicity

SLIDE 14

WHAT IS APACHE ZOOKEEPER?

• Zookeeper is a distributed, open-source coordination service for distributed applications. It is also called the 'King of Coordination'.
• Zookeeper is a centralized service for distributed applications, providing:
  1. Maintenance of configuration information,
  2. Naming,
  3. Distributed synchronization, and
  4. Group services.

SLIDE 15

FORMAL DEFINITION: It is a distributed, open-source configuration and synchronization service, along with a naming registry, for distributed applications. It is used to manage and coordinate large clusters of machines.

• It is a centralized repository where distributed applications can put data and get data out of it.
• It is used to keep the distributed system functioning together as a single unit, using its synchronization, serialization, and coordination goals.
• It is a Hadoop admin tool used for managing jobs in the cluster.
SLIDE 16
DESIGN GOALS OF ZOOKEEPER

• Must be able to tolerate failures
• Must be able to recover from correlated recoverable failures (power outages)
• Must be correct
• Must be easy to implement correctly
• Must be fast (high throughput, low latency)

SLIDE 17

WHY APACHE ZOOKEEPER?

In the good old past, each application was a single program running on a single computer with a single CPU. Today, things have changed. In the Big Data world, applications are made up of many independent programs running on an ever-changing set of computers. These applications are known as distributed applications. A distributed application can run on multiple systems in a network simultaneously, coordinating among themselves to complete a particular task in a fast and efficient manner.

SLIDE 18

Coordinating the actions of the independent programs in a distributed system is far more difficult than writing a single program to run on a single computer. It is easy for developers to get mired in coordination logic and lack the time to write their application logic properly, or perhaps the converse: to spend little time on the coordination logic and simply write a quick-and-dirty master coordinator that is fragile and becomes an unreliable single point of failure. Zookeeper is an important part of Hadoop that takes care of these small but important issues so that developers can focus more on the functionality of the application.

SLIDE 19

WHAT DOES A ZOOKEEPER DO?

NAME SERVICE: Zoo-Keeper exposes a simple interface for a naming service, which identifies the nodes in a cluster by name, similar to DNS.

LOCKING: Zoo-Keeper provides an easy way for you to implement distributed mutexes to allow serialized access to a shared resource in your distributed system.

SLIDE 20
• CONFIGURATION MANAGEMENT: You can use Zoo-Keeper to centrally store and manage the configuration of your distributed system. This means that any new nodes joining will pick up the up-to-date centralized configuration from Zoo-Keeper as soon as they join the system. This also allows you to centrally change the state of your distributed system by changing the centralized configuration through one of the Zoo-Keeper clients.

• LEADER ELECTION: Zoo-Keeper provides off-the-shelf support for leader election, which deals with the problem of nodes going down.

• SYNCHRONIZATION: Hand in hand with distributed mutexes is the need for synchronizing access to shared resources. Whether implementing a producer-consumer queue or a barrier, Zoo-Keeper provides a simple interface to implement that.

SLIDE 21
WORLD WITHOUT ZOOKEEPER

• Previously, distributed systems have implemented components like distributed lock managers or have used distributed databases for coordination.
• While it's possible to design and implement all of these services from scratch, it's extra work and difficult to debug any problems, race conditions, or deadlocks.
• There was a need that people shouldn't go around writing their own name services or leader election services from scratch every time they need them.

SLIDE 22

MOTIVATION BEHIND ZOOKEEPER

Moreover, you could hack together a very simple group membership service relatively easily, but it would require much more work to write it to provide reliability, replication, and scalability. This led to the development and open sourcing of Apache Zoo-Keeper, an out-of-the-box reliable, scalable, and high-performance coordination service for distributed systems.

SLIDE 23
• Zoo-Keeper, in fact, borrows a number of concepts from these prior systems. It does not expose a lock interface or a general-purpose interface for storing data, however. The design of Zoo-Keeper is specialized and very focused on coordination tasks.
• It is certainly possible to build distributed systems without using Zoo-Keeper.
• Zoo-Keeper, however, offers developers the possibility of focusing more on application logic rather than on arcane distributed systems concepts.
• Programming distributed systems without Zoo-Keeper is possible, but more difficult.
SLIDE 24

HISTORY

The Origin of the Name “Zoo-Keeper”

Zoo-Keeper was developed at Yahoo! Research. Yahoo had been working on Zoo-Keeper for a while and pitching it to other groups. At the time, the Zoo-Keeper group had been working with the Hadoop team and had started a variety of projects with the names of animals, Apache Pig being the most well known. As the group started talking about different possible names, one of the group members mentioned that they should avoid another animal name because it started to sound like a zoo. That is when it clicked: distributed systems are a zoo. They are chaotic and hard to manage, and Zoo-Keeper is meant to keep them under control.

SLIDE 25

FRAMEWORK OF ZOOKEEPER

Zoo-Keeper, while being a coordination service for distributed systems, is a distributed application on its own. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

SLIDE 26
• It follows a simple client-server model where clients are nodes (i.e., machines) that make use of the service, and servers are nodes that provide the service.
• Applications make calls to Zoo-Keeper through a client library. The client library is responsible for the interaction with Zoo-Keeper servers. Each client imports the client library, and then can communicate with any Zoo-Keeper node.
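As a minimal illustration of using the Java client library, the sketch below opens a session and prints its id. It is an assumption-laden example, not part of the original deck: the server address localhost:2181 and the timeout value are placeholders.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkConnect {
        public static void main(String[] args) throws Exception {
            // Open a session; the watcher receives connection state changes.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, new Watcher() {
                public void process(WatchedEvent event) {
                    System.out.println("Event: " + event.getState());
                }
            });
            // The handle can now be used for create/getData/exists/... calls.
            System.out.println("Session id: 0x" + Long.toHexString(zk.getSessionId()));
            zk.close();
        }
    }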

SLIDE 27
• Zoo-Keeper servers run in two modes: standalone and quorum.
• Standalone mode is pretty much what the term says: there is a single server, and Zoo-Keeper state is not replicated.
• In quorum mode, a group of Zoo-Keeper servers, which we call a Zoo-Keeper ensemble, replicates the state, and together they serve client requests.

SLIDE 28
• At any given time, one Zoo-Keeper client is connected to one Zoo-Keeper server. Each Zoo-Keeper server can handle a large number of client connections at the same time. Each client periodically sends pings to the Zoo-Keeper server it is connected to, to let it know that it is alive and connected. The Zoo-Keeper server responds with an acknowledgment of the ping, indicating the server is alive as well.
• When the client doesn't receive an acknowledgment from the server within the specified time, the client connects to another server in the ensemble, and the client session is transparently transferred over to the new Zoo-Keeper server.

SLIDE 29
• A coordination service could be thought of as containing a list of primitives, exposing calls to create instances of each primitive and to manipulate these instances directly. But Zoo-Keeper does not expose primitives directly.
• Instead, Zoo-Keeper has a file system-like data model composed of znodes. Think of znodes (Zoo-Keeper data nodes) as files in a traditional UNIX-like system, except that they can have child nodes.

SLIDE 30

Another way to look at them is as directories that can have data associated with themselves. Each of these directories is called a znode.

SLIDE 31
DATA MODEL AND THE HIERARCHICAL NAMESPACE OF ZOOKEEPER

• A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's namespace is identified by a path. Unlike in standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.
• ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.

SLIDE 32
• The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.
• Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.

SLIDE 33
• The znode hierarchy is stored in memory within each of the ZooKeeper servers. This allows for scalable and quick responses to reads from the clients.
• Each ZooKeeper server also maintains a transaction log on the disk, which logs all write requests. This transaction log is also the most performance-critical part of ZooKeeper, because a ZooKeeper server must sync transactions to disk before it returns a successful response.
• The default maximum size of data that can be stored in a znode is 1 MB.
• Consequently, even though ZooKeeper presents a file system-like hierarchy, it shouldn't be used as a general-purpose file system. Instead, it should only be used as a storage mechanism for the small amount of data required for providing reliability, availability, and coordination to your distributed application.

SLIDE 34

The absence of data often conveys important information about a znode.

• The /workers znode is the parent znode of all znodes representing a worker available in the system. If a worker becomes unavailable, its znode should be removed from /workers.
• The /tasks znode is the parent of all tasks created and waiting for workers to execute them. Clients of the master-worker application add new znodes as children of /tasks to represent new tasks and wait for znodes representing the status of the task.
• The /assign znode is the parent of all znodes representing an assignment of a task to a worker. When a master assigns a task to a worker, it adds a child znode to /assign.
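An illustrative set of paths under this scheme (the worker and task names are hypothetical): /workers/worker-1 is an ephemeral znode that disappears when that worker goes away; /tasks/task-0000000001 is created by a client to represent a new task; and /assign/worker-1/task-0000000001 is added by the master when it assigns that task to that worker.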

SLIDE 35

DIFFERENT MODES FOR ZNODES

When creating a new znode, you also need to specify a mode. The different modes determine how the znode behaves:

• Persistent and ephemeral znodes
• Sequential znodes

So we have the following four modes of znodes:

• persistent
• ephemeral
• persistent_sequential
• ephemeral_sequential

SLIDE 36
PERSISTENT AND EPHEMERAL ZNODES

• A persistent znode /path can be deleted only through a call to delete. It is useful when znodes store some data on behalf of an application and this data needs to be preserved even after its creator is no longer part of the system.
• An ephemeral znode, in contrast, is deleted if the client that created it crashes or simply closes its connection to ZooKeeper. It contains information about some aspect of the application that must exist only while the session of its creator is valid.

SLIDE 37

SEQUENTIAL ZNODES

• A sequential znode is assigned a unique, monotonically increasing integer. This sequence number is appended to the path used to create the znode.
• Sequential znodes provide an easy way to create znodes with unique names.
• They also provide a way to easily see the creation order of znodes.
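A minimal Java sketch of creating znodes in these modes is given below; it is not from the deck, and the paths and data values are hypothetical. A connected ZooKeeper handle zk and the parent znodes /workers and /tasks are assumed to exist.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeModes {
        static void createExamples(ZooKeeper zk) throws Exception {
            // Persistent: survives until explicitly deleted.
            zk.create("/config", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            // Ephemeral: removed automatically when this session ends.
            zk.create("/workers/worker-1", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            // Sequential: the server appends a monotonically increasing counter,
            // producing a path such as /tasks/task-0000000001.
            String path = zk.create("/tasks/task-", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            System.out.println("Created " + path);
        }
    }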
SLIDE 38

VERSIONS

• A version is a number associated with a znode, which is incremented every time its data changes.
• An operation can be performed only if the version passed by the client matches the current version on the server.
• This is important when multiple ZooKeeper clients might be trying to perform operations over the same znode.
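A hedged Java sketch of such a version-conditioned update follows (the znode path /config is hypothetical and a connected handle zk is assumed):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class VersionedUpdate {
        static void update(ZooKeeper zk) throws Exception {
            Stat stat = new Stat();
            byte[] data = zk.getData("/config", false, stat);   // read data and its version
            try {
                // The write succeeds only if nobody changed /config in the meantime.
                zk.setData("/config", "v2".getBytes(), stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
                // Another client updated the znode first; re-read and retry if needed.
                System.out.println("Version conflict, retry");
            }
        }
    }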

SLIDE 39

ZOOKEEPER WATCHES

• Accessing a znode every time a client needs to know its content would be very expensive.
• Watches are a mechanism for the client to get notifications about changes in the ZooKeeper ensemble.
• A watch is a one-shot operation, which means it triggers one notification.
• To receive multiple notifications, the client needs to set a new watch upon receiving each notification.
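A minimal Java sketch of a one-shot watch that re-registers itself is shown below (the path /config is hypothetical; a connected handle zk is assumed):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class DataWatcher implements Watcher {
        private final ZooKeeper zk;

        DataWatcher(ZooKeeper zk) { this.zk = zk; }

        void watchNode() throws Exception {
            Stat stat = new Stat();
            // Passing a Watcher registers a one-shot notification for /config.
            byte[] data = zk.getData("/config", this, stat);
            System.out.println("Current data: " + new String(data));
        }

        @Override
        public void process(WatchedEvent event) {
            System.out.println("Notification: " + event.getType() + " on " + event.getPath());
            try {
                watchNode();   // watches are one-shot, so set a new one
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }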

SLIDE 40

ZOOKEEPER QUORUMS

• In quorum mode, ZooKeeper replicates its data tree across all servers in the ensemble. But if a client had to wait for every server to store its data before continuing, the delays might be unacceptable.
• A quorum is the minimum number of legislators required to be present for a vote. Similarly, in ZooKeeper it is the minimum number of servers that have to be running and available in order for ZooKeeper to work.
• This number is also the minimum number of servers that have to store a client's data before telling the client it is safely stored.

SLIDE 41

HOW TO CHOOSE AN ADEQUATE SIZE FOR THE ZOOKEEPER QUORUM

• It is important to choose an adequate size for the quorum. For a reliable ZooKeeper service, we deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble is up, the service will be available.
• Because ZooKeeper requires a majority, the quorum must consist of more than half of the servers in the ensemble.
• The number of servers in the ensemble is not required to be odd, but an even number actually makes the system more fragile, so it is better to have an odd number of servers.
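For example, with an ensemble of five servers the quorum is three, so the service keeps working with up to two servers down; an ensemble of six servers needs a quorum of four and still tolerates only two failures, which is why odd ensemble sizes are preferred.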

SLIDE 42

ZOOKEEPER SESSIONS

• Before executing any request to a ZooKeeper ensemble, a client must establish a session with the service.
• Sessions are very important for the operation of ZooKeeper. All operations a client submits to ZooKeeper are associated with a session. When a session ends for any reason, the ephemeral nodes created during that session disappear.
• When a client creates a ZooKeeper handle, it establishes a session with the service. The client initially connects to any server in the ensemble, and only to a single server.

SLIDE 43
• It uses a TCP connection to communicate with the server, but the session may be moved to a different server if the client has not heard from its current server for some time.
• Moving a session to a different server is handled transparently by the ZooKeeper client library.
• Sessions offer order guarantees: Requests in a session are executed in FIFO (first in, first out) order. Once a client connects to a server, the session is established and a session id is assigned to the client. The client sends heartbeats to keep the session valid. If the ZooKeeper ensemble does not receive heartbeats from a client for longer than the period agreed at the start of the service, ZooKeeper decides that the client is dead. When this happens, the session is ended, followed by deletion of the ephemeral znodes created during that session.
SLIDE 44

STATES AND THE LIFETIME OF A SESSION

The lifetime of a session is the period between its creation and its end, whether it is closed gracefully or expires because of a timeout. The possible states of a session are: CONNECTING, CONNECTED, CLOSED, and NOT_CONNECTED.

SLIDE 45

A session starts in the NOT_CONNECTED state and transitions to CONNECTING with the initialization of the ZooKeeper client. If the connection to a ZooKeeper server succeeds, the session transitions to CONNECTED. When the client loses its connection to the ZooKeeper server or doesn't hear from the server, it transitions back to CONNECTING and tries to find another ZooKeeper server. If it is able to find another server or to reconnect to the original server, it transitions back to CONNECTED once the server confirms that the session is still valid. Otherwise, it declares the session expired and transitions to CLOSED.

SLIDE 46
• One important parameter to set when creating a session is the session timeout, which is the amount of time the ZooKeeper service allows a session before declaring it expired.
• If the service does not see messages associated with a given session during time t, it declares the session expired.
• On the client side, if it has heard nothing from the server at 1/3 of t, it sends a heartbeat message to the server. At 2/3 of t, the ZooKeeper client starts looking for a different server, and it has another 1/3 of t to find one.
• When trying to connect to a different server, it is important for the ZooKeeper state of this server to be at least as fresh as the last ZooKeeper state the client has.
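As a worked example, with a session timeout of t = 6 seconds, the client sends a heartbeat after about 2 seconds of silence, starts looking for a different server at about 4 seconds, and has roughly 2 seconds left to find one and reconnect before the session expires.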

SLIDE 47

ZOOKEEPER REQUESTS AND TRANSACTIONS

• ZooKeeper servers process read requests (exists, getData, and getChildren) locally. When a server receives, say, a getData request from a client, it reads its state and returns it to the client. Because it serves requests locally, ZooKeeper is pretty fast at serving read-dominated workloads.
• Write requests, in contrast, are forwarded to the leader. The leader executes the request, producing a state update that we call a transaction.
• Whereas the request expresses the operation the way the client originates it, the transaction comprises the steps taken to modify the ZooKeeper state to reflect the execution of the request.

SLIDE 48
• A transaction is treated as a unit, in the sense that all changes it contains must be applied atomically.
• When a ZooKeeper ensemble applies transactions, it makes sure that all changes are applied atomically and there is no interference from other transactions.
• A transaction is also idempotent. That is, we can apply the same transaction twice and we will get the same result.
• When the leader generates a new transaction, it assigns to the transaction an identifier that we call a ZooKeeper transaction ID (zxid). Zxids identify transactions so that they are applied to the state of servers in the order established by the leader. Servers also exchange zxids when electing a new leader, so they can determine which nonfaulty server has received more transactions and can synchronize their states. A zxid is a long (64-bit) integer split into two parts: the epoch and the counter. Each part has 32 bits.
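A small Java sketch of splitting a zxid into its epoch and counter halves is shown below; the zxid value used is hypothetical.

    public class ZxidParts {
        public static void main(String[] args) {
            long zxid = 0x100000003L;                 // hypothetical zxid
            long epoch = zxid >>> 32;                 // high 32 bits: leader epoch
            long counter = zxid & 0xFFFFFFFFL;        // low 32 bits: counter within the epoch
            System.out.println("epoch=" + epoch + " counter=" + counter);   // epoch=1 counter=3
        }
    }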

SLIDE 49

ZAB - ZOOKEEPER ATOMIC BROADCAST PROTOCOL

• When the client sends a write request, a follower forwards it to the leader. The leader executes the request and broadcasts the result of the execution as a state update, in the form of a transaction.
• The server determines that a transaction has been committed by following a protocol called Zab, the ZooKeeper Atomic Broadcast protocol. Assuming that there is an active leader and it has a quorum of followers supporting its leadership, the protocol to commit a transaction is very simple:
• The leader sends a PROPOSAL message, p, to all followers.
• Upon receiving p, a follower responds to the leader with an ACK, informing the leader that it has accepted the proposal.
• Upon receiving acknowledgments from a quorum (the quorum includes the leader itself), the leader sends a message informing the followers to COMMIT it.

SLIDE 50

Before acknowledging a proposal, the follower needs to perform a couple of additional checks. The follower needs to check that the proposal is from the leader it is currently following, and that it is acknowledging proposals and committing transactions in the same order that the leader broadcasts them in.

SLIDE 51

Zab guarantees a couple of important properties:

• If the leader broadcasts T and Tʹ in that order, each server must commit T before committing Tʹ.
• If any server commits transactions T and Tʹ in that order, all other servers must also commit T before Tʹ.

The first property guarantees that transactions are delivered in the same order across servers, whereas the second property guarantees that servers do not skip transactions. Given that the transactions are state updates and each state update depends upon the previous state update, skipping transactions could create inconsistencies. The two-phase commit guarantees the ordering of transactions.

SLIDE 52

ZOOKEEPER SNAPSHOTS

• Snapshots are copies of the ZooKeeper data tree.
• Each server frequently takes a snapshot of the data tree by serializing the whole data tree and writing it to a file. The servers do not need to coordinate to take snapshots, nor do they have to stop processing requests. Because servers keep executing requests while taking a snapshot, the data tree changes as the snapshot is taken. We call such snapshots fuzzy, because they do not necessarily reflect the exact state of the data tree at any particular point in time.
• Let's say that a data tree has only two znodes: /z and /z'. Initially, the data of both /z and /z' is the integer 1. Now consider the following sequence of steps:

SLIDE 53

• Start a snapshot.
• Serialize and write /z = 1 to the snapshot.
• Set the data of /z to 2 (transaction T).
• Set the data of /z' to 2 (transaction Tʹ).
• Serialize and write /z' = 2 to the snapshot.

This snapshot contains /z = 1 and /z' = 2. However, there has never been a point in time in which the values of both znodes were like that. ZooKeeper tags each snapshot with the last transaction that has been committed when the snapshot starts, call it TS. If the server eventually loads the snapshot, it replays all transactions in the transaction log that come after TS.
SLIDE 54

In this case, the replayed transactions are T and Tʹ. After replaying T and Tʹ on top of the snapshot, the server obtains /z = 2 and /z' = 2, which is a valid state. There is no problem with applying Tʹ again because transactions are idempotent. So as long as we apply the same transactions in the same order, we will get the same result even if some of them have already been applied to the snapshot.

SLIDE 55

HOW IS A LEADER ELECTED IN APACHE ZOOKEEPER?

• The leader is a server that has been chosen by an ensemble of servers and that continues to have support from that ensemble.
• The purpose of the leader is to order client requests that change the ZooKeeper state: create, setData, and delete. The leader transforms each request into a transaction and proposes to the followers that the ensemble accept and apply them in the order issued by the leader.
• When a process starts, it enters the ELECTION state. While in this state the process tries to elect a new leader or become a leader. If the process finds an elected leader, it moves to the FOLLOWING state and begins to follow the leader. Processes in the FOLLOWING state are followers.

SLIDE 56

ESTABLISHING A NEW LEADER

Each server starts in the LOOKING state, where it must either elect a new leader or find the existing one. If a leader already exists, other servers inform the new one which server is the leader. At this point, the new server connects to the leader and makes sure that its own state is consistent with the state of the leader. If an ensemble of servers, however, are all in the LOOKING state, they must communicate to elect a leader. They exchange messages to converge on a common choice for the leader. The server that wins this election enters the LEADING state, while the other servers in the ensemble enter the FOLLOWING state. The leader election messages are called leader election notifications, or simply notifications.

SLIDE 57
• The protocol is extremely simple. When a server enters the LOOKING state, it sends a batch of notification messages, one to each of the other servers in the ensemble. The message contains its current vote, which consists of the server's identifier (sid) and the zxid of the most recent transaction it executed. Upon receiving a vote, a server changes its vote according to the following rules:
• Let voteId and voteZxid be the identifier and the zxid in the current vote of the receiver, whereas myZxid and mySid are the values of the receiver itself.
• If (voteZxid > myZxid) or (voteZxid = myZxid and voteId > mySid), keep the current vote.
• Otherwise, change my vote by assigning myZxid to voteZxid and mySid to voteId.

SLIDE 58
• In short, the server that is most up to date wins, because it has the most recent zxid. This simplifies the process of restarting a quorum when a leader dies. If multiple servers have the most recent zxid, the one with the highest sid wins.
• Once a server receives the same vote from a quorum of servers, the server declares the leader elected. If the elected leader is the server itself, it starts executing the leader role. Otherwise, it becomes a follower and tries to connect to the elected leader.
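This tie-breaking rule can be captured in a small Java sketch (the method and parameter names below are hypothetical, not ZooKeeper's own identifiers):

    public class VoteComparison {
        // Returns true if vote (zxid1, sid1) beats vote (zxid2, sid2):
        // the more recent zxid wins, and ties are broken by the higher sid.
        static boolean beats(long zxid1, long sid1, long zxid2, long sid2) {
            return zxid1 > zxid2 || (zxid1 == zxid2 && sid1 > sid2);
        }

        public static void main(String[] args) {
            System.out.println(beats(0x12, 1, 0x10, 2));  // true: newer zxid wins
            System.out.println(beats(0x12, 1, 0x12, 3));  // false: same zxid, lower sid loses
        }
    }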

SLIDE 59

LEADING

• A leader proposes operations by queuing them to all connected followers.
• If a connection to a given follower closes, then the proposals queued to the connection are discarded and the leader considers the corresponding follower down.
• To mutually detect crashes in a fine-grained and convenient manner, avoiding operating system reconfiguration, leader and followers exchange periodic heartbeats.
• If the leader does not receive heartbeats from a quorum of followers within a timeout interval, the leader renounces leadership of the epoch, transitions to the ELECTION state, and the whole process starts all over again.

SLIDE 60

LEADER PROTOCOL IN A NUTSHELL

• At startup, wait for a quorum of followers to connect
• Sync with a quorum of followers:
  • Tell the follower to delete any transaction that the leader doesn't have
  • Send any transactions that the follower doesn't have
• Continually:
  • Assign a zxid to any message to be proposed and broadcast proposals to followers
  • When a quorum has acked a proposal, broadcast a commit
SLIDE 61

FOLLOWING

• When a follower emerges from leader election, it connects to the leader.
• To support a leader, a follower acknowledges its new epoch proposal, and it only does so if the new epoch proposed is later than its own.
• A follower only follows one leader at a time and stays connected to a leader as long as it receives heartbeats within a timeout interval.
• If there is an interval with no heartbeat or the TCP connection closes, the follower abandons the leader, transitions to ELECTION, and proceeds to the election phase to start all over again.

SLIDE 62

FOLLOWER PROTOCOL IN A NUTSHELL

Liveness: To sustain leadership, a leader process needs to be able to send messages to and receive messages from followers. In fact, a leader process requires that a quorum of followers be up and select it as their leader to maintain its leadership.

• Connect to a leader
• Delete any transactions in the transaction log that the leader says to delete
• Continually:
  • Log to the transaction log and send an ack to the leader
  • Deliver any committed transactions

SLIDE 63

PROJECTS THAT USE ZOOKEEPER

• Apache BookKeeper: BookKeeper (a ZooKeeper subproject) is a replicated service to reliably log streams of records.
• Apache Hadoop MapReduce: The next generation of Hadoop MapReduce (called "YARN") uses ZooKeeper.
• Apache HBase: HBase is the Hadoop database. It is an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable. HBase uses ZooKeeper for master election, server lease management, bootstrapping, and coordination between servers.
• Apache Kafka: Kafka is a distributed publish/subscribe messaging system. Kafka queue consumers use ZooKeeper to store information on what has been consumed from the queue.
• Apache Storm: Storm uses ZooKeeper to store all state so that it can recover from an outage in any of its (distributed) component services.

SLIDE 64

TUTORIAL

SLIDE 65

INSTALLATION

• Download and install the JDK, if not already installed. This is required because Zookeeper runs on the JVM.
• Download the Zookeeper 3.4.5 tar.gz tarball from the path below and un-tar it to an appropriate location.

Path: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/

SLIDE 66

SETUP

Path we used to un-tar the tar file: D:\ZookeeperTutorial\zookeeper-3.4.5

• Create a directory for storing some state associated with the Zookeeper server (in this tutorial, D:\ZookeeperTutorial\zookeeper-3.4.5\data).

SLIDE 67

SETUP

• Setting up the configuration file: create a zoo.cfg file inside the conf folder with the contents below.
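The original slide shows the file contents in a screenshot. A typical minimal configuration for this setup would look roughly like the following (the values are assumptions matching the paths used in this tutorial):

    # conf/zoo.cfg (assumed contents)
    tickTime=2000
    initLimit=5
    syncLimit=2
    dataDir=D:/ZookeeperTutorial/zookeeper-3.4.5/data
    clientPort=2181
    server.1=localhost:2888:3888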

SLIDE 68

SETUP

• Add the following environment variables:
  1. ZOOKEEPER_HOME = D:\ZookeeperTutorial\zookeeper-3.4.5
  2. Path = D:\ZookeeperTutorial\zookeeper-3.4.5\bin
• Create a myid file in D:\ZookeeperTutorial\zookeeper-3.4.5\data
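The myid file holds only the numeric id of this server; assuming the server.1 entry from the configuration sketch above, the file would contain the single line 1.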

SLIDE 69

STARTING THE ZOOKEEPER SERVER

• Start the Zookeeper servers.
• Start a CLI client from one of the machines on which you are running the Zookeeper server.
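On Windows, with the bin directory on the Path as set above, this is typically done with the scripts bundled in the standard 3.4.5 distribution (the commands below are assumptions, since the original slide shows a screenshot):

    zkServer.cmd
    zkCli.cmd -server 127.0.0.1:2181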

SLIDE 70

CREATING A ZNODE

• Create a znode at /mynode.
• Verify and retrieve the data at /mynode.
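In the CLI this might look like the following (the data value "my_data" is a placeholder):

    create /mynode "my_data"
    get /mynode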

SLIDE 71

• Remove the znode.
• Create another znode at /mysecondnode.
• Verify and retrieve the data at /mysecondnode. This time an optional parameter 1 is supplied, which sets a watch on the znode.
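A plausible CLI transcript for these steps (data values are placeholders; the trailing 1 sets a one-shot watch in the 3.4 CLI):

    delete /mynode
    create /mysecondnode "more_data"
    get /mysecondnode 1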

SLIDE 72

• Change the value of the data associated with /mysecondnode.
• Get a watch notification.
• Create a subnode.
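In the CLI this could look like the following (the data values and the subnode name are placeholders; the set command triggers the NodeDataChanged notification for the watch registered by the earlier get):

    set /mysecondnode "new_data"
    create /mysecondnode/subnode "child_data"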

SLIDE 73

THANK YOU