CO-ORDINATION WITH ZOOKEEPER
PRESENTED BY: 1. PRATAP CHANDRA DAS 2. SHORAJ TOMER 3. SOUGATA BHATTACHARYA
CONTENT
Distributed Computing: A Brief Introduction
Problems of Manageability in Distributed Computing
Solution: Apache ZooKeeper
DISTRIBUTED COMPUTING: INTRODUCTION
A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The word 'distributed' means that data is spread out over more than one computer in a network.
SINGLE MACHINE
Simple architecture.
Takes hours to complete a huge, complex task.
All processing stops if the system crashes.

DISTRIBUTED APPLICATION
Complex architecture.
Huge tasks can be completed within minutes.
If one system crashes, the other systems keep running and can take over its work.
[Diagram: client software and server software spread across the nodes of a cluster.]

Cluster: a group of systems.
Node: each system in a cluster.
ADVANTAGES OF DISTRIBUTED COMPUTING
Scalability: the system can easily be expanded by adding more machines as needed.
Redundancy: several machines can provide the same services, so if one is unavailable, work does not stop.

DISADVANTAGES OF DISTRIBUTED COMPUTING
Complexity: more complex than centralized systems; development, maintenance, and the coordination of autonomous actions all take more effort.
Network reliance: messages can be lost in the communication network.
Security: more susceptible to external attacks.
Multiple points of failure: much more prone to error due to the huge number of machines.
Manageability: more effort is required for system management.
PROBLEMS OF COORDINATION
Race condition: two or more operations are performed at the same time.
Deadlock: two or more machines try to access the same shared resources at the same time.
Partial failure of a process: leads to inconsistency of data.

HOW ZOOKEEPER ADDRESSES THEM
Race condition: handled by ZooKeeper's serialization property.
Deadlock: handled by ZooKeeper's synchronization property.
Partial failure of a process: handled through atomicity.
APACHE ZOOKEEPER
Apache ZooKeeper is an open-source coordination service for distributed applications. It is also called the 'King of Coordination' for distributed applications.
FORMAL DEFINITION: ZooKeeper is a distributed, open-source configuration and synchronization service, along with a naming registry, for distributed applications. It is used to manage and coordinate large clusters of machines.
Applications store small amounts of coordination data in ZooKeeper and read that data out of it to achieve their synchronization, serialization, and coordination goals.
In the good old past, each application was a single program running on a single computer with a single CPU. Today, things have changed: in the Big Data world, applications are made up of many independent programs running on an ever-changing set of computers. Such applications are known as distributed applications. A distributed application can run on multiple systems in a network simultaneously, with the systems coordinating among themselves to complete a particular task in a fast and efficient manner.
Coordinating the actions of the independent programs in a distributed system is far more difficult than writing a single program to run on a single computer. It is easy for developers to get mired in coordination logic and lack the time to write their application logic properly, or, conversely, to spend little time on the coordination logic and simply write a quick-and-dirty master coordinator that is fragile and becomes an unreliable single point of failure. ZooKeeper is an important part of Hadoop that takes care of these small but important issues so that developers can focus on the functionality of the application.
WHAT ZOOKEEPER PROVIDES
NAME SERVICE: ZooKeeper exposes a simple interface for a naming service, which identifies the nodes in a cluster by name, similar to DNS.
LOCKING: ZooKeeper provides an easy way for you to implement distributed mutexes that allow serialized access to a shared resource in your distributed system (see the sketch after this list).
CONFIGURATION MANAGEMENT: You can use ZooKeeper to centrally store and manage the configuration of your distributed system. Any new nodes joining will pick up the up-to-date centralized configuration from ZooKeeper as soon as they join the system. This also allows you to centrally change the state of your distributed system by changing the centralized configuration through one of the ZooKeeper clients.
LEADER ELECTION: ZooKeeper provides off-the-shelf support for leader election, which deals with the problem of nodes going down.
SYNCHRONIZATION: Hand in hand with distributed mutexes is the need for synchronizing access to shared resources. Whether implementing a producer-consumer queue or a barrier, ZooKeeper provides a simple interface for that.
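As a concrete illustration of the LOCKING recipe above, here is a minimal sketch against the ZooKeeper Java client API. It is our own example, not code from the slides: it assumes an already-open ZooKeeper session and a pre-created persistent /lock parent znode, and the guid- child prefix is arbitrary. Each contender creates an ephemeral sequential znode; the contender with the lowest sequence number owns the lock, and every other contender watches the znode immediately ahead of its own.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class LockSketch {
        // Returns the path of our lock znode once the lock is held.
        static String acquire(ZooKeeper zk) throws Exception {
            // Ephemeral: the lock is released automatically if we crash.
            // Sequential: the server appends a unique, increasing counter.
            String me = zk.create("/lock/guid-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            while (true) {
                List<String> children = zk.getChildren("/lock", false);
                Collections.sort(children);
                if (me.equals("/lock/" + children.get(0))) {
                    return me;                  // lowest number: we hold the lock
                }
                // Wait for the contender just ahead of us to go away.
                int idx = children.indexOf(me.substring("/lock/".length()));
                String predecessor = "/lock/" + children.get(idx - 1);
                final CountDownLatch gone = new CountDownLatch(1);
                Watcher watcher = new Watcher() {
                    public void process(WatchedEvent e) { gone.countDown(); }
                };
                if (zk.exists(predecessor, watcher) != null) {
                    gone.await();               // woken when the predecessor is deleted
                }
            }
        }

        static void release(ZooKeeper zk, String me) throws Exception {
            zk.delete(me, -1);                  // version -1 matches any version
        }
    }

Watching only the immediate predecessor, rather than the whole /lock directory, avoids a herd effect in which every waiting client wakes up on every release.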
Before ZooKeeper, developers implemented their own distributed lock managers or used distributed databases for coordination. Writing such coordination logic from scratch is extra work, and it is difficult to debug the resulting problems, race conditions, or deadlocks. Developers should not have to build naming services or leader election services from scratch every time they need them. Moreover, you could hack together a very simple group membership service relatively easily, but it would require much more work to write it to provide reliability, replication, and scalability. This led to the development and open-sourcing of Apache ZooKeeper, an out-of-the-box reliable, scalable, and high-performance coordination service for distributed systems.
ZooKeeper does not expose a lock interface or a general-purpose interface for storing data, however. Its design is specialized and very focused on coordination tasks, so that developers can spend their time on application logic rather than on arcane distributed systems concepts.
The Origin of the Name “Zoo-Keeper”
Zoo-Keeper was developed at Yahoo! Research. Yahoo had been working on Zoo- Keeper for a while and pitching it to other groups. At the time the Zoo-Keeper group had been working with the Hadoop team and had started a variety of projects with the names of animals, Apache Pig being the most well known. As the group started talking about different possible names, one of the group members mentioned that they should avoid another animal name because it started to sound like a zoo. That is when it clicked: distributed systems are a zoo. They are chaotic and hard to manage, and Zoo-Keeper is meant to keep them under control.
It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.
Zoo-Keeper, while being a coordination service for distributed systems, is a distributed application on its own.
Clients are the nodes (machines) that make use of the service, and servers are the nodes that provide the service. The ZooKeeper client library is responsible for the interaction with ZooKeeper servers: each client imports the client library and can then communicate with any ZooKeeper node.
STANDALONE AND QUORUM MODES
Standalone mode is pretty much what the term says: there is a single server, and ZooKeeper state is not replicated. In quorum mode, a group of ZooKeeper servers, which we call a ZooKeeper ensemble, replicates the state, and together they serve client requests.
A client connects to exactly one ZooKeeper server, while each ZooKeeper server can handle a large number of client connections at the same time. Each client periodically sends pings to the ZooKeeper server it is connected to, to let it know that it is alive and connected. The ZooKeeper server responds with an acknowledgment of the ping, indicating that the server is alive as well. If the client does not receive an acknowledgment within the specified time, the client connects to another server in the ensemble, and the client session is transparently transferred over to the new ZooKeeper server.
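A minimal sketch of opening such a session with the Java client library (our example; the host names and the 15-second timeout are illustrative values, not anything the slides prescribe). The watcher passed to the constructor is used here only to learn when the session reaches the CONNECTED state:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConnectDemo {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            // The connect string lists servers of the ensemble; the client
            // picks one and fails over transparently if it becomes unreachable.
            ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 15000,
                    new Watcher() {
                        public void process(WatchedEvent event) {
                            if (event.getState() == Event.KeeperState.SyncConnected) {
                                connected.countDown();   // session established
                            }
                        }
                    });
            connected.await();
            System.out.println("session id: 0x" + Long.toHexString(zk.getSessionId()));
            zk.close();   // ends the session; its ephemeral znodes disappear
        }
    }

Because failover between servers is automatic, applications normally keep one long-lived ZooKeeper handle rather than reconnecting for every request.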
A coordination service could expose calls to create instances of each primitive (locks, queues, and so on) and manipulate those instances directly. ZooKeeper, however, does not expose primitives directly. Instead, it exposes a tree of data nodes that are like files in a traditional UNIX-like system, except that they can have child nodes. Another way to look at them is as directories that can have data associated with themselves. Each of these nodes is called a znode.
Every node in ZooKeeper's namespace is identified by a path. Unlike in standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte-to-kilobyte range. Reads and writes are atomic: a read returns all the data associated with a znode, and a write replaces all of it. Each node has an Access Control List (ACL) that restricts who can do what. Each znode also maintains a stat structure with version numbers for data changes and ACL changes, plus timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data, it also receives the version of the data.
ZooKeeper keeps its entire data tree in memory, which allows for scalable and quick responses to reads from the clients. Durability comes from a transaction log on disk to which every server appends write requests. This transaction log is also the most performance-critical part of ZooKeeper, because a ZooKeeper server must sync transactions to disk before it returns a successful response. Because the data tree lives in memory, ZooKeeper shouldn't be used as a general-purpose file system. Instead, it should only be used as a storage mechanism for the small amount of data required for providing reliability, availability, and coordination to your distributed application.
The absence of data often conveys important information about a znode. In a master-worker configuration, for example, a /workers znode has one child znode for each worker available in the system. If a worker becomes unavailable, its znode should be removed from /workers. Under /tasks, clients create child znodes to represent new tasks and wait for znodes representing the status of the task. Under /assign, there is one child znode per assignment of a task to a worker. When a master assigns a task to a worker, it adds a child znode to /assign.
When creating a new znode, you also need to specify a mode. The different modes determine how the znode behaves.
PERSISTENT AND EPHEMERAL ZNODES
We have the following four modes of znodes (see the sketch after this list):
Persistent: deleted only through an explicit delete call. A persistent znode is useful when it stores data on behalf of an application and this data needs to be preserved even after its creator is no longer part of the system.
Ephemeral: deleted automatically when the client that created it crashes or simply closes its connection to ZooKeeper. Ephemeral znodes contain information about some aspect of the application that must exist only while the session of its creator is valid.
Persistent sequential: a unique, monotonically increasing sequence number is appended to the path used to create the znode, providing an easy way to create znodes with unique names.
Ephemeral sequential: combines both behaviors; the znode gets a unique sequence-numbered name and disappears when its creator's session ends.
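A minimal sketch of the four modes with the Java API (our example: the paths /app-config, /workers, /tasks, and /lock are illustrative, and their parent znodes are assumed to exist; OPEN_ACL_UNSAFE simply means no access restrictions):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ModeDemo {
        static void demo(ZooKeeper zk) throws Exception {
            byte[] data = "v1".getBytes();

            // Persistent: survives the creator's session; removed only by delete().
            zk.create("/app-config", data,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Ephemeral: removed automatically when this session ends.
            zk.create("/workers/worker-1", data,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            // Persistent sequential: the server appends a counter, e.g.
            // /tasks/task-0000000007, and returns the actual path created.
            String task = zk.create("/tasks/task-", data,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            System.out.println("created " + task);

            // Ephemeral sequential: unique name AND tied to the session;
            // this combination underlies the lock sketch shown earlier.
            zk.create("/lock/guid-", data,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }
    }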
VERSIONS, WATCHES, AND NOTIFICATIONS
Each znode carries a version number that increases every time its data changes. Update operations such as setData and delete take a version as a parameter, and the operation succeeds only if the version passed by the client matches the current version on the server. Versions prevent conflicting updates when multiple clients perform operations over the same znode.

Clients could poll ZooKeeper to learn of changes, but polling can be very expensive. Instead, a client registers a watch and receives a one-time notification upon a change in the ZooKeeper ensemble. Watches are one-shot operations: to keep receiving notifications, the client must set a new watch after receiving each notification. And if a client had to poll and wait before continuing, the delays might be unacceptable. A sketch of both mechanisms follows.
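A minimal sketch combining both mechanisms in Java (our example; the /app-config path is illustrative). The one-shot watch is registered by the same getData call that reads the data, and the update is conditional on the version just read:

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class WatchDemo {
        static void readAndUpdate(ZooKeeper zk) throws Exception {
            Stat stat = new Stat();
            // Read the data and register a one-shot watch in one call.
            byte[] data = zk.getData("/app-config", new Watcher() {
                public void process(WatchedEvent event) {
                    // One-shot: to keep being notified, we would have to
                    // re-register by calling getData()/exists() again here.
                    System.out.println("znode changed: " + event.getPath());
                }
            }, stat);
            System.out.println("read " + data.length + " bytes, version " + stat.getVersion());

            try {
                // Conditional update: succeeds only if nobody has changed
                // the znode since we read version stat.getVersion().
                zk.setData("/app-config", "v2".getBytes(), stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
                System.out.println("lost the race: someone updated first");
            }
        }
    }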
THE ZOOKEEPER ENSEMBLE
A majority of the servers must be running and available in order for ZooKeeper to work, and a quorum of servers must have stored a write before ZooKeeper tells the client the data is safely stored. To achieve high availability of the service, we deploy ZooKeeper in a cluster known as an ensemble; as long as a majority of the ensemble is up, the service will be available. It is recommended to run an odd number of servers in the ensemble: adding a server to make the count even does not raise the number of tolerable failures, but it does add one more machine that can fail, which actually makes the system more fragile. For example, five servers tolerate two failures (majority of three), while six servers still tolerate only two failures (majority of four). So it is better to have an odd number of servers.
SESSIONS
Before executing any request against a ZooKeeper ensemble, a client must establish a session with the service, and every operation the client submits is associated with that session. When a session ends for any reason, the ephemeral znodes created during that session disappear. The client initially connects to any server in the ensemble, and only to that single server. The session may be moved to a different server if the client has not heard from its current server for some time; this move is handled transparently by the ZooKeeper client library.
Sessions offer order guarantees: requests in a session are executed in FIFO (first in, first out) order. Once a client connects to a server, the session is established and a session id is assigned to the client. The client sends heartbeats to keep the session valid. If the ZooKeeper ensemble does not receive heartbeats from a client for more than the period agreed upon at the creation of the session, ZooKeeper decides that the client is dead. When this happens, the session is ended, followed by the deletion of all ephemeral znodes created during that session.
The lifetime of a session is the period between its creation and its end, whether it is closed gracefully or expires because of a timeout. The possible states of a session are: CONNECTING, CONNECTED, CLOSED, and NOT_CONNECTED.
A session starts in the NOT_CONNECTED state and transitions to CONNECTING with the initialization of the ZooKeeper client. When the connection to a ZooKeeper server succeeds, the session transitions to CONNECTED. When the client loses its connection to the ZooKeeper server or doesn't hear from the server, it transitions back to CONNECTING and tries to find another ZooKeeper server. If it is able to find another server or to reconnect to the original server, it transitions back to CONNECTED once the server confirms that the session is still valid. Otherwise, it declares the session expired and transitions to CLOSED.
When creating a session, the client sets a session timeout t, which is the amount of time the ZooKeeper service allows a session before declaring it expired. If the service does not see messages associated with a given session during a period of t, it declares the session expired. On the client side, if the client has heard nothing from the server for 1/3 of t, it sends a heartbeat message to the server. At 2/3 of t, the ZooKeeper client starts looking for a different server, and it has the remaining 1/3 of t to find one. For example, with t = 6 seconds, the client heartbeats after 2 silent seconds, starts searching for a new server after 4, and has 2 seconds left to find one.
When reconnecting, the client can only move to a server that has caught up: it requires the ZooKeeper state of this server to be at least as fresh as the last ZooKeeper state the client has observed.
READS, WRITES, AND TRANSACTIONS
ZooKeeper servers process read requests (exists, getData, and getChildren) locally. When a server receives, say, a getData request from a client, it reads its state and returns it to the client. Because it serves read requests locally, ZooKeeper is pretty fast at serving read-dominated workloads.

Requests that change the state of ZooKeeper (create, delete, setData) are forwarded to the leader, which transforms each one into a transaction. A transaction describes precisely the modifications to apply to the ZooKeeper state to reflect the execution of the request. Transactions are applied atomically: each transaction is applied in full, and there is no interference from other transactions. Transactions are also idempotent: we can apply the same transaction twice and we will get the same result.
Each transaction is assigned an identifier that we call a ZooKeeper transaction ID (zxid). Zxids identify transactions so that they are applied to the state of servers in the order established by the leader. Servers also exchange zxids when electing a new leader, so they can determine which nonfaulty server has received more transactions and can synchronize their states. A zxid is a long (64-bit) integer split into two parts: the epoch and the counter. Each part has 32 bits, as the sketch below illustrates.
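A small illustration of that layout (the zxid value here is made up): the epoch occupies the high 32 bits and the counter the low 32 bits.

    public class ZxidDemo {
        public static void main(String[] args) {
            long zxid = 0x500000003L;            // epoch 5, counter 3 (illustrative)
            long epoch = zxid >>> 32;            // high 32 bits
            long counter = zxid & 0xFFFFFFFFL;   // low 32 bits
            System.out.printf("zxid=0x%x epoch=%d counter=%d%n", zxid, epoch, counter);
        }
    }

A new leader starts a new epoch and resets the counter, so transactions from a later leader always compare greater than those from an earlier one.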
Upon receiving a write request, a follower forwards it to the leader. The leader executes the request and broadcasts the result of the execution as a state update, in the form of a transaction. The broadcast uses a protocol called Zab, the ZooKeeper Atomic Broadcast protocol. Assuming that there is an active leader and it has a quorum of followers supporting its leadership, the protocol to commit a transaction is very simple, resembling a two-phase commit:
1. The leader sends a PROPOSAL message containing the transaction to all followers.
2. Upon receiving the proposal, a follower responds to the leader with an ACK, informing the leader that it has accepted the proposal.
3. Upon receiving acknowledgments from a quorum (the quorum includes the leader itself), the leader sends a message informing the followers to COMMIT it.
Before acknowledging a proposal, the follower needs to perform a couple of additional checks. The follower needs to check that the proposal is from the leader it is currently following, and that it is acknowledging proposals and committing transactions in the same order that the leader broadcasts them in.
The first property guarantees that transactions are delivered in the same order across servers, whereas the second property guarantees that servers do not skip transactions. Given that the transactions are state updates and each state update depends upon the previous state update, skipping transactions could create inconsistencies. The two phase commit guarantees the ordering of transactions.
ZAB GUARANTEES A COUPLE OF IMPORTANT PROPERTIES:
If the leader broadcasts T and Tʹ in that order, each server must commit T before committing Tʹ.
If any server commits transactions T and Tʹ in that order, all other servers must also commit T before Tʹ.
FUZZY SNAPSHOTS
Each server takes periodic snapshots of its state by serializing the whole data tree and writing it to a file. The servers do not need to coordinate to take snapshots, nor do they have to stop processing requests. Because servers keep executing requests while taking a snapshot, the data tree changes as the snapshot is taken; such snapshots are called fuzzy, because they do not necessarily reflect the exact state of the data tree at any particular point in time.

For example, say that a data tree has only two znodes, /z and /z', and the value of both /z and /z' is the integer 1. Now consider the following sequence of steps:
1. Start a snapshot.
2. Serialize and write /z = 1 to the snapshot.
3. Set the data of /z to 2 (transaction T).
4. Set the data of /z' to 2 (transaction Tʹ).
5. Serialize and write /z' = 2 to the snapshot.

This snapshot contains /z = 1 and /z' = 2. However, there has never been a point in time at which the values of both znodes were like that. This is not a problem, because each snapshot is tagged with the last transaction that had been committed when the snapshot started; call it TS. If the server eventually loads the snapshot, it replays all transactions in the transaction log that come after TS. In this case, they are T and Tʹ. After replaying T and Tʹ on top of the snapshot, the server obtains /z = 2 and /z' = 2, which is a valid state. There is no problem with applying Tʹ again because transactions are idempotent; as long as we apply the same transactions in the same order, we will get the same result even if some of them have already been applied to the snapshot.
LEADERS, FOLLOWERS, AND ELECTIONS
A leader remains the leader only as long as it continues to have the support of a quorum of the ensemble. The leader is the server that orders updates to the ZooKeeper state: create, setData, and delete. The leader transforms each request into a transaction and proposes to the followers that the ensemble accept and apply the transactions in the order issued by the leader. A process in the LOOKING state tries to elect a new leader or become a leader. If the process finds an elected leader, it moves to the FOLLOWING state and begins to follow the leader. Processes in the FOLLOWING state are followers.
Each server starts in the LOOKING state, where it must either elect a new leader or find the existing one. If a leader already exists, other servers inform the new one which server is the leader. At this point, the new server connects to the leader and makes sure that its own state is consistent with the state of the leader. If an ensemble of servers, however, are all in the LOOKING state, they must communicate to elect a leader. They exchange messages to converge on a common choice for the leader. The server that wins this election enters the LEADING state, while the other servers in the ensemble enter the FOLLOWING state. The leader election messages are called leader election notifications, or simply notifications.
When a server enters the LOOKING state, it sends a batch of notification messages, one to each of the other servers in the ensemble. The message contains its current vote, which consists of the server's identifier (sid) and the zxid of the most recent transaction it executed. Upon receiving a vote, a server changes its vote according to the following rules. Let voteZxid and voteSid be the values in the received vote, whereas myZxid and mySid are the corresponding values of the receiver itself:
1. If voteZxid > myZxid, or voteZxid = myZxid and voteSid > mySid, the receiver adopts the received vote.
2. Otherwise, the receiver keeps its current vote.

In short, the servers elect the one that has seen the most recent zxid, which simplifies the process of restarting a quorum when a leader dies. If multiple servers have the most recent zxid, the one with the highest sid wins. Once a server receives the same vote from a quorum of servers, the server declares the leader elected. If the elected leader is the server itself, it starts executing the leader role; otherwise, it becomes a follower and tries to connect to the elected leader. (The comparison rule is spelled out in the sketch below.)
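The adoption rule reduces to a one-line predicate; here it is as a sketch (our code, not ZooKeeper's actual implementation):

    public class VoteRule {
        // Should the receiver adopt the received vote (voteZxid, voteSid)
        // over its current vote (myZxid, mySid)?
        static boolean adoptReceivedVote(long voteZxid, long voteSid,
                                         long myZxid, long mySid) {
            // Prefer the vote that has seen more transactions; break
            // ties with the larger server identifier.
            return voteZxid > myZxid || (voteZxid == myZxid && voteSid > mySid);
        }

        public static void main(String[] args) {
            // Receiver currently votes (zxid=10, sid=1) and receives a vote
            // (zxid=10, sid=3): equal zxid, higher sid, so it adopts it.
            System.out.println(adoptReceivedVote(10, 3, 10, 1));   // prints true
        }
    }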
Once connected, the leader and its followers exchange periodic heartbeats; messages sent over a broken connection are discarded, and the leader considers the corresponding follower down. If the leader does not receive heartbeats from a quorum of followers within the timeout interval, it renounces leadership of the epoch, transitions to the ELECTION state, and the whole process starts all over again. A newly elected leader proposes a new epoch before it starts sending proposals to followers, and a follower accepts the new leader only if the new epoch proposed is later than its own. Conversely, a follower keeps following its leader as long as it receives heartbeats within the timeout interval; otherwise it abandons the leader, transitions to ELECTION, and proceeds to the election phase to start all over again.
LIVENESS: To sustain leadership, a leader process needs to be able to send messages to and receive messages from its followers. In fact, the leader requires that a quorum of followers be up and recognize it as their leader in order to maintain its leadership.
When a follower connects to a new leader, it synchronizes its state with the leader's: it truncates from its transaction log the transactions that the leader says to delete, applies the transactions the leader sends that it is missing, and then sends an ack to the leader. Only once a quorum of followers has synchronized does the leader begin broadcasting new transactions.
WHO USES ZOOKEEPER?
Hadoop MapReduce: the next generation of Hadoop MapReduce (called YARN) uses ZooKeeper.
HBase: HBase is the Hadoop database, an open-source, distributed, column-oriented store modeled after the Google Bigtable paper. HBase uses ZooKeeper for master election, server lease management, bootstrapping, and coordination between servers.
Kafka: Kafka is a distributed publish/subscribe messaging system. Kafka queue consumers use ZooKeeper to store information on what has been consumed from the queue.
Storm: Storm uses ZooKeeper to store all state so that it can recover from an outage in any of its (distributed) component services.
INSTALLING AND RUNNING ZOOKEEPER
Download and install the JDK, if not already installed. This is required because ZooKeeper runs on the JVM.
Download the ZooKeeper 3.4.5 tar.gz tarball from the path below and un-tar it to an appropriate location.
Path: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/
Path we used to un-tar the tar file: D:\ZookeeperTutorial\zookeeper-3.4.5
Create a directory for storing some state associated with the Zookeeper server.
Setting up the configuration file
Create a zoo.cfg file inside the conf folder with contents along the lines below (ZooKeeper reads conf/zoo.cfg by default).
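The slide's screenshot of the file is not reproduced in this text; a minimal standalone-mode zoo.cfg for 3.4.5 typically looks like the following (the values are illustrative; dataDir should point at the state directory created above, and forward slashes avoid backslash-escaping issues in the properties file):

    # the basic time unit, in milliseconds
    tickTime=2000
    # directory where snapshots and the myid file are stored
    dataDir=D:/ZookeeperTutorial/zookeeper-3.4.5/data
    # the port on which clients connect
    clientPort=2181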
Add the following environment variables (the name ZOOKEEPER_HOME is the conventional choice; only the two paths appear on the slide):
ZOOKEEPER_HOME = D:\ZookeeperTutorial\zookeeper-3.4.5
PATH: append D:\ZookeeperTutorial\zookeeper-3.4.5\bin
Create a myid file in D:\ZookeeperTutorial\zookeeper-3.4.5\data. The file holds nothing but this server's numeric id (for example, 1).
Start the ZooKeeper server, then start a CLI client from one of the machines on which you are running the ZooKeeper server.
Create a znode at /mynode. Verify and retrieve the data at /mynode.
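The command screenshots are not carried over into this text; with the zkCli shell bundled with 3.4.5, the steps above look roughly as follows (the znode name is the tutorial's, the data value is ours):

    bin\zkServer.cmd                        (start the server; zkServer.sh start on Linux)
    bin\zkCli.cmd -server 127.0.0.1:2181    (start the CLI client)

Then, at the CLI prompt:

    create /mynode "mydata"                 (create the znode with some data)
    get /mynode                             (print the data plus the znode's stat structure)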
Remove the znode, then create another znode and verify and retrieve the data at /mysecondnode. This time an optional third parameter (1) is supplied to the get command, which registers a watch on the znode. Change the value of the data associated with /mysecondnode and observe the watch notification arrive. Finally, create a subnode. The commands below sketch this sequence.
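Again reconstructed from the described steps rather than copied from the slides (the data values are ours):

    delete /mynode                          (remove the znode)
    create /mysecondnode "data2"            (create another znode)
    get /mysecondnode 1                     (read it; the extra argument registers a one-shot watch)
    set /mysecondnode "data3"               (update it; the watch fires with a NodeDataChanged event)
    create /mysecondnode/sub "subdata"      (create a subnode)
    get /mysecondnode/sub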