Agreement in Distributed Systems
CS 188 Distributed Systems, Winter 2015
February 19, 2015 (Lecture 12)
Introduction
- We frequently want to get a set of
nodes in a distributed system to agree
- Commitment protocols and mutual
exclusion are particular cases
- The approaches we discussed for those
work in limited situations
- In general, when can we reach
agreement in a distributed system?
Basics of Agreement Protocols
- What is agreement?
- What are the necessary conditions for
agreement?
What Do We Mean By Agreement?
- In the simplest case, can n processors
agree that a variable takes on the value
0 or 1?
– Only non-faulty processors need agree
- More complex agreements can be built
from this simple agreement
Conditions for Agreement Protocols
- Consistency
– All participants agree on same value and decisions are final
- Validity
– Participants agree on a value that at
least one of them wanted
- Termination/Progress
– All participants choose a value in a finite number of steps
Challenges to Agreement
- Delays
– In message delivery
– In nodes responding to messages
- Failures
– And recovery from failures
- Lies by participants
– Or innocent errors that have similar effects
Failures and Agreement
- Failures make agreement difficult
– Failed nodes don’t participate
– Failed nodes sometimes recover at inconvenient times
– At worst, failed nodes participate in harmful ways
- Real failures are worse than fail-stop
Types of Failures
- Fail-stop
– A nice, clean failure
– Processor stops executing anything
- Realistic failures
– Partitionings
– Arbitrary delays
- Adversarial failures
– Arbitrary bad things happen
Election Algorithms
- If you get everyone to agree that a particular
node is in charge, future consensus is easy,
since he makes the decisions
- How do you determine who’s in charge?
– Statically
– Dynamically
Static Leader Selection Methods
- Predefine one process/node as the
leader
- Simple
– Everyone always knows who’s the leader
- Not very resilient
– If the leader fails, then what?
Dynamic Leader Selection Methods
- Choose a new leader dynamically
whenever necessary
- More complicated
- But failure of a leader is easy to handle
– Just elect a new one
- Election doesn’t imply voting
– Not necessarily majority-based
Election Algorithms vs. Mutual Exclusion Algorithms
- Most mutual exclusion algorithms don’t
care much about failures
- Election algorithms are designed to handle
failures
- Also, mutual exclusion algorithms only
need a winner
- Election algorithms need everyone to know
who won
A Typical Use of Election Algorithms
- A group of processes wants to
periodically take a distributed snapshot
- They don’t want multiple simultaneous
snapshots
- So they want one leader to order them
to take the snapshot
Problems in Election Algorithms
- Some of the nodes may have failed
before the algorithm starts
- Some of the nodes may fail during the
algorithm
- Some nodes may recover from failure
– Possible at inconvenient times
- What about partitions?
Election Algorithms and the Real Work
- The election algorithm is usually overhead
- There’s a real computation you want to
perform
- The election algorithm chooses someone to
lead it
- Having two leaders while real computation
is going on is bad
The Bully Algorithm
- The biggest kid on the block gets to be
the leader
- But what if the biggest kid on the block
is taking his piano lesson?
- The next biggest kid gets to be leader
– Until the piano lesson is over . . .
Electing a Bully
The kids come out to play. “Hey, Spike!” But Spike’s Mom hasn’t let him out yet. “Hey, Butch!” Butch answers: “I’m here, who else is? Peewee! Cuthbert! I’m the leader, let’s play tag!” Then the piano lesson ends and Spike comes out: “I’m here, where are you sissies? Cuthbert! Peewee! Butch! I’m the leader, and we’re playing baseball!”
Assumptions of the Bully Algorithm
- A static set of possible participants
– With an agreed-upon order
- All messages are delivered within Tm seconds
- All responses are sent within Tp seconds of
delivery
- These last two imply synchronous behavior
The Basic Idea Behind the Bully Algorithm
- Possible leaders try to take over
- If they detect a better leader, they agree
to its leadership
- Keep track of state information about
whether you are electing a leader
- Only do real work when you agree on a
leader
The Bully Algorithm and Timeouts
- Call out the biggest kid’s name
– If he doesn’t answer soon enough, call out the next biggest kid’s name
– Until you hear an answer
– Or the caller is the biggest kid
– Then take over, by telling everyone else you’re the leader
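The call-out-and-time-out logic above can be sketched in Python. This is a minimal model, not the full protocol: the `alive` set is a stand-in for "answered within Tm + Tp seconds", and the real ELECTION/COORDINATOR message exchange is collapsed into a recursive call.

```python
# Minimal sketch of the bully algorithm's election step.
# `alive` simulates which nodes answer before the timeout; in a real
# system each probe would be a message with a Tm + Tp deadline.

def bully_election(my_id, all_ids, alive):
    """Return the leader as determined from node my_id's election."""
    # Call out every bigger kid's name, biggest first.
    bigger = sorted((n for n in all_ids if n > my_id), reverse=True)
    for candidate in bigger:
        if candidate in alive:
            # A bigger node answered: defer to it; it runs its own election.
            return bully_election(candidate, all_ids, alive)
    # Nobody bigger answered (or I am the biggest): take over, and
    # announce to everyone else that I'm the leader.
    return my_id
```

For example, with nodes 1 through 5 and only nodes 1, 2, and 3 responding, any node's election ends with node 3 as leader; once node 5 recovers and answers probes, a later election picks 5 again.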
The Bully Algorithm At Work
- One node is currently the coordinator
- It expects a certain set of nodes to be up and
participating
- The coordinator asks all other nodes
- If an expected node doesn’t answer, start an
election
– Also if it answers in the negative
- If an unexpected node answers, start an
election
The Practicality of the Bully Algorithm
- The bully algorithm works reasonably
well if the timeouts are effective
– A timeout occurring really means the site in question is down
- And there are no partitions at all
– If there are, what happens?
The Invitation Algorithm
- More practical than the bully algorithm
– Doesn’t depend on timeouts
- But its results are not as definitive
- An asynchronous algorithm
The Basic Idea Behind the Invitation Algorithm
- A current coordinator tries to get all
other nodes to agree to his leadership
- If more than one coordinator around,
get together and merge groups
- Use timeouts only to allow progress,
not to make definitive decisions
- No set priorities for who will be
coordinator
The Invitation Algorithm and Group Numbers
- The invitation algorithm recruits a
group of nodes to work together
– More than one group can exist simultaneously
- Group numbers identify the group
- Why not identify with coordinator ID?
– Because one node can serially coordinate many groups
The Basic Operation of the Invitation Algorithm
- Coordinators in a normal state
periodically check all other nodes
- If any other node is a coordinator, try
to merge the groups
- If timeouts occur, don’t worry about it
– Also don’t worry whether a response to a check comes from this or an earlier request
Merging in the Invitation Algorithm
- Merging always requires forming a new
group
– May have the same coordinator, but a different group number
- The coordinator who initiates the merge asks
all other known coordinators to merge
– They ask their group members
– The original group members are also asked
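The merge step can be sketched as below. This is an in-memory illustration under stated assumptions: the `Node` class, the `merge` helper, and the `(coordinator id, sequence number)` scheme for group numbers are hypothetical names, chosen to match the slides' point that a merge always yields a new group number even when the coordinator stays the same.

```python
import itertools

_seq = itertools.count(1)   # stands in for a per-node sequence counter

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.group = None     # (coordinator id, group number)
        self.members = set()  # meaningful only while coordinating a group

def merge(initiator, other_coordinators):
    """The initiating coordinator invites everyone into a brand-new group."""
    invitees = set(initiator.members)           # its original members
    for coord in other_coordinators:            # plus the other coordinators
        invitees |= coord.members | {coord}     # and their members
    # Merging always forms a NEW group: the coordinator may be the same
    # node, but the group number changes, so stragglers from the old
    # groups can later be recognized and rejected.
    new_group = (initiator.node_id, next(_seq))
    initiator.group = new_group
    initiator.members = invitees | {initiator}
    for node in invitees:
        node.group = new_group                  # they accept the invitation
        node.members = set()                    # and stop coordinating
    return new_group

# Two existing groups: coordinator 1 with member 2, coordinator 3 with member 4
a, b, c, d = Node(1), Node(2), Node(3), Node(4)
old = (1, next(_seq))
a.group = b.group = old
a.members = {b}
c.group = d.group = (3, next(_seq))
c.members = {d}
new = merge(a, [c])
```

After the merge, all four nodes share one group whose coordinator is still node 1, but whose group number differs from node 1's old group.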
A Simplified Example
Initially there are two groups: nodes 1 and 2 with coordinator 1, and nodes 3 and 4 with coordinator 3.
- Node 1 checks for other coordinators: AreYouCoordinator? Node 3 answers Yes, node 2 answers No, so node 1 finds another coordinator
- Node 1 asks the other coordinator and his old node to join his group: Invite (node 3 forwards an Invite to node 4 on behalf of node 1)
- The invitees Accept, giving UP = {1,2,3,4}, and answer Ready
- If all members of UP respond, we’re fine: node 1 forms a new group
The Reorganization State
- Nodes enter the reorganization state
after getting their answer
- What’s the point of this state?
– Why not just start up the group? – After all, we all know who’s going to be a member
- Or do we?
Why We Need Another Round of Messages
Node 1 sends Invitations to nodes 2, 3, and 4.
- Who does 1 think will join the group, at this point? 2 and 3
- Assuming no timeouts, 4 will also join
– And 2 needs to know that
- And what if someone crashes, presumably not accepting the invitation?
Timeouts in the Merge
- Don’t worry too much about them
- Some nodes respond before the timeout
– Some don’t
- If you don’t catch them this time, you
might the next
Straggler Messages
- This algorithm is asynchronous
– So messages may come in late
- What do we do when messages arrive
late?
- Mostly, reject them
- How do we tell?
– Messages contain group number
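The rejection rule above is simple to state in code. This tiny filter is illustrative, assuming each message is stamped with the group number it was sent under; the names are hypothetical.

```python
# Hypothetical straggler filter: a message is stamped with the group
# number it belongs to; anything from another (older) group is rejected.

def deliver(node_group, message):
    """Return the payload if the message belongs to our current group."""
    group, payload = message
    if group != node_group:
        return None          # straggler from an earlier group: reject
    return payload

current = ('node1', 7)       # (coordinator id, sequence number)
assert deliver(current, (('node1', 7), 'ping')) == 'ping'
assert deliver(current, (('node1', 3), 'ping')) is None
```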
Multiple Simultaneous Groups
- The invitation algorithm allows
multiple simultaneous groups to exist
– Each with a proper coordinator
- Is this a good thing?
– No, but what are the alternatives?
- No node ever belongs to more than one
group, at least
Paxos
- A family of algorithms that allow a
distributed system to reach agreement
- In the face of delays and failures
- Can’t perfectly guarantee progress
– But makes progress in realistic conditions
- Does guarantee consistency
- Usually defined to reach consensus on some
value v
Paxos Assumptions
- Processors run at variable speeds and may
fail
– Might recover after failure
– But they don’t lie
- Any processor can send a message to any
other processor
- Messages can be lost, arbitrarily delayed,
reordered, or duplicated
– But never corrupted
Paxos Processor Roles
- Client
– Issues a request, waits for a response
- Acceptor/voter
– Remembers things for the protocol
- Proposer (simpler if there’s only one)
– Assists client in getting a response
- Learner
– Actually executes a request
- Leader
– One of the proposers that leads the process
- One processor can play several roles
– Usually, all processes are acceptors, proposers, and learners
Paxos Quorums
- Collections of acceptors that make decisions
– Several different quorums in system
- Messages are sent to quorums, not single
acceptors
– A message is only effective if all quorum members receive it
– Similarly, all acceptors in a quorum must send a message for it to be effective
- If any member of the quorum survives, its
decisions survive
Quorum Membership
- All quorums must contain a majority of
all acceptors in the system
- Any two quorums must share at least
one acceptor
- E.g., if there are four acceptors
{1,2,3,4}, quorums might be: – {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4}
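The majority rule above can be checked mechanically. This short Python sketch enumerates all majority quorums over a set of acceptors and verifies that every pair of quorums intersects, which is what lets a surviving quorum carry forward past decisions.

```python
from itertools import combinations

# Sketch of the quorum-intersection property: every quorum holds a
# majority of the acceptors, so any two quorums share at least one.

def majority_quorums(acceptors):
    """All subsets containing a majority of the given acceptors."""
    need = len(acceptors) // 2 + 1
    return [set(q)
            for size in range(need, len(acceptors) + 1)
            for q in combinations(sorted(acceptors), size)]

quorums = majority_quorums({1, 2, 3, 4})
# The minimal quorums are {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}
assert {1, 2, 3} in quorums
assert all(q1 & q2 for q1, q2 in combinations(quorums, 2))
```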
Paxos Rounds
- Paxos proceeds in rounds
- In response to a client request
- If the round reaches agreement, the
client gets a response
- If not, you start another round
- Continue till a round reaches
agreement
A Simple Paxos Round
Participants: client C, proposer P, acceptors A1, A2, A3, learners L1 and L2
- 1. request: the client asks the proposer to get a value decided
- 2. prepare(N): N is a bigger number than P has ever used or seen before
- 3. promise(N, null): each acceptor promises; if an acceptor ever promised on this item in an earlier run of Paxos, it returns the generation and value from that run, not null
- 4. accept(N, Vres): Vres is a result chosen by P, if no promise had a value
- 5. accepted(N, Vmax): the acceptors report the accepted value to the learners
- 6. response: a learner responds to the client
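The message sequence can be sketched as a single round with one proposer, under the slides' assumptions (no lying processors, messages may be lost). The class and function names are illustrative, and everything beyond quorum counting is omitted; this is a sketch, not a full Paxos implementation.

```python
# Single-round sketch of Paxos: prepare(N) -> promise(N, prior) ->
# accept(N, V) -> accepted(N, V), with majority quorums.

class Acceptor:
    def __init__(self):
        self.promised_n = 0      # highest N promised so far
        self.accepted = None     # (N, value) from an earlier run, if any

    def prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return ('promise', n, self.accepted)  # null if no earlier run
        return None                               # ignore stale prepares

    def accept(self, n, value):
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return ('accepted', n, value)
        return None

def run_round(n, client_value, acceptors):
    """One proposer's round; returns the decided value or None."""
    promises = [p for p in (a.prepare(n) for a in acceptors) if p]
    if len(promises) <= len(acceptors) // 2:
        return None                               # no quorum: try a new round
    # If any promise carried a value from an earlier run, re-propose the
    # one with the highest generation; otherwise use the client's value.
    prior = max((p[2] for p in promises if p[2]), default=None)
    value = prior[1] if prior else client_value
    acks = [a.accept(n, value) for a in acceptors]
    return value if sum(1 for x in acks if x) > len(acceptors) // 2 else None
```

Note how a later round with a different client value still decides the originally accepted value: the promises carry it forward, which is how Paxos keeps decisions consistent across rounds.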
The Point of Different Paxos Roles
- The client wants to get something done
- The proposer coordinates protocol activities
- The acceptors ensure proper concurrent behavior and handle proposer failures
- The learners ensure redundant memory of the result of a decision
- Remember! One machine can play multiple roles
Paxos Error Handling
- Some cases simple, some complex
- A simple case:
– One of the acceptors fails
– If there’s still a quorum, no problem
– Go ahead without him
- Another simple case:
– One of the learners failed
– If any learners are left, they’ll provide the right response to the client
More Complex Error Cases
- Things like failure of the proposer in the
middle of a round
- Paxos chooses a new leader and uses
him from this point
- What if old leader comes back?
- Even more complex, but it works out
Paxos and Overheads
- Generally quite expensive
– In messages and thus delays
- Many optimizations possible
– Some don’t alter the protocol characteristics
– Some trade off handling some error conditions for better performance
Byzantine Agreement
- Life can be a lot worse than merely
being unable to rely on timeouts
- What if one of the nodes we’re
working with is lying?
- How can we reach agreement if we
can’t trust all the participants?
The Purpose of Byzantine Agreement
- Well, why would one of our distributed
system components lie?
- It probably wouldn’t
- But it might contain a bug
- If it contains the worst possible bug,
what can it do?
– Essentially, inadvertently lie
The Realism of Byzantine Agreement
- It isn’t realistic
- It doesn’t really happen
- No one really uses it
- But it demonstrates a limit on how
badly things can go while still allowing agreement
Why Is It Called Byzantine?
- After the fall of Rome itself, the
empire lived on in the east
– Called Byzantium
- Byzantium survived for around 1000
years
- The Byzantines were famous for their
treachery and double-dealing
The Byzantine General Problem
- Several Byzantine generals each command
their own army
- They are far apart and communicate with
messengers
- The emperor wants to attack the Turks
- If all generals attack, they’ll win
– Even if a majority attack, they’ll win
– Retreating is OK, if everyone does it
- But the Turks may have bribed some
generals
The Complete Problem Statement
- Messages are point-to-point
- Messages are reliably delivered, with a
predictable timeout
– Failure to receive a message in time means the sender is a traitor
- Traitors can send any messages they
please
– But cannot forge their identities
How Many Traitors Is Too Many?
- Can all the loyal generals reach
agreement on whether to attack or retreat?
- Or can the traitors prevent them from
reaching any agreement?
- How many generals must the Turks
bribe before no agreement is possible?
The Answer
- If the Turks bribe 1/3 of the generals,
the remaining 2/3 cannot reach agreement
- How can that be?
- Why not just a majority?
- Easiest to consider in the case of a
commander
The 3-General Byzantine Problem
- What if they’re all loyal? The commander sends Attack to both generals: everyone attacks and the Turk is vanquished
- But what if the commander is a traitor? He sends Attack to one general and Retreat to the other: one general attacks, one retreats, the traitor pockets the bribe, and the Turks win
Can’t the Loyal Generals Check Their Orders?
The commander (1) sends Attack to general 2 and Retreat to general 3. Generals 2 and 3 check their orders with each other: seeing one Attack and one Retreat, they figure out 1 is a traitor and come to their own agreement.
But What if the Commander Wasn’t the Traitor?
This time, general 3 is the traitor. The commander (1) sends Attack to both generals. When generals 2 and 3 check their orders, 3 lies and claims he received Retreat. To general 2 it again looks as if 1 is a traitor, but 1 isn’t the traitor, 3 is: he convinces 2 to retreat, 1 is slaughtered attacking, and 3 pockets the bribe.
Can General 2 Tell Which Scenario Is Occurring?
When 1 was the traitor, 2 saw: Attack from the commander, and Retreat relayed by 3. When 3 was the traitor, 2 saw exactly the same thing: Attack from the commander, and Retreat relayed by 3. 2 can’t tell the difference, so he can’t decide whether to attack or retreat.
What If There Were 4 Generals?
What if the commander (1) is the traitor? If he doesn’t send some messages, he’ll be seen as the traitor. But what can he send? Say he sends Attack to generals 2 and 3, and Retreat to general 4.
Can the Three Loyal Generals Reach Agreement?
The three loyal generals received Attack, Attack, and Retreat. They can exchange all the messages and let the majority rule. Since there are only two possible messages, the commander must have sent the same message to at least two nodes. And if the commander is loyal and someone else is lying, the majority represents the loyal commander’s will.
But What if There Were Five Generals?
With five generals, a traitorous commander can send Attack to two generals and Retreat to the other two, producing a tie. So pre-arrange a tie-breaker, e.g., always retreat on ties. All the loyal generals then retreat, and the traitor must explain his failure to the Turks.
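The majority-with-tie-breaker decision in the scenarios above can be sketched as follows. This shows only the final vote each loyal general takes over the orders it has collected, not the full recursive oral-messages algorithm; the function name and order strings are illustrative.

```python
from collections import Counter

# Each lieutenant relays the order it received from the commander;
# traitors may relay anything. Ties fall back to a pre-arranged default.

def decide(received_orders, tie_break='retreat'):
    """Majority vote over the orders one general has collected."""
    counts = Counter(received_orders)
    (top, n1), *rest = counts.most_common()
    if rest and rest[0][1] == n1:    # tie: use the pre-arranged rule
        return tie_break
    return top

# 4 generals, loyal commander sent Attack, one lieutenant lies:
assert decide(['attack', 'attack', 'retreat']) == 'attack'
# 5 generals, traitorous commander splits the vote 2-2:
assert decide(['attack', 'attack', 'retreat', 'retreat']) == 'retreat'
```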
What If You Don’t Want a Commander?
- What if you want everyone to vote?
- And accept the majority?
– With the guarantee that all loyal nodes abide by the majority?
- Serially treat each node as the
commander
– Reach agreement on his vote
– Then move on to the next node
The Trick Behind Byzantine Agreement
- Everyone must know what everyone
else thinks about everything else
- Not just what I think the commander
said, but what everyone else claims the commander said
- Resulting algorithms are tricky and
expensive
– But it could be (and will be) worse
Authenticated Byzantine Agreement
- What if the messages are signed in an
unforgeable way?
- Then dishonest generals can’t lie about
what honest generals told them
- In this case, honest generals reach