When Is Agreement Possible? CS 188 Distributed Systems



SLIDE 1

Lecture 13 Page 1 CS 188,Winter 2015

When Is Agreement Possible? CS 188 Distributed Systems February 24, 2015

SLIDE 2

Introduction

  • Basics of agreement protocols
  • Impossibility of agreement in an asynchronous system with failures
  • When is agreement possible?
SLIDE 3

Basics of Agreement Protocols

  • What is agreement?
  • What are the necessary conditions for agreement?

SLIDE 4

What Do We Mean By Agreement?

  • In the simplest case, can n processors agree that a variable takes on value 0 or 1?
    – Only non-faulty processors need agree
  • More complex agreements can be built from this simple agreement

SLIDE 5

Conditions for Agreement Protocols

  • Consistency
    – All participants agree on the same value, and decisions are final
  • Validity
    – Participants agree on a value at least one of them wanted
  • Termination
    – All participants choose a value in a finite number of steps
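
The three conditions can be checked mechanically against the outcome of a single run. The following is a minimal sketch and not part of the lecture: the function name `check_agreement` and the dictionary layout are assumptions made for illustration. It inspects only one finished run, so it cannot show that decisions are final or that every possible run terminates.

```python
from typing import Dict, Optional, Set


def check_agreement(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]],
                    faulty: Set[str]) -> Dict[str, bool]:
    """Check one finished run against the three agreement conditions.

    inputs    -- the value (0 or 1) each processor wanted
    decisions -- the value each processor decided, or None if it never decided
    faulty    -- processors that failed; they are exempt from the conditions
    """
    correct = [p for p in inputs if p not in faulty]
    decided = [decisions.get(p) for p in correct]
    values = {d for d in decided if d is not None}
    return {
        # All non-faulty processors that decided chose the same value.
        "consistency": len(values) <= 1,
        # Every decided value was wanted by at least one processor.
        "validity": values <= set(inputs.values()),
        # Every non-faulty processor decided.
        "termination": all(d is not None for d in decided),
    }


result = check_agreement(
    inputs={"P1": 0, "P2": 1, "P3": 1},
    decisions={"P1": 1, "P2": 1, "P3": None},
    faulty={"P3"},
)
print(result)
```

Here P3 never decides, but because it is listed as faulty the run still satisfies all three conditions.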

SLIDE 6

Impossibility of Agreement in Async System With Failures

  • Assume a reliable, but asynchronous, message passing system
    – Any message may face arbitrary delays
  • Can a set of processors reach agreement if one of the processors fails?

SLIDE 7

Agreement Isn’t Always Possible

  • In the general case for arbitrary systems
  • Adding some special properties to the system may change that result
  • But without those properties, provably impossible
    – A result sometimes abbreviated FLP
      • For Fischer, Lynch, and Paterson, who proved it

SLIDE 8

Model of the System

  • The system consists of n processors
  • The goal is for all non-faulty processors to agree on value 0 or 1
  • Rule out the trivial case of always agreeing on 0 (or 1)
  • Agreement depends on protocol, initial state, and inputs to each processor

SLIDE 9

Bivalent and Univalent States

  • A bivalent state is a system state that could lead to either value being decided
  • A univalent state can only lead to one of the values being decided
    – 0-valent or 1-valent
  • Valency must take allowable failures into account!
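
Valency can be made concrete by enumerating every decision reachable from a state. The sketch below uses a deliberately tiny, hypothetical protocol that is not from the lecture: a single decider adopts the value of whichever pending message is delivered to it first. The names `reachable_decisions` and `classify` are illustrative, and the sketch ignores failures, which a real valency argument must account for.

```python
from typing import FrozenSet, Optional, Set


def reachable_decisions(decided: Optional[int],
                        pending: FrozenSet[int]) -> Set[int]:
    """Return every decision value reachable from this state.

    decided -- the value already decided, or None
    pending -- values carried by messages still in the network
    """
    if decided is not None:
        return {decided}          # decisions are final
    outcomes: Set[int] = set()
    for value in pending:         # the adversary may deliver any pending message
        outcomes |= reachable_decisions(value, pending - {value})
    return outcomes


def classify(decided: Optional[int], pending: FrozenSet[int]) -> str:
    outcomes = reachable_decisions(decided, pending)
    if outcomes == {0, 1}:
        return "bivalent"
    if len(outcomes) == 1:
        return f"{next(iter(outcomes))}-valent"
    return "no decision reachable"


print(classify(None, frozenset({0, 1})))   # both messages still pending
print(classify(0, frozenset({1})))         # the 0 message was delivered first
```

In this toy protocol the first delivery is exactly the event that moves the system from a bivalent state to a univalent one.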

SLIDE 10

System Configuration

  • Processors have internal state
  • State of network is the set of messages sent, but not yet received
  • Event e is the receipt of message m by a processor
    – Which can lead to sending one or more new messages
    – Events are deterministic
  • A schedule is a sequence of events
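
The configuration model above maps directly onto a small data structure. This is a sketch under assumptions, not the lecture's notation: `apply_event`, `run_schedule`, and the forwarding rule `toy_step` are hypothetical names, processor state is a single integer, and the network is a set, so two identical messages cannot be in flight at once.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Message = Tuple[str, int]                            # (destination, payload)
Config = Tuple[Dict[str, int], FrozenSet[Message]]   # (processor states, network)
Step = Callable[[str, int, int], Tuple[int, List[Message]]]


def apply_event(config: Config, message: Message, step: Step) -> Config:
    """Deliver one message: a deterministic transition to a new configuration."""
    states, network = config
    if message not in network:
        raise ValueError("only a message that was sent and not yet received can be delivered")
    dest, payload = message
    new_state, outgoing = step(dest, states[dest], payload)
    return ({**states, dest: new_state},
            (network - {message}) | frozenset(outgoing))


def run_schedule(config: Config, schedule: List[Message], step: Step) -> Config:
    """A schedule is a sequence of events applied in order."""
    for message in schedule:
        config = apply_event(config, message, step)
    return config


def toy_step(proc: str, state: int, payload: int) -> Tuple[int, List[Message]]:
    """Hypothetical rule: add the payload, then forward payload - 1 to the other processor."""
    other = "Q" if proc == "P" else "P"
    outgoing = [(other, payload - 1)] if payload > 0 else []
    return state + payload, outgoing


start: Config = ({"P": 0, "Q": 0}, frozenset({("P", 2)}))
final = run_schedule(start, [("P", 2), ("Q", 1), ("P", 0)], toy_step)
print(final)
```

Because each event is deterministic, a starting configuration plus a schedule fully determines the final configuration, which is what lets the proof reason about schedules chosen by an adversary.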
SLIDE 11

Proving the Result

  • Let’s assume the result is false
    – That we can reach agreement with one failure in these conditions
  • Use an adversarial model
    – Within rules of behavior, assume adversary can force any legal event
  • Look for contradictions
SLIDE 12

What Can the Adversary Do?

  • Force any processor to perform an event at any moment
  • Choose any message to be delivered to any processor when it requests a message
  • Delay any message arbitrarily long
  • Once, it can kill one processor permanently

SLIDE 13

The Necessity of Bivalency

  • There has to be an initial bivalent configuration for the system
  • Why?
  • If all processors started with value 1, the system would decide 1
  • If all processors started with value 0, the system would decide 0

SLIDE 14

Intermediate Initial States

  • If some processors start with value 0 and some with value 1
    – Some initial states lead to result 1
    – Some initial states lead to result 0
    – All initial states lead to one or the other (supposing, for now, that none is bivalent)
  • So there is a 1-valent initial state that differs from a 0-valent initial state by one processor’s initial value
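
The existence of two such neighboring initial states follows from walking from the all-0 state to the all-1 state while flipping one processor's input at a time: the outcome starts at 0 and ends at 1, so it must change at some single flip. The sketch below illustrates that walk. The function `find_adjacent_pair` and the stand-in decision rule `majority` are hypothetical and not from the lecture; any rule that decides 0 on all-0 inputs and 1 on all-1 inputs would work.

```python
from typing import Callable, Tuple

State = Tuple[int, ...]


def find_adjacent_pair(n: int,
                       decides: Callable[[State], int]) -> Tuple[State, State, int]:
    """Flip inputs from 0 to 1 one processor at a time.

    Returns the first pair of initial states that differ in exactly one
    processor's value yet lead to different decisions, plus that processor's index.
    """
    state: State = (0,) * n
    for i in range(n):
        flipped = state[:i] + (1,) + state[i + 1:]
        if decides(state) != decides(flipped):
            return state, flipped, i
        state = flipped
    raise ValueError("the rule decides the same value for all-0 and all-1 inputs")


def majority(state: State) -> int:
    """Stand-in decision rule used only for this illustration."""
    return 1 if 2 * sum(state) > len(state) else 0


x, y, differing = find_adjacent_pair(5, majority)
print(x, y, differing)
```

With five processors and the majority rule, the walk stops at the flip of the third input, where the outcome changes from 0 to 1.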
SLIDE 15

A Graphical Representation

[Figure: the 0-valent initial states, containing State x, beside the 1-valent initial states, containing State y]

What’s in these states?

State x: Node 1: 0, Node 2: 1, Node 3: 1, . . . , Node N: 1
State y: Node 1: 0, Node 2: 1, Node 3: 1, . . . , Node N: 0

They differ in only one value

SLIDE 16

Why Does This Imply Bivalence?

  • What if that one differing processor is the processor that fails?
  • The system must still reach agreement from the remaining states
    – Which are identical, now
  • But on what value?
SLIDE 17

Is This Possible?

[Figure: the 0-valent initial states, containing State x, beside the 1-valent initial states, containing State y]

State x: Node 1: 0, Node 2: 1, Node 3: 1, . . . , Node N: 1
State y: Node 1: 0, Node 2: 1, Node 3: 1, . . . , Node N: 0

Does the system decide on 0? Then State y wasn’t 1-valent, after all

Does the system decide on 1? Then State x wasn’t 0-valent, after all

Looks like x and y must be bivalent

SLIDE 18

So What?

  • So there has to be at least one bivalent initial state
  • Why’s that so bad?
  • If the system never leaves a bivalent state, it never makes a decision
  • We must show our adversary can’t perpetually force bivalency

SLIDE 19

The Persistence of Bivalency

  • Let’s assume bivalency doesn’t persist
  • At some point, some bivalent state must transition to a univalent state
    – Implying at least two events
      • One to go to 0-valent
      • One to go to 1-valent
  • With no events leading to bivalent states

SLIDE 20

A Graphical Representation

[Figure: from configuration C, event e leads to D and event e’ leads to D’]

Remember, these events are each delivery of a message

So m and m’ must have been in the message delivery system state simultaneously

SLIDE 21

Looking Closely at Events e and e’

  • What would happen if we executed e first, then e’?
  • What would happen if we executed them in the opposite order?
  • Well, why should I care?
  • Would executing them in either order lead to the same state?
  • If so, there’s a contradiction
SLIDE 22

Order of Events e and e’

[Figure: from configuration C, event e leads to D and event e’ leads to D’; then e’ is applied to D and e is applied to D’]

SLIDE 23

Why Should They Lead to the Same State?

  • What if e and e’ occur on different processors?
  • Then they’re independent events
  • So they should produce the same result if executed in either order
  • So e and e’ could not have occurred on different processors
    – Otherwise the one resulting state would have to be both 0-valent and 1-valent
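
The independence claim can be checked on a small model. In the sketch below, which is an illustration and not the lecture's formalism, `deliver` and `commute` are hypothetical names and the update rule `2 * state + payload` is chosen only because it is sensitive to order. The model also omits the sending of new messages to stay short.

```python
from typing import Dict, FrozenSet, Tuple

Message = Tuple[str, str, int]                       # (name, destination, payload)
Config = Tuple[Dict[str, int], FrozenSet[Message]]   # (processor states, network)


def deliver(config: Config, message: Message) -> Config:
    """Deliver one pending message; only the destination's state changes."""
    states, network = config
    assert message in network, "message must be pending"
    _name, dest, payload = message
    new_states = dict(states)
    new_states[dest] = 2 * states[dest] + payload    # order-sensitive on purpose
    return new_states, network - {message}


def commute(config: Config, e: Message, e_prime: Message) -> bool:
    """True if e then e' reaches the same configuration as e' then e."""
    return (deliver(deliver(config, e), e_prime)
            == deliver(deliver(config, e_prime), e))


states = {"P": 1, "Q": 1}
m, m_prime = ("m", "P", 3), ("m'", "Q", 5)
print(commute((states, frozenset({m, m_prime})), m, m_prime))

m_same = ("m2", "P", 5)
print(commute((states, frozenset({m, m_same})), m, m_same))
```

Deliveries to different processors touch disjoint parts of the configuration, so they commute; two deliveries to the same processor P need not, which is why the argument next turns to the single-processor case.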

SLIDE 24

Could the Events Occur on the Same Processor P?

  • If e was first, the state became 0-valent
  • If e’ was first, the state became 1-valent
  • But what if P then fails?
  • Since the event happened only at P, only P sees the effects
  • So we’re still in a bivalent state
SLIDE 25

Recapitulating the Argument

  • It’s possible to start in a bivalent state
  • There must be some point at some processor P at which the bivalent state changes to univalent
  • If P fails before anyone knows the valency, the system becomes bivalent
    – And can never settle to univalency
  • Perpetual bivalency implies no agreement

SLIDE 26

When Is Agreement Possible?

  • Didn’t we show in the last class that we can reach agreement if less than 1/3 of our processors are faulty?
  • Yes, but only if the message passing system is synchronous
  • Whether agreement is possible in a system depends on certain parameters
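
As a small worked example of the "less than 1/3" bound: with n processors, the number of faulty processors f must satisfy f < n/3, so the largest tolerable f is (n - 1) // 3. The helper name `max_tolerable_faults` is an assumption for illustration.

```python
def max_tolerable_faults(n: int) -> int:
    """Largest f with f < n / 3, i.e. the largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one processor")
    return (n - 1) // 3


for n in (3, 4, 7, 10):
    print(n, "processors tolerate", max_tolerable_faults(n), "faulty")
```

Three processors cannot tolerate even one such fault, while four can tolerate one.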

SLIDE 27

Parameters for Agreement In Distributed Systems

  • Synchronous vs. asynchronous processors
  • Bounded vs. unbounded communications delay
  • Ordered vs. unordered messages
  • Point-to-point vs. broadcast communications

SLIDE 28

Synchronous vs. Asynchronous Processors

  • Synchronous processors imply that all processors make progress predictably
  • More precisely, there is a constant s such that
    – For every s+1 steps taken by Pi, all Pj will take at least one step
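
The constant s can be tested against a recorded trace of steps. The sketch below takes one reading of the definition: within any span covering s+1 consecutive steps of one processor, every processor must appear at least once. The function name `is_synchronous` and the trace format (a list of processor names in step order) are assumptions for illustration.

```python
from typing import List, Sequence


def is_synchronous(trace: List[str], processors: Sequence[str], s: int) -> bool:
    """Check a finite step trace against the bound s.

    For every run of s + 1 consecutive steps by one processor, every
    processor must take at least one step within that span of the trace.
    """
    everyone = set(processors)
    for proc in processors:
        positions = [k for k, p in enumerate(trace) if p == proc]
        # Slide a window over each s + 1 consecutive steps of proc.
        for start in range(len(positions) - s):
            lo, hi = positions[start], positions[start + s]
            if not everyone <= set(trace[lo:hi + 1]):
                return False
    return True


print(is_synchronous(["A", "B", "A", "B", "A", "B"], ["A", "B"], s=1))
print(is_synchronous(["A", "A", "A", "B"], ["A", "B"], s=1))
```

A finite trace can only refute the bound, never prove it for all future executions; the check is meant to make the definition concrete.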

SLIDE 29

Bounded vs. Unbounded Communications Delay

  • Delay is bounded if and only if all messages arrive at their destination within t steps
    – Implies no lost messages
  • Doesn’t imply messages arrive in the order sent
SLIDE 30

Ordered vs. Unordered Messages

  • Messages are ordered if they are received in the same real time order as their sending
    – Using true real time
  • In some cases, merely receiving all messages in the same order at all processors is enough
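
The two notions of ordering differ, and a short check makes the difference visible. This is a sketch with hypothetical names (`real_time_ordered`, `same_order_everywhere`); it assumes every processor receives the same set of messages and that true send times are known, which real systems generally cannot observe.

```python
from typing import Dict, List


def real_time_ordered(received: List[str], send_time: Dict[str, float]) -> bool:
    """True if messages were received in the real-time order of their sending."""
    times = [send_time[m] for m in received]
    return times == sorted(times)


def same_order_everywhere(logs: Dict[str, List[str]]) -> bool:
    """True if every processor received the messages in one common order."""
    orders = list(logs.values())
    return all(order == orders[0] for order in orders)


send_time = {"a": 1.0, "b": 2.0}            # "a" was sent before "b"
logs = {"P": ["b", "a"], "Q": ["b", "a"]}   # but everyone saw "b" first

print(same_order_everywhere(logs))
print(all(real_time_ordered(log, send_time) for log in logs.values()))
```

In this example all processors agree on the order even though it is not the real-time sending order, which is the weaker property the second bullet describes.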

SLIDE 31

Point-to-Point vs. Broadcast Communications

  • Point-to-point communications means a given message sent by Pi is seen only by its destination Pj
  • Broadcast communications means that Pi can send a message to all other processors in a single atomic step
  • Most typically by hardware broadcast
SLIDE 32

So, When Can We Reach Agreement?

  • Case 1: Processors are synchronous and communications delay is bounded
  • Case 2: Messages are ordered and the transmission medium is broadcast
  • Case 3: Processors are synchronous and messages are ordered
  • And that’s it
    – (Case 1 covers Byzantine agreement)
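
The three cases amount to a Boolean function of the four parameters from the earlier slide. The sketch below encodes them directly; the function and parameter names are assumptions for illustration.

```python
from itertools import product


def agreement_possible(sync_processors: bool,
                       bounded_delay: bool,
                       ordered_messages: bool,
                       broadcast: bool) -> bool:
    """Encode the three cases in which agreement can be guaranteed."""
    case1 = sync_processors and bounded_delay
    case2 = ordered_messages and broadcast
    case3 = sync_processors and ordered_messages
    return case1 or case2 or case3


# Enumerate all 16 settings of the four parameters.
favorable = [combo for combo in product([False, True], repeat=4)
             if agreement_possible(*combo)]
print(len(favorable), "of 16 parameter combinations allow agreement")

# The impossibility setting: asynchronous, unbounded delay, unordered, point-to-point.
print(agreement_possible(False, False, False, False))
```

Enumerating the combinations shows that half of the parameter settings fall outside all three cases, including the fully asynchronous setting of the impossibility proof.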

SLIDE 33

What Does This Result Mean?

  • For practical systems we really build
  • Not that we can never reach agreement
    – Good systems almost always do
  • But that we generally can’t guarantee it
  • Which implies that our systems should tolerate disagreements
    – At some times
    – Under some conditions

SLIDE 34

When Is Disagreement OK?

  • Preferably, when it doesn’t matter
    – E.g., when reasonable results are possible even without agreement
  • Or when it eventually works itself out
    – With possible inconsistencies in the meantime
  • Or, at worst, when it is visible to people who can fix it

SLIDE 35

When Is Disagreement Not OK?

  • When the consequences of disagreement are dire
  • When it results in unfixable problems
  • When its consequences are invisible, but relevant
  • Unfortunately, we don’t always get to choose when we can avoid it

SLIDE 36

Minimizing Chances of Disagreement

  • Understand when agreement is most critical
  • In those cases, use protocols that are less likely to fail on agreement
    – Which usually have heavy expenses
    – So don’t always use them

SLIDE 37

A Classification of Faults

  • More detailed than previously discussed
  • Produced by fault-tolerant computing community
  • Divides faults into classes
    – Stronger class is subset of weaker class

SLIDE 38

An Ordered Fault Classification

[Figure: the fault classes in order, from the most general to the most restricted]

  • Byzantine
  • Authenticated Byzantine
  • Incorrect Computation
  • Timing
  • Omission
  • Crash
  • Fail Stop
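
Because each class in the ordering contains the ones below it, the classification can be modeled as an ordered enumeration. This is a sketch under assumptions: the numeric values, the name `FaultClass`, and the helper `covers` are illustrative, and treating a protocol designed for a broader class as also handling every narrower class is the simplification being illustrated, not a guarantee about any particular protocol.

```python
from enum import IntEnum


class FaultClass(IntEnum):
    """Fault classes ordered from most restricted to most general."""
    FAIL_STOP = 1
    CRASH = 2
    OMISSION = 3
    TIMING = 4
    INCORRECT_COMPUTATION = 5
    AUTHENTICATED_BYZANTINE = 6
    BYZANTINE = 7


def covers(designed_for: FaultClass, actual: FaultClass) -> bool:
    """True if a fault of class `actual` falls within the class `designed_for`."""
    return actual <= designed_for


print(covers(FaultClass.BYZANTINE, FaultClass.CRASH))
print(covers(FaultClass.CRASH, FaultClass.TIMING))
```

A design that targets only crash faults gives no assurance against timing faults, while one that targets Byzantine faults covers every class in the list.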

SLIDE 39

Fail Stop Faults

  • A processor ceases operation
  • But informs other processors in computation that it has stopped
  • Relatively easy to deal with
SLIDE 40

Crash Fault

  • A processor crashes or loses internal state and halts
  • Without notification to anyone else
  • Hard to distinguish from a really slow processor

SLIDE 41

Omission Faults

  • A processor fails to do something in time
    – Like respond to a message
  • But otherwise it may still be operating correctly
    – Or it may have crashed

SLIDE 42

Timing Fault

  • A processor completes a task before or after the window when it should
    – Or never
  • E.g., a late acknowledgement to a message

SLIDE 43

Incorrect Computation Fault

  • A processor fails to produce the correct results for a given set of inputs
  • Which could be merely not producing the results soon enough
  • Or could be sending back trash
SLIDE 44

Authenticated Byzantine Fault

  • Processor performs an arbitrary or malicious fault
  • But authentication mechanisms note any alterations made to others’ messages

SLIDE 45

Byzantine Fault

  • Any and every fault
  • Having arbitrarily bad consequences
  • Possibly working in combination with other faults to produce really bad results
  • In this classification, all other faults are subclasses of Byzantine faults