When Is Agreement Possible?
CS 188 Distributed Systems, Winter 2015
Lecture 13, February 24, 2015
Introduction
- Basics of agreement protocols
- Impossibility of agreement in
asynchronous system with failures
- When is agreement possible?
Basics of Agreement Protocols
- What is agreement?
- What are the necessary conditions for
agreement?
What Do We Mean By Agreement?
- In the simplest case, can n processors agree that a variable takes on value 0 or 1?
– Only non-faulty processors need agree
- More complex agreements can be built
from this simple agreement
Conditions for Agreement Protocols
- Consistency
– All participants agree on same value and decisions are final
- Validity
– Participants agree on a value at least one of them wanted
- Termination
– All participants choose a value in a finite number of steps
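These three conditions can be checked mechanically on a single finished run. The sketch below is illustrative Python; the function and argument names are my own, not part of any standard library or of the protocols discussed here.

```python
def check_agreement(inputs, decisions):
    """Check the three agreement conditions for one finished run.

    inputs:    proc id -> that processor's initial preferred value
    decisions: proc id -> value decided (non-faulty processors only);
               None means the processor never decided
    """
    # Termination: every non-faulty participant chose a value
    terminated = all(v is not None for v in decisions.values())
    # Consistency: all participants agree on the same value
    consistent = len(set(decisions.values())) == 1
    # Validity: the agreed value was wanted by at least one participant
    valid = set(decisions.values()) <= set(inputs.values())
    return terminated and consistent and valid

# A run where everyone decides 0, a value P1 wanted: all conditions hold
print(check_agreement({"P1": 0, "P2": 1}, {"P1": 0, "P2": 0}))  # True
```

Note that the checker only inspects the outcome of one run; the hard part, as the rest of the lecture shows, is guaranteeing these conditions for every possible run.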
Impossibility of Agreement in Async System With Failures
- Assume a reliable, but asynchronous,
message passing system – Any message may face arbitrary delays
- Can a set of processors reach
agreement if one of the processors fails?
Agreement Isn’t Always Possible
- In the general case for arbitrary
systems
- Adding some special properties to the
system may change that result
- But without those properties, provably
impossible
– A result sometimes abbreviated FLP
- For Fischer, Lynch, and Paterson, who proved it
Model of the System
- The system consists of n processors
- The goal is for all non-faulty
processors to agree on value 0 or 1
- Rule out the trivial case of always
agreeing on 0 (or 1)
- Agreement depends on protocol, initial
state, and inputs to each processor
Bivalent and Univalent States
- A bivalent state is a system state that
could lead to either value being decided
- A univalent state can only lead to one of the values being decided
– 0-valent or 1-valent
- Valency must take allowable failures
into account!
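Valency can be made concrete with a toy model: if we know every move the adversary could force from each state, a state's valency is just the set of decision values reachable from it. A minimal Python sketch, with entirely hypothetical names and a made-up execution tree:

```python
def reachable_decisions(state, transitions, decided):
    """All decision values reachable from `state`.

    transitions: state -> list of successor states the adversary may choose
    decided:     state -> 0 or 1 for final (deciding) states, absent otherwise
    """
    if state in decided:
        return {decided[state]}
    out = set()
    for nxt in transitions.get(state, []):
        out |= reachable_decisions(nxt, transitions, decided)
    return out

def valency(state, transitions, decided):
    vals = reachable_decisions(state, transitions, decided)
    return "bivalent" if vals == {0, 1} else f"{vals.pop()}-valent"

# Toy execution tree: from A the adversary can still force either value,
# so A is bivalent; B can only ever lead to 0, so B is 0-valent.
transitions = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
decided = {"D": 0, "E": 1}
print(valency("A", transitions, decided))  # bivalent
print(valency("B", transitions, decided))  # 0-valent
```

The "allowable failures" caveat above means the real transition relation must include the adversary's option of killing one processor, not just ordinary message deliveries.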
System Configuration
- Processors have internal state
- State of network is the set of messages
sent, but not yet received
- Event e is the receipt of message m by a processor
– Which can lead to sending one or more new messages
– Events are deterministic
- A schedule is a sequence of events
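The definitions above can be rendered directly in code. This is an illustrative Python sketch of configurations, events, and schedules, with made-up names; it models the system, it is not an implementation of any real protocol.

```python
from collections import deque

class Configuration:
    """Processor internal states plus messages sent but not yet received."""
    def __init__(self, proc_states):
        self.proc_states = dict(proc_states)   # proc id -> internal state
        self.in_flight = deque()               # the network's state

    def send(self, dest, msg):
        self.in_flight.append((dest, msg))

def apply_event(config, step):
    """An event is the receipt of one in-flight message by a processor.

    `step` maps (local state, message) -> (new local state, list of
    (dest, msg) pairs to send); events are deterministic.
    """
    dest, msg = config.in_flight.popleft()
    new_state, outgoing = step(config.proc_states[dest], msg)
    config.proc_states[dest] = new_state
    for d, m in outgoing:
        config.send(d, m)

def run_schedule(config, step, n_events):
    # A schedule is a sequence of events applied to a configuration
    for _ in range(n_events):
        apply_event(config, step)

# Example step function: each processor just logs what it receives
step = lambda state, msg: (state + [msg], [])
c = Configuration({"P1": [], "P2": []})
c.send("P2", "hello")
run_schedule(c, step, 1)
print(c.proc_states["P2"])  # ['hello']
```

In the FLP argument, the adversary's power is exactly the freedom in `apply_event`: it chooses which in-flight message is delivered next.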
Proving the Result
- Let’s assume the result is false
– That we can reach agreement with one failure in these conditions
- Use an adversarial model
– Within rules of behavior, assume adversary can force any legal event
- Look for contradictions
What Can the Adversary Do?
- Force any processor to perform an
event at any moment
- Choose any message to be delivered to
any processor when it requests a message
- Delay any message arbitrarily long
- Once, it can kill one processor
permanently
The Necessity of Bivalency
- There has to be an initial bivalent
configuration for the system
- Why?
- If all processors started with value 1,
the system would decide 1
- If all processors started with value 0,
the system would decide 0
Intermediate Initial States
- If some processors start with value 0 and some with value 1
– Some initial states lead to result 1
– Some initial states lead to result 0
– All initial states lead to one or the other
- So there is a 1-valent initial state that differs from a 0-valent initial state by one processor's initial value
A Graphical Representation
(Diagram) Two initial states:
State x, a 0-valent initial state: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 1
State y, a 1-valent initial state: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 0
They differ in only one value: Node N's
Why Does This Imply Bivalence?
- What if that one differing processor is
the processor that fails?
- The system must still reach agreement
from the remaining states – Which are identical, now
- But on what value?
Is This Possible?
(Diagram) State x, a 0-valent initial state: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 1
State y, a 1-valent initial state: Node 1: 0, Node 2: 1, Node 3: 1, ..., Node N: 0
- Does the system decide on 0? Then State y wasn't 1-valent, after all
- Does the system decide on 1? Then State x wasn't 0-valent, after all
- Looks like x and y must be bivalent
So What?
- So there has to be at least one bivalent
initial state
- Why’s that so bad?
- If the system never leaves a bivalent
state, it never makes a decision
- We must show our adversary can’t
perpetually force bivalency
The Persistence of Bivalency
- Let’s assume bivalency doesn’t persist
- At some point, some bivalent state must transition to a univalent state
– Implying at least two events
- One to go to a 0-valent state
- One to go to a 1-valent state
- With no events leading to bivalent states
A Graphical Representation
(Diagram) Bivalent configuration C; event e (delivery of message m) leads to 0-valent configuration D; event e' (delivery of message m') leads to 1-valent configuration D'
Remember, these events are each delivery of a message
So m and m' must have been in the message delivery system state simultaneously
Looking Closely at Events e and e’
- What would happen if we executed e
first, then e’?
- What would happen if we executed
them in the opposite order?
- Well, why should I care?
- Would executing them in either order
lead to the same state?
- If so, there’s a contradiction
Order of Events e and e’
(Diagram) From C, one path applies e to reach D and then applies e'; the other path applies e' to reach D' and then applies e
Why Should They Lead to the Same State?
- What if e and e’ occur on different
processors?
- Then they’re independent events
- So they should produce the same result
if executed in either order
- So e and e’ could not have occurred on
different processors
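The commutation claim can be checked mechanically in a toy model where each event changes only one processor's local state. A Python sketch; the names and the toy events are my own, purely for illustration.

```python
def apply_event(state, event):
    """Apply an event that touches exactly one processor's local state.

    state: proc id -> local value; event: (proc id, function on local value)
    """
    proc, fn = event
    new = dict(state)          # leave the input state unchanged
    new[proc] = fn(new[proc])
    return new

def commute(state, e, e_prime):
    """Do (e then e') and (e' then e) reach the same global state?"""
    one = apply_event(apply_event(state, e), e_prime)
    other = apply_event(apply_event(state, e_prime), e)
    return one == other

state = {"P1": 0, "P2": 0}
e = ("P1", lambda v: v + 1)        # event at P1
e_prime = ("P2", lambda v: v + 5)  # event at P2: a different processor
print(commute(state, e, e_prime))  # True: independent events commute
```

By contrast, two events at the same processor need not commute (e.g. "add 1" and "double" at P1), which is exactly why the proof must treat the same-processor case separately.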
Could the Events Occur on the Same Processor P?
- If e was first, the state became 0-valent
- If e’ was first, the state became 1-
valent
- But what if P then fails?
- Since the event happened only at P, only P sees the effects
- So we’re still in a bivalent state
Recapitulating the Argument
- It’s possible to start in a bivalent state
- There must be some point at some
processor P at which the bivalent state changes to univalent
- If P fails before anyone knows the valency, the system remains bivalent
– And can never settle to univalency
- Perpetual bivalency implies no
agreement
When Is Agreement Possible?
- Didn’t we show in the last class that we can reach agreement if fewer than 1/3 of our processors are faulty?
- Yes, but only if the message passing
system is synchronous
- Whether agreement is possible in a
system depends on certain parameters
Parameters for Agreement In Distributed Systems
- Synchronous vs. asynchronous
processors
- Bounded vs. unbounded
communications delay
- Ordered vs. unordered messages
- Point-to-point vs. broadcast
communications
Synchronous vs. Asynchronous Processors
- Synchronous processors imply that all
processors make progress predictably
- More precisely, there is a constant s such that
– For every s+1 steps taken by Pi
– All Pj will take at least one step
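The condition can be checked on a finite schedule of steps. Below is a Python sketch, assuming a schedule is just the list of processor ids in the order they take steps; the function name and representation are hypothetical.

```python
def is_s_synchronous(schedule, procs, s):
    """Check the synchrony condition on a finite schedule.

    Holds iff no processor ever takes s+1 steps while some other
    processor takes none.
    """
    for j in procs:
        # steps each processor has taken since j's most recent step
        since_j = {i: 0 for i in procs}
        for p in schedule:
            if p == j:
                since_j = {i: 0 for i in procs}  # j stepped: reset counters
            else:
                since_j[p] += 1
                if since_j[p] > s:  # p took s+1 steps with no step by j
                    return False
    return True

print(is_s_synchronous(["A", "B", "A", "B"], ["A", "B"], 1))  # True
print(is_s_synchronous(["A", "A", "B"], ["A", "B"], 1))       # False
```

An asynchronous system is one where no such constant s exists: a processor may take arbitrarily many steps while another takes none.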
Bounded vs. Unbounded Communications Delay
- Delay is bounded if and only if all
messages arrive at their destination within t steps – Implies no lost messages
- Doesn’t imply messages arrive in the order sent
Ordered vs. Unordered Messages
- Messages are ordered if they are
received in the same real time order as their sending – Using true real time
- In some cases, merely receiving all
messages in same order at all processors is enough
Point-to-Point vs. Broadcast Communications
- Point-to-point communications means
a given message sent by Pi is seen only by its destination Pj
- Broadcast communications mean that
Pi can send a message to all other processors in a single atomic step
- Most typically by hardware broadcast
So, When Can We Reach Agreement?
- Case 1: Processors are synchronous
and communications is bounded
- Case 2: Messages are ordered and the
transmission medium is broadcast
- Case 3: Processors are synchronous
and messages are ordered
- And that’s it
– (Case 1 covers Byzantine agreement)
What Does This Result Mean?
- For practical systems we really build
- Not that we can never reach agreement
– Good systems almost always do
- But that we generally can’t guarantee it
- Which implies that our systems should tolerate disagreements
– At some times
– Under some conditions
When Is Disagreement OK?
- For preference, when it doesn’t matter
– E.g., when reasonable results are possible even without agreement
- Or when it eventually works itself out
– With possible inconsistencies in the meantime
- Or, at worst, when it is visible to
people who can fix it
When Is Disagreement Not OK?
- When the consequences of
disagreement are dire
- When it results in unfixable problems
- When its consequences are invisible,
but relevant
- Unfortunately, we don’t always get to
choose when we can avoid it
Minimizing Chances of Disagreement
- Understand when agreement is most
critical
- In those cases, use protocols that are less likely to fail to reach agreement
– Which usually have heavy expenses
– So don’t always use them
A Classification of Faults
- More detailed than previously
discussed
- Produced by fault-tolerant computing
community
- Divides faults into classes
– Stronger class is subset of weaker class
An Ordered Fault Classification
Byzantine ⊃ Authenticated Byzantine ⊃ Incorrect Computation ⊃ Timing ⊃ Omission ⊃ Crash ⊃ Fail Stop
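The subset ordering can be modeled with an ordered enumeration, so that tolerating a fault class implies tolerating every class contained within it. An illustrative Python sketch; the enum and function names are my own.

```python
from enum import IntEnum

class FaultClass(IntEnum):
    """Ordered fault classes: a larger value names a more general class
    that contains every class below it as a subset."""
    FAIL_STOP = 1
    CRASH = 2
    OMISSION = 3
    TIMING = 4
    INCORRECT_COMPUTATION = 5
    AUTHENTICATED_BYZANTINE = 6
    BYZANTINE = 7

def tolerates(handled, fault):
    """A protocol that handles class `handled` also handles any subset class."""
    return fault <= handled

print(tolerates(FaultClass.BYZANTINE, FaultClass.CRASH))  # True
print(tolerates(FaultClass.CRASH, FaultClass.TIMING))     # False
```

This is why tolerating Byzantine faults is the gold standard: it subsumes every other class, at correspondingly high cost.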
Fail Stop Faults
- A processor ceases operation
- But informs other processors in
computation that it has stopped
- Relatively easy to deal with
Crash Fault
- A processor crashes or loses internal
state and halts
- Without notification to anyone else
- Hard to distinguish from a really slow
processor
Omission Faults
- A processor fails to do something in
time – Like respond to a message
- But otherwise it may still be operating
correctly – Or it may have crashed
Timing Fault
- A processor completes a task before or
after the window when it should – Or never
- E.g., a late acknowledgement to a message
Incorrect Computation Fault
- A processor fails to produce the correct
results for a given set of inputs
- Which could be merely not producing
the results soon enough
- Or could be sending back trash
Authenticated Byzantine Fault
- Processor performs an arbitrary or
malicious fault
- But authentication mechanisms note
any alterations made to others’ messages
Byzantine Fault
- Any and every fault
- Having arbitrarily bad consequences
- Possibly working in combination with other faults to produce really bad results
- In this classification, all other fault classes are subsets of the Byzantine class