Fault Tolerance and Inconsistent Information in Distributed Systems
Dr Vladimir Z. Tosic
Term 2 2020
Complete your myExperience and shape the future of education at UNSW.
Click the link in Moodle or log in to myExperience.unsw.edu.au (use z1234567@ad.unsw.edu.au to log in).
The survey is confidential; your identity will never be released.
Survey results are not released to teaching staff until after your results are published.
MAIN TOPICS IN THE LAST LECTURE… (NOT IN THE BEN-ARI TEXTBOOK!)
- Ricart-Agrawala algorithm demo in DAJ
- Revision of message-passing using CSP channels
- The actor model for message-passing concurrency
- Brief overview of some other distributed message-passing and distributed shared memory paradigms
- Notes on some other concepts for programming asynchronous distributed systems
MAIN TOPICS IN THIS LECTURE… (BEN-ARI TEXTBOOK CHAPTER 12)
- Fault tolerance and inconsistent information in distributed systems – the problem of consensus
- Byzantine Generals algorithm explanation
- Byzantine Generals algorithm examples and demo in DAJ
- King algorithm explanation and examples
CONSENSUS – INTRODUCTION
From Chapter 12 in Ben-Ari’s textbook and additional materials
FAIL-SAFE AND FAULT-TOLERANT DISTRIBUTED SYSTEM – DEFINITIONS
- Reliability has several meanings
- We focus on 2 aspects:
- 1. Fail-safe – 1 or more failures do not damage the system or its users
- 2. Fault-tolerant – the system continues to fulfill its requirements even when 1 or more failures happen
- E.g. the Ricart-Agrawala algorithm for distributed mutual exclusion is NOT fault-tolerant because it will deadlock when 1 node fails
CRASH FAILURES VS. BYZANTINE FAILURES – DEFINITIONS
- Crash failure – failed node(s) stop sending messages
- Assume we know a node crashed (e.g. timeout occurred)
- Byzantine failure – failed/malfunctioning node(s) can send arbitrary messages, possibly even according to a malicious plan
- We must account for the worst possible scenario, i.e. the biggest negative impact of messages by this failed node
- Name comes from the Byzantine Empire (Eastern Roman Empire, 395–1453) that had many civil wars and treasons
EXAMPLE ARCHITECTURE OF A RELIABLE DS USING REPLICATION
- Many complications are possible
- E.g. different sensors give somewhat different data
- E.g. all CPUs run software with the same bug
- No absolute reliability in DS, always some limits
REPLICATION, PARTITIONING, REDUNDANCY – DEFINITIONS
- Replication – multiple nodes doing the same work
- Apart from replication, there are other ways to achieve reliability
- Notably: partitioning (each node does an independent subset of processing) with redundancy (additional information that enables discovering and fixing some failures)
- E.g. parity/CRC in RAID
- Many uses of these methods, e.g. in cloud computing
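Redundancy by parity can be illustrated with a minimal sketch. It assumes equally sized data blocks and a single XOR parity block, in the spirit of RAID parity; the block contents and the helper name `xor_parity` are made up for illustration.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"node", b"fail", b"safe"]   # hypothetical data blocks on three disks
parity = xor_parity(data)            # redundant information kept on a fourth disk

# The disk holding data[1] is lost: XOR of the survivors and the parity rebuilds it.
recovered = xor_parity([data[0], data[2], parity])
print(recovered)
```

One parity block lets the system discover and repair the loss of any single block, but not of two blocks at once.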
BROADER CONTEXT – AUTONOMIC / SELF-MANAGING SYSTEMS
- Automating work of network/system administrators
- Self-management: self-healing, self-adaptation, …
- Autonomic computing – an IBM self-management initiative
- Analogy with the human autonomic nervous system
- Use of various artificial intelligence (AI) techniques to make decisions using incomplete or inconsistent information
- Not only technical, but also business information (e.g. costs and benefits of various options)
CONSENSUS – PROBLEM DESCRIPTION
- Each node chooses an initial value
- E.g. result of measurement or computation
- It is required that all nodes in the system decide to use the same value – 1 of the initial choices of these nodes
- If no faults: each node sends its choice to every other node and then a decision is made using some algorithm (e.g. majority voting) to obtain the consensus value
- All nodes have the same data and run the same decision algorithm, so they all decide upon the same value
- If there are faults: … [to be discussed in this lesson!]
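The fault-free case can be sketched in a few lines. This is an illustration only: the node names, the plan values "A"/"R" and the tie rule (fall back to a default value) are assumptions, not part of the problem statement.

```python
from collections import Counter

def majority(values, default="R"):
    """Most common value; a tie falls back to `default` (an assumption here)."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default
    return counts[0][0]

initial = {"n1": "A", "n2": "R", "n3": "A"}     # hypothetical initial choices

# No faults: every node receives every choice, so all views are identical.
views = {node: list(initial.values()) for node in initial}
decisions = {node: majority(view) for node, view in views.items()}
print(decisions)
```

Because the input data and the decision function are the same at every node, the decisions cannot differ; faults break exactly the "same data" assumption.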
(CONSENSUS EXAMPLE) COMMITMENT – PROBLEM DESCRIPTION (1/2)
- n agents collaborate on a database transaction
- Each of the agents has done its share of the transaction
- They want to come to an agreement on whether to commit the transaction results for later use by other transactions
- Each agent formed an initial vote but has not yet made the final decision
- All that remains to be done is to ensure that no two agents make different decisions
(CONSENSUS EXAMPLE) COMMITMENT – PROBLEM DESCRIPTION (2/2)
- All agents that reach a decision reach the same one
- If there are no failures and all agents voted to commit, then the decision reached is to commit
- If an agent decides to commit, this means that all agents voted to commit
- Failure model: Only agents can fail, and if they do then they crash
(CONSENSUS EXAMPLE) COMMITMENT SOLUTION – 2-PHASE COMMIT
- A distinguished agent, e.g. #1, collects the other agents' votes
- If all votes (incl. #1’s) are “commit” then #1 tells all other agents to commit
- Otherwise (if any agent voted “abort” or any agent did not send its vote, e.g. it crashed), #1 tells all other agents to abort
- “All or nothing”
- 2-Phase Commit solves the commitment problem but may fail to terminate if processes fail
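The decision rule of the distinguished agent can be sketched as a single function. The function name and the dictionary representation of votes are hypothetical, and a missing vote is assumed to have been detected by a timeout. The sketch does not model a crash of the distinguished agent itself, which is the case where 2-Phase Commit may fail to terminate.

```python
def coordinator_decision(votes, n):
    """Decision of agent #1 for n agents.

    votes maps agent id -> "commit" or "abort" (agent #1's own vote included).
    An agent missing from `votes` is assumed to have crashed (timeout).
    """
    everyone_voted = len(votes) == n
    if everyone_voted and all(v == "commit" for v in votes.values()):
        return "commit"
    return "abort"          # "all or nothing"

print(coordinator_decision({1: "commit", 2: "commit", 3: "commit"}, n=3))
print(coordinator_decision({1: "commit", 3: "commit"}, n=3))   # agent 2 never voted
```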
CONSENSUS – THE NEED FOR SYNCHRONY
- Theorem: It is impossible for a set of processes in an asynchronous distributed system to agree on a binary value, even if only a single process is subject to an unannounced crash
- Proof by contradiction (sketch): Assume correct decisions are made by an algorithm; its result depends on some process – but if this process crashes then the other processes must choose arbitrarily and sometimes will make wrong decisions; contradiction with the assumption ∎
- Conclusion: Some synchrony is needed to reach consensus in presence of faults; it also helps tolerate some Byzantine failures
THE BYZANTINE GENERALS ALGORITHM
From Chapter 12 in Ben-Ari’s textbook
(CONSENSUS EXAMPLE) BYZANTINE GENERALS - PROBLEM DESCRIPT. (1/2)
- Several Byzantine generals (each with own army) decide whether to attack some enemy or to retreat (to avoid defeat)
- To win, they must ALL attack together; if they do not attack all together, they will be defeated
- There are reliable messengers delivering messages between the generals
- Some of the generals might be traitors working towards defeat
- Devise an algorithm so all loyal generals come to a consensus plan based on majority vote of initial choices and, if tied, choose retreat
(CONSENSUS EXAMPLE) BYZANTINE GENERALS - PROBLEM DESCRIPT. (2/2)
- Analogy with real-life distributed systems:
- General – potentially failed/malfunctioning node
- Traitor – failed/malfunctioning node
- Messenger – reliable communications channel
- BG algorithm executing concurrently with underlying computation
- Messages of BG algorithm disjoint from computation messages
- Messages of BG algorithm are synchronous: request with reply
- In send/receive statements message types are omitted
BYZANTINE GENERALS ALG. 1-ROUND VERSION - PSEUDOCODE
- Note: planType = {A; R} for attack and retreat
BYZANTINE GENERALS ALG. 1-ROUND VERSION – ERROR SCENARIO
- Zoe (attack) and Leo (retreat) are loyal, Basil (attack) is traitor
- Basil sends A to Leo, but then crashes before sending to Zoe
- Leo chooses A, Zoe chooses R – no consensus by loyal generals
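This scenario can be replayed in a few lines. The sketch hard-codes which first-round messages arrive; the dictionary layout is an assumption for illustration, and ties resolve to retreat as stated in the problem description.

```python
def majority(plans):
    """Majority of 'A'/'R' votes; a tie means retreat."""
    return "A" if plans.count("A") > plans.count("R") else "R"

own = {"Zoe": "A", "Leo": "R", "Basil": "A"}

# delivered[receiver] = plans that actually arrived in the single round
delivered = {
    "Zoe": {"Leo": "R"},                 # Basil crashed before sending to Zoe
    "Leo": {"Zoe": "A", "Basil": "A"},
}

decision = {g: majority([own[g], *msgs.values()]) for g, msgs in delivered.items()}
print(decision)
```

Zoe votes over {A, R} and gets a tie, Leo votes over {R, A, A}, so the two loyal generals decide differently.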
BYZANTINE GENERALS ALG. 1-ROUND VERSION – ERROR DISCUSSION
- 1-Round Algorithm cannot tolerate 1 crash failure among 3 generals
- Because not using the fact that certain generals are loyal
- This scenario can be extended to an arbitrary number of generals
- Even if just a few generals crash, they can prevent consensus if the vote is very close in the 1-Round Algorithm
- Idea: Relay received messages in a further round
BYZANTINE GENERALS ALGORITHM – MAIN IDEAS AND DATA STRUCTURES
- First round: Each general sends own plan to all other generals and receives plans from them
- After it, array plan holds plans of all generals
- Subsequent round(s): Each general sends all other generals what it received from other generals about their plans and receives such reports from the other generals
- Loyal generals always relay what they have received
- Matrix cell reportedPlan[G,G’] stores plan that G reported receiving from G’
BYZANTINE GENERALS ALG. 2-ROUND VERSION – PSEUDOCODE (1/2)
// not sending back to G what G reported
// send to G’ what G reported to you
BYZANTINE GENERALS ALG. 2-ROUND VERSION – PSEUDOCODE (2/2)
- 1st voting stage: For each other general G, determine majority vote of: the plan received directly from G and what all other generals reported about the plan by G
- This “real intention” of G is stored into majorityPlan[G]
- majorityPlan[myID] stores plan by this node
- 2nd voting stage: Final decision is another majority vote of values in majorityPlan
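The two voting stages can be sketched for a single node as follows. This is not the textbook pseudocode: it assumes both message rounds have already filled `plan` and `reported_plan`, and the nested-dictionary layout and the sample plans (all three generals loyal) are hypothetical.

```python
def majority(plans):
    """Majority of 'A'/'R' votes; a tie means retreat."""
    return "A" if plans.count("A") > plans.count("R") else "R"

def final_plan(my_id, plan, reported_plan):
    """plan[G]: plan received directly from G (plan[my_id] is this node's own).
    reported_plan[G][G2]: the plan that G reported receiving from G2."""
    majority_plan = {my_id: plan[my_id]}
    for g in plan:
        if g == my_id:
            continue
        # 1st voting stage: direct message from g plus the others' reports about g
        votes = [plan[g]] + [reported_plan[h][g] for h in plan if h not in (my_id, g)]
        majority_plan[g] = majority(votes)
    # 2nd voting stage: majority over the inferred plans of all generals
    return majority(list(majority_plan.values()))

# Zoe's local data when Zoe, Leo and Basil are all loyal (hypothetical plans)
plan = {"Zoe": "A", "Leo": "R", "Basil": "A"}
reported_plan = {"Leo": {"Basil": "A"}, "Basil": {"Leo": "R"}}
print(final_plan("Zoe", plan, reported_plan))
```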
BYZANTINE GENERALS ALGORITHM – COMPLEXITY
- For every additional traitor, additional round of messages
- For t traitors, at least 3t+1 generals needed in total
- Total number of messages sent by n generals: $n\left[(n-1) + \sum_{k=1}^{t}(n-k)(n-k-1)\right] = O(n^4)$
- Byzantine Generals algorithm quickly becomes impractical as the number of traitors increases
BYZANTINE GENERALS ALGORITHM EXAMPLES AND CORRECTNESS
From Chapter 12 in Ben-Ari’s textbook
1ST CRASH FAILURE SCENARIO – DATA STRUCTURES OF LOYAL GENERALS
- Basil (traitor) sends A to Leo, but then crashes before sending to Zoe
- 2nd column in 1st round, 3rd-4th col. in 2nd round, 5th col. in voting
- Leo sends Basil’s A to Zoe, so majority vote is the same – consensus
2ND CRASH FAILURE SCENARIO – DATA STRUCTURES OF LOYAL GENERALS
- Basil (traitor) sends all 1st round messages and 1 message (Zoe’s plan) to Leo in 2nd round; crashes before sending (Leo’s plan) to Zoe
- Only 1 missing message, so majority vote is the same – consensus
BYZANTINE GENERALS ALGORITHM – (VIRTUAL) KNOWLEDGE TREES
- Knowledge tree (called “message tree” in DAJ) stores what is known about the general in this tree’s root
- Virtual global data structure (not stored locally in any node)
- Obtained by integrating local data structures
- Used to prove correctness of the Byzantine Generals algorithm by showing that in any scenario loyal generals come to the same conclusion about traitor’s plan
- Also used to prove that with 3 generals including 1 traitor with Byzantine failures consensus is impossible in any algorithm
- Optional task for you: Read this proof (textbook pages 280-281)
BYZANTINE GENERALS ALGORITHM – KNOWLEDGE TREE EXAMPLE 1
- All generals, including Basil, are loyal in this example
- Zoe and Leo have same information – consensus
- When Basil’s choice is R (instead of A) – analogous tree
[Figure: knowledge tree about Basil. Levels: what Basil chose; 1st round of messages – what Zoe and Leo know about Basil; 2nd round of messages]
BYZANTINE GENERALS ALGORITHM – KNOWLEDGE TREE EXAMPLE 2
- Basil chose X ∈ {A, R} and crashes right before sending 1st round message to Zoe but after sending it to Leo
- Leo reports Basil’s choice to Zoe, so Zoe and Leo have same information – consensus
BYZANTINE GENERALS ALGORITHM – KNOWLEDGE TREE EXAMPLE 3
- Basil chose X ∈ {A, R} and crashes before sending any message to Zoe or Leo
- Zoe and Leo do not know Basil’s choice, but correctly send their own plans to each other and decide using that
- Zoe and Leo have same information – consensus
BYZANTINE GENERALS ALGORITHM – KNOWLEDGE TREE EXAMPLE 4
- Basil crashes before sending 2nd round message to Zoe
- Zoe and Leo have same information (Leo knows his plan X and Zoe got X from Leo) – there is no impact on Zoe’s decision making about Leo (nor vice versa), so consensus
[Figure: knowledge tree about Leo. Levels: what Leo (!) chose, X ∈ {A, R}; 1st round of messages – what Zoe and Basil know about Leo; 2nd round of messages]
1ST BYZANTINE FAILURE SCENARIO – 1ST ROUND MESSAGES
- The advantage of BG algorithm is in handling Byzantine failures
- Traitor sends A or R regardless of its plan (so its plan is not shown)
- Example (above): traitor Basil sends different messages to Zoe and Leo in 1st round
- 1-round algorithm results in inconsistent decisions
2ND BYZANTINE FAILURE SCENARIO – DATA STRUCTURES OF LOYAL GENERALS
- Basil (traitor): sends A in 1st round to both Leo and Zoe; in 2nd round correctly reports to Zoe Leo’s plan R, but erroneously reports to Leo that Zoe’s plan is R; so Leo incorrectly decides R – inconsistency
2ND BYZANTINE FAILURE SCENARIO – KNOWLEDGE TREE ABOUT ZOE
- Leo has 2 conflicting reports about Zoe’s plan, so decides incorrectly
- Conclusion: The Byzantine Generals algorithm is incorrect for 3 generals including 1 traitor => additional loyal generals are needed
[Figure: knowledge tree about Zoe. Zoe chose A; Zoe sent A to Leo and Basil; Leo reports correctly, but traitor Basil reports to Leo falsely (R)]
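The 3-general failure can be checked by writing down what each loyal general holds after both rounds. The `view` dictionaries below are a hypothetical encoding: each entry lists the direct message followed by the single second-round report about that general, with Zoe's plan A, Leo's plan R, and traitor Basil behaving as described.

```python
def majority(plans):
    """Majority of 'A'/'R' votes; a tie means retreat."""
    return "A" if plans.count("A") > plans.count("R") else "R"

def decide(view):
    """Vote per general first, then vote over the per-general results."""
    return majority([majority(votes) for votes in view.values()])

leo_view = {
    "Leo":   ["R"],        # own plan
    "Zoe":   ["A", "R"],   # Zoe sent A; Basil falsely reports that Zoe said R
    "Basil": ["A", "A"],   # Basil sent A; Zoe relays that Basil sent her A
}
zoe_view = {
    "Zoe":   ["A"],        # own plan
    "Leo":   ["R", "R"],   # Leo sent R; Basil reports Leo's plan correctly
    "Basil": ["A", "A"],   # Basil sent A; Leo relays that Basil sent him A
}
print(decide(leo_view), decide(zoe_view))
```

With only one report per general, a single lie produces a tie that Leo cannot resolve correctly; a further loyal general would turn the tie into a 2-to-1 majority.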
3RD BYZANTINE FAILURE SCENARIO – DATA IN A LOYAL GENERAL (1/2)
- 3 loyal generals: Basil (chose A), John (A), Leo (R); 1 traitor: Zoe
- Table shows messages only from loyal generals, from traitor: ?
- Traitor’s messages cannot influence reaching consensus, but can sometimes influence the final decision (see next slide)
3RD BYZANTINE FAILURE SCENARIO – DATA IN A LOYAL GENERAL (2/2)
- Traitor Zoe sends in 1st round R to Basil and Leo, but A to John; in 2nd round, these are relayed correctly (R, A, R) by loyal generals so they make majority vote about Zoe’s plan; consensus decision is R
- If Zoe sends A (not R) to Basil, final decision would be different - A
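The 4-general scenario can be simulated to confirm both claims. The sketch below is sequential rather than message-passing, and it simplifies the traitor: Zoe's second-round reports all carry the same value `zoe_report`, whereas a real traitor could vary them per message. Function and constant names are hypothetical.

```python
def majority(plans):
    """Majority of 'A'/'R' votes; a tie means retreat."""
    return "A" if plans.count("A") > plans.count("R") else "R"

GENERALS = ["Basil", "John", "Leo", "Zoe"]
LOYAL_PLAN = {"Basil": "A", "John": "A", "Leo": "R"}
ZOE_ROUND1 = {"Basil": "R", "John": "A", "Leo": "R"}   # traitor's 1st round messages

def sent(sender, receiver, zoe_round1):
    """Plan that `sender` sent to `receiver` in the 1st round."""
    return zoe_round1[receiver] if sender == "Zoe" else LOYAL_PLAN[sender]

def decide(me, zoe_round1, zoe_report):
    """Final plan of loyal general `me` after two rounds and two voting stages."""
    majority_plan = [LOYAL_PLAN[me]]
    for g in GENERALS:
        if g == me:
            continue
        votes = [sent(g, me, zoe_round1)]              # direct message from g
        for reporter in GENERALS:
            if reporter in (me, g):
                continue
            # loyal generals relay faithfully, the traitor reports what she likes
            votes.append(zoe_report if reporter == "Zoe"
                         else sent(g, reporter, zoe_round1))
        majority_plan.append(majority(votes))
    return majority(majority_plan)

def run(zoe_round1, zoe_report):
    return {me: decide(me, zoe_round1, zoe_report) for me in LOYAL_PLAN}

print(run(ZOE_ROUND1, "A"))
print(run({"Basil": "A", "John": "A", "Leo": "R"}, "A"))   # Zoe sends A to Basil instead
```

In the first run all loyal generals decide R whatever Zoe reports in the second round; in the second run they all decide A, so the traitor shifts the outcome but cannot split the loyal generals.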
3RD BYZANTINE FAILURE SCENARIO – KNOWLEDGE TREE ABOUT LOYAL LEO
- Both Basil (dashed line) and John (thick line) get 2 correct reports that Leo chose X, so what Zoe sends cannot influence their votes
- With Byzantine failures knowledge trees can contain possibly untrue knowledge known as beliefs
3RD BYZANTINE FAILURE SCENARIO – KNOWLEDGE TREE ABOUT TRAITOR ZOE
- What Zoe sends is reported accurately by loyal generals in 2nd round, so they all come to the same conclusion about Zoe’s plan
- Zoe can influence final vote, but cannot cause defeat (no consensus)
DAJ DEMO OF THE BYZANTINE GENERALS ALGORITHM
From Chapter 12 and Appendix D.3 in Ben-Ari’s textbook and DAJ documentation
BYZANTINE GENERALS ALGORITHM DEMO IN BEN-ARI’S DAJ TOOL
Experiment with DAJ! See: https://github.com/motib/daj (URL), Textbook Appendix D.3
THE FLOODING ALGORITHM
From Chapter 12 in Ben-Ari’s textbook
THE FLOODING ALGORITHM – PSEUDOCODE
// guarantees that all loyal generals get reports from all other loyals
- Very simple algorithm for consensus when only crash failures occur – send reports in t+1 rounds when there are t traitors
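A possible simulation of flooding under crash failures is sketched below. It is not the textbook pseudocode: it assumes each general floods a set of (general, plan) pairs, that a crashing general reaches only some receivers in its crash round and is silent afterwards, and that the final decision is a majority with ties resolved to retreat. All names are hypothetical.

```python
def majority(plans):
    """Majority of 'A'/'R' votes; a tie means retreat."""
    return "A" if plans.count("A") > plans.count("R") else "R"

def flood(own, t, crash=None):
    """own: {general: plan}. crash: {general: (round, receivers reached that round)}.
    Returns the final plan of every general that did not crash."""
    crash = crash or {}
    known = {g: {(g, p)} for g, p in own.items()}    # start with own plan only
    for rnd in range(1, t + 2):                      # t + 1 rounds
        inbox = {g: set() for g in own}
        for sender in own:
            crash_round, reached = crash.get(sender, (None, ()))
            if crash_round is not None and rnd > crash_round:
                continue                             # already crashed: silent
            for receiver in own:
                if receiver == sender:
                    continue
                if rnd == crash_round and receiver not in reached:
                    continue                         # crashed in mid-round
                inbox[receiver] |= known[sender]
        for g in own:
            known[g] |= inbox[g]                     # merge after the round ends
    return {g: majority([p for _, p in known[g]]) for g in own if g not in crash}

own = {"Zoe": "A", "Leo": "R", "Basil": "A"}
# Basil reaches only Leo in round 1, then crashes
print(flood(own, t=1, crash={"Basil": (1, {"Leo"})}))
```

After the first round only Leo knows Basil's plan; the second round lets Leo pass it on to Zoe, which is why t+1 rounds are used for t crashes.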
THE FLOODING ALGORITHM – COMPLEXITY
- Total number of messages sent by n generals: $(t+1)\,n\,(n-1) = O(n^3)$
- Better than $O(n^4)$ for the Byzantine Generals algorithm
- E.g. for 4 generals including 1 traitor the flooding algorithm requires 24 messages, while BG algorithm requires 36
- However, the flooding algorithm does not handle Byzantine failures!
- Task for you: Read flooding algorithm examples and proofs from the textbook (pages 271-273)
THE KING ALGORITHM
From Chapter 12 in Ben-Ari’s textbook
THE KING ALGORITHM – MAIN IDEAS (1/2)
- The King algorithm uses significantly fewer messages than Byzantine Generals (BG) algorithm
- But, at the cost of an additional general per traitor, i.e. it requires 4t+1 generals in total
- 1 of the generals is current king whose vote is more important
- Assume: each node knows whether it is current king, but need not know identity of the king if it is another node
- 2 executions in which king must change according to some algorithm (e.g. myID order) – note different terminology from Ben-Ari’s textbook
- Each execution has 2 rounds similar to BG algorithm
THE KING ALGORITHM – MAIN IDEAS (2/2)
- King can fail – traitor king
- However, in 2 executions at least 1 king is loyal
- If traitor king in 1st execution, in 2nd execution a loyal king will cause the other loyal generals to come to consensus
- In 2nd round of each execution ONLY the king sends its plan
- If traitor king in 2nd execution, loyal generals already have overwhelming majority for consensus that cannot be influenced
- Each general stores the number of votes for majority in votesMajority and if it is > $n/2 + t$ then king’s plan is ignored
THE KING ALGORITHM – PSEUDOCODE (1/2)
// 2 executions of 2 rounds each
// t=1, so t+1 = 1+1 = 2 executions (each containing 2 rounds)
// 1st round of each execution
THE KING ALGORITHM – PSEUDOCODE (2/2)
// only the king sends in 2nd round
// ignore the king (but not its vote as general)
// king is the tiebreaker to achieve consensus
// 2nd round of each execution
// n=5 and t=1, so floor(n/2)+t = floor(5/2)+1 = 3
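The comments above can be turned into a small sequential simulation. This is a sketch, not the textbook pseudocode: the function names, the `lie` callback that models the traitor, and the initial plans of the five generals are assumptions made for illustration. The threshold floor(n/2)+t is the one given in the comments.

```python
def majority(plans):
    """Return (plan, votes) of the majority; a tie means retreat."""
    a, r = plans.count("A"), plans.count("R")
    return ("A", a) if a > r else ("R", r)

def king_algorithm(own, traitors, kings, lie):
    """own: {general: initial plan} for all generals (traitors' entries unused).
    kings: the king of each execution.
    lie(sender, receiver, execution): plan a traitor sends to `receiver`."""
    n, t = len(own), len(traitors)
    loyal = [g for g in own if g not in traitors]
    plan = {g: own[g] for g in loyal}
    for execution, king in enumerate(kings, start=1):
        # 1st round: every general sends its current plan to all others
        my_majority, votes = {}, {}
        for g in loyal:
            received = [plan[g]] + [
                lie(s, g, execution) if s in traitors else plan[s]
                for s in own if s != g]
            my_majority[g], votes[g] = majority(received)
        # 2nd round: only the king sends; it acts as the tiebreaker
        for g in loyal:
            if g == king or votes[g] > n // 2 + t:
                plan[g] = my_majority[g]             # king's message is ignored
            elif king in traitors:
                plan[g] = lie(king, g, execution)
            else:
                plan[g] = my_majority[king]
    return plan

own = {"Basil": "A", "John": "A", "Leo": "R", "Mike": "R", "Zoe": "R"}  # hypothetical

def lie(sender, receiver, execution):
    """Traitor Mike tries to keep the loyal generals split."""
    return "A" if receiver in ("Basil", "John") else "R"

print(king_algorithm(own, {"Mike"}, ["Zoe", "Mike"], lie))   # loyal king first
print(king_algorithm(own, {"Mike"}, ["Mike", "Zoe"], lie))   # traitor king first
```

With a loyal first king the generals agree after the first execution and then ignore the traitor king; with a traitor first king they stay split 2-2 until the loyal second king breaks the tie.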
1ST SCENARIO FOR KING ALGORITHM (1/3): 1E1R: MIKE TRAITOR, 1ST KING LOYAL ZOE
1ST SCENARIO FOR KING ALGORITHM (2/3): 1E2R: MIKE TRAITOR, 1ST KING LOYAL ZOE
1ST SCENARIO FOR KING ALGORITHM (3/3): 2E1R: MIKE TRAITOR, 2ND KING IGNORED
2ND SCENARIO FOR KING ALGORITHM (1/3): 1E2R: 1ST KING TRAITOR MIKE
2ND SCENARIO FOR KING ALGORITHM (2/3): 2E1R: TRAITOR MIKE, 2ND KING LOYAL ZOE
2ND SCENARIO FOR KING ALGORITHM (3/3): 2E2R: TRAITOR MIKE, 2ND KING LOYAL ZOE
THE KING ALGORITHM – PROOF OF CORRECTNESS (1/2)
- Lemma 12.1: If king is loyal, plan[myID] has equal value in all loyal generals after 2 rounds of the execution while this general is the king
- Proof: Case-by-case of plan[myID] in loyals when the execution starts
- 1. plan[myID] was equal in all loyal generals: myMajority equal, votesMajority > $n/2 + t$ (i.e. overwhelming), king ignored
- 2. plan[myID] split 3–1: myMajority equal in all generals, possibly only some majorities overwhelming (4-1 or 3-2 overall), but king’s plan equal to myMajority so it does not matter whether accepted or not
- 3. plan[myID] split 2–2: myMajority possibly different, but no majority overwhelming (3-2 or 2-3), so king’s plan accepted by all loyals ∎
THE KING ALGORITHM – PROOF OF CORRECTNESS (2/2)
- Theorem 12.2: The King algorithm achieves consensus for 4 loyal generals and 1 traitor
- Proof:
- a) If 2nd king loyal, then follows from Lemma 12.1
- b) If 1st king loyal, then by Lemma 12.1 plan[myID] has equal value in all loyal generals after 1st execution, so myMajority will be equal and overwhelming (as 4>3), 2nd king is ignored
- As finalPlan=plan[myID] consensus is achieved ∎
THE KING ALGORITHM – COMPLEXITY
- For t traitors, at least 4t+1 generals needed in total
- Total number of messages sent by n generals: $(t+1)(n+1)(n-1) = O(n^3)$
- Byzantine Generals: 3t+1 generals, $O(n^4)$ messages; King: 4t+1 generals, $O(n^3)$ messages
- King alg. is more practical as the number of failures increases
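The message totals of the three algorithms can be compared numerically. The sketch takes the totals to be n[(n-1) + Σ_{k=1..t} (n-k)(n-k-1)] for Byzantine Generals, (t+1)n(n-1) for flooding and (t+1)(n+1)(n-1) for King; this reading of the slide formulas agrees with the 24 and 36 messages quoted for 4 generals and 1 traitor. The function names are hypothetical.

```python
def bg_messages(n, t):
    """Byzantine Generals: n * [(n-1) + sum over k=1..t of (n-k)(n-k-1)]."""
    return n * ((n - 1) + sum((n - k) * (n - k - 1) for k in range(1, t + 1)))

def flooding_messages(n, t):
    """Flooding: (t+1) rounds, each general sends to the n-1 others."""
    return (t + 1) * n * (n - 1)

def king_messages(n, t):
    """King: (t+1)(n+1)(n-1)."""
    return (t + 1) * (n + 1) * (n - 1)

for t in range(1, 5):
    n_bg, n_king = 3 * t + 1, 4 * t + 1      # minimum generals for t traitors
    print(f"t={t}: BG n={n_bg} -> {bg_messages(n_bg, t)} messages, "
          f"King n={n_king} -> {king_messages(n_king, t)} messages")
```

For t=1 the King algorithm sends more messages (48 against 36) because it needs a fifth general; from t=2 onwards (240 against 392) it sends fewer, and the gap widens with every additional traitor.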
NEXT TIME… (PREVIEW HIGHLIGHTS)
From Chapter 11 in Ben-Ari’s Textbook
MAIN TOPICS IN THE NEXT LECTURE… (BEN-ARI TEXTBOOK CHAPTER 11)
- Global properties in a distributed system – the problem of consistency
- Distributed termination using the Dijkstra-Scholten and credit recovery algorithms
- Global snapshots and the Chandy-Lamport algorithm
- (Briefly; not in our textbook) Parallel programming in scientific computing and the Gravitational N-Body Problem