Verteilte Systeme (Distributed Systems)
Karl M. Göschka Karl.Goeschka@tuwien.ac.at
http://www.infosys.tuwien.ac.at/teaching/courses/VerteilteSysteme/
Dependability and fault tolerance: taxonomy, techniques and challenges
Service failure of component A causes a permanent or transient fault in the system that contains A. It causes an external fault for component B, which receives service from A. This fault in B may be activated and lead to error propagation in B.
“In an extreme environment, following a plan produces the product you intended, just not the product you need.”
Figure: Application A and Application B
Failure modes of detecting mechanisms: false alarm / undetected
Relation between benefit and consequences
Silence is a special form of halt
Type of failure | Description
Crash failure | A server halts, but is working correctly until it halts
Omission failure | A server fails to respond to incoming requests
  Receive omission | A server fails to receive incoming messages
  Send omission | A server fails to send messages
Timing failure | A server's response lies outside the specified time interval
Response failure | The server's response is incorrect
  Value failure | The value of the response is wrong
  State transition failure | The server deviates from the correct flow of control
Inconsistent failure (Byzantine) | A server may produce two-faced responses at arbitrary times
Followed by corrective maintenance
Damage assessment
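Figure: three Byzantine generals, p1 (Commander), p2 and p3.
Left (p3 faulty): p1 sends 1:v to p2 and to p3; p2 relays 2:1:v to p3; p3 relays 3:1:u to p2.
Right (p1 faulty): p1 sends 1:w to p2 and 1:x to p3; p2 relays 2:1:w to p3; p3 relays 3:1:x to p2.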
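Figure: four Byzantine generals, p1 (Commander), p2, p3 and p4.
Left (p3 faulty): p1 sends 1:v to p2, p3 and p4; p2 relays 2:1:v and p4 relays 4:1:v; p3 relays 3:1:u to p2 and 3:1:w to p4.
Right (p1 faulty): p1 sends 1:u to p2, 1:w to p3 and 1:v to p4; p2 relays 2:1:u, p3 relays 3:1:w and p4 relays 4:1:v.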
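The message exchange among the generals can be replayed in a few lines of Python. This is only a sketch under stated assumptions: one relay round, a strict-majority decision rule with `None` as the "no decision" default, and a hypothetical `lies` map that describes what a faulty sender delivers. The names `om1`, `majority` and `lies` are illustrative and do not come from the slides.

```python
from collections import Counter

def majority(values):
    """Return the value held by a strict majority, else None (assumed default)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None

def om1(n, commander_value, lies=None):
    """One relay round among processes 1..n, with process 1 as commander.

    lies maps (sender, receiver) to the value a faulty sender delivers
    instead of the correct one. Returns {lieutenant: decided value}.
    """
    lies = lies or {}
    lieutenants = range(2, n + 1)
    # Round 1: the commander sends its value to every lieutenant.
    direct = {i: lies.get((1, i), commander_value) for i in lieutenants}
    decisions = {}
    for i in lieutenants:
        # Round 2: every other lieutenant j relays what it received ("j:1:x").
        relayed = [lies.get((j, i), direct[j]) for j in lieutenants if j != i]
        decisions[i] = majority([direct[i]] + relayed)
    return decisions

# Four generals, p3 faulty: the correct lieutenants p2 and p4 still decide v.
print(om1(4, "v", {(3, 2): "u", (3, 4): "w"}))
# Four generals, commander faulty: every lieutenant sees u, w, v and falls back to None.
print(om1(4, "v", {(1, 2): "u", (1, 3): "w", (1, 4): "v"}))
# Three generals, p3 faulty: p2 sees v and u and cannot form a majority.
print(om1(3, "v", {(3, 2): "u"}))
```

With four processes the correct lieutenants agree in both scenarios. With three processes, p2 receives two conflicting values and cannot tell whether the commander or p3 is lying, which is why one faulty process cannot be tolerated among three.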
The two-army problem:
1. Sparta and Carthage together can beat the bad guys, but not separately, so they have to decide to attack at exactly the same time.
2. Sparta's general sends a message to Carthage's general to attack at noon.
3. How does he know that Carthage's general agrees?
Client                  | Server
Reissue strategy        | Strategy M -> P        | Strategy P -> M
                        | MPC   MC(P)  C(MP)     | PMC   PC(M)  C(PM)
Always                  | DUP   OK     OK        | DUP   DUP    OK
Never                   | OK    ZERO   ZERO      | OK    OK     ZERO
Only when ACKed         | DUP   OK     ZERO      | DUP   OK     ZERO
Only when not ACKed     | OK    ZERO   OK        | OK    DUP    OK
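The entries of this table can be derived mechanically. The sketch below assumes the usual reading of the notation: M means the server sent its completion message (the ACK), P means the request was processed (for example, text printed), C is the crash, and events in parentheses never happen. It also assumes that a reissued request is processed exactly once without a second crash. The names `SCENARIOS`, `REISSUE` and `outcome` are illustrative.

```python
# Server events that completed before the crash, per scenario.
SCENARIOS = {
    "MPC": "MP", "MC(P)": "M", "C(MP)": "",   # strategy M -> P
    "PMC": "PM", "PC(M)": "P", "C(PM)": "",   # strategy P -> M
}

# Client reissue strategies: does the client resend, given whether it saw an ACK?
REISSUE = {
    "Always": lambda acked: True,
    "Never": lambda acked: False,
    "Only when ACKed": lambda acked: acked,
    "Only when not ACKed": lambda acked: not acked,
}

def outcome(strategy, scenario):
    """Return ZERO, OK or DUP: how often the request ends up processed."""
    done = SCENARIOS[scenario]
    processed = done.count("P")   # processed before the crash?
    acked = "M" in done           # completion message sent before the crash?
    if REISSUE[strategy](acked):
        processed += 1            # assumption: the reissued request succeeds
    return ("ZERO", "OK", "DUP")[processed]

def table():
    """Rebuild the whole table as {strategy: [outcome per scenario]}."""
    return {s: [outcome(s, sc) for sc in SCENARIOS] for s in REISSUE}

for strategy, row in table().items():
    print(f"{strategy:20}", *row)
```

No row consists only of OK, which is the point of the table: no combination of client and server strategy gives exactly-once behaviour for every crash position.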
NACK only
The first (multicast) retransmission request (after random delay) leads to the suppression of others
Retransmission (not necessarily by the original sender) is also multicast
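The suppression effect can be illustrated with a small model. This is a sketch under simplifying assumptions: every receiver that missed the message starts a random timer, a NACK reaches all other receivers after a fixed `latency`, and a receiver cancels its own NACK if another one arrives before its timer fires. The function names and parameters are hypothetical.

```python
import random

def nack_senders(timers, latency=0.0):
    """Return the receivers that actually multicast a NACK.

    timers maps receiver -> delay of its random timer. The earliest NACK
    arrives everywhere at first + latency; only timers that fire no later
    than that are not suppressed.
    """
    first = min(timers.values())
    return sorted(r for r, t in timers.items() if t <= first + latency)

def random_timers(receivers, max_delay=1.0, seed=None):
    """Draw one random timer per receiver (uniform in [0, max_delay])."""
    rng = random.Random(seed)
    return {r: rng.uniform(0.0, max_delay) for r in receivers}

timers = {"r1": 0.30, "r2": 0.10, "r3": 0.12}
print(nack_senders(timers))                # only the earliest receiver sends
print(nack_senders(timers, latency=0.05))  # r3 fires before r2's NACK arrives
```

In this model, a wider range of random delays relative to the network latency means fewer redundant NACKs, at the price of a longer wait before the retransmission is requested.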
Notice the consistent ordering of the totally ordered messages T1 and T2, the FIFO-related messages F1 and F2 and the causally related messages C1 and C3, and the otherwise arbitrary delivery ordering of messages.
Figure: timelines of processes P1, P2 and P3 over time, with messages T1, T2, F1, F2, F3, C1, C2 and C3.
Multicast                | Basic message ordering  | Total-ordered delivery?
Reliable multicast       | None                    | No
FIFO multicast           | FIFO-ordered delivery   | No
Causal multicast         | Causal-ordered delivery | No
Atomic multicast         | None                    | Yes
FIFO atomic multicast    | FIFO-ordered delivery   | Yes
Causal atomic multicast  | Causal-ordered delivery | Yes
Figure: hold-back queue. Incoming messages enter the hold-back queue; when delivery guarantees are met, they move to the delivery queue and are delivered for message processing.
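A hold-back queue can be sketched for the simplest guarantee, FIFO ordering. The sketch assumes that each sender numbers its messages 0, 1, 2, ... and that the receiver delivers a message only when all earlier messages from the same sender have been delivered. The class and attribute names are illustrative.

```python
from collections import defaultdict

class FifoReceiver:
    def __init__(self):
        self.expected = defaultdict(int)     # next sequence number per sender
        self.hold_back = defaultdict(dict)   # sender -> {seq: payload}
        self.delivered = []                  # delivery queue, in delivery order

    def receive(self, sender, seq, payload):
        """Accept an incoming message and deliver whatever has become deliverable."""
        if seq < self.expected[sender]:
            return                           # duplicate of a delivered message
        self.hold_back[sender][seq] = payload
        # Deliver as long as the next expected message is in the hold-back queue.
        while self.expected[sender] in self.hold_back[sender]:
            n = self.expected[sender]
            self.delivered.append(self.hold_back[sender].pop(n))
            self.expected[sender] = n + 1

r = FifoReceiver()
r.receive("P1", 1, "F2")   # arrives early, is held back
r.receive("P1", 0, "F1")   # releases F1 and then F2
print(r.delivered)
```

Other guarantees change only the delivery condition; causal ordering, for instance, would typically compare vector timestamps instead of per-sender counters.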
Figure: total ordering with agreed sequence numbers among P1, P2, P3 and P4. Step 1: message; step 2: proposed sequence number; step 3: agreed sequence number.
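The three steps (message, proposed sequence number, agreed sequence number) can be sketched as follows. This is a minimal model, assuming that each member proposes a number larger than any it has proposed or seen agreed, that the sender picks the largest proposal as the agreed number, and that ties are broken by process id. The `Member` class and its method names are hypothetical.

```python
class Member:
    def __init__(self, pid):
        self.pid = pid
        self.largest_agreed = 0     # largest agreed number seen so far
        self.largest_proposed = 0   # largest number this member has proposed
        self.hold_back = {}         # msg_id -> [(number, pid), deliverable?]
        self.delivered = []

    def propose(self, msg_id):
        """Step 2: answer a message with a proposed sequence number."""
        self.largest_proposed = max(self.largest_agreed, self.largest_proposed) + 1
        seq = (self.largest_proposed, self.pid)   # pid breaks ties
        self.hold_back[msg_id] = [seq, False]
        return seq

    def agree(self, msg_id, seq):
        """Step 3: record the agreed number and deliver from the queue head."""
        self.largest_agreed = max(self.largest_agreed, seq[0])
        self.hold_back[msg_id] = [seq, True]
        while self.hold_back:
            head = min(self.hold_back, key=lambda m: self.hold_back[m][0])
            if not self.hold_back[head][1]:
                break               # smallest message has no agreed number yet
            del self.hold_back[head]
            self.delivered.append(head)

a, b, c = Member(1), Member(2), Member(3)
# Messages m1 and m2 arrive in different orders at different members.
pa1, pa2 = a.propose("m1"), a.propose("m2")
pb2, pb1 = b.propose("m2"), b.propose("m1")
pc1, pc2 = c.propose("m1"), c.propose("m2")
s1, s2 = max(pa1, pb1, pc1), max(pa2, pb2, pc2)   # senders pick the maximum
a.agree("m2", s2); a.agree("m1", s1)              # agreements may arrive reordered
b.agree("m1", s1); b.agree("m2", s2)
c.agree("m1", s1); c.agree("m2", s2)
print(a.delivered, b.delivered, c.delivered)
```

A message stays in the hold-back queue while a message with a smaller number still lacks its agreed number, because the agreed number can only be equal to or larger than the local proposal.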
92
93
94
Closed group Open group
Figure: view-synchronous group communication with processes p, q and r. In each of the four cases, p crashes and the view changes from view (p, q, r) to view (q, r). Cases a and b are allowed; cases c and d are disallowed.