Model Checking of Fault-Tolerant Distributed Algorithms
Igor Konnov
joint work with Annu Gmeiner, Ulrich Schmid, Helmut Veith, Josef Widder
Distributed Systems
Are they always working?
No... some failing systems:
Therac-25 (1985)
radiation therapy machine gave massive overdoses, e.g., due to race conditions of concurrent tasks
Qantas Airbus in-flight upset near Learmonth (2008)
1 out of 3 replicated components failed; the computer initiated a dangerous altitude drop
Ariane 501 maiden flight (1996)
primary/backup, i.e., 2 replicated computers both run into the same integer overflow
Netflix outages due to Amazon’s cloud (ongoing)
hundreds of computers involved
Faults at design/implementation phase
approach: find and fix faults before operation ⇒ model checking
Faults at runtime
e.g., a crack in a diode in the data link interface of the Space Shuttle ⇒ led to erroneous messages being sent (image: Driscoll, Honeywell)
approach: keep system operational despite faults ⇒ fault-tolerant distributed algorithms
Goal: automatically verified fault-tolerant distributed algorithms (FTDAs), e.g., Paxos, Fast Byzantine Consensus, etc.
Model checking FTDAs is a research challenge:
computers run independently at different speeds
exchange messages with uncertain delays
faults
parameterization
. . .
fault-tolerance makes model checking harder
an alternative proof approach
useful counter-examples
ability to define and vary assumptions about the system and see why it breaks
closer to code level
good degree of automation
[Figure: a transition system with labeled states s0: {r}, s1: {y}, s2: {y}, s3: {r, y, g}, s4: {g}, and Linear Temporal Logic formulas of the form F(...) and G(...) illustrated on a path s′1, s′2, s′3, s′4]
unbounded data types
unbounded number of rounds (round numbers part of messages)
parameterization in multiple parameters
among n processes f ≤ t are faulty with n > 3t
contrast to concurrent programs
diverse fault models (adverse environments)
continuous time
fault-tolerant clock synchronization
degrees of concurrency: synchronous, asynchronous, partially synchronous
partially synchronous: a process makes at most 5 steps between 2 steps of another process
clean crashes: least severe
faulty processes prematurely halt after/before “send to all”
crash faults:
faulty processes prematurely halt (also) in the middle of “send to all”
omission faults:
faulty processes follow the algorithm, but some messages sent by them might be lost
symmetric faults:
faulty processes send arbitrarily to all or nobody
Byzantine faults: most severe
faulty processes can do anything
encompass all behaviors of above models
Translate pseudo-code to a formal description that allows us to verify the algorithm and does not oversimplify the original algorithm.
Assumptions about the communication medium are usually written in plain English, are spread across research papers, and constitute folklore knowledge.
receive messages
compute using messages and local variables (description in English with basic control flow if-then-else)
send messages
Parameterized model checking problem:
given a process template P(n, t, f),
resilience condition RC: n > 3t ∧ t ≥ f ≥ 0,
fairness constraints Φ, e.g., “all messages will be delivered”,
and an LTL-X formula ϕ,
show for all n, t, and f satisfying RC:
(P(n, t, f))^{n−f} + f faults ⊨ (Φ → ϕ)
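The quantification "for all n, t, and f satisfying RC" ranges over infinitely many instances. As a rough illustration only, the following Python sketch enumerates the instances up to a bound; the function names and the bound `max_n` are assumptions made for this example and are not part of the talk or its tool.

```python
from itertools import product


def satisfies_rc(n: int, t: int, f: int) -> bool:
    """Resilience condition RC from the slide: n > 3t and t >= f >= 0."""
    return n > 3 * t and t >= f >= 0


def admissible_parameters(max_n: int):
    """Yield every (n, t, f) with 1 <= n <= max_n that satisfies RC.

    The bound max_n is an assumption of this sketch; the real problem
    has no such bound, which is why it is a parameterized problem.
    """
    for n, t, f in product(range(1, max_n + 1), range(max_n + 1), range(max_n + 1)):
        if satisfies_rc(n, t, f):
            yield (n, t, f)


if __name__ == "__main__":
    for n, t, f in admissible_parameters(5):
        # each instance runs n - f correct copies of the template plus f faults
        print(f"n={n} t={t} f={f}: {n - f} correct processes + {f} faults")
```

Each printed line corresponds to one fixed-parameter model checking instance; the parameterized problem asks for all of them at once.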
Interplay of safety and liveness is a central challenge in DAs
achieving safety and liveness is non-trivial
asynchrony and faults lead to impossibility results [Fischer, Lynch, Paterson’85]
Rich literature to verify safety (e.g., in concurrent systems)
Distributed algorithms perspective:
“doing nothing is always safe”
“tools verify algorithms that actually might do nothing”
Verification efforts often have to simplify assumptions
faults, communication medium captured in English, algorithms written in pseudo-code.
safety and liveness
with unbounded integers, non-standard fairness constraints,
n processes communicate by messages
all processes know that at most t of them might be faulty
f are actually faulty
The core of the classic broadcast algorithm from the DA literature. It solves an agreement problem depending on the inputs v_i.
Variables of process i:
    v_i : {0, 1}, initially 0 or 1
    accept_i : {0, 1}, initially 0
An atomic step:
    if v_i = 1 then send (echo) to all;
    if received (echo) from at least t + 1 distinct processes and not sent (echo) before then send (echo) to all;
    if received (echo) from at least n − t distinct processes then accept_i := 1;
asynchronous
t Byzantine faults
correct if n > 3t
the code is parameterized in n and t ⇒ process template P(n, t, f)
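To see the two thresholds at work, here is a minimal Python sketch of the atomic step above. It is a strong simplification and not the model from the talk: it assumes one particular fair run in which every message reaches every correct process, so all correct processes see the same count, and it models Byzantine processes only by a number `byz_echoes` of faulty senders that echo to everyone. The function name and its parameters are hypothetical.

```python
def run_echo_core(n: int, t: int, inputs: list, byz_echoes: int = 0):
    """Run the echo rules for the correct processes until nothing changes.

    inputs     -- the values v_i of the n - f correct processes
    byz_echoes -- number of faulty processes that send (echo) to everyone
                  (assumption of this sketch: every message is delivered)
    Returns (number of correct processes that sent echo, accept flag).
    """
    sent = [v == 1 for v in inputs]          # if v_i = 1 then send (echo) to all
    while True:
        rcvd = sum(sent) + byz_echoes        # echoes from distinct processes
        if rcvd >= t + 1 and not all(sent):  # relay rule: at least t + 1 echoes
            sent = [True] * len(sent)
        else:
            break
    accept = rcvd >= n - t                   # accept rule: at least n - t echoes
    return sum(sent), accept


if __name__ == "__main__":
    # n = 4, t = 1, f = 1: one Byzantine echo alone cannot trigger anything
    print(run_echo_core(4, 1, [0, 0, 0], byz_echoes=1))
    # two correct processes start with v_i = 1: everybody relays and accepts
    print(run_echo_core(4, 1, [1, 1, 0]))
    # f = 2 > t = 1 violates the resilience condition: acceptance is forged
    print(run_echo_core(4, 1, [0, 0], byz_echoes=2))
```

The last call illustrates why the resilience condition matters: with more faulty processes than t, the t + 1 guard can be satisfied by faulty processes alone. A model checker explores all asynchronous interleavings rather than this single run.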
Standard construct: quantified guards (t = f = 0)
Existential guard: if received m from some process then ...
Universal guard: if received m from all processes then ...
What if faults might occur?
Fault-tolerant algorithms: n processes, at most t are Byzantine
Threshold guard: if received m from n − t processes then ...
(the processes cannot refer to f!)
Correct processes count incoming messages from distinct processes.
If a process has received a message from t + 1 distinct processes, at least one non-faulty process sent the message.
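The counting argument behind the two thresholds can be checked by brute force for small parameters. The sketch below is only an illustration under the resilience condition n > 3t ∧ t ≥ f ≥ 0; the function names and the bound are assumptions of this example.

```python
def min_correct_senders(k: int, f: int) -> int:
    """Fewest correct processes among k distinct senders when f are faulty."""
    return max(0, k - f)


def check_threshold_guards(max_n: int) -> bool:
    """Check both guard properties for all n <= max_n with n > 3t, t >= f >= 0."""
    for n in range(1, max_n + 1):
        for t in range(0, n):
            for f in range(0, t + 1):
                if not n > 3 * t:
                    continue
                # t + 1 messages always include one from a correct sender
                if min_correct_senders(t + 1, f) < 1:
                    return False
                # the n - f correct processes alone can reach the n - t guard
                if n - f < n - t:
                    return False
    return True


if __name__ == "__main__":
    print(check_threshold_guards(30))
    # with f = 2 > t = 1, the t + 1 = 2 messages may all come from faulty senders
    print(min_correct_senders(2, 2))
```

Both properties rely only on f ≤ t, which is why the guards are written in terms of t even though the processes cannot refer to f.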
As the distributed algorithms are given in pseudo-code, we have to decide how to encode in PROMELA:
send to all and receive
counting expressions “received <m> from n − t distinct processes”
faults
In what follows, we compare side-by-side two solutions:
Solution 1: a straightforward encoding with PROMELA channels and explicit representation of faulty processes.
Solution 2: an advanced encoding with shared variables and fault injection.
All our case studies are designed with the assumption of classic reliable asynchronous message passing as in [FLP85]:
non-blocking communication,
if a message can be received now, it may also be received later,
a process does not have to receive a message as soon as it is able to,
every sent message is eventually received, but there are no bounds on the delays.
[Figure: two plots over the number of processes N = 3..8: states (logscale) and memory in MB (logscale, ≤ 12 GB)]
Solution 1: channels + explicit Byzantine processes (blue)
Solution 2: shared variables + fault injection (red)
in the presence of one Byzantine faulty process (f = 1) (case f = 2 runs out of memory too fast)
We consider a number of threshold-based algorithms.
Our running example ST87 for
1. Byzantine faults (BYZ)
2. omission faults (OMIT)
3. symmetric faults (SYMM)
4. clean crashes (CLEAN)
5. Folklore reliable broadcast for clean crashes [Chandra & Toueg 96, CT96]
(to be continued)
more involved algorithms in the purely asynchronous setting:
6. Asynchronous Byzantine Agreement (Bracha & Toueg 85, BT85)
    Byzantine faults
    two phases and two message types
    five status values
    properties: unforgeability, correctness (liveness), agreement (liveness)
7. Condition-based Consensus (Mostéfaoui et al. 01, MRRR01)
    crash faults
    two phases and four message types
    nine status variables
    properties: validity, agreement, termination (liveness)
8. Fast Byzantine Consensus: common case (Martin, Alvisi 06, MA06)
    Byzantine faults
    the core part of the algorithm
    no cryptography
Algorithm Fault Parameters Resilience Properties Time
BYZ n = 7, t = 2, f = 2 n > 3t U, C, R 6 sec.
BYZ n = 7, t = 3, f = 2 n > 3t U, C, R 5 sec.
BYZ n = 7, t = 1, f = 2 n > 3t U, C, R 1 sec.
OMIT n = 5, t = 2, f = 2 n > 2t U, C, R 4 sec.
OMIT n = 5, t = 2, f = 3 n > 2t U, C, R 5 sec.
SYMM n = 5, t = 1, fp = 1, fs = 0 n > 2t U, C, R 1 sec.
SYMM n = 5, t = 2, fp = 3, fs = 1 n > 2t U, C, R 1 sec.
CLEAN n = 3, t = 2, fc = 2, fnc = 0 n > t U, C, R 1 sec.
CRASH n = 2 — U, C, R 1 sec.
BYZ n = 5, t = 1, f = 1 n > 3t R 131 sec.
BYZ n = 5, t = 1, f = 2 n > 3t R 1 sec.
BYZ n = 5, t = 2, f = 2 n > 3t R 1 sec.
CRASH n = 3, t = 1, f = 1 n > 2t V0, V1, A, T 1 sec.
CRASH n = 3, t = 1, f = 2 n > 2t V0, V1, A, T 1 sec.
BYZ p = 4,a = 6,l = 4, t = 1,f = 1 p > 3t, a > 5t, l > 3t CS1, CS3, CL1, CL2 3 hrs.
BYZ p = 4,a = 5,l = 4, t = 1, f = 1 p > 3t, a > 5t, l > 3t CS1, CS3, CL1, CL2 14 min.
BYZ p = 4,a = 6,l = 4, t = 1, f = 2 p > 3t, a > 5t, l > 3t CS1, CS3, CL1, CL2 2 sec.
We show how to model threshold-based fault-tolerant algorithms starting with an imprecise description [Spin’13]
We create PROMELA models using expert advice.
The tool demonstrates that the model behaves as predicted by theory (for fixed parameters)
This reference implementation allows us to optimize the encoding
... and to make the model amenable to parameterized verification
[Figure: control flow automaton of a process with locations qI, q0, ..., q8, qF and edges labeled sv = V1, ¬(sv = V1), inc nsnt, sv := SE, rcvd := z where (rcvd ≤ z ∧ z ≤ nsnt + f), t + 1 ≤ rcvd, ¬(t + 1 ≤ rcvd), sv = V0, ¬(sv = V0), n − t ≤ rcvd, ¬(n − t ≤ rcvd), sv := AC]
concrete values are not important, thresholds are essential
intervals with symbolic boundaries:
I_0 = [0, 1), I_1 = [1, t + 1), I_{t+1} = [t + 1, n − t), I_{n−t} = [n − t, ∞)
Parametric Interval Abstraction (PIA)
Similar to interval abstraction: [t + 1, n − t) rather than [4, 10).
Total order: 0 < 1 < t + 1 < n − t for all parameters satisfying RC: n > 3t, t ≥ f ≥ 0.
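The data abstraction can be pictured as a function that maps a concrete counter value to the parametric interval containing it. The following Python sketch is an illustration with hypothetical names (`abstract_value` and the string labels of the intervals); for fixed n and t it evaluates the symbolic boundaries, whereas the abstraction itself keeps them symbolic.

```python
def abstract_value(x: int, n: int, t: int) -> str:
    """Map a concrete value x >= 0 to the parametric interval that contains it.

    Intervals: I0 = [0, 1), I1 = [1, t + 1), It+1 = [t + 1, n - t),
    In-t = [n - t, infinity). The string labels are chosen for this sketch.
    """
    if x < 0:
        raise ValueError("counters are non-negative")
    if x < 1:
        return "I0"
    if x < t + 1:
        return "I1"
    if x < n - t:
        return "It+1"
    return "In-t"


if __name__ == "__main__":
    # n = 13, t = 3 gives the concrete interval [4, 10) mentioned on the slide
    for x in (0, 3, 4, 9, 10):
        print(x, abstract_value(x, 13, 3))
```

Different parameter values move the concrete boundaries, but the abstract domain always has the same four elements, which is what makes a single finite-state model possible.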
We have to reduce the verification of an infinite number of instances where
1. the process code is parameterized
2. the number of processes is parameterized
to one finite-state model checking instance.
We do that by:
1. PIA data abstraction
2. PIA counter abstraction
Abstraction is an over-approximation ⇒ possible abstract behavior that does not correspond to a concrete behavior.
3. Refining spurious counter-examples
[Figure: the abstraction pipeline]
Parameterized family, for all parameters with n > 3t, t ≥ f, f ≥ 0
⇒ EXTRACT: parametric interval domain
⇒ PARAMETRIC INTERVAL DATA ABSTRACTION
Uniform parameterized family M(p) = P̂ ∥ · · · ∥ P̂, for all parameters with n > 3t, t ≥ f, f ≥ 0
⇒ CHANGE REPRESENTATION
Counter representation
⇒ PARAMETRIC INTERVAL COUNTER ABSTRACTION
Finite-state system: simulates for every p the behavior of M(p)
Refinement loop: finite-state model checking, replay the counter-example, refine the system
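Counter representation for n = 6, t = 1, f = 1, so t + 1 = 2 and n − t = 5
[Figure: grid of counters, one per local state, with rows sent and accepted and columns indexed by received]
A local state is (sv, rcvd), where sv ∈ {sent, accepted} and 0 ≤ rcvd ≤ n
e.g., 3 processes at (sent, received=3), 1 process at (accepted, received=5)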
Counter abstraction for all parameters with n > 3 · t ∧ t ≥ f
Parametric intervals: I_0 = [0, 1), I_1 = [1, t + 1), I_{t+1} = [t + 1, n − t), I_{n−t} = [n − t, ∞)
[Figure: grid of abstract counters with rows sent and accepted and columns received ∈ I_1, I_{t+1}, I_{n−t}; when all correct processes accepted, all non-zero counters are in the highlighted area]
A local state is (sv, rcvd), where sv ∈ {sent, accepted} and rcvd ∈ {I_0, I_1, I_{t+1}, I_{n−t}}
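As an illustration of the counter representation, the Python sketch below takes the concrete example from the earlier slide (n = 6, t = 1, three processes at (sent, received=3) and one at (accepted, received=5)), abstracts each local state, counts the processes per abstract local state, and then abstracts the counters with the same intervals. All names are hypothetical, and this is only a sketch of the idea, not the tool's implementation.

```python
from collections import Counter


def interval(x: int, n: int, t: int) -> str:
    """Parametric interval of x: I0, I1, It+1, or In-t (labels of this sketch)."""
    if x < 1:
        return "I0"
    if x < t + 1:
        return "I1"
    if x < n - t:
        return "It+1"
    return "In-t"


def counter_abstraction(local_states, n: int, t: int) -> dict:
    """Abstract a global state given as a list of concrete (sv, rcvd) pairs.

    Step 1 (data abstraction): replace rcvd by its parametric interval.
    Step 2 (counter representation): count processes per abstract local state.
    Step 3 (counter abstraction): replace every count by its interval.
    Abstract local states that do not occur implicitly have counter I0.
    """
    abstract_locals = [(sv, interval(rcvd, n, t)) for sv, rcvd in local_states]
    counts = Counter(abstract_locals)
    return {state: interval(k, n, t) for state, k in counts.items()}


if __name__ == "__main__":
    concrete = [("sent", 3)] * 3 + [("accepted", 5)]
    print(counter_abstraction(concrete, n=6, t=1))
```

The result no longer mentions process identities or exact counts, so the same abstract state covers many concrete states for many parameter values.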
[Figure: two plots: time to check relay (sec, logscale) and memory to check relay (MB, logscale)]
Parameterized model checking performs well (the red line).
Experiments for fixed parameters quickly degrade (n = 9 runs out of memory).
We found counter-examples for the cases n = 3t and f > t, where the resilience condition is violated.
Algorithm | Fault | Resilience | Property | Valid? | #Refinements | Time
ST87 | BYZ | n > 3t | U | ✓ | - | 4 sec.
ST87 | BYZ | n > 3t | C | ✓ | 10 | 32 sec.
ST87 | BYZ | n > 3t | R | ✓ | 10 | 24 sec.
ST87 | SYMM | n > 2t | U | ✓ | - | 1 sec.
ST87 | SYMM | n > 2t | C | ✓ | 2 | 3 sec.
ST87 | SYMM | n > 2t | R | ✓ | 12 | 16 sec.
ST87 | OMIT | n > 2t | U | ✓ | - | 1 sec.
ST87 | OMIT | n > 2t | C | ✓ | 5 | 6 sec.
ST87 | OMIT | n > 2t | R | ✓ | 5 | 10 sec.
ST87 | CLEAN | n > t | U | ✓ | - | 2 sec.
ST87 | CLEAN | n > t | C | ✓ | 4 | 8 sec.
ST87 | CLEAN | n > t | R | ✓ | 13 | 31 sec.
CT96 | CLEAN | n > t | U | ✓ | - | 1 sec.
CT96 | CLEAN | n > t | A | ✓ | - | 1 sec.
CT96 | CLEAN | n > t | R | ✓ | - | 1 sec.
CT96 | CLEAN | n > t | C | ✗ | - | 1 sec.
Algorithm | Fault | Resilience | Property | Valid? | #Refinements | Time
ST87 | BYZ | n > 3t ∧ f ≤ t+1 | U | ✗ | 9 | 56 sec.
ST87 | BYZ | n > 3t ∧ f ≤ t+1 | C | ✗ | 11 | 52 sec.
ST87 | BYZ | n > 3t ∧ f ≤ t+1 | R | ✗ | 10 | 17 sec.
ST87 | BYZ | n ≥ 3t ∧ f ≤ t | U | ✓ | - | 5 sec.
ST87 | BYZ | n ≥ 3t ∧ f ≤ t | C | ✓ | 9 | 32 sec.
ST87 | BYZ | n ≥ 3t ∧ f ≤ t | R | ✗ | 30 | 78 sec.
ST87 | SYMM | n > 2t ∧ f ≤ t+1 | U | ✗ | - | 2 sec.
ST87 | SYMM | n > 2t ∧ f ≤ t+1 | C | ✗ | 2 | 4 sec.
ST87 | SYMM | n > 2t ∧ f ≤ t+1 | R | ✓ | 8 | 12 sec.
ST87 | OMIT | n ≥ 2t ∧ f ≤ t | U | ✓ | - | 1 sec.
ST87 | OMIT | n ≥ 2t ∧ f ≤ t | C | ✗ | - | 2 sec.
ST87 | OMIT | n ≥ 2t ∧ f ≤ t | R | ✗ | - | 2 sec.
partial orders: we need to check computations of bounded length
complete SAT-based model checking (safety) [CONCUR’14]
sort the transitions between the milestones:
[Figure: a schedule of transitions t1, t3, t2, t4, t5, t6 with labels true, true, x++, x++, "x ≥ n − f, y++", "y ≥ t"]
accelerate adjacent transitions of the same type:
[Figure: the accelerated schedule t′1, t′2, t′5, t′6 with labels true, x++, "x ≥ n − f, y++", "y ≥ t" and acceleration factors ×2, ×2, ×1]
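The acceleration step can be pictured as run-length grouping of a sorted schedule: adjacent transitions of the same type are merged into one transition with a factor. The Python sketch below is only an analogy under strong assumptions: the rule names and the `UPDATES` table are hypothetical, guards are ignored, and only counter increments are modeled.

```python
from itertools import groupby

# Hypothetical rules of this sketch: each rule increments some counters.
UPDATES = {"noop": {}, "inc_x": {"x": 1}, "inc_y": {"y": 1}}


def accelerate(schedule):
    """Merge adjacent transitions of the same type into (rule, factor) pairs."""
    return [(rule, len(list(run))) for rule, run in groupby(schedule)]


def run_schedule(schedule):
    """Execute the rules one by one, starting from x = y = 0."""
    state = {"x": 0, "y": 0}
    for rule in schedule:
        for var, delta in UPDATES[rule].items():
            state[var] += delta
    return state


def run_accelerated(accelerated):
    """Execute each accelerated transition in one step: delta times factor."""
    state = {"x": 0, "y": 0}
    for rule, factor in accelerated:
        for var, delta in UPDATES[rule].items():
            state[var] += delta * factor
    return state


if __name__ == "__main__":
    schedule = ["noop", "noop", "inc_x", "inc_x", "inc_y"]
    short = accelerate(schedule)
    print(short)
    print(run_schedule(schedule) == run_accelerated(short))
```

The accelerated schedule reaches the same counter values with fewer steps, which is the reason bounded-length computations suffice in this setting.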
encode representative executions in linear integer arithmetic (SMT) [submitted to CAV’15]
Now we can verify safety of:
Reliable broadcast (FRB, STRB, ABA)
Condition-based consensus (CBC)
One-step consensus (CF1S, C1CS, BOSCO)
“Liveness is whatever prevents an empty system from being correct.” (Orna Kupferman)
[Figure: number of checked benchmarks (5 to 25) against time to verify an instance in seconds (logscale, 10^0 to 10^5), comparing smt, sat:lingeling, sat:minisat, bdd, fast]
Standard model checking tools are not tuned to computational models
Computational primitives in FTDAs are simpler than the standard ones
Thinking in terms of parameterized systems helps to develop efficient techniques
[Timeline of the benchmarks by year: 85: ABA; 87: STRB; 96: FRB; 97: NBAC; 01: CBC, C1CS; 02: NBACG; 06: CF1S, FBC; 08: BOSCO]
[Figure: matrix of computational models. One axis (timing): discrete synchronous, discrete partially synchronous, discrete asynchronous, continuous synchronous, continuous partially synchronous. Other axis (messages): one instance/finite payload, many instances/finite payload, many instances/unbounded payload, messages with reals.]
core of {ST87, BT87, CT96}, CBC, CF1S, C1CS, BOSCO
Further entries of the matrix: DHM12, ST87, AK00, CT96 (failure detector), DLS86, MA06, L98 (Paxos), ST87, BT87, CT96, DAs with failure-detectors, DLPSW86, DFLPS13, WS07, ST87 (JACM), FSFK06, WS09; annotations: clock sync, broadcast
We implement the following loop:
receive messages
compute using messages and local variables (description in English with basic control flow if-then-else)
send messages
The three steps form one atomic block:
    /* shared state: a variable or a channel */
    active[N] proctype P() {    /* N = N(n, t, f) processes */
        /* local variable to count messages from distinct processes */
        int nrcvd;
        /* initialization */
    loop:
        atomic {
            /* 1. receive and count messages
               2. compute using nrcvd
               3. send messages */
        }
        goto loop;
    }
A straightforward encoding using message channels:
    /* message type */
    mtype = { ECHO };
    /* point-to-point channels */
    chan p2p[N][N] = [1] of { mtype };
    /* tag received messages */
    bit rx[N][N];
Sending a message to all processes:
    for (i : 0 .. N-1) { p2p[_pid][i]!ECHO; }
Note: _pid denotes the process identifier in PROMELA (we use it solely to encode message passing).
Receiving and counting messages from distinct processes (no faults yet):
    /* local */
    int nrcvd = 0;                      /* initially, no messages */
    ...
    i = 0;
    do
    /* is there a message from process i? */
    :: (i < N) && nempty(p2p[i][_pid]) ->
        p2p[i][_pid]?ECHO;              /* remove it */
        if
        :: !rx[i][_pid] ->              /* 1. the first time: */
            rx[i][_pid] = 1;            /* a. mark as received */
            nrcvd++; break;             /* b. increase local counter */
        :: rx[i][_pid];                 /* 2. ignore a duplicate */
        fi;
        i++;                            /* next process */
    :: (i < N) -> i++;                  /* receive nothing from i */
    :: i == N -> break;
    od;
Keeping the number of send-to-all’s by (correct) processes:
    int nsnt;    /* shared variable: number of send-to-all's sent by correct processes */
Sending a message to all:
    nsnt++;
Receiving and counting messages from distinct processes (no faults):
    if    /* pick a larger value ≤ nsnt */
    :: ((nrcvd + 1) <= nsnt) -> nrcvd++;
    :: skip;    /* nothing */
    fi;
Reliable communication as a fairness property: F G [∀i. nrcvd_i ≥ nsnt]
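With faults, the control flow automaton shown earlier in the talk uses the update rcvd := z where (rcvd ≤ z ∧ z ≤ nsnt + f), so up to f additional messages may come from faulty processes. The Python sketch below lists the values that such a non-deterministic update may pick and evaluates the state predicate inside the fairness property. The function names are hypothetical; this illustrates the encoding idea and is not the PROMELA model itself.

```python
def next_rcvd_choices(nrcvd: int, nsnt: int, f: int) -> list:
    """All values z with nrcvd <= z <= nsnt + f (fault injection on receive).

    nsnt counts send-to-all's by correct processes; the f faulty processes
    may contribute up to f further messages. With f = 0 this is the
    fault-free case, where the counter can grow up to nsnt.
    """
    return list(range(nrcvd, nsnt + f + 1))


def reliable_communication_holds(nrcvd_of_correct, nsnt: int) -> bool:
    """State predicate inside F G [...]: every correct process has nrcvd_i >= nsnt."""
    return all(r >= nsnt for r in nrcvd_of_correct)


if __name__ == "__main__":
    # two echoes sent by correct processes, one Byzantine process
    print(next_rcvd_choices(nrcvd=1, nsnt=2, f=1))
    # fairness is satisfied once every correct process has caught up with nsnt
    print(reliable_communication_holds([2, 3], nsnt=2))
```

Because faulty processes appear only as the extra term f in the upper bound, they need no process of their own in the model, which keeps the state space small.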