Contention-Related Crash Failures Anaïs Durand
LIP6, Sorbonne Université, Paris
April 1st, 2019
Anaïs Durand Contention-Related Crash Failures 1/25
Set Agreement and Renaming in the Presence of Contention-Related Crash Failures, SSS 2018
Asynchronous deterministic system
n processes p1, . . . , pn
Atomic read/write registers
0 ≤ t < n process crashes
Participation required
Initially dead processes “Classical” (any-time) crashes: no
Contention = # processes that accessed a shared register
λ = predefined contention threshold
2 possible definitions:
No crashes λ-constrained crashes
No crashes
Consensus:
◮ [Fischer et al., 85]: Impossible with one any-time crash failure.
◮ [Taubenfeld, 18]: Algorithm that tolerates one (n − 1)-constrained crash failure for n > 1.
k-Set Agreement, 1 ≤ k < n:
◮ [Borowsky, Gafni, 93]: Impossible with k any-time crash failures.
◮ [Taubenfeld, 18]: Algorithm that tolerates ℓ + k − 2 (n − ℓ)-constrained crash failures for ℓ ≥ 1 and n ≥ 2ℓ + k − 2.
One-shot object
Operation propose(v): propose value v and return a decided value
Properties:
◮ Validity: decided value ∈ proposed values
◮ Agreement: ≤ k decided values
◮ Termination: every correct process decides
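The two safety properties above can be checked mechanically on a finished execution. The following is a minimal sketch, not part of the talk: the function name and the dictionary encoding of proposals and decisions are assumptions made for illustration. Termination is a liveness property and cannot be checked on a finite trace, so it is left out.

```python
def check_k_set_agreement(proposed, decided, k):
    """Check validity and agreement of one finished execution.

    proposed: dict mapping process id -> proposed value
    decided:  dict mapping process id -> decided value (processes that decided)
    Both encodings are hypothetical, chosen only for this sketch.
    """
    # Validity: every decided value is one of the proposed values.
    validity = all(v in proposed.values() for v in decided.values())
    # Agreement: at most k distinct values are decided.
    agreement = len(set(decided.values())) <= k
    return validity and agreement
```

For instance, three processes proposing a, b, c and deciding a, b, a satisfy 2-set agreement but not consensus (k = 1).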
λ = n − k, k ≥ 2, k = m + f, m ≥ 0, f ≥ 1
[Borowsky, Gafni, 93]: Impossible with k any-time crash failures.
[Figure: trade-off between m and f with k = m + f; the number of tolerated crashes t ranges from k − 1 to 2k − 2.]
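The range of t can be tabulated from the expression t = 2m + f − 1 that the termination argument of this talk uses (f − 1 any-time crashes plus 2m λ-constrained crashes). The sketch below is illustrative only; the function name is an assumption.

```python
def tolerated_crashes(k):
    """List (m, f, t) for every split k = m + f with m >= 0 and f >= 1."""
    rows = []
    for m in range(0, k):
        f = k - m
        # f - 1 any-time crashes plus 2m lambda-constrained crashes
        t = 2 * m + f - 1
        rows.append((m, f, t))
    return rows
```

For k = 3 the splits are (m, f) = (0, 3), (1, 2), (2, 1), giving t = 2, 3, 4, that is, from k − 1 up to 2k − 2.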
DEC: atomic register, initially ⊥
PART[1 . . . n]: snapshot object, initially [down, . . . , down]
◮ Atomic (linearizable) operations write() and snapshot()
◮ ≈ array of single-writer multi-reader atomic registers PART[1 . . . n] such that snapshot() returns the value of PART[1 . . . n] as if it read simultaneously and instantaneously all its entries
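For experimenting with the algorithm, the snapshot object can be imitated with a lock. This is only a simulation aid under stated assumptions: the class name and the 0-based indices are choices made here, and a lock-based object is neither wait-free nor crash-tolerant, unlike snapshot objects built from read/write registers.

```python
import threading


class Snapshot:
    """Lock-based stand-in for the snapshot object PART[1..n] (0-based here)."""

    def __init__(self, n, initial="down"):
        self._cells = [initial] * n
        self._lock = threading.Lock()

    def write(self, i, value):
        # Process i only ever writes its own entry (single-writer).
        with self._lock:
            self._cells[i] = value

    def snapshot(self):
        # Copy every entry while holding the lock, so the copy looks instantaneous.
        with self._lock:
            return list(self._cells)
```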
MUTEX[1]: one-shot deadlock-free f-mutex
MUTEX[2]: one-shot deadlock-free m-mutex
◮ Operations acquire() and release() (invoked at most
◮ Properties:
section
process invoking acquire() terminates its invocation
(1)  PART.write(up);                                  % signal participation
(2)  repeat
(3)      part_i := PART.snapshot();                   % wait for n − t
(4)      count_i := |{x such that part_i[x] = up}|;   % participants
(5)  until count_i ≥ n − t end repeat;
(6)  if count_i ≤ λ then                              % split processes into groups
(7)      group_i := 2;                                % MUTEX[2] (m-mutex)
(8)  else
(9)      group_i := 1;                                % MUTEX[1] (f-mutex)
(10) end if
(11) launch in parallel the threads T1 and T2;
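The effect of lines (1)-(10) can be seen on one simple schedule. The sketch below is an illustration under assumptions, not the talk's algorithm in full: it fixes a sequential schedule in which processes arrive one by one and every waiting process takes a snapshot right after each arrival. The function names are hypothetical.

```python
def group_for(count, lam):
    """Lines (6)-(10): group 2 if at most lam participants were seen."""
    return 2 if count <= lam else 1


def sequential_run(n, t, lam):
    """Assumed schedule: process i writes up, then all waiting processes snapshot."""
    part = ["down"] * n
    waiting, groups = [], {}
    for i in range(n):
        part[i] = "up"                  # line (1)
        waiting.append(i)
        count = part.count("up")        # lines (3)-(4)
        if count >= n - t:              # line (5): the waiting processes leave the loop
            for j in waiting:
                groups[j] = group_for(count, lam)
            waiting = []
    return groups
```

With n = 6, m = f = 1 (so k = 2, t = 2, λ = 4), the first four processes leave the loop with a count of 4 and join group 2, while the last two see more than λ participants and join group 1.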
thread T1 is                                  % wait for a decided value
(12) loop forever
(13)     if DEC ≠ ⊥ then
(14)         return(DEC);
(15)     end if;
(16) end loop;

thread T2 is                                  % decide a value if it enters its critical section
(17) if group_i = 1 ∨ m > 0 then
(18)     MUTEX[group_i].acquire();
(19)     if DEC = ⊥ then
(20)         DEC := in_i;
(21)     end if
(22)     MUTEX[group_i].release();
(23)     return(DEC);
(24) end if;
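A crash-free run of lines (1)-(24) can be simulated with Python threads. This is a rough sketch under several assumptions: a lock imitates the snapshot object, `threading.Semaphore(f)` and `threading.Semaphore(m)` stand in for the f-mutex and m-mutex (they are not crash-tolerant), and each process takes its result from T1 only, which is equivalent here since both threads return DEC. Proposed values must differ from `None`, which plays the role of ⊥. All names are hypothetical.

```python
import threading
import time

BOTTOM = None  # stands for the initial value ⊥ of DEC


def run_k_set_agreement(inputs, m, f):
    """Crash-free run; returns the list of decided values, one per process."""
    n, k = len(inputs), m + f
    t, lam = 2 * m + f - 1, n - k
    part = ["down"] * n
    part_lock = threading.Lock()      # stand-in for snapshot atomicity
    dec = [BOTTOM]                    # DEC, a one-cell shared register
    mutex = {1: threading.Semaphore(f), 2: threading.Semaphore(m)}
    decided = [BOTTOM] * n

    def thread_t2(i, group):          # lines (17)-(24)
        if group == 1 or m > 0:
            with mutex[group]:        # acquire() ... release()
                if dec[0] is BOTTOM:
                    dec[0] = inputs[i]

    def process(i):
        with part_lock:
            part[i] = "up"            # line (1)
        while True:                   # lines (2)-(5)
            with part_lock:
                count = part.count("up")
            if count >= n - t:
                break
            time.sleep(0.001)
        group = 2 if count <= lam else 1              # lines (6)-(10)
        threading.Thread(target=thread_t2, args=(i, group), daemon=True).start()
        while dec[0] is BOTTOM:       # thread T1, lines (12)-(16)
            time.sleep(0.001)
        decided[i] = dec[0]

    workers = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return decided
```

Calling `run_k_set_agreement(["a", "b", "c", "d", "e", "f"], m=1, f=1)` returns six decided values drawn from the inputs, with at most k = 2 distinct values; which values are decided depends on the thread schedule.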
(a) Decided value = DEC
(b) DEC assigned to proposed values
(c) MUTEX[1] ≤ f
(a) ≤ t crashes + participation required ⇒ ≥ n − t processes eventually write up, so no correct process blocks in loop (2)-(5)
(b) ≤ n − k processes with count_i ≤ n − k = λ when leaving loop (2)-(5)
(c) one process decides ⇒ every correct process decides
(d) If m = 0: k = m + f = f
◮ f − 1 any-time crashes, n − t correct processes
◮ n − t = n − (f − 1) = n − k + 1 > n − k = λ
⇒ every process leaves loop (2)-(5) with count_i > λ and joins group 1
(d) If m > 0: f − 1 any-time crashes, 2m λ-constrained crashes, n − t correct processes
◮ |group 1| ≥ f
≥ 1 correct process & ≤ f − 1 (any-time) crashes in group 1 (properties of the deadlock-free f-mutex MUTEX[1]) ⇒ at least one process decides
(d) If m > 0:
◮ |group 1| < f, correct ∈ group 1
≥ 1 correct process & ≤ f − 1 (any-time) crashes in group 1 (properties of the deadlock-free f-mutex MUTEX[1]) ⇒ at least one process decides
(d) If m > 0:
◮ |group 1| < f, correct ∉ group 1
(n − k) − (n − t) = t − k = (2m + f − 1) − (m + f) = m − 1
≥ 1 correct process & ≤ m − 1 crashes in group 2 (properties of the deadlock-free m-mutex MUTEX[2]) ⇒ at least one process decides
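The arithmetic behind the bound of m − 1 crashes in group 2 can be replayed on concrete numbers. The helper below is an illustrative sketch with a hypothetical name: it takes at most n − k processes in group 2 and at least n − t correct processes, all of them in group 2 in this case.

```python
def group2_crash_bound(n, m, f):
    """Upper bound on crashed processes in group 2, following the slide's arithmetic."""
    k = m + f
    t = 2 * m + f - 1
    at_most_in_group2 = n - k   # at most n - k processes leave the loop with count_i <= lambda
    correct = n - t             # at least n - t processes never crash
    return at_most_in_group2 - correct
```

For n = 10, m = 2, f = 1, this gives (10 − 3) − (10 − 4) = 1 = m − 1.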
λ = n − k, k ≥ 2, k = m + f, m ≥ 0, f ≥ 1
λ = n − ℓ, k ≤ ℓ ≤ n, k ≥ 2, k = m + f, m ≥ 0, f ≥ 1
Notion of contention-related crash failures
Makes it possible to circumvent impossibility results
Better understanding of fault tolerance:
Future work:
◮ Tight bounds?
◮ General algorithm for k-set agreement, ∀k ≥ 1.
◮ What about crashes after the contention threshold λ?
◮ What about other definitions of weak crash failures?
Initial name: id_i
New name space: {1 . . . M}
Operation rename(id_i): return a new name
Properties:
◮ Validity: new name ∈ {1 . . . M}
◮ Agreement: no two processes get the same new name
◮ Termination: invocation of rename() by a correct process terminates
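As with set agreement, the two safety properties of renaming can be checked on a finished execution. This is a minimal sketch; the function name and the dictionary encoding are assumptions made for illustration.

```python
def check_renaming(new_names, M):
    """new_names: dict mapping initial name -> new name, for processes that returned."""
    names = list(new_names.values())
    # Validity: every new name lies in {1, ..., M}.
    validity = all(1 <= x <= M for x in names)
    # Agreement: no two processes obtained the same new name.
    agreement = len(set(names)) == len(names)
    return validity and agreement
```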
M = n + f, λ = n − t − 1, t = m + f, m ≥ 0, f ≥ 0
[Herlihy, Shavit, 93]: Impossible with f + 1 any-time crash failures.
[Figure: trade-off between m and f with t = m + f; the size of the new name space ranges from M = n + t to M = n.]
PART[1 . . . n]: snapshot object, initially [down, . . . , down]
RENAMING_f: (n + f)-renaming object that:
◮ tolerates ≤ f any-time crash failures
◮ does not require participation
(1) PART.write(up);                                  % signal participation
(2) repeat
(3)     part_i := PART.snapshot();                   % wait for n − t
(4)     count_i := |{x such that part_i[x] = up}|;   % participants
(5) until count_i ≥ n − t end repeat;
(6) newName_i := RENAMING_f.rename(id_i);            % get new name
(7) return(newName_i);
(a) ≤ t crashes + participation required
(b) n − t > λ ⇒ no λ-constrained crashes in RENAMING_f
(c) participation not required for RENAMING_f + properties of
λ = n − t − 1, t = m + f, m ≥ 0, 0 ≤ f ≤ X
Total # of faults: t = m + f
◮ λ-constrained crashes: m
◮ any-time crashes: f ≤ X
(1) PART.write(up);
(2) repeat
(3)     part_i := PART.snapshot();
(4)     count_i := |{x such that part_i[x] = up}|;
(5) until count_i ≥ n − t end repeat;
(6) res_i := OB.op(in_i);
(7) return(res_i);
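The generic pattern of lines (1)-(7) is a participation barrier placed in front of an arbitrary object OB. The sketch below simulates a crash-free run with Python threads under stated assumptions: a lock imitates the snapshot object, and `make_counter_renaming` is a hypothetical stand-in for OB that hands out consecutive names under a lock (it is not crash-tolerant and is not the RENAMING_f object of the talk).

```python
import itertools
import threading
import time


def run_with_barrier(inputs, t, op):
    """Crash-free run of lines (1)-(7); op plays the role of OB.op."""
    n = len(inputs)
    part = ["down"] * n
    lock = threading.Lock()           # stand-in for snapshot atomicity
    results = [None] * n

    def process(i):
        with lock:
            part[i] = "up"            # line (1)
        while True:                   # lines (2)-(5)
            with lock:
                count = part.count("up")
            if count >= n - t:
                break
            time.sleep(0.001)
        results[i] = op(inputs[i])    # lines (6)-(7)

    workers = [threading.Thread(target=process, args=(i,)) for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


def make_counter_renaming():
    """Hypothetical OB: hands out the names 1, 2, 3, ... under a lock."""
    counter = itertools.count(1)
    lock = threading.Lock()

    def rename(_initial_name):
        with lock:
            return next(counter)

    return rename
```

Running `run_with_barrier([17, 42, 99, 7], t=1, op=make_counter_renaming())` gives each process a distinct name in {1, 2, 3, 4}; the assignment of names to processes depends on the thread schedule.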