In Search of Lost Time
Bernadette Charron-Bost CNRS / Ecole Polytechnique, France Martin Hutle EPFL, Switzerland Josef Widder TU Wien, Austria tubs.CITY Symposium / Braunschweig July 1, 2009
1/32
◮ consensus
  ◮ solvable in synchronous systems
  ◮ not solvable in asynchronous systems [FLP85]
◮ FD approach: an effort to establish intermediate assumptions that allow solving consensus
◮ motivation: in an async. system, a slow process cannot be distinguished from a crashed one
  ◮ replace timing assumptions
  ◮ provide processes with information about failures
2/32
◮ [DDS87] study 5 synchrony parameters
  ◮ 32 models
  ◮ coarse frontier between models where consensus is solvable and models where it is not
◮ FD models allow a comparatively fine-grained classification of models [CHT96]
  ◮ this advantage is widely accepted
  ◮ disadvantages mostly ignored / not investigated
3/32
[Figure: time axis t with a finite good period]
partial synchrony [DLS88], Paxos [Lam98] (established 1989):
A finite good period is sufficient to solve Consensus.
consensus is solvable in practice
4/32
[Figure: time axis t with a good period that lasts forever]
Ω: “There is a time after which all the correct processes always trust the same correct process.”
By the proof in [CHT96]: Ω is necessary!
A good period which lasts forever is necessary for Consensus.
consensus is not solvable in practice
5/32
◮ explain the paradox
  ◮ the problem is inherent to the model
◮ in parallel, we found fundamental problems with the relation used to compare FDs
◮ both problems have the same origin
  ◮ we will discuss this origin in this talk
6/32
The Model
7/32
◮ set Π of n processes
◮ discrete time base T (we employ T = ℕ)
◮ failure pattern F : T → 2^Π such that F(t) ⊆ F(t + 1)
◮ and two layers...
8/32
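A minimal Python sketch of the failure pattern from this slide. The three-process set, the crash-time dictionary, and the names `crash_at`, `is_monotone`, and `correct` are illustrative assumptions, and the finite `horizon` only stands in for the infinite time base T = ℕ.

```python
from typing import Callable, Dict, FrozenSet

Process = str
FailurePattern = Callable[[int], FrozenSet[Process]]  # F : T -> 2^Pi

PI: FrozenSet[Process] = frozenset({"p", "q", "r"})   # hypothetical process set


def crash_at(crash_times: Dict[Process, int]) -> FailurePattern:
    """Build F from a map process -> crash time (absent means never crashes)."""
    def F(t: int) -> FrozenSet[Process]:
        return frozenset(p for p, tc in crash_times.items() if tc <= t)
    return F


def is_monotone(F: FailurePattern, horizon: int) -> bool:
    """Check F(t) ⊆ F(t + 1) for all t < horizon: crashed processes stay crashed."""
    return all(F(t) <= F(t + 1) for t in range(horizon))


def correct(F: FailurePattern, horizon: int) -> FrozenSet[Process]:
    """Processes not crashed by the horizon (finite stand-in for 'never crashes')."""
    return PI - F(horizon)


F = crash_at({"q": 2})              # q crashes at time 2
print(is_monotone(F, 10))           # True
print(sorted(correct(F, 10)))       # ['p', 'r']
```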
◮ history H : Π × T → R (range R)
  ◮ defined for every time
[Figure: an example range R of three symbols, and the histories H(p, ·), H(q, ·), H(r, ·) over time]
◮ failure detector D : F ↦ D(F) = {H1, H2, ...}
9/32
◮ Ω [CHT96] has range R = Π
◮ q = H(p, t): at time t, Ω “tells” p that q is correct
◮ from some time on
  ◮ Ω provides the same information to all correct processes, and
  ◮ this information is correct
∀F ∈ E, ∀H ∈ Ω(F), ∃t0 ∈ T, ∃q ∈ correct(F), ∀p ∈ correct(F), ∀t > t0 : H(p, t) = q.
10/32
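The Ω property can be illustrated on a finite window of a history. The sketch below is an assumption-laden illustration, not part of the talk: `omega_witness`, the two example histories, and the set of correct processes are hypothetical, and a finite window can only refute or suggest the "eventually forever" property, never prove it.

```python
def omega_witness(H, correct_procs, horizon):
    """Search for (t0, q) such that every correct p has H(p, t) == q
    for all t with t0 < t < horizon. Returns None if no such pair exists."""
    for t0 in range(horizon - 1):          # leave at least one t > t0 in the window
        for q in sorted(correct_procs):    # the trusted process must itself be correct
            if all(H(p, t) == q
                   for p in correct_procs
                   for t in range(t0 + 1, horizon)):
                return t0, q
    return None


CORRECT = {"p", "r"}                       # hypothetical: q has crashed


def stabilizing(p, t):
    """Each process trusts itself until time 4, then everybody trusts r."""
    return p if t < 4 else "r"


def never_agreeing(p, t):
    """Each process trusts itself forever: no common leader emerges."""
    return p


print(omega_witness(stabilizing, CORRECT, 10))      # (3, 'r')
print(omega_witness(never_agreeing, CORRECT, 10))   # None
```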
failure detector C
◮ range R = ℕ
◮ HC : ∀p, ∀t, HC(p, t) = t
◮ C : F ↦ {HC}
◮ perfect global clock is an FD
◮ reason:
  ◮ FDs are defined via histories
  ◮ a history is a function of time
11/32
◮ algorithm is a collection of automata
◮ step: receive + query + state transition + send
◮ schedule: sequence of steps
◮ run: steps at certain times
[Figure: a schedule of steps 1 to 10 of processes p, q, r, and a run placing these steps on the time axis t]
12/32
[Figure: histories H(p), H(q), H(r) over time t, together with the steps of p, q, r over time t]
13/32
Comparison Relations
14/32
◮ D ⪰ D′ if ∃ algorithm A that transforms D into D′
◮ transformation: algorithm has variable output_p at each p
  ◮ O_p: history of output_p
  ◮ O must be in D′
[Figure: failure detector layer and asynchronous system layer; A transforms D into D′]
15/32
[Figure: history H(p) ∈ D, the steps of p, and the output history O ∈ D′]
stuttering!
16/32
Failure detector P+
Instantaneous strong completeness
∀F, ∀H ∈ D(F), ∀t ∈ T, ∀p ∈ correct(F), ∀q ∈ Π : q ∈ F(t) ⇒ q ∈ H(p, t)
“every failure is instantaneously detected”
Strong accuracy
∀F, ∀H ∈ D(F), ∀t ∈ T, ∀p, q ∈ Π − F(t) : q ∉ H(p, t)
“no wrong suspicions”
17/32
failure pattern   F(t):     ∅  ∅  {q}  {q}  ···
history of P+     H(p, t):  ∅  ∅  {q}  {q}  ···
steps of p        ···
output of p       O(p, t):  ∅  ∅  ∅    {q}  ···
O ∉ P+ ... ⪰ is not reflexive!
18/32
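The stuttering effect behind this example can be replayed in a few lines of Python. This is only a sketch under assumptions: p is taken to step at times 1 and 3 (the slide does not state the step times), the transformation is the identity (copy the last reading into the output variable), and `output_history` and `in_p_plus` are hypothetical helper names.

```python
EMPTY = frozenset()
Q = frozenset({"q"})


def F(t):
    """Failure pattern from the slide: q crashes at time 2."""
    return Q if t >= 2 else EMPTY


def H(p, t):
    """The P+ history for a correct p: suspect exactly F(t) at every time t."""
    return F(t)


def output_history(H, p, step_times, horizon, initial=EMPTY):
    """Identity transformation: at each step, p copies its FD reading into output_p.
    Between steps the variable keeps its old value (stuttering)."""
    steps = set(step_times)
    out, current = [], initial
    for t in range(horizon):
        if t in steps:
            current = H(p, t)
        out.append(current)
    return out


def in_p_plus(O, F, horizon):
    """For one correct observer, completeness and accuracy together force O(t) = F(t)."""
    return all(O[t] == F(t) for t in range(horizon))


O = output_history(H, "p", step_times=[1, 3], horizon=4)   # assumed step times
print(O == [EMPTY, EMPTY, EMPTY, Q])                       # True
print(in_p_plus(O, F, 4))                                  # False
print(in_p_plus([H("p", t) for t in range(4)], F, 4))      # True
```

At time 2 the output still holds the stale value ∅ although q has crashed, so the emulated history violates instantaneous strong completeness even though the queried history is a P+ history.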
“No problem, let's just consider the reflexive closure of ⪰ ...”
19/32
failure pattern-wise inclusion of histories
D ⊑ D′ ⇔ ∀F ∈ E : D(F) ⊆ D′(F)
◮ very natural
  ◮ D′ allows more possible histories
  ◮ D is “more precise”
◮ good relation should extend ⊑
There is an FD S+ with P+ ⊑ S+ and P+ ⋡ S+.
20/32
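The inclusion ⊑ is plain set inclusion per failure pattern, which a small finite sketch can make concrete. Everything here is an illustrative assumption: histories are restricted to a single observer and times 0..3, and `lazy` is a hypothetical detector (it is not the S+ of the talk) that additionally admits detection delayed by one tick.

```python
EMPTY, Q = frozenset(), frozenset({"q"})

# Failure patterns and single-observer histories, as tuples over times 0..3.
F_NONE = (EMPTY, EMPTY, EMPTY, EMPTY)      # nobody crashes
F_Q_AT_2 = (EMPTY, EMPTY, Q, Q)            # q crashes at time 2
PATTERNS = [F_NONE, F_Q_AT_2]


def p_plus(F):
    """P+ admits exactly one history: suspect F(t) at every time t."""
    return {F}


def lazy(F):
    """Hypothetical detector: also admits detection delayed by one tick."""
    delayed = (EMPTY,) + F[:-1]
    return {F, delayed}


def included(D, D_prime, patterns):
    """D ⊑ D′ on the given failure patterns: D(F) ⊆ D′(F) for each F."""
    return all(D(F) <= D_prime(F) for F in patterns)


print(included(p_plus, lazy, PATTERNS))    # True
print(included(lazy, p_plus, PATTERNS))    # False
```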
“No problem, let's just add the history-wise inclusion ...”
The patch rev.1.02
D ⪰∗ D′ ⇔ (D ⪰ D′) ∨ (D ⊑ D′)
... investigate ⪰∗:
◮ in async. systems: real-time properties not relevant
◮ two FDs that give information on failures of the same “quality” and differ only in the time domain should be equivalent
◮ study of time contraction
21/32
◮ Θ is a sequence of values from T
◮ e.g. Θ = 2, 3, 6, 11 ···
[Figure: histories H(p, ·), H(q, ·), H(r, ·) over time t and their time contractions Θ.H(p, ·), Θ.H(q, ·), Θ.H(r, ·)]
◮ D̃: all possible time contractions of the histories of D
22/32
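A small Python sketch of a time contraction, under two assumptions that the slides do not spell out: Θ is strictly increasing, and the contracted history is read as Θ.H(p, i) = H(p, Θ_i). The names `contract` and `is_strictly_increasing` and the toy history are hypothetical.

```python
def is_strictly_increasing(theta):
    """True if every value of the sequence is larger than its predecessor."""
    return all(a < b for a, b in zip(theta, theta[1:]))


def contract(H, theta):
    """Return the contracted history with (Θ.H)(p, i) = H(p, Θ[i])."""
    if not is_strictly_increasing(theta):
        raise ValueError("theta must be strictly increasing")

    def contracted(p, i):
        return H(p, theta[i])

    return contracted


THETA = [2, 3, 6, 11]                 # prefix of the example from the slide


def H(p, t):
    """Toy history whose value records who asked and when."""
    return f"{p}@{t}"


TH = contract(H, THETA)
print([TH("p", i) for i in range(len(THETA))])   # ['p@2', 'p@3', 'p@6', 'p@11']
```

The contracted history keeps the order of the values but drops everything that happened at the skipped times 0, 1, 4, 5, 7, and so on.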
[Figure: H(p, ·) with the run of p using D, and Θ.H(p, ·) with the run of p using D̃]
◮ D̃ and D only differ in their “relation to time”
◮ no asynchronous algorithm can distinguish D from D̃
23/32
◮ D̃ and D only differ in their “relation to time”
◮ and:
Theorem TC. An algorithm A solves a problem P using D if and only if A solves P using D̃
◮ D and D̃ allow solving the same problems
  ◮ with the same algorithm
◮ but D and D̃ may not be equivalent
  ◮ e.g., C̃ ⋡∗ C (perfect clock)
24/32
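The perfect-clock example can be checked numerically. The sketch assumes the reading Θ.H(p, i) = H(p, Θ_i) and reuses the sample Θ = 2, 3, 6, 11; the function names are hypothetical. It only shows that a contracted clock history differs from the single history of C, not that no transformation exists.

```python
def clock_history(p, t):
    """H_C(p, t) = t: every process reads the real time."""
    return t


THETA = [2, 3, 6, 11]


def contracted_clock(p, i):
    """(Θ.H_C)(p, i) = H_C(p, Θ[i]) = Θ[i], under the assumed definition."""
    return clock_history(p, THETA[i])


readings = [contracted_clock("p", i) for i in range(len(THETA))]
real_time = [clock_history("p", i) for i in range(len(THETA))]
print(readings)                # [2, 3, 6, 11]
print(real_time)               # [0, 1, 2, 3]
print(readings == real_time)   # False
```

The contracted clock runs ahead of real time, so its history is not the unique history of the perfect clock.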
[Figure: failure detector layer and asynchronous system layer; A transforms D into D′]
a less operational relation would be required...
25/32
Time contraction also explains the DLS/Paxos vs. Ω paradox
26/32
∆: a set of FDs that is closed w.r.t. Θ-contraction
∆P ⊆ ∆: the FDs in ∆ that can be used to solve problem P
Weakest failure detector
A failure detector WP is a weakest FD to solve P if
  ◮ WP ∈ ∆P
  ◮ ∀D ∈ ∆P : D ⪰ WP
consequences (W̃P: the time contraction of WP):
◮ W̃P ∈ ∆P (Theorem TC)
◮ if ⪰ extends ⊑ then
  ◮ WP ⪰ W̃P
◮ from Theorem TC: W̃P ∈ ∆P
◮ from def. of weakest: W̃P ⪰ WP
◮ W̃P and WP are equivalent w.r.t. ⪰
27/32
Failure detector with “good periods”:
[Figure: history H ∈ D with good periods φ over time t, and its time contraction Θ.H ∈ D̃]
◮ D̃ and D allow solving the same problems
◮ D has a good period of just finite length ⇒ D cannot be used to solve non-trivial problems
◮ every weakest FD to solve some non-trivial problem has an eventually forever good period (independently of the chosen ⪰)
28/32
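Why a finite good period does not survive contraction can be seen on a toy example. The sketch assumes the reading Θ.H(p, i) = H(p, Θ_i), models a history only by a good/bad flag per time, and uses hypothetical period boundaries (10 to 19, or from 10 on).

```python
def finite_good(t):
    """Good period of finite length: times 10..19 only (hypothetical)."""
    return 10 <= t < 20


def forever_good(t):
    """Good period that lasts forever from time 10 on (hypothetical)."""
    return t >= 10


def contract_flags(good, theta):
    """Good/bad flag seen at contracted time i, i.e. good(theta[i])."""
    return [good(t) for t in theta]


# Strictly increasing Θ that jumps over the finite good period 10..19.
THETA = [t for t in range(40) if not 10 <= t < 20]

print(sum(contract_flags(finite_good, THETA)))    # 0
print(sum(contract_flags(forever_good, THETA)))   # 20
```

A contraction can skip a finite good period entirely, whereas every strictly increasing Θ eventually lands in a good period that lasts forever.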
due to the two-layered structure of the FD model:
◮ shortcomings of the classic relation
◮ in the FD world, fine-grained sufficient conditions for consensus are eliminated from the very beginning
◮ eventually forever is necessary to solve non-trivial problems
◮ lower bound results are invalidated by “real systems”
◮ models that restrict the executions appear more appropriate
  ◮ restrictions internal to the system instead of augmenting information
recently, [JT08] provided a new relation that escapes some problems but cannot escape the paradox (which is independent of ⪰)
29/32
[CHW08] Bernadette Charron-Bost, Martin Hutle, and Josef Widder. In search of lost time. Technical Report LSR-REPORT-2008-006, EPFL, 2008.
[CHT96] Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 43(4):685–722, June 1996.
[CT96] Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, March 1996.
[DDS87] Danny Dolev, Cynthia Dwork, and Larry Stockmeyer. On the minimal synchronism needed for distributed consensus. Journal of the ACM, 34(1):77–97, January 1987.
31/32
[DLS88] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288–323, April 1988.
[FLP85] Michael J. Fischer, Nancy A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382, April 1985.
[JT08] Prasad Jayanti and Sam Toueg. Every problem has a weakest failure detector. In Proceedings of the 27th ACM Symposium on Principles of Distributed Computing (PODC), pages 75–84. ACM, 2008.
[Lam98] Leslie Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169, May 1998.
32/32