Failure Detectors Concurrency Trilogy Part IV Announcements - - PowerPoint PPT Presentation

SLIDE 1

Failure Detectors

Concurrency Trilogy Part IV

SLIDE 2

Announcements

  • Project proposals are due tonight, unless you got an extension.
  • Only a few hours left to submit something or seek an extension.
  • No quiz next week.
  • Should have gotten results for last week's quiz.
SLIDE 3

RSMs All Over Again

SLIDE 4

Revisiting RSMs

[Diagram: four replicas, each an Application layer on top of an Ordering layer, with one Client per replica.]

SLIDE 5

Revisiting RSMs

[Diagram: four replicas, each a KVStore on top of Raft, with one Client per replica.]

SLIDE 6

Revisiting RSMs

[Diagram: four Raft instances, with one Client per replica.]

SLIDE 7

Revisiting RSMs

[Diagram: four Raft instances, with one Client per replica.]

SLIDE 8

Revisiting RSMs

[Diagram: four replicas, each an Application on top of Raft, with one Client per replica.]

Executing a command at the application is destructive: a command cannot be undone.

SLIDE 9

Revisiting RSMs

[Diagram: four replicas, each an Application on top of Raft, with one Client per replica.]

Requirement: all application replicas must end up in the same state.
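The requirement above can be met if every replica starts from the same state and applies the same deterministic commands in the same order. The sketch below illustrates this with a toy key-value state machine. The `KVStore` class, the tuple command encoding, and the choice to have `cas` reply with the resulting `KeyValue` are assumptions made for illustration; they are not the course's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical command encoding: ("set", "x", 5), ("get", "x"), ("cas", "x", 5, 4).
Command = Tuple[Any, ...]


@dataclass
class KVStore:
    """A deterministic key-value state machine (illustrative sketch)."""
    data: Dict[str, Any] = field(default_factory=dict)

    def apply(self, cmd: Command) -> Any:
        op = cmd[0]
        if op == "set":
            _, key, value = cmd
            self.data[key] = value
            return "success"
        if op == "get":
            _, key = cmd
            return ("KeyValue", key, self.data.get(key))
        if op == "cas":
            # Compare-and-swap: write `new` only if the current value equals `expected`.
            _, key, expected, new = cmd
            if self.data.get(key) == expected:
                self.data[key] = new
            return ("KeyValue", key, self.data.get(key))
        raise ValueError(f"unknown command: {op!r}")


def run(log: List[Command]) -> KVStore:
    """Apply every command, in log order, to a fresh replica."""
    replica = KVStore()
    for cmd in log:
        replica.apply(cmd)
    return replica


log = [("set", "x", 5), ("get", "x"), ("cas", "x", 5, 4)]
replicas = [run(log) for _ in range(4)]  # M0..M3 apply the same log
assert all(r.data == {"x": 4} for r in replicas)

# The same commands in a different order leave the replica in a different state.
reordered = run([("cas", "x", 5, 4), ("set", "x", 5), ("get", "x")])
print(replicas[0].data, reordered.data)  # {'x': 4} {'x': 5}
```

Because applying a command cannot be undone, agreeing on the order *before* executing anything is what keeps the replicas identical.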

SLIDE 10

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Step 1: a client issues set(x, 5).]

SLIDE 11

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Step 1: a client issues set(x, 5). Step 2: AppendEntries goes to the other three replicas.]

SLIDE 12

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Step 1: a client issues set(x, 5). Step 2: AppendEntries goes to the other three replicas. Steps 3 to 5: set(x, 5) reaches the KVStore and success is returned.]

SLIDE 13

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Step 1: a client issues set(x, 5). Step 2: AppendEntries goes to the other three replicas. Steps 3 to 5: set(x, 5) reaches the KVStore and success is returned. Every replica's log now holds the entry set(x, 5), 0, 1 (command, index, term).]

SLIDE 14

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1.]

For which replicas is x=5?

SLIDE 15

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1.]

When?

SLIDE 16

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1. Term = 2. Step 6: set(x, 5).]

Is this safe?

SLIDE 17

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1. Term = 2, leaderCommit = -1. Step a: a client issues get(x).]

SLIDE 18

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1, and the leader's log also holds get(x), 1, 2. Term = 2, leaderCommit = -1. Step a: a client issues get(x). Step b: AppendEntries.]

SLIDE 19

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1 and get(x), 1, 2. Term = 2, leaderCommit = -1. Step a: a client issues get(x). Step b: AppendEntries.]

SLIDE 20

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1 and get(x), 1, 2. Term = 2, leaderCommit = -1. Step a: a client issues get(x). Step b: AppendEntries. Step c: get(x).]

Is this correct?

SLIDE 21

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1 and get(x), 1, 2. Term = 2, leaderCommit = 0. Step a: a client issues get(x). Step b: AppendEntries. Step c: set(x, 5).]

SLIDE 22

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1 and get(x), 1, 2. Term = 2, leaderCommit = 1. Step a: a client issues get(x). Step b: AppendEntries. Step c: set(x, 5). Step d: get(x). Step e: KeyValue(x, 5). Step f: KeyValue(x, 5).]
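The leaderCommit values in this sequence reflect a Raft rule: a replica hands an entry to its KVStore only after the entry is known to be committed, and always in log order. The minimal sketch below illustrates that rule. The `Replica` class, the `on_append_entries` method, and the `(command, index, term)` tuple layout are hypothetical. Term checks and conflict handling are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical entry layout, mirroring the "set(x, 5), 0, 1" notation on the
# slides: (command, index, term), with indices starting at 0.
Entry = Tuple[Tuple[Any, ...], int, int]


@dataclass
class Replica:
    log: List[Entry] = field(default_factory=list)
    commit_index: int = -1   # highest index known to be committed (-1: none)
    last_applied: int = -1   # highest index already executed by the KVStore
    store: Dict[str, Any] = field(default_factory=dict)
    results: List[Any] = field(default_factory=list)

    def on_append_entries(self, entries: List[Entry], leader_commit: int) -> None:
        """Append new entries, then apply everything leader_commit now covers."""
        for entry in entries:
            if entry[1] == len(self.log):  # simplified: no term or conflict checks
                self.log.append(entry)
        # Never treat an entry we do not have yet as committed.
        new_commit = min(leader_commit, len(self.log) - 1)
        self.commit_index = max(self.commit_index, new_commit)
        self.apply_committed()

    def apply_committed(self) -> None:
        # Execution cannot be undone, so only committed entries run, in index order.
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            command, _, _ = self.log[self.last_applied]
            self.results.append(self.execute(command))

    def execute(self, command: Tuple[Any, ...]) -> Any:
        if command[0] == "set":
            self.store[command[1]] = command[2]
            return "success"
        if command[0] == "get":
            return ("KeyValue", command[1], self.store.get(command[1]))
        raise ValueError(f"unsupported command: {command!r}")


r = Replica()
r.on_append_entries([(("set", "x", 5), 0, 1)], leader_commit=-1)
r.on_append_entries([(("get", "x"), 1, 2)], leader_commit=-1)
print(r.last_applied)   # -1: both entries are logged, neither has been executed
r.on_append_entries([], leader_commit=1)
print(r.results)        # ['success', ('KeyValue', 'x', 5)]
```

Even the get(x) travels through the log here, so its reply reflects every write committed before it.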

SLIDE 23

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1, get(x), 1, 2 and cas(x,5,4), 2, 2. Term = 2, leaderCommit = 2. Step g: a client issues cas(x, 5, 4). Step h: AppendEntries. Step i: cas(x, 5, 4).]

Is this correct?

SLIDE 24

Revisiting RSMs

[Diagram: replicas M0 to M3, each a KVStore on top of Raft with one Client. Every replica's log holds set(x, 5), 0, 1, get(x), 1, 2 and cas(x,5,4), 2, 2. Term = 2, leaderCommit = 2. Step g: a client issues cas(x, 5, 4). Step h: AppendEntries. Step i: cas(x, 5, 4). Steps j and k: KeyValue(x, 4) is returned.]

SLIDE 25

Configuration Change

SLIDE 26

Why?

  • Want to be able to change the set of servers.
  • Take down servers for maintenance.
  • Add new servers to replace failed ones.
  • Other reasons.
SLIDE 27

How?

  • Use a special log message which contains the set of servers.
  • Use Raft to replicate this to everyone.

[Diagram: a configuration log entry with fields Term, Config, Index.]

SLIDE 28

How Special?

  • All peers use a configuration as soon as it is logged, without waiting for it to commit.
  • Why safe?
  • We know how to revert this change.

[Diagram: a configuration log entry with fields Term, Config, Index.]
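Using a configuration "as soon as logged" is easy to revert if each peer simply *derives* its configuration from the newest configuration entry in its log. In that case, truncating an uncommitted configuration entry automatically restores the previous configuration. The sketch below assumes this design, which matches the Raft paper's rule that a server uses the latest configuration in its log whether or not it is committed. The function names and the entry encoding are hypothetical.

```python
from typing import FrozenSet, List, Tuple, cast

# Hypothetical entry layout: ("cmd", text) or ("config", frozenset of server names).
Entry = Tuple[str, object]


def current_config(log: List[Entry], initial: FrozenSet[str]) -> FrozenSet[str]:
    """A peer uses the newest configuration entry in its log, committed or not."""
    for kind, payload in reversed(log):
        if kind == "config":
            return cast(FrozenSet[str], payload)
    return initial


def truncate(log: List[Entry], keep: int) -> List[Entry]:
    """Drop every entry from position `keep` on, as when a leader overwrites a conflicting suffix."""
    return log[:keep]


initial = frozenset({"M0", "M1", "M2"})
log: List[Entry] = [
    ("cmd", "set(x, 5)"),
    ("cmd", "set(x, 6)"),
    ("config", frozenset({"M0", "M1", "M2", "M3"})),  # logged, not yet committed
]
print(sorted(current_config(log, initial)))               # ['M0', 'M1', 'M2', 'M3']
print(sorted(current_config(truncate(log, 2), initial)))  # ['M0', 'M1', 'M2']
```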

SLIDE 29

Protocol

[Figure: server logs with entries at indices 1 to 4, including set(x, 5) and set(x, 6); one log has a configuration entry C appended at index 5.]

SLIDE 30

Protocol

[Figure: server logs with entries at indices 1 to 4, including set(x, 5) and set(x, 6); the configuration entry C at index 5 now appears in several servers' logs.]

SLIDE 31

Protocol

[Figure: server logs; one log holds the configuration entry C at index 5, while the other servers' logs differ from it after index 1.]

What happens now?

SLIDE 32

Protocol

[Figure: server logs with entries at indices 1 to 4, including set(x, 5) and set(x, 6); one log has a configuration entry C-all appended at index 5.]

SLIDE 33

Protocol

[Figure: server logs with entries at indices 1 to 4, including set(x, 5) and set(x, 6); the C-all entry at index 5 has been replicated to the other servers' logs.]

SLIDE 34

Protocol

[Figure: server logs with entries at indices 1 to 4, including set(x, 5) and set(x, 6); every log holds C-all at index 5, and one log also has a C-new entry appended.]

SLIDE 35

Protocol

[Figure: server logs with entries at indices 1 to 4, including set(x, 5) and set(x, 6); every log holds C-all at index 5 followed by a C-new entry.]
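A standard answer to the "What happens now?" question on slide 31 comes from the Raft paper's treatment of membership changes. If servers switch configurations in one step, servers still on the old configuration and servers already on the new one can form disjoint majorities. The sketch below assumes, since the slides do not define it, that C-all plays the role of Raft's joint configuration C_old,new. Under that configuration, agreement needs separate majorities from both the old and the new server sets. The server names and helper functions are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set


def is_majority(voters: Set[str], config: FrozenSet[str]) -> bool:
    """True if `voters` includes more than half of the servers in `config`."""
    return 2 * len(voters & config) > len(config)


def joint_quorum(voters: Set[str], old: FrozenSet[str], new: FrozenSet[str]) -> bool:
    """Raft's joint-consensus rule: a majority of old AND a majority of new."""
    return is_majority(voters, old) and is_majority(voters, new)


def all_quorums(servers: FrozenSet[str], ok: Callable[[Set[str]], bool]) -> List[Set[str]]:
    """Every subset of `servers` accepted by the predicate `ok`."""
    names = sorted(servers)
    return [set(c) for k in range(1, len(names) + 1)
            for c in combinations(names, k) if ok(set(c))]


old = frozenset({"A", "B", "C"})
new = frozenset({"A", "B", "C", "D", "E"})

# Switching in one step: a quorum still using `old` and a quorum already using
# `new` need not overlap, so each side could elect its own leader.
q_old, q_new = {"A", "B"}, {"C", "D", "E"}
print(is_majority(q_old, old), is_majority(q_new, new), q_old & q_new)  # True True set()

# With the joint rule, every joint quorum overlaps every old-only and every
# new-only majority, so decisions under the joint rule cannot split from either side.
joint = all_quorums(old | new, lambda s: joint_quorum(s, old, new))
old_only = all_quorums(old, lambda s: is_majority(s, old))
new_only = all_quorums(new, lambda s: is_majority(s, new))
assert all(j & q for j in joint for q in old_only + new_only)
assert all(a & b for a in joint for b in joint)
```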

SLIDE 36

Failure Detectors

SLIDE 37

What Problem?

  • We have been depending on randomized timeouts, etc., to build consensus.
  • These rely on partial synchrony: the network does not always behave at its worst.
  • Timeouts are tedious to model (for proofs) and to tune (for deployment).
  • Abstract them away with failure detectors.
SLIDE 38

Failure Detector

[Diagram: an Application queries a Failure Detector, whose output changes over time: "suspect p0 has failed", "suspect p0, p1 have failed", "suspect p1, p2 have failed", "suspect p1 has failed".]

SLIDE 39

Reasoning about Detectors

Completeness (failed nodes):

  • When are they detected?
  • Who detects them?

Accuracy (live nodes):

  • When can they be suspected?
SLIDE 40

Reasoning about Detectors

  • Strong completeness: every failed node is eventually detected by all correct nodes.
  • Weak completeness: every failed node is eventually detected by some correct node.
  • Strong accuracy: no correct node is ever suspected.
  • Weak accuracy: some correct node is never suspected by any node.

SLIDE 41

Reasoning about Detectors

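Accuracy, eventual and not eventual:

  • Strong accuracy: no correct node is ever suspected.
  • Eventual strong accuracy: there is a time after which no correct node is ever suspected.
  • Weak accuracy: some correct node is never suspected by any node.
  • Eventual weak accuracy: there is a time after which some correct node is never suspected by any node.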
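The completeness and accuracy variants can be made concrete by checking them against a recorded trace of suspicions. The sketch below assumes a hypothetical trace format and helper names. A finite trace can only approximate "eventually", so two simplifications apply:

  • Completeness is checked at the last step.
  • Eventual accuracy is reported as the earliest step from which the property holds for the rest of the trace. A result of 0 means the non-eventual version holds.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical trace format: trace[t][q] is the set of nodes that correct
# node q suspects at step t. Crashed nodes do not appear as keys.
Trace = List[Dict[str, Set[str]]]


def suspected(step: Dict[str, Set[str]]) -> Set[str]:
    """Every node suspected by at least one correct node at this step."""
    return set().union(*step.values())


def strong_completeness(trace: Trace, crashed: Set[str]) -> bool:
    # At the last step, every correct node suspects every crashed node.
    return all(crashed <= s for s in trace[-1].values())


def weak_completeness(trace: Trace, crashed: Set[str]) -> bool:
    # At the last step, every crashed node is suspected by some correct node.
    return crashed <= suspected(trace[-1])


def strong_accuracy(trace: Trace, correct: Set[str], start: int = 0) -> bool:
    # No correct node is suspected at any step from `start` on.
    return all(not (suspected(step) & correct) for step in trace[start:])


def weak_accuracy(trace: Trace, correct: Set[str], start: int = 0) -> bool:
    # Some correct node is suspected by nobody at any step from `start` on.
    ever = set().union(*(suspected(step) for step in trace[start:]))
    return bool(correct - ever)


def stabilizes_at(prop: Callable[..., bool], trace: Trace, correct: Set[str]) -> Optional[int]:
    """Earliest step from which `prop` holds for the rest of the trace, or None."""
    for start in range(len(trace)):
        if prop(trace, correct, start):
            return start
    return None


crashed, correct = {"p0"}, {"p1", "p2"}
trace: Trace = [
    {"p1": set(), "p2": {"p1"}},          # p2 wrongly suspects p1
    {"p1": {"p0"}, "p2": {"p0", "p1"}},
    {"p1": {"p0"}, "p2": {"p0"}},         # suspicions have settled
]
print(strong_completeness(trace, crashed))             # True
print(stabilizes_at(strong_accuracy, trace, correct))  # 2: eventual, not perpetual
print(stabilizes_at(weak_accuracy, trace, correct))    # 0: p2 is never suspected
```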

SLIDE 42

Types of Detectors

  • Strong completeness, strong accuracy: Perfect detector (P)
  • Strong completeness, weak accuracy: Strong detector (S)
  • Strong completeness, eventual strong accuracy: ♢P
  • Strong completeness, eventual weak accuracy: ♢S (equivalent in power to Ω)
SLIDE 43

Types of Detectors

  • Weak completeness, strong accuracy: Q
  • Weak completeness, weak accuracy: Weak detector (W)
  • Weak completeness, eventual strong accuracy: ♢Q
  • Weak completeness, eventual weak accuracy: ♢W
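The eight classes can be arranged by which guarantees imply which, using only the definitions above. Strong completeness implies weak completeness. Strong accuracy implies weak accuracy, and any non-eventual property implies its eventual version. The sketch below encodes that ordering; the table and function names are hypothetical. It compares only the properties each class promises directly and does not model transformations between classes.

```python
from typing import Dict, Set, Tuple

# Which property implies which, taken from the definitions of completeness and accuracy.
COMPLETENESS_IMPLIES: Dict[str, Set[str]] = {
    "strong": {"strong", "weak"},
    "weak": {"weak"},
}
ACCURACY_IMPLIES: Dict[str, Set[str]] = {
    "strong": {"strong", "weak", "eventual strong", "eventual weak"},
    "weak": {"weak", "eventual weak"},
    "eventual strong": {"eventual strong", "eventual weak"},
    "eventual weak": {"eventual weak"},
}

# (completeness, accuracy) for each detector class listed on the slides.
DETECTORS: Dict[str, Tuple[str, str]] = {
    "P": ("strong", "strong"),
    "S": ("strong", "weak"),
    "♢P": ("strong", "eventual strong"),
    "♢S": ("strong", "eventual weak"),
    "Q": ("weak", "strong"),
    "W": ("weak", "weak"),
    "♢Q": ("weak", "eventual strong"),
    "♢W": ("weak", "eventual weak"),
}


def at_least_as_strong(a: str, b: str) -> bool:
    """True if class a's stated guarantees imply all of class b's."""
    comp_a, acc_a = DETECTORS[a]
    comp_b, acc_b = DETECTORS[b]
    return comp_b in COMPLETENESS_IMPLIES[comp_a] and acc_b in ACCURACY_IMPLIES[acc_a]


print([d for d in DETECTORS if at_least_as_strong("♢P", d)])  # ['♢P', '♢S', '♢Q', '♢W']
```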
SLIDE 44

How to Use Failure Detectors?

SLIDE 45

How to Build Failure Detectors?