two-phase commit / security (start) - PowerPoint PPT Presentation



slide-1
SLIDE 1

two-phase commit / security (start)

1

slide-2
SLIDE 2

Changelog

Changes made in this version not seen in first lecture:

quorum: add note that part of voting is updating other nodes to latest version

1

slide-3
SLIDE 3

last time (1)

RPC: remote function calls that look like local calls

interface description language compiled into stubs (wrapper functions)

marshalling (AKA serialization) of arguments/return values into bytes

NFS: file operations turned into remote procedure calls

NFS is a stateless protocol

server uses file IDs, which give the inode number; client remembers the fd-to-file-ID mapping

nothing to recover on server failure; nothing for the server to forget on client failure
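Marshalling can be sketched in a few lines of C. This is a hypothetical example, not generated stub code: it assumes a remote `read(file_id, offset, count)` call with three 32-bit arguments (the struct and function names are made up) and sends them in network (big-endian) byte order, similar in spirit to the XDR encoding NFS uses.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical arguments for a remote read(file_id, offset, count) call. */
struct read_args {
    uint32_t file_id;
    uint32_t offset;
    uint32_t count;
};

/* Client-side stub work: turn the arguments into 12 bytes, big-endian.
   Returns the number of bytes written. */
size_t marshal_read_args(const struct read_args *a, unsigned char buf[12]) {
    uint32_t fields[3] = { htonl(a->file_id), htonl(a->offset), htonl(a->count) };
    memcpy(buf, fields, sizeof fields);
    return sizeof fields;
}

/* Server-side stub work: turn the 12 bytes back into arguments. */
void unmarshal_read_args(const unsigned char buf[12], struct read_args *a) {
    uint32_t fields[3];
    memcpy(fields, buf, sizeof fields);
    a->file_id = ntohl(fields[0]);
    a->offset  = ntohl(fields[1]);
    a->count   = ntohl(fields[2]);
}
```

Fixing the byte order in the wire format is what lets a little-endian client talk to a big-endian server; the stub compiler writes this code so programmers don't have to.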

2

slide-4
SLIDE 4

last time (2)

close-to-open consistency

check for updates on open; write file back on close

idea: inconsistent behavior is okay if two processes open the file at once

AFS: callbacks on write rather than proactive checks

…but server still needs to know about write to callback

3

slide-5
SLIDE 5

file locking

so, your program doesn’t like conflicting writes. What can you do? If operating offline, probably not much…

otherwise: file locking

…except it often doesn’t work on NFS, etc.

4

slide-6
SLIDE 6

advisory file locking with fcntl

int fd = open(...);
struct flock lock_info = {
    .l_type = F_WRLCK,   // write lock; F_RDLCK also available
    // range of bytes to lock:
    .l_whence = SEEK_SET,
    .l_start = 0,
    .l_len = ...,
};
/* set lock, waiting if needed */
int rv = fcntl(fd, F_SETLKW, &lock_info);
if (rv == -1) { /* handle error */ }
/* now have a lock on the file */
/* unlock --- could also close() */
lock_info.l_type = F_UNLCK;
fcntl(fd, F_SETLK, &lock_info);
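A self-contained version of the same pattern that compiles and runs, as a sketch: the function name `append_with_lock` is hypothetical, and it picks `l_len = 0`, which POSIX defines as "from `l_start` through the end of the file", as an assumed choice of range.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append one line to path while holding an fcntl write lock on the whole file.
   Returns 0 on success, -1 on error. */
int append_with_lock(const char *path, const char *line) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) { perror("open"); return -1; }

    struct flock lock_info = {
        .l_type = F_WRLCK,    /* write (exclusive) lock */
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,           /* 0 = from l_start through end of file */
    };
    /* set lock, waiting if another process holds a conflicting one */
    if (fcntl(fd, F_SETLKW, &lock_info) == -1) {
        perror("fcntl(F_SETLKW)");
        close(fd);
        return -1;
    }

    /* now have the lock: cooperating processes that also lock will wait */
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);

    lock_info.l_type = F_UNLCK;   /* unlock (close() would also release it) */
    fcntl(fd, F_SETLK, &lock_info);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Other processes only wait here if they also call `fcntl` with `F_SETLKW` (or `F_SETLK`) on the same file before writing.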

5

slide-7
SLIDE 7

advisory locks

fcntl is an advisory lock: it doesn’t stop others from accessing the file… unless they always try to get a lock first
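To see what "advisory" means in practice, a hypothetical demo (`advisory_demo` and the path are illustrative) can hold a write lock in one process and fork a child: the child's non-blocking `F_SETLK` attempt fails, but its plain `write()` still succeeds. The child is a separate process and does not inherit the parent's `fcntl` lock, which is why the two conflict at all.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child process (1) could NOT get the lock the parent holds,
   but (2) could still write to the file anyway. */
int advisory_demo(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return 0;
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLK, &lk) == -1) { close(fd); return 0; }

    pid_t pid = fork();
    if (pid == -1) { close(fd); return 0; }
    if (pid == 0) {
        /* child: a different process, so the parent's lock conflicts with it */
        int cfd = open(path, O_RDWR);
        struct flock try_lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
        int got_lock = (fcntl(cfd, F_SETLK, &try_lk) != -1);  /* non-blocking attempt */
        ssize_t n = write(cfd, "x", 1);   /* ...but nothing stops a plain write() */
        _exit((!got_lock && n == 1) ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    close(fd);   /* also releases the parent's lock */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```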

6

slide-8
SLIDE 8

POSIX file locks are horrible

actually two locking APIs: fcntl() and flock()

fcntl: not inherited by fork

fcntl: closing any fd for the file releases the lock

…even if you dup2’d it!

fcntl: maybe sometimes works over NFS? flock: less likely to work over NFS, etc.
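The "closing any fd releases the lock" rule can be checked the same way. In this sketch (helper names are hypothetical), the process locks through `fd`, then `dup()`s the descriptor and closes the copy; a child process probing with `F_SETLK` finds the lock gone. `dup2()` behaves the same, since the rule covers any descriptor for the file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child process, try (without waiting) to write-lock path.
   Returns 1 if the child got the lock, i.e. no conflicting lock was held. */
static int child_can_lock(const char *path) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        int cfd = open(path, O_RDWR);
        struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
        _exit(fcntl(cfd, F_SETLK, &lk) == 0 ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Returns 1 if closing a duplicate descriptor silently dropped the lock taken via fd. */
int close_dup_drops_lock(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1) return 0;
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLK, &lk) == -1) { close(fd); return 0; }

    int held_before = !child_can_lock(path);   /* expect: lock is held */
    int fd2 = dup(fd);
    close(fd2);                                /* closes a *different* descriptor... */
    int held_after = !child_can_lock(path);    /* ...yet this process's lock is gone */
    close(fd);
    return held_before && !held_after;
}
```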

7

slide-9
SLIDE 9

fcntl and NFS

seems to require extra state at the server typical implementation: separate lock server not a stateless protocol

8

slide-10
SLIDE 10

lockfiles

use a separate lockfile instead of “real” locks

e.g. convention: use NOTES.txt.lock as the lock file

lock: create a lockfile with link() or open() with O_EXCL

can’t lock: link()/open() will fail with “file already exists”

for current NFSv3: these should be single RPC calls that always contact the server

some (old, I hope?) systems: link() is atomic, open() with O_EXCL is not

unlock: remove the lockfile

annoyance: what if program crashes, fjle not removed?
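A sketch of the `open()` + `O_EXCL` variant in C; the function names are hypothetical, and a real program would pass a path following the `NOTES.txt.lock` convention.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to take the lock by creating lockpath; O_EXCL makes creation fail
   if the file already exists. Returns 1 if we now hold the lock,
   0 if someone else holds it, -1 on any other error. */
int try_lockfile(const char *lockpath) {
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return errno == EEXIST ? 0 : -1;
    close(fd);
    return 1;
}

/* Release the lock by removing the lockfile. Returns 0 on success. */
int unlock_lockfile(const char *lockpath) {
    return unlink(lockpath);
}
```

If the program crashes between the two calls, the lockfile stays behind. A common mitigation (not from the slides) is to write the owner's PID into the file so a stale lock can be recognized.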

9

slide-11
SLIDE 11

failure models

how do machines fail?… well, lots of ways

10

slide-12
SLIDE 12

two models of machine failure

fail-stop: failing machines stop responding

…or one always detects they’re broken and can ignore them

Byzantine failures: failing machines do the worst possible thing

11

slide-13
SLIDE 13

dealing with machine failure

recover when machine comes back up

does not work for Byzantine failures

rely on a quorum of machines working

requires 1 extra machine for fail-stop

requires 3F + 1 machines to handle F failures with Byzantine failures

12

slide-14
SLIDE 14

distributed transaction problem

distributed transaction: two machines both agree to do something or not do something, even if a machine fails

13

slide-15
SLIDE 15

distributed transaction example

course database across many machines

machines A and B: student records

machine C: course records

want to make sure machines agree to add students to course

…even if one machine fails

no confusion about whether a student is in the course

14

slide-16
SLIDE 16

the centralized solution

one solution: a new machine D decides what to do

for machines A-C, which store records

machine D maintains a redo log for all machines

treats them as just data storage

problem: we’d like machines to work independently

not really taking advantage of a distributed system

why did we split student records across two machines anyways?

15

slide-18
SLIDE 18

decentralized solution sketch

want each machine to be responsible just for its own data

only coordinate when a transaction crosses machines

e.g. changing course + student records

only coordinate with the involved machines

hopefully, scales to tens or hundreds of machines

typical transaction would involve 1 to 3 machines?

16

slide-19
SLIDE 19

distributed transactions and failures

extra tool: persistent log

idea: machine remembers what happened, even across a failure

same idea as redo log: record what to do in log

preview: whether trying to do/not do action

…but need to handle if machine stopped while writing log
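A minimal sketch of such a log in C, under several assumptions: fixed-size records with a hypothetical layout and state numbering, a toy check value to spot a record that was only partly written (real logs typically use a CRC), and `fsync()` so the record is on disk before the machine acts on it, e.g. before sending a vote.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed-size log record: which transaction, what state we reached. */
struct log_record {
    uint64_t txn_id;
    uint32_t state;
    uint32_t check;   /* lets recovery detect a half-written record */
};

/* Toy mixing function standing in for a real checksum such as a CRC. */
static uint32_t record_check(const struct log_record *r) {
    return (uint32_t)(r->txn_id ^ (r->txn_id >> 32)) * 2654435761u ^ r->state ^ 0x5a5a5a5au;
}

int log_open(const char *path) {
    return open(path, O_RDWR | O_CREAT | O_APPEND, 0644);
}

/* Append a record and force it to disk; only act once this returns 0. */
int log_append(int fd, uint64_t txn_id, uint32_t state) {
    struct log_record r = { .txn_id = txn_id, .state = state };
    r.check = record_check(&r);
    if (write(fd, &r, sizeof r) != (ssize_t)sizeof r) return -1;
    return fsync(fd);
}

/* On recovery: fill *out with the last complete record; returns 0 if none.
   A short or corrupt record (machine stopped mid-write) ends the scan. */
int log_last(int fd, struct log_record *out) {
    struct log_record r;
    int found = 0;
    if (lseek(fd, 0, SEEK_SET) == -1) return 0;
    while (read(fd, &r, sizeof r) == (ssize_t)sizeof r) {
        if (r.check != record_check(&r)) break;
        *out = r;
        found = 1;
    }
    return found;
}
```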

17

slide-20
SLIDE 20

two-phase commit: setup

every machine votes on transaction:

commit: do the operation (add student A to class)

abort: don’t do it (something went wrong)

require unanimity to commit

otherwise, default=abort

18

slide-21
SLIDE 21

two-phase commit: phases

phase 1: preparing

each machine states its intention: commit/abort

phase 2: finishing

gather intentions, figure out whether to do/not do it
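The counting rule in phase 2 fits in one small function. A sketch with hypothetical names: `VOTE_NONE` stands for a worker that has not answered, and `timed_out` models the coordinator giving up on it, which falls under the "otherwise, default=abort" rule.

```c
#include <stddef.h>

enum vote { VOTE_NONE, VOTE_COMMIT, VOTE_ABORT };        /* VOTE_NONE: no reply yet */
enum decision { UNDECIDED, GLOBAL_COMMIT, GLOBAL_ABORT };

/* Commit only on a unanimous commit vote. Any abort vote decides abort at
   once; if timed_out is set, a missing vote also means abort. */
enum decision decide(const enum vote *votes, size_t n, int timed_out) {
    int missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (votes[i] == VOTE_ABORT)
            return GLOBAL_ABORT;
        if (votes[i] == VOTE_NONE)
            missing = 1;
    }
    if (!missing)
        return GLOBAL_COMMIT;
    return timed_out ? GLOBAL_ABORT : UNDECIDED;
}
```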

19

slide-22
SLIDE 22

preparing

agree to commit

promise: “I will accept this transaction”

promise recorded in the machine’s log in case it crashes

agree to abort

promise: “I will not accept this transaction”

promise recorded in the machine’s log in case it crashes

never ever take back agreement!

to keep the promise: can’t allow interfering operations

e.g. agree to add student to class → reserve seat in class (even though student might not be added)

20

slide-24
SLIDE 24

finishing

learn all machines agree to commit: commit transaction

actually apply transaction (e.g. record student is in class)

record decision in local log

learn any machine agreed to abort: abort transaction

don’t ever try to apply transaction

record decision in local log

unsure which? just ask everyone what they agreed to do

they can’t change their mind once they tell you

21

slide-26
SLIDE 26

two-phase commit: blocking

agree to commit “add student to class”? Then can’t allow conflicting actions…

adding student to a conflicting class? removing student from the class? not leaving a seat in the class?

…until we know whether the transaction globally committed/aborted

22

slide-28
SLIDE 28

waiting forever?

machine goes away: two-phase commit state is uncertain

never resolve what happens

solution in practice: manual intervention

23

slide-29
SLIDE 29

two-phase commit: roles

typical two-phase commit implementation:

several workers

one coordinator

might be same machine as a worker

24

slide-30
SLIDE 30

two-phase-commit messages

coordinator → worker: PREPARE

“will you agree to do this action?”

on failure: can ask multiple times!

worker → coordinator: VOTE-COMMIT or VOTE-ABORT

“I agree to commit/abort transaction”

worker records decision in log, returns same result each time

coordinator → worker: GLOBAL-COMMIT or GLOBAL-ABORT

“I counted the votes and the result is commit/abort”

only commit if all votes were commit

25

slide-31
SLIDE 31

reasoning about protocols: state machines

very hard to reason about distributed protocol correctness

typical tool: state machine

each machine is in some state

know what every message does in this state

avoids common problem: don’t know what a message does

26

slide-33
SLIDE 33

coordinator state machine (simplified)

states: INIT, WAITING, ABORTED, COMMITTED

INIT → WAITING: send PREPARE (ask for votes)

WAITING → ABORTED: receive any AGREE-TO-ABORT; send ABORT

WAITING → COMMITTED: receive AGREE-TO-COMMIT from all; send COMMIT

in WAITING: accumulate votes; resend PREPARE after timeout

in ABORTED: worker resends vote? gets ABORT

in COMMITTED: worker resends vote? gets COMMIT
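The coordinator diagram can be written directly as a transition function. This sketch uses hypothetical names (`coord_step`, the event and action enums, `MAX_WORKERS`) and leaves out logging and networking: each call takes one event and returns the message to send. Duplicate votes are harmless because a commit vote just sets a flag.

```c
#include <stdbool.h>
#include <stddef.h>

enum coord_state { C_INIT, C_WAITING, C_ABORTED, C_COMMITTED };
enum coord_event { EV_START, EV_VOTE_COMMIT, EV_VOTE_ABORT, EV_TIMEOUT };
enum coord_action { ACT_NONE, ACT_SEND_PREPARE, ACT_SEND_ABORT, ACT_SEND_COMMIT };

#define MAX_WORKERS 8   /* arbitrary limit for this sketch */

struct coordinator {
    enum coord_state state;
    size_t n_workers;                  /* must be <= MAX_WORKERS */
    bool voted_commit[MAX_WORKERS];    /* which workers have agreed to commit */
};

/* worker = index of the sender, used only for vote events. */
enum coord_action coord_step(struct coordinator *c, enum coord_event ev, size_t worker) {
    switch (c->state) {
    case C_INIT:
        if (ev == EV_START) { c->state = C_WAITING; return ACT_SEND_PREPARE; }
        return ACT_NONE;
    case C_WAITING:
        if (ev == EV_VOTE_ABORT) { c->state = C_ABORTED; return ACT_SEND_ABORT; }
        if (ev == EV_TIMEOUT) return ACT_SEND_PREPARE;        /* ask again */
        if (ev == EV_VOTE_COMMIT && worker < c->n_workers) {
            c->voted_commit[worker] = true;                   /* accumulate votes */
            for (size_t i = 0; i < c->n_workers; i++)
                if (!c->voted_commit[i]) return ACT_NONE;     /* still waiting */
            c->state = C_COMMITTED;
            return ACT_SEND_COMMIT;
        }
        return ACT_NONE;
    case C_ABORTED:     /* a worker resent its vote: repeat the decision */
        return (ev == EV_VOTE_COMMIT || ev == EV_VOTE_ABORT) ? ACT_SEND_ABORT : ACT_NONE;
    case C_COMMITTED:
        return ev == EV_VOTE_COMMIT ? ACT_SEND_COMMIT : ACT_NONE;
    }
    return ACT_NONE;
}
```

Writing the protocol this way forces an answer for every (state, message) pair, which is the point of the state-machine view.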

27

slide-37
SLIDE 37

coordinator failure recovery

duplicate messages okay: unique transaction ID!

coordinator crashes? log indicating last state

log written before sending any messages

if INIT: resend PREPARE

if WAIT/ABORTED: send ABORT to all (dups okay!)

if COMMITTED: resend COMMIT to all (dups okay!)

message doesn’t make it to worker?

coordinator can resend PREPARE after timeout (or just ABORT)

worker can resend vote to coordinator to get extra reply
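The recovery rules map each logged state to one action. A sketch with hypothetical names, using the same four states as the coordinator state machine; because the log is written before any message is sent, the logged state is never behind what a worker has already been told.

```c
enum coord_state { C_INIT, C_WAITING, C_ABORTED, C_COMMITTED };
enum recovery_action { RESEND_PREPARE, SEND_ABORT_TO_ALL, SEND_COMMIT_TO_ALL };

/* What a restarted coordinator does, given the last state found in its log.
   Every action is safe to repeat: messages carry the unique transaction ID. */
enum recovery_action coord_recover(enum coord_state logged) {
    switch (logged) {
    case C_INIT:      return RESEND_PREPARE;
    case C_WAITING:   /* no commit decision was logged, so aborting is safe */
    case C_ABORTED:   return SEND_ABORT_TO_ALL;
    case C_COMMITTED: return SEND_COMMIT_TO_ALL;
    }
    return SEND_ABORT_TO_ALL;   /* not reached for valid states */
}
```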

28

slide-39
SLIDE 39

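worker state machine (simplified)

states: INIT, AGREED-TO-COMMIT, COMMITTED, ABORTED

INIT → AGREED-TO-COMMIT: recv PREPARE, send AGREE-TO-COMMIT

INIT → ABORTED: recv PREPARE, send AGREE-TO-ABORT

AGREED-TO-COMMIT → ABORTED: recv ABORT

AGREED-TO-COMMIT → COMMITTED: recv COMMIT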
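A matching sketch for a worker, using the course example. The `free_seats` counter and all names are hypothetical; it shows why a prepared worker must block conflicting actions, since voting commit reserves a seat until the global decision arrives. Logging is left out; a real worker would log its new state before replying.

```c
#include <stdbool.h>

enum worker_state { W_INIT, W_AGREED_TO_COMMIT, W_COMMITTED, W_ABORTED };
enum worker_msg { MSG_PREPARE, MSG_GLOBAL_COMMIT, MSG_GLOBAL_ABORT };
enum worker_reply { REPLY_NONE, REPLY_VOTE_COMMIT, REPLY_VOTE_ABORT };

/* Hypothetical worker for "add student to class": it owns the seat count. */
struct worker {
    enum worker_state state;
    int free_seats;
    bool student_added;
};

enum worker_reply worker_step(struct worker *w, enum worker_msg m) {
    switch (w->state) {
    case W_INIT:
        if (m == MSG_PREPARE) {
            if (w->free_seats > 0) {
                w->free_seats--;                 /* reserve seat: keeps the promise possible */
                w->state = W_AGREED_TO_COMMIT;
                return REPLY_VOTE_COMMIT;
            }
            w->state = W_ABORTED;                /* class is full */
            return REPLY_VOTE_ABORT;
        }
        if (m == MSG_GLOBAL_ABORT) w->state = W_ABORTED;
        return REPLY_NONE;
    case W_AGREED_TO_COMMIT:
        if (m == MSG_PREPARE) return REPLY_VOTE_COMMIT;   /* duplicate: same answer */
        if (m == MSG_GLOBAL_COMMIT) { w->student_added = true; w->state = W_COMMITTED; }
        if (m == MSG_GLOBAL_ABORT) { w->free_seats++; w->state = W_ABORTED; }  /* release seat */
        return REPLY_NONE;
    case W_ABORTED:
        return m == MSG_PREPARE ? REPLY_VOTE_ABORT : REPLY_NONE;
    case W_COMMITTED:
        return m == MSG_PREPARE ? REPLY_VOTE_COMMIT : REPLY_NONE;
    }
    return REPLY_NONE;
}
```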

29

slide-40
SLIDE 40

worker failure recovery

duplicate messages okay: unique transaction ID!

worker crashes? log indicating last state

if INIT: wait for PREPARE (resent)?

if AGREED-TO-COMMIT or ABORTED: resend AGREE-TO-COMMIT/AGREE-TO-ABORT

if COMMITTED: redo operation

message doesn’t make it to coordinator?

resend after timeout or during reboot on recovery
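The worker's recovery rules as a sketch (names hypothetical). In the INIT case nothing was promised, so the worker may also abort proactively, as in the worker-failure example; in AGREED-TO-COMMIT the promise still binds, so any reservation it implies has to be restored too.

```c
enum worker_state { W_INIT, W_AGREED_TO_COMMIT, W_COMMITTED, W_ABORTED };
enum worker_recovery { WAIT_FOR_PREPARE, RESEND_VOTE_COMMIT, RESEND_VOTE_ABORT, REDO_OPERATION };

/* What a restarted worker does, given the last state found in its log. */
enum worker_recovery worker_recover(enum worker_state logged) {
    switch (logged) {
    case W_INIT:             return WAIT_FOR_PREPARE;    /* no promise made yet */
    case W_AGREED_TO_COMMIT: return RESEND_VOTE_COMMIT;  /* promise stands; keep seat reserved */
    case W_ABORTED:          return RESEND_VOTE_ABORT;
    case W_COMMITTED:        return REDO_OPERATION;      /* reapply, as with a redo log */
    }
    return WAIT_FOR_PREPARE;   /* not reached for valid states */
}
```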

30

slide-41
SLIDE 41

state machine missing details

really want to specify the result of/action for every message!

allows verifying properties of the state machine

what happens if a machine fails at each possible time? what happens if a message is lost? …

31

slide-42
SLIDE 42

TPC: normal operation

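coordinator → worker 1, worker 2: PREPARE (coordinator log: state=WAIT)

worker 1, worker 2 → coordinator: AGREE-TO-COMMIT (each worker’s log: state=AGREED-TO-COMMIT)

coordinator → worker 1, worker 2: COMMIT (log: state=COMMIT)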

32

slide-44
SLIDE 44

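TPC: normal operation (conflict)

coordinator → workers: PREPARE (coordinator log: state=WAIT)

one worker → coordinator: AGREE-TO-ABORT (class is full! that worker’s log: state=ABORT)

other worker → coordinator: AGREE-TO-COMMIT (its log: state=AGREED-TO-COMMIT)

coordinator → workers: ABORT (log: state=ABORT)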

33

slide-46
SLIDE 46

TPC: worker failure (1)

coordinator → workers: PREPARE

one worker → coordinator: AGREE-TO-COMMIT

other worker fails before recording the transaction

on reboot: didn’t record transaction, so abort it (proactively/when coord. retries): AGREE-TO-ABORT

coordinator → workers: ABORT

34

slide-48
SLIDE 48

TPC: worker failure (2)

messages: PREPARE; AGREE-TO-COMMIT from each worker; COMMIT

a worker records agree-to-commit in its log, then fails

on reboot: resend the logged message

35

slide-50
SLIDE 50

TPC: worker failure (3)

messages: PREPARE; AGREE-TO-COMMIT from each worker; COMMIT

a worker records agree-to-commit in its log, then fails

on reboot: resend the logged message

36

slide-52
SLIDE 52

extending voting

two-phase commit: unanimous vote to commit

assumption: data split across nodes, every node must cooperate

other model: every node has a copy of the data

goal: work despite a few failing nodes

just require “enough” nodes to be working

for now: assume fail-stop

nodes don’t respond or tell you if broken

37

slide-54
SLIDE 54

quorums (1)

nodes: A, B, C, D, E

perform read/write with vote of any quorum of nodes

any quorum is enough: okay if some nodes fail

if A, C, D agree: that’s enough

B, E will figure out what happened when they come back up

38

slide-56
SLIDE 56

quorums (2)

nodes: A, B, C, D, E

requirement: quorums overlap

overlap = someone in the quorum knows about every update

e.g. every operation requires a majority of nodes

part of voting: provide other voting nodes with ‘missing’ updates

make sure updates survive later on

cannot get a quorum to agree on anything conflicting with past updates
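A simplified sketch of a majority-quorum read that also "catches up" stale voters. Assumptions, all hypothetical: each replica stores a value plus a version number bumped by every update, the last update reached A, C, D, and a later read uses B, D, E. It ignores concurrent updates and coordinator failure, which is where the real details get hard.

```c
#include <stddef.h>

#define N_NODES 5   /* nodes A..E */

/* Hypothetical replica contents: a value plus a version bumped by every update. */
struct replica { int version; int value; };

/* Read through nodes[quorum[0..q-1]] (q >= 1): the newest version wins, and
   stale members of the quorum are handed the missing update as part of the vote. */
int quorum_read(struct replica nodes[N_NODES], const size_t *quorum, size_t q) {
    size_t newest = quorum[0];
    for (size_t i = 1; i < q; i++)
        if (nodes[quorum[i]].version > nodes[newest].version)
            newest = quorum[i];
    for (size_t i = 0; i < q; i++)
        nodes[quorum[i]] = nodes[newest];   /* catch up stale voters */
    return nodes[newest].value;
}
```

The read finds the update only because {B, D, E} overlaps {A, C, D} at D; any two majorities of 5 nodes must share at least one node.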

39

slide-59
SLIDE 59

quorums (3)

nodes: A, B, C, D, E

sometimes vary quorum based on operation type

example: update quorum = 4 of 5; read quorum = 2 of 5

requirement: read overlaps with last update

compromise: better performance sometimes, but tolerate fewer failures
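The overlap requirement is a counting rule: a read quorum of r nodes and an update quorum of w nodes out of n must share a node, so r + w > n. A sketch (function names hypothetical) that also checks that two update quorums overlap, 2w > n, a standard companion condition, and counts how many failures each quorum size tolerates.

```c
#include <stdbool.h>

/* Read quorum r must share a node with update quorum w out of n: r + w > n. */
bool read_sees_last_update(int n, int r, int w) { return r + w > n; }

/* Two update quorums must also overlap so updates cannot conflict: 2w > n. */
bool updates_overlap(int n, int w) { return 2 * w > n; }

/* Failures tolerated while still being able to gather a quorum of size k. */
int failures_tolerated(int n, int k) { return n - k; }
```

For the slide's 5 nodes: 2 + 4 = 6 > 5, so reads see the last update; reads survive 3 failures, but updates survive only 1, versus 2 with simple majorities.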

40

slide-61
SLIDE 61

quorums

nodes: A, B, C, D, E

details very tricky

what about coordinator failures?

how does recovery happen?

what information needs to be logged?

“catching up” nodes that weren’t part of several updates

full details: look up Raft or Paxos

41

slide-62
SLIDE 62

quorums for Byzantine failures

just overlap not enough

problem: node can give inconsistent votes

tell A “I agree to commit”, tell B “I do not”

need to confirm consistency of votes with other nodes

need supermajority-type quorums

f failures: 3f + 1 nodes

full details: look up PBFT
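One standard way to see where 3f + 1 comes from, as a counting sketch that assumes every quorum has 2f + 1 members (the slides do not state the quorum size):

```latex
% n = 3f + 1 nodes, at most f Byzantine; quorums Q_1, Q_2 of size 2f + 1
\begin{align*}
  n - f &= 2f + 1
    && \text{a quorum can still form while $f$ nodes are silent} \\
  |Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(2f+1) - (3f+1) = f + 1
    && \text{any two quorums share $f+1$ nodes}
\end{align*}
```

So the overlap always contains at least one non-faulty node, which cannot have told the two quorums different things. With only 3f nodes, the 2f nodes guaranteed to respond would form quorums overlapping in just f nodes, possibly all faulty.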

42

slide-63
SLIDE 63

protection/security

protection: mechanisms for controlling access to resources

page tables, preemptive scheduling, encryption, …

security: using protection to prevent misuse

misuse represented by policy e.g. “don’t expose sensitive info to bad people”

this class: about mechanisms more than policies

goal: provide enough flexibility for many policies

43

slide-64
SLIDE 64

adversaries

security is about adversaries

adversaries do the worst possible thing

challenge: adversary can be clever…

44

slide-65
SLIDE 65

authorization v authentication

authentication: who is who

authorization: who can do what

probably need authentication first…

45

slide-67
SLIDE 67

authentication

password, hardware token, …

this class: mostly won’t deal with how

just tracking afterwards

46

slide-69
SLIDE 69

access control matrix: who does what?

access control matrix: rows are protection domains, columns are objects (file 1, file 2, process 1)

domain 1: read/write

domain 2: read, write, wakeup

domain 3: read, write, kill

each process belongs to 1+ protection domains: “user cr4bd”, “group csfaculty”, …

objects (whatever type) with restrictions
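One direct representation is a two-dimensional array of permission bits, indexed by domain and object. The cell placement below is hypothetical (chosen only to resemble the slide's example), and a process in several domains gets the union of their rights.

```c
#include <stdbool.h>

enum { FILE1, FILE2, PROCESS1, N_OBJECTS };       /* columns: objects */
enum { DOMAIN1, DOMAIN2, DOMAIN3, N_DOMAINS };    /* rows: protection domains */
enum { P_READ = 1, P_WRITE = 2, P_WAKEUP = 4, P_KILL = 8 };

/* Hypothetical matrix contents, for illustration only; unlisted cells are 0. */
static const unsigned matrix[N_DOMAINS][N_OBJECTS] = {
    [DOMAIN1] = { [FILE1] = P_READ | P_WRITE },
    [DOMAIN2] = { [FILE1] = P_READ, [FILE2] = P_WRITE, [PROCESS1] = P_WAKEUP },
    [DOMAIN3] = { [FILE1] = P_READ, [FILE2] = P_WRITE, [PROCESS1] = P_KILL },
};

/* Allowed if any of the process's domains grants every bit in perm. */
bool allowed(const int *domains, int n_domains, int object, unsigned perm) {
    for (int i = 0; i < n_domains; i++)
        if ((matrix[domains[i]][object] & perm) == perm)
            return true;
    return false;
}
```

Real systems rarely store the full matrix, since most cells are empty; they store it by column (per object) or by row (per domain).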

47

slide-72
SLIDE 72

representing access

with objects (files, etc.): access control list

list of protection domains (users, groups, processes, etc.) allowed to use each item

list of (domain, object, permissions) stored “on the side”

example: AppArmor on Linux

configuration file with list of programs + what each is allowed to access

prevent, e.g., print server from writing files it shouldn’t
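Storing each object's column of the matrix with the object gives an access control list. A sketch with hypothetical types: each object carries a list of (domain, permissions) entries, and a check scans it for any of the caller's domains. The domain strings reuse the slide's examples; `user alice` in the usage below is made up.

```c
#include <stdbool.h>
#include <string.h>

enum { P_READ = 1, P_WRITE = 2 };

/* One access control list entry: a protection domain and what it may do. */
struct acl_entry { const char *domain; unsigned perms; };

/* Hypothetical object that carries its own ACL. */
struct object { const char *name; const struct acl_entry *acl; int acl_len; };

/* Allowed if any of the caller's domains has an entry granting every bit in perm. */
bool acl_allows(const struct object *obj, const char *const *domains, int n_domains,
                unsigned perm) {
    for (int i = 0; i < obj->acl_len; i++)
        for (int j = 0; j < n_domains; j++)
            if (strcmp(obj->acl[i].domain, domains[j]) == 0 &&
                (obj->acl[i].perms & perm) == perm)
                return true;
    return false;
}
```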

48

slide-74
SLIDE 74

two generals’ problem (setup)

general A and B want to agree on time to attack enemy (center)

only attack if they know the other will

attack together: victory; attack separately: defeat

communication mechanism: unreliable messengers

could be captured by enemy: message lost

50

slide-75
SLIDE 75

two generals’ problem

recall: both agree to attack at same time

(otherwise don’t attack: sure defeat)

messages going back and forth between general A and general B:

“attack at 11 AM? OK?”

“OK, as long as you are.”

“Are you?”

“Yeah, but as long as I know you got this message…”

“I will if I know you got this message…”

B: If I don’t get a reply, was A’s message lost? Or was my message just lost?

51

slide-77
SLIDE 77

impossibility

can’t guarantee that both parties will attack

…even if no messages are lost

proof sketch:

some message flips A’s state from “attacking” to “not attacking”

…but what if that message is lost? contradiction

52

slide-78
SLIDE 78

relaxing assumptions

can’t get a guarantee of receiving a message

in practice: best approximation

wait for acknowledgement

retry on timeout

lots of timeouts: looks like machine failure
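A sketch of "wait for acknowledgement, retry on timeout" against a simulated lossy channel; the `channel` type and both functions are hypothetical. Giving up after `max_tries` is exactly where the two generals' problem shows up: the sender cannot tell a lost message from a lost acknowledgement or a dead peer.

```c
#include <stdbool.h>

/* Hypothetical lossy channel for the sketch: drops the first `drops` sends. */
struct channel { int drops; int delivered; };

/* Send once; returns true if an acknowledgement came back before the timeout. */
static bool send_and_wait_for_ack(struct channel *ch) {
    if (ch->drops > 0) { ch->drops--; return false; }   /* lost: timeout fires */
    ch->delivered++;
    return true;
}

/* Retry on timeout, up to max_tries. Returns the number of tries used,
   or -1 after giving up (which looks the same as the peer having failed). */
int send_reliably(struct channel *ch, int max_tries) {
    for (int t = 1; t <= max_tries; t++)
        if (send_and_wait_for_ack(ch))
            return t;
    return -1;
}
```

In a real network a lost acknowledgement means the peer did get the message, so retries can deliver duplicates; that is why the two-phase commit messages carry a unique transaction ID.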

53