two-phase commit / network FSes
last time: remote procedure calls

preparing
- agree to commit
  - promise: “I will accept this transaction”
  - promise recorded in the machine log in case it crashes
- agree to abort
  - promise: “I will not accept this transaction”
  - promise recorded in the machine log in case it crashes
- never ever take back agreement!
- to keep promise: can’t allow interfering operations
  - e.g. agree to add student to class → reserve seat in class
  - (even though student might not be added b/c of other machines)

coordinator decision
- coordinator can’t take back global decision
- must record in persistent log to ensure not forgotten
- coordinator fails without logged decision? collect votes again
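One way to picture the persistent log is as an append-only file that is forced to disk before any message is sent. The sketch below is only an illustration: the JSON-lines format, the file name, and the helper names `append_record` and `last_record` are assumptions, not something the slides specify.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one JSON record and force it to stable storage before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the operating system to write it to disk


def last_record(path, txid):
    """Return the most recent record for txid, or None if nothing was logged."""
    found = None
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["txid"] == txid:
                    found = rec
    return found


# Hypothetical log location and transaction IDs, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "coordinator.log")
append_record(log_path, {"txid": 7, "state": "WAIT"})
append_record(log_path, {"txid": 7, "state": "COMMIT"})

print(last_record(log_path, 7))  # the decision survives a restart
print(last_record(log_path, 8))  # no logged decision: collect votes again
```

The second lookup corresponds to the last bullet: with no logged decision, the coordinator has nothing it must honor and can collect votes again.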

finishing
- coordinator says commit → commit transaction
  - worker applies transaction (e.g. record student is in class)
- coordinator (or anyone) says abort → abort transaction
  - worker never ever applies transaction
  - still want to do operation? make a new transaction
- unsure which? option 1: ask coordinator
  - e.g. worker policy: keep asking if no outcome
- unsure which? option 2: make sure coordinator resends outcome
  - e.g. coordinator keeps sending outcome until it gets “yes, I got it” reply

two-phase commit: roles
- typical two-phase commit implementation
- several workers
- one coordinator
  - might be same machine as a worker

two-phase-commit messages
- coordinator → worker: PREPARE
  - “will you agree to do this action?”
  - on failure: can ask multiple times!
- worker → coordinator: AGREE-TO-COMMIT or AGREE-TO-ABORT
  - worker records decision in log (before sending)
- coordinator → worker: COMMIT or ABORT
  - “I counted the votes and the result is commit/abort”
  - only commit if all votes were commit
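The vote-counting rule ("only commit if all votes were commit") can be sketched as a small function. The names `Vote`, `Decision`, and `decide` are hypothetical; keying the votes by worker name is an assumption made so that a resent vote cannot be counted twice.

```python
from enum import Enum


class Vote(Enum):
    AGREE_TO_COMMIT = "AGREE-TO-COMMIT"
    AGREE_TO_ABORT = "AGREE-TO-ABORT"


class Decision(Enum):
    COMMIT = "COMMIT"
    ABORT = "ABORT"


def decide(votes, num_workers):
    """Tally votes (dict: worker name -> Vote).

    Returns Decision.ABORT as soon as any worker voted to abort,
    Decision.COMMIT once every worker voted to commit,
    and None while votes are still missing.
    """
    if any(v is Vote.AGREE_TO_ABORT for v in votes.values()):
        return Decision.ABORT
    if len(votes) == num_workers:
        return Decision.COMMIT
    return None


votes = {"worker1": Vote.AGREE_TO_COMMIT}
print(decide(votes, 2))                    # still waiting for worker2
votes["worker1"] = Vote.AGREE_TO_COMMIT    # duplicate vote: no effect
print(decide(votes, 2))
votes["worker2"] = Vote.AGREE_TO_COMMIT
print(decide(votes, 2))                    # unanimous
```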

TPC: normal operation
(diagram: coordinator, worker 1, worker 2)
- coordinator → workers: PREPARE (log: state=WAIT)
- workers → coordinator: AGREE-TO-COMMIT (log: state=AGREED-TO-COMMIT)
- coordinator → workers: COMMIT (log: state=COMMIT)

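TPC: normal operation (conflict)
(diagram: coordinator, worker 1, worker 2)
- coordinator → workers: PREPARE (log: state=WAIT)
- one worker → coordinator: AGREE-TO-ABORT (class is full! log: state=ABORT)
- other worker → coordinator: AGREE-TO-COMMIT (log: state=AGREED-TO-COMMIT)
- coordinator → workers: ABORT (log: state=ABORT)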

exercise (1)
under what circumstances may a worker send a vote to abort?
[A] in response to a duplicate PREPARE message after replying to the first with a vote to commit
[B] after rebooting after a crash, if its log indicates it previously decided to vote to abort, but did not receive any decisions from the coordinator
[C] after rebooting after a crash, if its log indicates it previously decided to vote to commit, but did not receive any decisions from the coordinator
[D] after sending a vote to commit, but detecting that the coordinator crashed and has been down for a very long time

exercise (2)
under what circumstances may a coordinator send a decision to abort?
[A] when rebooting after a crash, after having last sent a request to vote to all but one worker and receiving votes to commit from all workers contacted
[B] when rebooting after a crash, when the log indicates that the last thing the coordinator did was deciding to commit but the log doesn’t indicate that any workers were contacted
[C] after successfully sending a request for a vote to a worker, but not receiving the reply due to a network problem

two-phase commit: blocking
- agree to commit “add student to class”?
- can’t allow conflicting actions…
  - adding student to conflicting class?
  - removing student from the class?
  - not leaving seat in class?
- …until know transaction globally committed/aborted

waiting forever?
- if machine goes away at wrong time, might never decide what happens
- solution in practice: manual intervention

reasoning about protocols: state machines
- very hard to reason about dist. protocol correctness
- typical tool: state machine
  - each machine is in some state
  - know what every message does in this state
- avoids common problem: don’t know what message does

coordinator state machine (simplified?)
states: INIT, WAITING, COMMITTED, ABORTED
- INIT → WAITING: send PREPARE to all
- WAITING: accumulate votes; after timeout/failure resend PREPARE
- WAITING → COMMITTED: receive AGREE-TO-COMMIT from all; send COMMIT
- WAITING → ABORTED: receive any AGREE-TO-ABORT or no reply from worker; send ABORT
- COMMITTED: resend COMMIT if needed
- ABORTED: resend ABORT if needed

coordinator failure recovery
- duplicate messages okay: unique transaction ID!
- coordinator crashes? log indicating last state → resend last message
  - or, if allowed, maybe send ABORT
    - haven’t sent commit? can abort instead (simpler?)
  - worst case: log written, but message not sent
  - workers need to handle duplicate messages!
- worker doesn’t get COMMIT/ABORT?
  - normal strategy: wait for timeout, then resend
  - in assignment: worker sends acknowledgment; arrange retry if no ack
    - using gRPC, so have return value from “COMMIT” RPC
    - in assignment, errors detected only at coordinator
    - assignment: you throw exception; we’ll restart (easier testing)
  - other option: worker asks again after timeout
    - coordinators need to handle duplicate replies!
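The "log indicating last state → resend last message" rule can be sketched as a lookup from the logged state to the message a rebooted coordinator sends. The function name `recovery_message` and the `abort_if_undecided` flag are hypothetical; the flag models the "or, if allowed, maybe send ABORT" alternative.

```python
def recovery_message(last_logged_state, abort_if_undecided=False):
    """Pick the message a rebooted coordinator (re)sends to all workers.

    Workers must tolerate duplicates, because the log may have been written
    without the matching message ever leaving the machine.
    """
    if last_logged_state in (None, "INIT"):
        return None                       # nothing was promised or decided
    if last_logged_state == "WAITING":
        # No decision logged yet: ask again, or give up and abort.
        return "ABORT" if abort_if_undecided else "PREPARE"
    if last_logged_state == "COMMITTED":
        return "COMMIT"                   # a logged decision is never taken back
    if last_logged_state == "ABORTED":
        return "ABORT"
    raise ValueError(f"unknown logged state: {last_logged_state!r}")


for state in (None, "WAITING", "COMMITTED", "ABORTED"):
    print(state, "->", recovery_message(state))
print("WAITING (abort allowed) ->", recovery_message("WAITING", abort_if_undecided=True))
```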

coordinator state machine (less simplified?)
states: INIT, WAITING, COMMITTED, ABORTED
- INIT → WAITING: send PREPARE to all
- WAITING, vote: store + tally
- WAITING, failure/timeout: resend PREPARE (or send ABORT)
- WAITING → COMMITTED: receive AGREE-TO-COMMIT from all; send COMMIT
- WAITING → ABORTED: receive any AGREE-TO-ABORT; send ABORT
- COMMITTED, vote/failure/timeout: resend COMMIT
- ABORTED, vote/failure/timeout: resend ABORT
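The coordinator transitions can be sketched as a small class. This is a minimal illustration under several assumptions: messages are appended to an in-memory `outbox` instead of being sent, logging is omitted, and on timeout PREPARE is resent only to workers whose vote is missing (the slide just says "resend PREPARE"). The class and method names are hypothetical.

```python
class Coordinator:
    def __init__(self, workers):
        self.workers = list(workers)
        self.state = "INIT"
        self.votes = {}    # worker name -> vote; a duplicate vote overwrites itself
        self.outbox = []   # (worker, message) pairs standing in for real sends

    def _send_all(self, message):
        for w in self.workers:
            self.outbox.append((w, message))

    def start(self):
        assert self.state == "INIT"
        self.state = "WAITING"
        self._send_all("PREPARE")

    def on_vote(self, worker, vote):
        if self.state == "COMMITTED":
            self.outbox.append((worker, "COMMIT"))   # late/duplicate vote: resend outcome
        elif self.state == "ABORTED":
            self.outbox.append((worker, "ABORT"))
        elif self.state == "WAITING":
            self.votes[worker] = vote                # store + tally
            if vote == "AGREE-TO-ABORT":
                self.state = "ABORTED"
                self._send_all("ABORT")
            elif all(self.votes.get(w) == "AGREE-TO-COMMIT" for w in self.workers):
                self.state = "COMMITTED"
                self._send_all("COMMIT")

    def on_timeout(self):
        if self.state == "WAITING":
            for w in self.workers:
                if w not in self.votes:
                    self.outbox.append((w, "PREPARE"))
        elif self.state == "COMMITTED":
            self._send_all("COMMIT")
        elif self.state == "ABORTED":
            self._send_all("ABORT")


c = Coordinator(["worker1", "worker2"])
c.start()
c.on_vote("worker1", "AGREE-TO-COMMIT")
c.on_timeout()                            # worker2 has not answered yet
c.on_vote("worker2", "AGREE-TO-COMMIT")
print(c.state)
print(c.outbox)
```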

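worker state machine (simplified)
states: INIT, AGREED-TO-COMMIT, COMMITTED, ABORTED
- INIT → AGREED-TO-COMMIT: recv PREPARE; send AGREE-TO-COMMIT
- INIT → ABORTED: recv PREPARE; send AGREE-TO-ABORT
- AGREED-TO-COMMIT → COMMITTED: recv COMMIT
- AGREED-TO-COMMIT → ABORTED: recv ABORT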

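worker state machine (less simplified?)
states: INIT, AGREED-TO-COMMIT, COMMITTED, ABORTED
- INIT → AGREED-TO-COMMIT: recv PREPARE; send AGREE-TO-COMMIT
- INIT → ABORTED: recv PREPARE; send AGREE-TO-ABORT
- AGREED-TO-COMMIT: recv PREPARE; resend AGREE-TO-COMMIT
- ABORTED: recv PREPARE; (re)send AGREE-TO-ABORT
- AGREED-TO-COMMIT → COMMITTED: recv COMMIT
- AGREED-TO-COMMIT → ABORTED: recv ABORT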
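A worker that tolerates duplicate PREPARE messages might look like the following sketch. The class name, the `can_commit` callback (standing in for a local check such as "class is not full"), and the in-memory `log` list are assumptions. The handling of ABORT in INIT follows the "missing details" slide later in the deck; a PREPARE received in COMMITTED is left unanswered here because the slides leave that case open.

```python
class Worker:
    def __init__(self, can_commit):
        self.can_commit = can_commit   # hypothetical local check
        self.state = "INIT"
        self.log = []                  # stands in for a persistent log

    def _enter(self, state):
        self.log.append(state)         # record the decision before acting on it
        self.state = state

    def on_prepare(self):
        """Handle PREPARE; return the vote to send back (or None)."""
        if self.state == "INIT":
            self._enter("AGREED-TO-COMMIT" if self.can_commit() else "ABORTED")
        if self.state == "AGREED-TO-COMMIT":
            return "AGREE-TO-COMMIT"   # first vote, or a resend for a duplicate
        if self.state == "ABORTED":
            return "AGREE-TO-ABORT"
        return None                    # COMMITTED: not specified by the slides

    def on_decision(self, decision):
        """Handle COMMIT or ABORT from the coordinator."""
        if decision == "COMMIT" and self.state == "AGREED-TO-COMMIT":
            self._enter("COMMITTED")
        elif decision == "ABORT" and self.state in ("INIT", "AGREED-TO-COMMIT"):
            self._enter("ABORTED")
        # A duplicate of a decision already applied changes nothing.


w = Worker(can_commit=lambda: True)
print(w.on_prepare())    # votes to commit
print(w.on_prepare())    # duplicate PREPARE: same vote again, no new log entry
w.on_decision("COMMIT")
w.on_decision("COMMIT")  # duplicate COMMIT is harmless
print(w.state, w.log)
```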

worker failure recovery
- worker crashes? log indicating last state
  - log written before acting on that state
- if INIT: wait for PREPARE (resent)?
- if AGREED-TO-COMMIT or ABORTED: resend AGREE-TO-COMMIT/ABORT
- if COMMITTED: redo operation (just like redo logging)
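The recovery cases above can be sketched as one function driven by the last logged state. The names `recover`, `send_vote`, and `redo_operation` are hypothetical, and the sketch assumes the redone operation is idempotent (as with redo logging), so repeating it after several crashes is safe.

```python
def recover(last_logged_state, send_vote, redo_operation):
    """Decide what a rebooted worker does, based on the last state in its log."""
    if last_logged_state in (None, "INIT"):
        return "wait for PREPARE"        # nothing was promised before the crash
    if last_logged_state == "AGREED-TO-COMMIT":
        send_vote("AGREE-TO-COMMIT")     # the promise still stands
        return "resent vote"
    if last_logged_state == "ABORTED":
        send_vote("AGREE-TO-ABORT")
        return "resent vote"
    if last_logged_state == "COMMITTED":
        redo_operation()                 # must be safe to repeat
        return "redid operation"
    raise ValueError(f"unknown logged state: {last_logged_state!r}")


sent = []
value = {"current": None}


def set_value():
    value["current"] = "new"   # idempotent: repeating it changes nothing


print(recover("AGREED-TO-COMMIT", sent.append, set_value))
print(recover("COMMITTED", sent.append, set_value))
print(recover("COMMITTED", sent.append, set_value))   # a second reboot is fine
print(sent, value)
```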

state machine missing details
- really want to specify result of/action for every message!
  - worker recv ABORT in ABORTED: do nothing
  - worker recv ABORT in INIT: go to ABORTED
  - worker recv PREPARE in COMMITTED: ignore?
  - …
- everything specified: machine checkable?
- want to discard finished transactions eventually
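One simple form of "machine checkable" is to write the worker's transitions as a table and enumerate every (state, message) pair that has no entry. The table below contains only the cases the slides spell out; the helper name `unspecified_pairs` is hypothetical, and the pairs it reports are deliberately left unfilled rather than guessed.

```python
from itertools import product

STATES = ["INIT", "AGREED-TO-COMMIT", "COMMITTED", "ABORTED"]
MESSAGES = ["PREPARE", "COMMIT", "ABORT"]

# (state, message) -> action, taken from the worker state machine slides.
TRANSITIONS = {
    ("INIT", "PREPARE"): "vote: send AGREE-TO-COMMIT or AGREE-TO-ABORT",
    ("INIT", "ABORT"): "go to ABORTED",
    ("AGREED-TO-COMMIT", "PREPARE"): "resend AGREE-TO-COMMIT",
    ("AGREED-TO-COMMIT", "COMMIT"): "go to COMMITTED",
    ("AGREED-TO-COMMIT", "ABORT"): "go to ABORTED",
    ("COMMITTED", "PREPARE"): "ignore?",
    ("ABORTED", "PREPARE"): "(re)send AGREE-TO-ABORT",
    ("ABORTED", "ABORT"): "do nothing",
}


def unspecified_pairs(states, messages, transitions):
    """Return every (state, message) pair the table says nothing about."""
    return [pair for pair in product(states, messages) if pair not in transitions]


for state, message in unspecified_pairs(STATES, MESSAGES, TRANSITIONS):
    print(f"unspecified: recv {message} in {state}")
```

Each reported pair is a decision the protocol designer still has to make, for example whether a COMMIT arriving in ABORTED is an error or something to ignore.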

worker failure during prepare
- worker failure after prepare without sending vote?
- option 1: coordinator retries prepare
- option 2: coordinator gives up, sends abort
- option 3: worker resends vote proactively

TPC: worker fails after prepare (1a)
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE (twice), AGREE-TO-COMMIT (twice), COMMIT)
- worker on reboot: didn’t record transaction
- coordinator timeout; guess: message lost or worker broke
  - (assignment: coordinator crashes, testing code reboots)
- after timeout: coordinator resends, as if never received

TPC: worker fails after prepare (1b)
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE (twice), AGREE-TO-COMMIT (twice), COMMIT)
- recorded in log: agree-to-commit
- worker on reboot: read log
- coordinator timeout; guess: message lost or worker broke
  - (assignment: coordinator crashes, testing code reboots)
- after timeout: coordinator resends; not sure whether decision received

TPC: worker fails after prepare (2)
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE, AGREE-TO-COMMIT, ABORT)
- worker didn’t have time to log response?
- coordinator gives up and aborts
- doesn’t care about worker 2’s vote anymore

TPC: worker fails after prepare (3)
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE, AGREE-TO-COMMIT (twice), COMMIT)
- worker: record agree-to-commit
- on reboot: can proactively resend vote

network failure during voting
- network failure during voting?
- same options:
  - coordinator resends PREPARE
  - coordinator gives up
  - worker resends vote
- network failure during voting ≈ node failure

TPC: network failure (1)
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE, AGREE-TO-COMMIT, ABORT)

worker failure during commit
- worker failure during commit?
- option 1: coordinator resends outcome somehow?
  - requires acknowledgements from worker
  - required for assignment
- option 2: worker resends vote (coordinator resends outcome)
- NB: coordinator cannot give up

aside: worker ACKs
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE, AGREE-TO-COMMIT, COMMIT, ack-commit)
- assignment: worker sends response from COMMIT
  - (no extra work: Commit is RPC call with return value)
- if not received, coordinator knows something wrong

coordinator resend automatically
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE, AGREE-TO-COMMIT, COMMIT, COMMIT (resent))
- could detect missing ACK and resend
- but how many times to retry? how long to wait?
- would complicate testing

TPC: worker revoting
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE, AGREE-TO-COMMIT (twice), COMMIT (twice))
- worker: record agree-to-commit
- on reboot: resend vote
- coordinator resends decision

two-phase commit assignment
- store single value across workers
- single coordinator sends messages to/from workers to change values
- workers’ current value can be queried directly
- goal: several replicas all have same value or unavailable
  - …even if failures

assignment: RPC
- coordinator talks to worker by making RPC calls
- workers only talk to coordinator by replying to RPC
  - example: make “prepare” call, worker’s “agree-to-X” is return value
- RPC system detects worker being down, network errors, etc.
  - become Python exception in coordinator
- coordinator verifies Commit/Abort received instead of worker asking again
  - automatic: Commit/Abort message is RPC call; RPC call fails if problem

assignment: failure recovery
- to simplify assignment: always return error if you detect failure
- assume testing code/user will restart the coordinator+workers
- coordinator sends messages to workers on reboot to recover
  - resend prepare or commit, abort, etc.

assignment: failure types
- send RPC and it gets lost
- it gets sent, but acknowledgment/reply is lost
- it gets sent, but delayed until after another RPC

TPC: reordering
(diagram: coordinator, worker 1, worker 2; messages shown: PREPARE id=0, PREPARE id=0 (resent), AGREE-TO-COMMIT, COMMIT id=0, PREPARE id=1)
- first prepare message didn’t get to worker 2
- solution: resent later (timeout or coordinator recovery)
- but maybe prepare wasn’t really lost…
- problem: need to know this is an old message
- one solution: unique/increasing ID numbers

message reordering and assignment
- assignment: you need to worry about reordering
- connections prevent reordering, but…
  - RPC system doesn’t prevent it: can use multiple connections
- problem: old request seems to fail, but is actually slow
  - you repeat old request again
  - later on slow old request reaches machine → must be ignored!
- solution: sequence numbers or transaction IDs and/or timestamps
  - some way to tell “this is old”
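A minimal way to tell "this is old" is for each worker to remember the newest transaction ID it has seen and drop anything smaller. The sketch below assumes IDs are integers that only increase, as in the reordering slide; the class name `StaleFilter` is hypothetical. A repeat of the current ID is accepted here and left to the state machine's duplicate handling.

```python
class StaleFilter:
    """Rejects messages whose ID is older than the newest ID already seen."""

    def __init__(self):
        self.newest_id = -1   # assumes real IDs start at 0 and only increase

    def accept(self, msg_id):
        if msg_id < self.newest_id:
            return False      # slow old request finally arrived: ignore it
        self.newest_id = msg_id
        return True


f = StaleFilter()
arrivals = [
    ("PREPARE", 0),   # resent copy of the first PREPARE
    ("COMMIT", 0),
    ("PREPARE", 1),   # next transaction
    ("PREPARE", 0),   # the original, delayed PREPARE shows up late
]
for kind, msg_id in arrivals:
    verdict = "handle" if f.accept(msg_id) else "ignore (old)"
    print(f"{kind} id={msg_id}: {verdict}")
```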

extending voting
- two-phase commit: unanimous vote to commit
  - assumption: data split across nodes, every node must cooperate
- other model: every node has a copy of data
- goal: work (including updates!) despite a few failing nodes
- just require “enough” nodes to be working
- for now: assume fail-stop
  - nodes don’t respond or tell you if broken

backup slides
