  1. Commitment and Mutual Exclusion
     CS 188 Distributed Systems
     February 18, 2015
     Lecture 11, CS 188, Winter 2015

  2. Introduction
     • Many distributed systems require that participants agree on something
        – On changes to important data
        – On the status of a computation
        – On what to do next
     • Reaching agreement in a general distributed system is challenging

  3. Commitment
     • Reaching agreement in a distributed system is extremely important
     • It is usually impossible to control a system’s behavior without agreement
     • One approach to agreement is to get all participants to prepare to agree
     • Then, once prepared, to take the action

  4. Challenges to Commitment
     • There are challenges to ensuring that commitment occurs
     • Different nodes’ actions aren’t synchronous
     • Communication only via messages
     • Other actions can intervene
     • Failures can occur

  5. For Example,
     • An optimistically replicated file system like Ficus
     • We want to be able to add replicas of a volume
     • Which is a lot easier to do if all nodes hosting existing replicas agree

  6. The Scenario
     [Figure: replicas A, B, and C each hold version vector (3, 7, 5). A new node D asks B, "I want a new replica, too!" But we need a version vector element for the new replica, starting at 0.]

  7. So What’s the Problem?
     • A and C don’t know about the new replica
        – But they can learn about it as soon as they contact B
     • So why is there any difficulty?

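  8. One Problem
     [Figure: A, B, and C hold version vector (3, 7, 5). New replicas D and E are created independently, and each gets a fourth vector element starting at 0. "Now for some updates!" D and E each make a different update, and both vectors become (3, 7, 5, 1): different updates, same version vector.]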
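To make the collision concrete, here is a small Python sketch. It assumes, hypothetically, that a version vector is a plain list of per-replica update counters and that a new replica takes the next unused index from whichever existing replica sponsors it; `add_replica` and `record_update` are illustrative names, not Ficus code.

```python
# Hypothetical model, not Ficus's actual implementation: a version vector is
# a list of per-replica update counters, and a new replica takes the next
# unused index from whichever existing replica it contacts.

def add_replica(sponsor_vector):
    """Create a vector for a new replica: copy the sponsor's, add a 0 slot."""
    return sponsor_vector + [0]

def record_update(vector, my_index):
    """Return a copy of the vector with this replica's own counter bumped."""
    new = list(vector)
    new[my_index] += 1
    return new

base = [3, 7, 5]  # A, B, and C all hold (3, 7, 5)

# D joins through B and E joins through C at about the same time. Neither
# sponsor knows about the other newcomer, so both pick index 3.
d_index = e_index = len(base)
d_vector = add_replica(base)
e_vector = add_replica(base)

# Each new replica then makes a different update to its copy of the file.
d_contents = "update made at D"
e_contents = "update made at E"
d_vector = record_update(d_vector, d_index)
e_vector = record_update(e_vector, e_index)

print(d_vector, e_vector)        # [3, 7, 5, 1] [3, 7, 5, 1]
print(d_vector == e_vector)      # True, although...
print(d_contents == e_contents)  # False: the contents differ
```

Because the two vectors compare equal, any reconciliation step that trusts version vectors would treat the two copies as identical even though they hold different updates.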

  9. And It Can Be a Lot Worse
     • What if replicas are being added and dropped frequently?
     • How will we keep track of which ones are live and which ones are which?
     • It can get very confusing

  10. But That’s Not What I Want To Do, Anyway
     • A common answer from system designers
     • They don’t care about the odd corner cases
     • They don’t expect them to happen
     • So why pay a lot to handle them right?
     • Sometimes a reasonable answer . . .

  11. Why You Should Care
     • If you allow a system to behave a certain way
        – Even if you don’t think it ever will
     • And your system is widely deployed and used
     • Sooner or later that improbable thing will happen
     • And who knows what happens next?

  12. The Basic Solution
     • Use a commitment protocol
     • To ensure that all participating nodes understand what’s happening
     • And agree to it
     • Handles issues of concurrency and failures

  13. Transactions
     • A mechanism to achieve commitment
     • By ensuring atomicity
        – Also consistency, isolation, and durability
     • Very important in the database community
     • A set of asynchronous request/reply communications
     • Either all of the set complete or none do

  14. Transactions and ACID Properties
     • ACID: Atomicity, Consistency, Isolation, and Durability
     • Atomicity: all happen or none
     • Consistency: outcome equivalent to some serial ordering of actions
     • Isolation: partial results are invisible outside the transaction
     • Durability: committed transactions survive crashes and other failures
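As a single-machine illustration of atomicity and isolation (not durability, and not the lecture's own code), the Python sketch below buffers a transaction's writes in a private workspace and installs them only at commit. The `Transaction` class and its methods are hypothetical names; the sketch assumes one process, no concurrent transactions, and no crashes.

```python
class Transaction:
    """Buffer writes in a private workspace; install all of them or none."""

    def __init__(self, store):
        self.store = store  # shared dict that other code can read
        self.pending = {}   # private workspace, invisible until commit
        self.finished = False

    def read(self, key):
        # A transaction sees its own uncommitted writes first.
        if key in self.pending:
            return self.pending[key]
        return self.store[key]

    def write(self, key, value):
        if self.finished:
            raise RuntimeError("transaction already committed or aborted")
        self.pending[key] = value

    def commit(self):
        if self.finished:
            raise RuntimeError("transaction already committed or aborted")
        self.store.update(self.pending)  # install every buffered write
        self.finished = True

    def abort(self):
        if self.finished:
            raise RuntimeError("transaction already committed or aborted")
        self.pending.clear()  # discard every buffered write
        self.finished = True


accounts = {"alice": 100, "bob": 50}

transfer = Transaction(accounts)
transfer.write("alice", transfer.read("alice") - 30)
transfer.write("bob", transfer.read("bob") + 30)
print(accounts)  # {'alice': 100, 'bob': 50}: partial results are invisible
transfer.commit()
print(accounts)  # {'alice': 70, 'bob': 80}

bad = Transaction(accounts)
bad.write("alice", 0)
bad.abort()
print(accounts)  # {'alice': 70, 'bob': 80}: none of the aborted writes appear
```

Durability would additionally require forcing the buffered writes to stable storage before installing them, which this in-memory sketch does not attempt.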

  15. Achieving the ACID Properties
     • In a distributed environment, use the two-phase commit protocol
     • A unanimous voting protocol
        – Do something only if all participants agree it should be done
     • Essentially, hold on to the results of a transaction until all participants agree
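The unanimity rule itself is tiny. In this hypothetical Python helper, `decide` and the vote encoding are assumptions for illustration; a missing vote is treated as a no, which matches how the failure examples later in the lecture abort when a participant never answers.

```python
def decide(votes, participants):
    """Unanimous voting: commit only if every participant voted yes.

    votes maps participant name -> "yes" or "no"; a participant whose vote
    never arrived (for example, because it crashed) is treated as "no".
    """
    if all(votes.get(p) == "yes" for p in participants):
        return "commit"
    return "abort"


participants = ["node2", "node3", "node4"]
print(decide({"node2": "yes", "node3": "yes", "node4": "yes"}, participants))  # commit
print(decide({"node2": "yes", "node3": "no", "node4": "yes"}, participants))   # abort
print(decide({"node2": "yes", "node3": "yes"}, participants))                  # abort: node4 silent
```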

  16. Basics of Two-Phase Commit
     • Run at the end of all application actions in a transaction
     • Must end in a commit or abort decision
     • Must work despite delays and failures
     • Requires access to stable storage
     • Usually started by a coordinator
        – But the coordinator has no more power than any other participant

  17. The Two Phases
     • Phase one: prepare to commit
        – All participants are informed that they should get ready to commit
        – All agree to do so
     • Phase two: commitment
        – Actually commit all effects of the transaction

  18. Outline of Two-Phase Commit Protocol
     1. The coordinator writes prepare to its local stable log
     2. The coordinator sends a prepare message to all other participants
     3. Each participant either prepares or aborts, writing its choice to its local log
     4. Each participant sends its choice to the coordinator

  19. The Two-Phase Commit Protocol, continued
     5. The coordinator collects all votes
     6. If all participants vote to commit, the coordinator writes commit to its log
     7. If any participant votes to abort, the coordinator writes abort to its log
     8. The coordinator sends its decision to all other participants

  20. The Two-Phase Commit Protocol, concluded
     9. If the other participants receive a commit message, they write commit to their logs and release transaction resources
     10. If the other participants receive an abort message, they write abort to their logs and release transaction resources
     11. Each participant returns an acknowledgement to the coordinator
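Steps 1 through 11 can be traced in a single-process Python sketch. Message passing is replaced by ordinary method calls, each node's stable log is modeled as an in-memory list, and there are no failures or timeouts, so this shows only the failure-free control flow. `Participant`, `can_prepare`, and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """A two-phase commit participant; `log` stands in for stable storage."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # hypothetical: will local work succeed?
        self.log = []
        self.holding_resources = False

    def on_prepare(self):
        # Step 3: prepare or abort, logging the choice before replying.
        if self.can_prepare:
            self.log.append("prepared")
            self.holding_resources = True  # locks, tentative writes, etc.
            return "prepared"              # Step 4: send the choice
        self.log.append("abort")
        return "abort"

    def on_decision(self, decision):
        # Steps 9 and 10: log the outcome (once), then release resources.
        if self.log[-1] != decision:
            self.log.append(decision)
        self.holding_resources = False
        return "ack"                       # Step 11


def two_phase_commit(coordinator_log, participants):
    coordinator_log.append("prepare")                       # Step 1
    votes = [p.on_prepare() for p in participants]          # Steps 2 to 4
    # Steps 5 to 7: collect votes; commit only if all said "prepared".
    decision = "commit" if all(v == "prepared" for v in votes) else "abort"
    coordinator_log.append(decision)
    acks = [p.on_decision(decision) for p in participants]  # Steps 8 to 11
    assert acks == ["ack"] * len(participants)
    return decision


log = []
nodes = [Participant("node2"), Participant("node3"), Participant("node4")]
print(two_phase_commit(log, nodes))  # commit
print(log)                           # ['prepare', 'commit']

log2 = []
nodes2 = [Participant("node2"), Participant("node3"),
          Participant("node4", can_prepare=False)]
print(two_phase_commit(log2, nodes2))  # abort
print([n.log for n in nodes2])  # [['prepared', 'abort'], ['prepared', 'abort'], ['abort']]
```

Note that a participant logs its vote before replying; that ordering is what lets a recovering node later discover it had promised to prepare.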

  21. A Two-Phase Commit Example
     [Figure: Node 1 is the coordinator; Nodes 2, 3, and 4 are participants. Phase 1: Node 1 sends prepare to each participant, and each replies prepared. All voted yes! Phase 2: Node 1 sends commit to each participant, and each replies committed.]

  22. What About the Abort Case?
     • Same as commit, except not everyone voted yes
     • Instead of committing, send aborts
        – And abort locally at the coordinator
     • On receipt of an abort message, undo everything
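The slide leaves open how a participant "undoes everything." One common technique, assumed here rather than taken from the lecture, is an undo log: apply writes in place but first record each overwritten value. The hypothetical `UndoLogStore` below is in-memory only; a real participant would force undo records to stable storage before the writes they protect, and would hold locks so others cannot see the in-place writes.

```python
class UndoLogStore:
    """Apply writes in place, remembering old values so abort can undo them."""

    _MISSING = object()  # marks keys that did not exist before a write

    def __init__(self, data):
        self.data = data
        self.undo_log = []  # (key, previous value) pairs in write order

    def write(self, key, value):
        self.undo_log.append((key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def commit(self):
        self.undo_log.clear()  # the writes stay; nothing left to undo

    def abort(self):
        # Undo newest-first so each key ends at its value before the transaction.
        while self.undo_log:
            key, previous = self.undo_log.pop()
            if previous is self._MISSING:
                del self.data[key]
            else:
                self.data[key] = previous


store = UndoLogStore({"x": 1})
store.write("x", 2)
store.write("y", 3)
store.write("x", 4)
print(store.data)  # {'x': 4, 'y': 3}
store.abort()
print(store.data)  # {'x': 1}
```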

  23. Overheads of Two-Phase Commit
     • For n participants, 4*(n-1) messages
        – Each participant (except the coordinator) gets a prepare and a commit message
        – Each participant (except the coordinator) sends a prepared and a committed message
     • Can optimize committed messages away
        – At the potential cost of serious latencies in clearing log records
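The count follows from tallying the messages exchanged with each non-coordinator node. This Python helper (a hypothetical name) does the tally. With four nodes, as in the example figure, it gives 12 messages, or 9 (that is, 3*(n-1)) if the committed acknowledgements are optimized away.

```python
def two_pc_messages(n, committed_acks=True):
    """Messages in one 2PC round among n nodes, one of which is the coordinator."""
    per_participant = ["prepare", "prepared", "commit"]
    if committed_acks:
        per_participant.append("committed")
    return (n - 1) * len(per_participant)


for n in (2, 4, 10):
    print(n, two_pc_messages(n), two_pc_messages(n, committed_acks=False))
# 2 4 3
# 4 12 9
# 10 36 27
```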

  24. Two-Phase Commit and Failures
     • Two-phase commit behaves well in the face of all single node failures
        – May not be able to commit
        – But will cleanly commit or abort
        – And, if anyone commits, eventually everyone will
     • Assumes fail-stop failures

  25. Some Failure Examples: Example 1
     [Figure: Node 1 (the coordinator) fails after sending prepare; not all participants get the prepare. On timeout, Nodes 2, 3, and 4 consult one another and abort.]

  26. Some Failure Examples: Example 2
     [Figure: Node 1 (the coordinator) sends prepare to Nodes 2, 3, and 4. Node 4, another participant, fails before replying to prepare. Node 1 never gets a response from Node 4, so it sends abort.]

  27. Some Failure Examples: Example 3
     [Figure: Node 4, another participant, fails after replying prepared. All voted yes! Node 1 sends commit but never gets the committed message from Node 4. What happens if Node 4 recovers? Node 4 consults its log and notices it was prepared, so it queries Node 1 for the commit status and then commits.]
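The three examples follow one rule about what a node may decide on its own. This hypothetical Python function `resolve` encodes that rule under the lecture's fail-stop assumption; the report format is invented for the sketch. The key case is `prepared`: a node that has voted yes may not decide by itself and must learn the outcome from another node.

```python
def resolve(my_log, reports):
    """What a participant does after a timeout, or when it recovers.

    my_log:  this node's stable-log entries, oldest first.
    reports: outcomes reported by reachable nodes (coordinator or peers),
             each "commit", "abort", or "prepared". A node that never
             received prepare aborts unilaterally, so it reports "abort".
    Returns "commit", "abort", or "wait".
    """
    last = my_log[-1] if my_log else None
    if last in ("commit", "abort"):
        return last          # the outcome is already recorded locally
    if last is None:
        return "abort"       # never voted yes, so nobody can have committed
    # last == "prepared": this node promised to obey the decision, so it
    # must not choose alone; it learns the outcome from someone else.
    if "commit" in reports:
        return "commit"
    if "abort" in reports:
        return "abort"
    return "wait"            # everyone reachable is merely prepared


# Example 1: the coordinator died mid-prepare; this node never got prepare.
print(resolve([], ["abort"]))                           # abort
# Example 3: Node 4 recovers, finds "prepared", and asks Node 1.
print(resolve(["prepared"], ["commit"]))                # commit
# Nobody reachable knows the outcome yet.
print(resolve(["prepared"], ["prepared", "prepared"]))  # wait
```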

  28. Handling Failures
     • Non-failed nodes still recover if some participants failed
     • The coordinator can determine what other nodes did
        – Did we commit or did we not?
     • If the coordinator failed, a new coordinator can be elected
        – And can determine the state of the commit
        – Except . . .

  29. An Issue With Two-Phase Commit
     • What if both the coordinator and another node fail?
        – During the commit phase
     • Two possibilities
        1. The other failed node committed
        2. The other failed node did not commit

  30. Possibility 1
     [Figure: Nodes 1 through 4 after the prepare phase; the coordinator (Node 1) sends commit, and Node 2 commits before both fail.]

  31. Possibility 2
     [Figure: the same exchange, but Node 2 fails without having committed.]

  32. What Do the Other Nodes Do?
     • Here’s what they see, in both cases:
       [Figure: surviving Nodes 3 and 4 have exchanged prepare messages and are prepared, but neither has seen a commit.]
     • But what happened at the failed nodes, Node 1 and Node 2?
        – This? (Node 2 committed)
        – Or this? (Node 2 did not commit)

  33. Why Does It Matter?
     • Well, why?
     • Consider, for each case, what would have happened if node 2 hadn’t failed
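One way to answer the question is to compare what the survivors can inspect in each case. The Python sketch below writes down hypothetical stable-log contents for both possibilities; for possibility 2 it assumes Node 2 had voted to abort, which is one way it could have ended up not committing. `survivors_view` and `true_outcome` are illustrative names.

```python
# Hypothetical stable-log contents at the moment Node 1 (the coordinator)
# and Node 2 both fail. In possibility 2 we assume Node 2 had voted to abort.
possibility_1 = {
    "node1": ["prepare", "commit"],
    "node2": ["prepared", "commit"],
    "node3": ["prepared"],
    "node4": ["prepared"],
}
possibility_2 = {
    "node1": ["prepare", "abort"],
    "node2": ["abort"],
    "node3": ["prepared"],
    "node4": ["prepared"],
}
FAILED = {"node1", "node2"}


def survivors_view(logs):
    """Everything the surviving nodes can inspect."""
    return {node: log for node, log in logs.items() if node not in FAILED}


def true_outcome(logs):
    """The decision the coordinator actually recorded before failing."""
    return logs["node1"][-1]


print(survivors_view(possibility_1) == survivors_view(possibility_2))  # True
print(true_outcome(possibility_1), true_outcome(possibility_2))        # commit abort
```

Any rule the survivors apply receives the same input in both cases, so it gives the same answer, which is wrong in one of them: committing contradicts a Node 2 that voted no, and aborting contradicts a Node 2 that already committed. Under two-phase commit the only safe choice is to wait until a failed node recovers.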

  34. Handling the Problem
     • Go to three phases instead of two
     • The third phase provides the information needed to distinguish the cases
     • So if this two-node failure occurs, the other nodes can tell what happened
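The slide does not spell out the third phase. In the commonly described version of three-phase commit, which this sketch assumes, the coordinator inserts a pre-commit round between the vote and the final commit, and sends commit only after every participant acknowledges pre-commit. That extra state is what lets survivors tell the two possibilities apart. The state names and `three_pc_termination` are hypothetical, and the rule is only claimed to hold under fail-stop failures with no network partitions.

```python
def three_pc_termination(survivor_states):
    """Termination rule for nodes left alive in three-phase commit.

    survivor_states maps node -> one of:
      "uncertain"    voted yes, has not received pre-commit
      "precommitted" received pre-commit (so every node voted yes)
      "committed"    already committed
      "aborted"      voted no, or aborted before voting
    Assumes fail-stop nodes and no network partitions.
    """
    states = set(survivor_states.values())
    if states & {"committed", "precommitted"}:
        # Pre-commit is sent only after a unanimous yes vote, so no node
        # can have aborted; committing is safe.
        return "commit"
    # No survivor reached pre-commit. The coordinator sends the final commit
    # only after every participant has acknowledged pre-commit, so no failed
    # node can have committed either; aborting is safe.
    return "abort"


# Nodes 1 and 2 have failed; Nodes 3 and 4 must decide on their own.
print(three_pc_termination({"node3": "uncertain", "node4": "uncertain"}))     # abort
print(three_pc_termination({"node3": "precommitted", "node4": "uncertain"}))  # commit
```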
