
Virtual Synchrony: slides by Ken Birman (PDF document)



  1. Virtual synchrony
     Ken Birman

     Virtual Synchrony
     - Goal: simplify distributed systems development by emulating a simplified world, a synchronous one
     - Features of the virtual synchrony model
       - Process groups with state transfer, automated fault detection and membership reporting
       - Ordered reliable multicast, in several flavors
       - Fault-tolerance, replication tools layered on top
       - Extremely good performance

     Process groups
     - Offered as a new and fundamental programming abstraction
     - Just a set of application processes that cooperate for some purpose
     - Could replicate data, coordinate handling of incoming requests or events, perform parallel tasks, or have a shared perspective on some sort of "fact" about the system
     - Can create many of them*
     * Within limits... Many systems only had limited scalability

     Why "virtual" synchrony?
     - What would a synchronous execution look like?
     - In what ways is a "virtual" synchrony execution not the same thing?

     A synchronous execution
     [Figure: timelines for processes p, q, r, s, t, u]
     - With true synchrony, executions run in genuine lock-step.

     Virtual Synchrony at a glance
     [Figure: timelines for processes p, q, r, s, t, u]
     - With virtual synchrony, executions only look "lock step" to the application.

  2. Chances to "weaken" ordering
     - Suppose that any conflicting updates are synchronized using some form of locking
       - Multicast sender will have mutual exclusion
       - Hence simply because we used locks, cbcast delivers conflicting updates in the order they were performed!
     - If our system ever does see concurrent multicasts… they must not have conflicted. So it won't matter if cbcast delivers them in different orders at different recipients!

     Virtual Synchrony at a glance
     [Figure: timelines for processes p, q, r, s, t, u]
     - We use the weakest (least ordered, hence fastest) form of communication possible

     Causally ordered updates
     - Each thread corresponds to a different lock
     [Figure: numbered update events on the timelines of processes p, r, s, t]
     - In effect: red "events" never conflict with green ones!

     In general?
     - Replace "safe" (dynamic uniformity) with a standard multicast when possible
     - Replace abcast with cbcast
     - Replace cbcast with fbcast
     - Unless replies are needed, don't wait for replies to a multicast

     Why "virtual" synchrony?
     - The user writes code as if it will experience a purely synchronous execution
       - Simplifies the developer's task: very few cases to worry about, and all group members see the same thing at the same "time"
     - But the actual execution is rather concurrent and asynchronous
       - Maximizes performance
       - Reduces risk that lock-step execution will trigger correlated failures

     Why groups?
     - Other concurrent work, such as Lamport's state machines, treats the entire program as a deterministic entity and replicates it
     - But a group replicates state at the "abstract data type" level
       - Each group can correspond to one object
       - This is a good fit with modern styles of application development
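The slides do not say how cbcast is implemented. One common way to obtain causal delivery is to tag each multicast with a vector timestamp and hold it back until everything it depends on has been delivered. The sketch below assumes that approach; `CausalReceiver`, its fields, and the message layout are hypothetical names chosen for illustration, not part of any system named in the slides.

```python
from typing import Dict, List, Tuple

Stamp = Dict[str, int]


class CausalReceiver:
    """Holds back multicasts until they are causally deliverable (illustrative only)."""

    def __init__(self, senders: List[str]) -> None:
        # clock[s] = number of multicasts from sender s delivered so far
        self.clock: Stamp = {s: 0 for s in senders}
        self.pending: List[Tuple[str, Stamp, str]] = []
        self.delivered: List[str] = []

    def _deliverable(self, sender: str, stamp: Stamp) -> bool:
        # Must be the next multicast from this sender...
        if stamp[sender] != self.clock[sender] + 1:
            return False
        # ...and everything the sender had seen from others must already be delivered here.
        return all(stamp[s] <= self.clock[s] for s in stamp if s != sender)

    def receive(self, sender: str, stamp: Stamp, payload: str) -> None:
        self.pending.append((sender, stamp, payload))
        progress = True
        while progress:  # keep scanning: one delivery may unblock others
            progress = False
            for msg in list(self.pending):
                s, ts, body = msg
                if self._deliverable(s, ts):
                    self.pending.remove(msg)
                    self.clock[s] = ts[s]
                    self.delivered.append(body)
                    progress = True


# m2 was sent by q after q delivered p's m1, so m2 causally follows m1.
r = CausalReceiver(["p", "q"])
r.receive("q", {"p": 1, "q": 1}, "m2")  # arrives early, is held back
r.receive("p", {"p": 1, "q": 0}, "m1")  # unblocks m2
print(r.delivered)
```

Two concurrent multicasts (neither stamp dominates the other) are deliverable immediately in whatever order they arrive, which is the case the slide argues is harmless when conflicting updates are protected by locks.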

  3. Correlated failures
     - Perhaps surprisingly, experiments showed that virtual synchrony makes these less likely!
       - Recall that many programs are buggy
       - Often these are Heisenbugs (order sensitive)
     - With lock-step execution each group member sees group events in identical order
       - So all die in unison
     - With virtual synchrony orders differ
       - So an order-sensitive bug might only kill one group member!

     Programming with groups
     - Many systems just have one group
       - E.g. replicated bank servers
       - Cluster mimics one highly reliable server
     - But we can also use groups at finer granularity
       - E.g. to replicate a shared data structure
       - Now one process might belong to many groups
     - A further reason that different processes might see different inputs and event orders

     Embedding groups into "tools"
     - We can design a groups API:
       - pg_join(), pg_leave(), cbcast()…
     - But we can also use groups to build other higher level mechanisms
       - Distributed algorithms, like snapshot
       - Fault-tolerant request execution
       - Publish-subscribe

     Distributed algorithms
     - Processes that might participate join an appropriate group
     - Now the group view gives a simple leader election rule
       - Everyone sees the same members, in the same order, ranked by when they joined
       - Leader can be, e.g., the "oldest" process

     Distributed algorithms
     - A group can easily solve consensus
       - Leader multicasts: "what's your input?"
       - All reply: "Mine is 0. Mine is 1"
       - Initiator picks the most common value and multicasts that: the "decision value"
     - If the leader fails, the new leader just restarts the algorithm
     - Puzzle: Does FLP apply here?

     Distributed algorithms
     - A group can easily do a consistent snapshot algorithm
       - Either use cbcast throughout the system, or build the algorithm over gbcast
       - Two phases:
         - Start snapshot: a first cbcast
         - Finished: a second cbcast, collect process states and channel logs
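The leader election rule and the consensus steps above can be sketched in a few lines. This is a single-process simulation under stated assumptions: the group view is modeled as a list ordered oldest-first, the multicast and the replies are modeled as a dictionary lookup, and ties between equally common values are broken by taking the smaller value (the slides do not specify a tie-break). All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def elect_leader(view: List[str]) -> str:
    """The view lists members oldest-first, so the leader is simply the first entry."""
    return view[0]  # raises IndexError if the group is empty


def decide(view: List[str], inputs: Dict[str, int]) -> Tuple[str, int]:
    """Leader asks every member for its input and picks the most common value."""
    leader = elect_leader(view)
    replies = [inputs[m] for m in view]  # stands in for "what's your input?" plus replies
    counts = Counter(replies)
    # Assumed tie-break: among equally common values, take the smallest.
    decision = max(sorted(counts), key=lambda v: counts[v])
    return leader, decision


def decide_after_failure(view: List[str], inputs: Dict[str, int],
                         failed: Set[str]) -> Tuple[str, int]:
    """Members in `failed` drop out of the view; the new leader restarts the algorithm."""
    new_view = [m for m in view if m not in failed]
    return decide(new_view, inputs)


view = ["p", "q", "r"]  # p joined first
inputs = {"p": 0, "q": 1, "r": 1}
print(decide(view, inputs))
print(decide_after_failure(view, inputs, {"p"}))
```

Because every member sees the same view in the same order, each one computes the same leader without exchanging any extra messages; the simulation does not model message loss or timing, so it says nothing about the FLP puzzle raised in the slide.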

  4. More tools: fault-tolerance
     - Suppose that we want to offer clients "fault-tolerant request execution"
       - We can replace a traditional service with a group of members
       - Each request is assigned to a primary (ideally, spread the work around) and a backup
         - Primary sends a "cc" of the response to the request to the backup
       - Backup keeps a copy of the request and steps in only if the primary crashes before replying
     - Sometimes called "coordinator/cohort" just to distinguish from "primary/backup"

     Distributed algorithms: Summary
     - Leader election
     - Consensus and other forms of agreement like voting
     - Snapshots, hence deadlock detection, auditing, load balancing

     Coordinator-cohort
     [Figure: timelines for processes p, q, r, s, t, u]
     - Q assigned as coordinator for t's request…. But p takes over if q fails

     Coordinator-cohort
     [Figure: timelines for processes p, q, r, s, t, u]
     - P picked to perform u's request. Q stands by until it sees request completion message

     Parallel processing
     [Figure: timelines for processes p, q, r, s, t]
     - P and Q split a task. P performs part 1 of 2; Q performs part 2 of 2. Such as searching a large database… they agree on the initial state in which the request was received

     Parallel processing
     [Figure: timelines for processes p, q, r, s, t]
     - In this example, r is the cohort and both p and q function as coordinators. If either fails, r can step in and take over its role….
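A minimal sketch of the coordinator-cohort pattern follows. The role assignment shown here (rotate the coordinator by request number, use the preceding member of the view as cohort) is an assumption made for illustration; the slides only say that work should ideally be spread around. Crashes are modeled as a set of member names, and `assign_roles` and `execute` are hypothetical helpers.

```python
from typing import Callable, List, Set, Tuple


def assign_roles(view: List[str], request_id: int) -> Tuple[str, str]:
    """Pick (coordinator, cohort) for a request; assumes at least two members."""
    n = len(view)
    coordinator = view[request_id % n]       # rotate to spread the work around
    cohort = view[(request_id - 1) % n]      # the preceding member stands by
    return coordinator, cohort


def execute(view: List[str], request_id: int, request: str,
            handler: Callable[[str], str], crashed: Set[str]) -> Tuple[str, str]:
    """Return (member that replied, reply). The cohort acts only if the coordinator crashed."""
    coordinator, cohort = assign_roles(view, request_id)
    if coordinator not in crashed:
        return coordinator, handler(request)
    if cohort not in crashed:
        # The cohort kept a copy of the request, so it can redo the work itself.
        return cohort, handler(request)
    raise RuntimeError("both coordinator and cohort failed")


view = ["p", "q", "r"]
print(execute(view, 1, "lookup x", str.upper, crashed=set()))
print(execute(view, 1, "lookup x", str.upper, crashed={"q"}))
```

With this assignment, request 1 goes to q with p standing by, which mirrors the first coordinator-cohort figure: p takes over only when q fails before replying.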

  5. Publish / Subscribe
     - Goal is to support a simple API:
       - Publish("topic", message)
       - Subscribe("topic", event_handler)
     - We can just create a group for each topic
       - Publish multicasts to the group
       - Subscribers are the members

     Parallel processing
     [Figure: timelines for processes p, q, r, s, t]
     - …. As it did in this case, when q fails

     Scalability warnings!
     - Many existing group communication systems don't scale incredibly well
       - E.g. JGroups, Isis, Horus, Ensemble, Spread
       - Group sizes limited to perhaps 50-75 members
       - And individual processes limited to joining perhaps 50-75 groups (lightweight groups an exception)
     - Overheads soar as these sizes increase
       - Each group runs protocols oblivious of the others, and this creates huge inefficiency

     Publish / Subscribe issue?
     - We could have thousands of topics!
       - Too many to directly map topics to groups
     - Instead map topics to a smaller set of groups.
       - SPREAD system calls these "lightweight" groups (idea traces to work done by Glade on Isis)
     - Mapping will result in inaccuracies… Filter incoming messages to discard any not actually destined to the receiver process
     - Cornell's new QuickSilver system instead directly supports immense numbers of groups

     Other "toolkit" ideas
     - We could embed group communication into a framework in a "transparent" way
     - Example: CORBA fault-tolerance specification does lock-step replication of deterministic components
       - The client simply can't see failures
       - But the determinism assumption is painful, and users have been unenthusiastic
       - And exposed to correlated crashes

     Other similar ideas
     - There was some work on embedding groups into programming languages
       - But many applications want to use them to link programs coded in different languages and systems
       - Hence an interesting curiosity but just a curiosity
     - Quicksilver: Transparently embeds groups into Windows
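The idea of mapping many topics onto a smaller set of groups, then filtering at the receiver, can be sketched as follows. This is an in-memory illustration, not the Spread or Isis mechanism: the class name `LightweightPubSub`, the CRC32-based topic-to-group mapping, and the delivery count returned by `publish` are all assumptions made for the example.

```python
import zlib
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set, Tuple

Handler = Callable[[str, str], None]


class LightweightPubSub:
    """Maps many topics onto a fixed number of groups and filters on receipt."""

    def __init__(self, num_groups: int) -> None:
        self.num_groups = num_groups
        self.members: DefaultDict[int, Set[str]] = defaultdict(set)
        self.handlers: DefaultDict[Tuple[str, str], List[Handler]] = defaultdict(list)

    def _group_for(self, topic: str) -> int:
        # Assumed mapping: hash the topic name into one of num_groups groups.
        return zlib.crc32(topic.encode("utf-8")) % self.num_groups

    def subscribe(self, subscriber: str, topic: str, handler: Handler) -> None:
        self.members[self._group_for(topic)].add(subscriber)
        self.handlers[(subscriber, topic)].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Multicast to the topic's group; return how many handlers actually ran."""
        ran = 0
        for subscriber in sorted(self.members[self._group_for(topic)]):
            # Receiver-side filter: members sharing the group but not the topic
            # have no handler registered for it, so the message is discarded.
            for handler in self.handlers.get((subscriber, topic), []):
                handler(topic, message)
                ran += 1
        return ran


bus = LightweightPubSub(num_groups=1)  # one group forces every topic to share it
seen: List[Tuple[str, str, str]] = []
bus.subscribe("a", "stocks", lambda t, m: seen.append(("a", t, m)))
bus.subscribe("b", "weather", lambda t, m: seen.append(("b", t, m)))
bus.publish("stocks", "IBM up")
print(seen)
```

Subscriber b shares the group with a, so it would receive the "stocks" multicast on the wire, but the filter drops it before any handler runs; that wasted delivery is the inaccuracy the slide warns about.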
