

Commitment and Mutual Exclusion
CS 188 Distributed Systems
February 18, 2015
Lecture 11, Winter 2015


Introduction

  • Many distributed systems require that participants agree on something
    – On changes to important data
    – On the status of a computation
    – On what to do next
  • Reaching agreement in a general distributed system is challenging

Commitment

  • Reaching agreement in a distributed system is extremely important
  • Usually impossible to control a system’s behavior without agreement
  • One approach to agreement is to get all participants to prepare to agree
  • Then, once prepared, to take the action

Challenges to Commitment

  • There are challenges to ensuring that commitment occurs
  • Different nodes’ actions aren’t synchronous
  • Communication only via messages
  • Other actions can intervene
  • Failures can occur

For Example,

  • An optimistically replicated file system like Ficus
  • We want to be able to add replicas of a volume
  • Which is a lot easier to do if all nodes hosting existing replicas agree

The Scenario

[Diagram: replicas A, B, and C of a volume each hold version vector ⟨3 7 5⟩. A fourth node asks, “I want a replica, too!” and receives a copy with vector ⟨3 7 5⟩; but we need a version vector element for the new replica.]

So What’s the Problem?

  • A and C don’t know about the new replica
    – But they can learn about it as soon as they contact B
  • So why is there any difficulty?

One Problem

[Diagram: replicas A, B, C (version vector ⟨3 7 5⟩) add a fourth replica; in a separate partition, replicas with vector ⟨5 7 3⟩ independently add their own fourth replica. Each group then applies updates. Result: different updates, but the same version vector.]
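The collision in the diagram can be reproduced with a toy version vector, here a fixed-length list with one counter per replica slot (an illustrative simplification, not Ficus’s actual representation):

```python
# Toy version vectors: a list with one counter per replica slot.
# Two partitioned groups each add a "fourth replica" and reuse slot 3,
# so different update histories end up with identical vectors.

def bump(vv, slot):
    """Return a copy of version vector vv with the counter for slot incremented."""
    out = list(vv)
    out[slot] += 1
    return out

base = [3, 7, 5]            # vector shared by existing replicas

# Partition 1: one group creates a new replica in slot 3, which applies an update.
left = bump(base + [0], 3)

# Partition 2: another group independently creates its own new replica
# in the same slot 3, and that replica applies a *different* update.
right = bump(base + [0], 3)

print(left == right)        # True: same vector...
# ...but the updates behind them are different, so the vectors can no
# longer tell us which replica saw which update.
```

The vectors compare equal even though the histories diverged, which is exactly why adding a replica needs agreement on who owns which slot.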

And It Can Be a Lot Worse

  • What if replicas are being added and dropped frequently?
  • How will we keep track of which ones are live and which ones are which?
  • It can get very confusing

But That’s Not What I Want To Do, Anyway

  • A common answer from system designers
  • They don’t care about the odd corner cases
  • They don’t expect them to happen
  • So why pay a lot to handle them right?
  • Sometimes a reasonable answer . . .

Why You Should Care

  • If you allow a system to behave a certain way
    – Even if you don’t think it ever will
  • And your system is widely deployed and used
  • Sooner or later that improbable thing will happen
  • And who knows what happens next?

The Basic Solution

  • Use a commitment protocol
  • To ensure that all participating nodes understand what’s happening
  • And agree to it
  • Handles issues of concurrency and failures

Transactions

  • A mechanism to achieve commitment
  • By ensuring atomicity
    – Also consistency, isolation, and durability
  • Very important in the database community
  • A set of asynchronous request/reply communications
  • Either all of the set are complete or none

Transactions and ACID Properties

  • ACID - Atomicity, Consistency, Isolation, and Durability
  • Atomicity - all happen or none
  • Consistency - outcome equivalent to some serial ordering of actions
  • Isolation - partial results are invisible outside the transaction
  • Durability - committed transactions survive crashes and other failures
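Atomicity and durability are usually implemented with a write-ahead log: record the intended changes durably, then a commit record, and on recovery apply only changes that made it past a commit. A minimal in-memory sketch (the MiniWAL class and its methods are hypothetical, not a real database API):

```python
# Minimal write-ahead-log sketch: stage updates, log them, then apply.
# If a crash happens before COMMIT reaches the log, recovery discards
# the staged changes, so either all of a transaction's updates appear or none do.

class MiniWAL:
    def __init__(self):
        self.data = {}      # committed state
        self.log = []       # append-only log records (stand-in for stable storage)
        self.staged = {}    # updates of the in-flight transaction

    def write(self, key, value):
        self.log.append(("UPDATE", key, value))
        self.staged[key] = value

    def commit(self):
        self.log.append(("COMMIT",))
        self.data.update(self.staged)   # apply all staged updates at once
        self.staged = {}

    def recover(self):
        """Crash recovery: replay only updates that are followed by a COMMIT."""
        data, staged = {}, {}
        for rec in self.log:
            if rec[0] == "UPDATE":
                staged[rec[1]] = rec[2]
            elif rec[0] == "COMMIT":
                data.update(staged)
                staged = {}
        self.data, self.staged = data, {}

db = MiniWAL()
db.write("x", 1)
db.commit()          # transaction 1 commits
db.write("y", 2)     # transaction 2 "crashes" before committing...
db.recover()
print(db.data)       # {'x': 1} -- y never appears: all or none
```

The uncommitted write to `y` vanishes on recovery, which is atomicity; keeping the log on stable storage is what would make it durable.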

Achieving the ACID Properties

  • In a distributed environment, use the two-phase commit protocol
  • A unanimous voting protocol
    – Do something only if all participants agree it should be done
  • Essentially, hold on to the results of a transaction until all participants agree

Basics of Two-Phase Commit

  • Run at the end of all application actions in a transaction
  • Must end in a commit or abort decision
  • Must work despite delays and failures
  • Requires access to stable storage
  • Usually started by a coordinator
    – But the coordinator has no more power than any other participant

The Two Phases

  • Phase one: prepare to commit
    – All participants are informed that they should get ready to commit
    – All agree to do so
  • Phase two: commitment
    – Actually commit all effects of the transaction

Outline of Two-Phase Commit Protocol

  • 1. The coordinator writes prepare to its local stable log
  • 2. The coordinator sends a prepare message to all other participants
  • 3. Each participant either prepares or aborts, writing its choice to its local log
  • 4. Each participant sends its choice to the coordinator

The Two-Phase Commit Protocol, continued

  • 5. The coordinator collects all votes
  • 6. If all participants vote to commit, the coordinator writes commit to its log
  • 7. If any participant votes to abort, the coordinator writes abort to its log
  • 8. The coordinator sends its decision to all others

The Two-Phase Commit Protocol, concluded

  • 9. If the other participants receive a commit message, they write commit to their logs and release transaction resources
  • 10. If the other participants receive an abort message, they write abort to their logs and release transaction resources
  • 11. Each participant returns an acknowledgement to the coordinator
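Steps 1 through 11 above can be sketched as a single-process simulation; message passing and stable logs are simulated with plain Python lists and direct calls, and all class and function names are hypothetical:

```python
# Two-phase commit, simulated in one process. Each node has a log list
# standing in for stable storage; "messages" are direct method calls.

class Participant:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit, self.log = name, will_commit, []

    def on_prepare(self):                       # steps 3-4
        choice = "prepared" if self.will_commit else "abort"
        self.log.append(choice)                 # write choice to local log
        return choice

    def on_decision(self, decision):            # steps 9-11
        self.log.append(decision)
        return "ack"

def two_phase_commit(coordinator_log, participants):
    coordinator_log.append("prepare")           # step 1
    votes = [p.on_prepare() for p in participants]          # steps 2-4
    # steps 5-7: unanimous yes means commit, otherwise abort
    decision = "commit" if all(v == "prepared" for v in votes) else "abort"
    coordinator_log.append(decision)
    acks = [p.on_decision(decision) for p in participants]  # steps 8-11
    assert all(a == "ack" for a in acks)
    return decision

nodes = [Participant("n2"), Participant("n3"), Participant("n4")]
print(two_phase_commit([], nodes))              # commit

nodes = [Participant("n2"), Participant("n3", will_commit=False)]
print(two_phase_commit([], nodes))              # abort
```

One dissenting vote flips the unanimous decision to abort for everyone, which is the essential property of the protocol.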

A Two-Phase Commit Example

[Diagram: Node 1 (the coordinator) sends prepare to Nodes 2, 3, and 4 in Phase 1; each replies prepared. All voted yes, so in Phase 2 Node 1 sends commit to all; each replies committed.]

What About the Abort Case?

  • Same as commit, except not everyone voted yes
  • Instead of committing, send aborts
    – And abort locally at the coordinator
  • On receipt of an abort message, undo everything

Overheads of Two-Phase Commit

  • For n participants, 4*(n-1) messages
    – Each participant (except the coordinator) gets a prepare and a commit message
    – Each participant (except the coordinator) sends a prepared and a committed message
  • Can optimize the committed messages away
    – With the potential cost of serious latencies in clearing log records

Two-Phase Commit and Failures

  • Two-phase commit behaves well in the face of all single-node failures
    – May not be able to commit
    – But will cleanly commit or abort
    – And, if anyone commits, eventually everyone will
  • Assumes fail-stop failures

Some Failure Examples: Example 1

[Diagram: the coordinator (Node 1) fails after sending prepare, and not all participants receive it. Nodes 2, 3, and 4 consult on timeout and abort.]

Some Failure Examples: Example 2

[Diagram: Node 1 sends prepare to Nodes 2, 3, and 4, but Node 4 fails before it replies. Node 1 never gets a response from Node 4, so it aborts.]

Some Failure Examples: Example 3

[Diagram: all participants vote yes and Node 1 sends commit, but Node 4 fails after replying prepared, so Node 1 never gets its committed message. When Node 4 recovers, it consults its log, notices it was prepared, and queries the commit status; Node 1 answers commit, and Node 4 commits.]

Handling Failures

  • Non-failed nodes can still recover if some participants failed
  • The coordinator can determine what the other nodes did
    – Did we commit or did we not?
  • If the coordinator failed, a new coordinator can be elected
    – And can determine the state of the commit
    – Except . . .

An Issue With Two-Phase Commit

  • What if both the coordinator and another node fail?
    – During the commit phase
  • Two possibilities:
  • 1. The other failed node committed
  • 2. The other failed node did not commit

Possibility 1

[Diagram: prepare goes to all four nodes; the commit message reaches Node 2, which commits before both Node 1 and Node 2 fail.]

Possibility 2

[Diagram: the same run, but only Node 1 writes its commit record; Node 2 fails before the commit message arrives, so it never commits.]

What Do the Other Nodes Do?

[Diagram: in both cases, the surviving Nodes 3 and 4 saw only the prepare messages. They cannot tell what happened at the failed nodes: did Nodes 1 and 2 both commit, or did only Node 1?]

Why Does It Matter?

  • Well, why?
  • Consider, for each case, what would have happened if node 2 hadn’t failed

Handling the Problem

  • Go to three phases instead of two
  • The third phase provides the necessary information to distinguish the cases
  • So if this two-node failure occurs, the other nodes can tell what happened

Three Phase Commit

[State diagram: the coordinator sends canCommit and waits; any no vote or timeout leads to abort. On all OKs it sends startCommit and waits for acks (a nak or timeout leads to abort), then sends Commit and waits for the final acks. Each participant receives canCommit and acks; receives startCommit, becomes prepared, and acks; then receives Commit, commits, and acks.]

Why Three Phases?

  • First phase tells everyone a commit is in progress
  • Second phase ensures that everyone knows that everyone else was told
    – No chance that only some were told
  • Third phase actually performs the commit
  • Three phases ensure that the failure of the coordinator plus another participant is non-ambiguous
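The three rounds can be sketched as follows, happy path only, ignoring timeouts and recovery; the canCommit/startCommit/doCommit names follow the slides, everything else is hypothetical:

```python
# Three-phase commit, happy path only. The extra startCommit round means
# every participant that reaches the "prepared" state also knows that
# *everyone* voted yes, which is what lets survivors disambiguate a
# coordinator-plus-participant failure.

class Participant3PC:
    def __init__(self):
        self.state = "init"

    def can_commit(self):            # phase 1: vote
        self.state = "voted-yes"
        return "OK"

    def start_commit(self):          # phase 2: learn that all voted yes
        self.state = "prepared"
        return "ack"

    def do_commit(self):             # phase 3: actually commit
        self.state = "committed"
        return "ack"

def three_phase_commit(participants):
    if not all(p.can_commit() == "OK" for p in participants):    # phase 1
        return "abort"
    assert all(p.start_commit() == "ack" for p in participants)  # phase 2
    assert all(p.do_commit() == "ack" for p in participants)     # phase 3
    return "commit"

ps = [Participant3PC() for _ in range(3)]
print(three_phase_commit(ps))        # commit
print({p.state for p in ps})         # {'committed'}
```

The key design point is the "prepared" state: a survivor that finds a startCommit record knows every participant voted yes, so committing is safe even after the failures described above.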

How Does This Work?

[Diagram: after a coordinator-plus-participant failure, Nodes 3 and 4 hold startCommit records. These status records tell them more than a prepare record would: Node 2 ACKed the canCommit message, and Node 1 knew all participants did a canCommit. So it is safe for Nodes 3 and 4 to commit.]

Overhead of Three Phase Commit

  • For n participants, 6*(n-1) messages
    – Each participant (except the coordinator) gets a canCommit, a startCommit, and a doCommit message
    – Each participant (except the coordinator) ACKs each of those messages
  • Again, the final ACK can be optimized away
    – But the coordinator can’t delete its record till it knows of all ACKs

Distributed Mutual Exclusion

  • Another common problem in synchronizing distributed systems
  • One-way communications can use simple synchronization
    – Built into the paradigm
    – Or handled at the shared server
  • More general communications require more complex synchronization
    – To ensure multiple simultaneously running processes interact properly

Synchronization and Mutual Exclusion

  • Mutual exclusion ensures that only one of a set of participants uses a resource
    – At any given moment
  • In certain cases, that’s all the synchronization required
  • In other cases, more synchronization can be built on top of mutual exclusion

The Basic Mutual Exclusion Problem

  • n independent participants are sharing a resource
    – In the distributed case, each participant is on a different node
  • At any moment, only one participant can use the resource
  • Must avoid deadlock, ensure fairness, and use few resources

Mutual Exclusion Approaches

  • Contention-based
  • Controlled

Contention-Based Mutual Exclusion

  • Each process freely and equally competes for the resource
  • Some algorithm is used to evaluate the request resolution criterion
  • Timestamps, priorities, and voting are ways to resolve conflicting requests
  • The approach assumes everyone cooperates and follows the rules

Timestamp Schemes

  • Whoever asked first should get the resource
  • Runs into the obvious problems of distributed clocks
  • Usually handled with logical clocks, not physical clocks

Lamport’s Mutual Exclusion Algorithm

  • Uses Lamport clocks
    – With a total order
  • Assumes N processes
  • Any pair can communicate directly
  • Assumes reliable, in-order delivery of messages
    – Though arbitrary message delays are allowable

Outline of Lamport’s Algorithm

  • Each process keeps a queue of requests
  • When a process wants the resource, it adds its request to its local queue, in order
  • It sends a REQUEST message to all other processes
  • All other processes send REPLY messages
  • When done with the resource, the process sends a RELEASE message to all others
  • Lamport timestamps go on all messages

When Does Someone Get the Resource?

  • A requesting process gets the resource when:
    1) It has received replies from all other processes
    2) Its request is at the top of its queue
    3) A RELEASE message was received
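A compact simulation of the algorithm, with instantaneous, in-order delivery (REQUEST, REPLY, and RELEASE become direct method calls; a real system would interleave these messages; all names are hypothetical):

```python
import heapq

# Lamport mutual exclusion, simulated with instantaneous delivery.
# Each process keeps a Lamport clock and a priority queue of requests
# ordered by (timestamp, process id) to give a total order.

class Process:
    def __init__(self, pid, all_procs):
        self.pid, self.all = pid, all_procs
        self.clock = 0
        self.queue = []                  # heap of (timestamp, pid)
        self.replies = set()

    def tick(self, seen=0):
        self.clock = max(self.clock, seen) + 1

    def request(self):
        self.tick()
        stamp = (self.clock, self.pid)
        for p in self.all:               # REQUEST goes into every queue
            heapq.heappush(p.queue, stamp)
            if p is not self:
                p.tick(stamp[0])         # receiver advances its clock
                self.replies.add(p.pid)  # immediate REPLY back to requester

    def holds_resource(self):
        # all replies received AND own request at the head of own queue
        others = {p.pid for p in self.all if p is not self}
        return self.replies == others and self.queue[0][1] == self.pid

    def release(self):
        for p in self.all:               # RELEASE pops the request everywhere
            heapq.heappop(p.queue)
        self.replies = set()

procs = []
a, b = Process("A", procs), Process("B", procs)
procs.extend([a, b])

a.request()
b.request()
print(a.holds_resource(), b.holds_resource())  # True False: A asked first
a.release()
print(b.holds_resource())                      # True: B is now at the head
```

Because every process orders requests by the same (timestamp, pid) pairs, all queues agree on who is at the head, so only one process can satisfy both conditions at once.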

Lamport’s Algorithm At Work

[Diagram: processes A, B, C, D with clocks at 10; A’s request ⟨A,10⟩ sits at the head of every queue. B requests at time 11, so ⟨B,11⟩ is queued everywhere and the others reply. After A’s RELEASE messages remove ⟨A,10⟩, B’s request heads every queue and B receives the resource at time 14.]

Dealing With Multiple Requests

[Diagram: A holds the resource, with ⟨A,10⟩ queued everywhere. B and C both request at time 11; every queue orders the requests ⟨A,10⟩, ⟨B,11⟩, ⟨C,11⟩, breaking the timestamp tie by process ID. When A releases the resource, ⟨B,11⟩ is at every queue head, so B receives the resource at time 16.]

Complexity of Lamport Algorithm

  • For N participants, 3*(N-1) messages per completion of the critical section
  • The requester sends N-1 REQUEST messages
  • The N-1 other processes each send a REPLY
  • When the requester relinquishes the critical section, it sends N-1 RELEASE messages

A Problem With Lamport Algorithm

  • One slow or failed process can prevent anyone from getting the resource
  • Since no process can claim the resource unless it knows all other processes have seen its request

Voting Schemes

  • Processes vote on who should get the shared resource next
  • Can work even if one process fails
    – Or even if a minority of processes fail
  • Variants can allow weighted voting

Basics of Voting Algorithms

  • A process needing the shared resource sends a REQUEST to all other processes
  • Each process receiving a request checks whether it has already voted for someone else
  • If not, it votes for the requester
    – By replying

Obtaining the Shared Resource In Voting Schemes

  • When a requester gets replies from a majority of voters, it gets the critical section
  • Since any voting process replies to only one requester, only one requester can get a majority
  • When done with the resource, a process sends a RELEASE message to all who voted for it
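The voting rule can be sketched as follows, again with instantaneous messages and hypothetical names:

```python
# Majority voting for mutual exclusion: each voter replies to at most one
# requester at a time, so at most one requester can assemble a majority.

class Voter:
    def __init__(self):
        self.voted_for = None

    def on_request(self, requester):
        if self.voted_for is None:       # only vote if the vote isn't taken
            self.voted_for = requester
            return True                  # the REPLY is the vote
        return False

    def on_release(self, requester):
        if self.voted_for == requester:  # RELEASE frees this voter's vote
            self.voted_for = None

def try_acquire(requester, voters):
    votes = sum(v.on_request(requester) for v in voters)
    return votes > len(voters) // 2      # strict majority wins

voters = [Voter() for _ in range(5)]
print(try_acquire("P1", voters))         # True: P1 collects all 5 votes
print(try_acquire("P2", voters))         # False: every vote is taken
for v in voters:
    v.on_release("P1")
print(try_acquire("P2", voters))         # True: votes freed by RELEASE
```

Since 5 voters can hand out at most 5 votes and a majority needs 3, two requesters can never both hold majorities, which is the whole safety argument.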

Avoiding Deadlock

  • If more than two processes request the resource, sometimes no one wins
  • Effectively a deadlock condition
  • Can be fixed by allowing processes to change their votes
    – Requires permission from the process that originally got the vote

Complexity of Voting Schemes for Mutual Exclusion

  • O(N) messages
    – For reasons similar to the Lamport discussion
  • Use of quorums can reduce this to O(SQRT(N))
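The O(SQRT(N)) figure comes from quorum schemes such as Maekawa-style grid quorums: arrange the N processes in a sqrt(N)-by-sqrt(N) grid and let each process’s quorum be its row plus its column, so any two quorums intersect and no two processes can assemble disjoint majorities. A sketch (the grid layout is an illustrative assumption, not from the slides):

```python
import math

# Grid quorums: place N processes on a sqrt(N) x sqrt(N) grid.
# Process i's quorum = everyone in its row + everyone in its column,
# about 2*sqrt(N) processes. Any row-plus-column set crosses any other
# row-plus-column set, so every pair of quorums intersects.

def grid_quorum(i, n):
    side = int(math.isqrt(n))
    row, col = divmod(i, side)
    rows = {row * side + c for c in range(side)}   # i's whole row
    cols = {r * side + col for r in range(side)}   # i's whole column
    return rows | cols

n = 16
q0 = grid_quorum(0, n)
q10 = grid_quorum(10, n)
print(sorted(q0))            # 2*sqrt(16) - 1 = 7 processes, not a majority of 16
print(q0 & q10)              # non-empty: the two quorums must overlap

# Pairwise intersection holds for every pair of processes:
print(all(grid_quorum(a, n) & grid_quorum(b, n)
          for a in range(n) for b in range(n)))    # True
```

Each requester now contacts only about 2*sqrt(N) voters instead of N, while the guaranteed overlap preserves the "only one winner" property.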

Token Based Mutual Exclusion

  • Maintain a token shared by all processes needing the resource
  • The current holder of the token has access to the resource
  • To gain access to the resource, a process must obtain the token

Obtaining the Token

  • Typically done by asking for it through some topology of the processes
    – Ring
    – Tree
    – Broadcast

Ring Topologies for Tokens

  • The token circulates along a pre-defined logical ring of processes
  • As the token arrives, if the local process wants the resource, the token is held
  • Once finished, the token is passed on
  • Good for high loads; high overhead for low loads
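A ring pass can be sketched as a single loop (hypothetical names; a real implementation passes the token in messages between nodes):

```python
# Token ring: the token moves around a fixed logical ring. A process
# holds the token while it uses the resource, then passes it on.

def circulate(ring_size, wants, start=0, max_hops=100):
    """Return the order in which processes use the resource.

    `wants` maps process id -> how many times it still wants the resource.
    """
    holder = start
    used = []
    for _ in range(max_hops):
        if wants.get(holder, 0) > 0:        # token arrives and is wanted: hold it
            used.append(holder)             # ...use the resource...
            wants[holder] -= 1              # ...then finish with it
        holder = (holder + 1) % ring_size   # pass the token to the next process
        if not any(wants.values()):
            break
    return used

# Processes 1 and 3 each want the resource twice, on a ring of 4.
order = circulate(4, {1: 2, 3: 2})
print(order)    # [1, 3, 1, 3]: strict round-robin, no starvation
```

The round-robin order shows why the ring is fair and efficient under high load, while under low load the token still burns hops visiting processes that don’t want it.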

A Token Ring

[Diagram: processes arranged in a logical ring, around which the token circulates.]

Tree Topologies

  • Only pass the token when needed
  • Use a tree structure to pass requests from the requesting process to the current token holder
  • When the token is passed, re-arrange the tree to put the new token holder at the root

Broadcast Topologies

  • When a process wants the token, it sends a request to all other processes
  • If the current token holder isn’t using it, it sends the token to the requester
  • If the token is in use, its holder adds the request to a queue
  • Use a timestamp scheme to order the queue
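The broadcast scheme’s decision points can be sketched as follows (hypothetical names; only the token holder’s bookkeeping is shown, and timestamps order the queue as the slide describes):

```python
import heapq

# Broadcast-style token passing: a requester broadcasts its request; the
# holder hands over the token if it is idle, otherwise it queues the
# request by (timestamp, pid) and passes the token on when it finishes.

class TokenHolder:
    def __init__(self, pid):
        self.pid = pid
        self.has_token = True
        self.in_use = False
        self.pending = []                     # heap of (timestamp, pid)

    def on_request(self, timestamp, pid):
        if self.has_token and not self.in_use:
            self.has_token = False            # idle: send the token right away
            return pid                        # the new holder
        heapq.heappush(self.pending, (timestamp, pid))
        return None                           # in use: request queued

    def on_finish(self):
        self.in_use = False
        if self.pending:                      # oldest request gets the token
            _, pid = heapq.heappop(self.pending)
            self.has_token = False
            return pid
        return None                           # no requests: keep the idle token

h = TokenHolder("A")
h.in_use = True                    # A is currently using the resource
print(h.on_request(5, "C"))        # None: queued behind A's use
print(h.on_request(3, "B"))        # None: queued, but with an older timestamp
print(h.on_finish())               # B: lowest timestamp gets the token next
```

Ordering the queue by timestamp rather than arrival order is what keeps the scheme fair when requests race.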

A Common Problem With Token Schemes

  • What happens if the token holder fails?
  • Could keep the token in stable storage
    – But the resource is still unavailable until the token holder recovers
  • Could create a new token
    – Must be careful not to end up with two tokens, though
    – Typically by running a voting algorithm