Commitment and Mutual Exclusion
CS 188 Distributed Systems
February 18, 2015
Lecture 11, CS 188, Winter 2015
Introduction

- Many distributed systems require that participants agree on something
  – On changes to important data
  – On the status of a computation
  – On what to do next
- Reaching agreement in a general distributed system is challenging
Commitment

- Reaching agreement in a distributed system is extremely important
- Usually impossible to control a system’s behavior without agreement
- One approach to agreement is to get all participants to prepare to agree
- Then, once prepared, to take the action
Challenges to Commitment

- There are challenges to ensuring that commitment occurs
- Different nodes’ actions aren’t synchronous
- Communication is only via messages
- Other actions can intervene
- Failures can occur
For Example,

- An optimistically replicated file system like Ficus
- We want to be able to add replicas of a volume
- Which is a lot easier to do if all nodes hosting existing replicas agree
The Scenario

[Diagram: replicas A, B, and C each hold version vector (3, 7, 5); a new node D announces “I want a replica, too!” and B creates replica D — but we need a version vector element for the new replica.]
So What’s the Problem?

- A and C don’t know about the new replica
  – But they can learn about it as soon as they contact B
- So why is there any difficulty?
One Problem

[Diagram: the replicas partition; each partition adds a new replica (D in one, E in the other), and each new replica takes the same new version vector slot. After independent updates, both partitions hold identical version vectors: different updates, same version vector.]
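The collision can be reproduced in a few lines. This is an illustrative sketch, not Ficus code; the helper name and the concrete vectors are made up to mirror the slide’s scenario of two partitions each adding a fourth replica.

```python
def new_replica_vector(existing):
    # Adding a replica appends one more vector element, initialized to 0.
    # Crucially, both partitions append to the SAME slot independently.
    return existing + [0]

# Partition {A, B}: B adds replica D, and D makes one update.
vec_d = new_replica_vector([3, 7, 5])   # [3, 7, 5, 0]
vec_d[3] += 1                           # D's update -> [3, 7, 5, 1]
data_d = "update made at D"

# Partition {C}: C adds replica E, and E makes one update.
vec_e = new_replica_vector([3, 7, 5])   # [3, 7, 5, 0]
vec_e[3] += 1                           # E's update -> [3, 7, 5, 1]
data_e = "update made at E"

# Same version vector, different updates: the conflict is undetectable.
print(vec_d == vec_e)    # True
print(data_d == data_e)  # False
```

Because the vectors are identical, version-vector comparison reports the replicas as in sync even though their contents diverged.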
And It Can Be a Lot Worse

- What if replicas are being added and dropped frequently?
- How will we keep track of which ones are live and which ones are which?
- It can get very confusing
But That’s Not What I Want To Do, Anyway

- A common answer from system designers
- They don’t care about the odd corner cases
- They don’t expect them to happen
- So why pay a lot to handle them right?
- Sometimes a reasonable answer . . .
Why You Should Care

- If you allow a system to behave a certain way
  – Even if you don’t think it ever will
- And your system is widely deployed and used
- Sooner or later that improbable thing will happen
- And who knows what happens next?
The Basic Solution

- Use a commitment protocol
- To ensure that all participating nodes understand what’s happening
- And agree to it
- Handles issues of concurrency and failures
Transactions

- A mechanism to achieve commitment
- By ensuring atomicity
  – Also consistency, isolation, and durability
- Very important in the database community
- A set of asynchronous request/reply communications
- Either all of the set complete or none do
Transactions and ACID Properties

- ACID: Atomicity, Consistency, Isolation, and Durability
- Atomicity: all happen or none do
- Consistency: outcome equivalent to some serial ordering of actions
- Isolation: partial results are invisible outside the transaction
- Durability: committed transactions survive crashes and other failures
Achieving the ACID Properties

- In a distributed environment, use the two-phase commit protocol
- A unanimous voting protocol
  – Do something only if all participants agree it should be done
- Essentially, hold on to the results of a transaction until all participants agree
Basics of Two-Phase Commit

- Run at the end of all application actions in a transaction
- Must end in a commit or abort decision
- Must work despite delays and failures
- Requires access to stable storage
- Usually started by a coordinator
  – But the coordinator has no more power than any other participant
The Two Phases

- Phase one: prepare to commit
  – All participants are informed that they should get ready to commit
  – All agree to do so
- Phase two: commitment
  – Actually commit all effects of the transaction
Outline of Two-Phase Commit Protocol

1. Coordinator writes prepare to its local stable log
2. Coordinator sends a prepare message to all other participants
3. Each participant either prepares or aborts, writing its choice to its local log
4. Each participant sends its choice to the coordinator
The Two-Phase Commit Protocol, continued

5. The coordinator collects all votes
6. If all participants vote to commit, the coordinator writes commit to its log
7. If any participant votes to abort, the coordinator writes abort to its log
8. The coordinator sends its decision to all others
The Two-Phase Commit Protocol, concluded

9. If the other participants receive a commit message, they write commit to their logs and release transaction resources
10. If the other participants receive an abort message, they write abort to their logs and release transaction resources
11. They return an acknowledgement to the coordinator
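The eleven steps above can be sketched compactly. This is a minimal, in-memory illustration, not a real implementation: message passing is simulated by method calls, stable storage by a Python list, and the class names are made up for the sketch.

```python
class Participant:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.log = []                    # stands in for stable storage

    def on_prepare(self):
        # Step 3: prepare or abort, writing the choice to the local log.
        vote = "prepared" if self.will_commit else "abort"
        self.log.append(vote)
        return vote                      # Step 4: send choice to coordinator

    def on_decision(self, decision):
        # Steps 9-10: record the decision and release transaction resources.
        self.log.append(decision)
        return "ack"                     # Step 11: acknowledge

class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self):
        self.log.append("prepare")                        # Step 1
        votes = [p.on_prepare() for p in self.participants]  # Steps 2-5
        # Steps 6-7: unanimous "prepared" commits; any abort vote aborts.
        decision = "commit" if all(v == "prepared" for v in votes) else "abort"
        self.log.append(decision)
        acks = [p.on_decision(decision) for p in self.participants]  # Step 8
        assert all(a == "ack" for a in acks)
        return decision

print(Coordinator([Participant("B"), Participant("C"), Participant("D")]).run())
# -> commit
print(Coordinator([Participant("B"), Participant("C", will_commit=False)]).run())
# -> abort
```

Note the unanimity: a single abort vote flips the whole transaction to abort, which is exactly what makes two-phase commit a unanimous voting protocol.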
A Two-Phase Commit Example

[Diagram: Node 1, the coordinator, sends prepare to nodes 2, 3, and 4 in phase 1; each logs prepared and replies. All voted yes! In phase 2 the coordinator sends commit to each node; each logs committed and acknowledges.]
What About the Abort Case?

- Same as commit, except not everyone voted yes
- Instead of committing, send aborts
  – And abort locally at the coordinator
- On receipt of an abort message, undo everything
Overheads of Two-Phase Commit

- For n participants, 4*(n-1) messages
  – Each participant (except the coordinator) gets a prepare and a commit message
  – Each participant (except the coordinator) sends a prepared and a committed message
- Can optimize the committed messages away
  – With the potential cost of serious latencies in clearing log records
Two-Phase Commit and Failures

- Two-phase commit behaves well in the face of all single-node failures
  – May not be able to commit
  – But will cleanly commit or abort
  – And, if anyone commits, eventually everyone will
- Assumes fail-stop failures
Some Failure Examples: Example 1

[Diagram: the coordinator (node 1) fails after sending prepare to only some participants. Nodes 2, 3, and 4 consult each other on timeout and abort.]
Some Failure Examples: Example 2

[Diagram: node 1 sends prepare to nodes 2, 3, and 4, but node 4 fails before replying. Node 1 never gets a response from node 4, so it times out and aborts.]
Some Failure Examples: Example 3

[Diagram: node 4 fails after replying prepared. All voted yes, so node 1 sends commit to nodes 2 and 3; node 1 never gets the committed message from node 4. When node 4 recovers, it consults its log, notices it was prepared, queries the commit status, and then commits as well.]
Handling Failures

- Non-failed nodes still recover if some participants failed
- The coordinator can determine what other nodes did
  – Did we commit or did we not?
- If the coordinator failed, a new coordinator can be elected
  – And can determine the state of the commit
  – Except . . .
An Issue With Two-Phase Commit

- What if both the coordinator and another node fail?
  – During the commit phase
- Two possibilities:
  1. The other failed node committed
  2. The other failed node did not commit
Possibility 1

[Diagram: all four nodes prepare; the coordinator (node 1) sends commit and node 2 commits, then both fail.]
Possibility 2

[Diagram: all four nodes prepare; the coordinator (node 1) writes commit but fails before node 2 commits, and node 2 fails too.]
What Do the Other Nodes Do?

[Diagram: in both cases, surviving nodes 3 and 4 see only the prepare exchange. They cannot tell what happened at the failed nodes: did nodes 1 and 2 both commit, or did node 2 fail before committing?]
Why Does It Matter?

- Well, why?
- Consider, for each case, what would have happened if node 2 hadn’t failed
Handling the Problem

- Go to three phases instead of two
- The third phase provides the necessary information to distinguish the cases
- So if this two-node failure occurs, the other nodes can tell what happened
Three Phase Commit

[State diagram: the coordinator sends canCommit and waits for an OK from every participant (a no or a timeout means abort); it then sends startCommit, collects ACKs from all, and finally sends Commit, which each participant confirms with an ACK. A participant that times out while merely prepared aborts, but one that times out after receiving startCommit commits.]
Why Three Phases?

- The first phase tells everyone a commit is in progress
- The second phase ensures that everyone knows that everyone else was told
  – No chance that only some were told
- The third phase actually performs the commit
- Three phases ensure that a failure of the coordinator plus another participant is non-ambiguous
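The happy path of the three phases can be sketched as follows. The message names (canCommit, startCommit, doCommit) follow the slides; the function shapes are illustrative, and failures/timeouts are deliberately omitted to show only the phase structure.

```python
def three_phase_commit(participants):
    """Each participant is modeled as a function msg -> reply.
    This sketch shows only the phase ordering, not fault handling."""
    # Phase 1: everyone is told a commit is in progress.
    if not all(p("canCommit") == "yes" for p in participants):
        return "abort"
    # Phase 2: everyone learns that everyone else was told.
    if not all(p("startCommit") == "ack" for p in participants):
        return "abort"
    # Phase 3: actually perform the commit.
    for p in participants:
        p("doCommit")
    return "commit"

def willing(msg):
    # A participant that agrees at every step.
    return {"canCommit": "yes", "startCommit": "ack", "doCommit": "ack"}[msg]

def refusing(msg):
    # A participant that votes no in phase 1.
    return "no"

print(three_phase_commit([willing, willing, willing]))  # -> commit
print(three_phase_commit([willing, refusing]))          # -> abort
```

The extra round trip is the whole point: a surviving node that holds a startCommit record knows phase 1 completed everywhere, which is the information two-phase commit lacks.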
How Does This Work?

[Diagram: surviving nodes 3 and 4 have startCommit records in their logs.]

- These status records tell us more than a prepare record would
  – Node 2 ACKed the canCommit message
  – Node 1 knew all participants did a canCommit
- So it is safe for nodes 3 and 4 to commit
Overhead of Three Phase Commit

- For n participants, 6*(n-1) messages
  – Each participant (except the coordinator) gets a canCommit, a startCommit, and a doCommit message
  – Each participant (except the coordinator) ACKs each of those messages
- Again, the final ACK can be optimized away
  – But the coordinator can’t delete its record till it knows of all ACKs
Distributed Mutual Exclusion

- Another common problem in synchronizing distributed systems
- One-way communications can use simple synchronization
  – Built into the paradigm
  – Or handled at the shared server
- More general communications require more complex synchronization
  – To ensure multiple simultaneously running processes interact properly
Synchronization and Mutual Exclusion

- Mutual exclusion ensures that only one of a set of participants uses a resource
  – At any given moment
- In certain cases, that’s all the synchronization required
- In other cases, more synchronization can be built on top of mutual exclusion
The Basic Mutual Exclusion Problem

- n independent participants are sharing a resource
  – In the distributed case, each participant is on a different node
- At any moment, only one participant can use the resource
- Must avoid deadlock, ensure fairness, and use few resources
Mutual Exclusion Approaches
- Contention-based
- Controlled
Contention-Based Mutual Exclusion

- Each process freely and equally competes for the resource
- Some algorithm is used to evaluate the request resolution criterion
- Timestamps, priorities, and voting are ways to resolve conflicting requests
- Assumes everyone cooperates and follows the rules
Timestamp Schemes

- Whoever asked first should get the resource
- Runs into the obvious problems of distributed clocks
- Usually handled with logical clocks, not physical clocks
Lamport’s Mutual Exclusion Algorithm

- Uses Lamport clocks
  – With a total order
- Assumes N processes
- Any pair can communicate directly
- Assumes reliable, in-order delivery of messages
  – Though arbitrary message delays are allowable
Outline of Lamport’s Algorithm

- Each process keeps a queue of requests
- When a process wants the resource, it adds its request to its local queue, in order
- It sends a REQUEST to all other processes
- All other processes send REPLY messages
- When done with the resource, the process sends a RELEASE message to all others
- Lamport timestamps appear on all messages
When Does Someone Get the Resource?

- A requesting process gets the resource when:
  1) It has received replies from all other processes
  2) Its request is at the top of its queue
  3) A RELEASE message was received
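The rules above can be exercised in a toy, single-threaded simulation. This is illustrative only: the global `PROCS` registry and all class/method names are invented for the sketch, and "message delivery" is just a direct method call, which trivially satisfies the reliable in-order delivery assumption.

```python
import heapq

PROCS = {}   # global registry standing in for the network (sketch only)

class Process:
    def __init__(self, pid):
        self.pid, self.clock, self.ts = pid, 0, None
        self.queue, self.replies = [], set()
        PROCS[pid] = self

    def _tick(self, seen=0):
        # Lamport clock update on every event / message receipt.
        self.clock = max(self.clock, seen) + 1

    def request(self):
        # Queue own request locally, in timestamp order, then REQUEST all.
        self._tick()
        self.ts = self.clock
        heapq.heappush(self.queue, (self.ts, self.pid))
        for p in list(PROCS.values()):
            if p is not self:
                p.on_request(self.ts, self.pid)

    def on_request(self, ts, pid):
        self._tick(ts)
        heapq.heappush(self.queue, (ts, pid))
        PROCS[pid].on_reply(self.clock, self.pid)   # REPLY immediately

    def on_reply(self, ts, pid):
        self._tick(ts)
        self.replies.add(pid)

    def may_enter(self):
        # The slide's conditions: replies from all others, and own
        # request at the head of the local queue.
        return (len(self.replies) == len(PROCS) - 1
                and bool(self.queue)
                and self.queue[0] == (self.ts, self.pid))

    def release(self):
        # Drop own request, then tell everyone else via RELEASE.
        heapq.heappop(self.queue)
        self.replies.clear()
        for p in list(PROCS.values()):
            if p is not self:
                p.on_release(self.pid)

    def on_release(self, pid):
        self._tick()
        self.queue = [e for e in self.queue if e[1] != pid]
        heapq.heapify(self.queue)

a, b, c = Process("A"), Process("B"), Process("C")
b.request()
c.request()
print(b.may_enter(), c.may_enter())   # True False: B asked first
b.release()
print(c.may_enter())                  # True: now it's C's turn
```

Tuple ordering on (timestamp, pid) gives the total order the algorithm needs: ties on timestamp are broken by process id.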
Lamport’s Algorithm At Work

[Diagram: processes A, B, C, and D, clocks at 10, with A’s request (A, 10) already queued. B requests the resource at timestamp 11; (B, 11) joins every queue behind (A, 10), and the others REPLY. When A sends RELEASE messages, (B, 11) reaches the head of each queue and B receives the resource.]
Dealing With Multiple Requests

[Diagram: with (A, 10) at the head of every queue, B and C both request at timestamp 11. Both (B, 11) and (C, 11) join every queue, ordered by the total order on timestamps. When A releases the resource, (B, 11) is at the head of every queue, so B receives the resource before C.]
Complexity of Lamport Algorithm

- For N participants, 3*(N-1) messages per completion of the critical section
- The requester sends N-1 REQUEST messages
- The N-1 other processes each REPLY
- When the requester relinquishes the critical section, it sends N-1 RELEASE messages
A Problem With Lamport Algorithm

- One slow or failed process can prevent anyone from getting the resource
- Since no process can claim the resource unless it knows all other processes have seen its request
Voting Schemes

- Processes vote on who should get the shared resource next
- Can work even if one process fails
  – Or even if a minority of processes fail
- Variants can allow weighted voting
Basics of Voting Algorithms

- A process needing the shared resource sends a REQUEST to all other processes
- Each process receiving a request checks if it has already voted for someone else
- If not, it votes for the requester
  – By replying
Obtaining the Shared Resource In Voting Schemes

- When a requester gets replies from a majority of voters, it gets the critical section
- Since any voting process only replies to one requester, only one requester can get a majority
- When done with the resource, the process sends RELEASE messages to all who voted for it
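The majority argument can be made concrete in a few lines. This is a minimal sketch with invented names; real voting schemes also let requesters vote for themselves and handle message loss, which is omitted here.

```python
class Voter:
    def __init__(self):
        self.voted_for = None      # at most one outstanding vote

    def on_request(self, requester):
        # Vote only if not already committed to someone else.
        if self.voted_for is None:
            self.voted_for = requester
            return True
        return False

    def on_release(self, requester):
        # A RELEASE from the winner frees this voter's vote.
        if self.voted_for == requester:
            self.voted_for = None

def try_acquire(me, voters):
    votes = sum(v.on_request(me) for v in voters)
    return votes > len(voters) // 2       # strict majority wins

voters = [Voter() for _ in range(5)]
print(try_acquire("P1", voters))  # True: P1 collects all five votes
print(try_acquire("P2", voters))  # False: every voter is pledged to P1
for v in voters:
    v.on_release("P1")
print(try_acquire("P2", voters))  # True: votes freed by the RELEASEs
```

Since each voter grants at most one vote at a time, two strict majorities would have to share a voter, so two requesters can never both win.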
Avoiding Deadlock

- If more than two processes request the resource, sometimes no one wins
- Effectively a deadlock condition
- Can be fixed by allowing processes to change their votes
  – Requires permission from the process that originally got the vote
Complexity of Voting Schemes for Mutual Exclusion

- O(N) messages
  – For reasons similar to the Lamport discussion
- Use of quorums can reduce this to O(SQRT(N))
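One way to see where O(SQRT(N)) comes from (an illustrative sketch, not the only quorum construction): arrange the N processes in a SQRT(N) × SQRT(N) grid and let a quorum be one full row plus one full column. Any two such quorums intersect, so two requesters can never both win, yet each quorum has only about 2*SQRT(N) members.

```python
import math

def grid_quorum(n, row, col):
    # Processes 0..n-1 laid out in a sqrt(n) x sqrt(n) grid; a quorum is
    # the union of one full row and one full column.
    side = math.isqrt(n)
    assert side * side == n, "sketch assumes N is a perfect square"
    row_members = {row * side + c for c in range(side)}
    col_members = {r * side + col for r in range(side)}
    return row_members | col_members

q1 = grid_quorum(16, row=0, col=0)
q2 = grid_quorum(16, row=3, col=2)
print(len(q1))        # 7 members, i.e. 2*sqrt(16) - 1
print(bool(q1 & q2))  # True: any two row+column quorums intersect
```

The intersection is guaranteed because q1’s row crosses q2’s column (and vice versa), so contacting one quorum instead of a majority shrinks the message count to O(SQRT(N)).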
Token Based Mutual Exclusion

- Maintain a token shared by all processes needing the resource
- The current holder of the token has access to the resource
- To gain access to the resource, a process must obtain the token
Obtaining the Token

- Typically done by asking for it through some topology of the processes
  – Ring
  – Tree
  – Broadcast
Ring Topologies for Tokens

- The token circulates along a pre-defined logical ring of processes
- As the token arrives, if the local process wants the resource, the token is held
- Once finished, the token is passed on
- Good for high loads; high overhead for low loads
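The circulation rule above fits in a few lines. This is a toy single-trip simulation with invented names, not a distributed implementation; "holding" the token is modeled as recording the critical-section entry.

```python
def circulate(wants, start=0):
    """wants[i] is True if process i currently wants the resource.
    Returns the order in which processes enter the critical section
    during one full trip of the token around the ring."""
    n = len(wants)
    entered = []
    for k in range(n):
        holder = (start + k) % n
        if wants[holder]:          # hold the token and use the resource
            entered.append(holder)
        # afterwards, pass the token to the next process on the ring
    return entered

print(circulate([False, True, False, True]))  # -> [1, 3]
```

The tradeoff on the slide is visible here: under load every hop hands the token to someone useful, but with no requesters the token still makes the full trip, which is pure overhead.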
A Token Ring

[Diagram: processes arranged in a logical ring, passing the token from one to the next.]
Tree Topologies

- Only pass the token when needed
- Use a tree structure to pass requests from a requesting process to the current token holder
- When the token is passed, re-arrange the tree to put the new token holder at the root
Broadcast Topologies

- When a process wants the token, it sends a request to all other processes
- If the current token holder isn’t using it, it sends the token to the requester
- If the token is in use, its holder adds the request to a queue
- Use a timestamp scheme to order the queue
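The holder’s side of the broadcast scheme can be sketched as below. Names are illustrative, broadcasting itself is elided, and the timestamp ordering is modeled with a min-heap as the slide’s queue.

```python
import heapq

class TokenHolder:
    def __init__(self, pid):
        self.pid = pid
        self.in_use = False
        self.pending = []          # (timestamp, requester) min-heap

    def on_request(self, ts, requester):
        # An idle holder hands the token over immediately.
        if not self.in_use:
            return requester
        # Otherwise the request waits, ordered by timestamp.
        heapq.heappush(self.pending, (ts, requester))
        return None

    def on_done(self):
        # Finished with the resource: pass the token to the oldest
        # pending request, or keep it if no one is waiting.
        self.in_use = False
        if self.pending:
            return heapq.heappop(self.pending)[1]
        return self.pid

holder = TokenHolder("A")
holder.in_use = True               # A is using the resource
holder.on_request(12, "B")         # queued
holder.on_request(11, "C")         # queued, but with an earlier timestamp
print(holder.on_done())            # -> C (earliest timestamp goes first)
```

Ordering the queue by timestamp is what keeps the scheme fair: requests are served roughly in the order they were made, not in the order their messages happened to arrive.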
A Common Problem With Token Schemes

- What happens if the token holder fails?
- Could keep the token in stable storage
  – But it is still unavailable until the token holder recovers
- Could create a new token