Message Ordering and Group Communications
Course: Distributed Computing
Faculty: Dr. Rajendra Prasath
Spring 2019
About this topic
This course covers various concepts in Message Ordering and Group Communication in Distributed Systems. We will … communications and their pros and cons
Rajendra, IIIT Sri City
- Challenges in Message Passing systems
- Distributed Sorting
- Space-Time Diagram
- Partial Ordering / Total Ordering
- Causal Ordering - Precedence Relations
- Concurrent Events
- Local Clocks and Vector Clocks
- Distributed Snapshots
- Termination Detection using Distributed Snapshots
- Leader Election Problem in Rings
- Various Interconnection Topologies
- Abstraction - Basic Concepts
- Interconnection Patterns suitable for message propagation
- Types of Algorithms and their executions
- Measures and Metrics
- Many more to come … stay tuned!
- One-to-One
  - Unicast
    - 1-to-1
    - Point-to-point
  - Anycast
    - 1 to the nearest 1 of several identical nodes
- One-to-Many
  - Multicast
    - 1-to-many
    - Group communication
  - Broadcast
    - 1-to-all
- Why groups?
  - Groups allow us to deal with a collection of processes as one abstraction
- Send message to one entity
  - Deliver to entire group
- Groups are dynamic
  - Created and destroyed
  - Processes can join or leave
  - May belong to 0 or more groups
- Primitives
  - join_group, leave_group, send_to_group, query_membership
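The four primitives can be pictured with a toy, single-process registry. This is only a sketch: the class name `GroupService`, the `inbox` attribute and the group and process names are assumptions made for illustration, and a real system would sit on top of a network and a membership protocol.

```python
from collections import defaultdict


class GroupService:
    """Toy in-memory registry; all names here are illustrative."""

    def __init__(self):
        self._members = defaultdict(set)   # group name -> set of process ids
        self.inbox = defaultdict(list)     # process id -> delivered messages

    def join_group(self, group, pid):
        self._members[group].add(pid)

    def leave_group(self, group, pid):
        self._members[group].discard(pid)
        if not self._members[group]:
            del self._members[group]       # group disappears with its last member

    def query_membership(self, group):
        return sorted(self._members.get(group, ()))

    def send_to_group(self, group, msg):
        # One logical send; the service fans it out to every current member.
        for pid in self.query_membership(group):
            self.inbox[pid].append(msg)


svc = GroupService()
svc.join_group("replicas", "p1")
svc.join_group("replicas", "p2")
svc.send_to_group("replicas", "update-1")
svc.leave_group("replicas", "p2")
svc.send_to_group("replicas", "update-2")
```

Because membership is dynamic, p2 sees only the message sent while it was a member; p1 sees both.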
Closed vs. Open
- Closed: only group members can send messages
- Open: non-members can also send messages to the group
Peer vs. Hierarchical
- Peer: each member communicates with the group
- Hierarchical: go through dedicated coordinator(s)
- Diffusion: send to other servers & clients
Managing membership & group creation/deletion
- Distributed vs. centralized
Leaving & joining must be synchronous
Fault tolerance
- Reliable message delivery? What about missing members?
- Crash failure
  - Process stops communicating
- Omission failure (typically due to network)
  - Send omission: a process fails to send messages
  - Receive omission: a process fails to receive messages
- Byzantine failure
  - Some messages are faulty; a process may even send fake messages
- Partition failure
  - The network may get segmented, dividing the group into two or more sub-groups that cannot reach each other
- Coordinator knows group members
- Message sent to a group arrives at all group members
  - If it fails to arrive at any member, no member will process it
- Unreliable network
  - Each message should be acknowledged
  - Acknowledgements can be lost
  - Message sender might die
- General Idea
  - Ensure that every recipient acknowledges receipt of the message
  - Only then allow the application to process the message
  - If we give up on a recipient, then no recipient can process the received message
- Easier said than done!
  - What if a recipient dies after acknowledging the message?
    - Is it obligated to restart?
    - If it restarts, will it know to process the message?
  - What if the sender (or coordinator) dies partway through the protocol?
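The general idea (acknowledge first, process only when everyone has acknowledged) can be sketched as below. The `Recipient` class and `atomic_multicast` function are hypothetical names, failures are modelled by a simple `alive` flag, and the sketch deliberately ignores the hard cases raised above, such as a recipient or coordinator dying partway through.

```python
class Recipient:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.pending = {}     # msg_id -> payload, received but not yet processed
        self.processed = []   # payloads handed to the application

    def receive(self, msg_id, payload):
        """Buffer the message and acknowledge it; a dead recipient never acks."""
        if not self.alive:
            return False
        self.pending[msg_id] = payload
        return True

    def commit(self, msg_id):
        # Everyone acknowledged: the application may now process the message.
        if self.alive and msg_id in self.pending:
            self.processed.append(self.pending.pop(msg_id))

    def abort(self, msg_id):
        # Someone was given up on: nobody processes the message.
        self.pending.pop(msg_id, None)


def atomic_multicast(msg_id, payload, group):
    acks = [r.receive(msg_id, payload) for r in group]
    if all(acks):
        for r in group:
            r.commit(msg_id)
        return True
    for r in group:
        r.abort(msg_id)
    return False


a, b, c = Recipient("a"), Recipient("b"), Recipient("c", alive=False)
ok_first = atomic_multicast(1, "m1", [a, b])       # both acknowledge
ok_second = atomic_multicast(2, "m2", [a, b, c])   # c never acknowledges
```

The second multicast is processed by nobody, even though a and b did receive it.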
Retry through network failures & system downtime
- Sender & receivers maintain a persistent log
- Each message has a unique ID so we can discard duplicates
- Sender
  - Sends the message to all group members
  - Writes the message to log
  - Waits for acknowledgement from each group member
  - Writes the acknowledgement to log
  - If timeout on waiting for an acknowledgement, retransmits to that group member
- Receiver
  - Logs each received non-duplicate message to its persistent log and sends an acknowledgement
NEVER GIVE UP!
- Assume that dead senders or receivers will be rebooted and will restart where they left off
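A minimal sketch of the log-and-retry scheme follows. Several things here are assumptions for illustration only: plain dictionaries stand in for the persistent logs, `FlakyLink` is a hypothetical link that drops a fixed number of transmissions, the sender logs the message before its first transmission, and retries are capped by `max_rounds`, whereas the scheme described above retries forever.

```python
import uuid


class Receiver:
    def __init__(self):
        self.log = {}                        # stand-in for a persistent log: msg_id -> payload

    def on_message(self, msg_id, payload):
        if msg_id not in self.log:           # unique ID lets us discard duplicates
            self.log[msg_id] = payload
        return msg_id                        # always (re)acknowledge


class FlakyLink:
    """Hypothetical network link that drops the first `drops` transmissions."""

    def __init__(self, receiver, drops):
        self.receiver = receiver
        self.drops = drops

    def transmit(self, msg_id, payload):
        if self.drops > 0:
            self.drops -= 1
            return None                      # models a timeout: no ACK came back
        return self.receiver.on_message(msg_id, payload)


class Sender:
    def __init__(self, links):
        self.links = links                   # member name -> FlakyLink
        self.log = {}                        # msg_id -> {"payload": ..., "acked": set()}

    def multicast(self, payload, max_rounds=10):
        msg_id = str(uuid.uuid4())
        self.log[msg_id] = {"payload": payload, "acked": set()}
        for _ in range(max_rounds):          # a real sender would retry forever
            entry = self.log[msg_id]
            for name, link in self.links.items():
                if name in entry["acked"]:
                    continue                 # this member already acknowledged
                if link.transmit(msg_id, payload) == msg_id:
                    entry["acked"].add(name)  # record the ACK in the log
            if entry["acked"] == set(self.links):
                break
        return msg_id


r1, r2 = Receiver(), Receiver()
sender = Sender({"r1": FlakyLink(r1, drops=0), "r2": FlakyLink(r2, drops=3)})
mid = sender.multicast("hello")
r1.on_message(mid, "hello")                  # a duplicate is acknowledged but not logged twice
```

Only r2 is retransmitted to, and the duplicate sent to r1 leaves its log unchanged.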
All non-faulty group members will receive the message
- Assume sender & recipients will remain alive
- Network may have glitches
Acknowledgements
- Send message to each group member
- Wait for acknowledgement from each group member
- Retransmit to non-responding members
- Subject to feedback implosion
Negative acknowledgements
- Use a sequence number on each message
- Receiver requests retransmission of a missed message
- More efficient but requires sender to buffer messages indefinitely
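The negative-acknowledgement approach can be sketched as follows. `NackSender` and `NackReceiver` are hypothetical names, and the network is reduced to direct method calls. Note that the receiver only notices a loss once a later message arrives, and that the sender's buffer never shrinks, which is the cost mentioned above.

```python
class NackSender:
    def __init__(self):
        self.next_seq = 0
        self.buffer = {}                 # seq -> payload, kept for retransmission

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.buffer[seq] = payload
        return seq, payload

    def retransmit(self, seq):
        return seq, self.buffer[seq]


class NackReceiver:
    def __init__(self):
        self.expected = 0                # next sequence number to deliver
        self.held = {}                   # out-of-order messages: seq -> payload
        self.delivered = []

    def on_message(self, seq, payload):
        """Return the list of sequence numbers to NACK (request again)."""
        if seq >= self.expected:
            self.held[seq] = payload
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
        if not self.held:
            return []
        # Gaps below the highest held sequence number were missed.
        return [s for s in range(self.expected, max(self.held)) if s not in self.held]


tx, rx = NackSender(), NackReceiver()
m0, m1, m2 = tx.send("a"), tx.send("b"), tx.send("c")
rx.on_message(*m0)
missing = rx.on_message(*m2)             # m1 was lost in transit
for seq in missing:
    rx.on_message(*tx.retransmit(seq))
```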
Easiest thing is to wait for an ACK before sending the next message
- But that incurs a round-trip delay
Optimizing
- Pipelining
  - Send multiple messages; receive ACKs asynchronously
  - Set timeout; retransmit message for missing ACKs
- Cumulative ACKs
  - Wait a little while before sending an ACK
  - If you receive other messages, then send one ACK for everything
- Piggybacked ACKs
  - Send an ACK along with a return message
TCP does all of these … but now we have to do this for each recipient
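Pipelining combined with cumulative ACKs can be sketched for a single sender/recipient pair as below; in a group, the sender would keep this state once per recipient. The class names are assumptions, and the timeout is not modelled: `to_retransmit` simply reports what a timeout would resend.

```python
class CumulativeAckReceiver:
    def __init__(self):
        self.received = set()
        self.highest_in_order = -1       # everything up to here has arrived

    def on_message(self, seq):
        self.received.add(seq)
        while self.highest_in_order + 1 in self.received:
            self.highest_in_order += 1

    def make_ack(self):
        """One ACK that covers every message up to highest_in_order."""
        return self.highest_in_order


class PipeliningSender:
    def __init__(self):
        self.unacked = {}                # seq -> payload still awaiting an ACK
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.unacked[seq] = payload      # sent without waiting for earlier ACKs
        self.next_seq += 1
        return seq

    def on_ack(self, ack):
        for seq in [s for s in self.unacked if s <= ack]:
            del self.unacked[seq]

    def to_retransmit(self):
        return sorted(self.unacked)      # what a timeout would resend


tx, rx = PipeliningSender(), CumulativeAckReceiver()
for payload in ("a", "b", "c", "d"):
    tx.send(payload)                     # four messages in flight at once
for seq in (0, 1, 3):                    # message 2 is lost
    rx.on_message(seq)
tx.on_ack(rx.make_ack())                 # a single ACK covers messages 0 and 1
pending_after_first_ack = tx.to_retransmit()
rx.on_message(2)                         # retransmission fills the gap
tx.on_ack(rx.make_ack())
```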
- Send vs. Delivery
- Global Time Ordering
- Total Ordering
- Causal Ordering
- Sync Ordering
- FIFO Ordering
- Unordered multicast
[Example figures not recovered]
The multicast receiver algorithm decides when to deliver a message to a process.
A received message may be:
- delivered immediately (put on a delivery queue that the process reads)
- placed on a hold-back queue (because we need to wait for an earlier message)
- rejected/discarded (duplicate or earlier message that we no longer want)
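The three outcomes can be illustrated with a receiver that enforces per-sender FIFO order. This is a sketch under assumptions: `FifoReceiver` is a hypothetical name, and each sender is assumed to number its messages 0, 1, 2, …

```python
from collections import defaultdict


class FifoReceiver:
    def __init__(self):
        self.next_seq = defaultdict(int)     # sender -> next expected sequence number
        self.hold_back = defaultdict(dict)   # sender -> {seq: payload}
        self.delivery_queue = []             # what the application reads

    def on_receive(self, sender, seq, payload):
        expected = self.next_seq[sender]
        if seq < expected or seq in self.hold_back[sender]:
            return "discarded"               # duplicate or already-delivered message
        if seq > expected:
            self.hold_back[sender][seq] = payload
            return "held back"               # an earlier message is still missing
        self.delivery_queue.append(payload)
        self.next_seq[sender] += 1
        # Delivering may unblock messages waiting in the hold-back queue.
        while self.next_seq[sender] in self.hold_back[sender]:
            nxt = self.next_seq[sender]
            self.delivery_queue.append(self.hold_back[sender].pop(nxt))
            self.next_seq[sender] += 1
        return "delivered"


rx = FifoReceiver()
results = [
    rx.on_receive("p1", 1, "b"),   # arrives early
    rx.on_receive("p1", 0, "a"),   # fills the gap and releases "b"
    rx.on_receive("p1", 0, "a"),   # duplicate
]
```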
Why Not? No global clocks … right?
They are sorted in the same order in the delivery queue
- If a process sends m before m’ then any other process that delivers m’ will have delivered m
- If a process delivers m’ before m” then every …
Messages sequenced by Lamport or Vector timestamps
- If sending m causally precedes sending m’, then every process that delivers m’ will have m delivered already
Each entry = number of the latest message from the corresponding group member that causally precedes the event
When Pj sends a message, it increments its own entry and sends the vector
- Vj[j] = Vj[j] + 1
- Send Vj with the message
When Pi receives a message from Pj
- Check that the message arrived in FIFO order from Pj: Vj[j] == Vi[j] + 1 ?
- Check that the message does not causally depend on something Pi has not seen: ∀k, k ≠ j: Vj[k] ≤ Vi[k] ?
- If both conditions are satisfied, Pi will deliver the message
- Otherwise, hold the message until the conditions are satisfied
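The send rule and the two receive-side checks translate almost directly into code. The function names `stamp_for_send` and `can_deliver` are illustrative, and vectors are plain Python lists indexed by process number.

```python
def stamp_for_send(v, j):
    """P_j increments its own entry and returns the vector to attach."""
    v[j] += 1
    return list(v)          # copy, so later local updates do not alter the message


def can_deliver(v_msg, v_local, j):
    """Delivery test at the receiver for a message from P_j."""
    # FIFO check: this is the very next message from P_j.
    fifo_ok = v_msg[j] == v_local[j] + 1
    # Causal check: the sender had seen nothing that the receiver has not.
    causal_ok = all(v_msg[k] <= v_local[k] for k in range(len(v_msg)) if k != j)
    return fifo_ok and causal_ok


v0 = [0, 0, 0]
stamp = stamp_for_send(v0, 0)                       # P0 sends its first message
first_from_p0 = can_deliver(stamp, [0, 0, 0], 0)
depends_on_unseen = can_deliver([1, 1, 0], [0, 0, 0], 1)
skips_a_message = can_deliver([2, 0, 0], [0, 0, 0], 0)
```

The second call fails the causal check (entry 0 is ahead of the receiver), and the third fails the FIFO check (a message from P0 was skipped).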
P2 receives message m1 from P1 with V1=(1,1,0)
(1) Is this in FIFO order from P1?
- Compare current V on P2: V2=(0,0,0) with received V from P1, V1=(1,1,0)
- Yes: V2[1] = 0, received V1[1] = 1 ⇒ sequential order
(2) Is V1[i] ≤ V2[i] for all other i?
- Compare the same vectors: V2=(0,0,0) vs. V1=(1,1,0)
- No: V1[0] = 1 > V2[0] = 0, so m1 depends on a message from P0 that P2 has not yet seen
Therefore: hold back m1 at P2
P2 receives message m0 from P0 with V0=(1,0,0)
(1) Is this in FIFO order from P0?
- Compare current V on P2: V2=(0,0,0) with received V from P0, V0=(1,0,0)
- Yes: V2[0] = 0, received V0[0] = 1 ⇒ sequential
(2) Is V0[i] ≤ V2[i] for all other i?
- Yes: element 1: (0 ≤ 0), element 2: (0 ≤ 0)
Deliver m0
Now check hold-back queue. Can we deliver m1?
(1) Is the held-back message m1 in FIFO order from P1?
- Compare current V on P2: V2=(1,0,0) with held-back V from P1, V1=(1,1,0)
- Yes: V2[1] = 0, received V1[1] = 1 ⇒ sequential
(2) Is V1[i] ≤ V2[i] for all other i?
- Now yes. Element 0: (1 ≤ 1), element 2: (0 ≤ 0)
Deliver m1
More efficient than total ordering:
- No need for a global sequencer
- No need to send acknowledgements
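The worked example can be replayed with a small receiver that keeps a hold-back queue. This is a sketch, not a full protocol: `CausalReceiver` is a hypothetical name, duplicates are not handled, and the update of the local vector on delivery (setting the sender's entry to the value carried by the message) is an assumption chosen to match the V2 values shown in the example.

```python
class CausalReceiver:
    def __init__(self, n):
        self.v = [0] * n          # local vector, e.g. V2 on P2
        self.hold_back = []       # (name, sender index, message vector)
        self.delivered = []

    def _ok(self, j, v_msg):
        # FIFO check on the sender's entry, causal check on all other entries.
        return (v_msg[j] == self.v[j] + 1 and
                all(v_msg[k] <= self.v[k] for k in range(len(v_msg)) if k != j))

    def _deliver(self, name, j, v_msg):
        self.delivered.append(name)
        self.v[j] = v_msg[j]      # assumption: record the sender's message count on delivery

    def on_receive(self, name, j, v_msg):
        if not self._ok(j, v_msg):
            self.hold_back.append((name, j, v_msg))
            return
        self._deliver(name, j, v_msg)
        # Each delivery may unblock held-back messages, so rescan until stable.
        progress = True
        while progress:
            progress = False
            for item in list(self.hold_back):
                if self._ok(item[1], item[2]):
                    self.hold_back.remove(item)
                    self._deliver(*item)
                    progress = True


p2 = CausalReceiver(3)
p2.on_receive("m1", 1, (1, 1, 0))   # from P1: held back
held_after_m1 = list(p2.hold_back)
p2.on_receive("m0", 0, (1, 0, 0))   # from P0: delivered, then m1 is released
```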
Synchronization primitive
- Ensure all pending messages are delivered before any additional (post-sync) messages are accepted
- Design Issues
  - Process Failures
- Message Ordering
  - Good / Bad ordering
  - Various types of ordering of messages
- Group Communication
  - Causal ordering based approach
- Many more to come … stay tuned!
rajendra [DOT] prasath [AT] iiits [DOT] in
- http://www.iiits.ac.in/FacPages/index-rajendra.html
OR
- http://rajendra.2power3.com
and above)
and less than 8.5)
work will also be rewarded)
learning by helping the needy students