DISTRIBUTED SYSTEMS Department of Computing Science Umea University

Formal models for message passing algorithms
• The system has n processes, p_0 to p_{n-1}, where i is the index of process p_i
• The algorithm run by each p_i is modeled as a process automaton: a formal description of a sequential algorithm, associated with a node in the topology
Distributed Systems - D N Ranasinghe 24

Formal models for message passing algorithms
• A process automaton is a description of the process state machine
• It consists of a 5-tuple: { message alphabet, process states, initial states, message generation function, state transition function }
– message_alphabet: content of the messages exchanged
– process_states: the finite set of states that a process can be in
– initial_state: the start state of a process
– message_gen_function: given the current process state, how the next message is to be generated
– state_trans_function: on receipt of messages, and based on the current state, the next state to which the process should move

Description of system state
• A configuration is a vector C = (q_0, …, q_{n-1}) where q_i is a state of p_i
• In message passing systems two kinds of event can take place: a computation event of process p_i (an application of the state transition function), and a delivery event, the delivery of message m from process p_i to process p_j, consisting of a message sending event and a corresponding receiving event
• Each message is uniquely identified by its sender process, its sequence number and possibly a local clock value
• The behaviour of the system over time is modeled as an execution, which is a sequence of configurations alternating with events

Formal models for message passing algorithms
[Figure: a process with an incoming message (receive), internal computation (modules of the process) and an outgoing message (send)]
• All possible executions of a distributed abstraction must satisfy two conditions: safety and liveness

Formal models for message passing algorithms
• Safety: ‘nothing bad has happened (yet)’
• e.g., ‘every step by a process p_i immediately follows a step by process p_0’, or, ‘no process should receive a message unless the message was indeed sent’
• Safety is a property that, once violated at some time t, can never be satisfied thereafter; doing nothing also ensures safety!

Formal models for message passing algorithms
• Liveness: ‘eventually something good happens’
• a condition that must hold a number of times (possibly infinitely many), e.g., ‘eventually p_1 terminates’, where p_1’s termination happens once; or, liveness for a perfect link requires that if a correct process (one which is alive and well behaved) sends a message to a correct destination process, then the destination process eventually delivers the message
• Liveness is a property such that, at any time t, there is some hope that the property can be satisfied at some time t’ ≥ t

Asynchronous systems
• there is no fixed upper bound on message delivery time or on the time that elapses between consecutive steps of a process
• the notion of ordering of events (local computation, message send or message receive) is based on logical clocks
• an execution α of an asynchronous message passing system is a finite or infinite sequence of the form C_0, ϕ_1, C_1, ϕ_2, C_2, …, where C_k is a configuration of process states, C_0 is an initial configuration and ϕ_k is an event (a message send, a computation or a message receive)
• a schedule σ is the sequence of events in the execution, e.g., ϕ_1, ϕ_2, …; if the local processes are deterministic, then the execution is uniquely defined by (C_0, σ)

Synchronous systems
• there is a known upper bound on message transmission and processing delays
• processes execute in lock step; execution is partitioned into ‘rounds’: C_0, ϕ_1 | C_1, ϕ_2 | C_2, …
• very convenient for designing algorithms, but not very practical
• leads to some useful possibilities: e.g., timed failure detection (every process crash can be detected by all correct processes) and the implementation of a lease abstraction
• in a synchronous system with no failures, only C_0 matters for a given algorithm, but in an asynchronous system there can be many executions for a given algorithm

• synchronous message passing
[Figure: processes P, Q and R exchange messages with send() and recv() along a time axis divided into round 1, round 2 and round 3; each round has an upper bound on time, and a state transition takes a process from its current state to a new state]

Properties of algorithms
• validity and agreement: specific to the objective of the algorithm
• termination: an algorithm has terminated when all processes are terminated and there are no messages in transit
• an execution can still be infinite, but once terminated, a process stays in that state, taking ‘dummy’ steps
• complexity: message complexity (the maximum number of messages sent over all possible executions) and time complexity (equal to the maximum number of rounds if synchronous; in asynchronous systems this is less straightforward)

Properties of algorithms
• Interaction algorithms are designed for a given process failure model:
• fail-stop – processes can fail by crashing, but the crashes can be reliably detected by all other processes
• fail-silent – process crashes can never be reliably detected
• fail-noisy – processes can fail by crashing, and the crashes can be detected, but not always in a reliable manner
• fail-recovery – processes can crash and later recover and still participate in the algorithm
• Byzantine – processes deviate from the intended behaviour in an unpredictable manner
• solutions do not exist for every interaction abstraction under every failure model

Coordination and Agreement

• under this broad topic we will discuss
– Leader election
– Consensus
– Distributed mutual exclusion
• common or uniform decisions by participating processes, in response to various internal and external stimuli, are often required in the presence of failures and synchrony considerations

Leader election (LE)
• a process that is correct and acts as the coordinator in some steps of a distributed algorithm is a leader; e.g., the commit manager in a distributed database, or the central server in distributed mutual exclusion
• the LE abstraction can be implemented straightforwardly using a perfect failure detector (that is, in a synchronous setting)
• Hierarchical LE: assumes the existence of a ranking order agreed among processes a priori, s.t. a function O associates with every process the set of processes that precede it in the ranking, i.e., O(p_1) = ∅, so p_1 is the leader by default; O(p_2) = {p_1}, so if p_1 dies p_2 becomes leader; O(p_3) = {p_1, p_2}; etc.
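
The hierarchical rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `preceding` and `elect_leader` are hypothetical, and the `crashed` set stands in for the output of a perfect failure detector.

```python
def preceding(ranking, p):
    """O(p): the set of processes ranked ahead of p (hypothetical helper)."""
    return set(ranking[:ranking.index(p)])


def elect_leader(ranking, crashed):
    """Return the highest-ranked process not reported as crashed, or None.

    `crashed` is assumed to come from a perfect failure detector, so every
    process evaluating this rule reaches the same answer.
    """
    for p in ranking:
        if p not in crashed:
            return p
    return None


ranking = ["p1", "p2", "p3"]
print(preceding(ranking, "p3"))        # the processes that must fail before p3 leads
print(elect_leader(ranking, {"p1"}))   # p2 takes over once p1 is detected as crashed
```

A process p becomes leader exactly when every member of O(p) has been detected as crashed, which is what the loop checks in rank order.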

Leader election (LE)
• LCR algorithm (LeLann-Chang-Roberts): a simple ring based algorithm
• assumptions: n processes, each with a hard coded uid, in a logical ring topology; unidirectional message passing from process p_i to p_{(i+1) mod n}; processes are not aware of the ring size; asynchronous; no process failures; no message loss
• the leader is defined to be the process with the highest uid

Leader election (LE)
algorithm in prose:
• each process sends its uid to its neighbour
• if received uid < own uid, then discard; else if received uid > own uid, forward the received uid to the neighbour; else if received uid = own uid, then declare self as leader
[Figure: logical ring of processes P_1, P_2, P_3, P_4, …, P_n with identifiers uid_1, uid_2, uid_3, uid_4, …, uid_n]

Leader election (LE)
• process automaton:
message_alphabet: the set U of uids
state_i, for each p_i: defined by three state variables
  u ∈ U, initially uid_i
  send ∈ U ∪ {null}, initially uid_i
  status ∈ {leader, unknown}, initially unknown
msg_i: place the value of send on the output channel
trans_i: {
  send = null;
  receive v ∈ U ∪ {null} on the input channel;
  if v = null or v < u then exit;
  if v > u then send = v;
  if v = u then status = leader; }
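
The automaton above can be exercised with a small round-based simulation. The sketch below assumes a synchronous ring with unique uids; the function name `lcr` and the message and round counters are illustrative additions, not part of the slides.

```python
def lcr(uids):
    """Simulate LCR on a unidirectional ring in synchronous rounds.

    uids[i] is the uid of p_i and p_i sends to p_((i+1) mod n).
    Returns (leader index, rounds used, messages sent). Assumes unique uids.
    """
    n = len(uids)
    send = list(uids)              # state variable `send`, initially uid_i
    status = ["unknown"] * n
    rounds = messages = 0
    while "leader" not in status:
        rounds += 1
        # msg_i: place the value of `send` on the output channel
        inbox = [None] * n
        for i in range(n):
            if send[i] is not None:
                inbox[(i + 1) % n] = send[i]
                messages += 1
        # trans_i: apply the state transition to the received value
        for i in range(n):
            send[i] = None
            v = inbox[i]
            if v is None or v < uids[i]:
                continue           # nothing received, or smaller uid discarded
            if v > uids[i]:
                send[i] = v        # forward the larger uid
            else:
                status[i] = "leader"
    return status.index("leader"), rounds, messages


print(lcr([3, 7, 2, 5]))
```

With uids arranged in decreasing order along the direction of travel, the uid of rank r survives r hops, which gives the n(n+1)/2 worst case behind the O(n^2) message bound.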

Leader election (LE)
• expected properties:
– validity – if a process decides, then the decided value is the largest uid of a process
– termination – every correct process eventually decides
– agreement – no two correct processes decide differently
• message complexity: O(n^2)
• time complexity: if synchronous, n rounds until the leader is discovered and 2n rounds until the algorithm terminates
• other possible scenarios: synchronous with processes aware of the ring size n (useful if processes fail); bidirectional ring (for a more efficient version of the algorithm)

Leader election (LE)
• an O(n log n) message complexity algorithm (Hirschberg-Sinclair)
• assumptions: bidirectional ring, where for every i, 0 ≤ i < n, p_i has a channel to its left to p_{(i+1) mod n} and a channel to its right to p_{(i-1) mod n}; n processes, each with a hard coded uid, in a logical ring topology; processes are not aware of the ring size; asynchronous; no process failures; no message loss
[Figure: logical ring of processes P_1, P_2, P_3, P_4, …, P_k with identifiers uid_1, uid_2, uid_3, uid_4, …, uid_k]

Leader election (LE)
algorithm in prose:
• as before, a process sends its identifier around the ring, and the message of the process with the highest identifier traverses the whole ring and returns
• define the k-neighbourhood of a process p_i to be the set of processes at distance at most k from p_i in either direction, left and right
• the algorithm operates in phases, starting from 0
• in the k-th phase a process tries to become a winner for that phase, for which it must have the largest uid in its 2^k-neighbourhood
• only processes that are winners in the k-th phase can go on to the (k+1)-th phase

• to start with, in phase 0 each process attempts to become a phase 0 winner and sends probe messages to its left and right neighbours
• if the identifier of the neighbour receiving the probe is higher, then it swallows the probe; else it sends back a reply message if it is at the edge of the neighbourhood, or forwards the probe to the next process in line
• a process that receives replies from both its neighbours is a winner in phase 0
• similarly, in a 2^k-neighbourhood the k-th phase winner will receive replies from the farthest two processes in either direction
• a process which receives its own probe message declares itself the leader

Leader election (LE)
pseudo code for p_i:
initially phase = 0 and hop_count = 1
send <probe, uid_i, phase, hop_count> to left and to right;
upon receiving <probe, j, k, d> from left (or right) {
  if j = uid_i then terminate as leader;
  if j > uid_i and d < 2^k then
    send <probe, j, k, d+1> to right (or left);   // forward msg and increase hop count
  if j > uid_i and d ≥ 2^k then                   // reached edge: do not forward
    send <reply, j, k> to left (or right);
  // if j < uid_i, msg is swallowed
}
upon receiving <reply, j, k> from left (or right) {
  if j ≠ uid_i then
    send <reply, j, k> to right (or left);        // forward
  else                                            // reply is for own probe
    if already received <reply, j, k> from right (or left) then
      send <probe, uid_i, k+1, 1> to left and to right;   // phase k winner
}
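
A runnable sketch of the pseudo code is given below. It makes several assumptions that are not in the slides: a single global FIFO queue stands in for the asynchronous channels (one possible schedule among many), uids are unique, and the name `hirschberg_sinclair` and the message counter are illustrative.

```python
from collections import deque


def hirschberg_sinclair(uids):
    """Simulate the probe/reply algorithm on a bidirectional ring.

    The left neighbour of p_i is p_((i+1) mod n), the right neighbour is
    p_((i-1) mod n). Returns (leader index, messages sent).
    """
    n = len(uids)
    queue = deque()        # entries: (receiver, side the message arrives from, message)
    replies = {}           # (process, phase) -> sides from which a reply has arrived
    sent = 0

    def send(i, direction, msg):
        nonlocal sent
        sent += 1
        if direction == "left":
            queue.append(((i + 1) % n, "right", msg))
        else:
            queue.append(((i - 1) % n, "left", msg))

    for i in range(n):     # phase 0 probes with hop count 1
        send(i, "left", ("probe", uids[i], 0, 1))
        send(i, "right", ("probe", uids[i], 0, 1))

    while queue:
        i, came_from, msg = queue.popleft()
        onward = "left" if came_from == "right" else "right"
        if msg[0] == "probe":
            _, j, k, d = msg
            if j == uids[i]:
                return i, sent                              # own probe came back
            if j > uids[i]:
                if d < 2 ** k:
                    send(i, onward, ("probe", j, k, d + 1))  # forward the probe
                else:
                    send(i, came_from, ("reply", j, k))      # edge reached: reply
            # j < uids[i]: the probe is swallowed
        else:
            _, j, k = msg
            if j != uids[i]:
                send(i, onward, ("reply", j, k))             # forward the reply
            else:
                got = replies.setdefault((i, k), set())
                got.add(came_from)
                if len(got) == 2:                            # phase k winner
                    send(i, "left", ("probe", uids[i], k + 1, 1))
                    send(i, "right", ("probe", uids[i], k + 1, 1))
    raise ValueError("no leader elected; uids are assumed to be unique")


leader, messages = hirschberg_sinclair([3, 7, 2, 5, 9, 1, 8, 4])
print(leader, messages)
```

Only the process holding the largest uid can see its own probe return, because every other probe is swallowed by some process with a higher uid before it completes the ring.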

Leader election (LE)
• other possible scenarios:
– synchronous, with alternative ‘swallowing’ rules (e.g., swallow anything higher than the minimum uid seen so far) and with tweaking of uid usage
– this leads to a synchronous leader election algorithm whose message complexity is at most 4n

DME
• shared memory mutual exclusion is a well known aspect of operating systems, needed when concurrent threads access a shared variable or object for read/write purposes
• the shared resource is made a critical section, with access to it controlled by atomic lock or semaphore operations
• the lock or semaphore variable is seen consistently by all threads
• asynchronous shared memory is an alternative possibility: say, P_1, P_2 and P_3 share M_1, and P_2 and P_3 share M_2

DME
• in a distributed system there is no shared lock variable to look at
• processes have to agree, by message passing, on the process eligible to access the shared resource at any given time
• assumptions: system of n processes, p_i, i = 1..n; a process wishing to access an external shared resource must obtain permission to enter the critical section (CS); asynchronous; processes do not fail; messages are reliably delivered

• correctness properties
• ME1 safety: at most one process may execute in the CS at any given time
• ME2 liveness: requests to enter and exit the CS eventually succeed
• ME3 ordering: if one request to enter the CS ‘happened-before’ another, then entry to the CS is granted in that order
• ME2 ensures freedom from both starvation and deadlock

DME
• several algorithms exist: Central Server version, Ring, Ricart-Agrawala
[Figure: Central Server version. The server holds the token and a queue of requests (4, 2) and serves processes P_1 to P_4; labelled steps: 1. Request, 2. Release token, 3. Grant token]

DME
• in this scenario, there is a central server S that grants processes permission to enter the CS in response to token requests
• ME1 and ME2 are satisfied under the stated assumptions (processes do not fail, messages are reliably delivered)
• ME3 is not, since arbitrary message delays may cause requests to arrive at S out of order

DME
Ring algorithm:
• assumptions: processes are ordered in a logical ring with unidirectional communication, where each process p_i communicates only with p_{(i+1) mod n}; system of n processes, p_i, i = 1..n; asynchronous; processes do not fail; messages are reliably delivered
• mutual exclusion is obtained by sole possession of a token
• ME1 and ME2 are satisfied
• correctness is not guaranteed if the assumptions are violated

DME
Ricart-Agrawala algorithm:
• assumptions: each process p_i has a unique identifier uid_i and maintains a logical scalar clock LC_i; system of n processes, p_i, i = 1..n; asynchronous; processes do not fail; messages are reliably delivered

algorithm in prose:
• a process p_i that wants to access the CS multicasts a request message containing its (timestamp, uid) pair to the whole group
• a process receiving such a request responds to p_i immediately, unless it is already in the CS, or it also wants to enter the CS and its own request carries a smaller (timestamp, uid) pair than that of p_i; in those cases it defers its response
• if p_i receives responses from all other processes then it can enter the CS

DME
On initialization
  state := RELEASED;
To enter the section
  state := WANTED;
  multicast request to all processes;
  T_i := request’s timestamp;
  wait until (number of replies received = (N-1));
  state := HELD;
On receipt of a request <T_i, p_i> at p_j (i != j)
  if (state = HELD or (state = WANTED and (T_j, p_j) < (T_i, p_i)))
  then queue request from p_i without replying;
  else reply immediately to p_i;
  end if
To exit the critical section
  state := RELEASED;
  reply to any queued requests;
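
The request handling above can be sketched in Python as follows. Several things here are assumptions for illustration only: the class and method names, the direct method calls standing in for reliable message delivery, the Lamport-style clock update on receipt of a request, and the starting clock values 40 and 33 (chosen so that the two requests carry timestamps 41 and 34).

```python
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"


class Process:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.state = RELEASED
        self.clock = 0
        self.request = None      # (timestamp, pid) of the pending own request
        self.replies = 0
        self.deferred = []       # pids whose requests were queued

    def want(self):
        """Enter the WANTED state; returns the request to multicast."""
        self.state = WANTED
        self.clock += 1
        self.request = (self.clock, self.pid)
        self.replies = 0
        return self.request

    def on_request(self, req):
        """Return True to reply immediately, False if the request is queued."""
        ts, sender = req
        self.clock = max(self.clock, ts) + 1   # assumed Lamport update
        if self.state == HELD or (self.state == WANTED and self.request < req):
            self.deferred.append(sender)
            return False
        return True

    def on_reply(self):
        self.replies += 1
        if self.state == WANTED and self.replies == self.n - 1:
            self.state = HELD

    def release(self):
        """Leave the CS; returns the pids that are now owed a reply."""
        self.state = RELEASED
        self.request = None
        waiting, self.deferred = self.deferred, []
        return waiting


def demo():
    procs = [Process(i, 3) for i in range(3)]
    procs[0].clock, procs[1].clock = 40, 33          # assumed starting clocks
    requests = [(procs[0].want(), 0), (procs[1].want(), 1)]
    for req, src in requests:                        # deliver both multicasts
        for p in procs:
            if p.pid != src and p.on_request(req):
                procs[src].on_reply()
    order = []
    while any(p.state != RELEASED for p in procs):
        holder = next(p for p in procs if p.state == HELD)
        order.append(holder.pid)
        for pid in holder.release():
            procs[pid].on_reply()
    return order


print(demo())   # order in which processes entered the CS
```

Comparing (timestamp, pid) tuples gives the total order on requests, with the pid breaking ties between equal timestamps.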

DME
[Figure: Ricart-Agrawala example with processes P_1, P_2 and P_3; concurrent requests with timestamps 41 and 34 are multicast and answered with Reply messages]
• ME1, ME2 and ME3 are satisfied

DME
• message complexity is easily derivable
• in all three DME algorithms above, i.e., server based, ring based and R-A, process failures might violate termination requirements
• message losses are not acceptable
• even a perfect failure detector is not applicable, since two of the three algorithms are asynchronous

Fault tolerant consensus
• generally speaking, agreement or consensus by participating processes may be on a common value, on a message delivery order, on abort or commit, on a leader, etc.
• consensus is specified in terms of two primitives: propose and decide
• properties to be satisfied:
– termination – every correct process eventually decides some value
– validity – if a process decides v, then v was proposed by some process
– integrity – no process decides twice
– agreement – no two correct processes decide differently

Fault tolerant consensus
• integrity, agreement and validity are safety properties
• termination is the liveness property
• key features: best effort broadcast with no message loss as the mechanism to convey messages to the community of processes; synchronous; process failures are fail-stop or Byzantine, with key parameter f, the maximum number of processes that can fail, in which case the system is known as f-resilient
• uncertainty in consensus in this failure model arises from the possibility that only a partial set of a process’s messages is delivered in a round

Flooding consensus – version 1
• assumptions: n processes in a strongly connected undirected graph; processes aware of the group size; synchronous; at most f fail-stop processes (f is hard coded); no message loss; the set of possible decision values {V} is made up of all proposed values; each process has exactly one proposed value; the objective is a ‘uniform’ decision

Flooding consensus – version 1
algorithm in prose:
• processes execute in rounds
• each process maintains, by merging, the set of proposals it has seen, and this set is augmented when moving from one round to the next
• in each round every process disseminates its augmented set to all others using best effort broadcast
• a process decides a specific value in its set when the number of rounds equals (f+1)

[Figure: consensus timeline for processes p1, p2, p3 and p4 along time axis t, showing round 1, round 2 and round 3, with round (f+1) marked]

Flooding consensus – version 1
process automaton:
message_alphabet: subsets of V
state_i, for each p_i: defined by three state variables
  rounds ∈ N, initially 0
  decision ∈ V ∪ {unknown}, initially unknown
  W ⊆ V, initially the singleton set consisting of v_i, p_i’s proposal
msg_i: if rounds ≤ f then broadcast W to all other processes
trans_i: {
  rounds = rounds + 1;
  receive value x_j on input channel j;
  W = W ∪ ⋃_j x_j;
  if rounds = f+1 then
    if |W| = 1 then decision = v, where W = {v}
    else decision = default; }
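
The automaton can be tried out with a small synchronous simulation. The sketch below is an illustration under assumptions that go beyond the slides: the `crashes` argument is a hypothetical way to script failures (a process crashing in a round still reaches only the listed receivers, modelling a partial broadcast), and at most f crashes are assumed.

```python
def flooding_consensus(proposals, f, crashes=None, default=None):
    """Run f+1 rounds of flooding and return {process: decision} for survivors.

    crashes maps a process index to (round, receivers): the process crashes
    during that round after reaching only `receivers`.
    """
    crashes = crashes or {}
    n = len(proposals)
    W = [{v} for v in proposals]           # W_i starts as {v_i}
    alive = set(range(n))
    for rnd in range(1, f + 2):            # rounds 1 .. f+1
        inbox = [set() for _ in range(n)]
        for i in sorted(alive):
            if i in crashes and crashes[i][0] == rnd:
                receivers = crashes[i][1]  # partial broadcast before crashing
            else:
                receivers = range(n)
            for j in receivers:
                if j != i:
                    inbox[j] |= W[i]
        alive -= {i for i, (r, _) in crashes.items() if r == rnd}
        for j in alive:
            W[j] |= inbox[j]               # W = W ∪ (everything received)
    return {j: (next(iter(W[j])) if len(W[j]) == 1 else default) for j in alive}


# p_0 crashes in round 1 after reaching only p_1; f = 1 gives two rounds
print(flooding_consensus([1, 2, 3], f=1, crashes={0: (1, {1})}))
```

In the example, p_1 learns the value 1 in round 1 and p_2 does not; the second round lets p_1 pass it on, so both survivors end with the same set W and take the same decision.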

Flooding consensus – version 1
proof sketch:
• termination – all correct processes decide at the end of round f+1, whatever that decision may be
• validity – suppose all initial proposals are identical to v; then W has only one element, v, and v is the only possible decision
• agreement – if no process can fail (f = 0), the algorithm runs for 1 round only, and by the basic broadcast property the sets W seen by all processes are identical
• in the worst case the f failures are spread over the rounds, one per round, but there is still one final round to make the decision uniform

• performance:
– time complexity: (f+1) rounds, a particular feature of all consensus algorithms
– message complexity: (f+1)n^2
• other possible decision functions apart from uniform are majority, minimum, maximum, etc.

Flooding consensus – version 2
• assumptions: n processes in a fully connected undirected graph; processes aware of the group size; synchronous; fail-stop crashes with a perfect failure detector; no message loss; the set of possible decision values {V} is made up of all proposed values; each process has exactly one proposed value; any deterministic decision function can be applied

Flooding consensus – version 2
algorithm in prose:
• processes execute in rounds
• each process maintains, by merging, the set of proposals it has seen, and this set is augmented when moving from one round to the next
• in each round every process disseminates its set to all others using best effort broadcast
• a process decides a specific value in its set when it knows it has gathered all proposals that will ever be seen by any correct process, or when it has detected no new failures in two successive rounds
• a process that so decides broadcasts its decision to the rest in the next round; all correct processes that have not yet decided will decide on receipt of a decide message

Flooding consensus – version 2
• strictly speaking, agreement is not violated; but correct processes must decide a value that is consistent with the values decided by processes that might have decided before crashing
• suppose a process that receives messages from all others decides, but crashes immediately afterwards, before broadcasting its decision to the others
• the rest detect the failure and move to the next round, and then to a further round in which there may be no new failures, and may then decide on a different outcome
• the problem can be mitigated by employing a reliable broadcast mechanism: a process must not decide as soon as it is able to, but only after a reliable form of broadcast

Flooding consensus – version 2
• performance: worst case n rounds, if (n-1) processes crash in sequence
• impossibility of consensus under asynchronous fail-stop conditions
• important result by Fischer, Lynch and Paterson: ‘no algorithm can guarantee to reach consensus in an asynchronous system even with one process crash failure’
• the outcome is mainly due to the indistinguishability of a crashed process from a slow process in an asynchronous system

Flooding consensus – version 2
• the proof is complicated, but follows the argument that among the many possible executions α there is at least one that avoids consensus being reached
• any alternative?
• with ‘unreliable failure detectors’ – consensus can be solved in an asynchronous system with an unreliable failure detector if fewer than n/2 processes crash (Chandra and Toueg)

Byzantine fault tolerance
• consensus in a synchronous system in the presence of malicious and/or ad hoc process failures, known by the metaphor of Byzantine failure
• generals commanding divisions of the Byzantine army communicate using reliable messengers
• the generals should decide on a common plan of action
• some generals may be traitors and may prevent loyal generals from agreeing by sending conflicting messages to different generals

Byzantine fault tolerance
[Figure: Four Generals scenario. Generals 1 to 4, each with an army, surround a city]

Byzantine fault tolerance
• assumptions: n processes in a fully connected undirected graph; processes aware of the group size; synchronous; at most f Byzantine faulty processes (f is hard coded), where a faulty process may send any message with any value at any time or keep silent; no message loss; a correct process detecting the absence of a message associates it with a ‘null’ value; one designated process initiates messages to the other processes; messages are unsigned (oral); the set of possible decision values {V} is made up of the value proposed by the designated process; the objective is a ‘majority’ decision

Byzantine fault tolerance
• properties to be satisfied:
– termination – every correct process eventually decides
– validity – if the sending process is correct, then the message received is identical to the message sent (or, if the commanding general is loyal, then every loyal general obeys the order sent)
– agreement – correct processes receive the same message (or, all loyal generals receive the same order)
• impossibility with three processes

Byzantine fault tolerance
[Figure: two scenarios with commanding general CG and generals G1 and G2, with messages labelled ‘attack’ and ‘retreat’: (1) G2 is a traitor; (2) CG is a traitor]

Byzantine fault tolerance
• algorithm in prose: processes execute in rounds; the designated process initiates by best effort broadcast of its message to the others; each correct process maintains, by merging, the set of proposals it has seen, and this set is augmented when moving from one round to the next; in each round every correct process disseminates its set to all others except the designated process, using best effort broadcast; a correct process decides a majority value in its set (or falls back to a default) when the number of rounds equals (f+1)
• case (a) – three processes with participating general p_3 as traitor; case (b) – three processes with commanding general p_1 as traitor

Byzantine fault tolerance
[Figure: cases (a) and (b) with processes P1, P2 and P3; first round messages labelled 1:V, 1:W and 1:X, relayed messages labelled 2:1:V, 2:1:W, 3:1:X and 3:1:u]
• outcome: termination – satisfied by definition, whatever the decision is; validity – not satisfied for case (a) (p_2 does not follow p_1) and not applicable for case (b); agreement – satisfied for case (b) (p_2 and p_3 fall back on the default) and not applicable for case (a)

Byzantine fault tolerance
• consensus with four processes: case (a) – four processes with participating general p_3 as traitor; case (b) – four processes with commanding general p_1 as traitor

Byzantine fault tolerance
• outcome: case (a) – validity and agreement satisfied; case (b) – validity not applicable, agreement satisfied (p_2, p_3 and p_4 fall back on the default)
• scenario with signed messages: digitally signing a message uniquely identifies the message and its originator
• revisit the three process consensus: case (a) – the traitor cannot alter the commanding general’s message but can stay silent: validity satisfied (p_2 discards the bogus message from p_3); case (b) – agreement satisfied (p_2 and p_3 fall back on the default)
• Byzantine agreement is solvable with three processes and one failure if processes digitally sign the messages

Byzantine fault tolerance
• complexity: time – (f+1) rounds; message – O(n^{f+1}), an exponential message complexity
• generic result: Byzantine agreement is solvable with at least (3f+1) processes in (f+1) rounds, where f is the maximum number of Byzantine failures
• a constant message size BFT consensus alternative exists, provided n > 4f, and runs for 2(f+1) rounds
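
The bound n ≥ 3f+1 can be turned into a quick arithmetic check, which also explains why three processes cannot tolerate a traitor while four can. The function names below are hypothetical and the sketch covers only the unsigned (oral) message case.

```python
def max_byzantine_faults(n):
    """Largest f such that n >= 3f + 1 (unsigned messages)."""
    return (n - 1) // 3


def rounds_needed(f):
    """Rounds used by the (f+1)-round algorithm."""
    return f + 1


for n in (3, 4, 7, 10):
    f = max_byzantine_faults(n)
    print(n, "processes tolerate", f, "Byzantine failure(s) in", rounds_needed(f), "round(s)")
```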

Time and Global states

• a distributed system by nature has no single clock, and it is practically difficult to synchronise physical clocks across a system
• a mechanism to globally order events in an asynchronous system is an important requirement for replica management, consensus, etc.

Logical clocks
• Leslie Lamport introduced the concept of causal relationship observable in a message passing distributed system

Logical clocks
• a potential causal ordering can be established by looking at ‘happened-before’ relationships (indicated by an arrow →) between local events within a process as well as between sending and receiving events across processes: e.g., p_1: a → b, p_2: c → d, p_3: e → f, p_1 and p_2: b → c, p_2 and p_3: d → f, etc.
• transitivity property: if x → y and y → z then x → z
• concurrency definition: if ¬(x → y) and ¬(y → x) then we say x || y
• it can easily be established that for p_1 and p_3: a → f and a || e

Logical clocks
• it is possible to time stamp the events of a distributed system such that
– rule 1 – if e_1 and e_2 are local events in p_i and e_1 → e_2, then C_i(e_1) < C_i(e_2)
– rule 2 – if e_1 is the sending event of a message by p_i and e_2 is the corresponding receiving event of the message by p_j, then C_i(e_1) < C_j(e_2)
• generalised notation: e_i^j denotes event #j of process p_i
• the local history (possibly an infinite sequence of events) of p_i is h_i = e_i^1 e_i^2 e_i^3 …, and the global history of the system is H = h_1 ∪ h_2 ∪ … ∪ h_n

Logical clocks
• Lamport clock timestamp rules:
– given that LC(e_i) = logical time stamp of event e_i and LC_i = value of the logical clock of p_i, then
  LC(e_i) = LC_i + 1, if e_i is an internal event or a send event
  LC(e_i) = max(LC_i, TS(m)) + 1, if e_i is a receive event, where TS(m) is the time stamp of the received message
– after the occurrence of event e_i on p_i, the logical clock of p_i is updated as LC_i ← LC(e_i)
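
The two timestamp rules translate almost directly into code. This is a minimal sketch; the class name `LamportClock` and its method names are illustrative choices.

```python
class LamportClock:
    def __init__(self):
        self.time = 0                       # LC_i

    def tick(self):
        """Internal or send event: LC(e) = LC_i + 1."""
        self.time += 1
        return self.time

    def receive(self, ts):
        """Receive event: LC(e) = max(LC_i, TS(m)) + 1."""
        self.time = max(self.time, ts) + 1
        return self.time


p, q, r = LamportClock(), LamportClock(), LamportClock()
a = p.tick()              # internal event on p
ts = p.tick()             # send event on p; the message carries ts
b = q.receive(ts)         # receive event on q
c = r.tick()              # unrelated internal event on r
print(a, ts, b, c)
```

Here c < b even though the event on r has no causal relationship with the receive on q, which is why a smaller Lamport timestamp alone does not establish happened-before.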

Logical clocks
• properties: e → e’ ⇒ LC(e) < LC(e’); but note that LC(e) < LC(e’) does not imply e → e’
[Figure: process P_i with events e_i^1, e_i^2, e_i^3 and process P_j with events e_j^1, e_j^2, e_j^3, each on its own local clock axis T; a message m is sent by P_j and received by P_i]
• Lamport’s logical clocks enforce only a partial ordering of events
• How can a causal order of events be enforced? Vector clocks, by Mattern and Fidge

Logical clocks
• specification: VC(e_i), the vector time stamp of event e_i on p_i, is a vector of size n with elements VC(e_i)[j], j = 1..n, where n is the group size
• for i = j, VC(e_i)[j] is the number of events on p_i up to and including e_i
• for i ≠ j, VC(e_i)[j] is the number of events on p_j that happened before e_i
[Figure: events on a physical time axis with their vector timestamps. p_1: a (1,0,0), b (2,0,0); p_2: c (2,1,0), d (2,2,0); p_3: e (0,0,1), f (2,2,2); message m_1 goes from b to c and message m_2 from d to f]

Logical clocks
• Vector clock timestamp rules:
– VC_i = vector clock of p_i
– if e_i is an internal event or send(m) on p_i then, ∀ j ≠ i, VC(e_i)[j] ← VC_i[j], and VC(e_i)[i] ← VC_i[i] + 1
– else, if e_i is a receive event on p_i of message m with vector timestamp VT(m), then VC(e_i) ← max(VC_i, VT(m)) taken element by element, and VC(e_i)[i] ← VC(e_i)[i] + 1
– after the occurrence of event e_i on p_i, its vector clock is updated as VC_i ← VC(e_i)
• comparing two vector clocks:
– VC(e) < VC(e’) iff (VC(e) ≤ VC(e’)) and (VC(e) ≠ VC(e’)), where
– VC(e) ≠ VC(e’) iff ∃ j s.t. VC(e)[j] ≠ VC(e’)[j], and
– VC(e) ≤ VC(e’) iff ∀ j, VC(e)[j] ≤ VC(e’)[j]

Logical clocks
• Vector clock properties:
– e → e’ ⇔ VC(e) < VC(e’)
– e || e’ ⇔ ¬(VC(e) < VC(e’)) and ¬(VC(e’) < VC(e))
• Vector clocks impose a causal order of events
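
The vector clock rules and the comparison can be sketched as follows. The class and function names are illustrative, and processes are indexed from 0 here rather than from 1. The example replays the six events a to f used earlier (p_1: a, b; p_2: c, d; p_3: e, f, with a message from b to c and another from d to f).

```python
class VectorClock:
    def __init__(self, i, n):
        self.i = i                 # index of the owning process
        self.v = [0] * n           # VC_i

    def tick(self):
        """Internal or send event: increment own component."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, vt):
        """Receive event: element-wise max with VT(m), then increment own component."""
        self.v = [max(x, y) for x, y in zip(self.v, vt)]
        self.v[self.i] += 1
        return tuple(self.v)


def vc_less(a, b):
    """VC(e) < VC(e'): every component <= and the vectors differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    return not vc_less(a, b) and not vc_less(b, a)


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.tick()
b = p1.tick()          # send of m_1
c = p2.receive(b)      # receive of m_1
d = p2.tick()          # send of m_2
e = p3.tick()
f = p3.receive(d)      # receive of m_2
print(a, b, c, d, e, f)
print(vc_less(a, f), concurrent(a, e))
```

Unlike Lamport timestamps, the comparison works in both directions: a → f shows up as VC(a) < VC(f), and a || e shows up as the two vectors being incomparable.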

Global property of a distributed computation
properties to look for in a distributed system:
• garbage collection – objects having no references to them within a process can be discarded
• deadlock detection – cyclic waiting for resources
• termination detection – not only has each process halted, but there are also no messages in transit
• debugging – ensuring, for example, that variables across processes remain within defined limits, etc.

Global property of a distributed computation
• among these is the class of stable properties
• stable ⇒ once true, the property remains true forever
• there is no omniscient observer who can record an instantaneous snapshot of the system state
• a useful concept if the system is asynchronous

Global property of a distributed computation
• first a few notations and definitions
• let q_i^0 be the initial state of p_i and q_i^k be the state of process p_i after the occurrence of event e_i^k
[Figure: event timelines for P1 (events e_1^1 to e_1^4), P2 (events e_2^1, e_2^2) and P3 (events e_3^1 to e_3^4)]

Global property of a distributed computation
• the global state of a distributed computation at any given instant is defined by the tuple (q_1^{k1}, q_2^{k2}, …, q_n^{kn}); the global state does not include the state of the channels
• a cut of a distributed computation is defined as a subset of the global history H, given by C = h_1^{c1} ∪ h_2^{c2} ∪ … ∪ h_n^{cn}, where h_i^{ci} = e_i^1 e_i^2 … e_i^{ci} is the local event history of p_i up to event e_i^{ci}
• cut C is defined by the tuple (c_1, c_2, …, c_n)
• the global state (q_1^{c1}, q_2^{c2}, …, q_n^{cn}) corresponds to cut C

Global property of a distributed computation
[Figure: space-time diagram of processes P1, P2 and P3 with events e_1^1 … e_1^4, e_2^1, e_2^2 and e_3^1 … e_3^4, and a cut C drawn across the three timelines]
• Usefulness of a cut C:
– it is possible to express a global property of a distributed computation, such as deadlock or termination, as a global state predicate Φ which evaluates an observed state to true or false

Global property of a distributed computation
• suppose a process p_0, outside the system, asks each process p_i for its local state q_i; p_0 builds the global state Q = (q_1^k1, q_2^k2, …, q_n^kn) and Φ is evaluated on Q to give a value in {true, false}
• consider some predicate Φ evaluated on a consistent cut C, expressed by the state (q_1^c1, q_2^c2, …, q_n^cn), such that Φ(C) = value of Φ on C ∈ {T, F}
• let cut C precede a cut C' iff C ⊂ C'
• a predicate Φ is said to be stable iff the following property holds: Φ(C) ⇒ Φ(C') for all C' such that C ⊂ C'

Global property of a distributed computation
[Figure: bank transfer example with four processes p1 to p4 holding account balances in CHF and exchanging transfers (for example, a transfer of 100); a cut across the timelines is marked "Problem…!"]

• invariant for the bank transfer example: there should never be more money in the accounts than there was originally
• global state defined by cut C': (400, 650, 400, 600); total amount = 2050 > 1550, so cut C' is not consistent
• definition: a cut C is consistent iff for all events e, e': (e' ∈ C and e → e') ⇒ e ∈ C
• definition: a consistent global state is a global state defined by a consistent cut
• vector clocks can be used to determine whether a cut is consistent or not
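The definition of a consistent cut can be checked mechanically. The sketch below is an illustration under stated assumptions: events are plain strings, `edges` lists only the direct happened-before pairs (process order plus one send/receive pair), and the event names `a1`, `a2`, `b1`, `b2` and the function name `is_consistent` are hypothetical. Checking direct predecessors suffices, because a cut closed under direct predecessors is also closed under the transitive relation →. The last lines redo the arithmetic of the bank example with the figures quoted above.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (e, e2): e directly happened before e2


def is_consistent(cut: Set[str], edges: Iterable[Edge]) -> bool:
    # For every e2 in the cut, each direct predecessor e must be in the cut too.
    return all(src in cut for src, dst in edges if dst in cut)


# Two processes: a1 -> a2 on the first, b1 -> b2 on the second,
# where a2 = send(m) and b2 = receive(m).
edges = [("a1", "a2"), ("b1", "b2"), ("a2", "b2")]

consistent_cut = {"a1", "a2", "b1", "b2"}
inconsistent_cut = {"a1", "b1", "b2"}  # contains the receive but not the send

print(is_consistent(consistent_cut, edges))
print(is_consistent(inconsistent_cut, edges))

# Bank example: the state observed on cut C' violates the invariant.
observed = (400, 650, 400, 600)
original_total = 1550
print(sum(observed), sum(observed) > original_total)
```

The inconsistent cut corresponds to the bank problem: the money is counted at the receiver although the matching withdrawal at the sender is not part of the cut.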

Consider the cut events e_i on p_i and e_j on p_j:
• VC(e_i)[j] is the number of events on p_j that happened before e_i (on p_i)
• VC(e_j)[j] is the number of events on p_j up to and including e_j
• therefore, if VC(e_i)[j] > VC(e_j)[j], then e_i is aware of more events on p_j than e_j itself
• that is, there was a subsequent event after e_j on p_j which causally precedes e_i but lies outside the cut
• this is exactly an inconsistent cut

• a cut C is consistent if and only if ∀ i, j: VC(e_j^cj)[j] ≥ VC(e_i^ci)[j]
• that is, cut event e_i^ci cannot be aware of more events on p_j than e_j^cj itself
[Figure: timelines of p_i and p_j showing the cut events e_i and e_j]
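The vector clock condition can be tested directly on the timestamps of the cut events. In this sketch, `frontier[i]` is assumed to hold VC(e_i^ci) for process p_i (indexed from 0), and a process with no event in the cut is represented by the zero vector; the function name `is_consistent_cut` and this representation are assumptions, not part of the slides.

```python
from typing import List, Sequence


def is_consistent_cut(frontier: Sequence[Sequence[int]]) -> bool:
    """Check VC(e_j^cj)[j] >= VC(e_i^ci)[j] for all pairs i, j."""
    n = len(frontier)
    return all(
        frontier[j][j] >= frontier[i][j]
        for i in range(n)
        for j in range(n)
    )


# Two processes: p0 sends m (timestamp [1, 0]) and p1 receives it ([1, 1]).
send_ts: List[int] = [1, 0]
recv_ts: List[int] = [1, 1]
no_event: List[int] = [0, 0]  # assumption: empty local history maps to zeros

good_cut = [send_ts, recv_ts]   # cut contains both send and receive
bad_cut = [no_event, recv_ts]   # cut contains the receive but not the send

print(is_consistent_cut(good_cut))
print(is_consistent_cut(bad_cut))
```

In `bad_cut` the receive event knows of one event on p0 while p0's own cut event knows of none, which is the violation described above.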
