Verteilte Systeme (Distributed Systems), Karl M. Göschka



SLIDE 1

Verteilte Systeme (Distributed Systems)

Karl M. Göschka Karl.Goeschka@tuwien.ac.at

http://www.infosys.tuwien.ac.at/teaching/courses/VerteilteSysteme/

SLIDE 2

Lecture 6: Clocks and Agreement

  • Synchronization of physical clocks
  • Logical clocks and ordering
  • Distributed mutual exclusion
  • Election
  • Global state

SLIDE 3

Clock Synchronization

  • When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
  • Time is so basic to the way people think!

SLIDE 4

Physical Clocks (1)

Computation of the mean solar day.

SLIDE 5

Time and Clocks

  • Historically, time has been measured astronomically: the solar day (from one transit of the sun to the next), with the solar second defined as 1/86400 of a solar day.
  • Earth's rotation is not constant (core turbulence) and is slowing down (tidal friction, atmospheric drag), hence the mean solar second (GMT).
  • Atomic time: one second corresponds to 9,192,631,770 transitions of cesium 133; International Atomic Time (TAI) is maintained at the BIH.
  • Coordinated Universal Time (UTC): a UTC second equals a TAI second, but leap seconds keep UTC in phase with solar time.

SLIDE 6

Physical Clocks (2)

Figure: TAI seconds, solar seconds, and UTC seconds side by side, with a leap second inserted into UTC.

TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep UTC in phase with the sun.

SLIDE 7

Timer

  • A timer is a counter that counts clock ticks
  • Crystal oscillator
  • Battery-backed CMOS RAM (initial setting)
  • Clock offset, skew, drift (definitions differ in the literature!)
  • UTC is provided, e.g., by the National Institute of Standards and Technology (NIST): WWV, GEOS, GPS, ...
  • Real-time systems need the actual clock time:
    • synchronize with real-world time (external)
    • synchronize with each other (internal)

SLIDE 8

Clock Synchronization Algorithms

The relation between clock time and UTC when clocks tick at different rates.

  • The maximum drift rate determines the required resynchronization interval.
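The link between drift rate and resynchronization interval can be made concrete. Assuming every clock's drift rate is bounded by ρ (so 1 - ρ ≤ dC/dt ≤ 1 + ρ), two clocks can drift apart by up to 2ρΔt after Δt seconds; to keep them within δ of each other, they must be resynchronized at least every δ/(2ρ) seconds. A minimal sketch (the function name and the example values are hypothetical):

```python
def resync_interval(max_skew: float, max_drift_rate: float) -> float:
    """Longest interval (in seconds) between resynchronizations.

    Two clocks whose drift rate is bounded by max_drift_rate (rho) can drift
    apart by up to 2 * rho * dt; to stay within max_skew (delta) of each other
    they must therefore be resynchronized every delta / (2 * rho) seconds.
    """
    if max_drift_rate <= 0:
        raise ValueError("drift rate must be positive")
    return max_skew / (2 * max_drift_rate)


# Hypothetical example: keep clocks within 1 ms, drift rate bounded by 1e-5.
interval = resync_interval(0.001, 1e-5)
print(f"resynchronize at least every {interval:.1f} s")
```

Halving the tolerated skew halves the interval, so tighter synchronization directly costs more messages.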

SLIDE 9

Network Time Protocol (1)

Getting the current time from a time server.

$T_2' = T_2 - \theta$, $T_3' = T_3 - \theta$
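Under the usual assumption that the request and the reply take equally long in transit, the four timestamps yield the standard NTP estimates: offset θ = ((T2 - T1) + (T3 - T4)) / 2 and round-trip delay δ = (T4 - T1) - (T3 - T2), where client A sends at T1, server B receives at T2 and replies at T3, and A receives the reply at T4. A small sketch (the function name and sample values are hypothetical):

```python
def ntp_estimate(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Return (offset, delay) of server B relative to client A.

    t1: A sends request (A's clock)    t2: B receives it (B's clock)
    t3: B sends reply (B's clock)      t4: A receives reply (A's clock)
    Assumes the request and the reply take equally long in transit.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # theta: how far B's clock is ahead of A's
    delay = (t4 - t1) - (t3 - t2)          # round-trip time minus B's processing time
    return offset, delay


# Hypothetical run: B is 5 units ahead, each transit takes 3 units, B needs 2 units.
offset, delay = ntp_estimate(t1=100, t2=108, t3=110, t4=108)
print(offset, delay)  # 5.0 6
```

If the two transit times differ, the error in θ is at most half the asymmetry, which is one reason low-delay samples are preferred.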

SLIDE 10

Network Time Protocol (2)

  • Time must never run backward
  • All nodes adjust (advance/slow down) their clocks locally
  • Estimate/measure the propagation delay
  • Estimate the offset and compute the accuracy
  • Take the best (minimum-delay) of eight measurements
  • Use multiple sources to improve accuracy
  • Hierarchical precision (strata)
  • ~ms (WAN), ~µs (LAN), ~ns (with hardware support, e.g., IEEE 1588)
  • Security?
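Two of the points above, "take the best of eight measurements" and "time must never run backward", can be sketched together. The sketch keeps the last eight (offset, delay) samples, trusts the one with the smallest delay, and applies the correction by slewing: the local clock runs slightly faster or slower until the offset is absorbed, instead of jumping. The function names, the sample data, and the 500 ppm slew cap are assumptions for illustration, not NTP's actual clock discipline.

```python
from collections import deque


def best_sample(samples):
    """Return the (offset, delay) pair with the smallest round-trip delay."""
    return min(samples, key=lambda s: s[1])


def slew(clock: float, pending: float, elapsed: float, max_rate: float = 500e-6):
    """Advance `clock` by `elapsed` seconds while absorbing at most
    max_rate * elapsed of the `pending` correction; return (clock, pending).
    As long as max_rate < 1 the clock keeps moving forward, just faster or slower."""
    limit = max_rate * elapsed
    step = max(-limit, min(limit, pending))      # clamp this interval's share of the correction
    return clock + elapsed + step, pending - step


window = deque(maxlen=8)                         # only the last eight samples are kept
for sample in [(0.012, 0.080), (0.004, 0.021), (0.009, 0.055)]:   # hypothetical (offset, delay)
    window.append(sample)
offset, _ = best_sample(window)                  # 0.004 s, taken from the 21 ms sample

clock, pending = 1000.0, offset                  # positive offset: the local clock is behind
for _ in range(10):                              # ten one-second intervals
    clock, pending = slew(clock, pending, 1.0)
```

With a 0.5 ms per second cap, the 4 ms correction is absorbed after eight seconds; a negative offset would make the clock run slow instead of stepping back.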

SLIDE 11

Network Time Protocol (3)

Figure: NTP precision levels (strata 0, 1, 2, 3).

SLIDE 12

Attacking time synchronization

SLIDE 13

The Berkeley Algorithm

a) The time daemon asks all the other machines for their clock values.
b) The machines answer.
c) The time daemon tells everyone how to adjust their clock.
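The daemon's computation in step c) can be sketched as follows. Assumptions: propagation delays are ignored (a real daemon would correct each reading by an estimate), all machines including the daemon take part in the average, and no outliers are dropped. A negative adjustment means the machine slows its clock down rather than setting it back. Times are in minutes, and the values and function name are hypothetical.

```python
def berkeley_adjustments(clocks: dict[str, float]) -> dict[str, float]:
    """Time-daemon step of a Berkeley-style round (sketch).

    `clocks` maps each machine (daemon included) to the clock value it
    reported. Returns how much each machine should adjust its clock.
    """
    average = sum(clocks.values()) / len(clocks)
    return {name: average - value for name, value in clocks.items()}


# Hypothetical round, times in minutes after midnight: 3:00, 3:25 and 2:50.
adjust = berkeley_adjustments({"daemon": 180, "a": 205, "b": 170})
print(adjust)  # {'daemon': 5.0, 'a': -20.0, 'b': 15.0}
```

Sending adjustments rather than an absolute time keeps the result valid even though the reply takes time to arrive.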

SLIDE 14

Clock Synchronization in Wireless (1)

  • E.g., sensor networks:
    • nodes are resource constrained
    • multihop routing is expensive
    • optimize algorithms for energy consumption
  • RBS: Reference Broadcast Synchronization
    • internal sync (no absolute clock)
    • only receivers synchronize (based on receipt of a reference message)
    • signal propagation time is ~constant (without multihop routing)
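In RBS, two receivers p and q record the local arrival time of the same M reference broadcasts and exchange those records; their relative offset is estimated as the average difference of the timestamps. A minimal sketch (the function name and data are hypothetical; it ignores clock drift between broadcasts, which a fuller treatment would fit with linear regression):

```python
from statistics import fmean


def rbs_offset(arrivals_p: list[float], arrivals_q: list[float]) -> float:
    """Estimated offset of p's clock relative to q's.

    arrivals_p[k] and arrivals_q[k] are the local times at which p and q
    received reference broadcast k. Because both received the *same*
    broadcast, the sender's send time and access delay cancel out.
    """
    if len(arrivals_p) != len(arrivals_q) or not arrivals_p:
        raise ValueError("need matching, non-empty timestamp lists")
    return fmean(tp - tq for tp, tq in zip(arrivals_p, arrivals_q))


print(rbs_offset([10.0, 20.0, 30.0], [7.0, 17.0, 27.0]))  # 3.0
```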

SLIDE 15

Clock Synchronization in Wireless (2)

The usual critical path in determining network delays.
The critical path in the case of RBS.

SLIDE 16

Lecture 6: Clocks and Agreement

  • Synchronization of physical clocks
  • Logical clocks and ordering
  • Distributed mutual exclusion
  • Election
  • Global state

SLIDE 17

Time vs. Order (logical time)

  • Synchronous system: algorithms are easier to model, but clock synchronization is needed
  • Asynchronous system: today's reality, but many design problems cannot be solved with deterministic algorithms
  • However, often no global clock and no clock synchronization are needed: it is sufficient to agree on the order of events (logical clocks); time is relative, anyway
  • Then, some events are ordered and some are "concurrent" (partial order)

SLIDE 18

Making clocks move forward

  • In many cases, wall-clock time does not matter. All we care about is relative time. (L. Lamport)
  • (This is not true in some real-time systems.)

Figure annotations: "This situation must be prevented" and "Fixed!"

SLIDE 19

Happened-before (1)

  • Logical clocks are defined using the happened-before relation, which orders events sequentially in a distributed system:
    • Events in one process are ordered (local clock)
    • A message send happens before the corresponding message receive
    • Happened-before is transitive
  • Events that are not ordered are concurrent (partial ordering)
  • Similar to physical causality, therefore also called potential causal ordering

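SLIDE 20

Happened-before (2)

  • Feynman (space-time) diagrams document causality
  • The relationship is transitive: a happened-before f
  • It imposes a partial order (not a total one):
    • $a \rightarrow b \rightarrow c \rightarrow d \rightarrow f$
    • $e \parallel a, b, c, d$, but $e \rightarrow f$

Figure: space-time diagram with processes p1, p2, p3, events a to f, and messages m1, m2; the horizontal axis is physical time.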

SLIDE 21

Logical clock implementation

  • Captures the happened-before ordering numerically: Lamport timestamps
  • Each node keeps a counter (LC):
    • 1. Increment LC before each event (computation, send, receive).
    • 2. On message send, piggyback LC.
    • 3. On message receive, set the local LC to max(local LC, received LC) (time can only move forward) and then apply rule 1 for the receipt (+1).
  • Total order by adding the process ID
  • $a \rightarrow b \Rightarrow L(a) < L(b)$, but the converse is not true!
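Rules 1 to 3 translate directly into a small class. A sketch (class and method names are hypothetical); `total_order_key` shows the process-ID tie-breaker that turns the partial order into a total one:

```python
class LamportClock:
    """Lamport logical clock for one process, following rules 1-3."""

    def __init__(self, pid: int):
        self.pid = pid
        self.lc = 0

    def event(self) -> int:
        """Rule 1: increment before every event (computation, send, receive)."""
        self.lc += 1
        return self.lc

    def send(self) -> int:
        """Rule 2: the send is an event; its timestamp is piggybacked on the message."""
        return self.event()

    def receive(self, msg_lc: int) -> int:
        """Rule 3: jump forward to the sender's clock if needed, then count the receipt."""
        self.lc = max(self.lc, msg_lc)
        return self.event()

    def total_order_key(self, lc: int) -> tuple[int, int]:
        """(timestamp, process ID) pairs give a total order: ties broken by ID."""
        return (lc, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
a = p1.event()        # 1
m = p1.send()         # 2, carried in the message
b = p2.event()        # 1 (concurrent with a, same value)
c = p2.receive(m)     # max(1, 2) + 1 = 3
print(a, m, b, c)     # 1 2 1 3
```

Events a and b get the same timestamp although neither happened before the other, which illustrates why the converse of the clock condition does not hold.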

SLIDE 22

Lamport clocks in middleware

The positioning of Lamport’s logical clocks in distributed systems.

SLIDE 23

Example: Inconsistent replication

  • Problem due to message delays and the lack of global time
  • If (non-commutative) updates arrive in different orders at the two sites, the databases will become inconsistent.
  • We could require all messages to arrive at all nodes in the same order (which may also be too strong; see causal ordering).
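A concrete instance of the problem, using an illustration that is an assumption here rather than part of the slide: an account holds $1000, one update deposits $100, another adds 1% interest. Applied in different orders at two replicas, these non-commutative updates leave the replicas inconsistent.

```python
def deposit(balance: int) -> int:
    return balance + 100


def add_interest(balance: int) -> int:
    return balance * 101 // 100  # add 1% interest; integer dollars keep the arithmetic exact


initial = 1000
site_a = add_interest(deposit(initial))   # the deposit arrives first at site A
site_b = deposit(add_interest(initial))   # the interest update arrives first at site B
print(site_a, site_b)                     # 1111 1110: the replicas diverge
```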
SLIDE 24

Synchronizing multicast messages

  • Assume data is replicated on several servers
  • Updates to data are performed by clients
  • An update request is multicast to all servers
  • Multicast messages arrive in different orders at different servers
  • How to ensure consistency of data at all servers?
    • Order message deliveries at servers…
    • Differentiate between receipt and delivery

SLIDE 25

Totally-Ordered Multicast

  • Clients multicast their updates with a (Lamport) timestamp (FIFO, reliable)
  • Upon receipt, the message is put into a local queue ordered by timestamp
  • The server acknowledges receipt of requests by multicast (for total ordering)
  • Eventually all processes will have the same copy of the local queue
  • A message that is at the head of the queue and has been acknowledged by all processes is delivered to the server process (the respective ACKs are deleted)
  • Updates may not be done in the "correct (?)" order, but they are done in the same order at all nodes
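The queueing and delivery rule above can be sketched per process. This is a simplification under stated assumptions: the (Lamport timestamp, sender ID) pairs are taken as given, reliable FIFO multicast of messages and ACKs is assumed rather than implemented, and the class name and payloads are hypothetical.

```python
import heapq


class TotalOrderQueue:
    """Per-process delivery queue for totally-ordered multicast (sketch).

    Messages are identified by their (timestamp, sender ID) pair, which is
    unique and totally ordered.
    """

    def __init__(self, group: set[int]):
        self.group = group
        self.queue: list[tuple[int, int, str]] = []   # min-heap of (ts, sender, payload)
        self.acks: dict[tuple[int, int], set[int]] = {}
        self.delivered: list[str] = []

    def on_message(self, ts: int, sender: int, payload: str) -> None:
        heapq.heappush(self.queue, (ts, sender, payload))
        self._deliver()

    def on_ack(self, ts: int, sender: int, acker: int) -> None:
        self.acks.setdefault((ts, sender), set()).add(acker)
        self._deliver()

    def _deliver(self) -> None:
        # Deliver from the head only, and only once every process has acknowledged it.
        while self.queue:
            ts, sender, payload = self.queue[0]
            if not self.group <= self.acks.get((ts, sender), set()):
                break
            heapq.heappop(self.queue)
            del self.acks[(ts, sender)]
            self.delivered.append(payload)


group = {1, 2}
m1, m2 = (1, 1, "deposit"), (1, 2, "interest")   # hypothetical (ts, sender, payload)
node1, node2 = TotalOrderQueue(group), TotalOrderQueue(group)
node1.on_message(*m1); node1.on_message(*m2)     # node 1 receives m1 first
node2.on_message(*m2); node2.on_message(*m1)     # node 2 receives m2 first
for node in (node1, node2):
    for ts, sender, _ in (m1, m2):
        for acker in group:
            node.on_ack(ts, sender, acker)
print(node1.delivered, node2.delivered)  # ['deposit', 'interest'] ['deposit', 'interest']
```

Both nodes deliver in timestamp order regardless of arrival order; which order that is ("correct" or not) is decided only by the timestamps and the ID tie-breaker.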

SLIDE 26

Vector Clocks - Principle

  • Logical clocks order related events; nothing can be said about unrelated events
  • Problem with Lamport timestamps: $L(a) < L(b) \not\Rightarrow a \rightarrow b$
  • Rather: $L(a) < L(b) \Rightarrow (a \rightarrow b) \lor (a \parallel b)$, which is too restrictive

Concurrent message transmission using logical clocks.

SLIDE 27

Vector Clocks - Example

Figure: vector timestamps on the space-time diagram of p1, p2, p3 (horizontal axis: physical time): a = (1,0,0), b = (2,0,0), c = (2,1,0), d = (2,2,0), e = (0,0,1), f = (2,2,2); messages m1 and m2.

SLIDE 28

Vector Clocks - Algorithm

  • 1. Initially, Vi[j] = 0 for all j
  • 2. Before Pi timestamps an event, Vi[i] := Vi[i] + 1
  • 3. Pi includes Vi in every message it sends
  • 4. When Pi receives a timestamp t in a message, it sets Vi[j] := max(Vi[j], t[j]) for all j (merge operation) and then applies rule 2 for the receipt.
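Rules 1 to 4 can be sketched as a small class (names hypothetical). The replay below assumes a scenario matching the timestamps listed in the vector clock example: events a and b at p1, c and d at p2, e and f at p3, with m1 sent at b and received at c, and m2 sent at d and received at f; this assignment is reconstructed from the figure.

```python
class VectorClock:
    """Vector clock of process P_i in a group of n processes (rules 1-4)."""

    def __init__(self, i: int, n: int):
        self.i = i
        self.v = [0] * n                      # rule 1: V_i[j] = 0 for all j

    def tick(self) -> tuple[int, ...]:
        self.v[self.i] += 1                   # rule 2: count own event
        return tuple(self.v)

    def send(self) -> tuple[int, ...]:
        return self.tick()                    # rule 3: this timestamp travels with the message

    def receive(self, t: tuple[int, ...]) -> tuple[int, ...]:
        self.v = [max(x, y) for x, y in zip(self.v, t)]   # rule 4: merge
        return self.tick()                    # ... then rule 2 for the receipt


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.tick()
b = p1.send()        # m1
c = p2.receive(b)
d = p2.send()        # m2
e = p3.tick()
f = p3.receive(d)
print(a, b, c, d, e, f)
# (1, 0, 0) (2, 0, 0) (2, 1, 0) (2, 2, 0) (0, 0, 1) (2, 2, 2)
```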

SLIDE 29

Vector Clocks – Usage

  • Vi[i] is the number of events Pi has timestamped
  • Vi[j] (j ≠ i) is the number of events that occurred at Pj on which Pi may causally depend
  • Comparison of vector clocks:
    • $V = V' \iff V[j] = V'[j]\ \forall j$
    • $V \le V' \iff V[j] \le V'[j]\ \forall j$
    • $V < V' \iff V \le V' \land V \ne V'$
  • Now, $a \rightarrow b \iff V(a) < V(b)$
  • Disadvantage: more storage and message payload; optimizations exist
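The comparison rules can be written as three small predicates. The timestamps below, a = (1,0,0), d = (2,2,0), e = (0,0,1), f = (2,2,2), are taken as assumed from the example figure; the function names are hypothetical.

```python
Timestamp = tuple[int, ...]


def leq(v: Timestamp, w: Timestamp) -> bool:
    """V <= V' iff V[j] <= V'[j] for every j."""
    return all(x <= y for x, y in zip(v, w))


def happened_before(v: Timestamp, w: Timestamp) -> bool:
    """V < V' iff V <= V' and V != V'; for event timestamps this means a -> b."""
    return leq(v, w) and v != w


def concurrent(v: Timestamp, w: Timestamp) -> bool:
    """Distinct events are concurrent iff neither timestamp is smaller."""
    return not happened_before(v, w) and not happened_before(w, v)


a, d, e, f = (1, 0, 0), (2, 2, 0), (0, 0, 1), (2, 2, 2)
print(happened_before(a, f), concurrent(e, d), happened_before(e, f))  # True True True
```

Unlike Lamport timestamps, the comparison now also detects concurrency: e and d are recognized as unrelated.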
SLIDE 30

Causal ordering using vector timestamps

Figure: causally ordered message delivery among P1, P2, P3 with vector timestamps, all clocks starting at (0,0,0).
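The delivery rule behind causally ordered multicast can be sketched as follows. The sketch assumes the common variant in which processes increment their own vector entry only when they multicast a message (not on every event) and merge on delivery: a message m from Pj with timestamp ts(m) is delivered at Pk only when ts(m)[j] = Vk[j] + 1 (it is the next message expected from Pj) and ts(m)[i] ≤ Vk[i] for all i ≠ j (Pk has already seen everything Pj had seen). Function names and messages are hypothetical.

```python
def can_deliver(ts: tuple[int, ...], sender: int, local: list[int]) -> bool:
    """Causal-delivery test for a message from `sender` at a process with clock `local`."""
    return ts[sender] == local[sender] + 1 and all(
        ts[i] <= local[i] for i in range(len(ts)) if i != sender
    )


def deliver_causally(local: list[int], inbox: list[tuple[tuple[int, ...], int, str]]) -> list[str]:
    """Repeatedly deliver any message whose causal predecessors have been delivered."""
    delivered, pending = [], list(inbox)
    progress = True
    while progress:
        progress = False
        for msg in list(pending):
            ts, sender, payload = msg
            if can_deliver(ts, sender, local):
                local[sender] = ts[sender]    # merge: other entries are already >= ts[i]
                delivered.append(payload)
                pending.remove(msg)
                progress = True
    return delivered


# P3 receives P2's reply (which causally depends on P1's post) before the post itself.
inbox = [((1, 1, 0), 1, "reply from P2"), ((1, 0, 0), 0, "post from P1")]
print(deliver_causally([0, 0, 0], inbox))  # ['post from P1', 'reply from P2']
```

The reply is held back, not dropped, until its causal predecessor has been delivered; this distinguishes receipt from delivery.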

SLIDE 31

Lecture 6: Clocks and Agreement

  • Synchronization of physical clocks
  • Logical clocks and ordering
  • Distributed mutual exclusion
  • Election
  • Global state

SLIDE 32

Mutual Exclusion

  • Coordinate activities, share resources: critical section (monitor, semaphore)
  • Locally assisted by the OS, which in turn is assisted by the hardware to guarantee atomic operations
  • Distributed mutex: based solely on message passing:
    • Safety: at most one process may execute in the critical section at a time
    • Liveness: requests to enter and exit the critical section eventually succeed (no deadlock, no starvation)
    • Ordering: happened-before (fairness)

SLIDE 33

A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
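The coordinator's behavior in a) to c) fits in a few lines. A sketch assuming a single, non-failing coordinator and a FIFO queue of waiting requests (the queueing policy and the names are assumptions):

```python
from collections import deque


class Coordinator:
    """Central lock manager: GRANT if the resource is free, otherwise queue the request."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()                # FIFO queue of blocked requesters

    def request(self, pid: int) -> bool:
        if self.holder is None:
            self.holder = pid
            return True                       # reply GRANT
        self.waiting.append(pid)
        return False                          # no reply: the requester stays blocked

    def release(self, pid: int):
        if pid != self.holder:
            raise ValueError(f"process {pid} does not hold the lock")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder                    # process that now gets the GRANT (or None)


c = Coordinator()
print(c.request(1), c.request(2), c.release(1))  # True False 2
```

Three messages per entry (request, grant, release) make this cheap, but the coordinator is a single point of failure and a bottleneck.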

SLIDE 34

A Distributed Algorithm

a) Two processes want to enter the same critical region at the same moment. Each multicasts its intention along with a timestamp.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK as well, so 2 can now enter the critical region.
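The decision each process makes on receiving a request, in the style of Ricart and Agrawala's algorithm, can be sketched as a pure function. Requests are compared as (Lamport timestamp, process ID) pairs, so ties are broken by ID; the state names, the function, and the timestamps 8 and 12 are hypothetical.

```python
from enum import Enum


class State(Enum):
    RELEASED = "released"   # not interested in the critical region
    WANTED = "wanted"       # has multicast its own request, waiting for OKs
    HELD = "held"           # currently inside the critical region


def on_request(state: State, own_request, incoming) -> str:
    """Reply "OK" now or "DEFER" (queue the request and send OK on exit).

    own_request and incoming are (timestamp, pid) pairs; own_request is
    only meaningful while state is WANTED.
    """
    if state is State.HELD:
        return "DEFER"
    if state is State.WANTED and own_request < incoming:
        return "DEFER"                        # our request is older: we go first
    return "OK"


# Process 0 requests at time 8, process 2 at time 12, process 1 is idle.
print(on_request(State.WANTED, (8, 0), (12, 2)))   # at process 0: DEFER
print(on_request(State.WANTED, (12, 2), (8, 0)))   # at process 2: OK
print(on_request(State.RELEASED, None, (8, 0)))    # at process 1: OK
```

A process enters once it has collected an OK from every other process, so process 0 enters first and process 2 follows after receiving the deferred OK.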

SLIDE 35

A Token Ring Algorithm

a) An unordered group of processes on a network.
b) A logical ring constructed in software: the token goes around the ring; a process that holds the token can enter the critical region.
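One circulation of the token can be simulated as follows. The sketch ignores token loss and crashed processes, which a real implementation must detect; the function name and the example ring are hypothetical.

```python
def token_ring(n: int, wants: set[int], start: int = 0) -> list[int]:
    """Simulate one full circulation of the token on a logical ring of n processes.

    Process k passes the token to (k + 1) % n. A process that wants the critical
    region enters it while holding the token, leaves, and then passes the token on.
    Returns the order in which processes entered the critical region.
    """
    pending = set(wants)                      # copy: do not mutate the caller's set
    order = []
    holder = start
    for _ in range(n):
        if holder in pending:
            order.append(holder)             # enter and leave the critical region
            pending.discard(holder)
        holder = (holder + 1) % n             # pass the token to the successor
    return order


print(token_ring(5, {1, 3}, start=2))   # [3, 1]
```

Every waiting process gets its turn within one circulation, so there is no starvation, but the token keeps circulating even when nobody wants to enter.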

SLIDE 36

A decentralized probabilistic Algorithm

  • Each resource is replicated n times
  • Access requires a majority of m > n/2 votes
  • Application in DHTs
  • A coordinator may reset (forget its vote)
  • What if k = 2m - n coordinators fail (i.e., (n - m) + k = m)?
    • Voting correctness is violated, but the probability is extremely low (e.g., $10^{-40}$)
  • Scales well (non-deterministic)
  • BUT: bad utilization if many nodes compete (starvation)
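The "extremely low" probability can be estimated under an assumed model: each coordinator independently resets within the time window of interest with probability p (for example p = Δt/T for a window Δt and a mean time T between resets). Correctness is violated if at least k = 2m - n of the m coordinators that granted a request reset, so the probability is the binomial tail $\sum_{k=2m-n}^{m} \binom{m}{k} p^k (1-p)^{m-k}$. A sketch with hypothetical parameters:

```python
from math import comb


def violation_probability(n: int, m: int, p: float) -> float:
    """Probability that enough of the m granting coordinators reset (forget their
    vote) to let a second process also collect m votes out of n replicas."""
    if not n / 2 < m <= n:
        raise ValueError("need a majority: n/2 < m <= n")
    k_min = 2 * m - n
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(k_min, m + 1))


# Hypothetical setting: 32 replicas, m = 0.75 * n = 24 votes,
# resets within a 10 s window with a mean time of 3 h between resets.
print(violation_probability(32, 24, 10 / (3 * 3600)))   # well below 1e-40
```

Raising m above the bare majority makes k = 2m - n larger, which is what drives the probability down so sharply.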

SLIDE 37

A Comparison of the Four Algorithms

A comparison of four mutual exclusion algorithms.

SLIDE 38

Lecture 6: Clocks and Agreement

  • Synchronization of physical clocks
  • Logical clocks and ordering
  • Distributed mutual exclusion
  • Election
  • Global state

SLIDE 39

Election algorithms in distributed systems

  • Distributed agreement algorithms attempt to establish agreement among a set of processes about the value of a piece of information (e.g., what time is it?)
  • Election algorithms are one group of agreement algorithms
  • The problem is for a set of processes (participants) to elect a leader (e.g., who will be our coordinator?)
  • Useful for many algorithms that require a (temporarily central) coordinator

SLIDE 40

The Bully Algorithm (1)

a) Process 4 holds an election.
b) Processes 5 and 6 respond, telling 4 to stop.
c) Now 5 and 6 each hold an election.

SLIDE 41

The Bully Algorithm (2)

d) Process 6 tells 5 to stop.
e) Process 6 wins and tells everyone by sending a COORDINATOR message.
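The message exchange in a) to e) can be replayed with a small simulation. It assumes reliable, timely messages, so "no answer within the timeout" reliably means "crashed"; the function name, the message tuples, and the scenario (processes 0 to 7, coordinator 7 crashed, process 4 starts) are illustrative assumptions.

```python
def bully_election(initiator: int, processes: list[int], alive: set[int]):
    """Simulate the bully algorithm; return (coordinator, message log)."""
    log, queue, started, winner = [], [initiator], set(), None
    while queue:
        p = queue.pop(0)
        if p in started:
            continue                          # each process holds at most one election
        started.add(p)
        higher = [q for q in processes if q > p]
        log += [("ELECTION", p, q) for q in higher]
        answering = [q for q in higher if q in alive]
        log += [("OK", q, p) for q in answering]
        if not answering:
            winner = p                        # nobody bigger is alive: p takes over
        queue += answering                    # every responder holds its own election
    log += [("COORDINATOR", winner, q) for q in sorted(alive) if q != winner]
    return winner, log


winner, log = bully_election(4, list(range(8)), set(range(7)))
print(winner)                                 # 6
```

Only the highest live process receives no OK, so it is always the one that announces itself.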

SLIDE 42

A Ring Algorithm

  • Election algorithm using a ring.
  • Organize the processes logically along a "ring".

SLIDE 43

Another Ring Algorithm

Figure: an election in progress on a ring of processes.

Note: The election was started by process 17. The highest process identifier encountered so far is 24. Participant processes are shown darkened.
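The rule described in the note can be sketched for a single initiator. Each process forwards the election message to its neighbor, replacing the identifier with its own if its own is larger and it is not yet a participant, and then marks itself as a participant; when a process receives its own identifier back, it is the highest and becomes coordinator (it would then circulate an ELECTED message, omitted here). This follows the Chang and Roberts style of ring election; unique identifiers are assumed, and the ring contents below are hypothetical.

```python
def ring_election(ring: list[int], start: int) -> int:
    """Single-initiator ring election; ring[k] is the ID of the k-th process,
    and each process sends to its neighbor ring[(k + 1) % len(ring)]."""
    n = len(ring)
    participant = [False] * n
    participant[start] = True
    msg = ring[start]                         # initiator sends its own ID
    k = (start + 1) % n
    while msg != ring[k]:                     # stop when an ID returns to its owner
        if ring[k] > msg and not participant[k]:
            msg = ring[k]                     # substitute the larger own ID
        participant[k] = True                 # forward the (possibly updated) message
        k = (k + 1) % n
    return ring[k]                            # this process becomes coordinator


print(ring_election([3, 17, 4, 24, 9, 1, 15, 28], start=1))   # 28
```

With one initiator the message needs at most two trips around the ring before the winner sees its own identifier.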

SLIDE 44

Elections in Ad hoc networks

  • The best leader is elected
  • An overlay is constructed (hierarchy)
  • Resource capacities are taken into account (e.g., battery lifetime)
  • See example

SLIDE 45

Elections in Wireless Environments (1)

Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)–(e) The build-tree phase.
SLIDE 46

Elections in Wireless Environments (2)

Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)–(e) The build-tree phase.
SLIDE 47

Elections in Wireless Environments (3)

(e) The build-tree phase. (f) Reporting of best node to source.
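The two phases can be sketched on an adjacency list. Assumptions (not from the slides): links are symmetric, all link delays are equal so breadth-first order approximates which ELECTION message arrives first, and each node's suitability is a single number (e.g., remaining battery); the graph, capacities, and function name are hypothetical.

```python
from collections import deque


def wireless_election(adj: dict[str, list[str]], capacity: dict[str, float], source: str) -> str:
    """Tree-based election sketch: build a spanning tree rooted at `source`,
    then report the best node upward through the tree."""
    # Build-tree phase: the first ELECTION message a node receives fixes its parent.
    parent, order, queue = {source: None}, [source], deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)
    # Reporting phase: children report before parents (reverse BFS order);
    # each node forwards the best candidate seen in its subtree.
    best = {u: u for u in order}
    for u in reversed(order):
        p = parent[u]
        if p is not None and capacity[best[u]] > capacity[best[p]]:
            best[p] = best[u]
    return best[source]


adj = {"a": ["b", "j"], "b": ["a", "c"], "c": ["b"], "j": ["a"]}
print(wireless_election(adj, {"a": 4, "b": 2, "c": 8, "j": 5}, "a"))   # c
```

The source learns the best node after one flood down and one report up; it would then broadcast the result.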

SLIDE 48

Elections in large-scale P2P Systems

  • Requirements for superpeer selection:
    • 1. Normal nodes should have low-latency access to superpeers.
    • 2. Superpeers should be evenly distributed across the overlay network.
    • 3. There should be a predefined portion of superpeers relative to the total number of nodes in the overlay network.
    • 4. Each superpeer should not need to serve more than a fixed number of normal nodes.

SLIDE 49

Lecture 6: Clocks and Agreement

  • Synchronization of physical clocks
  • Logical clocks and ordering
  • Distributed mutual exclusion
  • Election
  • Global state

SLIDE 50

Global state predicates (1)

Figure: three situations involving processes p1 and p2: (a) Garbage collection (object, object reference, and a message), (b) Deadlock (wait-for relations between p1 and p2), (c) Termination (both processes passive, an "activate" message). In each case, the state of the communication channel matters!

SLIDE 51

Global state: Consistent cut

a) A consistent cut b) An inconsistent cut (effect without cause)
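Whether a cut is consistent can be checked mechanically: a cut is consistent if, for every message whose receive event lies inside the cut, the corresponding send event lies inside the cut too (a message sent but not yet received is fine; it is simply in transit). A sketch with a hypothetical representation: the cut gives, per process, how many of its events are included, and each message is a pair of (process, event index) positions counted from 1.

```python
Event = tuple[str, int]   # (process, 1-based index of the event on that process)


def is_consistent(cut: dict[str, int], messages: list[tuple[Event, Event]]) -> bool:
    """True iff no message is received inside the cut but sent outside it."""
    def inside(event: Event) -> bool:
        process, index = event
        return index <= cut.get(process, 0)

    return all(inside(send) for send, recv in messages if inside(recv))


msgs = [(("p1", 2), ("p2", 1))]   # p1's 2nd event sends a message received as p2's 1st event
print(is_consistent({"p1": 2, "p2": 1}, msgs))   # True: cause and effect both included
print(is_consistent({"p1": 2, "p2": 0}, msgs))   # True: message still in transit
print(is_consistent({"p1": 1, "p2": 1}, msgs))   # False: effect without cause
```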

SLIDE 52

Global state predicates (2)

  • Stability
    • Once the system enters a state S0 in which the predicate is true, it remains true in all future states reachable from S0
  • Safety
    • α is an undesirable predicate of the system's global state (e.g., being deadlocked)
    • Safety(α) at S0: α = false for all states reachable from S0 (i.e., the "bad" α will never happen)
  • Liveness
    • β is a desirable property (e.g., reaching termination)
    • Liveness(β) at S0: for any linearization starting from S0, β = true for some state SL reachable from S0 (i.e., the "good" β will eventually happen)

SLIDE 53

‘Snapshot’ algorithm (1)

a) Organization of a process and channels for a distributed snapshot

Chandy and Lamport (1985): Record a set of process and channel states such that the recorded global state is consistent, even though the combination of recorded states may never have actually occurred at the same time.

SLIDE 54

‘Snapshot’ algorithm (3)

b) Process Q receives a marker for the first time (from another channel) and records its local state.
c) Q records all incoming messages.
d) Q receives a marker for its incoming channel and finishes recording the state of that incoming channel.
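The marker rules behind steps b) to d) can be sketched for a single process. Assumptions: FIFO channels (which the algorithm requires), channel names as plain strings, and a caller that actually sends the markers listed in `markers_to_send` and delivers application messages; the class and method names are hypothetical.

```python
import copy


class SnapshotProcess:
    """One process's side of the Chandy-Lamport marker rules (sketch)."""

    def __init__(self, incoming: list[str], outgoing: list[str], state):
        self.incoming, self.outgoing, self.state = incoming, outgoing, state
        self.recorded = False
        self.recorded_state = None
        self.channel_state: dict[str, list] = {}
        self.recording: set[str] = set()       # incoming channels still being recorded
        self.markers_to_send: list[str] = []

    def _record(self, marker_channel):
        self.recorded = True
        self.recorded_state = copy.deepcopy(self.state)   # record local state
        self.markers_to_send = list(self.outgoing)        # marker on every outgoing channel
        for c in self.incoming:
            self.channel_state[c] = []
            if c != marker_channel:
                self.recording.add(c)         # the marker's own channel is recorded as empty

    def initiate(self):
        self._record(marker_channel=None)     # initiator: record without receiving a marker

    def on_marker(self, channel: str):
        if not self.recorded:
            self._record(channel)             # first marker, as in step b)
        else:
            self.recording.discard(channel)   # later marker: channel state complete, step d)

    def on_message(self, channel: str, msg):
        if channel in self.recording:
            self.channel_state[channel].append(msg)   # in transit at snapshot time, step c)

    def done(self) -> bool:
        return self.recorded and not self.recording


q = SnapshotProcess(["c1", "c2"], ["c3"], {"widgets": 5})
q.on_marker("c1")          # first marker arrives on c1
q.on_message("c2", "m")    # still recording c2: m was in transit
q.on_marker("c2")          # marker on c2 closes its recording
print(q.done(), q.channel_state, q.markers_to_send)
# True {'c1': [], 'c2': ['m']} ['c3']
```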

SLIDE 55

Example (1)

Initial situation: p1 has account $1000 and no widgets; p2 has account $50 and 2000 widgets. Process states are written <account, widgets>; channel c2 carries messages from p1 to p2, channel c1 from p2 to p1.

  • 1. Global state S0: p1 <$1000, 0>; c2 (empty); p2 <$50, 2000>; c1 (empty)
  • 2. Global state S1: p1 <$900, 0>; c2 (Order 10, $100), M; p2 <$50, 2000>; c1 (empty)
  • 3. Global state S2: p1 <$900, 0>; c2 (Order 10, $100), M; p2 <$50, 1995>; c1 (five widgets)
  • 4. Global state S3: p1 <$900, 5>; c2 (Order 10, $100); p2 <$50, 1995>; c1 (empty)

(M = marker message)

Final recorded state: p1 <$1000, 0>; p2 <$50, 1995>; c1 <five widgets>; c2 <>

Process p2 has already received an order for five widgets before S0.

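SLIDE 56

Reachability between states

Figure: the actual execution $e_0, e_1, \ldots$ leads from $S_{init}$ to $S_{final}$; recording begins and ends during this execution. The recorded state $S_{snap}$ is reachable from $S_{init}$ by the pre-snap events $e'_0, e'_1, \ldots, e'_{R-1}$, and $S_{final}$ is reachable from $S_{snap}$ by the post-snap events $e'_R, e'_{R+1}, \ldots$

  • If a stable predicate is true in the snapshot, then it is also true in (any) final state.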

SLIDE 57

Summary

  • Distributed processes need to synchronize their actions to ensure cooperation or fair competition
  • The lack of a global clock makes synchronization difficult
  • Often, ordering is enough: logical clocks and vector timestamps reduce the cost of synchronization
  • Distributed agreement algorithms are required when processes need to coordinate their actions: mutex, election, global state, ...