MC714: Distributed Systems, Prof. Lucas Wanner, Instituto de Computação, Unicamp



slide-1
SLIDE 1

MC714: Distributed Systems

  • Prof. Lucas Wanner

Instituto de Computação, Unicamp

Coordination

  • Lecture 9: Clock synchronization
  • Lecture 10: Logical and vector clocks
  • Lecture 11: Mutual exclusion
  • Lecture 12: Leader election

slide-2
SLIDE 2

Clock Synchronization

  • Physical clocks
  • Logical clocks
  • Vector clocks

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 47

slide-3
SLIDE 3

Time and Frequency

Types of information
  • Time of day
  • Time interval
  • Frequency

Question
How is time defined?

3 / 47

slide-4
SLIDE 4
slide-5
SLIDE 5

Time and Frequency

Definition
Second:
  • Historical definition: 1/31,556,925.9747 of the tropical year for 1900.
  • Current (SI) definition: the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium-133 atom.

Frequency: events per second (Hz).

Generation
Oscillators can create signals that alternate periodically at a certain frequency:
  • Inductor-capacitor (LC circuit)
  • Resistor-capacitor (RC circuit)
  • Crystal oscillator

5 / 47

slide-6
SLIDE 6

Crystal Oscillators

Principles
  • A crystal resonator strains (expands or contracts) when a voltage is applied.
  • When the voltage is reversed, the strain is reversed.
  • The voltage signal is taken from the resonator, amplified, and fed back to it.
  • The rate of expansion and contraction is determined by the size and cut of the crystal.

Sources of inaccuracy
Crystal oscillators may deviate from their nominal frequency:
  • Cut (manufacturing)
  • Environmental: temperature, pressure, vibration

Accuracy is measured in PPM (parts per million).
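As a rough illustration of what a PPM rating means in practice, the sketch below converts a frequency error into accumulated time error. The 20 PPM figure and the function name `drift_seconds` are assumptions chosen for the example, not values from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    """Worst-case time error of a clock whose frequency is off by `ppm` parts per million."""
    return ppm * 1e-6 * elapsed_seconds


# A hypothetical 20 PPM crystal can gain or lose roughly 1.7 s per day.
print(drift_seconds(20, SECONDS_PER_DAY))
```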

6 / 47

slide-7
SLIDE 7

Clock Inaccuracies

Observation Two unsynchronized clocks will drift apart.

7 / 47

slide-8
SLIDE 8

Physical clocks

Problem
Sometimes we simply need the exact time, not just an ordering.

Solution
Coordinated Universal Time (UTC):
  • Based on the number of transitions per second of the cesium-133 atom (pretty accurate).
  • At present, the real time is taken as the average of some 50 cesium clocks around the world.
  • Introduces a leap second from time to time to compensate for days getting longer.

Note
UTC is broadcast through shortwave radio and satellite. Satellites can give an accuracy of about ±0.5 ms.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 8 / 47

slide-9
SLIDE 9

Physical clocks

Problem
Suppose we have a distributed system with a UTC receiver somewhere in it ⇒ we still have to distribute its time to each machine.

Basic principle
  • Every machine has a timer that generates an interrupt $H$ times per second.
  • There is a clock in machine $p$ that ticks on each timer interrupt. Denote the value of that clock by $C_p(t)$, where $t$ is UTC time.
  • Ideally, we have that for each machine $p$, $C_p(t) = t$, or, in other words, $dC/dt = 1$.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 9 / 47

slide-10
SLIDE 10

Physical clocks

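[Figure: clock time $C$ plotted against UTC $t$. A fast clock has $dC/dt > 1$, a perfect clock has $dC/dt = 1$, and a slow clock has $dC/dt < 1$.]

In practice: $1 - \rho \le \frac{dC}{dt} \le 1 + \rho$, where $\rho$ is the maximum drift rate.

Goal
Never let two clocks in any system differ by more than $\delta$ time units. Two clocks drifting in opposite directions can diverge by up to $2\rho$ per second ⇒ synchronize at least every $\delta/(2\rho)$ seconds.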
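The bound on the resynchronization interval can be checked numerically. This is a minimal sketch; the drift rate of 10 PPM and the 1 ms tolerance are hypothetical values picked for the example.

```python
def max_sync_interval(delta: float, rho: float) -> float:
    """Longest interval (seconds) between synchronizations that keeps two clocks
    within `delta` seconds of each other, given a maximum drift rate `rho`."""
    return delta / (2 * rho)


# Hypothetical numbers: clocks drifting at most 10 PPM, tolerated skew of 1 ms.
print(max_sync_interval(delta=1e-3, rho=10e-6))  # about 50 seconds
```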

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 10 / 47

slide-11
SLIDE 11

Global positioning system

Basic idea You can get an accurate account of time as a side-effect of GPS.

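[Figure: computing a position in two-dimensional space (axes: x, height) from two satellites, one at (-6,6) with r = 10 and one at (14,14) with r = 16. The two circles intersect in two points, one of which is to be ignored.]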

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 11 / 47

slide-12
SLIDE 12

Global positioning system

Problem
Assuming that the clocks of the satellites are accurate and synchronized:
  • It takes a while before a signal reaches the receiver.
  • The receiver's clock is definitely out of sync with the satellite.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 12 / 47

slide-13
SLIDE 13

Global positioning system

Principal operation
  • $\Delta_r$: unknown deviation of the receiver's clock.
  • $x_r, y_r, z_r$: unknown coordinates of the receiver.
  • $T_i$: timestamp on a message from satellite $i$.
  • $\Delta_i = (T_{now} - T_i) + \Delta_r$: measured delay of the message sent by satellite $i$.
  • Measured distance to satellite $i$: $c \times \Delta_i$ ($c$ is the speed of light).
  • Real distance: $d_i = c\Delta_i - c\Delta_r = \sqrt{(x_i - x_r)^2 + (y_i - y_r)^2 + (z_i - z_r)^2}$

Observation
4 satellites ⇒ 4 equations in 4 unknowns (with $\Delta_r$ as one of them).

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 13 / 47

slide-14
SLIDE 14

Clock synchronization principles

Principle I
Every machine asks a time server for the accurate time at least once every $\delta/(2\rho)$ seconds (Network Time Protocol).

Note
Okay, but you need an accurate measure of round-trip delay, including interrupt handling and processing incoming messages.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 14 / 47

slide-15
SLIDE 15

Clock synchronization principles

Principle II
Let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time.

Note
Okay, you'll probably get every machine in sync. You don't even need to propagate UTC time.

Fundamental
You'll have to take into account that setting the time back is never allowed ⇒ smooth adjustments.
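The averaging step of Principle II might look like the sketch below. It assumes the server already holds one clock reading per machine and ignores network delay; the machine names and readings (in minutes) are hypothetical.

```python
def average_adjustments(readings: dict[str, float]) -> dict[str, float]:
    """For each machine, return the amount to add to its clock to reach the average.

    A negative value means the machine is ahead; it should slow its clock down
    gradually rather than set the time back.
    """
    average = sum(readings.values()) / len(readings)
    return {name: average - value for name, value in readings.items()}


# Hypothetical readings in minutes: 3:00, 3:25, and 2:50 average to 3:05.
print(average_adjustments({"server": 180, "a": 205, "b": 170}))
```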

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 15 / 47

slide-16
SLIDE 16

Cristian’s algorithm

Assuming we have
  • a process $P$
  • a server $S$ with UTC

Algorithm
1. $P$ requests the time from $S$ and measures $T_1$ locally.
2. After receiving the request from $P$, $S$ prepares a response and appends the time $T_s$ from its own clock.
3. $P$ receives the response, measures $T_2$ locally, and then sets its time to $T_{new} = T_s + (T_2 - T_1)/2$.

Problems
  • Single point of failure / bottleneck
  • Impostor or faulty server
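The three steps can be sketched as a single function. This is an illustration only: `request_server_time` stands in for the network round trip to S, and both it and `local_clock` are hypothetical callables, not part of any real time-service API.

```python
import time
from typing import Callable


def cristian_sync(request_server_time: Callable[[], float],
                  local_clock: Callable[[], float] = time.monotonic) -> float:
    """Return T_new = T_s + (T_2 - T_1) / 2 for one request/response exchange."""
    t1 = local_clock()            # step 1: local time when the request is sent
    ts = request_server_time()    # step 2: server appends its own time T_s
    t2 = local_clock()            # step 3: local time when the response arrives
    return ts + (t2 - t1) / 2     # assumes the delay is symmetric in both directions


# Simulated exchange: the round trip takes 0.4 s and the server reports 50.0.
ticks = iter([10.0, 10.4])
print(cristian_sync(lambda: 50.0, lambda: next(ticks)))  # about 50.2
```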

16 / 47

slide-17
SLIDE 17

NTP – Network Time Protocol

17 / 47

slide-18
SLIDE 18

NTP – Network Time Protocol

Algorithm
1. A reads local timestamp $t_0$ and sends it to B.
2. B reads local timestamp $t_1$ at reception.
3. B reads local timestamp $t_2$ at transmission and sends $t_1$ and $t_2$ to A.
4. A reads local timestamp $t_3$ at reception and computes round-trip delay $\delta$ and offset $\theta$.

18 / 47

slide-19
SLIDE 19

NTP – Network Time Protocol

Round-trip delay: $\delta = (t_3 - t_0) - (t_2 - t_1)$

Offset: $\theta = \frac{(t_1 - t_0) + (t_2 - t_3)}{2}$
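A worked example may help check the two formulas. The sketch below assumes a hypothetical exchange in which B's clock is 2.0 s ahead of A's, each network leg takes 0.1 s, and B spends 0.05 s processing; none of these numbers come from the slides.

```python
def ntp_delay_and_offset(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    """Return (round-trip delay, offset) from the four timestamps.

    t0 and t3 are read on A's clock; t1 and t2 are read on B's clock.
    """
    delay = (t3 - t0) - (t2 - t1)
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return delay, offset


# A sends at 10.00 (A's clock); B receives at 12.10 and replies at 12.15
# (B's clock); A receives the reply at 10.25 (A's clock).
delay, offset = ntp_delay_and_offset(10.00, 12.10, 12.15, 10.25)
print(delay, offset)  # about 0.2 and 2.0
```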

19 / 47

slide-20
SLIDE 20

The Happened-before relationship

Problem
We first need to introduce a notion of ordering before we can order anything.

The happened-before relation
  • If a and b are two events in the same process, and a comes before b, then a → b.
  • If a is the sending of a message, and b is the receipt of that message, then a → b.
  • If a → b and b → c, then a → c.

Note
This introduces a partial ordering of events in a system with concurrently operating processes.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 20 / 47

slide-21
SLIDE 21

Logical clocks

Problem
How do we maintain a global view on the system's behavior that is consistent with the happened-before relation?

Solution
Attach a timestamp C(e) to each event e, satisfying the following properties:
  • P1: If a and b are two events in the same process, and a → b, then we demand that C(a) < C(b).
  • P2: If a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b).

Problem
How to attach a timestamp to an event when there's no global clock ⇒ maintain a consistent set of logical clocks, one per process.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 21 / 47

slide-22
SLIDE 22

Logical clocks

Solution
Each process $P_i$ maintains a local counter $C_i$ and adjusts this counter according to the following rules:
1. For any two successive events that take place within $P_i$, $C_i$ is incremented by 1.
2. Each time a message $m$ is sent by process $P_i$, the message receives a timestamp $ts(m) = C_i$.
3. Whenever a message $m$ is received by a process $P_j$, $P_j$ adjusts its local counter $C_j$ to $\max\{C_j, ts(m)\}$, then executes step 1 before passing $m$ to the application.

Notes
  • Property P1 is satisfied by (1); property P2 by (2) and (3).
  • It can still occur that two events happen at the same time. Avoid this by breaking ties through process IDs.
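The three rules can be sketched as a small class. The class and method names are assumptions made for this illustration; message transport is left out, so timestamps are passed around by hand.

```python
class LamportClock:
    """Minimal logical clock for one process."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        """Rule 1: increment the counter for each local event."""
        self.counter += 1
        return self.counter

    def send(self) -> int:
        """Rule 2: sending is an event; the message carries the updated counter."""
        return self.tick()

    def receive(self, ts: int) -> int:
        """Rule 3: move to max(counter, ts), then apply rule 1."""
        self.counter = max(self.counter, ts)
        return self.tick()


p1, p2 = LamportClock(), LamportClock()
p1.tick()
p1.tick()
ts = p1.send()          # p1 is at 3 when it sends
print(p2.receive(ts))   # p2 jumps from 0 to 4
```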

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 22 / 47

slide-23
SLIDE 23

Logical clocks – example

[Figure: (a) Three processes P1, P2, and P3, whose clocks tick in steps of 6, 8, and 10 respectively, exchange messages m1 to m4. (b) Lamport's algorithm corrects the clocks: P2 adjusts its clock (56 becomes 61) when m3 arrives, and P1 adjusts its clock (54 becomes 70) when m4 arrives.]

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 23 / 47

slide-24
SLIDE 24

Logical clocks – example

Note Adjustments take place in the middleware layer

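[Figure: application layer, middleware layer, network layer. Sending: the application sends a message, the middleware adjusts the local clock and timestamps the message, then the middleware sends the message. Receiving: the message is received, the middleware adjusts the local clock, then the message is delivered to the application.]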

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 24 / 47

slide-25
SLIDE 25

Example: Totally ordered multicast

Problem
We sometimes need to guarantee that concurrent updates on a replicated database are seen in the same order everywhere:
  • P1 adds $100 to an account (initial value: $1000)
  • P2 increments the account by 1%
  • There are two replicas

[Figure: replicated database. At one replica, update 1 is performed before update 2; at the other, update 2 is performed before update 1.]

Result
In the absence of proper synchronization: replica #1 ← $1111, while replica #2 ← $1110.
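The two results follow directly from applying the same updates in different orders, as this short check shows. The function names are made up for the example.

```python
def deposit_100(balance: int) -> int:
    """Update 1: add $100."""
    return balance + 100


def add_one_percent(balance: int) -> int:
    """Update 2: add 1% interest (integer arithmetic, exact for these amounts)."""
    return balance * 101 // 100


replica_1 = add_one_percent(deposit_100(1000))  # update 1, then update 2
replica_2 = deposit_100(add_one_percent(1000))  # update 2, then update 1
print(replica_1, replica_2)  # 1111 1110
```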

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 25 / 47

slide-26
SLIDE 26

Example: Totally ordered multicast

Solution
  • Process $P_i$ sends timestamped message $msg_i$ to all others. The message itself is put in a local queue $queue_i$.
  • Any incoming message at $P_j$ is queued in $queue_j$, according to its timestamp, and acknowledged to every other process.

$P_j$ passes a message $msg_i$ to its application if:
  (1) $msg_i$ is at the head of $queue_j$
  (2) for each process $P_k$, there is a message $msg_k$ in $queue_j$ with a larger timestamp.

Note
We are assuming that communication is reliable and FIFO ordered.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 26 / 47

slide-27
SLIDE 27

Vector clocks

Observation
Lamport's clocks do not guarantee that if C(a) < C(b), then a causally preceded b.

[Figure: processes P1, P2, and P3 with Lamport clocks exchanging messages m1 to m5.]

Observation
Event a: m1 is received at T = 16. Event b: m2 is sent at T = 20.

Note
We cannot conclude that a causally precedes b.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 27 / 47

slide-28
SLIDE 28

Vector clocks

Solution
  • Each process $P_i$ has an array $VC_i[1..n]$, where $VC_i[j]$ denotes the number of events that process $P_i$ knows have taken place at process $P_j$.
  • When $P_i$ sends a message $m$, it adds 1 to $VC_i[i]$, and sends $VC_i$ along with $m$ as vector timestamp $ts(m)$. Result: upon arrival, the recipient knows $P_i$'s timestamp.
  • When a process $P_j$ delivers a message $m$ that it received from $P_i$ with vector timestamp $ts(m)$, it
    (1) updates each $VC_j[k]$ to $\max\{VC_j[k], ts(m)[k]\}$
    (2) increments $VC_j[j]$ by 1.

Question
What does $VC_i[j] = k$ mean in terms of messages sent and received?
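The send and deliver rules can be sketched as follows. The class name and the use of zero-based process indices (instead of 1..n) are assumptions for this illustration; no real messaging is involved.

```python
class VectorClock:
    """Vector clock for process `index` in a system of `n` processes."""

    def __init__(self, n: int, index: int) -> None:
        self.vc = [0] * n
        self.index = index

    def send(self) -> list[int]:
        """Count the send event and return a copy of the vector as the timestamp."""
        self.vc[self.index] += 1
        return list(self.vc)

    def deliver(self, ts: list[int]) -> None:
        """Take the component-wise maximum, then count the delivery event."""
        self.vc = [max(own, other) for own, other in zip(self.vc, ts)]
        self.vc[self.index] += 1


p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
p1.deliver(p0.send())   # p0 sends [1, 0, 0]; p1 becomes [1, 1, 0]
p0.deliver(p1.send())   # p1 sends [1, 2, 0]; p0 becomes [2, 2, 0]
print(p0.vc, p1.vc)
```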

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 28 / 47

slide-29
SLIDE 29

Vector Clocks

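Comparing events with vector clocks
  • $VC_1 \le VC_2$ iff $VC_1[i] \le VC_2[i]$ for all $i = 1, \dots, N$
  • $VC_1 < VC_2$ iff $VC_1 \le VC_2$ and there exists $j$ such that $1 \le j \le N$ and $VC_1[j] < VC_2[j]$
  • $VC_1$ and $VC_2$ are causally related if $VC_1 < VC_2$
  • $VC_1 \mathrel{|||} VC_2$ iff $VC_1 \nless VC_2$ and $VC_2 \nless VC_1$
  • $VC_1$ and $VC_2$ are concurrent if $VC_1 \mathrel{|||} VC_2$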
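These comparisons translate almost directly into code. The sketch below uses hypothetical function names and treats identical vectors as a separate "equal" case, which is a choice made for the example.

```python
def leq(a: list[int], b: list[int]) -> bool:
    """a <= b: every component of a is at most the matching component of b."""
    return all(x <= y for x, y in zip(a, b))


def less(a: list[int], b: list[int]) -> bool:
    """a < b: a <= b and at least one component is strictly smaller."""
    return leq(a, b) and any(x < y for x, y in zip(a, b))


def relation(a: list[int], b: list[int]) -> str:
    if less(a, b):
        return "a -> b"
    if less(b, a):
        return "b -> a"
    if a == b:
        return "equal"        # identical timestamps
    return "concurrent"       # neither a < b nor b < a


print(relation([1, 0, 0], [1, 1, 0]))  # a -> b
print(relation([2, 0, 0], [0, 1, 0]))  # concurrent
```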

29 / 47

slide-30
SLIDE 30

Causally ordered multicasting

Observation
We can now ensure that a message is delivered only if all causally preceding messages have already been delivered.

Adjustment
$P_i$ increments $VC_i[i]$ only when sending a message, and $P_j$ "adjusts" $VC_j$ when receiving a message (i.e., effectively does not change $VC_j[j]$).

$P_j$ postpones delivery of $m$ until:
  • $ts(m)[i] = VC_j[i] + 1$
  • $ts(m)[k] \le VC_j[k]$ for $k \ne i$
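The delivery test can be written as a small predicate: the message must be the next one expected from its sender, and the receiver must already have seen everything the sender had seen from every other process. The function name and zero-based indices are assumptions for this sketch.

```python
def can_deliver(ts: list[int], sender: int, vc: list[int]) -> bool:
    """True if a message with timestamp `ts` from `sender` may be delivered
    at a process whose current vector clock is `vc`."""
    if ts[sender] != vc[sender] + 1:
        return False  # not the next message expected from the sender
    return all(ts[k] <= vc[k] for k in range(len(vc)) if k != sender)


# A process with clock (0,0,0) receives m* = (1,1,0) from P1 before m = (1,0,0) from P0.
print(can_deliver([1, 1, 0], sender=1, vc=[0, 0, 0]))  # False: m has not been delivered yet
print(can_deliver([1, 0, 0], sender=0, vc=[0, 0, 0]))  # True
print(can_deliver([1, 1, 0], sender=1, vc=[1, 0, 0]))  # True once m has been delivered
```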

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 30 / 47

slide-31
SLIDE 31

Causally ordered multicasting

Example

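[Figure: P0 multicasts m with ts(m) = (1,0,0). P1 delivers m and then multicasts m* with ts(m*) = (1,1,0). At P2, whose clock starts at (0,0,0), m* arrives before m, so its delivery is postponed until m has been delivered; P2's clock then becomes (1,0,0) and finally (1,1,0).]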

Example Take VC2 = [0,2,2], ts(m) = [1,3,0] from P0. What information does P2 have, and what will it do when receiving m (from P0)?

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 31 / 47

slide-32
SLIDE 32

Mutual exclusion

Problem
A number of processes in a distributed system want exclusive access to some resource.

Basic solutions
  • Via a centralized server.
  • Completely decentralized, using a peer-to-peer system.
  • Completely distributed, with no topology imposed.
  • Completely distributed along a (logical) ring.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 32 / 47

slide-33
SLIDE 33

Mutual exclusion: centralized

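[Figure: (a) A process sends a Request to the coordinator; the queue is empty, so the coordinator replies OK. (b) Another process sends a Request for the same resource; the coordinator sends no reply and queues the request. (c) The first process sends a Release; the coordinator then replies OK to the queued process.]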

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 33 / 47

slide-34
SLIDE 34

Decentralized mutual exclusion

Principle
Assume every resource is replicated $n$ times, with each replica having its own coordinator ⇒ access requires a majority vote from $m > n/2$ coordinators. A coordinator always responds immediately to a request.

Assumption
When a coordinator crashes, it will recover quickly, but will have forgotten about permissions it had granted.

In the book: robustness analysis showing a low probability of violation under reasonable availability assumptions.

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 34 / 47

slide-35
SLIDE 35

Mutual exclusion Ricart & Agrawala

Principle
The same as Lamport, except that acknowledgments aren't sent. Instead, replies (i.e., grants) are sent only when:
  • The receiving process has no interest in the shared resource; or
  • The receiving process is waiting for the resource, but has lower priority (known through comparison of timestamps).

In all other cases, the reply is deferred, implying some more local administration.

[Figure: (a) Two processes want to access the shared resource at the same moment, with request timestamps 8 and 12. (b) The process with the lower timestamp (8) receives all OKs and accesses the resource. (c) When it is done, it sends an OK as well, so the other process can now access the resource.]
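The decision a process makes when a request arrives can be sketched as a pure function. The state names, and the representation of a request as a `(timestamp, process_id)` tuple with the process id breaking ties, are assumptions for this illustration.

```python
from enum import Enum
from typing import Optional

Request = tuple[int, int]  # (timestamp, process_id); a smaller tuple has higher priority


class State(Enum):
    RELEASED = "no interest in the resource"
    WANTED = "waiting for the resource"
    HELD = "using the resource"


def should_reply_now(state: State, own_request: Optional[Request],
                     incoming: Request) -> bool:
    """Return True to send OK immediately, False to defer the reply."""
    if state is State.RELEASED:
        return True
    if state is State.WANTED:
        return incoming < own_request  # the incoming request has priority over ours
    return False  # HELD: defer until the resource is released


# Requests with timestamps 8 and 12: the process holding timestamp 12 yields.
print(should_reply_now(State.WANTED, (12, 2), (8, 0)))  # True
print(should_reply_now(State.WANTED, (8, 0), (12, 2)))  # False
```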

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 35 / 47

slide-36
SLIDE 36

Mutual exclusion: Token ring algorithm

Essence
Organize processes in a logical ring, and let a token be passed between them. The one that holds the token is allowed to enter the critical region (if it wants to).

[Figure: (a) an unordered group of processes 1 to 7; (b) a logical ring connecting the same processes.]

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 36 / 47

slide-37
SLIDE 37

Mutual exclusion: comparison

Algorithm      | # msgs per entry/exit  | Delay before entry (in msg times) | Problems
Centralized    | 3                      | 2                                 | Coordinator crash
Decentralized  | 2mk + m, k = 1, 2, ... | 2mk                               | Starvation, low efficiency
Distributed    | 2(n − 1)               | 2(n − 1)                          | Crash of any process
Token ring     | 1 to ∞                 | 0 to n − 1                        | Lost token, process crash

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 37 / 47

slide-38
SLIDE 38

Election algorithms

Principle
An algorithm requires that some process acts as a coordinator. The question is how to select this special process dynamically.

Note
In many systems the coordinator is chosen by hand (e.g., file servers). This leads to centralized solutions ⇒ single point of failure.

Question
If a coordinator is chosen dynamically, to what extent can we speak about a centralized or distributed solution?

Question
Is a fully distributed solution, i.e., one without a coordinator, always more robust than any centralized/coordinated solution?

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 38 / 47

slide-39
SLIDE 39

Election by bullying

Principle
Each process has an associated priority (weight). The process with the highest priority should always be elected as the coordinator.

Issue
How do we find the heaviest process?
  • Any process can just start an election by sending an election message to all other processes (assuming you don't know the weights of the others).
  • If a process $P_{heavy}$ receives an election message from a lighter process $P_{light}$, it sends a take-over message to $P_{light}$. $P_{light}$ is out of the race.
  • If a process doesn't get a take-over message back, it wins, and sends a victory message to all other processes.
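The outcome of a bully election can be simulated without any messaging. This sketch assumes that a process's weight equals its id and that the initiator is alive; it follows only one of the heavier processes at each round, since all of their elections end with the same winner.

```python
def bully_election(initiator: int, alive: set[int]) -> int:
    """Return the id of the process that ends up as coordinator."""
    candidate = initiator
    while True:
        # Heavier live processes answer the election message with a take-over.
        heavier = [p for p in alive if p > candidate]
        if not heavier:
            return candidate  # no take-over received: the candidate wins
        candidate = min(heavier)  # a heavier process now holds its own election


# Hypothetical system of processes 1..7 in which process 7 has crashed.
print(bully_election(initiator=4, alive={1, 2, 3, 4, 5, 6}))  # 6
```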

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 39 / 47

slide-40
SLIDE 40

Election by bullying

[Figure: bully election among processes 1 to 7 in five steps, (a) to (e). After the previous coordinator has crashed, a process sends Election messages; higher-priority processes answer OK and hold their own elections, until the winner announces itself as Coordinator.]

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 40 / 47

slide-41
SLIDE 41

Election in a ring

Principle
Process priority is obtained by organizing processes into a (logical) ring. The process with the highest priority should be elected as coordinator.
  • Any process can start an election by sending an election message to its successor. If a successor is down, the message is passed on to the next successor.
  • If a message is passed on, the sender adds itself to the list. When it gets back to the initiator, everyone has had a chance to make its presence known.
  • The initiator sends a coordinator message around the ring containing a list of all living processes. The one with the highest priority is elected as coordinator.
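One pass of the election message around the ring can be simulated as below. The sketch assumes that priority equals process id, that the initiator is alive, and that no process crashes during the election.

```python
def ring_election(ring: list[int], alive: set[int], initiator: int) -> tuple[int, list[int]]:
    """Return (coordinator, list of live processes collected by the election message)."""
    start = ring.index(initiator)
    members = []
    for step in range(len(ring)):
        process = ring[(start + step) % len(ring)]
        if process in alive:         # a crashed successor is skipped
            members.append(process)  # each live process adds itself to the list
    return max(members), members


# Hypothetical ring 1..7 in which process 7 has crashed; process 5 starts the election.
print(ring_election([1, 2, 3, 4, 5, 6, 7], {1, 2, 3, 4, 5, 6}, initiator=5))
# (6, [5, 6, 1, 2, 3, 4])
```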

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 41 / 47

slide-42
SLIDE 42

Election in a ring

Question
Does it matter if two processes initiate an election?

Question
What happens if a process crashes during the election?

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 42 / 47

slide-43
SLIDE 43

Superpeer election

Issue
How can we select superpeers such that:
  • Normal nodes have low-latency access to superpeers
  • Superpeers are evenly distributed across the overlay network
  • There is a predefined fraction of superpeers
  • Each superpeer does not need to serve more than a fixed number of normal nodes

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 43 / 47

slide-44
SLIDE 44

Superpeer election

DHTs
Reserve a fixed part of the ID space for superpeers. Example: if $S$ superpeers are needed for a system that uses $m$-bit identifiers, simply reserve the $k = \lceil \log_2 S \rceil$ leftmost bits for superpeers. With $N$ nodes, we'll have, on average, $2^{k-m} N$ superpeers.

Routing to superpeer
Send message for key $p$ to the node responsible for $p \text{ AND } \underbrace{11 \cdots 11}_{k}\,\underbrace{00 \cdots 00}_{m-k}$
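The routing rule amounts to a bit mask that keeps the k leftmost bits of the key and clears the rest. The sketch below uses hypothetical values (8-bit identifiers, 4 superpeers) and a made-up function name.

```python
def superpeer_key(p: int, m: int, s: int) -> int:
    """Map key `p` (an m-bit identifier) to the key of its superpeer,
    given that `s` superpeers are needed."""
    k = (s - 1).bit_length()            # equals ceil(log2(s)) for s >= 1
    mask = ((1 << k) - 1) << (m - k)    # k ones followed by m - k zeros
    return p & mask


# With m = 8 and S = 4, k = 2 and the mask is 0b11000000.
print(bin(superpeer_key(0b10110101, m=8, s=4)))  # 0b10000000
```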

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 44 / 47

slide-45
SLIDE 45

Exercises

1. Consider two servers in a distributed system. Both have clocks that are supposed to count 1000 ticks per millisecond. One of them has an accurate clock, and the other has a clock that counts only 990 ticks per millisecond. After one hour, what will the time skew between the two servers be?

2. When a machine synchronizes its clock with another machine, it is generally a good idea to keep a history of measurements. Why? Give an example where this history can be used to improve the synchronization process.

3. Explain how (Lamport) logical clocks work.

4. Explain how totally ordered multicast can be implemented with logical clocks.

5. Devise an alternative strategy for totally ordered multicast.

6. In totally ordered multicast with (Lamport) logical clocks, is it strictly necessary that the receipt of each message be acknowledged (ACK)?

45 / 47

slide-46
SLIDE 46

Review: Exercises

7. Write an algorithm for decentralized mutual exclusion.

8. Considering their vector clocks, state whether the events below are concurrent (a ||| b), whether a precedes b (a → b), or whether b precedes a (b → a).
  • a(4, 3, 1, 1), b(1, 3, 1, 2)
  • a(4, 3, 1, 1), b(1, 3, 1, 1)
  • a(4, 3, 1, 1), b(4, 4, 1, 2)
  • a(4, 3, 1, 1), b(5, 4, 1, 2)
  • a(4, 3, 1, 1), b(1, 1, 1, 5)

46 / 47

slide-47
SLIDE 47

Exercises

9. Four processes have failed, but left possibly incomplete logs of their executions. Each column represents the log of one process, recording internal events (e) and message send (send) and receive (rec) events, with the event's clock value in parentheses:

P0                | P1                 | P2                 | P3
e(0)              | e(0)               | e(0)               | e(0)
send m1 to P1 (1) | rec m1 from P0 (2) | e(1)               | send m3 to P1 (1)
e(2)              | send m2 to P2 (3)  | rec m2 from P1 (4) | e(2)
e(3)              | rec m3 from P3 (4) | send m4 to P1 (5)  |
                  | rec m5 from P0 (5) |                    |

A cut C of the execution state of multiple processes is consistent if, for all events a and b, (b ∈ C ∧ a → b) ⇒ (a ∈ C). Intuitively, if an event b is part of a cut, then all events that happened before b must also be part of the cut. Draw the space-time diagram for the logs above and indicate whether they end in a consistent cut. If not, which event(s) must be discarded to obtain a consistent state?

47 / 47