Time within Distributed Systems Time is important, however, it is - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Time within Distributed Systems

Time is important; however, it is problematic in distributed systems because we cannot synchronize clocks perfectly

slide-2
SLIDE 2

Introducing Time

  • Time is a quantity that we often want to measure accurately
  • Algorithms that depend upon clock synchronization have been developed in many areas (not just within the distributed systems arena)
  • Physical time is problematic within distributed systems (for lots of reasons)

slide-3
SLIDE 3

Clocks, Events and Process States

  • We define an event to be the occurrence of a single action that a process carries out as it executes
  • An event is a communication action or a state-transforming action
  • Clocks - every computer has one, and it can be used to timestamp any event

slide-4
SLIDE 4

Clock Skew

  • Computer clocks are like all other clocks in that they tend not to be in perfect agreement
  • Skew (the difference between the readings of two clocks) and clock drift (the rate at which a clock gains or loses time) are both factors
  • For ordinary clocks based on a quartz crystal, clock drift is about 10^-6 seconds/second - giving a difference of 1 second every 1,000,000 seconds (or 11.6 days)
  • The drift rate of a "high precision" quartz clock is about 10^-7 or 10^-8 seconds/second
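
The drift figures above can be checked with a little arithmetic. A minimal sketch; the function name `days_to_drift` is an illustrative assumption, not something from the slides:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_drift(drift_rate, error_seconds=1.0):
    """Days until a clock drifting at drift_rate (seconds/second)
    has accumulated error_seconds of error."""
    return error_seconds / drift_rate / SECONDS_PER_DAY


# The three drift rates quoted for quartz clocks.
for rate in (1e-6, 1e-7, 1e-8):
    print(f"drift {rate:.0e} s/s: 1 second of error after "
          f"{days_to_drift(rate):,.1f} days")
```

The first rate reproduces the figure of roughly 11.6 days; each tenfold improvement in drift rate stretches that interval tenfold.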
slide-5
SLIDE 5

UTC (Coordinated Universal Time)

  • UTC stands for Coordinated Universal Time and is set from atomic clocks (which have a drift rate of about one part in 10^13)
  • UTC is an international standard for timekeeping
  • UTC timing signals are broadcast by land-based radio stations and by satellites (including GPS)
  • Computers with the appropriate (and expensive) receivers attached can synchronize their clocks with UTC

slide-6
SLIDE 6

Synchronizing Physical Clocks

  • External synchronization - setting the time to some external source of time
  • Internal synchronization - setting the time based on "local agreement" (local time)
  • In a synchronous distributed system, bounds are known for the drift rate of clocks, the maximum transmission delay is known, and the time to execute each processing step is bounded
  • So, synchronizing clocks is "easier"
  • Unfortunately, most distributed systems are asynchronous
slide-7
SLIDE 7

Cristian's Synchronizing Clocks

  • Cristian suggested the use of a time server, connected to a device that receives signals from a source of UTC

slide-8
SLIDE 8

More on Cristian's Algorithm

  • Basic idea: Getting the current time from a “time server”, using periodic client requests
  • Major problem - what happens if the time from the time server is less than the client's time? Setting the clock directly would make time run backwards on the client! (Which cannot be allowed - time does not go backwards)
  • Minor problem results from the delay introduced by the network request/response: latency
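
Both problems above can be sketched in code. This is a minimal illustration under stated assumptions, not Cristian's published algorithm in full: `request_time` is a hypothetical callable standing in for the exchange with the time server, the reply is assumed to take half the round trip, and `next_clock_value` is one possible way to absorb a negative offset gradually so the client clock never runs backwards.

```python
from typing import Callable, Tuple


def cristian_estimate(request_time: Callable[[], float],
                      local_clock: Callable[[], float]) -> Tuple[float, float]:
    """Ask the server for the time and compensate for network latency.

    Returns (estimated current server time, measured round trip).
    """
    sent_at = local_clock()
    server_time = request_time()      # hypothetical request/response
    received_at = local_clock()
    round_trip = received_at - sent_at
    # Assumption: the reply spent half of the round trip in transit.
    return server_time + round_trip / 2, round_trip


def next_clock_value(current, elapsed, offset, slew_rate=0.1):
    """Advance the clock by `elapsed` seconds while absorbing part of
    `offset` (target time minus current time).

    The correction is capped at slew_rate * elapsed, so with
    0 <= slew_rate < 1 the clock always moves forward, only faster
    or slower than real time.
    """
    limit = slew_rate * elapsed
    correction = max(-limit, min(limit, offset))
    return current + elapsed + correction


# Demo with a fake local clock and a fake server (no network needed).
fake_local = iter([100.0, 100.2])
estimate, round_trip = cristian_estimate(lambda: 500.0,
                                         lambda: next(fake_local))
print("estimated server time:", estimate, "round trip:", round_trip)

# The client is 5 seconds ahead: the clock still advances, just slowly.
print("next clock value:", next_clock_value(10.0, 1.0, offset=-5.0))
```

Slowing the clock rather than setting it back is the design choice here: timestamps taken on the client stay monotonic while the error is worked off.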

slide-9
SLIDE 9

Discussing Cristian's Algorithm

  • Single point of failure (if only one server used)
  • The time server may fail and thus render synchronization temporarily impossible
  • Solution: a group of synchronized time servers can be configured, to which clients multicast requests
  • Research showed that if F is the number of faulty server clocks out of a total of N servers, then we must have N > 3F if the other, correct, clocks are still to be able to achieve agreement

slide-10
SLIDE 10

The Berkeley Algorithm

  • A coordinator is chosen to act as the "master" clock
  • The master periodically polls the other computers (the "slaves") to determine their local time
  • An average time is then calculated by the master and distributed to the slaves to allow them to adjust their clocks to the "correct time"
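
The polling-and-averaging round can be sketched as follows. This is a simplified illustration: it ignores network delay, and it returns a per-clock adjustment rather than the average itself (an assumption of this sketch). The function name and the sample readings are hypothetical.

```python
def berkeley_round(master_time, slave_times):
    """One synchronization round: average all readings (master
    included) and compute the adjustment each clock should apply.

    Returns (average, adjustments), where adjustments[0] belongs to
    the master and the rest follow the order of slave_times.
    """
    readings = [master_time] + list(slave_times)
    average = sum(readings) / len(readings)
    # Positive adjustment: the clock is slow; negative: it is fast.
    adjustments = [average - reading for reading in readings]
    return average, adjustments


# Readings in minutes past midnight: master 3:00, slaves 3:25 and 2:50.
average, adjustments = berkeley_round(180.0, [205.0, 170.0])
print("average:", average)
print("adjustments:", adjustments)
```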

slide-11
SLIDE 11

Berkeley in Action

Clocks running fast slow down (so that the others can catch up); clocks running slow skip forward to the correct time

slide-12
SLIDE 12

Discussing Berkeley's Algorithm

  • Faulty clocks can be dealt with due to the master's ability to take a "fault-tolerant average" - a subset of clocks is chosen that do not differ from one another by more than a specified amount, and the average is taken of the time readings from only these clocks
  • An experiment involving 15 computers showed that Berkeley could synchronize clocks to within 20-25 milliseconds
  • If the master suffers a failure, protocols exist to elect a successor (that is, a new master)
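
One way to realize the "fault-tolerant average" described above is to pick the largest group of readings that lie within a given spread of one another and average only those. A sketch under that assumption; the selection rule, the function name and the `max_spread` parameter are illustrative, not taken from the Berkeley implementation.

```python
def fault_tolerant_average(readings, max_spread):
    """Average the largest subset of readings whose values differ
    from one another by at most max_spread.

    Returns (average, chosen_readings).
    """
    if not readings:
        raise ValueError("at least one clock reading is required")
    ordered = sorted(readings)
    best = ordered[:1]
    start = 0
    # Slide a window over the sorted readings, keeping the widest
    # window whose extremes are within max_spread of each other.
    for end in range(len(ordered)):
        while ordered[end] - ordered[start] > max_spread:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    return sum(best) / len(best), best


# Three clocks agree closely; the fourth is clearly faulty.
average, chosen = fault_tolerant_average([100.0, 100.2, 99.9, 250.0],
                                         max_spread=1.0)
print("chosen readings:", chosen)
print("average:", average)
```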

slide-13
SLIDE 13

The Network Time Protocol (NTP)

Defines an architecture for a time service and a protocol to distribute time information over the Internet

slide-14
SLIDE 14

NTP Design Goals

  • To provide a service enabling clients across the Internet to be synchronized accurately to UTC
  • To provide a reliable service that can survive lengthy losses of connectivity
  • To enable clients to resynchronize sufficiently frequently to offset the rates of drift found in most computers
  • To provide protection against interference with the time service, whether malicious or accidental

slide-15
SLIDE 15

How NTP Works

  • Provides a network of servers located across the Internet
  • Primary servers - attached to a UTC time source
  • Secondary servers - connected to a primary for synchronization
  • The servers are connected in a logical hierarchy called a "synchronization subnet", whose levels are called "strata"
  • The synchronization subnet can reconfigure as servers become unreachable or failures occur
  • Messages are delivered using UDP
slide-16
SLIDE 16

Example Synchronization Subnet

[Figure: a synchronization subnet with servers at strata 1, 2 and 3]

Note: Arrows denote synchronization control, numbers denote strata.

slide-17
SLIDE 17

NTP's Modes

  • Multicast mode - used on high-speed LANs; a server periodically multicasts the time to the other computers on the LAN, which set their clocks assuming a small network delay (achieving relatively low accuracy)
  • Procedure-call mode - one computer accepts requests and replies with a timestamp, which is then used to update client clocks (higher accuracy achievable)
  • Symmetric mode - intended to be used at the lowest-numbered strata (such as stratum 1), where the highest accuracies are to be achieved; pairs of servers exchange messages bearing timing information, and this information is retained over time, allowing the two servers to synchronize their clocks very closely
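
A sketch of how one timestamp exchange in procedure-call or symmetric mode can yield a clock offset, using the standard NTP offset and delay formulas. The names t1 to t4 and the function itself are illustrative assumptions, not part of any NTP library, and the calculation assumes the network delay is roughly the same in both directions.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    """
    # How far the server clock is ahead of the client clock.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Time spent on the network, excluding server processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Server is 5 s ahead; each direction takes 0.01 s; the server spends
# 0.01 s processing the request.
offset, delay = ntp_offset_and_delay(100.00, 105.01, 105.02, 100.03)
print("offset:", offset, "delay:", delay)
```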

slide-18
SLIDE 18

Logical Clocks

  • Synchronization is based on “relative time”.
  • Note that (with this mechanism) there is no requirement for “relative time” to have any relation to the “real time”.
  • What’s important is that the processes in the Distributed System agree on the ordering in which certain events occur.
  • Such “clocks” are referred to as Logical Clocks.
slide-19
SLIDE 19

Lamport’s Logical Clocks

  • First point: if two processes do not interact, then their clocks do not need to be synchronized - they can operate concurrently without fear of interfering with each other
  • Second (critical) point: it does not matter whether two processes share a common notion of what the “real” current time is. What does matter is that the processes have some agreement on the order in which certain events occur
  • Lamport used these two observations to define the “happens-before” relation (also often referred to within the context of Lamport’s Timestamps)

slide-20
SLIDE 20

The Happens-Before Relation, 1 of 4

  • If A and B are events in the same process, and A occurs before B, then we can state that: A “happens-before” B is true
  • Equally, if A is the event of a message being sent by one process, and B is the event of the same message being received by another process, then A “happens-before” B is also true
  • Note that a message cannot be received before it is sent, since it takes a finite, nonzero amount of time to arrive … and, of course, time is not allowed to run backwards

slide-21
SLIDE 21

The Happens-Before Relation, 2 of 4

  • Obviously, if A “happens-before” B and B “happens-before” C, then it follows that A “happens-before” C
  • If the “happens-before” relation holds, deductions about the current clock “value” on each DS component can then be made
  • It therefore follows that if C(A) is the clock value assigned to event A, and A “happens-before” B, then C(A) is less than C(B), and so on

slide-22
SLIDE 22

The Happens-Before Relation, 3 of 4

  • Now, assume three processes are in a DS: A, B and C
  • All have their own physical clocks (which are running at differing rates due to “clock skew”, etc.)
  • A sends a message to B and includes a “timestamp”
  • If this sending timestamp is less than the time of arrival at B, things are OK, as the “happens-before” relation still holds (i.e. the send at A “happens-before” the receipt at B)
  • However, if the timestamp is more than the time of arrival at B, things are NOT OK: the clock values contradict “happens-before”, because the receipt of a message has to occur after it was sent

slide-23
SLIDE 23

The Happens-Before Relation, 4 of 4

  • The question to ask is: How can some event that “happens-before” some other event possibly have occurred at a later time?
  • The answer is: it can’t!
  • So, Lamport’s solution is to have the receiving process adjust its clock forward to one more than the sending timestamp value whenever its own clock is not already ahead of that timestamp. This allows the “happens-before” relation to hold, and keeps all the clocks consistent relative to each other
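
The clock adjustment rule above can be sketched as a small class. This is a minimal illustration; the class and method names are assumptions. It uses the usual formulation in which the receiver takes the larger of its own clock and the message timestamp and then adds one, which matches "one more than the sending timestamp" whenever the receiver is behind.

```python
class LamportClock:
    """A logical clock: a counter that only ever moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; its timestamp travels with the message."""
        return self.tick()

    def receive(self, message_timestamp):
        """Jump past the sender's timestamp if our clock is behind."""
        self.time = max(self.time, message_timestamp) + 1
        return self.time


a, b = LamportClock(), LamportClock()
for _ in range(3):
    a.tick()                      # three local events on A
timestamp = a.send()              # A sends a message to B
arrival = b.receive(timestamp)    # B's clock was behind, so it jumps
print("sent at", timestamp, "received at", arrival)
```

Whatever the two clocks read beforehand, the receive timestamp ends up greater than the send timestamp, which is exactly what "happens-before" requires.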

slide-24
SLIDE 24

Lamport’s Clocks in Action

slide-25
SLIDE 25

Problem: Totally-Ordered Multicasting

  • Updating a replicated database and leaving it in an inconsistent state: Update 1 adds 100 euro to an account, Update 2 calculates and adds 1% interest to the same account. Due to network delays, the updates may not happen in the correct order. Whoops!
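
A short sketch of why the order matters. The starting balance of 1000 euro is an assumption made for illustration; the two updates are the ones described above.

```python
from decimal import Decimal


def add_deposit(balance):
    """Update 1: add 100 euro."""
    return balance + Decimal("100")


def add_interest(balance):
    """Update 2: add 1% interest."""
    return balance * Decimal("1.01")


start = Decimal("1000")           # assumed opening balance

# Replica A applies Update 1 then Update 2; replica B sees them
# in the opposite order because of network delays.
replica_a = add_interest(add_deposit(start))
replica_b = add_deposit(add_interest(start))

print("replica A:", replica_a)
print("replica B:", replica_b)
print("consistent:", replica_a == replica_b)
```

The two replicas end up one euro apart, so the copies of the database no longer agree.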

slide-26
SLIDE 26

Solution: Totally-Ordered Multicasting

  • A multicast message is sent to all processes in the group, including the sender, together with the sender’s timestamp
  • At each process, the received message is added to a local queue, ordered by timestamp
  • Upon receipt of a message, a multicast acknowledgment/timestamp is sent to the group
  • Due to the “happens-before” relationship holding, the timestamp of the acknowledgment is always greater than that of the original message

slide-27
SLIDE 27

More on Totally Ordered Multicasting

  • Only when a message is at the head of the queue and is marked as acknowledged by all the other processes will it be removed from the queue and delivered to a waiting application
  • Lamport’s clocks (with ties broken by process identifier) ensure that each message has a unique timestamp, and consequently, the local queue at each process eventually contains the same contents
  • In this way, all messages are delivered/processed in the same order everywhere, and updates can occur in a consistent manner
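
The scheme described on the last two slides can be sketched as a small single-machine simulation. Everything here is an illustrative assumption: the class and function names, the use of process identifiers to break timestamp ties, and the simulated network, which keeps each sender-to-receiver channel in FIFO order but interleaves channels at random.

```python
import heapq
import random
from collections import defaultdict, deque


class Process:
    def __init__(self, pid, n, network):
        self.pid, self.n, self.network = pid, n, network
        self.clock = 0
        self.queue = []                 # min-heap of (timestamp, sender, payload)
        self.acks = defaultdict(set)    # (timestamp, sender) -> acknowledging pids
        self.delivered = []

    def _broadcast(self, message):
        for dest in range(self.n):      # the group includes the sender
            self.network[(self.pid, dest)].append(message)

    def multicast(self, payload):
        self.clock += 1
        self._broadcast(("MSG", self.clock, self.pid, payload))

    def receive(self, message):
        kind, timestamp, sender, body = message
        self.clock = max(self.clock, timestamp) + 1   # Lamport receive rule
        if kind == "MSG":
            heapq.heappush(self.queue, (timestamp, sender, body))
            self.clock += 1             # the acknowledgment is a later event
            self._broadcast(("ACK", self.clock, self.pid, (timestamp, sender)))
        else:
            self.acks[body].add(sender)
        # Deliver from the head of the queue once everyone has acknowledged.
        while self.queue and len(self.acks[self.queue[0][:2]]) == self.n:
            self.delivered.append(heapq.heappop(self.queue)[2])


def simulate(n=3, seed=0):
    rng = random.Random(seed)
    network = defaultdict(deque)        # (source, destination) -> FIFO channel
    processes = [Process(pid, n, network) for pid in range(n)]
    processes[0].multicast("add 100 euro")
    processes[1].multicast("add 1% interest")
    while True:
        busy = [channel for channel, pending in network.items() if pending]
        if not busy:
            break
        source, dest = rng.choice(busy)  # arbitrary network interleaving
        processes[dest].receive(network[(source, dest)].popleft())
    return [p.delivered for p in processes]


for order in simulate(seed=1):
    print(order)
```

Changing the seed changes the order in which messages arrive at each process, but every process should still deliver the two updates in the same order.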

slide-28
SLIDE 28

Totally-Ordered Multicasting, Revisited

  • Update 1 is time-stamped and multicast. Added to local queues
  • Update 2 is time-stamped and multicast. Added to local queues
  • Acknowledgments for Update 2 sent/received. Update 2 can now be processed
  • Acknowledgments for Update 1 sent/received. Update 1 can now be processed
  • Note: all queues are the same, as the timestamps have been used to ensure the “happens-before” relation holds.

slide-29
SLIDE 29

In Summary

  • Handling time within a DS is tricky!
  • So, we rarely try to deal with “real time”
  • Relative time (using Lamport's logical clocks) is the preferred method when ensuring the correct ordering of events within a DS