Virtualize Everything but Time

SLIDE 1

Centre for Ultra-Broadband Information Networks
The University of Melbourne

Virtualize Everything but Time

Timothy Broomhead (t.broomhead@ugrad.unimelb.edu.au)
Laurence Cremean (l.cremean@ugrad.unimelb.edu.au)
Julien Ridoux (jrid@unimelb.edu.au)
Darryl Veitch (dveitch@unimelb.edu.au)

SLIDE 2

Introduction

Clock synchronization, who cares?
  • Network monitoring / Traffic analysis
  • Telecommunications Industry; Finance; Gaming, ...
  • Distributed ‘scheduling’: timestamps instead of message passing

Status quo under Xen
  • Based on ntpd, amplifies its flaws
  • Fails under live VM migration

We propose a new architecture
  • Based on RADclock client synchronization solution
  • Robust, accurate, scalable
  • Enables dependent clock paradigm
  • Seamless migration

SLIDE 3

Key Idea

Each physical host has a single clock which never migrates

Only a (stateless) clock read function migrates

SLIDE 4

Para-Virtualization and Xen

Hypervisor
  • Minimal kernel managing physical resources

Para-virtualization
  • Guest OSs have access to hypervisor via hypercalls
  • Fully-virtualized more complex, not addressed here

Focus on Xen
  • But approach has general applicability!
  • Focus on Linux OSs (2.6.31.13 Xen pvops branch)
  • Guest OSs:
      • Dom0: privileged access to hardware devices
      • DomU: access managed by Dom0
  • Use Hypervisor 4.0 mainly

SLIDE 5

Hardware Counters

Clocks built on local hardware (oscillators → counters)
  • HPET, ACPI, TSC
  • Counters imperfect, they drift (temperature driven)
  • Affected by OS
      • ticking rate
      • access latency

TSC (counts CPU cycles)
  • Highest resolution and lowest latency: preferred! But...
  • May be unreliable
      • multi-core → multiple unsynchronised TSCs
      • power management → variable rate, including stopping!

HPET
  • Reliable, but
  • Lower resolution, higher latency

SLIDE 6

Xen Clocksource

A hardware/software hybrid timer provided by the hypervisor

Purpose
  • Combine reliability of HPET with low latency of TSC
  • Compensate for TSC unreliability
  • Provides 1GHz 64-bit counter

Performance of Xen Clocksource (XCS) versus HPET
  • XCS performs well: low latency and high stability
  • HPET not that far behind, and a lot simpler
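
One way a hybrid counter of this kind could be built is sketched below. It is not Xen's code: the anchor pair, the function name and the fixed TSC rate are all assumptions made for illustration. The idea shown is that a reliable platform timer supplies an occasional anchor, and the fast TSC is only used to extrapolate nanoseconds (a 1 GHz count) since that anchor.

```python
NS_PER_S = 1_000_000_000
MASK64 = (1 << 64) - 1  # keep the result within 64 bits


def hybrid_counter_ns(tsc_now, anchor_tsc, anchor_ns, tsc_hz):
    """Return a nanosecond count extrapolated from the last anchor.

    anchor_ns  : platform-timer time at the last anchor, in ns (hypothetical)
    anchor_tsc : TSC value captured at the same instant (hypothetical)
    tsc_hz     : TSC rate assumed to hold since the anchor (hypothetical)
    """
    delta_ticks = tsc_now - anchor_tsc
    # Integer arithmetic only: scale TSC ticks to nanoseconds.
    delta_ns = delta_ticks * NS_PER_S // tsc_hz
    return (anchor_ns + delta_ns) & MASK64
```

With a 2 GHz TSC, a read taken 2,000,000 ticks after an anchor at 5,000 ns returns 1,005,000 ns. Re-anchoring often limits how far an unreliable TSC rate can carry an error.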

SLIDE 7

Clock Fundamentals

Timekeeping and timestamping are distinct

Raw timestamps and clock timestamps are distinct

A scaled counter is not a good clock: drift!

Purpose of clock sync algo is to correct for drift

Network based sync is convenient, exchange timing packets:

[Figure: timing packet exchange between Host and Server across the Network, drawn against time; labels d↑, d↓, r]

Two key problems
  • Dealing with delay variability (complex, but possible)
  • Path asymmetry (simple, but impossible)
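
The asymmetry problem can be made concrete with the standard four-timestamp exchange used by NTP-style protocols. The sketch below is illustrative only: the function names and the simulated delays are assumptions, and the server clock is taken to be true time. It shows that the offset estimate is wrong by half the difference between the two one-way delays, and that nothing in the four stamps reveals this.

```python
def exchange_estimates(ta, tb, te, tf):
    """Estimate (rtt, offset) from one request/response exchange.

    ta: host clock when the request leaves the host
    tb: server clock when the request arrives
    te: server clock when the response leaves
    tf: host clock when the response arrives
    offset is host minus server, assuming equal delay in both directions.
    """
    rtt = (tf - ta) - (te - tb)
    offset = ((ta - tb) + (tf - te)) / 2.0
    return rtt, offset


def simulate_exchange(theta, d_up, d_down, server_delay, t0=100.0):
    """Build the four stamps for a host clock that leads true time by theta."""
    ta = t0 + theta            # host stamp at departure
    tb = t0 + d_up             # server stamp at arrival
    te = tb + server_delay     # server stamp at reply
    tf = te + d_down + theta   # host stamp at return
    return ta, tb, te, tf


# Host is 5 ms ahead; the forward path is 200 µs slower than the return path.
stamps = simulate_exchange(theta=0.005, d_up=0.0003, d_down=0.0001,
                           server_delay=0.00002)
rtt, offset = exchange_estimates(*stamps)
error = offset - 0.005   # equals (d_down - d_up) / 2
```

Delay variability can be filtered over many exchanges, for example by favouring exchanges with small rtt. The asymmetry term is constant across them and so cannot be filtered out.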

SLIDE 8

Synchronisation Algorithms

NTP (ntpd): status quo
  • Feedback based
      • Event timestamps are system clock stamps
      • Feedback controller (PLL, FLL) tries to lock onto rate
  • Intimate relationship with system clock (API, dynamics...)
  • In Xen, ntpd uses Xen Clocksource

RADclock (Robust Absolute and Difference Clock)
  • Algo developed in 2004, extensively tested
  • Feedforward based
      • Event timestamps are raw stamps
      • Clock error estimates made and removed when clock read
  • ‘System clock’ has no dynamics, just a function call
  • Can use any raw counter: here use HPET, Xen Clocksource

SLIDE 9

Experimental Methodology

[Figure: testbed diagram. Labelled elements: Unix PC NTP Server (Stratum 1), GPS Receiver, Atomic Clock, Hub, Host, DAG Card, UDP Sender & Receiver, External Monitor, Internal Monitor, SW-GPS, DAG-GPS, Hypervisor, Dom0 and DomU, RADclock (two instances), ntpd-NTP (two instances). Legend: PPS Sync., NTP flow, UDP flow, Timestamping.]

SLIDE 10

Wots the problem? ntpd can perform well

Ideal Setup
  • Quality Stratum-1 time-server
  • Client is on the same LAN, lightly loaded, barely any traffic
  • Constrained and small polling period: 16 sec

[Figure: ntpd clock error [µs] versus time [day]]

SLIDE 11

Or less well...

Different configuration (ntpd recommended!)
  • Multiple servers
  • Relax constraint on polling period
  • Still no load, no traffic, high quality servers

[Figure: ntpd-NTP clock error [µs] versus time [hours] for three configurations: single server, 3 co-located servers, 3 nearby servers]

When/Why? Loss of stability a complex function of parameters ⇒ unreliable

SLIDE 12

The Xen Context

Three examples of inadequacy of ntpd based solution
  1) Dependent ntpd clock
  2) Independent ntpd clock
  3) Migrating independent ntpd clock

SLIDE 13

1) Dependent ntpd Clock

The Solution
  • Only Dom0 runs ntpd
  • Periodically updates a ‘boot time’ variable in hypervisor
  • DomU uses Xen Clocksource to interpolate

The Result (2.6.26 kernel)

[Figure: dependent ntpd clock error [µs] versus time [hours]]
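
The dependent ntpd arrangement can be pictured with the toy model below. The class and function names are hypothetical and this is not the Xen interface; it only assumes that a guest forms its time as the shared ‘boot time’ plus the Xen Clocksource reading, so that each Dom0 update steps every guest clock at once.

```python
class SharedInfo:
    """Hypothetical stand-in for the 'boot time' variable held by the hypervisor."""

    def __init__(self, boot_time_ns):
        self.boot_time_ns = boot_time_ns


def dom0_update(shared, ntpd_time_ns, xcs_ns):
    """Dom0: reset boot time so that boot time + counter matches ntpd's clock."""
    shared.boot_time_ns = ntpd_time_ns - xcs_ns


def domu_time_ns(shared, xcs_ns):
    """DomU: interpolate from the last published boot time using the counter."""
    return shared.boot_time_ns + xcs_ns


shared = SharedInfo(boot_time_ns=0)
dom0_update(shared, ntpd_time_ns=1_000_000_000_000, xcs_ns=500_000)
before = domu_time_ns(shared, xcs_ns=1_500_000)   # pure interpolation
# ntpd in Dom0 has meanwhile moved 250 ns ahead of the interpolated value.
dom0_update(shared, ntpd_time_ns=1_000_001_000_250, xcs_ns=1_500_000)
after = domu_time_ns(shared, xcs_ns=1_500_000)    # same counter value, new time
```

In this model the guest inherits whatever ntpd does in Dom0, plus the interpolation error accumulated between updates.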

SLIDE 14

2) Independent ntpd Clock (current solution)

The Solution
  • All guests run entirely separate ntpd daemons
  • Resource hungry

The Result
  • When all is well, works as before but with a bit more noise
  • When it works: (parallel comparison on Dom0, stratum-1 on LAN)

[Figure: clock error [µs] versus time [day] for ntpd and RADclock]

SLIDE 15

2) Independent ntpd Clock (current solution)

The Solution
  • All guests run entirely separate ntpd daemons
  • Resource hungry

The Result
  • Increased noise makes instability more likely
  • When it fails: (DomU with some load, variable polling period, guest churn)

[Figure: ntpd clock error [µs] versus time [hours]]

SLIDE 16

3) Migrating Independent ntpd Clock

The Solution
  • Independent clock as before, migrates
  • Starts talking to new system clock, new counter

The Result
  • Migration Shock!
  • More Soon

SLIDE 17

RADclock Architecture

Principles

Timestamping:
  • raw counter reads, not clock reads
  • independent of the clock algorithm

Synchronization Algorithm:
  • based on raw timestamps and server timestamps (feedforward)
  • estimates clock parameters and makes them available
  • concentrated in a single module (in userland)

Clock Reading
  • combines a raw timestamp with retrieved clock parameters
  • stateless

SLIDE 18

More Concretely

Timestamping
  • read chosen counter, say HPET(t)

Sync Algorithm maintains:
  • Period: a long term average (barely changes) ⇒ rate stability
  • K: sets origin to desired timescale (e.g. UTC)
  • E: estimate of error ⇒ updates on each stamp exchange

Clock Reading
  • Absolute clock: Ca(t) = Period * HPET(t) + K - E(t)
      • used for absolute, and differences above critical scale
  • Difference clock: Cd(t1,t2) = Period * ( HPET(t2) - HPET(t1) )
      • used for time differences under some critical time scale
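
The two clock-reading formulas translate almost directly into code. The sketch below is illustrative, not the RADclock implementation: the `ClockData` type and function names are made up here, and floating point is used only for readability. It shows that reading the clock is a pure function of a raw counter value and the current parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockData:
    period: float  # seconds per counter tick (long-term average)
    K: float       # aligns the counter origin with the timescale, in seconds
    E: float       # current estimate of absolute clock error, in seconds


def absolute_time(counter, cd):
    """Ca(t) = Period * counter + K - E."""
    return cd.period * counter + cd.K - cd.E


def difference_time(counter1, counter2, cd):
    """Cd(t1, t2) = Period * (counter2 - counter1); K and E cancel out."""
    return cd.period * (counter2 - counter1)


# Illustrative values: a 1 MHz counter, origin at 1000 s, 0.5 ms error estimate.
cd = ClockData(period=1e-6, K=1000.0, E=0.0005)
now = absolute_time(2_000_000, cd)
elapsed = difference_time(1_000_000, 1_500_000, cd)
```

The difference clock depends only on Period, so updates to E do not disturb short interval measurements.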

SLIDE 19

Implementation

Timestamping ‘feedforward support’
  • create cumulative and wide (64-bit) form of counter
  • make accessible from both kernel and user context
      • under Linux, modify Clocksource abstraction
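
A cumulative, wide counter can be built on top of a narrower wrapping one by accumulating masked differences. The sketch below is a generic illustration with hypothetical names, not the kernel patch; it assumes a 32-bit hardware counter that is read at least once per wrap period.

```python
MASK64 = (1 << 64) - 1


class WideCounter:
    """Accumulate a wrapping hardware counter into a cumulative 64-bit count."""

    def __init__(self, initial_raw, width_bits=32):
        self.mask = (1 << width_bits) - 1
        self.last_raw = initial_raw & self.mask
        self.total = 0

    def update(self, raw):
        """Fold in a new raw reading and return the cumulative count."""
        raw &= self.mask
        # Masked subtraction gives the right delta across one wrap.
        delta = (raw - self.last_raw) & self.mask
        self.last_raw = raw
        self.total = (self.total + delta) & MASK64
        return self.total


wc = WideCounter(initial_raw=0xFFFFFFF0)
first = wc.update(0xFFFFFFFF)   # 15 ticks later, no wrap yet
second = wc.update(0x00000010)  # counter wrapped; 17 more ticks
```

The cumulative value never goes backwards at a wrap, which is what lets raw timestamps taken far apart be subtracted safely.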

Sync Algorithm
  • Make clock parameters available via a user thread

Clock reading
  • Read counter, retrieve clock data, compose
  • Fixed-point code to enable clock to be read from kernel
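
Kernel code generally avoids floating point, so the multiplication by Period has to be done with integers. The sketch below shows one common fixed-point scheme, a multiplier scaled by a power of two followed by a shift. The 32-bit shift, the nanosecond units and the names are assumptions for illustration, not the project's actual code.

```python
SHIFT = 32  # number of fractional bits in the fixed-point period (assumed)


def to_fixed(period_ns):
    """Convert a period in ns per tick to a fixed-point integer multiplier."""
    return round(period_ns * (1 << SHIFT))


def read_clock_ns(counter, period_fx, k_ns, e_ns):
    """Integer-only form of Period * counter + K - E, in nanoseconds."""
    return ((counter * period_fx) >> SHIFT) + k_ns - e_ns


# A 2 GHz counter has a period of 0.5 ns, exactly 2**31 in this format.
period_fx = to_fixed(0.5)
t_ns = read_clock_ns(1000, period_fx, k_ns=1_000_000, e_ns=250)

# An HPET-like rate (14.31818 MHz nominal): one second of ticks.
hpet_fx = to_fixed(1e9 / 14_318_180)
one_second_ns = read_clock_ns(14_318_180, hpet_fx, k_ns=0, e_ns=0)
```

Python integers do not overflow; in C the product `counter * period_fx` would need a 128-bit intermediate or a split multiplication.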

SLIDE 20

On Xen!

Dependent Clock now very natural
  • Dom0 maintains a RADclock daemon, talks to timeserver
  • Makes Period, K, E available through Xenstore filesystem
  • Each DomU just reads the counter, retrieves clockdata, composes

All guest clocks are identical, but:
  • Small delay (~1ms) in Xenstore update
      • stale data possible but very unlikely
      • small impact
  • Latency to read counter higher on DomU

Support Needed
  • Expose HPET to Clocksource in guest OSs
  • Add hypercall to access platform timer (HPET here)
  • Add read/write functions to access clockdata from Xenstore

Feedforward paradigm a perfect match to para-virtualisation
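
The division of labour between Dom0 and the guests can be sketched with an in-memory stand-in for Xenstore. Everything named below is hypothetical: the `FakeXenstore` class, the key paths and the function names are invented for illustration and are not the real Xenstore API. The only property borrowed from Xenstore is that it stores string values under path-like keys.

```python
class FakeXenstore:
    """In-memory stand-in for Xenstore: string values under path-like keys."""

    def __init__(self):
        self._kv = {}

    def write(self, path, value):
        self._kv[path] = str(value)

    def read(self, path):
        return self._kv[path]


BASE = "/hypothetical/radclock"  # made-up location for the clock parameters


def dom0_publish(store, period, K, E):
    """Dom0 side: the sync daemon publishes its latest parameters."""
    # repr() round-trips Python floats exactly through the string store.
    store.write(BASE + "/period", repr(period))
    store.write(BASE + "/K", repr(K))
    store.write(BASE + "/E", repr(E))


def domu_read_clock(store, counter):
    """DomU side: retrieve clockdata and compose it with a raw counter read."""
    period = float(store.read(BASE + "/period"))
    K = float(store.read(BASE + "/K"))
    E = float(store.read(BASE + "/E"))
    return period * counter + K - E


store = FakeXenstore()
dom0_publish(store, period=1e-6, K=1000.0, E=0.0005)
guest_a = domu_read_clock(store, counter=2_000_000)
guest_b = domu_read_clock(store, counter=2_000_000)
```

Because every guest composes the same parameters with the same host counter, two guests reading at the same counter value get the same time.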

SLIDE 21

Independent RADclock on Xen

  • Concurrent test on two DomU’s, separate NTP streams

[Figure: RADclock error [µs] versus time [min] for Xen Clocksource and HPET, with a histogram of RADclock error [µs] for each. HPET: Med −2.5, IQR 9.3. Xen Clocksource: Med 3.4, IQR 9.5.]

SLIDE 22

Migration On Xen!

Clocks don’t migrate, only a clock reading function does!
  • Each Dom0 has its own RADclock daemon
  • DomU only ever calls a function, no state is migrated

Caveats
  • Local copy of clockdata used to limit syscalls: needs refreshing
  • Host asymmetry will change, result in small clock jump
      • asymmetry effects different for Dom0 (hence clock itself) and DomU

Feedforward paradigm a perfect match to migration
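
The migration argument, together with the caveat about the local copy of clockdata, can be illustrated with a small model. The classes below are hypothetical, and tagging the cached parameters with a host identifier is an assumption made here to show one way the stale copy could be detected; the slides do not say how the refresh is triggered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockData:
    host_id: str   # which host's clock these parameters describe (assumed tag)
    period: float  # seconds per tick of that host's counter
    K: float
    E: float


class Host:
    """A physical host: one counter and one clock, neither of which migrates."""

    def __init__(self, host_id, period, K, E, counter):
        self.clockdata = ClockData(host_id, period, K, E)
        self.counter = counter

    def read_counter(self):
        return self.counter


class Guest:
    """A DomU that only carries a read function and a cached parameter copy."""

    def __init__(self, host):
        self.host = host
        self.cached = None

    def migrate(self, new_host):
        self.host = new_host  # the cached copy is now stale

    def read_clock(self):
        current = self.host.clockdata
        if self.cached is None or self.cached.host_id != current.host_id:
            self.cached = current  # refresh the local copy
        cd = self.cached
        return cd.period * self.host.read_counter() + cd.K - cd.E


# Two hosts with different counters whose clocks agree on the current time.
host_a = Host("A", period=1e-6, K=100.0, E=0.0, counter=5_000_000)
host_b = Host("B", period=2e-6, K=95.0, E=0.0, counter=5_000_000)
guest = Guest(host_a)
t_before = guest.read_clock()
guest.migrate(host_b)
t_after = guest.read_clock()
```

If the stale copy were not refreshed, host A's Period and K would be applied to host B's counter, which is the failure the refresh caveat guards against.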

SLIDE 23

Migration Comparison!

Setup
  • Two machines, each Dom0 running a RADclock
  • One DomU migrates with a
      • dependent RADclock
      • independent ntpd

[Figure: clock error [µs] versus time [hours] for Dom0 (Tastiger), Dom0 (Kultarr), migrated guest RADclock, and migrated guest ntpd]

SLIDE 24

Noise Overhead of Xen and Guests

[Figure: two panels of host RTT [µs]. First panel: Native, Dom0, and 1 to 4 guests. Second panel: DomU #1 to DomU #4 with 1 to 4 guests.]

SLIDE 25

Noise Penalty Under C-States

[Figure: host RTT [µs] under C-states C0 to C3 for Xen Clocksource, HPET, and Hypervisor]

SLIDE 26

Algo Performance Under C-States

[Figure: RADclock error, E − median(E) [µs], under C-states C0 to C3 for RADclock Xen and RADclock HPET]

SLIDE 27

Conclusion

Feed-Forward approach has many advantages
  • Difference clock defined
  • Absolute clock can be made much more robust
  • Time can be replayed
  • Simpler kernel support

Good match to needs of para-virtualisation
  • Enables clock dependent mode that works
  • Allows seamless live migration

RADclock project
  • Aims to replace ntpd
  • Client and Server code
  • Packages for FreeBSD and Linux (Xen now supported)
  • http://www.cubinlab.ee.unimelb.edu.au/radclock/