Timekeeping in the Linux Kernel Stephen Boyd Qualcomm Innovation - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Timekeeping in the Linux Kernel

Stephen Boyd Qualcomm Innovation Center, Inc. 1 / 40

slide-2
SLIDE 2

In the beginning ...

2 / 40

slide-3
SLIDE 3

there was a counter

0000ec544fef3c8a

3 / 40

slide-4
SLIDE 4

Calculating the Passage of Time (in ns)

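$$\frac{c_{cycles}}{f_{Hz}} = \frac{c_{cycles}}{f\left(\frac{1}{\text{seconds}}\right)} = \left(\frac{c_{cycles}}{f}\right)\text{seconds}$$

$$\left(\frac{c_{cycles}}{f}\right)\text{seconds} = \frac{c_{cycles}}{f} \cdot 10^9 = T_{ns}$$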

4 / 40

slide-5
SLIDE 5

Calculating the Passage of Time (in ns)

Problems

- Division is slow
- Floating point math
- Precision/overflow/underflow problems

$$T_{ns} = \frac{c_{cycles}}{f_{Hz}} \cdot 10^9$$

5 / 40

slide-6
SLIDE 6

Calculating the Passage of Time (in ns) Better

    static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
    {
            return ((u64) cycles * mult) >> shift;
    }

6 / 40

slide-7
SLIDE 7

Calculating the Passage of Time (in ns) Better

    static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
    {
            return ((u64) cycles * mult) >> shift;
    }

Where do mult and shift come from?

    clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec)
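As a rough illustration of where such a pair can come from, the sketch below derives mult for a caller-chosen shift and then applies the multiply-and-shift conversion. It is a userspace approximation under stated assumptions, not the kernel's clocks_calc_mult_shift(), which also searches for a shift that avoids overflow over the requested interval. The names calc_mult and cyc2ns are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick mult so that (cycles * mult) >> shift
 * approximates nanoseconds for a counter running at freq_hz.
 * The shift is supplied by the caller (an assumption of this sketch).
 */
static uint32_t calc_mult(uint32_t freq_hz, uint32_t shift)
{
    uint64_t tmp = (uint64_t)1000000000 << shift; /* NSEC_PER_SEC << shift */

    tmp += freq_hz / 2;                           /* round to nearest */
    return (uint32_t)(tmp / freq_hz);
}

/*
 * Multiply-and-shift conversion, same shape as clocksource_cyc2ns().
 * The product can overflow 64 bits if cycles is very large.
 */
static int64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (int64_t)((cycles * mult) >> shift);
}
```

For a 19.2 MHz counter and a shift of 24 (example values), calc_mult(19200000, 24) gives 873813333, and cyc2ns(100000, 873813333, 24) gives 5208333 ns, with no division on the conversion path.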

7 / 40

slide-8
SLIDE 8

Abstract the Hardware!

[Diagram: struct clocksource reads the Hardware Counter]

    struct clocksource {
            cycle_t (*read)(struct clocksource *cs);
            cycle_t mask;
            u32 mult;
            u32 shift;
            ...
    };

    clocksource_register_hz(struct clocksource *cs, u32 hz);
    clocksource_register_khz(struct clocksource *cs, u32 khz);

Time diff:

    struct clocksource *cs = &system_clocksource;
    cycle_t start = cs->read(cs);
    ... /* do something for a while */
    cycle_t end = cs->read(cs);
    clocksource_cyc2ns(end - start, cs->mult, cs->shift);

8 / 40

slide-9
SLIDE 9

POSIX Clocks

- CLOCK_BOOTTIME
- CLOCK_MONOTONIC
- CLOCK_MONOTONIC_RAW
- CLOCK_MONOTONIC_COARSE
- CLOCK_REALTIME
- CLOCK_REALTIME_COARSE
- CLOCK_TAI

9 / 40
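From userspace these clocks are read with clock_gettime(). A minimal sketch, assuming a POSIX system; clock_ns is a hypothetical helper name, and only the two clock IDs defined by POSIX itself are mentioned in the note that follows.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a POSIX clock as nanoseconds, or -1 on error. */
static int64_t clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Two successive calls to clock_ns(CLOCK_MONOTONIC) never go backwards, whereas clock_ns(CLOCK_REALTIME) can jump if the wall clock is set in between.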

slide-10
SLIDE 10

POSIX Clocks Comparison

[Diagram comparing CLOCK_BOOTTIME, CLOCK_MONOTONIC, and CLOCK_REALTIME]

10 / 40

slide-11
SLIDE 11

Read Accumulate Track (RAT)

Best acronym ever

11 / 40

slide-12
SLIDE 12

RAT in Action (Read)

    struct tk_read_base {
            struct clocksource *clock;
            cycle_t (*read)(struct clocksource *cs);
            cycle_t mask;
            cycle_t cycle_last;
            u32 mult;
            u32 shift;
            u64 xtime_nsec;
            ktime_t base;
    };

    static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, cycle_t delta)
    {
            u64 nsec = delta * tkr->mult + tkr->xtime_nsec;

            return nsec >> tkr->shift;
    }

    static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
    {
            cycle_t delta = (tkr->read(tkr->clock) - tkr->cycle_last) & tkr->mask;

            return timekeeping_delta_to_ns(tkr, delta);
    }
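The masking in timekeeping_get_ns() is what keeps the delta correct when a counter narrower than 64 bits wraps between two reads. A small userspace sketch of just that step, with cycles_delta as a hypothetical name and a 32-bit counter assumed for the example:

```c
#include <stdint.h>

/*
 * Cycles elapsed between two reads of a free-running counter.
 * mask has one bit set for every valid counter bit, e.g. 0xffffffff
 * for a 32-bit counter. Correct as long as the counter wrapped at
 * most once between the two reads.
 */
static uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}
```

With a 32-bit counter that was last read at 0xfffffff0 and now reads 0x10, the unsigned subtraction underflows, but masking recovers the true distance of 0x20 cycles.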

12 / 40

slide-13
SLIDE 13

RAT in Action (Accumulate + Track)

    static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
                                        u32 shift, unsigned int *clock_set)
    {
            u64 interval = tk->cycle_interval << shift;

            tk->tkr_mono.cycle_last += interval;
            tk->tkr_mono.xtime_nsec += tk->xtime_interval << shift;
            *clock_set |= accumulate_nsecs_to_secs(tk);
            ...
    }

    static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
    {
            u64 nsecps = (u64)NSEC_PER_SEC << tk->tkr_mono.shift;
            unsigned int clock_set = 0;

            while (tk->tkr_mono.xtime_nsec >= nsecps) {
                    int leap;

                    tk->tkr_mono.xtime_nsec -= nsecps;
                    tk->xtime_sec++;
                    ...
    }
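The key detail in the accumulate step is that xtime_nsec holds nanoseconds already shifted left by shift, so one second is NSEC_PER_SEC << shift in those units. The sketch below isolates that carry loop in userspace. struct toy_tk and toy_accumulate are hypothetical names, and leap-second handling and everything else the kernel does at this point is left out.

```c
#include <stdint.h>

/* Toy stand-in for the relevant timekeeper fields (not the kernel struct). */
struct toy_tk {
    uint64_t xtime_sec;   /* whole seconds */
    uint64_t xtime_nsec;  /* sub-second part, in units of ns << shift */
    uint32_t shift;
};

/*
 * Add an interval (already in shifted nanoseconds) and carry whole
 * seconds into xtime_sec. Returns how many seconds were carried.
 */
static unsigned int toy_accumulate(struct toy_tk *tk, uint64_t shifted_ns)
{
    uint64_t nsecps = (uint64_t)1000000000 << tk->shift;
    unsigned int carried = 0;

    tk->xtime_nsec += shifted_ns;
    while (tk->xtime_nsec >= nsecps) {
        tk->xtime_nsec -= nsecps;
        tk->xtime_sec++;
        carried++;
    }
    return carried;
}
```

Accumulating 2.5 seconds into an empty toy_tk with a shift of 8 carries two seconds and leaves 500000000 ns (after shifting back down) in xtime_nsec.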

13 / 40

slide-14
SLIDE 14

Juggling Clocks

    struct timekeeper {
            struct tk_read_base tkr_mono;
            struct tk_read_base tkr_raw;
            u64 xtime_sec;
            unsigned long ktime_sec;
            struct timespec64 wall_to_monotonic;
            ktime_t offs_real;
            ktime_t offs_boot;
            ktime_t offs_tai;
            s32 tai_offset;
            struct timespec64 raw_time;
    };

14 / 40

slide-15
SLIDE 15

Handling Clock Drift

$$\frac{1}{19200000} \cdot 10^9 = 52.08\overline{3}\,\text{ns}$$

Vs.

$$\frac{1}{19200008} \cdot 10^9 = 52.083311\,\text{ns}$$

15 / 40

slide-16
SLIDE 16

Handling Clock Drift

$$\frac{100000}{19200000} \cdot 10^9 = 5208333\,\text{ns}$$

Vs.

$$\frac{100000}{19200008} \cdot 10^9 = 5208331\,\text{ns}$$

After 100k cycles we've lost 2 ns

16 / 40

slide-17
SLIDE 17

Mult to the Rescue!

Approach: Adjust mult to match actual clock frequency

$$(100000 \cdot 873813333) \gg 24 = 5208333\,\text{ns}$$

Vs.

$$(100000 \cdot 873813109) \gg 24 = 5208331\,\text{ns}$$

17 / 40

slide-18
SLIDE 18

Making Things Fast and Efficient

    static struct {
            seqcount_t seq;
            struct timekeeper timekeeper;
    } tk_core ____cacheline_aligned;

    static struct timekeeper shadow_timekeeper;

    struct tk_fast {
            seqcount_t seq;
            struct tk_read_base base[2];
    };

    static struct tk_fast tk_fast_mono ____cacheline_aligned;
    static struct tk_fast tk_fast_raw ____cacheline_aligned;
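The seq members above are sequence counters: a reader samples the counter, copies the data, and retries if the counter was odd or changed in the meantime, so readers never block the writer. The following is a userspace sketch of that retry pattern only. It assumes a single writer, uses C11 sequentially consistent atomics for every field (heavier than the barriers the kernel uses), and all toy_ names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy stand-in for tk_core: a sequence counter guarding two fields. */
struct toy_tk_core {
    atomic_uint seq;        /* odd while an update is in progress */
    _Atomic uint64_t sec;
    _Atomic uint64_t nsec;
};

/* Writer (single writer assumed): make seq odd, update, make it even. */
static void toy_update(struct toy_tk_core *c, uint64_t sec, uint64_t nsec)
{
    atomic_fetch_add(&c->seq, 1);
    atomic_store(&c->sec, sec);
    atomic_store(&c->nsec, nsec);
    atomic_fetch_add(&c->seq, 1);
}

/* Reader: retry until both fields were read under one even seq value. */
static void toy_read(struct toy_tk_core *c, uint64_t *sec, uint64_t *nsec)
{
    unsigned int start;

    do {
        start = atomic_load(&c->seq);
        *sec = atomic_load(&c->sec);
        *nsec = atomic_load(&c->nsec);
    } while ((start & 1) || atomic_load(&c->seq) != start);
}
```

The reader takes no lock and writes nothing shared, which is the property that makes this pattern attractive for a read-mostly structure such as the timekeeper.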

18 / 40

slide-19
SLIDE 19

A Note About NMIs and Time

19 / 40

slide-20
SLIDE 20

Where We Are

20 / 40

slide-21
SLIDE 21

What if my system doesn't have a counter?

- Insert #sadface here
- Can't use NOHZ
- Can't use hrtimers in "high resolution" mode
- Relegated to the jiffies clocksource:

    static cycle_t jiffies_read(struct clocksource *cs)
    {
            return (cycle_t) jiffies;
    }

    static struct clocksource clocksource_jiffies = {
            .name = "jiffies",
            .rating = 1, /* lowest valid rating */
            .read = jiffies_read,
            ...
    };

21 / 40

slide-22
SLIDE 22

Let's talk about jiffies

22 / 40

slide-23
SLIDE 23

Let's talk about jiffies

Jiffy = 1 / CONFIG_HZ

23 / 40

slide-24
SLIDE 24

Let's talk about jiffies

- Jiffy = 1 / CONFIG_HZ
- Updated during the "tick"
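To make the granularity concrete, here is a small sketch of converting between jiffies and nanoseconds. TOY_HZ stands in for CONFIG_HZ and is assumed to be 250 purely for the example; the function names are hypothetical and are not the kernel's conversion helpers.

```c
#include <stdint.h>

#define TOY_HZ 250ULL                           /* assumed CONFIG_HZ value */
#define TOY_JIFFY_NS (1000000000ULL / TOY_HZ)   /* length of one jiffy in ns */

/* Jiffies to nanoseconds. */
static uint64_t toy_jiffies_to_ns(uint64_t j)
{
    return j * TOY_JIFFY_NS;
}

/* Nanoseconds to jiffies, rounded up so a timeout never expires early. */
static uint64_t toy_ns_to_jiffies(uint64_t ns)
{
    return (ns + TOY_JIFFY_NS - 1) / TOY_JIFFY_NS;
}
```

At 250 Hz one jiffy is 4000000 ns, so a 1 ns timeout and a 4000000 ns timeout both round to one jiffy, while 4000001 ns needs two.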

24 / 40

slide-25
SLIDE 25

The tick?

25 / 40

slide-26
SLIDE 26

The tick

Periodic event that updates

- jiffies
- process accounting
- global load accounting
- timekeeping
- POSIX timers
- RCU callbacks
- hrtimers
- irq_work

26 / 40

slide-27
SLIDE 27

Implementing the tick in hardware

Timer Value: 4efa4655
Match Value: 4efa4666
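One way to read the two values: the hardware raises an interrupt when the free-running timer value reaches the match value. The sketch below shows the arithmetic for programming such a comparator. It assumes a 32-bit up-counter that wraps modulo 2^32 (the slide does not state the width), and both function names are hypothetical.

```c
#include <stdint.h>

/* Match value that fires delta cycles from now (wraps modulo 2^32). */
static uint32_t toy_match_value(uint32_t now, uint32_t delta)
{
    return now + delta;
}

/* Cycles remaining until the match value is reached, wrap-safe. */
static uint32_t toy_cycles_until(uint32_t now, uint32_t match)
{
    return match - now;
}
```

With the values shown, toy_cycles_until(0x4efa4655, 0x4efa4666) is 17 cycles. Because the arithmetic is unsigned, a match value programmed just before the counter wraps still works: toy_match_value(0xfffffff0, 0x20) is 0x10.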

27 / 40

slide-28
SLIDE 28

Abstract the Hardware!

    struct clock_event_device {
            void (*event_handler)(struct clock_event_device *);
            int (*set_next_event)(unsigned long evt, struct clock_event_device *);
            int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
            ktime_t next_event;
            u64 max_delta_ns;
            u64 min_delta_ns;
            u32 mult;
            u32 shift;
            unsigned int features;
    #define CLOCK_EVT_FEAT_PERIODIC 0x000001
    #define CLOCK_EVT_FEAT_ONESHOT  0x000002
    #define CLOCK_EVT_FEAT_KTIME    0x000004
            int irq;
            ...
    };

    void clockevents_config_and_register(struct clock_event_device *dev,
                                         u32 freq, unsigned long min_delta,
                                         unsigned long max_delta)

28 / 40

slide-29
SLIDE 29

Three event_handlers

    struct clock_event_device {
            void (*event_handler)(struct clock_event_device *);
            int (*set_next_event)(unsigned long evt, struct clock_event_device *);
            int (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
            ktime_t next_event;
            u64 max_delta_ns;
            ...
    }

    Handler                   Usage
    tick_handle_periodic()    default
    tick_nohz_handler()       lowres mode
    hrtimer_interrupt()       highres mode

29 / 40

slide-30
SLIDE 30

Ticks During Idle

[Timeline diagram along the time axis.
tick_handle_periodic(): t1, tick, t2, tick, idle, tick, idle, tick, t3]

30 / 40

slide-31
SLIDE 31

Tick-less Idle (i.e. CONFIG_NO_HZ_IDLE)

[Timeline diagram comparing two handlers along the time axis.
tick_handle_periodic(): t1, tick, t2, tick, idle, tick, idle, tick, t3
tick_nohz_handler(): t1, tick, t2, tick, idle, tick, t3]

31 / 40

slide-32
SLIDE 32

High Resolution Mode

[Timeline diagram comparing two handlers along the time axis.
tick_nohz_handler(): t1, tick, t2, tick, idle, tick, t3
hrtimer_interrupt(): t1, tick, t2 (interrupted by hrt), tick, idle, hrt, hrt, tick, t3]

32 / 40

slide-33
SLIDE 33

Tick Devices

    enum tick_device_mode {
            TICKDEV_MODE_PERIODIC,
            TICKDEV_MODE_ONESHOT,
    };

    struct tick_device {
            struct clock_event_device *evtdev;
            enum tick_device_mode mode;
    };

    DEFINE_PER_CPU(struct tick_device, tick_cpu_device);

[Diagram: tick_device, clockevent, Hardware event]

33 / 40

slide-34
SLIDE 34

Running the Tick

    struct tick_sched {
            struct hrtimer sched_timer;
            ...
    };

34 / 40

slide-35
SLIDE 35

Running the Tick (Per-CPU)

    struct tick_sched {
            struct hrtimer sched_timer;
            ...
    };

    DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);

35 / 40

slide-36
SLIDE 36

Stopping the Tick

Not always as simple as

    hrtimer_cancel(&ts->sched_timer)

We may instead need to restart the timer far in the future

    hrtimer_start(&ts->sched_timer, tick, HRTIMER_MODE_ABS_PINNED)

Needs to consider

- timers
- hrtimers
- RCU callbacks
- jiffies update responsibility
- clocksource's max_idle_ns (timekeeping max deferment)

Details in tick_nohz_stop_sched_tick()

36 / 40
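The core of that decision can be pictured as taking the earliest pending event and then capping the sleep so the clocksource is read again before it can wrap. The sketch below is a heavy simplification under stated assumptions: it ignores RCU and the jiffies update duty, it assumes now + max_idle_ns does not overflow, and toy_next_wakeup and TOY_NEVER are hypothetical names rather than anything from tick_nohz_stop_sched_tick().

```c
#include <stdint.h>

#define TOY_NEVER UINT64_MAX   /* "no event pending" marker (hypothetical) */

/*
 * Absolute time (ns) at which an idle CPU must next wake up:
 * the earlier of the next timer and the next hrtimer, but never later
 * than now + max_idle_ns, the longest the clocksource may go unread.
 */
static uint64_t toy_next_wakeup(uint64_t now, uint64_t next_timer,
                                uint64_t next_hrtimer, uint64_t max_idle_ns)
{
    uint64_t next = next_timer < next_hrtimer ? next_timer : next_hrtimer;
    uint64_t limit = now + max_idle_ns;

    return next < limit ? next : limit;
}
```

Even with nothing pending, toy_next_wakeup(1000, TOY_NEVER, TOY_NEVER, 500) returns 1500 rather than "never", which is why the tick timer sometimes has to be restarted far in the future instead of simply cancelled.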

slide-37
SLIDE 37

Tick Broadcast

For when your clockevents FAIL AT LIFE

- i.e., they don't work during some CPU idle low power modes
- Indicated by CLOCK_EVT_FEAT_C3STOP flag

37 / 40

slide-38
SLIDE 38

Timers

Operates with jiffies granularity

Requirements

- jiffies increment
- clockevent
- softirq

38 / 40

slide-39
SLIDE 39

HRTimers

Operates with ktime (nanoseconds) granularity

Requirements

- timekeeping increment
- clockevent
- tick_device

39 / 40

slide-40
SLIDE 40

Summary

What we covered

- clocksources
- timekeeping
- clockevents
- jiffies
- NOHZ
- tick broadcast
- timers
- hrtimers

What's difficult

- Timekeeping has to handle NTP and drift
- Tick uses multiple abstraction layers
- NOHZ gets complicated when starting/stopping the tick
- Tick broadcast turns up NOHZ to 11

40 / 40