TimerShield: Protecting High-Priority Tasks from Low-Priority Timer Interference



SLIDE 1

TimerShield

Protecting High-Priority Tasks from Low-Priority Timer Interference

Pratyush Patel (1, 2), Manohar Vanga (1), Björn Brandenburg (1)

(1) MPI-SWS, Kaiserslautern, Germany; (2) Carnegie Mellon University

RTAS 2017, April 18, 2017, Pittsburgh, USA

SLIDES 2-9

This Paper

PREEMPT_RT

  • hrtimers: the default high-resolution timer subsystem. Problem: unnecessary low-priority timer-interrupt interference.
  • TimerShield: a drop-in replacement for hrtimers. Eliminates low-priority timer-interrupt interference.

SLIDES 10-11

Talk Overview

  • Timers and the Interference Problem
  • TimerShield Design
  • Evaluation

SLIDES 12-14

High-Resolution Timers

[Figure: four cores (Core 1 to Core 4), each with its own timer]

  • Core-local timers with cycle precision
  • Can be programmed to raise an interrupt at a desired time

SLIDES 15-18

Timers in Real-Time OSes

  • Job Releases: Tasks can be woken up periodically using timers
  • Budget Enforcement: Schedulers use timers to prevent budget overruns
  • Self-Suspensions: Tasks can use POSIX clock_nanosleep() to suspend themselves
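
As an illustration of timer-driven job releases, here is a minimal Python sketch of a periodic loop that sleeps until absolute release times so that wake-up jitter does not accumulate. It is not taken from the talk: the names release_times and run_periodic are hypothetical, and time.sleep() is only a stand-in for what a real-time task written in C would typically do with clock_nanosleep() and the TIMER_ABSTIME flag.

```python
import time

def release_times(start, period, count):
    """Absolute release times start + k * period for k = 0 .. count - 1."""
    return [start + k * period for k in range(count)]

def run_periodic(job, period, count):
    """Call job() once per period, sleeping until each absolute release time."""
    start = time.monotonic()
    for release in release_times(start, period, count):
        delay = release - time.monotonic()
        if delay > 0:
            # Stand-in for an absolute-time clock_nanosleep() call.
            time.sleep(delay)
        job()

# Hypothetical usage: three releases, one millisecond apart.
calls = []
run_periodic(lambda: calls.append(time.monotonic()), 0.001, 3)
```

Each sleep arms a timer on behalf of the calling task, which is exactly the kind of timer the rest of the talk is concerned with.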

SLIDE 19

Assumptions

  • Uniprocessor or partitioned multiprocessor
  • Fixed-priority scheduling

SLIDES 20-27

Timer-Interrupt Interference

[Timeline figure: time axis from 2 to 12; rows for the low-priority (LP) task, the high-priority (HP) task, and the timer handler]

  • The low-priority task calls clock_nanosleep(6).
  • The timer hardware is programmed to fire at the specified time.
  • The high-priority task is released.
  • At t = 6, the timer hardware fires an interrupt.
  • HP is preempted to service the interrupt (the LP task is woken up).
  • The HP task resumes.
  • Result: unnecessary interference, since under fixed-priority scheduling the woken LP task cannot run while the HP task is running anyway.

SLIDES 28-33

Why Does Interference Occur?

[Figure: red-black tree of timers with expiry times 13, 8, 17, 1, 11, 20, 15]

Linux’s hrtimer subsystem:

  • Multiplexes many software timers on a single hardware timer using a time-ordered red-black tree
  • The earliest expiring timer is programmed into hardware
  • But the earliest timer could belong to the lowest-priority task!
  • It may interrupt a higher-priority task!

Key Problem: hrtimers does not take into account the priority of the process that created the timer.
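
The problem can be reproduced with a toy model. The sketch below is an illustration only, not Linux code: a binary heap stands in for the time-ordered red-black tree, the expiry times are the values drawn on the slide, and the owner priorities are made up. It assumes that a larger number means a higher priority.

```python
import heapq

# (expiry_time, owner_priority). Expiry times come from the slide's figure;
# the priorities are invented for this example (larger = higher priority).
timers = [(13, 5), (8, 90), (17, 40), (1, 1), (11, 70), (20, 10), (15, 30)]

def earliest_timer(timers):
    """Pick the earliest expiry from a time-ordered queue, ignoring priority."""
    heap = list(timers)
    heapq.heapify(heap)  # tuples compare by expiry time first
    return heap[0]

def interrupts_running_task(timers, running_priority):
    """True if the timer programmed into hardware has a lower-priority owner."""
    _, owner_priority = earliest_timer(timers)
    return owner_priority < running_priority

next_expiry, next_owner_priority = earliest_timer(timers)
```

With these made-up priorities, the timer that expires at t = 1 belongs to the lowest-priority owner, so interrupts_running_task(timers, 80) is true: a task running at priority 80 would be interrupted on behalf of a priority-1 task.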

SLIDE 34

Talk Overview

  • Timers and the Interference Problem
  • TimerShield Design
  • Evaluation

SLIDES 35-41

How Does TimerShield Work?

[Timeline figure: time axis from 2 to 12; rows for the low-priority (LP) task, the high-priority (HP) task, and the timer handler]

  • Mask all the low-priority timers while the high-priority task runs.
  • Process the expired low-priority timers later; timer processing is shifted.
  • Timer processing (interrupt top-half) is safely deferred.

SLIDES 42-47

How is TimerShield Implemented?

[Timeline figure: time axis from 2 to 12; rows for the low-priority (LP) task, the high-priority (HP) task, and the timer handler]

  • Timer inherits task priority
  • 1. Find and reprogram the earliest timer with priority ≥ HP
  • 2. Process expired timers of the highest priority (lower priority ones can still be deferred)
  • These operations need to be inexpensive to work well in practice
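
The two steps can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the kernel implementation: PriorityTimerQueues, on_context_switch, hw_expiry and the 8 priority levels are hypothetical, a larger number means a higher priority, and step 2 is read here as "fire every expired timer whose priority is at least that of the task being switched to". A plain scan over the priority levels is used for clarity.

```python
import heapq

NUM_PRIOS = 8  # hypothetical; chosen small for the example

class PriorityTimerQueues:
    def __init__(self):
        # One min-heap of expiry times per priority level.
        self.queues = [[] for _ in range(NUM_PRIOS)]
        self.hw_expiry = None  # what the hardware timer would be set to

    def add_timer(self, prio, expiry):
        # The timer inherits the priority of the task that armed it.
        heapq.heappush(self.queues[prio], expiry)

    def on_context_switch(self, now, new_task_prio):
        fired = []
        # Step 2: process expired timers that are not shielded by the new task.
        for prio in range(new_task_prio, NUM_PRIOS):
            queue = self.queues[prio]
            while queue and queue[0] <= now:
                fired.append((prio, heapq.heappop(queue)))
        # Step 1: reprogram to the earliest pending timer with priority >= new task.
        pending = [q[0] for q in self.queues[new_task_prio:] if q]
        self.hw_expiry = min(pending) if pending else None
        return fired

tq = PriorityTimerQueues()
tq.add_timer(prio=1, expiry=6)    # LP task sleeping until t = 6
tq.add_timer(prio=5, expiry=10)   # a timer owned by an HP task
fired_at_4 = tq.on_context_switch(now=4, new_task_prio=5)  # HP starts running
hw_while_hp_runs = tq.hw_expiry
fired_at_9 = tq.on_context_switch(now=9, new_task_prio=1)  # back to LP
```

While the HP task runs, the hardware is programmed for t = 10 rather than t = 6, so the LP timer cannot interrupt it; the LP timer is only processed at the switch back at t = 9.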

SLIDES 48-50

Priority-Based Earliest Timer

  • 1: Find the earliest timer at each priority level
  • 2: Among these, find the earliest timer in the priority range [curr_task_prio, max_system_prio]

This is a Range Minimum Query (RMQ)!

SLIDES 51-52

1: Replicating Red-Black Trees

[Figure: array indexed by priority level (1, 2, 3, ..., 140) with a red-black tree per level; one entry is shown as NULL]

Earliest timer for each priority level

SLIDES 53-56

2: Range Minimum Query (Segment Tree)

[Figure: segment tree over leaves [0] to [3], with internal nodes min [0, 1] and min [2, 3] and root min [0, 3]; node values shown: 10, 20, 10, 30, 20, 40, 10]

  • Leaf nodes are the earliest timers for each priority level
  • Provides an efficient, O(log N) mechanism to find the earliest timer in the priority range [curr_task_prio, max_system_prio]
  • N = number of (fixed) priority levels, so this is a constant-time operation!
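
A minimal segment tree for this range minimum query might look as follows. It is a generic textbook structure written for illustration, not the TimerShield source; the class name MinSegmentTree is hypothetical. The example reads the seven numbers in the figure top-down (root 10, inner nodes 20 and 10, leaves 30, 20, 40, 10), which is an assumption about how the slide was drawn.

```python
import math

class MinSegmentTree:
    """Iterative segment tree over a fixed number of priority levels."""

    def __init__(self, n):
        self.n = n
        # math.inf marks a level with no pending timer.
        self.tree = [math.inf] * (2 * n)

    def update(self, level, earliest_expiry):
        """Set the earliest expiry at one level and fix up its ancestors."""
        i = level + self.n
        self.tree[i] = earliest_expiry
        i //= 2
        while i >= 1:
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def range_min(self, lo, hi):
        """Earliest expiry over levels lo..hi (inclusive)."""
        result = math.inf
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                result = min(result, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                result = min(result, self.tree[hi])
            lo //= 2
            hi //= 2
        return result

tree = MinSegmentTree(4)
for level, expiry in enumerate([30, 20, 40, 10]):
    tree.update(level, expiry)
```

Here tree.range_min(0, 1) is 20, tree.range_min(2, 3) is 10 and tree.range_min(0, 3) is 10, matching the inner nodes and root in the figure. Both operations touch at most one path from a leaf to the root, which is where the O(log N) bound comes from.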

SLIDE 57

TimerShield Implementation

Further details in the paper! Open-source implementation at

https://people.mpi-sws.org/~bbb/papers/details/rtas17p/

SLIDE 58

Talk Overview

  • Timers and the Interference Problem
  • TimerShield Design
  • Evaluation

SLIDES 59-60

Evaluation

Prototyped in PREEMPT_RT

  • Intel Core i5, 4 x 3.2 GHz
  • ARM Cortex-A53, 4 x 1.2 GHz

Details in paper

SLIDES 61-62

Evaluation

  • How effective is TimerShield at isolating high-priority tasks from low-priority timer interrupts?
  • How is the context-switch duration affected?
  • How costly are the new queueing data structures?

SLIDES 63-66

HP Task Response Time

We measured the response time of a high-priority task with a varying number of low-priority, timer-using tasks.

  • High-priority task: 1 kHz control loop with approx. 200 μs computation time
  • Low-priority tasks: from 1 to 100 LP cyclictest tasks, which periodically call clock_nanosleep()

Taskset parameters based on S. Kramer, D. Ziegenbein, and A. Hamann, “Real world automotive benchmark for free,” in WATERS, 2015.

SLIDES 67-68

HP Task Response Time

[Plot: cumulative distribution of the measured HP task response times]

How to read the plot: the annotated point means that 60% of the measured samples have a response time ≤ 214.8 μs.
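
The reading rule is that of an empirical cumulative distribution: for a threshold, count the fraction of samples at or below it. A small sketch follows; the function name fraction_at_or_below is hypothetical and the sample values are invented for the example, not measurements from the talk.

```python
def fraction_at_or_below(samples, threshold):
    """Fraction of samples whose value is <= threshold (empirical CDF)."""
    return sum(1 for s in samples if s <= threshold) / len(samples)

# Invented response times in microseconds, for illustration only.
samples_us = [210.0, 212.5, 214.8, 216.0, 230.1]
share = fraction_at_or_below(samples_us, 214.8)
```

With these five invented samples, three lie at or below 214.8, so the curve would pass through 60% at that point. A long tail shows up as a curve that approaches 100% only slowly.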

SLIDES 69-71

Response Time - hrtimers

[Plot: HP task response-time curves for 1 LP cyclictest, 50 LP cyclictests, and 100 LP cyclictests]

Long tail, high unpredictability

SLIDES 72-74

Response Time - TimerShield

  • Response time with 1, 50 or 100 LP timers remains consistent!
  • Slight shift due to cache effects of the increasing number of low-priority tasks

SLIDES 75-77

How Bad Can It Get?

Linux (and POSIX) provide no protection and specify no upper limit on timer creation.

We measured the response time of a high-priority task in the presence of a single, unprivileged, user-space task that spawned timers using Linux’s timerfd API.

SLIDES 78-79

Response Time - hrtimers

[Plot: HP task response-time curves for an idle system, 100 LP timers, and 1000 LP timers]

Nearly 45 μs (22%) response-time increase with 1000 low-priority timers
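
As a quick consistency check on these two figures, the sketch below computes the baseline response time they imply. It assumes that the 22% is measured relative to the response time without the extra timers, which the slide does not state; the function name implied_baseline is hypothetical, and both inputs are the rounded values from the slide.

```python
def implied_baseline(increase_us, relative_increase):
    """Baseline implied by an absolute increase and its relative size."""
    return increase_us / relative_increase

# Rounded slide values: "nearly 45 us" and "22%".
baseline_us = implied_baseline(45.0, 0.22)
```

Under that assumption the baseline comes out at roughly 205 μs, which is in line with the approx. 200 μs computation time quoted for the high-priority control loop.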

SLIDES 80-81

Response Time - TimerShield

[Plot: HP task response-time curves for an idle system, 100 LP timers, and 1000 LP timers]

TimerShield protects high-priority task response times from low-priority timer interrupts!

SLIDE 82

Evaluation

  • How effective is TimerShield at isolating high-priority tasks from low-priority timer interrupts?
  • How is the context-switch duration affected?
  • How costly are the new queueing data structures?

SLIDES 83-85

Additional Context-Switch Delay

During context switches, TimerShield processes expired timers, performs an RMQ, and optionally reprograms hardware.

We measured the total additional time incurred by TimerShield during context switches in a timer-heavy scenario: 1 high-priority task and 50 low-priority, timer-using tasks that share one priority level.

Note: Results for a scenario without a timer-heavy load can be found in the paper.

SLIDES 86-88

Additional Context-Switch Delay

[Bar chart: mean, median, 99.9th percentile, and max delay in microseconds (axis 1 to 9)]

  • Mean and median delay (typical case) is much less than a microsecond
  • The larger tail values (99.9th percentile and max) reflect batch processing of multiple timers that were deferred

SLIDES 89-91

Timer Processing Delay: Batch Processing is Better!

[Bar chart: timer processing delay in microseconds (axis 2 to 16), hrtimers vs. TimerShield]

  • We measured the worst-case increase in HP task response time under hrtimers with the same experimental setup
  • hrtimers takes longer due to the repetitive switches to interrupt context!
  • Context-switch delay due to TimerShield is small, and its batch processing of timers is faster than with hrtimers

SLIDE 92

Evaluation

  • How effective is TimerShield at isolating high-priority tasks from low-priority timer interrupts?
  • How is the context-switch duration affected?
  • How costly are the new queueing data structures?

SLIDES 93-94

Data-Structure Overheads

The worst case for TimerShield’s data structures is with a single timer, because each operation modifies the segment tree.

We measured the timer enqueue and dequeue cost on both subsystems for this setup.

Note: Results for a scenario with a timer-heavy load can be found in the paper.

SLIDES 95-96

Timer Enqueue Cost

[Bar chart: mean, median, 99.9th percentile, and max in microseconds (axis 0.05 to 0.4), hrtimers vs. TimerShield; lower is better]

  • TimerShield performs negligibly worse on average
  • It compares favourably towards the max, but the difference is minuscule

SLIDE 97

Timer Dequeue Cost

[Bar chart: mean, median, 99.9th percentile, and max in microseconds (axis 0.05 to 0.4), hrtimers vs. TimerShield; lower is better]

Both subsystems have very similar dequeue costs

SLIDES 98-100

Evaluation Summary

  • Impossible for high-priority tasks to be interrupted by low-priority timers under TimerShield
  • Additional context-switch delay is small, and batch timer processing is faster with TimerShield
  • TimerShield’s data structure costs are comparable to hrtimers

Note: Further experiments, including results for ARM, can be found in the paper.

SLIDES 101-104

Dynamic Timer Priorities

  • Implementation currently assumes unchanging timer priorities
  • Real-time locking protocols, or users, may change task priorities
  • Works implicitly for the immediate priority ceiling protocol
  • Implicitly works if priority is changed with no pending timers
  • Can be easily extended to deal with dynamic priorities

SLIDE 105

Future Work

  • Support for Earliest Deadline First (EDF) schedulers
  • Applying similar techniques to other multiplexed interrupt sources, such as network packet interrupts

SLIDES 106-109

Summary

  • Low-priority timer interrupts have a significant negative impact on high-priority task execution
  • Existing high-resolution timer subsystems, such as Linux hrtimers, are not priority aware
  • TimerShield completely avoids low-priority timer interrupt interference with small overheads

Scope: FP scheduling on uniprocessor/partitioned multiprocessors

SLIDE 110

Thank you!

Source Code: https://people.mpi-sws.org/~bbb/papers/details/rtas17p/

SLIDE 111

Appendix

SLIDE 112

Why Not Global Scheduling?

[Timeline figure: CPU 1 and CPU 2, each with a time axis from 2 to 12, showing the HP and LP tasks]

Not deferring the wakeup of a low-priority task might allow it to execute on a different, possibly idle CPU.

SLIDES 113-114

Segment Tree

[Figure: segment tree over leaves [0] to [3] (priority levels), with internal nodes min [0, 1] and min [2, 3] and root min [0, 3] (priority ranges); node values shown: 10, 20, 10, 30, 20, 40, 10]

  • Leaf nodes correspond to the earliest timer obtained from each red-black tree
  • Parent nodes store the minimum of their child nodes, and depict the earliest timer for the resulting priority range

SLIDE 115

Code Size and Memory

How big is the TimerShield code, and what are its memory requirements?

  • Increase in text segment: 2 KiB
  • Increase in data segment: 35 KiB per core

SLIDE 116

Timer Enqueue Cost (timer-heavy)

[Bar chart: mean, median, 99.9th percentile, and max in microseconds (axis 0.2 to 1.2), hrtimers vs. TimerShield; lower is better]

SLIDE 117

Timer Dequeue Cost (timer-heavy)

[Bar chart: mean, median, 99.9th percentile, and max in microseconds (axis 0.1 to 0.6), hrtimers vs. TimerShield; lower is better]

SLIDE 118

HP Task Throughput Reduction

[Bar chart: throughput reduction in requests/ms (axis 20 to 140), hrtimers vs. TimerShield; lower is better]

With 1000 background LP timers

Idle throughput: 7044.4 requests/ms