Writing Temporally Predictable Code (Peter Puschner, Benedikt Huber)



slide-1
SLIDE 1

Writing Temporally Predictable Code

Peter Puschner Benedikt Huber

slides credits: P. Puschner, R. Kirner, B. Huber

VU 2.0 182.101 SS 2015

slide-2
SLIDE 2

Task Execution Time

  • 1. Sequence of actions (execution path)
  • 2. Duration of each occurrence of an action on the path

The actual path and timing of an execution depend on the task inputs.

[Figure: control-flow graph of a task with actions a1 to a9]

slide-3
SLIDE 3

WCET Analysis

  • Many different execution times
  • Non-trivial analysis of (in)feasible paths
  • Complex modeling of task timing on hardware

[Figure: frequency of execution times over t, marking BCET, WCET, and the WCET bound]

slide-4
SLIDE 4

Task Timing Goals

Prioritized goals:

  • 1. Temporal predictability / stability first
  • 2. Performance second

➭ Strategy: Get the overall timing constant:

  • Instruction padding
  • Delay termination until end of WCET-bound time budget
  • Single-path code transformation
slide-5
SLIDE 5

Instruction Padding

Idea: add NOPs to make execution times of alternatives with input-dependent conditions equal

[Figure: an "if cond" branch with alternatives alt1 and alt2, shown without padding and with NOPs added to one alternative]

slide-6
SLIDE 6

Instruction Padding

Padding of input-dependent loops

[Figure: loop padding with NOPs; iteration counts cnt1 and cnt2 with cnt1 + cnt2 = MAX]

slide-7
SLIDE 7

Instruction Padding and Cache

The duration of an instruction fetch (cache hit vs. miss) is variable because it depends on the execution history
→ we cannot remove execution-time variations from branching code

Example: loop with instructions A and B, executing two iterations

[Figure: timelines over t for the two-iteration sequences of A and B, with each instruction fetch marked as a cache miss or a cache hit]

slide-8
SLIDE 8

Instruction Padding

Applicable to simple architectures: execution times of instructions do not depend on execution state

  • WCET bound of transformed code ≈ original WCET bound
  • Instruction padding increases code size

Why not use a delay to make sure every execution consumes the same time budget, equalling the computed WCET bound?

slide-9
SLIDE 9

Constant Exec. Time Using a Delay

Strategy:

  • 1. Def: task time budget = computed WCET bound
  • 2. Insert delay(until end of time budget) at the end of the task (or at some points in between)

Problem: bad resource utilization due to

  • Pessimism in path analysis (all architectures)
  • Pessimism in hardware modelling (complex arch.)

⇒ Full flavour of WCET analysis problems ...
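The delay strategy above can be sketched in C as follows. This is a minimal sketch, not the authors' implementation: it assumes a POSIX system that provides `clock_gettime` and `clock_nanosleep`, and the names `run_with_budget`, `timespec_add_ns`, and `example_task` are hypothetical. The budget passed in stands for the computed WCET bound.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Add a non-negative nanosecond offset to a timespec, keeping tv_nsec below 1e9. */
static struct timespec timespec_add_ns(struct timespec t, long long ns)
{
    long long total = (long long)t.tv_nsec + ns;
    t.tv_sec += (time_t)(total / 1000000000LL);
    t.tv_nsec = (long)(total % 1000000000LL);
    return t;
}

/* Placeholder task body (assumption: the real task finishes within its budget). */
static void example_task(void)
{
    volatile int x = 0;
    for (int i = 0; i < 1000; i++)
        x += i;
}

/* Run task(), then delay until start + budget_ns. Returns 0 on success. */
static int run_with_budget(void (*task)(void), long long budget_ns)
{
    struct timespec start, deadline;
    int rc;

    assert(budget_ns >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    deadline = timespec_add_ns(start, budget_ns);

    task();

    /* Sleep until the absolute deadline; retry if a signal interrupts the sleep. */
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);
    return rc == 0 ? 0 : -1;
}
```

A call such as `run_with_budget(example_task, 20000000LL)` returns no earlier than 20 ms after it started. On a general-purpose operating system the wake-up itself is subject to scheduling jitter, so this only illustrates the idea of the time budget.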

slide-10
SLIDE 10

Time-Predictable Single-Path Code

Don't let the environment dictate

  • Sequence of actions
  • Durations of actions
slide-11
SLIDE 11

Take control decisions offline!!!

slide-12
SLIDE 12

Goal-oriented approach towards temporal predictability: control the sequencing of all actions instead of being controlled by the environment (data, interrupts)

Single-path code:

  • no input-data dependent branches
  • predicated execution (poss. with speculation)
  • control-flow orientation → data-flow focus
slide-13
SLIDE 13

Remove Data Dependent Control Flow

  • Hardware with invariable timing
  • Single-path conversion of code

Branching code:

    if cond then res := expr1 else res := expr2 endif

➭ Predicated execution:

    P := cond
    (P) res := expr1
    (not P) res := expr2

slide-14
SLIDE 14

Predicated or Guarded Execution

... refers to the conditional execution of an instruction based on the value of a boolean source operand, referred to as the predicate [Hsu et al. 1986]

Predicated instructions:

  • Unconditional fetch of instruction
  • If predicate is true: normal execution of instruction
  • If predicate is false: instruction does not modify the processor state

slide-15
SLIDE 15

Branching vs. Predicated Code

Code example:

    if rA < rB then swap(rA, rB);

Branching code:

    cmplt rA, rB
    bf skip
    swp rA, rB
    skip:

Predicated code:

    predlt Pi, rA, rB
    (Pi) swp rA, rB
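The same conditional swap can be approximated in portable C without a data-dependent branch in the source. This is a sketch under assumptions: the helper name `swap_if_less` is hypothetical, the operands are 32-bit unsigned values, and nothing in C guarantees that the compiler emits branch-free machine code, so the generated assembly has to be inspected on the target.

```c
#include <stdint.h>

/* Swap *a and *b if *a < *b; the same statements run for every input. */
static void swap_if_less(uint32_t *a, uint32_t *b)
{
    uint32_t x = *a;
    uint32_t y = *b;
    /* Predicate as a mask: all ones if x < y, all zeros otherwise. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(x < y);
    /* XOR difference, kept only when the predicate holds. */
    uint32_t diff = (x ^ y) & mask;
    *a = x ^ diff;   /* y if predicate holds, else x */
    *b = y ^ diff;   /* x if predicate holds, else y */
}
```

Unlike the predicated `swp`, this version always writes both operands; it keeps the instruction sequence fixed rather than suppressing the write.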

slide-16
SLIDE 16

How to Generate Single-Path Code

Introduce the transformation in two steps:

  • 1. Transformation model: set of rules for the transformation from branching code to predicated code (assuming full support for predicated execution)
  • 2. Implementation details: adaptation of single-path code for execution on platforms with limited support for predication (partial predication, no predication at all)

slide-17
SLIDE 17

Single-Path Transformation Rules

Only constructs with input-data dependent control flow are transformed, the rest of the code remains unchanged → two steps:

  • 1. data-flow analysis: mark variables and conditional constructs that are input dependent → result available through predicate ID(...)
  • 2. actual transformation of input-data dependent constructs into predicated code

slide-18
SLIDE 18

Single-Path Transformation Rules

Recursive transformation function based on syntax tree: SP[[ p ]]σδ

  • p … code construct to be transformed into single path
  • σ … inherited precondition from previously transformed code constructs. The initial value of the inherited precondition is ‘T’ (logical true).
  • δ … counter, used to generate variable names needed for the transformation. The initial value of δ is zero.

slide-19
SLIDE 19

Single-Path Transformation Rules (1)

simple statement: S

SP[[ S ]]σδ

    if σ = T :   S          // unconditional
    if σ = F :              // no action
    otherwise:   (σ) S      // predicated (guarded)

slide-20
SLIDE 20

Single-Path Transformation Rules (2)

sequence: S = S1; S2

SP[[ S1; S2 ]]σδ

    guardδ := σ;
    SP[[ S1 ]]〈guardδ〉〈δ+1〉 ;
    SP[[ S2 ]]〈guardδ〉〈δ+1〉

slide-21
SLIDE 21

Single-Path Transformation Rules (3)

alternative: S = if cond then S1 else S2 endif

SP[[ if cond then S1 else S2 endif ]]σδ

if ID(cond):

    guardδ := cond;
    SP[[ S1 ]]〈σ ∧ guardδ〉〈δ+1〉;
    SP[[ S2 ]]〈σ ∧ ¬guardδ〉〈δ+1〉

otherwise:

    if cond then SP[[ S1 ]]σδ else SP[[ S2 ]]σδ endif
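As an illustration of the rule for input-dependent alternatives, the following C sketch applies it by hand to a small clipping routine. The example itself, the helper `sel_u32`, and the names `clip_single_path`, `sigma`, and `guard0` are assumptions made for illustration; predicated assignments are emulated with a bit-mask select, and predicates are kept as 0 or 1.

```c
#include <stdint.h>

/* Emulated predicated assignment: new_val if pred is 1, old_val if pred is 0. */
static uint32_t sel_u32(uint32_t pred, uint32_t new_val, uint32_t old_val)
{
    uint32_t mask = (uint32_t)0 - pred;
    return (new_val & mask) | (old_val & ~mask);
}

/* Branching original:
 *   if (x > hi) { y = hi; (*n_clipped)++; } else { y = x; }
 */
static uint32_t clip_single_path(uint32_t x, uint32_t hi, uint32_t *n_clipped)
{
    uint32_t sigma = 1u;                    /* inherited precondition, initially T */
    uint32_t guard0 = (uint32_t)(x > hi);   /* guard0 := cond */
    uint32_t y = 0u;

    /* SP[[ S1 ]] under sigma AND guard0 */
    y = sel_u32(sigma & guard0, hi, y);
    *n_clipped = sel_u32(sigma & guard0, *n_clipped + 1u, *n_clipped);

    /* SP[[ S2 ]] under sigma AND NOT guard0 */
    y = sel_u32(sigma & (guard0 ^ 1u), x, y);
    return y;
}
```

Both alternatives are evaluated on every call, so the right-hand sides must be free of side effects; only the selects decide which values survive.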

slide-22
SLIDE 22

Single-Path Transformation Rules (4)

loop: S = while cond max N times do S1 endwhile

SP[[ while cond max N times do S1 endwhile ]]σδ

if ID(cond):

    endδ := F;                          // loop-body-disable flag
    for countδ := 1 to N do             // “hardwired loop”
      SP[[ if ¬cond then endδ := T endif ]]σ〈δ+1〉 ;
      SP[[ if ¬endδ then S1 endif ]]σ〈δ+1〉
    endfor
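A hand-applied instance of this loop rule in C might look as follows. The example (length of a zero-terminated buffer with at most `MAX_ITER` elements) and all names are assumptions chosen for illustration; the loop always runs `MAX_ITER` times, and the `end` flag disables the body once the original loop condition has failed.

```c
#include <stddef.h>

#define MAX_ITER 8   /* loop bound N, assumed known at compile time */

/* Branching original:
 *   len = 0;
 *   while (s[len] != 0)  (at most MAX_ITER times)  len++;
 */
static size_t bounded_len(const unsigned char s[MAX_ITER])
{
    size_t len = 0;
    unsigned end = 0u;                        /* end := F (loop-body-disable flag) */

    for (size_t count = 0; count < MAX_ITER; count++) {   /* hardwired loop */
        end |= (unsigned)(s[len] == 0);       /* if not cond then end := T */
        len += (size_t)(end ^ 1u);            /* if not end then len++ */
    }
    return len;
}
```

Because `len` grows by at most one per iteration, the read of `s[len]` stays inside the buffer even after the flag is set.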

slide-23
SLIDE 23

Single-Path Transformation Rules (5)

loop: S = while cond max N times do S1 endwhile

SP[[ while cond max N times do S1 endwhile ]]σδ

if ¬ID(cond):

    while cond max N times do SP[[ S1 ]]σδ endwhile

slide-24
SLIDE 24

Single-Path Transformation Rules (6)

procedure call: S = proc(act-pars)

SP[[ proc(act-pars) ]]σδ

    if σ = T :   proc(act-pars)
    otherwise:   proc-sip(σ, act-pars)
slide-25
SLIDE 25

Single-Path Transformation Rules (7)

procedure definitions: proc p(form-pars) S end

SP[[ proc p(form-pars) S end ]]σδ

    proc p-sip(precond-par, form-pars)
      SP[[ S ]]〈precond-par〉〈0〉
    end
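The procedure rules can be illustrated with a small C sketch in which a guarded call passes its precondition into a "sip" variant of the callee. All names here (`reset_if_large_sip`, `caller_single_path`, `sel_u32`) and the example procedure are hypothetical; predicates are assumed to be 0 or 1.

```c
#include <stdint.h>

/* Emulated predicated assignment: new_val if pred is 1, old_val if pred is 0. */
static uint32_t sel_u32(uint32_t pred, uint32_t new_val, uint32_t old_val)
{
    uint32_t mask = (uint32_t)0 - pred;
    return (new_val & mask) | (old_val & ~mask);
}

/* Original procedure:
 *   proc reset_if_large(counter, limit): if (*counter > limit) *counter = 0;
 * Single-path variant with an extra precondition parameter.
 */
static void reset_if_large_sip(uint32_t precond, uint32_t *counter, uint32_t limit)
{
    uint32_t guard0 = (uint32_t)(*counter > limit);
    /* The body is transformed under the precondition parameter. */
    *counter = sel_u32(precond & guard0, 0u, *counter);
}

/* Original call site: if (enabled) reset_if_large(c, limit); */
static void caller_single_path(uint32_t enabled, uint32_t *c, uint32_t limit)
{
    /* The call is always executed; the guard travels as an argument. */
    reset_if_large_sip(enabled, c, limit);
}
```

Since the callee always runs, its arguments must be valid even when the precondition is false; here `c` has to point to a readable and writable object in every case.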

slide-26
SLIDE 26

HW-Support for Predicated Execution

  • Predicate registers
  • Instructions for manipulating predicates (define, set, clear, load, store)
  • Predicated instructions

  • Support for full predication: execution of all instructions is controlled by a predicate
  • Support for partial predication: limited set of predicated instructions (e.g., conditional move, select, set, clear)

slide-27
SLIDE 27

Implications of Partial Predication

Speculative code execution

  • instructions that do not allow for predication are executed unconditionally, and
  • the results are stored in temporary variables;
  • subsequently, predicates determine which values of temporary variables are further used

Example:

    (pred) cmov dest, src1
    (not pred) cmov dest, src2

Caution: speculative instructions must not raise exceptions! (e.g., div. by zero, referencing an invalid memory address)

slide-28
SLIDE 28

Fully vs. Partially Predicated Code

Original code:

    if src2 ≠ 0 then dest := src1 / src2;

Fully predicated code:

    Pred := (src2 ≠ 0)
    (Pred) div dest, src1, src2

slide-29
SLIDE 29

Fully vs. Partially Predicated Code (2)

Original code:

    if src2 ≠ 0 then dest := src1 / src2;

Partially predicated code, first attempt:

    Pred := (src2 ≠ 0)
    div tmp_dst, src1, src2        // may raise an exception on division by zero
    (Pred) cmov dest, tmp_dst

slide-30
SLIDE 30

Fully vs. Partially Predicated Code (3)

Original code:

    if src2 ≠ 0 then dest := src1 / src2;

Partially predicated code:

    Pred := (src2 ≠ 0)
    mov tmp_src, src2
    (not Pred) cmov tmp_src, $safe_val
    div tmp_dst, src1, tmp_src
    (Pred) cmov dest, tmp_dst

If src2 equals 0, then replace it by a safe value (e.g., 1) to avoid division by zero.
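The partially predicated division translates almost line by line into C. This is a sketch assuming 32-bit unsigned operands; the function name `guarded_div` is hypothetical, and the conditional moves are emulated with a mask because C has no `cmov` primitive.

```c
#include <stdint.h>

/* Returns src1 / src2 if src2 != 0, otherwise the unchanged dest. */
static uint32_t guarded_div(uint32_t dest, uint32_t src1, uint32_t src2)
{
    uint32_t pred = (uint32_t)(src2 != 0);      /* Pred := (src2 != 0), 0 or 1 */
    uint32_t mask = (uint32_t)0 - pred;         /* all ones if Pred, else 0 */
    uint32_t safe_val = 1u;

    /* mov tmp_src, src2 ; (not Pred) cmov tmp_src, safe_val */
    uint32_t tmp_src = (src2 & mask) | (safe_val & ~mask);

    /* Speculative division: tmp_src is never zero here. */
    uint32_t tmp_dst = src1 / tmp_src;

    /* (Pred) cmov dest, tmp_dst */
    return (tmp_dst & mask) | (dest & ~mask);
}
```

For example, `guarded_div(7, 20, 4)` yields 5 and `guarded_div(7, 20, 0)` leaves the result at 7. The division is executed in both cases, so its latency still matters if the hardware divider has operand-dependent timing.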

slide-31
SLIDE 31

“Minimal” Predicated-Exec. Support

Conditional Move instruction:

    movCC destination, source

Semantics: if CC then destination := source else no operation

slide-32
SLIDE 32

If-conversion with conditional move

Branching code:

    if cond then res := expr1 else res := expr2 endif

If-converted code:

    t1 := expr1’        // avoid side effects!
    t2 := expr2’        // avoid side effects!
    test cond
    movT res, t1
    movF res, t2

slide-33
SLIDE 33

Emulation of conditional move

In architectures without predicate support, conditional moves can be emulated with bit-mask operations

Example:

    if (cond) x=y; else x=z;

    t0 = 0 - cond;   // fat bool: 0..false, -1..true
    t1 = ~t0;        // bitwise negation (fat bool)
    t2 = t0 & y;
    t3 = t1 & z;
    x = t2 | t3;

assumptions: cond is either 0 or 1; the types of all values have the same size
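Wrapped into a function, the bit-mask emulation can be written as below. This is a sketch: the name `select_u32` is hypothetical, fixed-width unsigned types are used so that all values have the same size, and the condition is first normalised to 0 or 1, which the slide's version takes for granted.

```c
#include <stdint.h>

/* Returns y if cond is nonzero, z otherwise, using only bit operations. */
static uint32_t select_u32(uint32_t cond, uint32_t y, uint32_t z)
{
    uint32_t t0 = (uint32_t)0 - (uint32_t)(cond != 0);  /* fat bool: 0 or all ones */
    uint32_t t1 = ~t0;                                  /* bitwise negation */
    uint32_t t2 = t0 & y;
    uint32_t t3 = t1 & z;
    return t2 | t3;
}
```

With unsigned arithmetic, `0 - 1` wraps to the all-ones pattern by definition, which avoids relying on the representation of negative signed values.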

slide-34
SLIDE 34

Example

Bubble sort: input array a[SIZE]

Original code:

    for(i=SIZE-1; i>0; i--) {
      for(j=1; j<=i; j++) {
        if (a[j-1] > a[j]) {
          t = a[j];
          a[j] = a[j-1];
          a[j-1] = t;
        }
      }
    }

Predicated code:

    for(i=SIZE-1; i>0; i--) {
      for(j=1; j<=i; j++) {
        t1 = a[j-1];
        t2 = a[j];
        (t1>t2): t = a[j];
        (t1>t2): a[j] = a[j-1];
        (t1>t2): a[j-1] = t;
      }
    }

slide-35
SLIDE 35

Example

Bubble sort: input array a[SIZE]

Original code:

    for(i=SIZE-1; i>0; i--) {
      for(j=1; j<=i; j++) {
        if (a[j-1] > a[j]) {
          t = a[j];
          a[j] = a[j-1];
          a[j-1] = t;
        }
      }
    }

Predicated code with an explicit predicate variable:

    for(i=SIZE-1; i>0; i--) {
      for(j=1; j<=i; j++) {
        cond = (a[j-1] > a[j]);
        (cond): t = a[j];
        (cond): a[j] = a[j-1];
        (cond): a[j-1] = t;
      }
    }

slide-36
SLIDE 36

Example

Bubble sort: input array a[SIZE]

Original code:

    for(i=SIZE-1; i>0; i--) {
      for(j=1; j<=i; j++) {
        if (a[j-1] > a[j]) {
          t = a[j];
          a[j] = a[j-1];
          a[j-1] = t;
        }
      }
    }

Code with conditional moves:

    for(i=SIZE-1; i>0; i--) {
      for(j=1; j<=i; j++) {
        t1 = a[j-1];
        t2 = a[j];
        test (t1>t2);
        movt: a[j-1] = t2;
        movt: a[j] = t1;
      }
    }
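A compilable C rendering of the conditional-move version is sketched below. It assumes 32-bit unsigned elements and a fixed `SIZE`, and it emulates `movt` with a mask; the function name is hypothetical. Whether the compiler produces branch-free machine code is not guaranteed and has to be checked on the target.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE 6   /* array length, assumed fixed at compile time */

/* Bubble sort whose loop bounds and statement sequence do not depend on the data. */
static void bubble_sort_single_path(uint32_t a[SIZE])
{
    for (size_t i = SIZE - 1; i > 0; i--) {
        for (size_t j = 1; j <= i; j++) {
            uint32_t t1 = a[j - 1];
            uint32_t t2 = a[j];
            /* test (t1 > t2): all ones if the pair is out of order */
            uint32_t mask = (uint32_t)0 - (uint32_t)(t1 > t2);
            a[j - 1] = (t2 & mask) | (t1 & ~mask);   /* movt: a[j-1] = t2 */
            a[j]     = (t1 & mask) | (t2 & ~mask);   /* movt: a[j]   = t1 */
        }
    }
}
```

The loop counters depend only on `SIZE`, so only the comparison needed converting. Both array elements are written back in every iteration, which differs from a true conditional move but keeps the sequence of memory accesses identical for all inputs.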

slide-37
SLIDE 37

Single-Path Properties

  • Every execution has the same instruction trace, i.e., the same sequence of references to instruction memory
  • Path analysis is trivial: there is only one path
  • Two executions starting from the same instruction-cache state have identical hit/miss sequences on accesses to instruction memory

slide-38
SLIDE 38

Single-Path and Timing

Every execution uses the same sequence (and thus number) of instructions
→ good basis for obtaining invariable timing

However:

  • variable, data-dependent instruction execution times cause execution-time jitter
  • starting from a different memory state may cause different access times to instruction and data memory, and thus variable execution times

slide-39
SLIDE 39

Enforcing Invariable Timing

➭ Always start from the same state of instruction cache, pipeline, branch prediction logic, etc.
➭ Enforce invariable access times for data objects
➭ Invariable durations of all processor operations
➭ All interference must be predictable (preemptions)

Don't let the environment dictate

  • Sequence of actions
  • Durations of actions
slide-40
SLIDE 40

Invariable Duration of Operations

Processor operations have to be implemented such that they execute in constant time, i.e., independent of operand values (e.g., shift, mul, div)

In particular, predicated instructions need to execute in constant time
→ if predicate is false: allow instruction to execute, but disallow changes of the processor state in the write-back stage

ARM7 experiment: use strCC-strNCC pairs of store operations to obtain constant time despite variable strCC timing

slide-41
SLIDE 41

Performance of Single-Path Code

Execution times of input-dependent alternatives sum up due to serialization

⇒ Execution times of single-path code are long if the control flow of its source is strongly input dependent

[Figure: branching control flow over blocks A, B, C, D, E next to its serialized single-path sequence A B C D E]

slide-42
SLIDE 42

Performance of Single-Path Code (2)

CPUs with deep pipelines need a number of cycles to re-fill the pipeline after a (mis-predicted) branch

⇒ predicated execution can be cheaper than jumping
⇒ this is where modern compilers/processors use predicated execution to improve performance

slide-43
SLIDE 43

Example: Speedup by if-conversion

Code example:

    if rA < rB then swap(rA, rB);

Branching code (5 or 6 cycles, depending on the branch outcome):

    cmplt rA, rB
    bf skip
    swp rA, rB
    skip:

Predicated code (4 cycles):

    predlt Pi, rA, rB
    (Pi) swp rA, rB

[Figure: execution in a three-stage pipeline (IF, DE, EX)]

slide-44
SLIDE 44

Transformation Properties

  • Completeness: every piece of code with boundable WCET can be transformed
  • Transformed code has a single path
  • WCET analysis is trivial: execute and measure
  • WCET analysis yields exact WCET
  • Execution times are long (if we are not careful)

slide-45
SLIDE 45

Execution Times

[Figure: histograms (number of executions # over execution time t) of code execution times before and after single-path transformation]

slide-46
SLIDE 46

Summary

Because of the complexity of WCET analysis we looked for methods to generate time-predictable code

  • Instruction padding: suitable for simple architectures
  • Single-path conversion for the general case (caution: execution times may increase significantly)

⇒ Use adequate coding techniques