SLIDE 1

Modular WCET Analysis of ARM Processors

Andreas Engelbredt Dalsgaard, Mads Christian Olesen, Martin Toft, René Rydhof Hansen, Kim Guldstrand Larsen

SLIDE 2

Introduction Challenges Tool-Chain Value Analysis Demo Conclusion

The Problem

Problem: Given a program in executable form for an ARM9 processor, determine a safe and tight worst-case execution time (WCET).

Goals:
  • Model the pipeline and cache(s) of the ARM9 in a precise manner
  • Make the model modular, such that other ARM9 processors can easily be modelled

SLIDE 3

Real-Time Systems

  • Real-time systems (RTS) are systems that need to respond to real-life events in a timely manner
  • A number of processes with associated WCETs and deadlines
  • Tasks are periodic or sporadic

SLIDE 4

WCET Distribution

  • Estimates should be on the safe side!
  • However, too much on the safe side ⇒ inefficient system

SLIDE 5

Challenge I: Modern Processors

Modern processors optimise for the average case, using:
  • Caching: allowing quick access to recently used memory items
  • Pipelining: executing instructions in parallel

[Figure: the five-stage ARM9 pipeline. Fetch (fetch instruction from instruction cache or main memory); Decode (ARM decode, Thumb decode, register address decode, register read); Execute (ALU, shifter); Memory (memory data access); Writeback (write back ALU result and/or load data)]

SLIDE 6

Can we be ignorant?

No!
  • Some processors have “timing anomalies”, i.e. local worst-case ⇏ global worst-case
  • Even without “timing anomalies”, assuming the local worst-case can give an over-approximation by a factor of 30

The ARM9 processor does not exhibit “timing anomalies”
  • Quicker analysis, less over-approximation
  • Processors without “timing anomalies” are sufficient for most real-time systems

SLIDE 7

Challenge II: Making the Analysis Modular

A modular analysis allows more flexibility, e.g. how would this program perform:
  • ... if the cache was larger?
  • ... with an extra processor core?
  • ... on an entirely different processor?

And different accuracy/performance tradeoffs:
  • Abstract interpretation for (abstract) cache analysis
  • Model checking for (concrete) cache analysis
  • Use a simple always-miss cache if there is no need for a more precise analysis
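
The accuracy/performance tradeoff can be pictured as interchangeable cache models behind one interface. The following is a minimal Python sketch under stated assumptions, not the tool-chain's implementation (the slides model caches as timed automata): the names `CacheModel`, `AlwaysMissCache`, `DirectMappedCache` and `memory_cycles`, and both latencies, are hypothetical. It also costs only one fixed address trace, whereas a WCET analysis has to cover all paths.

```python
from typing import Iterable, Protocol

HIT_CYCLES = 1    # assumed latency, for illustration only
MISS_CYCLES = 10  # assumed latency, for illustration only


class CacheModel(Protocol):
    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss."""


class AlwaysMissCache:
    """Trivially safe model: every access is treated as a miss."""

    def access(self, address: int) -> bool:
        return False


class DirectMappedCache:
    """Concrete model of a tiny direct-mapped cache (hypothetical geometry)."""

    def __init__(self, num_lines: int = 4, line_bytes: int = 32) -> None:
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.lines: dict = {}  # line index -> memory block currently stored

    def access(self, address: int) -> bool:
        block = address // self.line_bytes
        index = block % self.num_lines
        hit = self.lines.get(index) == block
        self.lines[index] = block  # on a miss the block is loaded into the line
        return hit


def memory_cycles(trace: Iterable[int], cache: CacheModel) -> int:
    """Sum the access latencies of one fixed address trace."""
    return sum(HIT_CYCLES if cache.access(a) else MISS_CYCLES for a in trace)


trace = [0x00, 0x04, 0x08, 0x00, 0x04, 0x08]  # six accesses to one 32-byte block
pessimistic = memory_cycles(trace, AlwaysMissCache())
precise = memory_cycles(trace, DirectMappedCache())
```

Swapping the model changes only the argument passed to `memory_cycles`; the always-miss model stays safe but is far more pessimistic on this trace than the concrete one.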

SLIDE 8

Tool-Chain Overview

SLIDE 15

Overview of Our Model

SLIDE 16

Path Analysis

  • Timed automaton for every function
  • Transitions emulate instruction execution

Example function:

0x00 cmp r0, 1
0x04 push lr
...
0x50 bx lr

[Figure: timed automaton for the function, with locations i0x0_cmp_r0_1, i0x4_push_lr_, ..., i0x50_bx_lr ("MORE FUNCTION BODY" elided); synchronisations fib_branch?, fib_branch! and fetch!; updates such as instradr[PFS] = 0, instrtype[PFS] = INSTR_OTHER, dataadr[PFS] = INVALID_ADDRESS, ... and loop_counter_1 = 0]

  • Functions handled flow-sensitively

SLIDE 17

Cache Analysis

ARM9: Separate data and instruction caches
  • 16 kB in size, 64-way associative, 8 words (32 bytes) per line
  • Write-through and write-back policies
  • Pseudo-random and round-robin replacement policies

Modelled concretely as timed automata in UPPAAL

[Figure: cache memory with four cache sets of two ways each (lines l1 to l8, alternating way 1 and way 2) next to main memory blocks m1 to m6]
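
The cache geometry on this slide fixes how an address maps to a cache set: 16 kB / (64 ways × 32 bytes) gives 8 sets. The sketch below works through that arithmetic and adds a toy round-robin set. It is an illustration under assumptions, not the UPPAAL model from the slides: `split_address` and `RoundRobinSet` are hypothetical names, and the example address is simply the one that appears later in the slides.

```python
CACHE_BYTES = 16 * 1024
WAYS = 64
LINE_BYTES = 32
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 16384 / 2048 = 8 sets


def split_address(address: int) -> tuple:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address % LINE_BYTES
    block = address // LINE_BYTES
    return block // NUM_SETS, block % NUM_SETS, offset


class RoundRobinSet:
    """One cache set with round-robin replacement (toy model)."""

    def __init__(self, ways: int = WAYS) -> None:
        self.tags = [None] * ways  # tag held by each way, None = empty
        self.next_victim = 0       # way that is overwritten on the next miss

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, replace the next victim way."""
        if tag in self.tags:
            return True
        self.tags[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False


tag, set_index, offset = split_address(127992)
```

With these numbers, address 127992 falls into set 7 with tag 499 and byte offset 24. This is why the cache analysis needs concrete addresses: without them the set index is unknown.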

SLIDE 18

Value Analysis

  • The cache analysis needs concrete memory addresses
  • Registers are used as base and offset in all memory accesses
  • Value analysis: Find an over-approximation of possible register values at all execution points of a process
  • Weighted push-down systems (WPDSs) used for inter-procedural, control-flow sensitive value analysis
  • Presented by Reps et al. in "Program Analysis using Weighted Push-Down Systems"

SLIDE 19

Weighted Push-Down Systems

Use the PDS-stack as call-stack:
  • Sequential: $\langle p, n_{\mathit{main}} \rangle \hookrightarrow \langle p, n_2 \rangle$
  • Function call: $\langle p, n_4 \rangle \hookrightarrow \langle p, n_8 n_5 \rangle$
  • Function return: $\langle p, n_{12} \rangle \hookrightarrow \langle p, \epsilon \rangle$

Each rule has an associated weight, describing the effect of the transition. Weights can be:
  • Combined ("join"): $w_1 \oplus w_2 = w_3$
  • Extended (sequential progression): $w_1 \otimes w_2 = w_3$

The effect of executing a program to a set of configurations $T$ ("meet over all paths"):

$$\bigoplus \{\, w_1 \otimes \dots \otimes w_n \mid w_1, \dots, w_n \text{ are the weights associated with a path of rules leading to a configuration in } T \,\}$$
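
The three rule shapes can be tried out on an explicit stack. This is a toy sketch, not WALi and not the authors' code: the rules for `n2` and `n8` and all weights are hypothetical additions needed to get a complete run, and a weight is simplified to a function on a single integer. The run follows one path only, so it demonstrates extend (⊗) but not combine (⊕).

```python
from typing import Callable, Dict, List, Tuple

Weight = Callable[[int], int]  # toy weight: effect of a rule on one value


def identity(value: int) -> int:
    return value


def extend(w1: Weight, w2: Weight) -> Weight:
    """Sequential progression: apply w1 first, then w2."""
    return lambda value: w2(w1(value))


# Top-of-stack symbol -> (symbols that replace it, weight of the rule).
RULES: Dict[str, Tuple[List[str], Weight]] = {
    "n_main": (["n2"], lambda v: v + 1),  # sequential (rule from the slide)
    "n2": (["n4"], identity),             # sequential (hypothetical)
    "n4": (["n8", "n5"], identity),       # call: callee entry n8, return point n5
    "n8": (["n12"], lambda v: v * 2),     # sequential in the callee (hypothetical)
    "n12": ([], identity),                # return: pop the callee's symbol
}


def run(start: str) -> Tuple[List[List[str]], Weight]:
    """Apply rules until no rule matches; return stack history and path weight."""
    stack = [start]
    history = [list(stack)]
    weight: Weight = identity
    while stack and stack[0] in RULES:
        replacement, rule_weight = RULES[stack[0]]
        stack = replacement + stack[1:]  # rewrite the top of the stack
        weight = extend(weight, rule_weight)
        history.append(list(stack))
    return history, weight


history, path_weight = run("n_main")
```

After the call rule the stack holds `n8` on top of the return point `n5`; the return rule pops the callee's symbol and leaves `n5` on top, which is how the PDS stack plays the role of the call stack.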

SLIDE 20

Our Value Analysis

Implemented simple value analysis, using:
  • Loop unrolling
  • Simple (syntactical) register-value tracking
  • No tracking of values in memory

Finds a good number of values for some programs, but could be much better

SLIDE 21

Our Value Analysis

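Weights = functions representing the effect of an instruction or a sequence of instructions, e.g.:

$$w_1 \begin{pmatrix} r_0 \\ r_1 \end{pmatrix} = \begin{pmatrix} \texttt{"r1 + 2"} \\ \mathit{id} \end{pmatrix}, \qquad w_2 \begin{pmatrix} r_0 \\ r_1 \end{pmatrix} = \begin{pmatrix} \mathit{id} \\ \texttt{"r0 * 2 + r1<<3"} \end{pmatrix}$$

  • Special values: $\mathit{id}$, $\bot$ and $\top$
  • Combine and extend handled syntactically (string equality, and string replacement)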
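
A syntactic combine and extend along these lines could look as follows. This is a minimal sketch, assuming a weight is a dict from register name to an expression string, `"id"`, or a `TOP` marker; the function bodies are a guess at what "string equality, and string replacement" means, not the authors' code, and ⊥ is left out for brevity.

```python
import re

ID = "id"    # register is left unchanged
TOP = "TOP"  # stands for ⊤: value unknown

REGISTER = re.compile(r"\br\d+\b")  # matches register names such as r0, r12


def extend(w1: dict, w2: dict) -> dict:
    """Effect of w1 followed by w2, by substituting w1's expressions into w2."""
    result = {}
    for reg, expr in w2.items():
        if expr == ID:
            result[reg] = w1[reg]  # w2 leaves reg alone, so w1's effect stands
        elif expr == TOP or any(w1[r] == TOP for r in REGISTER.findall(expr)):
            result[reg] = TOP      # anything computed from unknown is unknown
        else:
            # Replace every register by the (parenthesised) expression w1 gave it.
            result[reg] = REGISTER.sub(
                lambda m: m.group(0) if w1[m.group(0)] == ID
                else "(" + w1[m.group(0)] + ")",
                expr,
            )
    return result


def combine(w1: dict, w2: dict) -> dict:
    """Join by string equality: keep an expression only if both paths agree."""
    return {reg: w1[reg] if w1[reg] == w2[reg] else TOP for reg in w1}


w1 = {"r0": "r1 + 2", "r1": ID}
w2 = {"r0": ID, "r1": "r0 * 2 + r1<<3"}
sequence = extend(w1, w2)
```

For the two weights on the slide, `sequence["r1"]` becomes the string `(r1 + 2) * 2 + r1<<3`. String equality makes combine cheap but coarse: two expressions that are semantically equal yet spelled differently are joined to ⊤.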

SLIDE 22

Implementation = WALi + Python

  • The open source Weighted Automata Library (WALi) implements a number of WPDS algorithms
  • Easy to extend with e.g. new weight domains
  • Our weights are, very conveniently, valid Python expressions
  • Process automata are annotated with the results

[Figure: annotated transition at location i0x8330_push_lr_, synchronising on fetch!, with the update ... dataadr[PFS] = (loop_counter_33652 == 0) ? 127992 : INVALID_ADDRESS, ...]
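
Because the weights are valid Python expressions, turning one into a concrete address can be as simple as evaluating it. The sketch below is an assumption about how such a step might look, not the tool-chain's code: `evaluate` and `data_address_update` are hypothetical helpers, and the expression `sp - 8` and the value 128000 are made up so that the result matches the address shown on the slide. Only the format of the update string is copied from the slide.

```python
INVALID_ADDRESS = "INVALID_ADDRESS"


def evaluate(expression: str, registers: dict) -> int:
    """Evaluate a weight expression over concrete register values."""
    # Builtins are emptied so the expression can only refer to registers.
    return eval(expression, {"__builtins__": {}}, dict(registers))


def data_address_update(loop_counter: str, address: int) -> str:
    """Format an update in the style of the annotated transition on the slide."""
    return f"dataadr[PFS] = ({loop_counter} == 0) ? {address} : {INVALID_ADDRESS}"


address = evaluate("sp - 8", {"sp": 128000})  # hypothetical expression and value
update = data_address_update("loop_counter_33652", address)
```

Here `update` reproduces the annotation from the slide. Emptying `__builtins__` is a convenience, not a security boundary, so this pattern only suits expressions the analysis generated itself.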

SLIDE 23

Disassembler — Dissy

SLIDE 24

WCET Guarantee in Three Easy Steps

Demo

SLIDE 28

Experiments

Evaluated on the Mälardalen WCET benchmarks

The most interesting findings:
  • Taking the instruction cache into account yields WCETs that are up to 97% sharper (78% on average at -O2)
  • Taking the data cache into account yields WCETs that are up to 68% sharper (31% on average at -O2)
  • Almost all results are obtained within five minutes

Some programs fail due to:
  • State space explosion (6)
  • Write to program counter (2)
  • Floating point operations
  • Value analysis problems [1]

We are able to analyse 17 out of the 25 non-floating point benchmarks!

[1] Need to manually find good loop-bounds for very optimised assembler

SLIDE 29

Conclusion - Future Work

  • Prototype implementation successful
  • UPPAAL provides a good framework for modularising the models
  • The analysis times seem acceptable
  • Better path analysis
  • Precise value analysis essential for tight bounds (work in progress)
      • Modelling the stack
      • Modelling memory regions
  • Support other typical embedded processors:
      • ARM7 (3-stage pipeline, cache)
      • Atmel AVR 8-bit (3-stage pipeline, no cache, 1-2 cycle instructions)
      • H8/300 (old Lego Mindstorms)
  • Modelling the cache abstractly

SLIDE 30

Our master’s thesis, the accompanying source code, and these slides are available at http://metamoc.martintoft.dk

Thank you for your attention!

SLIDE 31

1. Introduction
  • The Problem
  • Real-Time Systems
  • WCET Distribution
2. Challenges
  • Challenge I: Modern Processors
  • Can we be ignorant?
  • Challenge II: Making the Analysis Modular
3. Tool-Chain
  • Tool-Chain Overview
  • Overview of Our Model
  • Path Analysis
  • Cache Analysis
4. Value Analysis
  • Value Analysis
  • Weighted Push-Down Systems
  • Our Value Analysis

SLIDE 32

Implementation = WALi + Python Disassembler — Dissy

5. Demo
  • WCET Guarantee in Three Easy Steps
6. Conclusion
  • Experiments
  • Conclusion - Future Work