Last Time
  Response time analysis
  Blocking terms
    Priority inversion
    And solutions
  Release jitter
  Other extensions
Today
  Timing analysis
    Answers a question we commonly ask: At most how long can this ...
  Response time over CAN
    Worst-case message times
    Holistic scheduling
Worst case execution time (WCET): Longest execution time of a program on a given platform, considering all possible inputs
Precise timing analysis problem: Compute WCET
  Undecidable in general: the halting problem trivially reduces to it
Timing analysis problem: Compute a conservative estimate of the WCET
  I.e., estimate of WCET can be > true WCET
  This is decidable
WCET is often estimated by looking for the maximum execution time over many executions
  This is easy
  However, it does not solve the problem!
[Figure: histogram of number of executions vs. execution time, marking longest observed ET #1, longest observed ET #2, true WCET, and WCET estimate]
Always true:
  Longest observed ET ≤ true WCET ≤ WCET estimate
Question: What is the requirement for correctly
estimating WCET using testing?
Static timing analysis: Estimate WCET without running a program
Problem 1: Can’t do this from source code
  Which variables go into registers?
  Which functions are inlined?
  Which switches become jump tables vs. cascaded tests?
  Solution: Analyze compiler output
Problem 2: Understanding what’s going on in HW
  Where are the branch mispredicts?
  Where are the icache / dcache misses?
  Solution: Build model of the hardware
    int foo1 (int a, int b) {
      int c = b + 31*a;
      int e = 120 - c - a;
      return e;
    }

Compiler output:

    link a6,#0
    move.l 8(a6),d0
    moveq #-32,d1
    muls.l d1,d0
    sub.l 12(a6),d0
    addi.l #120,d0
    unlk a6
    rts
What does it take to estimate WCET of this code?
    void foo2 (int a) {
      if (a) {
        x += 3*a;
      } else {
        y -= x-a;
      }
    }

Compiler output:

    link a6,#0
    move.l 8(a6),d2
    tst.l d2
    beq.s *+16
    moveq #3,d0
    muls.l d0,d2
    add.l d2,_x
    bra.s *+24
    move.l _x,d1
    sub.l d2,d1
    move.l _y,d0
    sub.l d1,d0
    move.l d0,_y
    unlk a6
    rts
    void foo3 (int a) {
      do {
        y++;
      } while (a--);
    }

Compiler output:

    link a6,#0
    move.l 8(a6),d2
    move.l _y,d1
    move.l d2,d0
    addq.l #1,d1
    subq.l #1,d2
    tst.l d0
    bne.s *-8
    move.l d1,_y
    unlk a6
    rts
1. Programmer annotates loops with bounds
  Not very fun
  Doesn’t work well for library code
  However, could argue that in critical software the programmer should always know the loop bounds
2. Analyzer tries to figure out loop bounds
  Doesn’t always work
  Derived bound might be too high
Reasonable answer: Analyzer figures out simple loops, programmer annotates difficult ones
    void foo4 (void) {
      if (y==0) {
        if (x>5) {
          x++;
        } else {
          x = 1;
        }
      } else {
        x *= 3;
      }
    }

[Figure: tree of foo4 annotated with worst-case costs, computed bottom-up]
  Return: 3
  x++: 3+2 = 5
  x = 1: 3+1 = 4
  x *= 3: 3+10 = 13
  if x>5: max(5,4) + 2 = 7
  if y == 0: max(7,13) + 2 = 15
  foo4: 15+3 = 18
Timing models for complex processors are difficult to create
  Probably impossible for processors like Pentium 4
  However, easy for ColdFire, ARM, and lots of others
Caches, TLBs, branch predictors enormously complicate WCET analysis
  Need to estimate cache, TLB, predictor state at every program point
  Difficult, imprecise, and computationally expensive
Pointers and heap allocation are very difficult to analyze
  But critical software typically doesn’t do much of these
Start with some simple code: Measure time per loop iteration for k = 1..32
aiT from AbsInt
MPC5xx
Analysis steps:
instructions accessing memory
cache misses or hits
path of the program
Avoid recursion
Avoid deeply nested loops
Avoid if/else where one branch is a lot faster than the other
Avoid data-dependent loops
  Use fixed iteration count whenever possible
Avoid variable-time data structures
  E.g. hash tables are usually very unpredictable
Avoid unpredictable thread blocking
  E.g. on disk, network, etc.
Avoid unpredictable processors
  This is any processor that is much faster than its memory
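
As a small illustration of the "fixed iteration count" advice, here are two ways to count the set bits in a word. Both functions are hypothetical examples, not taken from the lecture. The first loops a data-dependent number of times; the second always runs exactly 32 iterations, so its loop bound is obvious to both the programmer and an analyzer.

```c
#include <stdint.h>

/* Data-dependent loop: the iteration count depends on the position of
   the highest set bit in x, so the bound must be derived or annotated. */
unsigned popcount_variable(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Fixed iteration count: always 32 iterations, whatever the input. */
unsigned popcount_fixed(uint32_t x)
{
    unsigned n = 0;
    for (int i = 0; i < 32; i++)
        n += (x >> i) & 1u;
    return n;
}
```

The fixed version is slower on average, which is the usual trade: average-case speed is given up for a tight, easily computed worst case.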
WCET estimation for simple hardware + simple software
  Largely a solved problem
  Technology far less mature than e.g. compiler technology
WCET estimation for complex hardware + complex software
  Open problem
  May not be possible
  May not even be a good idea
Products exist
What kind of chip should one use for a high-performance, time-critical embedded system?
Basic idea:
  Processors scheduled using priorities
  We know WCET of tasks
  CAN scheduled using priorities
  We know WCTT (worst-case transmission time) of messages
  Can put it all together using “holistic scheduling”
Why do we care?
  Accelerometer on your pickup is on CAN bus
  Airbags are also on CAN bus
  Want to guarantee airbag deployment within 150 ms of when you start to roll the truck
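Recall:
  CAN message stores 0-64 bits of data
  47 bits of message overhead
  Bit stuffing occurs: in the worst case a stuff bit is added after every 5 bits; since a stuff bit itself counts toward the next run, this can repeat every 4 original bits (hence the division by 4)
  34 of the overhead bits are subject to stuffing
For an n-byte message:
  WC number of stuff bits = $\lfloor (34 + 8n - 1)/4 \rfloor$
  WC transmission time: $C_i = \left(47 + 8 n_i + \lfloor (34 + 8 n_i - 1)/4 \rfloor\right) \tau_{bit}$, where $\tau_{bit}$ is the time to send one bit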
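
The frame-length arithmetic can be checked with a few lines of C. This is a minimal sketch using the numbers from this slide (47 overhead bits, 34 of them stuffable); the function names are hypothetical.

```c
#include <assert.h>

/* Worst-case number of stuff bits for a message with n data bytes. */
unsigned can_stuff_bits(unsigned n)
{
    assert(n <= 8);                       /* frames carry 0..8 data bytes */
    return (34u + 8u * n - 1u) / 4u;      /* integer division = floor */
}

/* Worst-case frame length in bits: overhead + data + stuff bits. */
unsigned can_frame_bits(unsigned n)
{
    return 47u + 8u * n + can_stuff_bits(n);
}

/* Worst-case transmission time in microseconds, rounded up,
   for a bus running at bits_per_sec. */
unsigned long can_frame_time_us(unsigned n, unsigned long bits_per_sec)
{
    assert(bits_per_sec > 0);
    unsigned long bits = can_frame_bits(n);
    return (bits * 1000000UL + bits_per_sec - 1UL) / bits_per_sec;
}
```

For an 8-byte message this gives 24 stuff bits and a 135-bit frame, i.e. 135 µs at 1 Mbit/s or 1080 µs at 125 kbit/s.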
Priority of a message determined by message type
Message must have minimum period $T_i$
Blocking term $B_i = 135\,\tau_{bit}$ (the longest possible frame: 8 data bytes with worst-case stuffing)
Release jitter $J_i$ equal to queuing delay
  Usually equal to worst-case response time of the task that queues the message
Now we just reuse the processor scheduling equation!
  $w_i = C_i + B_i + \sum_{\forall j \in hp(i)} \left\lceil \frac{w_i + J_j}{T_j} \right\rceil C_j, \qquad R_i = w_i + J_i$
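
A minimal sketch of solving the response-time equation by fixed-point iteration follows. It assumes the usual jitter-aware form $w_i = C_i + B_i + \sum_{j \in hp(i)} \lceil (w_i + J_j)/T_j \rceil C_j$ with $R_i = w_i + J_i$. The `msg` type, the `response_time` name, and the convention that index 0 is the highest priority are all assumptions made for this example; all times are in bit times.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message record; all values in bit times. */
struct msg {
    long C;  /* worst-case transmission time */
    long T;  /* minimum period */
    long J;  /* release jitter */
    long B;  /* blocking term */
};

/* Worst-case response time of message i, where m[0..i-1] are the
   higher-priority messages. Returns -1 if the result exceeds limit
   (e.g. the deadline), which also guarantees termination. */
long response_time(const struct msg *m, size_t i, long limit)
{
    long w = m[i].C + m[i].B;
    for (;;) {
        long next = m[i].C + m[i].B;
        for (size_t j = 0; j < i; j++) {
            assert(m[j].T > 0);
            /* ceil((w + J_j) / T_j) * C_j using integer arithmetic */
            next += ((w + m[j].J + m[j].T - 1) / m[j].T) * m[j].C;
        }
        if (next + m[i].J > limit)
            return -1;
        if (next == w)
            return w + m[i].J;   /* fixed point reached */
        w = next;
    }
}
```

With three messages of (C, T) = (135, 1000), (135, 2000), (95, 5000), B = 135 and no jitter, this yields response times of 270, 405 and 500 bit times. Raising the jitter of the first message to 900 pushes the third message's response time to 635, since two releases of the first message can now fall inside its window.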
How to compute minimum queuing time of a message?
  Could be clever, but 0 is safe
How to compute maximum queuing time of a message?
  Equal to worst-case response time of the task that queues the message
How to compute release jitter of a task that awaits a message i?
  $J_{dest(i)} = R_i - 47\,\tau_{bit}$
  ($47\,\tau_{bit}$ is the shortest possible frame: overhead only, no data or stuff bits)
This is nice but there’s a problem…
  Circular dependency between task and message jitters
Solution to circular dependencies:
  Start out with all jitters set to zero
  Iterate between processor and network scheduling until convergence
  This is the same trick we used to solve the dependency of task response time on itself last lecture
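
The iteration can be sketched as follows. This is a simplified illustration, not the lecture's implementation. Tasks and messages are both modeled as hypothetical `ent` records that share a resource (a CPU or the bus), all times use one unit (bit times), and each entity may inherit its jitter from a predecessor's response time minus a best-case offset (0 for task to message, 47 for message to task).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entity: a task (on a CPU) or a message (on the bus). */
struct ent {
    int  res;      /* resource id: entities on the same id interfere */
    int  prio;     /* lower number = higher priority */
    long C, T, B;  /* cost, period, blocking */
    int  pred;     /* index of the entity that releases this one, or -1 */
    long best;     /* best-case offset subtracted from the pred's R */
    long J, R;     /* computed: jitter and worst-case response time */
};

/* Response time of entity i given the current jitters; -1 if > limit. */
static long resp(const struct ent *e, size_t n, size_t i, long limit)
{
    long w = e[i].C + e[i].B;
    for (;;) {
        long next = e[i].C + e[i].B;
        for (size_t j = 0; j < n; j++) {
            if (j != i && e[j].res == e[i].res && e[j].prio < e[i].prio) {
                assert(e[j].T > 0);
                next += ((w + e[j].J + e[j].T - 1) / e[j].T) * e[j].C;
            }
        }
        if (next + e[i].J > limit) return -1;
        if (next == w) return w + e[i].J;
        w = next;
    }
}

/* Start with zero jitter, then alternate between computing response
   times and propagating jitter until nothing changes.
   Returns 1 on convergence, 0 if some response time exceeds limit. */
int holistic(struct ent *e, size_t n, long limit)
{
    for (size_t i = 0; i < n; i++) e[i].J = 0;
    for (;;) {
        int changed = 0;
        for (size_t i = 0; i < n; i++) {
            e[i].R = resp(e, n, i, limit);
            if (e[i].R < 0) return 0;
        }
        for (size_t i = 0; i < n; i++) {
            if (e[i].pred < 0) continue;
            long j = e[e[i].pred].R - e[i].best;
            if (j < 0) j = 0;
            if (j != e[i].J) { e[i].J = j; changed = 1; }
        }
        if (!changed) return 1;
    }
}
```

Jitters only ever grow from one round to the next, and the `limit` check cuts off any run that would exceed the deadline, so the loop always terminates.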
Finally: We have worst-case response time for every task and message
  …and we can figure out if the airbag deploys on time
What if response times are too long?
  Can fiddle with message priorities
  Finding an optimal priority ordering (that minimizes global response times) is NP-hard
Can reason about an entire network of processors plus their network using holistic scheduling
  Pretty cool result
  This is used in practice