A Rapid Cache-aware Procedure Positioning Optimization to Favor Incremental Development
Enrico Mezzetti, Tullio Vardanega RTAS 2013
19th IEEE Real-Time and Embedded Technology and Applications Symposium Philadelphia, USA April 9 - 11, 2013
1 The case for incremental development
2 Incremental procedure positioning
3 Evaluation
4 Conclusion
Incremental development: the natural incarnation of the divide-et-impera approach in hardware and software development, adopted to better master the complexity and costs of the industrial process
Relies on composability and early availability of timing bounds
Hindered by context-dependent hardware resources
Intra-task timing behaviour is determined by the memory layout, which is not robust to software increments and is only available on the final executable
Cache-aware procedure positioning improves both performance and predictability
Granularity of procedures is industrially appealing
Requires specialized tool support
Reduces the potential jitter by pinpointing a memory layout
The Weighted Call Graph $WCG_P$ of a program $P$ is an undirected weighted graph with:
$V = \{\, p \mid p \text{ is a procedure in } P \,\}$
$E = \{\, (p, p') \mid p \text{ calls } p' \lor p' \text{ calls } p \,\} \subseteq V \times V$
$W_{p,p'} \rightarrow$ call frequency between $p$ and $p'$ in $P$
Nodes are pairwise merged according to $\max W_{p_i,p_j}$ (heaviest edge first); the induced procedure ordering determines the actual memory layout
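The pairwise merging step can be sketched as a greedy loop over the WCG. This is a simplified illustration, not the procedure of any specific tool: edge weights are given as a hypothetical `weights` dict keyed by unordered procedure pairs, merged chains are simply concatenated (practical implementations also decide which chain ends to join), and chains left unconnected are emitted in name order.

```python
def pairwise_merge_order(weights):
    """Greedy pairwise merging over a WCG (simplified sketch).

    `weights` maps frozenset({p, q}) to the call frequency W_{p,q};
    self-calls are assumed absent. Returns a procedure ordering.
    """
    # Every procedure starts as a one-element chain, keyed by its own name.
    chains = {p: [p] for edge in weights for p in edge}
    w = dict(weights)
    while w:
        # Heaviest edge first (max W_{pi,pj}); ties broken by names for determinism.
        key = max(w, key=lambda k: (w[k], sorted(k)))
        a, b = sorted(key)
        chains[a].extend(chains.pop(b))  # plain concatenation: b's chain after a's
        # Redirect b's edges to a, summing parallel edges and dropping a-b itself.
        merged = {}
        for k, f in w.items():
            k2 = frozenset(a if c == b else c for c in k)
            if len(k2) == 2:
                merged[k2] = merged.get(k2, 0) + f
        w = merged
    # Chains no longer connected by any edge are emitted in name order.
    return [p for cid in sorted(chains) for p in chains[cid]]


weights = {frozenset({"main", "f"}): 10, frozenset({"main", "g"}): 1,
           frozenset({"f", "h"}): 5}
print(pairwise_merge_order(weights))  # ['f', 'main', 'h', 'g']
```

Note how `h` ends up after `main` rather than next to `f`: the naive concatenation ignores which chain ends are most strongly connected.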
✗ Historically focused on average-case optimization
✗ Poorly scalable to large-scale industrial systems
(relies on repeated iterations of static WCET analysis)
✗ Only applicable at the tail end of development
✓ An alternative program representation, other than the WCG
✓ An optimization method based on program structure
The WCG representation may be ambiguous, with negative consequences on the computed layout
Fails to account for the importance of loop nests
Lacks structural information
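A toy example of the ambiguity: the two hypothetical programs below yield identical WCG weights, although only in the second do `f` and `g` execute in the same loop and therefore compete for cache lines. The body format (`("call", name)` and `("loop", bound, body)` items) is an assumption made for illustration, and only calls issued by `main` are modelled.

```python
from collections import Counter

def wcg_weights(proc, body, mult=1, acc=None):
    """Accumulate W_{proc,callee} for the calls issued by `proc`'s body."""
    acc = Counter() if acc is None else acc
    for item in body:
        if item[0] == "call":
            acc[frozenset((proc, item[1]))] += mult
        else:  # ("loop", bound, inner_body): inner calls repeat `bound` times
            _, bound, inner = item
            wcg_weights(proc, inner, mult * bound, acc)
    return acc

# Program A: f and g are called in two separate loops.
prog_a = [("loop", 10, [("call", "f")]), ("loop", 10, [("call", "g")])]
# Program B: f and g are called in the same loop and may evict each other.
prog_b = [("loop", 10, [("call", "f"), ("call", "g")])]

print(wcg_weights("main", prog_a) == wcg_weights("main", prog_b))  # True
```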
Procedures involved in the same loop are the most critical source of cache conflicts, hence loop nests must be explicitly considered
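$LCT_P$ for a program $P$ is an ordered directed tree with:
$V = \{\, p \mid p \in \mathit{Proc}(P) \,\} \cup \{\, l_p^i \mid l_p^i \text{ is the } i\text{th loop in } p \,\}$
$E = \{\, (p, p') \mid p \rightarrow p' \,\} \cup \{\, (p, l_p^i) \,\} \cup \{\, (l_p^i, p') \mid l_p^i \rightarrow p' \,\} \cup \{\, (l_p^i, l_p^j) \mid \text{loop } l_p^j \text{ is nested inside } l_p^i \,\} \subseteq V \times V$
$B_{l_p^i} \rightarrow$ statically computed loop bound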
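The LCT definition maps onto a small tree data structure. The sketch below is a hypothetical construction, assuming the same toy body format (`("call", callee)` and `("loop", bound, body)` items), non-recursive code, and loops named `<p>.l<i>` in order of appearance; a callee reached from several places is expanded as a separate subtree each time, since the LCT is a tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str        # "proc" for a procedure p, "loop" for a loop l_p^i
    name: str        # procedure name, or "<p>.l<i>" for the i-th loop in p
    bound: int = 1   # B_{l_p^i}: statically computed loop bound (1 for procedures)
    children: list = field(default_factory=list)  # ordered as in the program text

def build_lct(program, proc):
    """Build the LCT rooted at `proc`.

    `program` maps each procedure to its body (hypothetical input format).
    Recursion is assumed absent, as usual for WCET-analysable code.
    """
    node = Node("proc", proc)
    loop_count = 0

    def expand(parent, body):
        nonlocal loop_count
        for item in body:
            if item[0] == "call":      # edge (p, p') or (l_p^i, p')
                parent.children.append(build_lct(program, item[1]))
            else:                      # edge (p, l_p^i) or (l_p^i, l_p^j)
                _, bound, inner = item
                loop_count += 1
                loop = Node("loop", f"{proc}.l{loop_count}", bound)
                parent.children.append(loop)
                expand(loop, inner)

    expand(node, program.get(proc, []))
    return node


program = {
    "main": [("call", "init"),
             ("loop", 100, [("call", "f"), ("loop", 4, [("call", "g")])])],
    "f": [("call", "g")],
}
lct = build_lct(program, "main")
print([c.name for c in lct.children])  # ['init', 'main.l1']
```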
The LCT naturally exhibits the loop-induced relations between procedures; subtrees can be ordered with respect to depth and execution frequency
Post-order depth-first traversal
Procedures in the same subtree form independent pools, which are incrementally merged together; pool independence is broken by procedures appearing in different subtrees
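One possible reading of the traversal, sketched under assumptions: tree nodes are `(name, bound, children)` tuples with `bound = None` for procedures, siblings are visited deeper subtree first and then by higher loop bound, and every loop subtree yields one pool of procedures tagged with its execution frequency (the product of the enclosing loop bounds). The actual ordering heuristics of the approach may differ.

```python
def mult(bound):
    """Procedures (bound None) do not multiply the execution frequency."""
    return 1 if bound is None else bound

def height(node):
    _, _, children = node
    return 1 + max((height(c) for c in children), default=0)

def procedures(node):
    """All procedures in the subtree rooted at `node`."""
    name, bound, children = node
    found = {name} if bound is None else set()
    for child in children:
        found |= procedures(child)
    return found

def pools_post_order(node, freq=1, out=None):
    out = [] if out is None else out
    name, bound, children = node
    freq *= mult(bound)                 # execution frequency of this subtree
    # Assumed ordering: deeper subtrees first, then more frequently executed ones.
    ordered = sorted(children, key=lambda c: (height(c), mult(c[1])), reverse=True)
    for child in ordered:               # children before parent: post-order
        pools_post_order(child, freq, out)
    if bound is not None:               # each loop subtree yields one pool
        out.append((name, freq, procedures(node)))
    return out


tree = ("main", None, [
    ("init", None, []),
    ("main.l1", 100, [("f", None, [("g", None, [])]),
                      ("main.l2", 4, [("g", None, [])])]),
    ("main.l3", 10, [("h", None, [])]),
])
pools = pools_post_order(tree)
print([(name, freq) for name, freq, _ in pools])
# [('main.l2', 400), ('main.l1', 100), ('main.l3', 10)]
```

Here `g` shows up both in the `main.l2` pool and, via `f`, in the enclosing `main.l1` pool, which is the kind of cross-subtree appearance that breaks pool independence.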
[Figure: step-by-step merging example: select the first nodes; merge P and Q; keep on merging; Q is already in the pool, so just remember it; merge S and T; merge, optionally with displacement; U does not fit in the gap; U fits in the gap]
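The displacement and gap steps of the example rest on how procedures map to cache sets. The following sketch assumes a direct-mapped instruction cache with a hypothetical geometry (32-byte lines, 128 sets), line-aligned procedures, and that a procedure from an independent pool may be placed in the gap left by a displacement; it illustrates the idea and is not the authors' placement algorithm.

```python
LINE_SIZE = 32   # bytes per cache line (hypothetical geometry)
NUM_SETS = 128   # sets of a direct-mapped I-cache (hypothetical geometry)

def cache_sets(start, size):
    """Cache sets touched by a procedure occupying [start, start + size)."""
    first, last = start // LINE_SIZE, (start + size - 1) // LINE_SIZE
    return {line % NUM_SETS for line in range(first, last + 1)}

def place_with_displacement(start, size, busy_sets):
    """Shift a procedure by whole lines until it avoids `busy_sets`.

    Returns (new_start, gap_bytes); the gap is the space skipped by the shift.
    """
    for lines in range(NUM_SETS):
        candidate = start + lines * LINE_SIZE
        if cache_sets(candidate, size).isdisjoint(busy_sets):
            return candidate, lines * LINE_SIZE
    raise ValueError("no conflict-free displacement exists")

def fits_in_gap(gap_bytes, size):
    """A procedure from an independent pool can fill the gap if small enough."""
    return size <= gap_bytes


# P occupies sets {0, 1}; Q (same pool) would start at 4096, mapping to set 0 again.
q_start, gap = place_with_displacement(4096, 64, busy_sets=cache_sets(0, 64))
print(q_start, gap)                                # 4160 64
print(fits_in_gap(gap, 96), fits_in_gap(gap, 32))  # False True
```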
Qualification status should be incrementally preserved, also when it comes to caches
The LCT is intrinsically fit for incremental addition
Naturally absorbs changes that are local to a module
Problems arise with shared procedures
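Shared procedures can be spotted by comparing per-module pools. The sketch assumes each module is summarised by the set of procedures in its subtrees (module and procedure names are hypothetical) and uses a deliberately coarse rule: a change local to one module may disturb only that module and the modules it shares procedures with.

```python
from collections import Counter

def shared_procedures(module_pools):
    """Procedures appearing in more than one module's subtrees."""
    counts = Counter(p for pool in module_pools.values() for p in set(pool))
    return {p for p, n in counts.items() if n > 1}

def affected_modules(module_pools, changed_module):
    """Modules whose layout may be disturbed by a change local to
    `changed_module`: the module itself plus any module sharing a procedure."""
    changed = set(module_pools[changed_module])
    return {m for m, pool in module_pools.items()
            if m == changed_module or changed & set(pool)}


pools = {"A": {"a1", "a2", "crc"}, "B": {"b1", "crc"}, "C": {"c1"}}
print(shared_procedures(pools))               # {'crc'}
print(sorted(affected_modules(pools, "A")))   # ['A', 'B']
```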
Target: the LEON2 (SPARC V8) processor, with a focus on reference and domain-specific benchmarks:
Mälardalen, MediaBench, MiBench, AOCS software
[Average-case hit ratio] [Worst-case hit ratio]
Average- and worst-case results are fairly proportional, owing to the relatively simple hardware platform and setting (e.g., D-cache disabled)
[Global WCET reduction]
Modules from the AOCS benchmark (GNC, PRO, TMTC): the results confirm constant WCET behavior for GNC
countermeasure is taken
[WCET variation across releases]
More accurate program representation
Improves both average-case and worst-case performance
Robust against incremental development
Still need a better solution to handle regression in the presence
Iterative (but costly) WCET-oriented approaches may provide better WCET performance
Implement our approach as a plugin to the standard GCC compiler
Carry out an extensive evaluation of different ordering heuristics