SLIDE 1 05.07.16 TU Darmstadt | Computer Systems Group | Boris Dreyer 1
Continuous Non-Intrusive Hybrid WCET Estimation Using Waypoint Graphs
Boris Dreyer
dreyer@rs.tu-darmstadt.de
- Prof. Dr.-Ing. Christian Hochberger
Computer Systems Group, Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Germany
Boris Dreyer, Christian Hochberger, Alexander Lange, Simon Wegener and Alexander Weiss
SLIDE 2
This work was funded within the project CONIRAS by the German Federal Ministry for Education and Research with the funding ID 01IS13029. The responsibility for the content remains with the authors.
SLIDE 3
Agenda
– Measurement-based Execution Time Estimation
– Program Flow Trace (PFT)
- Waypoint based Worst Case Execution Time Estimation
– Waypoint Graph
– Context Model
– TACLeBench
SLIDE 4
Execution Time Estimation
Boris Dreyer, Christian Hochberger, Simon Wegener, and Alexander Weiss.
Precise Continuous Non-Intrusive Measurement-Based Execution Time Estimation.
In Francisco J. Cazorla, editor, 15th International Workshop on Worst-Case Execution Time Analysis (WCET 2015), volume 47 of OpenAccess Series in Informatics (OASIcs), pages 45-54, Dagstuhl, Germany, 2015. Schloss Dagstuhl—Leibniz-Zentrum für Informatik.
SLIDE 5
WCET Estimation – BB Approach
[Figure: block diagram of the measurement setup. Labels: Application, Executable, CFG, Basic Block (BB), Target Processor, FPGA-based Timing Analysis, analysis module, statistics to host]
Context Sensitive BB Statistics
BB Address   Max First   Max Further
0x1004        8 us        0 us
0x2000       22 us       11 us
0x202c       16 us        8 us
0x2034        2 us        1 us
0x2050       12 us        8 us
0x205c        2 us        1 us
0x206c        8 us        4 us
0x207c        6 us        0 us
0x101c        6 us        0 us
SLIDE 6
WCET Estimation – BB Approach
Context Sensitive BB Statistics
BB Address   Max First   Max Further
0x1004        8 us        0 us
0x2000       22 us       11 us
0x202c       16 us        8 us
0x2034        2 us        1 us
0x2050       12 us        8 us
0x205c        2 us        1 us
0x206c        8 us        4 us
0x207c        6 us        0 us
0x101c        6 us        0 us
- 5. Annotate CFG
- 6. Find longest path (ILP based)
Overall execution time estimate
– Our method: 191 us
– Context insensitive: 258 us
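The gap between the two estimates can be illustrated with a minimal sketch. It assumes that the context model separates the first execution of a block inside a loop ("Max First") from all further executions ("Max Further"), as the column names suggest. The loop bound of 10 and both function names are hypothetical; only the two table rows come from the slide. The sketch does not reproduce the 191 us and 258 us figures, which depend on the full CFG.

```python
def context_sensitive_bound(max_first, max_further, executions):
    """Bound for a block executed `executions` times within one loop entry."""
    if executions <= 0:
        return 0
    # First execution is charged Max First, all others Max Further.
    return max_first + (executions - 1) * max_further


def context_insensitive_bound(max_first, max_further, executions):
    """Without context, every execution is charged the overall maximum."""
    return executions * max(max_first, max_further)


# Two rows from the BB statistics table (times in us).
blocks = {"0x2000": (22, 11), "0x202c": (16, 8)}
LOOP_BOUND = 10  # assumption, not taken from the slides

sensitive = sum(
    context_sensitive_bound(first, further, LOOP_BOUND)
    for first, further in blocks.values()
)
insensitive = sum(
    context_insensitive_bound(first, further, LOOP_BOUND)
    for first, further in blocks.values()
)
print(sensitive, insensitive)
```

With these assumed inputs the context-sensitive bound is the smaller of the two, because the expensive first execution is charged only once.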
SLIDE 7
Embedded Trace Units
ETU type: Nexus 5001; ARM CoreSight
Implementation: Traditional branch messages; Branch history messages; ETMv3; ETMv4; PFT
Program Flow Observation Level: Branch; Branch; Instruction; Branch; Branch
Cycle count: Yes; No; No; Yes; Yes; Yes
Applicable for hybrid WCET measurement: Yes; No; No; Yes; Yes; Yes
ARM Cortex A
SLIDE 8
Basic Block Graph
CFG
SLIDE 9
CFG
Basic Block Graph
SLIDE 10
Basic Block Graph vs. Waypoint Graph
[Figure: CFG and WPG shown side by side under the heading "Maximization Equivalence"; legend: Waypoint instruction, Waypath]
SLIDE 11
WCET Estimation Using Waypoint Graphs
[Figure: block diagram of the measurement setup. Labels: Application, Executable, Waypoint Graph, Target Processor, FPGA-based Timing Analysis, analysis module, statistics to host]
Context Sensitive Waypath Statistics
Waypath ID   Max First   Max Further
              8 us        0 us
1            22 us       11 us
2            16 us        8 us
3             2 us        1 us
4            12 us        8 us
5             2 us        1 us
6             8 us        4 us
7             6 us        0 us
8             6 us        0 us
SLIDE 12
WCET Estimation Using Waypoint Graphs
Context Sensitive Waypath Statistics
Waypath ID   Max First   Max Further
              8 us        0 us
1            22 us       11 us
2            16 us        8 us
3             2 us        1 us
4            12 us        8 us
5             2 us        1 us
6             8 us        4 us
7             6 us        0 us
8             6 us        0 us
- 5. Annotate Waypoint Graph
- 6. Find longest path (ILP based)
Overall execution time estimate
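The longest-path step can be sketched on a small example. The slides use an ILP formulation; the sketch below swaps in a plain dynamic program over a topological order, which only works for an acyclic graph and ignores loop bounds. It assumes waypaths are modelled as weighted edges between waypoints. The graph shape, the node names A to F, and the function name `longest_path` are hypothetical; the weights reuse some "Max First" values from the table.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def longest_path(edges, source, sink):
    """Return the maximum total weight from source to sink in a DAG.

    edges maps (from_node, to_node) to the waypath time in us.
    """
    predecessors = {}
    for from_node, to_node in edges:
        predecessors.setdefault(to_node, set()).add(from_node)
        predecessors.setdefault(from_node, set())

    best = {source: 0}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        for pred in predecessors[node]:
            if pred in best:
                candidate = best[pred] + edges[(pred, node)]
                if candidate > best.get(node, float("-inf")):
                    best[node] = candidate
    return best.get(sink)


# Hypothetical acyclic waypoint graph with a single branch and join.
EDGES = {
    ("A", "B"): 8,
    ("B", "C"): 22,
    ("B", "D"): 16,
    ("C", "E"): 2,
    ("D", "E"): 12,
    ("E", "F"): 6,
}
print(longest_path(EDGES, "A", "F"))
```

An ILP formulation is the more general choice because loops can be handled through loop-bound constraints instead of unrolling the graph.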
SLIDE 13
Execution Time Estimation - Architecture
- 1. Offline Pre-processing: Executable, WPG Generation (CFG Analysis)
- 2. Context Sensitive Statistics: Loop Automata Cluster; Min, Max, Avg. Statistics
- 3. Offline Post-processing: Timing Analysis, Timing Report
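The statistics stage aggregates measurements as they arrive instead of storing the full trace. A minimal software sketch of such a running min/max/avg aggregation follows; the actual module runs on the FPGA, and the class name `RunningStats`, the function `aggregate`, and the event tuples are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningStats:
    """Constant-size summary of all cycle counts seen for one waypath."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def update(self, cycles: int) -> None:
        self.count += 1
        self.total += cycles
        self.minimum = cycles if self.minimum is None else min(self.minimum, cycles)
        self.maximum = cycles if self.maximum is None else max(self.maximum, cycles)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


def aggregate(events):
    """Fold a stream of (waypath_id, cycles) events into per-waypath statistics."""
    stats = defaultdict(RunningStats)
    for waypath_id, cycles in events:
        stats[waypath_id].update(cycles)
    return stats


# Made-up event stream for illustration.
events = [(1, 120), (2, 40), (1, 100), (1, 140)]
stats = aggregate(events)
print(stats[1].minimum, stats[1].maximum, stats[1].average)
```

Because each update touches only a fixed number of fields, memory use does not grow with the length of the measurement run.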
SLIDE 14
Loop Automata Cluster
[Figure: block diagram of the Loop Automata Cluster. Labels: Instruction Reconstruction; Waypath event (Waypath ID, Cycles); Innermost Loop Selector; repeated Loop Automaton blocks, each annotated "Models one loop"; Loop context; Loop bounds; Statistics Module (Loop Statistics, Tracepath Statistics)]
SLIDE 15
Loop Automata
[Figure: internals of a Loop Automaton. Components: Loop Event Generator, Loop Iteration Context FSM, Loop Iteration Counter FSM. Signals: Waypath event (Waypath ID, Cycles), Loop context, Loop bounds. Connected blocks: Instruction Reconstruction, Statistics Module (Loop Statistics, Tracepath Statistics)]
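The interplay of the iteration context FSM and the iteration counter can be sketched in software. This is a simplified model under stated assumptions, not the hardware design: the event names "enter", "back_edge" and "exit", the context names, and the class `LoopAutomaton` are all hypothetical.

```python
class LoopAutomaton:
    """Models one loop: iteration context, iteration count, per-context maxima."""

    def __init__(self):
        self.context = "outside"   # outside -> first -> further -> outside
        self.iterations = 0        # iterations in the current loop entry
        self.max_iterations = 0    # largest iteration count observed so far
        self.max_cycles = {}       # (waypath_id, context) -> maximum cycles

    def on_loop_event(self, event):
        if event == "enter":
            self.context, self.iterations = "first", 1
        elif event == "back_edge" and self.context != "outside":
            self.context = "further"
            self.iterations += 1
        elif event == "exit":
            self.context = "outside"
        self.max_iterations = max(self.max_iterations, self.iterations)

    def on_waypath(self, waypath_id, cycles):
        if self.context == "outside":
            return  # this loop is not active, so nothing is recorded
        key = (waypath_id, self.context)
        self.max_cycles[key] = max(self.max_cycles.get(key, 0), cycles)


# Made-up trace: one loop entry with three iterations of waypath 3.
automaton = LoopAutomaton()
trace = [
    ("loop", "enter"), ("waypath", 3, 20),
    ("loop", "back_edge"), ("waypath", 3, 9),
    ("loop", "back_edge"), ("waypath", 3, 11),
    ("loop", "exit"),
]
for record in trace:
    if record[0] == "loop":
        automaton.on_loop_event(record[1])
    else:
        automaton.on_waypath(record[1], record[2])
print(automaton.max_cycles, automaton.max_iterations)
```

The two maxima per waypath correspond to the "Max First" and "Max Further" columns, and the observed iteration count is one possible source for the loop bounds passed to the statistics module.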
SLIDE 16
Loop Event Generator
Comparator Tree Set
SLIDE 17
Evaluation Setting
- Xilinx Zynq XC7Z020-1CLG484C
– Dual-core ARM Cortex-A9 (666 MHz)
– 32 kilobytes of L1 cache
– 512 kilobytes of L2 cache (disabled)
– SRAM data memory
- DDR3 instruction memory (533 MHz)
- TACLeBench benchmark collection
– Executing each benchmark ten times
– With and without L1 instruction cache enabled
SLIDE 18
Context-Insensitive Overestimation (Ratio)
[Figure: plot of context-insensitive runtime estimation / end-to-end runtime. Series: L1 disabled, L1 enabled. Averages shown: 2.20 and 2.76]
*trace buffer overflow
SLIDE 19
Context-Sensitive Overestimation (Ratio)
[Figure: plot of context-sensitive runtime estimation / end-to-end runtime (L1 enabled). Second series: context-insensitivity overhead (L1 enabled). Average: 2.02]
SLIDE 20
Conclusion
Continuous
- We perform direct online aggregation at runtime.
Non-intrusive
- We use the hardware support of modern SoCs.
Hybrid WCET Estimation Using Waypoint Graphs
- We measure waypath execution times online.
- We estimate the overall runtime offline.
SLIDE 21
Thank you!