SLIDE 1

05.07.16 TU Darmstadt | Computer Systems Group | Boris Dreyer 1

Continuous Non-Intrusive Hybrid WCET Estimation Using Waypoint Graphs

Boris Dreyer, Christian Hochberger, Alexander Lange, Simon Wegener and Alexander Weiss

Boris Dreyer
dreyer@rs.tu-darmstadt.de

Prof. Dr.-Ing. Christian Hochberger
Computer Systems Group
Department of Electrical Engineering and Information Technology
Technische Universität Darmstadt, Germany

SLIDE 2

This work was funded within the project CONIRAS by the German Federal Ministry for Education and Research with the funding ID 01IS13029. The responsibility for the content remains with the authors.

SLIDE 3

Agenda

  • Motivation
    – Measurement-based Execution Time Estimation
    – Program Flow Trace (PFT)
  • Waypoint-based Worst-Case Execution Time Estimation
    – Waypoint Graph
    – Context Model
  • Evaluation
    – TACLeBench
  • Conclusion

SLIDE 4

Execution Time Estimation

Boris Dreyer, Christian Hochberger, Simon Wegener, and Alexander Weiss.

Precise Continuous Non-Intrusive Measurement-Based Execution Time Estimation.

In Francisco J. Cazorla, editor, 15th International Workshop on Worst-Case Execution Time Analysis (WCET 2015), volume 47 of OpenAccess Series in Informatics (OASIcs), pages 45-54, Dagstuhl, Germany, 2015. Schloss Dagstuhl—Leibniz-Zentrum für Informatik.

SLIDE 5

WCET Estimation – BB Approach

  • 1. Reconstruct CFG
  • 2. Adapt timing analysis module
  • 3. Run Application
  • 4. Transfer BB statistics to host

Diagram labels: Executable, FPGA-based Timing Analysis, Target Processor, Basic Block (BB)

Context Sensitive BB Statistics

BB Address   Max First   Max Further
0x1004       8 us        0 us
0x2000       22 us       11 us
0x202c       16 us       8 us
0x2034       2 us        1 us
0x2050       12 us       8 us
0x205c       2 us        1 us
0x206c       8 us        4 us
0x207c       6 us        0 us
0x101c       6 us        0 us

SLIDE 6

WCET Estimation – BB Approach

  • 5. Annotate CFG
  • 6. Find longest path (ILP based)

Overall execution time estimate
  • Our method: 191 us
  • Context insensitive: 258 us

Context Sensitive BB Statistics: same table as on Slide 5
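
The gap between the two estimates above comes from how the "Max First" and "Max Further" columns are used. The following is a minimal sketch, assuming a block is charged its "Max First" time once and its "Max Further" time for every additional execution, while a context-insensitive bound charges the larger of the two for every execution. The statistics are copied from the table on Slide 5; the execution counts are hypothetical (the CFG behind the slide is not available here), so the output is not expected to reproduce 191 us or 258 us.

```python
# Per-block maxima from the slide's table: (max_first_us, max_further_us).
BB_STATS = {
    0x1004: (8, 0),
    0x2000: (22, 11),
    0x202C: (16, 8),
    0x2034: (2, 1),
    0x2050: (12, 8),
    0x205C: (2, 1),
    0x206C: (8, 4),
    0x207C: (6, 0),
    0x101C: (6, 0),
}

# Hypothetical execution counts along one path (NOT taken from the slides).
EXAMPLE_COUNTS = {0x1004: 1, 0x2000: 3, 0x202C: 3, 0x2034: 3, 0x101C: 1}


def context_sensitive_us(stats, counts):
    """Charge max_first once and max_further for each additional execution."""
    total = 0
    for addr, n in counts.items():
        first, further = stats[addr]
        if n > 0:
            total += first + (n - 1) * further
    return total


def context_insensitive_us(stats, counts):
    """Charge the single worst observed time for every execution."""
    return sum(n * max(stats[addr]) for addr, n in counts.items())


if __name__ == "__main__":
    print(context_sensitive_us(BB_STATS, EXAMPLE_COUNTS), "us (context sensitive)")
    print(context_insensitive_us(BB_STATS, EXAMPLE_COUNTS), "us (context insensitive)")
```

The context-insensitive bound can never be smaller than the context-sensitive one under these assumptions, since every execution is charged at least as much.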

SLIDE 7

Embedded Trace Units

ETU type: Nexus 5001 | ARM CoreSight
Implementation: Traditional branch messages | Branch history messages | ETMv3 | ETMv4 | PFT
Program flow observation level: Branch | Branch | Instruction | Branch | Branch
Cycle count: Yes | No | No | Yes | Yes | Yes
Applicable for hybrid WCET measurement: Yes | No | No | Yes | Yes | Yes

Callout: ARM Cortex A

SLIDE 8

Basic Block Graph

CFG

SLIDE 9

CFG

Basic Block Graph

SLIDE 10

Basic Block Graph vs. Waypoint Graph

[Figure labels: CFG, WPG, Maximization Equivalence, Waypoint instruction, Waypath]

SLIDE 11

WCET Estimation Using Waypoint Graphs

  • 1. Reconstruct Waypoint Graph
  • 2. Adapt timing analysis module
  • 3. Run Application
  • 4. Transfer waypoint statistics to host

Diagram labels: Executable, FPGA-based Timing Analysis, Target Processor, Waypoint Graph

Context Sensitive Waypath Statistics

Waypath ID   Max First   Max Further
0            8 us        0 us
1            22 us       11 us
2            16 us       8 us
3            2 us        1 us
4            12 us       8 us
5            2 us        1 us
6            8 us        4 us
7            6 us        0 us
8            6 us        0 us

SLIDE 12

WCET Estimation Using Waypoint Graphs

  • 5. Annotate Waypoint Graph
  • 6. Find longest path (ILP based)

Result: overall execution time estimate

Context Sensitive Waypath Statistics: same table as on Slide 11
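
The ILP-based longest-path step can be written down explicitly. The slides do not show the formulation, so what follows is the standard implicit path enumeration (IPET) form, given as an assumption and extended with separate counts for the "first" and "further" contexts. All symbols are introduced here for illustration: $P$ is the set of waypaths (edges), $W$ the set of waypoints (nodes), $t_p^{\mathrm{first}}$ and $t_p^{\mathrm{further}}$ the measured maxima of waypath $p$, $e_{\ell}$ the number of times loop $\ell$ is entered, and $b_{\ell}$ the observed bound on iterations per entry. Nested loops are ignored for brevity.

```latex
\begin{align*}
\max \quad & \sum_{p \in P} \bigl( t_p^{\mathrm{first}}\, x_p^{\mathrm{first}}
             + t_p^{\mathrm{further}}\, x_p^{\mathrm{further}} \bigr) \\
\text{s.t.} \quad
& \sum_{p \in \mathrm{out}(w_{\mathrm{entry}})} x_p = 1, \\
& \sum_{p \in \mathrm{in}(w)} x_p = \sum_{p \in \mathrm{out}(w)} x_p
  && \forall\, w \in W \setminus \{ w_{\mathrm{entry}}, w_{\mathrm{exit}} \}, \\
& x_p = x_p^{\mathrm{first}} + x_p^{\mathrm{further}}
  && \forall\, p \in P, \\
& x_p^{\mathrm{first}} \le e_{\ell(p)}, \quad x_p \le b_{\ell(p)}\, e_{\ell(p)}
  && \forall\, p \text{ inside loop } \ell(p), \\
& x_p^{\mathrm{further}} = 0
  && \forall\, p \text{ outside all loops}, \\
& x_p^{\mathrm{first}},\; x_p^{\mathrm{further}} \in \mathbb{N}_0
  && \forall\, p \in P .
\end{align*}
```

Dropping the split into first and further counts and charging $\max(t_p^{\mathrm{first}}, t_p^{\mathrm{further}})$ for every execution gives the context-insensitive variant of the same program.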

SLIDE 13

Execution Time Estimation - Architecture

  • 1. Offline Pre-processing
  • 2. Context Sensitive Statistics
  • 3. Offline Post-processing

Diagram labels: Executable; WPG Generation (CFG Analysis); Loop Automata Cluster; Min, Max, Avg. Statistics; Timing Analysis; Timing Report

SLIDE 14

Loop Automata Cluster

[Figure: block diagram of the Loop Automata Cluster]

Blocks: Innermost Loop Selector; Loop Automaton (multiple instances, each models one loop); Statistics Module; Loop Statistics; Tracepath Statistics; Instruction Reconstruction

Signals: Waypath event; Loop context; Loop bounds; Waypath ID, Cycles

SLIDE 15

Loop Automata

[Figure: block diagram of one Loop Automaton]

Loop Automaton components: Loop Iteration Context FSM; Loop Iteration Counter FSM; Loop Event Generator

Signals: Waypath event; Loop context; Loop bounds; Waypath ID, Cycles

Connected blocks: Instruction Reconstruction; Statistics Module; Loop Statistics; Tracepath Statistics
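
As a software analogy for the hardware blocks named on this slide, the sketch below models one loop automaton: an iteration counter, a first/further iteration context, and a statistics table keyed by waypath ID and context. The event interface (`enter`, `back_edge`, `exit`, `waypath`) and all class and field names are assumptions made for illustration; the slides do not specify the FSM transitions, and the actual design is implemented in FPGA logic, not software.

```python
from collections import defaultdict


class LoopAutomaton:
    """Hypothetical software model of one loop (names are illustrative)."""

    def __init__(self):
        self.iteration = 0        # current iteration; 0 means loop not active
        self.max_iterations = 0   # largest iteration count seen (loop bound)
        # (waypath_id, context) -> maximum observed cycle count
        self.max_cycles = defaultdict(int)

    def enter(self):
        """Loop entry event: start counting at the first iteration."""
        self.iteration = 1
        self._update_bound()

    def back_edge(self):
        """Back-edge event: the next iteration begins."""
        self.iteration += 1
        self._update_bound()

    def exit(self):
        """Loop exit event: the loop is no longer active."""
        self.iteration = 0

    def context(self):
        """Iteration context: 'first' in iteration 1, otherwise 'further'."""
        return "first" if self.iteration == 1 else "further"

    def waypath(self, waypath_id, cycles):
        """Record a waypath event, keeping the per-context maximum."""
        key = (waypath_id, self.context())
        self.max_cycles[key] = max(self.max_cycles[key], cycles)

    def _update_bound(self):
        self.max_iterations = max(self.max_iterations, self.iteration)


def demo():
    """Feed three iterations of one waypath through the automaton."""
    loop = LoopAutomaton()
    loop.enter()
    loop.waypath(1, 22)   # first iteration
    loop.back_edge()
    loop.waypath(1, 11)   # second iteration
    loop.back_edge()
    loop.waypath(1, 9)    # third iteration
    loop.exit()
    return loop
```

After `demo()`, the table holds one maximum for each context of waypath 1 and a loop bound of three iterations, which corresponds to the "Max First" and "Max Further" columns and the "Loop bounds" signal.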

SLIDE 16

Loop Event Generator

Comparator Tree Set

SLIDE 17

Evaluation Setting

  • Xilinx Zynq XC7Z020-1CLG484C
    – Dual-core ARM Cortex-A9 (666 MHz)
    – 32 kilobytes of L1 cache
    – 512 kilobytes of L2 cache (disabled)
    – SRAM data memory
  • DDR3 instruction memory (533 MHz)
  • TACLeBench benchmark collection
    – Executing each benchmark ten times
    – With and without L1 instruction cache enabled
  • Xilinx SDK 2016.1

SLIDE 18

Context-Insensitive Overestimation (Ratio)

[Plot: context-insensitive runtime estimation / end-to-end runtime; series: L1 disabled, L1 enabled]

Averages shown: 2.20 and 2.76

*trace buffer overflow

SLIDE 19

Context-Sensitive Overestimation (Ratio)

[Plot: context-sensitive runtime estimation / end-to-end runtime (L1 enabled), and context-insensitivity overhead (L1 enabled)]

  • Average: 2.02
  • Avg. overhead: 6 %
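
The two plotted quantities can be computed as follows. This is a minimal sketch, assuming the overestimation ratio is the estimate divided by the measured end-to-end runtime, and the context-insensitivity overhead is the relative increase of the context-insensitive estimate over the context-sensitive one. The slides do not state the exact definitions, and the sample values are made up for illustration.

```python
from statistics import mean


def overestimation_ratio(estimate_us, end_to_end_us):
    """Estimate divided by measured end-to-end runtime (1.0 = exact)."""
    return estimate_us / end_to_end_us


def insensitivity_overhead(insensitive_us, sensitive_us):
    """Relative increase of the context-insensitive estimate (0.06 = 6 %)."""
    return insensitive_us / sensitive_us - 1.0


# Hypothetical benchmarks: (end_to_end_us, sensitive_us, insensitive_us).
SAMPLES = [
    (100.0, 200.0, 212.0),
    (250.0, 500.0, 530.0),
]

if __name__ == "__main__":
    ratios = [overestimation_ratio(s, e) for e, s, _ in SAMPLES]
    overheads = [insensitivity_overhead(i, s) for _, s, i in SAMPLES]
    print("average ratio:", mean(ratios))
    print("average overhead:", mean(overheads))
```

Averaging per-benchmark ratios is one possible aggregation; the slides do not say how their averages were formed.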

SLIDE 20

Conclusion

Continuous

  • We perform direct online aggregation at runtime.

Non-intrusive

  • We use the hardware support of modern SoCs.

Hybrid WCET Estimation Using Waypoint Graphs

  • We measure waypath execution times online.
  • We estimate the overall runtime offline.

SLIDE 21

Thank you!