Fast and Accurate Source-Level Simulation Considering Target-Specific Compiler Optimizations
Oliver Bringmann
FZI Forschungszentrum Informatik at the University of Karlsruhe
Slide 2: Outline
- Embedded Software – Challenges
- TLM2 Platform Modeling
- Source-Level Timing Instrumentation
- Consideration of Compiler Optimizations
- Experimental Results
Slide 3: Trend Towards Multi-Core Embedded Systems

Example: Automotive Domain
- Transition from passive to active safety
- Active systems: innovation by interaction of ECUs, added value by synergetic networking
- Multi-sensor data fusion and image recognition for automated situation interpretation in proactive cars

Challenges
- Early verification of global safety and timing requirements
- Consideration of the actual software implementation w.r.t. the underlying hardware
- Scalable verification methodology for multi-core and distributed embedded systems

Well-Tailored Embedded Platforms
- Increasing computation and energy requirements
- Distributed embedded platforms with energy-efficient multi-core embedded processors
Slide 4: Platform Composition
- Modeling techniques providing a holistic system view
- Derivation of an optimized network architecture
- Generation of abstract executable models (virtual prototypes)

[Figure: component models (communicating processes in C, MATLAB, UML) with IP-XACT component characteristics are composed into a platform (CPU, AXI, IP, I/O, APB, RAM); a virtual prototype is generated and then iteratively explored, analyzed, and refined.]
Slide 5: TLM Timing and Platform Model Abstractions

Platform model abstractions
- CP = Communicating Processes: parallel processes with parallel point-to-point communication
- CPT = Communicating Processes + Timing
- PV = Programmer's View: scheduled SW computation and/or scheduled communication
- PVT = Programmer's View + Timing
- CC = Cycle Callable: cycle-count-accurate timing behavior for computation and communication

Timing abstractions
- Untimed (UT) modeling: a notion of simulation time is not required; each process runs up to the next explicit synchronization point before yielding
- Loosely Timed (LT) modeling: simulation time is used, but processes are temporally decoupled from it until they reach an explicit synchronization point
- Approximately Timed (AT) modeling: processes run in lock-step with SystemC simulation time; annotated delays are implemented using timeouts (wait) or timed event notifications
Slide 6: SystemC TLM 2.0 Loosely Timed Modeling Style

SystemC (lock-step synchronization):

    ...
    wait(1, SC_MS);
    ...
    wait(1, SC_MS);
    do_communication();
    wait(1, SC_MS);
    ...

SystemC + TLM 2.0 Loosely Timed modeling style (LT):

    ...
    local_offset += sc_time(1, SC_MS);
    ...
    local_offset += sc_time(1, SC_MS);
    do_communication(local_offset);
    local_offset += sc_time(1, SC_MS);
    if (local_offset >= local_quantum) {
        wait(local_offset);
        local_offset = SC_ZERO_TIME;
    }
    ...

In the lock-step style, every wait() advances simulation time for both threads; in the LT style, each thread runs ahead within its time quantum and only synchronizes with simulation time once the accumulated local offset reaches the quantum.
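The quantum bookkeeping of the LT style can be sketched in plain C++, without the SystemC library; the LtProcess name, the counters, and the microsecond units are invented for illustration.

```cpp
#include <cassert>

// Plain C++ sketch (not the SystemC library) of LT temporal decoupling:
// a process accumulates a local time offset and only synchronizes with
// global simulation time once the offset reaches the quantum.
struct LtProcess {
    long local_offset = 0;    // how far the process has run ahead (us)
    long quantum      = 1000; // synchronization threshold (us)
    long global_time  = 0;    // global simulation time others can see
    long sync_count   = 0;    // number of expensive synchronizations

    void delay(long us) {
        local_offset += us;              // run ahead without syncing
        if (local_offset >= quantum) {   // quantum exhausted:
            global_time += local_offset; // advance global time and
            local_offset = 0;            // reset the local offset
            ++sync_count;
        }
    }
};
```

With a 1000 us quantum, 1500 delays of 1 us trigger a single synchronization instead of 1500 context switches, which is the source of the LT speedup (and of its inaccuracy).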
Slide 7: Inaccuracies Induced by Temporal Decoupling
- Parallel accesses to shared resources (cache, bus); conflicts may delay concurrent accesses
- In a temporally decoupled simulation (LT), a higher-priority access may be simulated after a lower-priority access, so preemption is not detected
- Explicit synchronization entails a severe performance penalty
- Alternative approach: early completion with retroactive adjustments

[Figure: two cores accessing a shared resource within one time quantum; the simulation completes both accesses at their decoupled timestamps (t=0 and t=1), while in reality the conflict delays Core 2's access until t=2.]
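One way to realize early completion with retroactive adjustments is to let accesses complete optimistically and have the resource model serialize overlapping accesses afterwards. The following plain C++ sketch (the Access/adjust names are invented) reproduces the two-core scenario of this slide; only timing is corrected, not data.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of "early completion with retroactive
// adjustments": accesses first complete at their decoupled timestamps;
// afterwards the resource model serializes overlapping accesses and
// reports the corrected completion time of each access.
struct Access { int core; long start; long duration; long corrected_end; };

void adjust(std::vector<Access>& log) {
    // arbitrate in timestamp order: the earlier access wins the resource
    std::sort(log.begin(), log.end(),
              [](const Access& a, const Access& b) { return a.start < b.start; });
    long busy_until = 0;                                 // resource free again at...
    for (Access& a : log) {
        long real_start = std::max(a.start, busy_until); // wait if occupied
        a.corrected_end = real_start + a.duration;
        busy_until = a.corrected_end;
    }
}
```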
Slide 8: Conflict Resolution in TLM Platforms

TLM+ Resource Model
- Access arbitration for each relevant simulation step despite temporal decoupling
- Delayed activation of a core's simulation thread upon conflict
- Arbitration induces no additional context switches in the SystemC simulation kernel
- Based on SystemC TLM-2.0 (downward compatible)

Universal approach for fast and accurate TLM simulation
- Arbitration using a "Resource Model" shared by all users of a resource
- Synchronization of bus accesses
- Simulation of parallel RTOS software tasks

[Figure: Core 1 and Core 2 sharing a bus; an OS scheduling Tasks 1, 2, and 3.]
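A minimal sketch of such a shared resource model, assuming simple priority-based arbitration (the Request/arbitrate names and the priority scheme are illustrative, not the TLM+ implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of a shared "Resource Model": all users of a
// resource register pending requests, and one arbitration pass grants
// the resource in priority order, yielding the delayed activation time
// of each user's simulation thread without extra context switches.
struct Request { int user; int priority; long duration; long granted_at; };

long arbitrate(std::vector<Request>& pending, long now) {
    std::sort(pending.begin(), pending.end(),   // higher priority first
              [](const Request& a, const Request& b) { return a.priority > b.priority; });
    long t = now;
    for (Request& r : pending) {
        r.granted_at = t;  // lower-priority users are activated only after
        t += r.duration;   // all higher-priority accesses have finished
    }
    return t;              // time at which the resource becomes free
}
```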
Slide 9: Simulation-Based Timing Analysis

Interpretation of binary code
- Separate system model and hardware model: software and hardware are compiled independently
- HW: RTL model or instruction set simulator; the software binary is interpreted during simulation
- Software timing is induced by the hardware model
- Problem: long simulation time

Software simulation
- Common system model for SW and HW; combined compilation of HW and SW
- High simulation speed
- Problem: precise timing analysis is difficult at source-code level
Slide 10: Source-Level Timing Instrumentation

Goal
- Static timing prediction of basic blocks with dynamic error correction

Proposed approach
- Compilation into binary code enriched with debugging information
- Static execution-time analysis with respect to architectural details (e.g., pipeline model, cache model, ...)
- Back-annotation of the analyzed timing information into the original C/C++ source code

Advantages
- Consideration of architectural details
- Efficient compilation onto the simulation host
- Consideration of the influences of dynamic timing effects

Example: the source function

    int f(int a, int b, int c, int d) {
        int res;
        res = (a + b) << (c - d);
        return res;
    }

is compiled into

    00000000 <f>:
    <f+0>:  add %o0, %o1, %g1
    <f+4>:  sub %o2, %o3, %o1
    <f+8>:  retl
    <f+C>:  sll %g1, %o1, %o0

and the statically analyzed execution time of 3 ms is back-annotated into the source as delay(3, ms);.

Important
- Requires an accurate relation between source code and binary code
- Run-time models for branch prediction and caching have to be incorporated
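As a concrete (hypothetical) illustration of the back-annotation step, the slide's function can be instrumented as follows: the original functionality is compiled natively for the simulation host, while delay() is reduced to a stub that accumulates simulated time.

```cpp
#include <cassert>

// Hypothetical sketch of the back-annotation result: the function keeps
// its original behavior, and delay() accumulates the statically analyzed
// basic block time (here the slide's 3 ms figure) into a global clock.
static long simulated_time_ms = 0;
static void delay(long ms) { simulated_time_ms += ms; }

int f(int a, int b, int c, int d) {
    delay(3);                   // back-annotated basic block time
    return (a + b) << (c - d);  // original source-level functionality
}
```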
Slide 11: Combined Source-Level Simulation and Target Code Analysis: State of the Art
- Schnerr, Bringmann et al. [DAC 2008]
  - static pipeline analysis to obtain basic block execution times
  - instrumentation code to determine cache misses dynamically
  - no compiler optimizations
- Wang, Herkersdorf [DAC 2009]; Bouchhima et al. [ASP-DAC 2009]; Gao, Leupers et al. [CODES+ISSS 2009]
  - use a modified compiler backend to emit annotated "source code"
  - support compiler optimizations, as the binary code and the annotated source have the same structure
- Lin, Lo, Tsay [ASP-DAC 2010]
  - very similar to the approach of [DAC 2008]
  - claims to support compiler optimizations, but gives no details
- Castillo, Villar et al. [GLSVLSI 2010]
  - improves the cache simulation method of [DAC 2008]
  - supports compiler optimizations without control-flow changes
Slide 12: Timing Instrumentation and Platform Integration

Cycle calculation functions
- Use an architectural model of the processor (cache model, branch prediction model, updated and adjusted by the virtual hardware) for the cycle calculation

Instrumented basic block (schematically):

    delay(cycleCalculationICache(iStart, iEnd));   // cache analysis blocks of the basic block
    ... C code corresponding to the basic block ...
    delay(statically predicted number of cycles);
    delay(cycleCalculationForConditionalBranch());
    consume(cycles collected with delay);          // synchronization, e.g., at an I/O access

Function delay
- used for fine-granular accumulation of time

Function consume
- synchronizes the virtual prototype (VP) with respect to the accumulated delays

Usage of the Loosely Timed (LT) modeling approach
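The division of labor between the two instrumentation functions can be sketched in plain C++; the global counters are invented stand-ins for the simulated-time bookkeeping of the virtual prototype.

```cpp
#include <cassert>

// Hypothetical sketch of the two instrumentation functions: delay()
// accumulates cycles at fine granularity inside basic blocks, while
// consume() synchronizes the virtual platform (modelled here as just
// advancing a global clock) and resets the accumulator, e.g. before
// an I/O access.
static long accumulated_cycles = 0;
static long platform_time      = 0;

void delay(long cycles) { accumulated_cycles += cycles; }

void consume() {                          // VP synchronization point
    platform_time += accumulated_cycles;  // apply the collected delays
    accumulated_cycles = 0;
}
```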
Slide 13: Compiler Optimizations and the Relation between Source Code and Binary Code
- Dead code elimination
  - binary-level control flow gets simpler
  - no real problem for back-annotation
- Moving code (e.g., loop-invariant code motion)
  - does not necessarily modify binary-level control flow
  - blurs the relation between binary-level and source-level basic blocks
- Loop unrolling
  - complete unrolling is simple (annotate the delays in front of the loop)
  - partial unrolling requires dynamic delay compensation
- Function inlining
  - may induce radical changes in the control flow graph
  - introduces ambiguity, as several binary-level basic blocks reference identical source locations
- Complex loop optimizations
  - the basic block structure may change completely (loop unswitching)
  - the execution frequency of basic blocks may change due to transformation of the iteration space (loop skewing)
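The partial-unrolling case illustrates why a fixed per-iteration delay fails: the binary executes one unrolled block per group of iterations plus a remainder loop, so the annotation must compute the delay from the dynamic trip count. A sketch with invented cycle numbers (assuming an unroll factor of 4):

```cpp
#include <cassert>

// Hypothetical sketch of dynamic delay compensation for partial loop
// unrolling: one binary-level block covers 4 source iterations, and a
// remainder loop handles leftover iterations, so the source-level
// delay must be derived from the dynamic trip count n.
long unrolled_loop_cycles(long n) {
    const long unrolled_block_cycles = 10; // one 4-iteration binary block
    const long remainder_iter_cycles = 4;  // one leftover iteration
    return (n / 4) * unrolled_block_cycles + (n % 4) * remainder_iter_cycles;
}
```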
Slide 14: Effects of Compiler Optimizations
[Figure: source-level code transformations: loop transformation, loop-invariant code motion, function inlining.]

Slide 15: Effects of Compiler Optimizations
[Figure: binary code generation after loop transformation, loop-invariant code motion, function inlining, and loop unrolling.]
Slide 16: Using Debug Information to Relate Source Code and Optimized Binary Code
- Compilers usually do not generate accurate debug information for optimized code
- The structure of the source code and the binary code can be completely different: there is no 1:1 relation between source-level and binary-level basic blocks, so simply annotating delay attributes does not work
- To perform an accurate source-level simulation without modifying the compiler:
  - the relation between source code and binary code must be reconstructed from the debug information
  - the binary-level control flow must be approximated during source-level simulation

[Figure: debug information relates the source-level CFG to the binary-level CFG.]
Slide 17: Removing Ambiguous Debug Information
- Construct the dominator homomorphism relation
- Remaining ambiguities (caused by multiple function inlining) can be resolved dynamically using path simulation code

[Figure: code motion created an additional reference to the same source line; the hoisted code is always executed before the loop body.]
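Dominance is the key ingredient of such a disambiguation: if one binary block carrying a given source reference dominates another with the same reference (e.g. code hoisted in front of a loop), the two can be told apart structurally. As a sketch, here is the classic iterative bit-set dominator computation on a small CFG; the encoding (bit i set = block i is a dominator) is for illustration only, not the paper's algorithm.

```cpp
#include <cassert>
#include <vector>

// Iterative dominator computation: dom[b] is a bit set of the blocks
// that dominate b; block 0 is the entry. The fixed point of
// dom[b] = {b} union intersection(dom[p] for predecessors p).
std::vector<unsigned> dominators(const std::vector<std::vector<int>>& preds) {
    const int n = (int)preds.size();
    const unsigned all = (1u << n) - 1;
    std::vector<unsigned> dom(n, all);
    dom[0] = 1u;                             // entry dominates only itself
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < n; ++b) {
            unsigned d = all;
            for (int p : preds[b]) d &= dom[p];  // meet over predecessors
            d |= (1u << b);                      // a block dominates itself
            if (d != dom[b]) { dom[b] = d; changed = true; }
        }
    }
    return dom;
}
```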
Slide 18: Generating Annotated Source Code
- Reconstruct line references
- Low-level analysis
  - analyze basic block execution times using the proven commercial tool AbsInt aiT
- Instrumentation and back-annotation
  - add reference markers to the original source code
  - generate path simulation code to determine the binary control flow dynamically
  - the path simulation code simulates the execution through the binary-level control flow graph
- Control flow reconstruction allows:
  - precise consideration of branch penalties; a branch prediction model can be included
  - not matching all basic blocks to a source-level statement without losing information

[Figure: binary-level basic blocks at addresses 0x8000, 0x800C, 0x8010, 0x801C.]
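The idea of path simulation code can be sketched as a cursor walking the binary-level CFG: at each source-level branch decision, the cursor follows the matching binary edge and accumulates the statically analyzed block cycles, so branch penalties can be charged on the actual taken/not-taken transitions. The BinBlock/PathSim names and cycle numbers are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of path simulation code: walk the binary-level
// CFG alongside the source-level execution, summing per-block cycles.
struct BinBlock { long cycles; int taken; int not_taken; };  // -1 = exit

struct PathSim {
    const std::vector<BinBlock>& cfg;  // binary-level CFG (block 0 = entry)
    int  cur;                          // current binary-level block
    long total;                        // accumulated cycles on this path
    explicit PathSim(const std::vector<BinBlock>& g)
        : cfg(g), cur(0), total(g[0].cycles) {}
    void branch(bool taken) {
        cur = taken ? cfg[cur].taken : cfg[cur].not_taken;
        if (cur >= 0) total += cfg[cur].cycles;  // charge the successor block
    }
};
```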
Slide 19
- The instrumented source code provides the functionality
- The reconstruction considers the structure of both the source code and the binary
- Arbitrary properties can be simulated: timing, memory accesses, power, ...
Slide 20: Results
Slide 21: Application Example: Traffic Sign Recognition

Processing chain: image capturing, preprocessing, region detection (segmentation, circle detection), feature extraction, classification, display; the data flow carries images, detected traffic signs, and traffic information.

[Figure: the ADAS function (traffic sign recognition) as a functional network of CAMERA, RECOGNIZE, CLASSIFY, and DISPLAY nodes, each attached via a FlexRay controller to a FlexRay bus (TLM); the functional network is mapped onto a hardware architecture and realized as virtual and real prototypes.]
Slide 22: Application Example: Traffic Sign Recognition
Slide 23: Conclusion
- Timing analysis for embedded software considering the target software implementation and the influences of the underlying hardware
- Fast and accurate solution by combining the advantages of formal analysis and simulation
- Timing relations are annotated to the original source code even though code optimizations have been applied
- Effects of branch prediction and basic block interleaving are easily supported by considering the basic block transitions in the target code
- TLM2 platform modeling provides efficient simulation with late timing corrections using the TLM2 resource model
- The TLM2 resource model controls the synchronization of temporally decoupled platform models
- Cache accesses are performed optimistically and corrected afterwards (only timing corrections have to be applied; data corrections are not needed)
- Simulation performance is quite similar to native execution of the pure software functionality on the simulation host
- Highly scalable in terms of the number of processors/processor cores