Latency-preserving software pipelining of predicated reservation tables for distributed hard real-time applications
SLIDE 1

14/12/11

Latency-preserving software pipelining of predicated reservation tables for distributed hard real-time applications

Thomas Carle and Dumitru Potop-Butucaru, INRIA Paris-Rocquencourt, France (Team AOSTE)

SLIDE 2

  • Throughput optimization problem
  • Previous work (software pipelining)
  • System models (pipelined and non-pipelined)
  • Pipelining algorithms
  • A complex example
  • Conclusion and future work

Outline

SLIDE 3

Complex embedded control applications:

  • Cyclic, periodic execution
  • Safety-critical applications
  • Hard real-time constraints
  • Focus on functional and temporal correctness
  • Distributed implementations

Application areas

SLIDE 4

Our work focuses on static schedules (scheduling/reservation tables):

  • Validated by industrial standards: ARINC 653, AUTOSAR, FlexRay, ...
  • A table defines one cycle of execution, repeated periodically

Static scheduling

SLIDE 5

Code:

  v2 := v2_init;
  loop
    (v1, c) = f(v2);
    if c then v2 := g(v1) else m(v1);
    h(v2);
  end

[Figure: processors P1, P2, P3 sharing a RAM holding c, v1, v2; durations f: 1, g: 1, m: 2, h: 1; Gantt chart (time vs. resource) of one computation cycle scheduling f, if(c) g, if(¬c) m, and h.]

Motivating example
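The cyclic code above can be sketched in Python; the bodies of f, g, m and h below are stubs (assumptions), since the slides only fix the control structure and the durations:

```python
# Python sketch of the motivating example's loop.  The stub bodies are
# assumptions; the slides only give the control structure.

def f(v2):
    # Produces a value and a condition from v2 (stub).
    return v2 + 1, v2 % 2 == 0

def g(v1):
    return v1 * 2   # stub

def m(v1):
    pass            # side effect only; stub

def h(v2):
    return v2       # stub

def run_cycles(v2_init, n_cycles):
    """Run n_cycles iterations of: (v1,c)=f(v2); if c: v2=g(v1) else m(v1); h(v2)."""
    v2 = v2_init
    trace = []
    for _ in range(n_cycles):
        v1, c = f(v2)
        if c:
            v2 = g(v1)
        else:
            m(v1)
        h(v2)
        trace.append(v2)
    return trace
```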

SLIDE 6

[Figure: Gantt chart of repeated non-pipelined computation cycles on P1, P2, P3; each cycle runs f, if(c) g, if(¬c) m, h, and the end-to-end latency equals throughput⁻¹.]

Latency: number of time units between the beginning and the end of the execution of a cycle.
Throughput: number of cycles executed in one time unit.

Motivating example
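With illustrative numbers (assumptions, not taken from the slides), the relationship between latency, period and throughput works out as:

```python
# Illustrative numbers (assumptions): one cycle has end-to-end latency 4.
latency = 4

# Non-pipelined execution: a cycle starts only after the previous one
# ends, so the period equals the latency and throughput^-1 = latency.
period_non_pipelined = latency
throughput_non_pipelined = 1 / period_non_pipelined

# Pipelined execution: the kernel repeats every 2 time units while the
# end-to-end latency of each cycle stays at 4.
period_pipelined = 2
throughput_pipelined = 1 / period_pipelined
```

Throughput doubles while the latency of each cycle is preserved, which is the whole point of the approach.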

SLIDE 7

[Figure: pipelined Gantt chart with a prologue followed by a steady state; successive cycles (conditions c0, c1, c2, ...) overlap, so throughput⁻¹ becomes less than or equal to the latency while the end-to-end latency itself is unchanged.]

Goal: increase throughput while keeping the system's latency, I/O function, and periodic behaviour.

Our objective

SLIDE 8

[Figure: the same pipelined Gantt chart, with the kernel highlighted inside the steady state.]

The prologue and the steady state are instances of the kernel.

Our objective

SLIDE 9

Scheduling techniques for parallelizing loop computations:

  • Developed since the 1980s,
  • First aimed at massively parallel architectures such as VLIW and superscalar machines,
  • Now a common optimization, present in most compilers,
  • Similar to hardware pipelining: out-of-order execution,
  • Reordering done in the compiler instead of in the processor.

Previous work (1): Software Pipelining

SLIDE 10

  • Low-level vs. coarse-grain code generation technique,
  • Goal: optimize average-case throughput by reordering operations to exploit parallelism vs. optimize worst-case throughput without degrading cycle latency by preserving the intra-cycle schedule,
  • No periodicity for applications with data-dependent control vs. preservation of the periodic behaviour of the application,
  • Low degree of control over operators/functional units for conditional execution vs. exploitation of conditional execution to improve the pipelining process.

Software Pipelining vs our work

SLIDE 11

  • Optimization method in which registers in a synchronous circuit are relocated in order to improve the throughput or memory consumption of an application,
  • Very similar to our techniques: e.g. no increase in latency after applying the retiming transformations, preservation of the I/O function,
  • Nevertheless: no support for conditional execution/predication.

Previous work (2): Retiming

SLIDE 12

  • Builds a pipelined schedule for the application,
  • Demonstrated on e.g. multimedia streaming applications,
  • Again, no optimization for conditional execution.

Previous work (3): Real-Time Software Pipelining

SLIDE 13

[Diagram: the algorithms transform an initial non-pipelined scheduling table into a pipelined scheduling table, guided by an architecture model.]

We design low-level implementation models that can be integrated at the end of the development cycle.

Elements of our approach

SLIDE 14

Architecture model

Bipartite undirected graph A = <P, M, C>, where:

  • P: "processors", i.e. computation and communication resources capable of independent execution (processors, DMAs, ...),
  • M: RAM blocks,
  • (P, M) ∈ C indicates that processor P has direct access to memory block M.

RAM blocks are sets of disjoint untyped memory cells.

Example: [Figure: processors P1, P2, P3, all connected to one RAM block holding the cells v1, v2, c.]
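A minimal Python encoding of this model (field and type names are illustrative, not from the slides):

```python
# Minimal encoding of the architecture model A = <P, M, C> as a
# bipartite undirected graph.
from dataclasses import dataclass, field

@dataclass
class Architecture:
    processors: set          # P: computation/communication resources
    rams: dict               # M: RAM block name -> set of memory cells
    connections: set = field(default_factory=set)  # C, pairs (P, M)

    def can_access(self, p, m):
        """True iff processor p has direct access to RAM block m."""
        return (p, m) in self.connections

# The example from the slide: P1, P2, P3 all share one RAM block
# holding the cells v1, v2 and c.
arch = Architecture(
    processors={"P1", "P2", "P3"},
    rams={"RAM": {"v1", "v2", "c"}},
    connections={("P1", "RAM"), ("P2", "RAM"), ("P3", "RAM")},
)
```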

SLIDE 15

S = <p, O, Init>, where:

  • p: activation period of the execution cycles, equal to the length of the reservation table,
  • O: set of scheduled operations,
  • Init: set of initial values of all memory cells (each can be nil or a constant).

Reservation/Scheduling table

SLIDE 16

Scheduled operation o:

  • In(o): set of memory cells whose data is used as input by o,
  • Out(o): set of memory cells written by o,
  • Guard(o): execution condition of o (a predicate over memory cells),
  • Res(o): set of "processors" used during the execution of o,
  • t(o): start date of o,
  • d(o): duration of o, a maximum time budget which can be ensured through WCET analysis.
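One way to encode such an operation record in Python (field names mirror the slide; the example values are illustrative):

```python
# A scheduled operation o, with the fields listed on the slide.
from dataclasses import dataclass

@dataclass
class Operation:
    name: str
    inputs: frozenset      # In(o): memory cells read
    outputs: frozenset     # Out(o): memory cells written
    guard: str             # Guard(o): predicate over memory cells
    resources: frozenset   # Res(o): "processors" used
    t: int                 # start date
    d: int                 # duration (WCET-backed time budget)

    def end(self):
        """Date at which o's time budget expires."""
        return self.t + self.d

# "g" from the motivating example (placement is an assumption): runs on
# P2 under guard c, reads v1 and writes v2, start date 1, duration 1.
op_g = Operation("g", frozenset({"v1"}), frozenset({"v2"}),
                 "c", frozenset({"P2"}), t=1, d=1)
```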

Reservation/Scheduling table

SLIDE 17

Reservation/Scheduling table

Well-formedness properties:

  • Exclusive resource use: for O1, O2 scheduled on the same resource, if Guard(O1) ∧ Guard(O2) ≠ false, then t(O1) ≥ t(O2) + d(O2) or t(O2) ≥ t(O1) + d(O1),
  • No data races: if O1 writes variable v1 and O2 uses (reads or writes) v1, then t(O1) ≥ t(O2) + d(O2) or t(O2) ≥ t(O1) + d(O1), or Guard(O1) ∧ Guard(O2) = false,
  • Causal correctness.

This is enough to describe non-pipelined schedules.
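The two timing-related checks can be sketched in Python; the guard representation and the `exclusive` predicate are assumptions for the sketch (only the c / ¬c pair is handled):

```python
# Sketch of the well-formedness checks, for operations given as dicts
# {"t", "d", "res", "writes", "uses", "guard"}.

def exclusive(g1, g2):
    """Whether g1 AND g2 = false.  Hard-coded c / not-c pair (assumption)."""
    return {g1, g2} == {"c", "not c"}

def disjoint_in_time(o1, o2):
    return o1["t"] >= o2["t"] + o2["d"] or o2["t"] >= o1["t"] + o1["d"]

def well_formed(ops):
    for i, o1 in enumerate(ops):
        for o2 in ops[i + 1:]:
            if exclusive(o1["guard"], o2["guard"]):
                continue  # never both active in the same cycle
            # Exclusive resource use.
            if o1["res"] & o2["res"] and not disjoint_in_time(o1, o2):
                return False
            # No data races: a write overlapping any use of the same cell.
            races = (o1["writes"] & (o2["writes"] | o2["uses"]) or
                     o2["writes"] & (o1["writes"] | o1["uses"]))
            if races and not disjoint_in_time(o1, o2):
                return False
    return True

# g and m from the motivating example, placed on the same resource (an
# assumption): they overlap in time, but guards c / not-c make them
# exclusive, so the table is still well formed.
g = {"t": 1, "d": 1, "res": {"P2"}, "writes": {"v2"}, "uses": {"v1"}, "guard": "c"}
m = {"t": 1, "d": 2, "res": {"P2"}, "writes": set(), "uses": {"v1"}, "guard": "not c"}
```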

SLIDE 18

For pipelined schedules, each scheduled operation o also has a start index fst(o). It accounts for the prologue phase, where operations progressively start to execute: if operation o has fst(o) = n, it is first executed in the pipelined cycle of index n. Due to periodicity, describing the kernel schedule with its start indexes is enough to describe the whole execution of the system.

Pipelined Reservation/Scheduling table

Memory elements can be modified to take into account the variable replication process (described later).

SLIDE 19

[Figure: pipelined Gantt chart on P1, P2, P3 over time 1–5, showing the prologue and steady state; successive iterations run f, if(Ci) g, if(¬Ci) m, h under conditions C0, C1, C2.]

Pipelined Reservation/Scheduling table

SLIDE 20

[Figure: the pipelined table on P1, P2, P3, with pipelined iterations 0, 1 and 2 unrolled; h and the operations moved past the cycle boundary are annotated fst = 1.]

Pipelined Reservation/Scheduling table

Resource Time

SLIDE 21

Pipelining algorithm

  • Constraints:
      • need to respect inter-cycle data dependencies,
      • no two operations can use a "processor" at the same time,
      • no memory cell can be written by an operation and used (written or read) by another at the same time.
  • Our algorithm:
      • enforces the fulfilment of these constraints,
      • incrementally builds the Data Dependency Graph of the application,
      • takes advantage of guards during pipelining (better than existing work),
      • uses specific memory handling.
SLIDE 22

Pipelining algorithm

  • Relies on the incremental construction of the Data Dependency Graph (DDG) of the application, i.e. the set {(o1, o2, n)} of all triples such that In(o2) ∩ Out(o1) ≠ ∅ and o1 happens n cycles before o2,
  • Uses an SSA transformation before performing a symbolic execution of the different iterations in order to construct the DDG.
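The DDG construction can be sketched by symbolically executing a few unrolled iterations with last-writer tracking (a simplification: guards and the SSA renaming are ignored here, and the operation list is an assumption):

```python
# Sketch of DDG construction by symbolic execution over unrolled
# iterations: an arc (o1, o2, n) is recorded when o2 reads a cell
# whose last writer was o1, n cycles earlier.

def build_ddg(ops, n_iters=3):
    """ops: list of (name, reads, writes) in schedule order within a cycle."""
    last_writer = {}   # memory cell -> (op name, iteration index)
    ddg = set()
    for it in range(n_iters):
        for name, reads, writes in ops:
            for cell in reads:
                if cell in last_writer:
                    src, src_it = last_writer[cell]
                    ddg.add((src, name, it - src_it))
            for cell in writes:
                last_writer[cell] = (name, it)
    return ddg

# The motivating example: g reads v1 (written by f in the same cycle),
# h reads v2, and f reads the v2 written by g one cycle earlier.
ops = [("f", {"v2"}, {"v1", "c"}),
       ("g", {"v1"}, {"v2"}),
       ("h", {"v2"}, set())]
ddg = build_ddg(ops)
```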

SLIDE 23

[Figure: initial scheduling table over time 1–6 on P1–P4: c := ¬c on P1; if(c) v2 := f1(v1) / if(¬c) w2 := g1(w1) on P2; if(c) v3 := f2(v2) / if(¬c) w3 := g2(w2) on P3; if(c) v1 := f3(v3) / if(¬c) w1 := g3(w3) on P4.]

Pipelining algorithm


SLIDE 24

[Figure: the same table with a second iteration unrolled after SSA renaming of the condition: c1 := ¬c, followed by the if(c1) / if(¬c1) versions of f1/g1, f2/g2, f3/g3.]

Pipelining algorithm


SLIDE 25

[Figure: a third iteration unrolled with condition c2 := ¬c1.] The algorithm is complete: the first repetition is fully covered, so the symbolic unrolling can stop.

Pipelining algorithm


SLIDE 26

[Figure: the three unrolled iterations again, used to extract the inter-cycle dependency arcs.]

Pipelining algorithm

Make the repetition periodic:

  new_period = max over (o1, o2, n) ∈ DDG of (t(o1) + d(o1) - t(o2)) / n
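In Python, with the DDG as a set of (o1, o2, n) arcs: only inter-cycle arcs (n ≥ 1) constrain the period, since intra-cycle arcs (n = 0) are already respected by the table itself; rounding up to an integer date is an assumption of the sketch.

```python
import math

# Period computation from the slide: each inter-cycle arc (o1, o2, n)
# requires t(o2) + n * period >= t(o1) + d(o1), hence the max below.

def new_period(ddg, t, d):
    return max(math.ceil((t[o1] + d[o1] - t[o2]) / n)
               for (o1, o2, n) in ddg if n >= 1)
```

For instance, a single arc saying that f (starting at date 0) reads, one cycle later, what g (date 1, duration 1) wrote forces a period of at least 2.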

SLIDE 27

[Figure: the unrolled iterations re-timed with the computed period: new_period := 3.]

Pipelining algorithm


SLIDE 28

[Figure: the resulting kernel on P1–P4, with operations annotated by their guards (@c / @¬c), their start indexes fst = 0 or fst = 1, and the SSA indices folded back into ci, ci-1, ci-2.]

Pipelining algorithm

Build the kernel

SLIDE 29

[Figure: the initial cycle and the pipelined table side by side on P1, P2, P3; the kernel has size 2 and h carries fst = 1.]

Pipelined Reservation/Scheduling table construction

fst = ⌊old_start_date / new_period⌋
new_start_date = old_start_date - fst * new_period
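The two formulas translate directly to Python (floor division implements the floor):

```python
# Fold an operation's old start date into the kernel: fst counts the
# whole new periods elapsed before the operation first runs, and the
# remainder is its start date inside the kernel.

def fold(old_start_date, period):
    fst = old_start_date // period
    new_start_date = old_start_date - fst * period
    return fst, new_start_date
```

With a kernel of size 2, an operation that used to start at date 3 gets fst = 1 and kernel date 1, matching the h annotation in the figure.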

SLIDE 30

Memory aspects

[Figure: live zones of a variable in two successive cycles on the time axis; fst1/lst1 and fst2/lst2 mark its first and last use dates in cycles 1 and 2, showing the overlap that forces replication.]

SLIDE 31

Memory aspects

Problem: if a variable is used at the same time in two or more pipelined iterations, that variable must be replicated. Memory management is achieved the following way:

  • The replication factor rep(v) is computed for each variable v,
  • Each memory cell v of the initial non-pipelined scheduling table is replaced by rep(v) memory cells, allocated on the same memory block as v and assigned in a cyclic fashion to the successive computation cycles.
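The cyclic replica allocation can be sketched as follows (the replication factors and replica naming scheme are illustrative assumptions):

```python
# Sketch of the variable-replication step: rep(v) copies of each cell v
# are created, and successive computation cycles pick copies cyclically.

def replicate(cells_rep):
    """cells_rep: {cell: rep(v)} -> {cell: [replica names]}"""
    return {v: [f"{v}_{k}" for k in range(r)] for v, r in cells_rep.items()}

def replica_for_cycle(replicas, v, cycle):
    """Replica of v used by computation cycle `cycle` (cyclic allocation)."""
    copies = replicas[v]
    return copies[cycle % len(copies)]

# Illustrative factors: v2 is live across two overlapping iterations,
# c is not, so rep(v2) = 2 and rep(c) = 1.
replicas = replicate({"v2": 2, "c": 1})
```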

SLIDE 32

Knock controller example

Industrial case study: a function of the engine control unit for gasoline spark-ignition engines. Objective: choose the ignition time for each cylinder at each rotation in order to maximize power output while avoiding autoignition as much as possible.

High-level description of the control loop function:

[Diagram: Acquisition → Filtering → Detection → Correction pipeline, with delay (∆) elements feeding back the samples and the acquisition window position.]

SLIDE 33

Knock controller example

  • One acquisition device,
  • One microcontroller for filtering, detection, and correction,
  • Two buffers and their independent controllers.

[Figure: the acquisition device writes into BUF1/BUF2, which feed the microcontroller µC; the RAM holds buf1, buf2, the configurations Config0, Config1, Config2, and the condition c.]

SLIDE 34

Knock controller example

[Figure: non-pipelined reservation table over rotation units 1–5 on resources AD, BUF1, BUF2, µC: book, then if(c) Acq1 / if(¬c) Acq2 on the buffers, then if(c) FDC1 / if(¬c) FDC2 on the microcontroller.]

SLIDE 35

[Figure: the same table repeated over rotation units 1–8 for successive cycles with conditions c, c1, c2, ..., before pipelining.]

SLIDE 36

Knock controller example

[Figure: pipelined kernel over 2 rotation units: book and if(cn) Acq1 / if(¬cn) Acq2 for cycle n run on AD and the buffers, while if(cn-1) FDC1 / if(¬cn-1) FDC2 (fst = 1) runs on the microcontroller.]

SLIDE 37

Experimental results

We demonstrated our algorithms on four examples:

  • Embedded control application for the CyCab electric car: 27% reduction in cycle time,
  • Adaptive equalizer: 6% reduction,
  • Knock controller: 50% reduction,
  • Simple example: 66% reduction, with 100% resource usage.

SLIDE 38

Conclusion

  • Study of the state of the art in software pipelining
  • Definition of formal models
  • Definition of algorithms
  • Implementation in a prototype
  • Evaluation on case studies
SLIDE 39

Future work

  • Enhance memory management: we can forbid memory replication when it is not necessary, but we cannot yet limit it by prescribing memory sizes,
  • Exploit execution guards over partitioned architectures, possibly by using the n-synchronous formalism to express and exploit repetition patterns during the pipelining process,
  • Integrate pipelining into the initial scheduling process to obtain better trade-offs between latency/response time, throughput, and resource usage; we would like to do this on a software radio use case we are currently working on.