Optimizing Stream Programs Using Linear State Space Analysis
Sitij Agrawal (1,2), William Thies (1), and Saman Amarasinghe (1)
(1) Massachusetts Institute of Technology
(2) Sandbridge Technologies
CASES 2005
http://cag.lcs.mit.edu/streamit

Streaming Application Domain
• Based on a stream of data
  – Graphics, multimedia, software radio
  – Radar tracking, microphone arrays, HDTV editing, cell phone base stations
• Properties of stream programs
  – Regular and repeating computation
  – Parallel, independent actors with explicit communication
  – Data items have short lifetimes
[Figure: stream graph. AtoD, Decode, duplicate splitter, three parallel LPF/HPF branches (LPF 1-3, HPF 1-3), roundrobin joiner, Encode, Transmit]

Conventional DSP Design Flow
• Spec. (data-flow diagram)
• Signal processing expert, in Matlab
  – Design the datapaths (no control flow)
  – DSP optimizations
• Coefficient tables
• Software engineer, in C and assembly
  – Rewrite the program
  – Architecture-specific optimizations (performance, power, code size)
• C/Assembly code

Ideal DSP Design Flow
• Application programmer
  – Application-level design
  – High-level program (dataflow + control)
• Compiler
  – DSP optimizations
  – Architecture-specific optimizations
• C/Assembly code
• Challenge: maintaining performance

The StreamIt Language
• Goals:
  – Provide a high-level stream programming model
  – Invent new compiler technology for streams
• Contributions:
  – Language design [CC ’02, PPoPP ’05]
  – Compiling to tiled architectures [ASPLOS ’02, ISCA ’04, Graphics Hardware ’05]
  – Cache-aware scheduling [LCTES ’03, LCTES ’05]
  – Domain-specific optimizations [PLDI ’03, CASES ’05]

Programming in StreamIt

    void->void pipeline FMRadio(int N, float lo, float hi) {
      add AtoD();
      add FMDemod();
      add splitjoin {
        split duplicate;
        for (int i=0; i<N; i++) {
          add pipeline {
            add LowPassFilter(lo + i*(hi - lo)/N);
            add HighPassFilter(lo + i*(hi - lo)/N);
          }
        }
        join roundrobin();
      }
      add Adder();
      add Speaker();
    }

[Figure: stream graph. AtoD, FMDemod, Duplicate, three parallel branches (LPF 1-3, HPF 1-3), RoundRobin, Adder, Speaker]

Example StreamIt Filter

    float->float filter LowPassButterWorth (float sampleRate, float cutoff) {
      float coeff;
      float x;
      init {
        coeff = calcCoeff(sampleRate, cutoff);
      }
      work peek 2 push 1 pop 1 {
        x = peek(0) + peek(1) + coeff * x;
        push(x);
        pop();
      }
    }

Focus: Linear State Space Filters
• Properties:
  1. Outputs are linear function of inputs and states
  2. New states are linear function of inputs and states
• Most common target of DSP optimizations
  – FIR / IIR filters
  – Linear difference equations
  – Upsamplers / downsamplers
  – DCTs

Representing State Space Filters
• A state space filter is a tuple 〈A, B, C, D〉 with inputs $u$, states $x$, and outputs $y$:
  – $x' = Ax + Bu$
  – $y = Cx + Du$
• Example filter:

    float->float filter IIR {
      float x1, x2;
      work push 1 pop 1 {
        float u = pop();
        push(2*(x1+x2+u));
        x1 = 0.9*x1 + 0.3*u;
        x2 = 0.9*x2 + 0.2*u;
      }
    }

• Obtained from the code by linear dataflow analysis:
  – $A = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix}$, $B = \begin{bmatrix} 0.3 \\ 0.2 \end{bmatrix}$
  – $C = \begin{bmatrix} 2 & 2 \end{bmatrix}$, $D = \begin{bmatrix} 2 \end{bmatrix}$
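The correspondence between the IIR work function and its 〈A, B, C, D〉 tuple can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `iir_direct` and `state_space_run` are hypothetical, and the states are assumed to start at zero.

```python
def iir_direct(inputs):
    """Mirror the IIR work function: one output per input item."""
    x1 = x2 = 0.0  # assumption: states start at zero
    out = []
    for u in inputs:
        out.append(2 * (x1 + x2 + u))
        x1 = 0.9 * x1 + 0.3 * u
        x2 = 0.9 * x2 + 0.2 * u
    return out


def state_space_run(A, B, C, D, inputs):
    """Single-input, single-output form of x' = Ax + Bu, y = Cx + Du.

    A is a list of rows, B and C are flat lists, D is a scalar.
    """
    n = len(A)
    x = [0.0] * n  # assumption: states start at zero
    out = []
    for u in inputs:
        # Output uses the current state, then the state is updated.
        out.append(sum(C[j] * x[j] for j in range(n)) + D * u)
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
    return out


A = [[0.9, 0.0], [0.0, 0.9]]
B = [0.3, 0.2]
C = [2.0, 2.0]
D = 2.0

inputs = [1.0, 0.5, -0.25, 0.0, 2.0]
direct = iir_direct(inputs)
matrix = state_space_run(A, B, C, D, inputs)
max_diff = max(abs(a - b) for a, b in zip(direct, matrix))
print(max_diff)  # expected to be at the level of floating-point rounding
```

Both versions compute the output from the current state before updating it, which matches the order of `push` and the state assignments in the work function.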

State Space Optimizations
1. State removal
2. Reducing the number of parameters
3. Combining adjacent filters

Change-of-Basis Transformation
• Start from the original filter:
  – $x' = Ax + Bu$
  – $y = Cx + Du$
• Let $T$ be an invertible matrix and multiply the state equation by $T$:
  – $Tx' = TAx + TBu$
  – $y = Cx + Du$
• Insert $T^{-1}T$ in front of $x$:
  – $Tx' = TA(T^{-1}T)x + TBu$
  – $y = C(T^{-1}T)x + Du$
• Regroup:
  – $Tx' = TAT^{-1}(Tx) + TBu$
  – $y = CT^{-1}(Tx) + Du$
• Substitute $z = Tx$:
  – $z' = TAT^{-1}z + TBu$
  – $y = CT^{-1}z + Du$
• With $A' = TAT^{-1}$, $B' = TB$, $C' = CT^{-1}$, $D' = D$:
  – $z' = A'z + B'u$
  – $y = C'z + D'u$
• Can map original states $x$ to transformed states $z = Tx$ without changing I/O behavior
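The claim that a change of basis leaves I/O behavior unchanged can be illustrated on the IIR example. This is a minimal Python sketch under stated assumptions: the helpers `matmul`, `inv2`, `change_basis`, and `run` are hypothetical names, the matrix `T` below is an arbitrary invertible choice (not from the slides), and the states start at zero so that $z = Tx$ also starts at zero.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inv2(T):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = T
    det = a * d - b * c
    if det == 0:
        raise ValueError("T must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def change_basis(A, B, C, D, T):
    """Return (TAT^-1, TB, CT^-1, D) for a two-state filter."""
    T_inv = inv2(T)
    return matmul(matmul(T, A), T_inv), matmul(T, B), matmul(C, T_inv), D


def run(A, B, C, D, inputs):
    """Simulate x' = Ax + Bu, y = Cx + Du with zero initial state."""
    x = [[0.0] for _ in A]
    out = []
    for u in inputs:
        U = [[u]]
        out.append(matmul(C, x)[0][0] + matmul(D, U)[0][0])
        Ax, Bu = matmul(A, x), matmul(B, U)
        x = [[Ax[i][0] + Bu[i][0]] for i in range(len(A))]
    return out


A = [[0.9, 0.0], [0.0, 0.9]]
B = [[0.3], [0.2]]
C = [[2.0, 2.0]]
D = [[2.0]]
T = [[2.0, 1.0], [1.0, 1.0]]  # arbitrary invertible matrix (determinant 1)

inputs = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
original = run(A, B, C, D, inputs)
transformed = run(*change_basis(A, B, C, D, T), inputs)
max_diff = max(abs(a - b) for a, b in zip(original, transformed))
print(max_diff)  # expected to be at the level of floating-point rounding
```

If the original filter had a nonzero initial state $x_0$, the transformed filter would have to start from $Tx_0$ for the outputs to agree.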

1) State Removal
• Can remove states which are:
  a. Unreachable – do not depend on input
  b. Unobservable – do not affect output
• To expose unreachable states, reduce $[A \mid B]$ to a kind of row-echelon form
  – For unobservable states, reduce $[A^T \mid C^T]$
• Automatically finds minimal number of states

State Removal Example
• Original filter (9 FLOPs, 12 load/store per output):
  – $x' = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix} x + \begin{bmatrix} 0.3 \\ 0.2 \end{bmatrix} u$
  – $y = \begin{bmatrix} 2 & 2 \end{bmatrix} x + 2u$

    float->float filter IIR {
      float x1, x2;
      work push 1 pop 1 {
        float u = pop();
        push(2*(x1+x2+u));
        x1 = 0.9*x1 + 0.3*u;
        x2 = 0.9*x2 + 0.2*u;
      }
    }

• Change of basis with $T = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$:
  – $x' = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix} x + \begin{bmatrix} 0.3 \\ 0.5 \end{bmatrix} u$
  – $y = \begin{bmatrix} 0 & 2 \end{bmatrix} x + 2u$
  – x1 is unobservable
• After removing x1 (5 FLOPs, 8 load/store per output):
  – $x' = 0.9x + 0.5u$
  – $y = 2x + 2u$

    float->float filter IIR {
      float x;
      work push 1 pop 1 {
        float u = pop();
        push(2*(x+u));
        x = 0.9*x + 0.5*u;
      }
    }
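The state removal example can be replayed numerically. This is a minimal Python sketch rather than the compiler's algorithm: it applies the given $T$ by hand, checks that the first entry of $C' = CT^{-1}$ is zero (so the first transformed state is unobservable), and compares the two-state and one-state filters. The function names `original` and `reduced` are hypothetical, and zero initial states are assumed. Dropping the state is safe here because $A' = 0.9I$, so the remaining state never reads the removed one.

```python
def original(inputs):
    """Two-state IIR filter from the slide."""
    x1 = x2 = 0.0
    out = []
    for u in inputs:
        out.append(2 * (x1 + x2 + u))
        x1 = 0.9 * x1 + 0.3 * u
        x2 = 0.9 * x2 + 0.2 * u
    return out


def reduced(inputs):
    """One-state filter obtained after removing the unobservable state."""
    x = 0.0
    out = []
    for u in inputs:
        out.append(2 * (x + u))
        x = 0.9 * x + 0.5 * u
    return out


B = [0.3, 0.2]
C = [2.0, 2.0]
T = [[1.0, 0.0], [1.0, 1.0]]
T_inv = [[1.0, 0.0], [-1.0, 1.0]]  # inverse of T, worked out by hand

# B' = T B and C' = C T^-1
B_new = [sum(T[i][k] * B[k] for k in range(2)) for i in range(2)]
C_new = [sum(C[k] * T_inv[k][j] for k in range(2)) for j in range(2)]

inputs = [1.0, 0.0, -1.0, 2.5, 0.25, 0.0, 0.0]
full = original(inputs)
small = reduced(inputs)
max_diff = max(abs(a - b) for a, b in zip(full, small))
print(B_new, C_new, max_diff)
```

The surviving state equals x1 + x2 of the original filter, which is why its input coefficient is 0.3 + 0.2 = 0.5.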

2) Parameter Reduction
• Goal: Convert matrix entries (parameters) to 0 or 1
• Allows static evaluation:
  – 1*x → x: eliminate 1 multiply
  – 0*x + y → y: eliminate 1 multiply, 1 add
• Algorithm (Ackerman & Bucy, 1971)
  – Also reduces matrices $[A \mid B]$ and $[A^T \mid C^T]$
  – Attains a canonical form with few parameters

Parameter Reduction Example
• Original filter (6 FLOPs per output):
  – $x' = 0.9x + 0.5u$
  – $y = 2x + 2u$
• After change of basis with $T = 2$ (4 FLOPs per output):
  – $x' = 0.9x + 1u$
  – $y = 1x + 2u$
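For a one-state filter the change of basis reduces to a scalar rescaling, which makes the parameter reduction example easy to verify. This is a minimal Python sketch: `rescale` and `run_scalar` are hypothetical helper names, and it shows only the single transformation $T = 2$ from the example, not the general canonical-form algorithm.

```python
def run_scalar(a, b, c, d, inputs):
    """Simulate x' = a*x + b*u, y = c*x + d*u with zero initial state."""
    x = 0.0
    out = []
    for u in inputs:
        out.append(c * x + d * u)
        x = a * x + b * u
    return out


def rescale(a, b, c, d, t):
    """Scalar change of basis z = t*x: A' = t*a/t = a, B' = t*b, C' = c/t, D' = d."""
    if t == 0:
        raise ValueError("t must be nonzero")
    return a, t * b, c / t, d


a, b, c, d = 0.9, 0.5, 2.0, 2.0
a2, b2, c2, d2 = rescale(a, b, c, d, 2.0)

inputs = [1.0, 2.0, -0.5, 0.0, 4.0]
before = run_scalar(a, b, c, d, inputs)
after = run_scalar(a2, b2, c2, d2, inputs)
max_diff = max(abs(p - q) for p, q in zip(before, after))
print((a2, b2, c2, d2), max_diff)
```

With B' and C' both equal to 1, the multiplies `1*u` and `1*x` can be evaluated statically, which accounts for the drop from 6 to 4 FLOPs.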

3) Combining Adjacent Filters
• Filter 1 (input u, output y): $y = D_1 u$
• Filter 2 (input y, output z): $z = D_2 y$
• Combined filter (input u, output z): $z = D_2 D_1 u = Eu$, with $E = D_2 D_1$

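3) Combining Adjacent Filters
• Combined filter for Filter 1 (input u, output y) followed by Filter 2 (input y, output z):
  – $x' = \begin{bmatrix} A_1 & 0 \\ B_2 C_1 & A_2 \end{bmatrix} x + \begin{bmatrix} B_1 \\ B_2 D_1 \end{bmatrix} u$
  – $z = \begin{bmatrix} D_2 C_1 & C_2 \end{bmatrix} x + D_2 D_1 u$
• Also in paper:
  – combination of parallel streams
  – combination of feedback loops
  – expansion of mis-matching filters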
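The combined-filter formula can be checked by comparing a two-filter pipeline against the single combined filter. This is a minimal Python sketch with hypothetical helpers (`matmul`, `hstack`, `zeros`, `combine`, `run`). It assumes both filters push one item and pop one item per firing, so no rate expansion is needed, and that all states start at zero. The choice of the two-state IIR filter followed by the one-state filter from the earlier examples is only for illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def hstack(X, Y):
    """Place Y to the right of X (same number of rows)."""
    return [rx + ry for rx, ry in zip(X, Y)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def combine(f1, f2):
    """Combine filter 1 followed by filter 2 into one state space filter."""
    A1, B1, C1, D1 = f1
    A2, B2, C2, D2 = f2
    n1, n2 = len(A1), len(A2)
    A = hstack(A1, zeros(n1, n2)) + hstack(matmul(B2, C1), A2)
    B = B1 + matmul(B2, D1)          # [B1; B2 D1], stacked by rows
    C = hstack(matmul(D2, C1), C2)   # [D2 C1, C2]
    D = matmul(D2, D1)
    return A, B, C, D


def run(f, inputs):
    """Simulate x' = Ax + Bu, y = Cx + Du with zero initial state."""
    A, B, C, D = f
    x = [[0.0] for _ in A]
    out = []
    for u in inputs:
        U = [[u]]
        out.append(matmul(C, x)[0][0] + matmul(D, U)[0][0])
        Ax, Bu = matmul(A, x), matmul(B, U)
        x = [[Ax[i][0] + Bu[i][0]] for i in range(len(A))]
    return out


filter1 = ([[0.9, 0.0], [0.0, 0.9]], [[0.3], [0.2]], [[2.0, 2.0]], [[2.0]])
filter2 = ([[0.9]], [[0.5]], [[2.0]], [[2.0]])

inputs = [1.0, -1.0, 0.5, 2.0, 0.0, 0.0, 3.0]
pipelined = run(filter2, run(filter1, inputs))
combined = run(combine(filter1, filter2), inputs)
max_diff = max(abs(p - q) for p, q in zip(pipelined, combined))
print(max_diff)  # expected to be at the level of floating-point rounding
```

The combined filter keeps all three states; applying state removal afterwards is a separate step.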
