Linear Analysis and Optimization of Stream Programs
Andrew A. Lamb, William Thies, Saman Amarasinghe
The New Laboratory for Computer Science and Artificial Intelligence Massachusetts Institute of Technology
Streaming Application Domain
Based on audio, video, or data stream
Increasingly prevalent and important
Embedded systems
Cell phones, handheld computers
Desktop applications
Streaming media
Software radio
High-performance servers
Software routers (Example: Click)
Cell phone base stations
HDTV editing consoles
A large (possibly infinite) amount of data
Limited lifetime of each data item
Little processing of each data item
Computation: apply multiple filters to data
Each filter takes an input stream, does some processing, and produces an output stream
Filters are independent and self-contained
A regular, static computation pattern
Filter graph is relatively constant
A lot of opportunities for compiler optimizations
Goals:
Provide a High-Level Programming Paradigm
Improve Programmer Productivity
Match Performance of Hand-Hacked Assembly
Contributions
Language Design, Structured Streams, Buffer Management (CC 2002)
Exploiting Wire-Exposed Architectures (ASPLOS 2002)
Scheduling of Static Dataflow Graphs (LCTES 2003)
Domain Specific Optimizations (PLDI 2003)
Used in…
metal detector
garage door opener
spectrum analyzer
Source: Application Report SPRA414, Texas Instruments, 1999
[Figure: stream graph. A/D → Band pass → Duplicate → four parallel Detect → LED branches]
void->void pipeline FrequencyBand {
  float sFreq = 4000;
  float cFreq = 500/(sFreq*2*pi);
  float wFreq = 100/(sFreq*2*pi);
  add D2ASource(sFreq);
  add BandPassFilter(1, cFreq-wFreq, cFreq+wFreq, 100);
  add splitjoin {
    split duplicate;
    for (int i=0; i<4; i++) {
      add Detector(i/4);
      add LEDOutput(i);
    }
    join roundrobin(0);
  }
}
float->float pipeline BandPassFilter(float gain, float ws, float wp, int num) {
  add LowPassFilter(1, wp, num);
  add HighPassFilter(gain, ws, num);
}
[Figure: the Band pass node expands into Low pass → High pass]
float->float filter LowPassFilter(float g, float cFreq, int N) {
  float[N] h;
  init {
    int OFF = N/2;
    for (int i=0; i<N; i++) {
      h[i] = g*sin(…);
    }
  }
  work peek N pop 1 push 1 {
    float sum = 0;
    for (int i=0; i<N; i++) {
      sum += h[i]*peek(i);
    }
    push(sum);
    pop();
  }
}
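The rate declaration `work peek N pop 1 push 1` can be read as a sliding window over the input tape. The Python sketch below is only an illustration of that reading, not part of StreamIt: `run_fir` is a hypothetical helper, and the coefficients are placeholders because the slide elides the formula for `h[i]`.

```python
from collections import deque


def run_fir(h, samples):
    """Simulate a `peek N pop 1 push 1` work function on a finite input list."""
    n = len(h)
    tape = deque(samples)   # input tape
    out = []                # output tape
    # The filter may fire only while at least N items are available to peek.
    while len(tape) >= n:
        window = [tape[i] for i in range(n)]                # peek(0) .. peek(N-1)
        out.append(sum(c * x for c, x in zip(h, window)))   # push(sum)
        tape.popleft()                                      # pop()
    return out


# Placeholder coefficients (a two-tap average), not the slide's h[i].
outputs = run_fir([0.5, 0.5], [2.0, 4.0, 6.0, 8.0])
print(outputs)
```

Each firing consumes one item but inspects N, so successive windows overlap by N-1 items.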
Source: Application Report SPRA414, Texas Instruments, 1999
Signal Processing Expert in Matlab:
Design the Datapaths (no control flow)
DSP Optimizations
Coefficient Tables
Software Engineer in C and Assembly:
Rewrite the program
Architecture-specific Optimizations (performance, power, code size)
C/Assembly Code
Center frequency from 500 Hz to 1200 Hz?
According to TI, in the conventional design flow:
Redesign filter in MATLAB
Cut-and-paste values to EXCEL
Recalculate the coefficients
Update assembly
If using StreamIt:
Change one constant
Recompile
Application Programmer:
Application-Level Design
StreamIt Program (dataflow + control)
StreamIt compiler:
DSP Optimizations
Architecture-Specific Optimizations
C/Assembly Code
Benefits of programming in a
Modular
Composable
Portable
Malleable
The Challenge: Maintaining
Replacing Expert DSP Engineer
Replacing Expert Assembly Hacker
Most common target of DSP optimizations
FIR filters
Compressors
Expanders
DFT/DCT
Example optimizations:
Combining Adjacent Nodes
Translating to Frequency Domain
A linear filter is a tuple <A, b, o>
A: matrix of coefficients
b: vector of constants
o: number of items popped
Each firing computes $\vec{y} = \vec{x} A + \vec{b}$, where $\vec{x}$ holds the peeked inputs and $\vec{y}$ the pushed outputs
Example
$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad \vec{b} = \begin{pmatrix} 1 & 1 \end{pmatrix}$
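To make the tuple concrete, here is a minimal Python sketch of executing a linear filter from its representation. It reads the example values as a 2x2 matrix A with rows (2 1) and (1 2) and b = (1 1). Several things are assumptions for illustration only: the helper names `fire` and `run_linear`, the pop rate o = 1 (the slide does not give it), and the ordering convention that `x[i]` is `peek(i)` and column j of A produces the j-th pushed item.

```python
def fire(A, b, x):
    """One firing: y = x A + b, with x a row vector of peeked items."""
    rows, cols = len(A), len(A[0])
    assert len(x) == rows and len(b) == cols
    return [sum(x[i] * A[i][j] for i in range(rows)) + b[j] for j in range(cols)]


def run_linear(A, b, o, samples):
    """Fire repeatedly over a finite input, advancing by the pop rate o."""
    e = len(A)              # peek rate = number of rows of A
    out, pos = [], 0
    while pos + e <= len(samples):
        out.extend(fire(A, b, samples[pos:pos + e]))
        pos += o
    return out


A = [[2, 1],
     [1, 2]]
b = [1, 1]
result = run_linear(A, b, 1, [1, 2, 3])   # o = 1 is an assumed pop rate
print(result)
```

Under these assumptions the filter peeks 2 items, pushes 2 items, and pops 1 item per firing.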
Linear Dataflow Analysis
work peek N pop 1 push 1 {
  float sum = 0;
  for (int i=0; i<N; i++) {
    sum += h[i]*peek(i);
  }
  push(sum);
  pop();
}
Resembles constant propagation
Maintains linear form <v, b> for each variable
Peek expression: generate fresh v
Push expression: copy v into A
Pop expression: increment o
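The three rules (peek, push, pop) can be sketched in Python by running a work function once on symbolic values. This is a simplification, not the compiler's analysis: it traces a single execution instead of performing dataflow analysis, so it does not handle input-dependent branches. `LinearForm`, `Tracer`, `extract`, and the coefficients in `h` are all hypothetical names and values chosen for the illustration.

```python
class LinearForm:
    """A value of the form sum(v[i] * peek(i)) + b."""

    def __init__(self, v=None, b=0.0):
        self.v = dict(v or {})
        self.b = b

    def __add__(self, other):
        other = lift(other)
        v = dict(self.v)
        for i, c in other.v.items():
            v[i] = v.get(i, 0.0) + c
        return LinearForm(v, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, k):
        # This sketch only allows scaling by a plain numeric constant.
        if isinstance(k, LinearForm):
            raise ValueError("product of two traced values is not supported")
        return LinearForm({i: c * k for i, c in self.v.items()}, self.b * k)

    __rmul__ = __mul__


def lift(x):
    return x if isinstance(x, LinearForm) else LinearForm(b=float(x))


class Tracer:
    def __init__(self):
        self.pushed, self.o, self.e = [], 0, 0

    def peek(self, i):              # peek: generate a fresh v
        idx = self.o + i
        self.e = max(self.e, idx + 1)
        return LinearForm({idx: 1.0})

    def push(self, val):            # push: record v as a column of A
        self.pushed.append(lift(val))

    def pop(self):                  # pop: increment o
        val = self.peek(0)
        self.o += 1
        return val


def extract(work):
    t = Tracer()
    work(t)
    A = [[f.v.get(i, 0.0) for f in t.pushed] for i in range(t.e)]
    b = [f.b for f in t.pushed]
    return A, b, t.o


h = [0.25, 0.5, 0.25]               # placeholder coefficients


def fir_work(t):
    total = 0.0
    for i in range(len(h)):
        total += h[i] * t.peek(i)
    t.push(total)
    t.pop()


A, b, o = extract(fir_work)
print(A, b, o)
```

For the FIR work function, the extracted A is a single column holding the coefficients, b is zero, and o is 1.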
Pipelines and splitjoins can be collapsed
Example: pipeline
Filter 1: $\vec{y} = \vec{x} A$
Filter 2: $\vec{z} = \vec{y} B$
Combined Filter: $\vec{z} = \vec{x} A B = \vec{x} C$
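Filter 1: $A = \begin{pmatrix} 4 & 5 & 6 \end{pmatrix}$
Filter 2: $B = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$
Separate filters: 6 mults
Combined Filter: $C = A B = \begin{pmatrix} 32 \end{pmatrix}$, 1 mult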
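The multiplication counts in this example can be checked with a few lines of Python. The sketch assumes A is the 1x3 row (4 5 6) and B is the 3x1 column (1 2 3), which is one reading of the slide's layout, and it counts one multiplication per nonzero coefficient. `matmul` and `mults` are hypothetical helper names.

```python
def matmul(A, B):
    """Plain matrix product; valid here because Filter 1 pushes exactly what Filter 2 peeks."""
    assert len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mults(M):
    """Multiplications per firing, counting one per nonzero coefficient."""
    return sum(1 for row in M for c in row if c != 0)


A = [[4, 5, 6]]          # Filter 1: 1 input -> 3 outputs
B = [[1], [2], [3]]      # Filter 2: 3 inputs -> 1 output
C = matmul(A, B)         # combined filter

separate = mults(A) + mults(B)
combined = mults(C)
print(C, separate, combined)
```

When the push rate of the first filter does not match the peek rate of the second, the matrices would first have to be expanded to a common rate, which this sketch does not attempt.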
Linear Expansion
[Figure: Original and Expanded filter matrices]
[Figure: bar chart of Flops Removed (%) by benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis 0% to 100%; series: linear; one bar is labeled 0.3%]
Convolutions can be done cheaply in the Frequency Domain
$\sum_i X_i W_{n-i}$
$X \leftarrow F(x)$ (FFT)
$Y \leftarrow X \mathbin{.*} H$ (VVM)
$y \leftarrow F^{-1}(Y)$ (IFFT)
Painful to do by hand
Blocking
Coefficient calculations
Startup
etc.
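The FFT, VVM, IFFT sequence can be illustrated on one finite block in Python. This is a sketch under stated assumptions: the small recursive radix-2 `fft` is a stand-in written for the example (the StreamIt compiler uses FFTW), the input and coefficient values are made up, and a streaming implementation would additionally need the blocking and startup handling listed above.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def ifft(a):
    n = len(a)
    return [v / n for v in fft(a, invert=True)]


def direct_convolve(x, h):
    """Time-domain convolution: len(x) * len(h) multiplications."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fft_convolve(x, h):
    """Frequency-domain convolution of one block: FFT, VVM, IFFT."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:                    # zero-pad to a power of two
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))   # X <- F(x)
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    Y = [p * q for p, q in zip(X, H)]                        # Y <- X .* H
    return [v.real for v in ifft(Y)[:m]]                     # y <- F^-1(Y)


x = [1.0, 2.0, 3.0, 4.0]
h = [0.5, 0.25, 0.25]
slow = direct_convolve(x, h)
fast = fft_convolve(x, h)
print(slow)
print(fast)
```

The two results agree up to floating-point rounding; the frequency-domain route pays off only when the filter is long enough for the FFT cost to be amortized.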
[Figure: bar chart of Flops Removed (%) by benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis 0% to 100%; series: linear, freq; one bar is labeled 0.3%]
When to apply what transformations?
Linear filter combination can increase the computation cost
Shifting to the Frequency domain is expensive for filters with pop > 1
Compute all outputs, then decimate by pop rate
Some expensive transformations may later enable
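A small Python sketch of the first point, that linear combination can increase the computation cost: a filter that reduces 3 inputs to 1 output, followed by one that expands 1 input to 3 outputs, becomes a dense 3x3 matrix when combined. The matrices and the helper names `matmul` and `mults` are hypothetical, and the cost measure (one multiplication per nonzero coefficient) is an assumption of the sketch.

```python
def matmul(A, B):
    assert len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mults(M):
    """Multiplications per firing, counting one per nonzero coefficient."""
    return sum(1 for row in M for c in row if c != 0)


A = [[1], [2], [3]]      # 3 inputs -> 1 output
B = [[4, 5, 6]]          # 1 input  -> 3 outputs
C = matmul(A, B)         # 3 inputs -> 3 outputs, dense

separate = mults(A) + mults(B)
combined = mults(C)
print(separate, combined)
```

Here the separate filters need 6 multiplications per firing and the combined filter needs 9, so a compiler cannot combine blindly.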
Estimate minimal cost for each structure:
Linear combination
Frequency translation
No transformation
If hierarchical, consider all possible groupings of children
Overlapping sub-problems allow efficient
Cost function based
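One way to exploit overlapping sub-problems is memoized dynamic programming over contiguous groups of children. The Python sketch below does this for a flat pipeline only (no splitjoins or nesting) and assumes each filter's push rate equals the next filter's peek rate, so that combination is a plain matrix product. `select`, `direct_cost`, and especially `freq_cost` are hypothetical: the frequency cost formula is a toy stand-in, not the cost function used by the StreamIt compiler.

```python
import math
from functools import lru_cache


def matmul(A, B):
    assert len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def direct_cost(M):
    """One multiplication per nonzero coefficient."""
    return sum(1 for row in M for c in row if c != 0)


def freq_cost(M):
    """Toy frequency-domain cost model (an assumption of this sketch)."""
    e, u = len(M), len(M[0])
    return u * (2 * math.log2(2 * e) + 1)


def select(filters):
    """Return (cost, plan) for a pipeline of coefficient matrices."""

    @lru_cache(maxsize=None)
    def combined(i, j):
        M = filters[i]
        for k in range(i + 1, j):
            M = matmul(M, filters[k])
        return M

    @lru_cache(maxsize=None)
    def best(i, j):
        M = combined(i, j)
        label = "none" if j - i == 1 else "linear"
        options = [(direct_cost(M), ((label, i, j),)),     # keep or combine
                   (freq_cost(M), (("freq", i, j),))]      # shift to frequency
        for k in range(i + 1, j):                          # all groupings
            left_cost, left_plan = best(i, k)
            right_cost, right_plan = best(k, j)
            options.append((left_cost + right_cost, left_plan + right_plan))
        return min(options, key=lambda option: option[0])

    return best(0, len(filters))


reduce_then_expand = select([[[1], [2], [3]], [[4, 5, 6]]])
expand_then_reduce = select([[[4, 5, 6]], [[1], [2], [3]]])
print(reduce_then_expand)
print(expand_then_reduce)
```

With this toy model the first pipeline is left untransformed, because combining would raise the cost from 6 to 9, while the second is combined into a single multiplication.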
[Figure: stream graph. Splitter(null) feeds 12 parallel Input → Dec → Dec → Cfilt → CFilt2 pipelines, joined round-robin (RR); a Duplicate splitter then feeds 4 parallel BeamFrm → Filter → Mag → Detect pipelines, joined round-robin (RR) into Sink]
[Figure: cost table filled in step by step; each entry (1x1, 1x2, ...) takes the minimum over Linear Combination, Frequency, and No Transform, shaded from low to high cost]
Overall solution
[Figure: the same stream graph after successive combination steps]
Maximal Combination and Shifting to Frequency Domain: 2.4 times as many FLOPS
Using Transformation Selection: half as many FLOPS
[Figure: bar chart of Flops Removed (%) by benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis 0% to 100%; series: linear, freq, autosel; one bar is labeled 0.3%]
Fully automatic implementation
StreamIt compiler
StreamIt to C compilation
FFTW for shifting to the frequency domain
Benchmarks all written in StreamIt
Measurements
Dynamic floating-point instruction counting
Speedups on a general purpose processor
[Figure: bar chart of Speedup (%) by benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis 0% to 900%; series: linear, freq, autosel; one bar is labeled 5%]
On a Pentium IV
SPIRAL/SPL (Püschel et al.)
Automatic derivation of DSP transforms
FFTW (Frigo et al.)
Wicked fast FFT
ADE (Covell, MIT PhD Thesis, 1989)
Affine Analysis (Karr, Acta Informatica, 1976)
Affine relationships among variables of a program
Linear Analysis (Cousot, Halbwachs, POPL, 1978)
Automatic discovery of linear restraints among variables of a program
http://cag.lcs.mit.edu/linear/