Efficient Analysis of Multidimensional Linear Systems for Wordlength Optimization

SLIDE 1

Efficient Analysis of Multidimensional Linear Systems for Wordlength Optimization

Gaël Deest, Tomofumi Yuki, Olivier Sentieys, Steven Derrien

This work was funded by the European FP7 project Alma

SLIDE 2

Embedded System Design

Many constraints:

  • Power efficiency
  • Production cost
  • Performance / speed
  • Time-to-market
  • …

SLIDE 3

Design-Space Exploration (DSE)

[Figure: feasible solutions in the cost / execution-time plane]

Cost = power or area

SLIDE 4

Design-Space Exploration (DSE)

[Figure: cost / execution-time plane with a time constraint]

Cost = power or area

SLIDE 5

Design-Space Exploration (DSE)

[Figure: cost / execution-time plane with a time constraint and the optimum marked]

Cost = power or area

SLIDE 6

Design-Space Exploration (DSE)

[Figure: cost against accuracy degradation (signal-to-noise ratio), with an accuracy constraint and the optimum marked]

Cost = power or area

SLIDE 7

Design-Space Exploration (DSE)

[Figure: cost against accuracy degradation (signal-to-noise ratio), with an accuracy constraint and the optimum marked]

Cost = power or area

Custom fixed-point formats are used to reduce cost

SLIDE 8

Wordlength Optimization Process

Soft accuracy constraints (e.g., noise power)

Fast accuracy evaluation is critical for thorough design-space exploration

SLIDE 9

This Work

State of the art:

Techniques                      Applicability   Depth of DSE
Simulation-based                Excellent       Limited
Current analytical techniques   Limited         Good
Our approach                    Good            Good

Diagnostic: Applicability issues for analytical techniques.

Contribution: Extended applicability and scalability.

SLIDE 10

Overview

  • Background
  • Analytical techniques
  • Proposed approach

SLIDE 11

Fixed-Point Arithmetic

  • Scaled integers: value = integer × $2^{-k}$
  • Product of two $n$-bit numbers: $2n$ bits!
  • Some bits must be dropped (quantization)

Example (truncation):

[Figure: bit weights $2^{3}, 2^{2}, 2^{1}, 2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}$, with the lowest-weight bits marked as dropped]
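
The truncation of a fixed-point product can be sketched in C. This is a minimal illustration, not code from the slides: the helper name `fx_mul_trunc` and the choice of giving both operands the same number of fractional bits `frac` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the slides): multiply two scaled integers
 * that each represent value = integer * 2^-frac, then truncate the exact
 * product back to frac fractional bits. */
int32_t fx_mul_trunc(int32_t a, int32_t b, int frac)
{
    assert(frac >= 0 && frac < 31);
    /* The exact product of two n-bit numbers needs 2n bits. */
    int64_t wide = (int64_t)a * (int64_t)b;   /* carries 2*frac fractional bits */
    /* Drop the frac lowest bits. Right-shifting a negative value is
     * implementation-defined in C; common compilers shift arithmetically,
     * which rounds toward minus infinity. */
    return (int32_t)(wide >> frac);
}
```

With `frac = 4`, `fx_mul_trunc(5, 3, 4)` multiplies 0.3125 by 0.1875. The exact product 15 × 2^-8 is truncated to 0, so the error lies in the interval (−2^-4, 0].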

SLIDE 12

Quantization Errors

Modeled as noise / random variable

Example: Truncation to $2^{-n}$ precision: $\mathit{error} \sim \mathcal{U}(-2^{-n};\, 0)$

  • Assumptions: Widrow hypothesis
  • Statistical moments: $\mu = -2^{-n-1}$, $\sigma^2 = \dfrac{2^{-2n}}{12}$
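
The two moments follow from the uniform model of the truncation error. The short derivation below is a sketch under that model only; the shorthand $q = 2^{-n}$ is introduced here for readability and is not in the slides.

```latex
\[
e \sim \mathcal{U}(-q,\, 0), \qquad q = 2^{-n}
\]
\[
\mu = \mathbb{E}[e] = \frac{1}{q}\int_{-q}^{0} e \,\mathrm{d}e = -\frac{q}{2} = -2^{-n-1}
\]
\[
\sigma^2 = \frac{\bigl(0 - (-q)\bigr)^2}{12} = \frac{q^2}{12} = \frac{2^{-2n}}{12}
\]
```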

SLIDE 13

Analytical Techniques

Goal: Compute an output noise formula

Idea: Model propagation of errors to the output

Representation: Signal Flow Graphs (SFG)

SLIDE 14

Accuracy Model Construction

SLIDE 15

Accuracy Model Construction

Quantization errors = new inputs

SLIDE 17

Accuracy Model Construction

Compute transfer function for each error
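
Once the impulse response from one error source to the output is known, its contribution to the output noise can be evaluated. The sketch below assumes a white noise source with mean `mu` and variance `sigma2`, uncorrelated with the signal, and a finite impulse response `h`. The function name `output_noise_power` is hypothetical and this is not necessarily the exact formula used by the authors' tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: output power of one noise source that reaches the
 * output through a filter with finite impulse response h[0..len-1].
 * Assumes white noise with mean mu and variance sigma2. */
double output_noise_power(const double *h, size_t len, double mu, double sigma2)
{
    assert(h != NULL && len > 0);
    double sum = 0.0, sum_sq = 0.0;
    for (size_t k = 0; k < len; k++) {
        sum += h[k];            /* DC gain seen by the noise mean */
        sum_sq += h[k] * h[k];  /* energy gain seen by the noise variance */
    }
    double out_mean = mu * sum;
    double out_var = sigma2 * sum_sq;
    return out_var + out_mean * out_mean;   /* power = variance + mean^2 */
}
```

If the sources are assumed mutually uncorrelated, their output variances add, while their output means add before squaring.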

SLIDE 18

The Challenge

How to go from this…

    float xb[N];
    float fir(float in) {
        float y = 0;
        xb[0] = in;
        for (int i = 0; i < N; i++)
            y += b[i] * xb[i];       /* accumulate into y, the returned value */
        for (int i = N - 1; i > 0; i--)
            xb[i] = xb[i - 1];       /* shift the delay line */
        return y;
    }

…to this? (figure not reproduced)

SLIDE 19

The Challenge

Current methods:

  • Flatten control (completely unroll loops, etc.)
  • Heavy use of annotations

Example:

    #pragma DELAY
    float xb[N];

Limitations:

  • Scalability issues (large graphs)
  • Implicit 1D stream assumption
  • Not easily applicable to image processing

SLIDE 20

Motivating Example: Deriche Filter

[Figure: recursive filter. Horizontal pass: a left-to-right scan with coefficients $a_1, a_2, b_1, b_2$ and a right-to-left scan with coefficients $a_3, a_4, b_1, b_2$. Vertical pass: similar, along columns.]

SLIDE 21

Motivating Example: Deriche Filter

Issues with SFG representation:

  • Requires image size to be statically known
  • Each pixel is a different input
  • Number of transfer functions: $O(N^4)$ for an $N \times N$ image
  • For a 32 × 32 image: 1,048,576!

Cannot be handled with current methods
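
The count can be checked by hand, assuming one transfer function per (input pixel, output pixel) pair; that pairing is an interpretation of the slide, which only gives the order of growth and the 32 × 32 figure.

```latex
\[
\underbrace{N^2}_{\text{input pixels}} \times \underbrace{N^2}_{\text{output pixels}} = N^4,
\qquad
N = 32:\quad 32^4 = 2^{20} = 1{,}048{,}576 .
\]
```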

SLIDE 22

Intuition of the Technique

  • Current tools cannot capture regularity of multidimensional filters.

Idea:

  • Generalize SFGs to multidimensional systems of equations.
  • Infer this representation using polyhedral dependence analysis.

SLIDE 23

Steps of our Method

  • 1. Build an equational representation of the program.

    float xb[N];
    float fir(float x) {
        float y = 0;                       /* S0 */
        xb[0] = x;                         /* S1 */
        for (int i = 0; i < N; i++)
            y += b[i] * xb[i];             /* S2 */
        for (int i = N - 1; i > 0; i--)
            xb[i] = xb[i - 1];             /* S3 */
        return y;
    }

$$S_0(n) = 0, \qquad S_1(n) = x(n)$$
$$S_2(n, i) = \begin{cases} S_0(n) + b(i) \times S_1(n) & i = 0 \\ S_2(n, i-1) + b(i) \times S_3(n-1, i) & i > 0 \end{cases}$$
$$S_3(n, i) = \begin{cases} S_1(n) & i = 1 \\ S_3(n-1, i-1) & i > 1 \end{cases}$$
$$y(n) = S_2(n, N-1)$$

SLIDE 24

Equational Representation

Example:

    float tmp = 0;
    for (int i = 0; i < N; i++)
        tmp = arr[i] + tmp;

$$S_0() = 0$$
$$S_1(i) = arr(i) + \begin{cases} S_0() & i = 0 \\ S_1(i-1) & i > 0 \end{cases}$$

  • Statement ≡ equation
  • Keeps track of data dependencies
  • Easy to transform / reason about
  • Relies on Array Dataflow Analysis (Feautrier, 1991)

SLIDE 25

Example: Simplified Deriche Filter

    /* Horizontal pass (row scan) */
    for (int i = 0; i < N; i++) {
        prev = 0;
        for (int j = 0; j < N; j++) {
            tmp[i][j] = a1 * x[i][j] + b1 * prev;
            prev = tmp[i][j];
        }
    }
    /* Vertical pass (column scan) */
    for (int j = 0; j < N; j++) {
        prev = 0;
        for (int i = 0; i < N; i++) {
            y[i][j] = a2 * tmp[i][j] + b2 * prev;
            prev = y[i][j];
        }
    }

SLIDE 26

Equation System

After pre-processing:

$$S_1(i, j) = a_1\, x(i, j) + b_1\, S_1(i, j-1)$$
$$S_2(j, i) = a_2\, S_1(i, j) + b_2\, S_2(j, i-1)$$

SLIDE 27

Equation System

After pre-processing:

$$S_1(i, j) = a_1\, x(i, j) + b_1\, S_1(i, j-1)$$
$$S_2(j, i) = a_2\, S_1(i, j) + b_2\, S_2(j, i-1)$$

Swapped dimensions (non-uniform dependences)

SLIDE 28

Steps of our Method

  • 2. Uniformization

$$S_1(i, j) = a_1\, x(i, j) + b_1\, S_1(i, j-1)$$
$$S_2(i, j) = a_2\, S_1(i, j) + b_2\, S_2(i-1, j)$$

SLIDE 29

Steps of our Method

  • 3. Convolution Detection

Computation pattern: $y(\mathbf{k}) = \sum_{\mathbf{v}} c(\mathbf{v}) \times x(\mathbf{k} - \mathbf{v})$

  • Pattern matching.
  • Simplifies noise propagation.
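
The computation pattern being matched is an ordinary multidimensional convolution. A two-dimensional instance can be sketched in C as follows; the function name `conv2d_at`, the row-major storage, and the zero treatment of samples outside the image are assumptions for illustration.

```c
#include <assert.h>

/* y(k) = sum over v of c(v) * x(k - v), with k = (k0, k1), v = (v0, v1).
 * x is rows x cols, c is crows x ccols, both stored row-major.
 * Samples of x outside the image are treated as 0 (assumption). */
float conv2d_at(const float *x, int rows, int cols,
                const float *c, int crows, int ccols,
                int k0, int k1)
{
    assert(x && c && rows > 0 && cols > 0 && crows > 0 && ccols > 0);
    float acc = 0.0f;
    for (int v0 = 0; v0 < crows; v0++)
        for (int v1 = 0; v1 < ccols; v1++) {
            int i = k0 - v0, j = k1 - v1;   /* index k - v */
            if (i >= 0 && i < rows && j >= 0 && j < cols)
                acc += c[v0 * ccols + v1] * x[i * cols + j];
        }
    return acc;
}
```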

SLIDE 30

Convolution Detection

After pre-processing:

$$S_1(i, j) = a_1\, x(i, j) + b_1\, S_1(i, j-1)$$
$$S_2(i, j) = a_2\, S_1(i, j) + b_2\, S_2(i-1, j)$$

[Figure: block diagram $x \to \ast\, a_1 \to S_1$, with feedback $\ast\, b_1$ on $S_1$]

SLIDE 31

Convolution Detection

After pre-processing:

$$S_1(i, j) = a_1\, x(i, j) + b_1\, S_1(i, j-1)$$
$$S_2(i, j) = a_2\, S_1(i, j) + b_2\, S_2(i-1, j)$$

[Figure: block diagram $x \to \ast\, a_1 \to S_1 \to \ast\, a_2 \to S_2$, with feedback $\ast\, b_1$ on $S_1$ and $\ast\, b_2$ on $S_2$]

SLIDE 32

Accuracy Model Construction

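  • 4. Compute noise propagation for each source

Extract the subfilter from the source to the output. Example: from statement $S_1$ to $S_2$.

[Figure: block diagram $x \to \ast\, a_1 \to S_1 \to \ast\, a_2 \to S_2$, with feedback $\ast\, b_1$ on $S_1$ and $\ast\, b_2$ on $S_2$]
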
SLIDE 33

Impulse Response Computation

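Determines noise propagation: $\mathit{err}_{out}(\mathbf{v}) = (\mathit{err}_{in} \ast \mathbf{h})(\mathbf{v})$

  • Easy to compute for non-recursive filters
  • Infinite for recursive filters

[Figure: $\mathit{err}_{in}$ enters the subfilter from $S_1$ to $S_2$ (feedback $\ast\, b_1$, then $\ast\, a_2$, feedback $\ast\, b_2$), summarized by the impulse response $\mathbf{h}$, and leaves as $\mathit{err}_{out}$]
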
SLIDE 34

Non-Recursive Filters

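[Figure: $x$ feeds two parallel branches $\ast\, h_1$ and $\ast\, h_2$, which are summed into $z$; $z$ passes through $\ast\, h_3$ to give $y$]

$$z = x \ast h_1 + x \ast h_2$$
$$y = z \ast h_3$$
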
SLIDE 36

Non-Recursive Filters

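[Figure: the two parallel branches merged into a single block $\ast\, (h_1 + h_2)$ producing $z$; $z$ passes through $\ast\, h_3$ to give $y$]

$$z = x \ast (h_1 + h_2)$$
$$y = z \ast h_3$$
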
SLIDE 38

Non-Recursive Filters

[Figure: a single block $\ast\, h_3 \ast (h_1 + h_2)$ from $x$ to $y$]

$$y = x \ast h_3 \ast (h_1 + h_2)$$
$$\mathbf{h} = h_3 \ast (h_1 + h_2)$$

SLIDE 39

Recursive Filters

[Figure: $x$ passes through $\ast\, h_1$ to $y$, with $y$ fed back through $\ast\, h_2$]

$$y = x \ast h_1 + y \ast h_2$$

  • Finding $\mathbf{h}$ ≡ solving a multidimensional recurrence
  • Hard problem

SLIDE 40

Impulse Response Approximation

  • Hypothesis: Filter is stable: $\sum_{\mathbf{v}} |\mathbf{h}(\mathbf{v})| < \infty$
  • Consequence: $\lim_{r \to \infty} \sum_{\|\mathbf{v}\| > r} |\mathbf{h}(\mathbf{v})| = 0$
  • Idea: Evaluate impulse response on sufficiently large window.

SLIDE 41

Back to the Definition

  • Impulse response = output of the filter when the input is a unit impulse: $\delta(\mathbf{v}) = \begin{cases} 1 & \mathbf{v} = \mathbf{0} \\ 0 & \text{otherwise} \end{cases}$
  • Feed the filter with an impulse and use the output as the impulse response
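
Applied to the simplified Deriche recurrences, the idea can be sketched as follows: feed a unit impulse at the origin and record the output on a finite window. The window size `W = 64` and the function names `impulse_response` and `response_sum` are assumptions for illustration; how large the window must be depends on how fast the response decays.

```c
#include <assert.h>

#define W 64  /* window size (assumption): must cover the decay of the response */

/* Run the recurrences
 *   S1(i,j) = a1*x(i,j)  + b1*S1(i,j-1)
 *   S2(i,j) = a2*S1(i,j) + b2*S2(i-1,j)
 * with x = unit impulse at (0,0); h receives S2 on the W x W window. */
void impulse_response(double h[W][W], double a1, double b1, double a2, double b2)
{
    static double s1[W][W];
    assert(h != 0);
    for (int i = 0; i < W; i++)
        for (int j = 0; j < W; j++) {
            double delta = (i == 0 && j == 0) ? 1.0 : 0.0;
            s1[i][j] = a1 * delta + b1 * (j > 0 ? s1[i][j - 1] : 0.0);
            h[i][j] = a2 * s1[i][j] + b2 * (i > 0 ? h[i - 1][j] : 0.0);
        }
}

/* Sum of the windowed response; approximates the DC gain of the filter. */
double response_sum(double h[W][W])
{
    double s = 0.0;
    assert(h != 0);
    for (int i = 0; i < W; i++)
        for (int j = 0; j < W; j++)
            s += h[i][j];
    return s;
}
```

With `a1 = a2 = 1` and `b1 = b2 = 0.5`, the response is `h[i][j] = 0.5^(i+j)`, and the windowed sum is already very close to the infinite-window value 4.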

SLIDE 42

Experimental Results: Model Construction Time

Algorithm                    ID.Fix (s)   Our Tool (s)
IIR8                         23.1         20.5
Sobel (32 × 32)              169.1        9.2
Sobel (64 × 64)              2173.1       9.7
Sobel (128 × 128)            -            9.4
Gaussian blur (32 × 32)      160.1        10.2
Gaussian blur (64 × 64)      2010.9       9.5
Gaussian blur (128 × 128)    -            9.4
Deriche (16 × 16)            -            6.5

SLIDE 43

Experimental Results: Model Validity

Algorithm   Simulation (dB)   Our Tool (dB)   Error (%)
IIR8        -17.80            -17.84          -0.2
Sobel       11.62             12.04           3.6
Gauss       3.78              3.78            0.1
Deriche     -18.01            -18.06          -2.78

SLIDE 44

Conclusion

  • 1. Extraction of a compact program representation (generalization of SFGs).
  • 2. Reformulation of analytical techniques on this representation.
  • 3. Wider applicability for analytical accuracy analysis.

SLIDE 45

Open Issues

  • Extension to non-linear, non-time-invariant filters
      • Extensions exist for 1D SFGs
      • Expected to be easily applicable to our model
  • Regular but non-affine programs
      • Example: FFT
  • Highly correlated inputs