VLSI Digital Signal Processing Systems, Keshab K. Parhi (PowerPoint presentation)


SLIDE 1

VLSI Digital Signal Processing Systems

Keshab K. Parhi

SLIDE 2

  • Chap. 2

VLSI Digital Signal Processing Systems

  • Textbook:

– K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, 1999

  • Buy Textbook:

– http://www.bn.com
– http://www.amazon.com
– http://www.bestbookbuys.com

SLIDE 3

Chapter 1. Introduction to DSP Systems

  • Introduction (Read Sec. 1.1, 1.3)
  • Non-terminating programs require real-time operations
  • Applications dictate different speed constraints (e.g., voice, audio, cable modem, set-top box, Gigabit Ethernet, 3-D graphics)
  • Need to design families of architectures for specified algorithm complexity and speed constraints
  • Representations of DSP algorithms (Sec. 1.4)
SLIDE 4

Typical DSP Programs

  • Usually highly real-time: design hardware and/or software to meet the application speed constraint
  • Non-terminating

– Example:

for n = 1 to ∞
    y(n) = a*x(n) + b*x(n-1) + c*x(n-2)
end

[Figure: input samples taken at times T, 2T, 3T, …, nT enter the DSP system, whose algorithms produce the output signals.]
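The non-terminating loop above can be sketched as a Python generator that consumes an unbounded sample stream. This is only an illustration: the function name `fir3`, the zero initial state, and the coefficient values are assumptions, not part of the slides.

```python
from itertools import count, islice

def fir3(samples, a, b, c):
    """Yield y(n) = a*x(n) + b*x(n-1) + c*x(n-2) for an unbounded input stream."""
    x1 = x2 = 0.0  # x(n-1) and x(n-2); zero initial state is an assumption
    for x in samples:
        yield a * x + b * x1 + c * x2
        x2, x1 = x1, x  # shift the two-sample delay line

# count(1) stands in for an endless sample source (x = 1, 2, 3, ...);
# islice only peeks at the first four outputs of the never-ending program.
first_outputs = list(islice(fir3(count(1), a=1.0, b=2.0, c=3.0), 4))
print(first_outputs)
```

Because the generator never returns on its own, each output must be produced before the next sample arrives, which is the real-time constraint the slide refers to.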

SLIDE 5

Area-Speed-Power Tradeoffs

  • 3-Dimensional Optimization (Area, Speed, Power)
  • Achieve Required Speed, Area-Power Tradeoffs
  • Power consumption: P = C · V² · f
  • Latency-reduction techniques => increase in speed, or power reduction through lower supply-voltage operation
  • Since the capacitance of the multiplier is usually dominant, reducing the number of multiplications is important (this is possible through strength reduction)
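As a rough numeric illustration of P = C · V² · f, the sketch below compares two supply voltages at the same capacitance and frequency. The function name and all numeric values are hypothetical and chosen only to show the quadratic dependence on voltage.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Dynamic power P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz

# Hypothetical values: 1 pF switched capacitance, 100 MHz clock.
p_high = dynamic_power(1e-12, 2.0, 1e8)
p_low = dynamic_power(1e-12, 1.0, 1e8)
ratio = p_low / p_high  # halving V cuts power to one quarter
print(ratio)
```

The same relation explains why removing multiplications helps: a smaller C lowers P linearly, while any speed slack that allows a lower V lowers it quadratically.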

SLIDE 6

Representation Methods of DSP systems

Example: y(n)=a*x(n)+b*x(n-1)+c*x(n-2)

  • Graphical Representation Method 1: Block Diagram

– Consists of functional blocks connected with directed edges, which represent data flow from its input block to its output block

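[Figure: block diagram of the example. x(n) passes through two delay elements D to give x(n-1) and x(n-2); the three signals are multiplied by a, b, c and summed to form y(n).]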

SLIDE 7

  • Graphical Representation Method 2: Signal-Flow Graph

– SFG: a collection of nodes and directed edges
– Nodes: represent computations and/or tasks; a node sums all incoming signals
– Directed edge (j, k): denotes a linear transformation from the input signal at node j to the output signal at node k
– Linear SFGs can be transformed into different forms without changing the system functions. For example, flow graph reversal (transposition) is one of these transformations (note: only applicable to single-input-single-output systems)
– Usually used to represent linear time-invariant DSP systems

[Figure: SFG of the example, with input x(n), two z^-1 edges, edges weighted a, b, c, and output y(n).]

SLIDE 8

  • Graphical Representation Method 3: Data-Flow Graph

– DFG: nodes represent computations (or functions or subtasks), while the directed edges represent data paths (data communications between nodes); each edge has a nonnegative number of delays associated with it.
– DFG captures the data-driven property of DSP algorithms: any node can perform its computation whenever all its input data are available.
– Each edge describes a precedence constraint between two nodes in the DFG:

  • Intra-iteration precedence constraint: if the edge has zero delays
  • Inter-iteration precedence constraint: if the edge has one or more delays
  • DFGs and block diagrams can be used to describe both linear single-rate and nonlinear multi-rate DSP systems
  • Fine-grain DFG

[Figure: fine-grain DFG of the example, with input x(n), output y(n), multiplier nodes a, b, c, and two delay edges D.]

SLIDE 9

Examples of DFG

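– Nodes are complex blocks (in coarse-grain DFGs)

[Figure: coarse-grain DFG with nodes FFT, Adaptive filtering, IFFT.]

– Nodes can describe expanders/decimators in multi-rate DFGs

Decimator (↓2): N samples in, N/2 samples out ≡ DFG node that consumes 2 samples and produces 1
Expander (↑2): N/2 samples in, N samples out ≡ DFG node that consumes 1 sample and produces 2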
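The sample counts on this slide can be reproduced with a minimal Python sketch. The slide gives only the rates; keeping every second sample in the decimator and inserting zeros in the expander are the usual textbook conventions and are assumed here, as are the function names.

```python
def decimate(x, m=2):
    """Keep every m-th sample: N samples in, N/m samples out."""
    return x[::m]

def expand(x, l=2):
    """Insert l-1 zeros after each sample: N/l samples in, N samples out.
    Zero insertion is an assumed convention, not stated on the slide."""
    out = []
    for sample in x:
        out.append(sample)
        out.extend([0] * (l - 1))
    return out

halved = decimate([1, 2, 3, 4])   # 4 samples -> 2 samples
restored_len = len(expand(halved))  # 2 samples -> 4 samples
print(halved, expand(halved), restored_len)
```

In DFG terms, each firing of the decimator node consumes 2 samples and produces 1, and each firing of the expander node does the reverse, which matches the "2 1" and "1 2" edge labels.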

SLIDE 10

Chapter 2: Iteration Bound

  • Introduction
  • Loop Bound

– Important Definitions and Examples

  • Iteration Bound

– Important Definitions and Examples
– Techniques to Compute Iteration Bound

SLIDE 11

Introduction

  • Iteration: execution of all computations (or functions) in an algorithm once

– Example 1:

[Figure: multi-rate DFG with nodes A, B, C and sample-rate labels on the edges.]

  • For 1 iteration, computations are: A 2 times, B 2 times, C 3 times
  • Iteration period: the time required for execution of one iteration of the algorithm (same as sample period)

– Example: y(n) = a*y(n-1) + x(n), i.e., H(z) = 1 / (1 - a*z^-1)

[Figure: recursive loop in which y(n-1), taken from the delay z^-1, is multiplied by a and added to x(n); points a, b, c are marked on the loop.]

SLIDE 12

Introduction (cont’d)

– Assume the execution times of the multiplier and adder are Tm and Ta; then the iteration period for this example is Tm + Ta (assume 10 ns; see the red box in the figure). The sample period (Ts) of the signal must therefore satisfy:

Ts ≥ Tm + Ta

  • Definitions:

– Iteration rate: the number of iterations executed per second
– Sample rate: the number of samples processed in the DSP system per second (also called throughput)
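To connect the bound on Ts with the sample-rate definition, the short sketch below converts an iteration period into the highest sustainable sample rate. The function name is hypothetical; the 10 ns figure is the value assumed on the slide.

```python
def max_sample_rate_hz(iteration_period_ns):
    """Highest sample rate allowed by Ts >= iteration period (in ns)."""
    return 1e9 / iteration_period_ns

# With Tm + Ta = 10 ns, at most 100 million samples per second can be processed.
rate = max_sample_rate_hz(10)
print(rate)
```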

SLIDE 13

Iteration Bound

  • Definitions:

– Loop: a directed path that begins and ends at the same node
– Loop bound of the j-th loop: defined as Tj/Wj, where Tj is the loop computation time and Wj is the number of delays in the loop
– Example 1: a → b → c → a is a loop (see the same example in Note 2, PP2); its loop bound is:

T_loopbound = Tm + Ta = 10 ns

– Example 2: y(n) = a*y(n-2) + x(n); we have:

T_loopbound = (Tm + Ta) / 2 = 5 ns

[Figure: recursive loop in which y(n-2), taken from a 2D delay, is multiplied by a and added to x(n).]

SLIDE 14

Iteration Bound (cont’d)

– Example 3: compute the loop bounds of the following loops:

[Figure: DFG with nodes A (10 ns), B (2 ns), C (3 ns), D (5 ns); loop L1 contains one delay (D), loops L2 and L3 contain two delays (2D) each.]

T_L1 = (10 + 2) / 1 = 12 ns
T_L2 = (2 + 3 + 5) / 2 = 5 ns
T_L3 = (10 + 2 + 3) / 2 = 7.5 ns

  • Definitions (Important):

– Critical loop: the loop with the maximum loop bound
– Iteration bound of a DSP program: the loop bound of the critical loop, defined as

T∞ = max over j ∈ L of { Tj / Wj }

where L is the set of loops in the DSP system, Tj is the computation time of loop j, and Wj is the number of delays in loop j

– Example 4: compute the iteration bound of Example 3:

T∞ = max{12, 5, 7.5} = 12 ns
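Examples 3 and 4 can be checked with a few lines of Python. This is a sketch of the definition only: it assumes the loops have already been enumerated by hand, and the dictionary layout and names are illustrative.

```python
from fractions import Fraction

def loop_bound(node_times_ns, num_delays):
    """Loop bound Tj / Wj: loop computation time over number of delays."""
    return Fraction(sum(node_times_ns), num_delays)

# Node times and delay counts as read from Example 3.
loops = {
    "L1": ([10, 2], 1),
    "L2": ([2, 3, 5], 2),
    "L3": ([10, 2, 3], 2),
}
bounds = {name: loop_bound(times, delays) for name, (times, delays) in loops.items()}

critical_loop = max(bounds, key=bounds.get)  # loop with the maximum loop bound
iteration_bound = bounds[critical_loop]      # T_inf = max over all loops
print(critical_loop, float(iteration_bound))
```

Enumerating loops by hand does not scale, since the number of loops can grow exponentially with the number of nodes; that is the motivation for the LPM and MCM algorithms later in the deck.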

SLIDE 15

Iteration bound (cont’d)

  • If there is no delay element in the loop, then T∞ = T_L / 0 = ∞

– Delay-free loops are non-computable; see the example:

[Figure: two loops between nodes A and B, one delay-free and one containing a delay element Z.]

B = A·Z (non-causal)
A = B·Z^-1 (causal)

  • Non-causal systems cannot be implemented
  • Speed of the DSP system: depends on the “critical path computation time”

– Paths: do not contain delay elements (4 possible path locations)

  • (1) input node → delay element
  • (2) delay element’s output → output node
  • (3) input node → output node
  • (4) delay element → delay element

– Critical path of a DFG: the path with the longest computation time among all paths that contain zero delays
– Clock period is lower bounded by the critical path computation time

SLIDE 16

Iteration Bound (cont’d)

– Example: assume Tm = 10 ns and Ta = 4 ns; then the length of the critical path is 26 ns (see the red lines in the figure)
– Critical path: the lower bound on the clock period
– To achieve high speed, the length of the critical path can be reduced by pipelining and parallel processing (Chapter 3).

[Figure: filter with input x(n), output y(n), four delay elements D and coefficients a, b, c, d, e; path computation times are labeled 26, 26, 22, 18 and 14 ns, with the 26 ns critical path in red.]
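The 26 ns figure can be reproduced by a longest-path search over the zero-delay edges. The sketch below assumes a direct-form structure in which the five multiplier outputs are combined by a chain of four adders; that topology and all node names are assumptions inferred from the labels 14, 18, 22, 26, not something the transcript states.

```python
from functools import lru_cache

def critical_path(node_time, zero_delay_edges):
    """Longest computation time over all paths that contain no delays.
    Assumes the zero-delay edges form an acyclic graph."""
    successors = {node: [] for node in node_time}
    for src, dst in zero_delay_edges:
        successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        tail = max((longest_from(nxt) for nxt in successors[node]), default=0)
        return node_time[node] + tail

    return max(longest_from(node) for node in node_time)

TM, TA = 10, 4  # multiplier and adder times in ns, from the slide
times = {"ma": TM, "mb": TM, "mc": TM, "md": TM, "me": TM,
         "s1": TA, "s2": TA, "s3": TA, "s4": TA}
# Assumed adder chain: (a*x + b*x1) -> + c*x2 -> + d*x3 -> + e*x4
edges = [("ma", "s1"), ("mb", "s1"), ("s1", "s2"), ("mc", "s2"),
         ("s2", "s3"), ("md", "s3"), ("s3", "s4"), ("me", "s4")]
fir_critical_path = critical_path(times, edges)  # Tm + 4*Ta
print(fir_critical_path)
```

Under this assumed topology the critical path is one multiplier followed by four adders, Tm + 4·Ta = 10 + 16 = 26 ns, in agreement with the slide.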

SLIDE 17

Precedence Constraints

  • Each edge of a DFG defines a precedence constraint
  • Precedence constraints:

– Intra-iteration ⇒ edges with no delay elements
– Inter-iteration ⇒ edges with non-zero delay elements

  • Acyclic Precedence Graph (APG): graph obtained by deleting all edges with delay elements.
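These definitions map directly onto a DFG stored as an edge list. The sketch below uses the first-order recursion y(n) = a*y(n-1) + x(n), with A as the adder and B as the multiplier; the tuple layout `(source, destination, delays)` and the function name are illustrative assumptions.

```python
def precedence_constraints(edges):
    """Split DFG edges (src, dst, delays) into intra- and inter-iteration constraints."""
    intra = [(src, dst) for src, dst, delays in edges if delays == 0]
    inter = [(src, dst) for src, dst, delays in edges if delays > 0]
    return intra, inter

# Adder A feeds multiplier B through one delay; B feeds A with no delay.
dfg_edges = [("A", "B", 1), ("B", "A", 0)]
intra, inter = precedence_constraints(dfg_edges)

# The APG keeps only the zero-delay edges, so it is exactly the intra-iteration set.
apg = intra
print(intra, inter)
```

Deleting the delay edge breaks the only loop, so the remaining graph is acyclic and fixes the order of execution within one iteration (B before A).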

SLIDE 18

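y(n) = a*y(n-1) + x(n), with adder A and multiplier B (×a); the edge from A to B carries one delay D.

– Inter-iteration precedence constraints: A1 → B2, A2 → B3
– Intra-iteration precedence constraints: B1 → A1 => B2 → A2 => B3 → A3 => …

[Figure: DFG with nodes A (10), B (3), C (6), D (21) and edge delays D, D, 2D; other labels in the figure: 13, 19, 10.]

Critical path = 27 ut, so Tclk >= 27 ut

[Figure: the APG of this graph, with nodes A, B, C, D.]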

SLIDE 19

  • Achieving Loop Bound

(→ denotes an intra-iteration constraint, => an inter-iteration constraint)

– Loop A (10) → B (3) with one delay D: Tloop = 13 ut

A1 → B1 => A2 → B2 => A3 …

– Loop B (3) → C (6) → D (21) with delays D and 2D:

B1 => C2 → D2 => B4 => C5 → D5 => B7
B2 => C3 → D3 => B5 => C6 → D6 => B8
C1 → D1 => B3 => C4 → D4 => B6

The loop contains three delay elements, so loop bound = 30 / 3 = 10 ut = (loop computation time) / (# of delay elements)

SLIDE 20

  • Algorithms to compute iteration bound

– Longest Path Matrix (LPM)
– Minimum Cycle Mean (MCM)

SLIDE 21

  • Longest Path Matrix Algorithm

Let ‘d’ be the number of delays in the DFG.

A series of matrices L(m), m = 1, 2, …, d, are constructed such that l_{i,j}^{(m)} is the longest computation time of all paths from delay element di to dj that pass through exactly (m-1) delays. If such a path does not exist, l_{i,j}^{(m)} = -1.

The longest path between any two nodes can be computed using either the Bellman-Ford algorithm or the Floyd-Warshall algorithm (Appendix A).

Usually, L(1) is computed using the DFG. The higher-order matrices are computed recursively as follows:

l_{i,j}^{(m+1)} = max(-1, l_{i,k}^{(1)} + l_{k,j}^{(m)}) for k ∈ K

where K is the set of integers k in the interval [1, d] such that neither l_{i,k}^{(1)} = -1 nor l_{k,j}^{(m)} = -1 holds.

The iteration bound is given by

T∞ = max{ l_{i,i}^{(m)} / m }, for i, m ∈ {1, 2, …, d}

SLIDE 22

  • Example:

[Figure: DFG with nodes 1 to 6 (computation times (1), (1), (1), (2), (2), (2)) and four delay elements d1, d2, d3, d4.]

L(1) =
  -1   0  -1  -1
   4  -1   0  -1
   5  -1  -1   0
   5  -1  -1  -1

L(2) =
   4  -1   0  -1
   5   4  -1   0
   5   5  -1  -1
  -1   5  -1  -1

L(3) =
   5   4  -1   0
   8   5   4  -1
   9   5   5  -1
   9  -1   5  -1

L(4) =
   8   5   4  -1
   9   8   5   4
  10   9   5   5
  10   9  -1   5

T∞ = max{4/2, 4/2, 5/3, 5/3, 5/3, 8/4, 8/4, 5/4, 5/4} = 2.
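The LPM recursion can be sketched in Python and checked against this example. The code starts from L(1) rather than deriving it from the DFG, and the L(1) used here is the four-delay matrix of this example with its zero entries (a delay feeding the next delay directly) written out; the function names are illustrative.

```python
from fractions import Fraction

NO_PATH = -1  # marker used on the slides for "no such path"

def next_lpm(l1, lm):
    """Compute L(m+1) from L(1) and L(m) using the max-plus recursion."""
    d = len(l1)
    result = [[NO_PATH] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # K: intermediate delays k with a valid path on both sides
            candidates = [l1[i][k] + lm[k][j] for k in range(d)
                          if l1[i][k] != NO_PATH and lm[k][j] != NO_PATH]
            if candidates:
                result[i][j] = max(candidates)
    return result

def iteration_bound_lpm(l1):
    """T_inf = max over i, m of l(m)[i][i] / m, for m = 1..d."""
    d = len(l1)
    lm, best = l1, None
    for m in range(1, d + 1):
        for i in range(d):
            if lm[i][i] != NO_PATH:
                ratio = Fraction(lm[i][i], m)
                best = ratio if best is None else max(best, ratio)
        if m < d:
            lm = next_lpm(l1, lm)
    return best  # None if the graph has no loop

L1 = [[-1, 0, -1, -1],
      [4, -1, 0, -1],
      [5, -1, -1, 0],
      [5, -1, -1, -1]]
L2 = next_lpm(L1, L1)
t_inf = iteration_bound_lpm(L1)
print(L2, t_inf)
```

Exact fractions are used so that ratios such as 5/3 compare correctly against 2 without floating-point rounding.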

SLIDE 23

  • Minimum Cycle Mean:

The cycle mean m(c) of a cycle c is the average length of the edges in c, which can be found by simply taking the sum of the edge lengths and dividing by the number of edges in the cycle.

The minimum cycle mean is min{m(c)} over all cycles c.

The cycle means of a new graph Gd are used to compute the iteration bound. Gd is obtained from the original DFG G for which the iteration bound is being computed. This is done as follows:

– The # of nodes in Gd is equal to the # of delay elements in G.
– The weight w(i,j) of the edge from node i to node j in Gd is the longest path among all paths in G from delay di to dj that do not pass through any delay elements.
– The construction of Gd is thus the construction of matrix L(1) in LPM.

The cycle mean of a cycle c in Gd is obtained by the usual definition of cycle mean, and this gives the maximum cycle bound of the cycles in G that contain the delays in c.

The maximum cycle mean of Gd is the max cycle bound of all cycles in G, which is the iteration bound.

SLIDE 24

  • To compute the maximum cycle mean of Gd, the MCM of Gd’ is computed and multiplied by -1. Gd’ is similar to Gd except that its weights are the negatives of those of Gd.

Algorithm for MCM:

  • Construct a series of d+1 vectors f(m), m = 0, 1, …, d, each of dimension d×1.
  • An arbitrary reference node s is chosen, and f(0) is formed by setting f(0)(s) = 0 and the remaining entries of f(0) to ∞.
  • The remaining vectors f(m), m = 1, 2, …, d, are recursively computed according to

f(m)(j) = min over i ∈ I of ( f(m-1)(i) + w’(i,j) )

where I is the set of nodes in Gd’ such that there exists an edge from node i to node j.

  • The iteration bound is given by:

T∞ = -min over i ∈ {1, 2, …, d} of ( max over m ∈ {0, 1, …, d-1} of ( (f(d)(i) - f(m)(i)) / (d-m) ) )

SLIDE 25

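  • Example:

[Figure: Gd with nodes 1, 2, 3, 4 and edge weights 4, 5, 5 (the remaining edges have weight 0), converted to Gd’ with edge weights -4, -5, -5.]

Values of (f(d)(i) - f(m)(i)) / (d-m):

        m=0     m=1     m=2     m=3     max over m ∈ {0, 1, …, d-1}
i=1     -2      -∞      -2      -3      -2
i=2     -∞      -5/3    -∞      -1      -1
i=3     -∞      -∞      -2      -∞      -2
i=4     ∞-∞     ∞-∞     ∞-∞     ∞       ∞

T∞ = -min{-2, -1, -2, ∞} = 2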
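The MCM procedure can be sketched in Python and checked against this example. The edge dictionary below encodes Gd for the four-delay example (0-based node indices, weight-0 edges written out explicitly); the function name, the data layout, and the choice of node 1 as the reference node are assumptions made for illustration.

```python
from fractions import Fraction
from math import inf

def iteration_bound_mcm(weights, d, source=0):
    """Iteration bound via the minimum cycle mean of Gd' (negated weights).
    weights maps (i, j) -> w(i, j) in Gd, with integer weights.
    Assumes at least one cycle is reachable from the reference node."""
    negated = {edge: -w for edge, w in weights.items()}  # Gd'
    f = [[inf] * d for _ in range(d + 1)]
    f[0][source] = 0
    for m in range(1, d + 1):
        for (i, j), w in negated.items():
            if f[m - 1][i] != inf:
                f[m][j] = min(f[m][j], f[m - 1][i] + w)

    node_maxima = []
    for i in range(d):
        if f[d][i] == inf:
            continue  # row evaluates to infinity and can never be the minimum
        # entries with f(m)(i) = infinity are -infinity and never the maximum
        node_maxima.append(max(Fraction(f[d][i] - f[m][i], d - m)
                               for m in range(d) if f[m][i] != inf))
    return -min(node_maxima)

# Gd of the example: d1->d2 (0), d2->d1 (4), d2->d3 (0), d3->d1 (5), d3->d4 (0), d4->d1 (5)
gd = {(0, 1): 0, (1, 0): 4, (1, 2): 0, (2, 0): 5, (2, 3): 0, (3, 0): 5}
t_inf_mcm = iteration_bound_mcm(gd, d=4)
print(t_inf_mcm)
```

The result agrees with the value obtained from the table above and with the LPM example, since both methods start from the same L(1) information.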