VLSI Digital Signal Processing Systems
Keshab K. Parhi
- Chap. 2
VLSI Digital Signal Processing Systems
- Textbook:
– K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, 1999
- Buy Textbook:
– http://www.bn.com
– http://www.amazon.com
– http://www.bestbookbuys.com
Chapter 1. Introduction to DSP Systems
- Introduction (Read Sec. 1.1, 1.3)
- Non-Terminating Programs Require Real-Time Operations
- Applications dictate different speed constraints (e.g., voice, audio, cable modem, set-top box, Gigabit Ethernet, 3-D Graphics)
- Need to design Families of Architectures for specified algorithm complexity and speed constraints
- Representations of DSP Algorithms (Sec. 1.4)
Typical DSP Programs
- Usually highly real-time; design hardware and/or software to meet the application speed constraint
- Non-terminating
– Example: the DSP system takes in input signal samples at times T, 2T, 3T, …, nT, … and runs the algorithm below to produce the output samples:
for n = 1 to ∞
$y(n) = a \cdot x(n) + b \cdot x(n-1) + c \cdot x(n-2)$
end
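As a minimal sketch of such a non-terminating program, assuming zero initial conditions (x(n) = 0 before the first sample), the loop can be written as a Python generator that consumes an unbounded stream of samples and yields one y(n) per input. The name `fir3_stream` is hypothetical.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def fir3_stream(samples: Iterable[float], a: float, b: float, c: float) -> Iterator[float]:
    """Yield y(n) = a*x(n) + b*x(n-1) + c*x(n-2) for every incoming sample.

    Assumes x(n) = 0 before the first sample. The input may be endless,
    mirroring the 'for n = 1 to infinity' loop on the slide.
    """
    x1 = 0.0  # x(n-1): contents of the first delay element
    x2 = 0.0  # x(n-2): contents of the second delay element
    for x0 in samples:
        yield a * x0 + b * x1 + c * x2
        x2, x1 = x1, x0  # shift the delay line by one sample period


# An endless input stream 1, 2, 3, ...; only the first four outputs are taken.
print(list(islice(fir3_stream(count(1), 1.0, 2.0, 3.0), 4)))  # [1.0, 4.0, 10.0, 16.0]
```

The generator never terminates on its own; the consumer (here `islice`) decides how many outputs to draw, which is how a real-time system keeps running as long as samples arrive.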
Area-Speed-Power Tradeoffs
- 3-Dimensional Optimization (Area, Speed, Power)
- Achieve Required Speed, Area-Power Tradeoffs
- Power Consumption: $P = C \cdot V^2 \cdot f$
- Latency Reduction Techniques => increase in speed, or power reduction through lower supply voltage operation
- Since the capacitance of the multiplier is usually dominant, reducing the number of multiplications is important (this is possible through strength reduction)
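A small numerical sketch of $P = C \cdot V^2 \cdot f$: with C and f held fixed, halving the supply voltage cuts the dynamic power by a factor of four. The component values below are hypothetical and only chosen to show the quadratic dependence on V.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic power P = C * V**2 * f, as on the slide."""
    return c_farads * v_volts ** 2 * f_hz


# Hypothetical values: 1 nF switched capacitance at 100 MHz.
p_nominal = dynamic_power(1e-9, 5.0, 100e6)
p_scaled = dynamic_power(1e-9, 2.5, 100e6)  # same C and f, half the supply voltage
print(p_nominal / p_scaled)  # ratio is 4: power scales with V squared
```

Lowering V also slows the circuit, which is why the slide pairs it with latency reduction: the speed gained from shorter paths is traded back for a lower supply voltage.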
Representation Methods of DSP systems
Example: y(n)=a*x(n)+b*x(n-1)+c*x(n-2)
- Graphical Representation Method 1: Block Diagram
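– Consists of functional blocks connected by directed edges; each edge represents data flow from its input block to its output block
[Figure: block diagram of y(n): x(n) passes through two delay elements D to give x(n-1) and x(n-2); the three signals are scaled by a, b, c and summed to give y(n).]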
- Graphical Representation Method 2: Signal-Flow Graph
– SFG: a collection of nodes and directed edges
– Nodes: represent computations and/or tasks; a node sums all incoming signals
– Directed edge (j, k): denotes a linear transformation from the input signal at node j to the output signal at node k
– Linear SFGs can be transformed into different forms without changing the system functions. For example, flow graph reversal or transposition is one of these transformations (Note: only applicable to single-input-single-output systems)
– Usually used for linear time-invariant DSP system representation
[Figure: SFG of the same filter from x(n) to y(n), with edge gains a, b, c and delay edges $z^{-1}$.]
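To illustrate flow graph transposition on the example filter, the sketch below (hypothetical function names, integer coefficients so the comparison is exact) computes y(n) with the direct form, where the delays sit on the input, and with the transposed form obtained by reversing the SFG, where the delays sit between the adders. Both produce the same output sequence, which is what "without changing the system function" means in practice.

```python
def fir3_direct(xs, a, b, c):
    """Direct form: a delay line on the input, as in the block diagram."""
    x1 = x2 = 0  # x(n-1), x(n-2)
    out = []
    for x in xs:
        out.append(a * x + b * x1 + c * x2)
        x1, x2 = x, x1
    return out


def fir3_transposed(xs, a, b, c):
    """Transposed form: the z^-1 registers sit between the adders."""
    s1 = s2 = 0  # s1 holds b*x(n-1) + c*x(n-2), s2 holds c*x(n-1)
    out = []
    for x in xs:
        out.append(a * x + s1)
        s1, s2 = b * x + s2, c * x  # right-hand side uses the old s2
    return out


xs = [1, -2, 3, 0, 5, 4]
assert fir3_direct(xs, 2, 3, 4) == fir3_transposed(xs, 2, 3, 4)
```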
- Graphical Representation Method 3: Data-Flow Graph
– DFG: nodes represent computations (or functions or subtasks), while the directed edges represent data paths (data communications between nodes); each edge has a nonnegative number of delays associated with it.
– A DFG captures the data-driven property of DSP algorithms: any node can perform its computation whenever all its input data are available.
– Each edge describes a precedence constraint between two nodes in the DFG:
- Intra-iteration precedence constraint: if the edge has zero delays
- Inter-iteration precedence constraint: if the edge has one or more delays
- DFGs and Block Diagrams can be used to describe both linear single-rate and nonlinear multi-rate DSP systems
- Fine-Grain DFG
[Figure: fine-grain DFG of the same filter from x(n) to y(n), with multiplier nodes a, b, c and delays D on the edges.]
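One possible way to hold a DFG in code is an edge list with a delay count per edge. In the sketch below the node names `IN`, `Ma`, `A1` and so on are hypothetical labels for the input, multipliers and adders of the example filter, and placing one and two delays directly on the input edges is an assumption equivalent to the delay chain in the figure. Splitting the edges by delay count yields exactly the intra- and inter-iteration precedence constraints.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: str
    dst: str
    delays: int  # nonnegative number of delay elements on this edge


def split_constraints(edges):
    """Separate intra-iteration (0 delays) from inter-iteration (>= 1 delay) edges."""
    intra = [e for e in edges if e.delays == 0]
    inter = [e for e in edges if e.delays > 0]
    return intra, inter


# Hypothetical fine-grain DFG of y(n) = a*x(n) + b*x(n-1) + c*x(n-2):
# IN = input, Ma/Mb/Mc = multipliers by a/b/c, A1/A2 = adders (A2 outputs y(n)).
FIR_DFG = [
    Edge("IN", "Ma", 0),
    Edge("IN", "Mb", 1),  # x(n-1): one delay on the edge
    Edge("IN", "Mc", 2),  # x(n-2): two delays on the edge
    Edge("Mb", "A1", 0),
    Edge("Mc", "A1", 0),
    Edge("Ma", "A2", 0),
    Edge("A1", "A2", 0),
]

intra, inter = split_constraints(FIR_DFG)
print([f"{e.src}->{e.dst}" for e in inter])  # ['IN->Mb', 'IN->Mc']
```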
Examples of DFG
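– Nodes are complex blocks (in Coarse-Grain DFGs), e.g., nodes FFT, IFFT and Adaptive filtering
– Nodes can describe expanders/decimators in Multi-Rate DFGs:
- Decimator (↓2): N samples in, N/2 samples out; equivalent to a node that consumes 2 samples and produces 1 per execution
- Expander (↑2): N/2 samples in, N samples out; equivalent to a node that consumes 1 sample and produces 2 per execution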
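Using the standard definitions (the decimator keeps x(2n); the expander inserts one zero after every sample), the two multi-rate operations can be sketched on finite lists as follows. The function names are hypothetical.

```python
def decimate2(xs):
    """Decimator (down by 2): keep every other sample, so N samples become N/2."""
    return xs[::2]


def expand2(xs):
    """Expander (up by 2): insert a zero after each sample, so N/2 samples become N."""
    out = []
    for x in xs:
        out.extend([x, 0])
    return out


x = [1, 2, 3, 4, 5, 6]
print(decimate2(x))         # [1, 3, 5]
print(expand2([1, 3, 5]))   # [1, 0, 3, 0, 5, 0]
```

In DFG terms, each call of `decimate2` on a pair of inputs is one execution that consumes 2 samples and produces 1, and `expand2` consumes 1 and produces 2.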
Chapter 2: Iteration Bound
- Introduction
- Loop Bound
– Important Definitions and Examples
- Iteration Bound
– Important Definitions and Examples
– Techniques to Compute Iteration Bound
Introduction
- Iteration: execution of all computations (or functions) in an algorithm once
– Example 1: [Figure: multi-rate DFG with nodes A, B and C.]
For 1 iteration, the computations are: A 2 times, B 2 times, C 3 times.
- Iteration period: the time required for execution of one iteration of the algorithm (same as sample period)
– Example: $y(n) = a \cdot y(n-1) + x(n)$, i.e., $H(z) = \frac{1}{1 - a z^{-1}}$
[Figure: DFG of this filter: x(n) enters an adder whose output y(n) is fed back through a delay $z^{-1}$ (giving y(n-1)) and a multiplier by a; the nodes are labeled a, b, c.]
Introduction (cont’d)
– Assume the execution times of the multiplier and the adder are Tm and Ta; then the iteration period for this example is Tm + Ta (assumed to be 10 ns; see the red box in the figure). So the sample period Ts must satisfy $T_s \ge T_m + T_a$
- Definitions:
– Iteration rate: the number of iterations executed per second
– Sample rate: the number of samples processed in the DSP system per second (also called throughput)
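Because the sample rate is the reciprocal of the sample period, the constraint $T_s \ge T_m + T_a$ caps the throughput. A one-line sketch, assuming the 10 ns total from the example; the 8 ns / 2 ns split between Tm and Ta is hypothetical, since only the sum matters here.

```python
def max_sample_rate_hz(t_mult_ns: float, t_add_ns: float) -> float:
    """Upper bound on the sample rate implied by Ts >= Tm + Ta (times in ns)."""
    min_sample_period_ns = t_mult_ns + t_add_ns
    return 1e9 / min_sample_period_ns


# Hypothetical split of the 10 ns iteration period.
print(max_sample_rate_hz(8.0, 2.0))  # 100000000.0
```

With Tm + Ta = 10 ns, at most 100 million samples per second can be processed by this recursive loop.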
Iteration Bound
- Definitions:
– Loop: a directed path that begins and ends at the same node
– Loop bound of the j-th loop: defined as Tj/Wj, where Tj is the loop computation time and Wj is the number of delays in the loop
– Example 1: a → b → c → a is a loop (see the same example in Note 2, PP2); it contains one delay, so its loop bound is $T_{loopbound} = \frac{T_m + T_a}{1} = 10$ ns
– Example 2: $y(n) = a \cdot y(n-2) + x(n)$; the loop contains two delays (2D), so $T_{loopbound} = \frac{T_m + T_a}{2} = 5$ ns
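Why two delays halve the loop bound can be seen by unrolling Example 2: since y(n) depends only on y(n-2), the even-indexed and odd-indexed outputs form two independent first-order recursions, each of which only needs a new result every two sample periods. The sketch below (hypothetical names, zero initial state) checks that the interleaved form reproduces the direct recursion.

```python
def iir_delay2(xs, a):
    """Direct evaluation of y(n) = a*y(n-2) + x(n), zero initial state."""
    y = []
    for n, x in enumerate(xs):
        prev = y[n - 2] if n >= 2 else 0
        y.append(a * prev + x)
    return y


def iir_delay2_interleaved(xs, a):
    """The same recursion split into two independent chains (even and odd n).

    Each chain computes acc = a*acc + x, i.e. one multiply and one add,
    but is only updated every second sample.
    """
    y = [0] * len(xs)
    for phase in (0, 1):
        acc = 0
        for n in range(phase, len(xs), 2):
            acc = a * acc + xs[n]
            y[n] = acc
    return y


xs = [3, 1, 4, 1, 5, 9, 2, 6]
assert iir_delay2(xs, 2) == iir_delay2_interleaved(xs, 2)
```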
Iteration Bound (cont’d)
– Example 3: compute the loop bounds of the following loops:
[Figure: DFG with node A (10 ns) and nodes B, C, D (execution times 2 ns, 3 ns, 5 ns), containing loops L1 (one delay D), L2 (2D) and L3 (2D).]
$T_{L1} = \frac{10 + 2}{1} = 12$ ns, $T_{L2} = \frac{2 + 3 + 5}{2} = 5$ ns, $T_{L3} = \frac{10 + 2 + 3}{2} = 7.5$ ns
- Definitions (Important):
– Critical Loop: the loop with the maximum loop bound
– Iteration bound of a DSP program: the loop bound of the critical loop, defined as
$T_\infty = \max_{j \in L} \left\{ \frac{T_j}{W_j} \right\}$
where L is the set of loops in the DSP system, Tj is the computation time of loop j and Wj is the number of delays in loop j
– Example 4: compute the iteration bound of Example 3:
$T_\infty = \max\{12, 5, 7.5\} = 12$ ns
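A brute-force sketch of the definition $T_\infty = \max_j T_j / W_j$: enumerate every simple loop, divide its computation time by its delay count, and take the maximum. Because the Example 3 figure is not available, the edge list below is a hypothetical reconstruction, including the assignment of 2, 3 and 5 ns to B, C and D; it is only chosen so that its three loops have the computation times and delay counts of L1, L2 and L3. Loop enumeration grows exponentially in general, which the polynomial-time LPM and MCM algorithms covered later avoid.

```python
from fractions import Fraction

# Hypothetical reconstruction of Example 3: node times (ns) and edges
# (src, dst, delays) giving loops L1 = A,B (1D), L2 = B,C,D (2D), L3 = A,B,C (2D).
NODE_TIME = {"A": 10, "B": 2, "C": 3, "D": 5}
EDGES = [("A", "B", 0), ("B", "A", 1),
         ("B", "C", 0), ("C", "A", 2),
         ("C", "D", 0), ("D", "B", 2)]


def simple_cycles(nodes, edges):
    """Return every simple directed cycle once, as (node list, total delays)."""
    rank = {v: i for i, v in enumerate(nodes)}
    succ = {v: [] for v in nodes}
    for u, v, w in edges:
        succ[u].append((v, w))
    cycles = []

    def extend(start, node, path, delays):
        for nxt, w in succ[node]:
            if nxt == start:
                cycles.append((list(path), delays + w))
            elif rank[nxt] > rank[start] and nxt not in path:
                # Only visit nodes ranked after the start node, so each cycle
                # is reported once (from its lowest-ranked node).
                path.append(nxt)
                extend(start, nxt, path, delays + w)
                path.pop()

    for s in nodes:
        extend(s, s, [s], 0)
    return cycles


def iteration_bound(node_time, edges):
    """T_inf = max over loops of T_j / W_j (infinite if some loop has no delay)."""
    bound = Fraction(0)  # a DFG without loops imposes no loop bound
    for loop, w in simple_cycles(list(node_time), edges):
        if w == 0:
            return float("inf")  # delay-free loop: non-computable
        t = sum(node_time[v] for v in loop)
        print(loop, "loop bound =", Fraction(t, w), "ns")
        bound = max(bound, Fraction(t, w))
    return bound


print("T_inf =", iteration_bound(NODE_TIME, EDGES), "ns")
```

It prints the loop bounds 12, 15/2 (that is, 7.5) and 5 ns, followed by T_inf = 12 ns, matching Examples 3 and 4.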
Iteration bound (cont’d)
- If there is no delay element in a loop, then W = 0 for that loop and $T_\infty = T_L / 0 = \infty$
– Delay-free loops are non-computable; see the example:
[Figure: a delay-free loop between nodes A and B.]
- Non-causal systems cannot be implemented: for example, $A = B \cdot Z^{-1}$ is causal, whereas $B = A \cdot Z$ is non-causal
- Speed of the DSP system: depends on the critical path computation time
– Paths: do not contain delay elements (4 possible path locations)
- (1) input node → delay element
- (2) delay element's output → output node
- (3) input node → output node
- (4) delay element → delay element
– Critical path of a DFG: the path with the longest computation time among all paths that contain zero delays
– Clock period is lower bounded by the critical path computation time
Iteration Bound (cont’d)
– Example: assume Tm = 10 ns and Ta = 4 ns; then the length of the critical path is 26 ns (see the red lines in the figure)
– The critical path computation time is the lower bound on the clock period
– To achieve high speed, the length of the critical path can be reduced by pipelining and parallel processing (Chapter 3).
[Figure: DFG from x(n) to y(n) with four delay elements D and multipliers a, b, c, d, e; the path lengths 26, 26, 22, 18 and 14 ns are marked.]
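The critical path can be found as the longest path in the graph that remains after cutting every edge that carries a delay. The figure is not reproduced, so the sketch below assumes a 5-tap direct-form FIR filter whose products a·x(n) through e·x(n-4) are summed by a chain of four adders (node names hypothetical). With Tm = 10 ns and Ta = 4 ns this gives Tm + 4Ta = 26 ns, consistent with the 26 ns stated in the example.

```python
from functools import lru_cache

T_MULT, T_ADD = 10, 4  # ns, as in the example

# Hypothetical DFG: IN is the input, Ma..Me multiply x(n)..x(n-4) by a..e,
# and the adder chain S1..S4 forms ((((a*x(n) + b*x(n-1)) + ...) + e*x(n-4)).
NODE_TIME = {"IN": 0, "S1": T_ADD, "S2": T_ADD, "S3": T_ADD, "S4": T_ADD}
NODE_TIME.update({m: T_MULT for m in ("Ma", "Mb", "Mc", "Md", "Me")})
EDGES = [("IN", "Ma", 0), ("IN", "Mb", 1), ("IN", "Mc", 2), ("IN", "Md", 3), ("IN", "Me", 4),
         ("Ma", "S1", 0), ("Mb", "S1", 0), ("S1", "S2", 0), ("Mc", "S2", 0),
         ("S2", "S3", 0), ("Md", "S3", 0), ("S3", "S4", 0), ("Me", "S4", 0)]


def critical_path(node_time, edges):
    """Longest computation time over paths that use only zero-delay edges.

    Deleting the delay edges leaves the acyclic precedence graph (APG),
    assuming the DFG has no delay-free loop, so the recursion terminates.
    """
    succ = {v: [] for v in node_time}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        return node_time[v] + max((longest_from(s) for s in succ[v]), default=0)

    return max(longest_from(v) for v in node_time)


print(critical_path(NODE_TIME, EDGES))  # 26
```

Under this assumed adder order, the paths starting at the multipliers a, b, c, d, e have lengths 26, 26, 22, 18 and 14 ns.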
Precedence Constraints
- Each edge of a DFG defines a precedence constraint
- Precedence Constraints:
– Intra-iteration ⇒ edges with no delay elements
– Inter-iteration ⇒ edges with non-zero delay elements
- Acyclic Precedence Graph (APG): graph obtained by deleting all edges with delay elements.
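Example: $y(n) = a \cdot y(n-1) + x(n)$
[Figure: DFG with adder A (+) and multiplier B (×a); the edge from A to B carries one delay D.]
– Intra-iteration precedence constraint: B1 => A1, B2 => A2, B3 => A3, …
– Inter-iteration precedence constraint: A1 => B2, A2 => B3, …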
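The pattern of constraints above follows mechanically from the two edges of the DFG: an edge U → V with w delays turns into U_k => V_(k+w) for every iteration k. A minimal sketch (the function name is hypothetical) that unrolls the first three iterations:

```python
# Edges (src, dst, delays) of the DFG for y(n) = a*y(n-1) + x(n):
# B is the multiplier (x a), A is the adder; the edge A -> B carries one delay.
EDGES = [("B", "A", 0), ("A", "B", 1)]


def unrolled_constraints(edges, iterations):
    """List 'U_k => V_(k+w)' precedence pairs for iterations 1..iterations."""
    intra, inter = [], []
    for k in range(1, iterations + 1):
        for u, v, w in edges:
            if k + w > iterations:
                continue  # the target instance lies beyond the unrolled window
            pair = f"{u}{k} => {v}{k + w}"
            (intra if w == 0 else inter).append(pair)
    return intra, inter


intra, inter = unrolled_constraints(EDGES, 3)
print(intra)  # ['B1 => A1', 'B2 => A2', 'B3 => A3']
print(inter)  # ['A1 => B2', 'A2 => B3']
```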
Example: [Figure: DFG with nodes A (10 ut), B (3 ut), C (6 ut) and D (21 ut), with edge delays D, D and 2D.]
Critical Path = 27 ut, so Tclk ≥ 27 ut
[Figure: the APG of this graph, over nodes A, B, C and D.]
- Achieving Loop Bound
– Loop A → B (A: 10 ut, B: 3 ut; one delay): Tloop = 13 ut, with schedule A1 B1 => A2 B2 => A3 …
– Loop B → C → D (B: 3 ut, C: 6 ut, D: 21 ut; delays D and 2D) runs as three interleaved chains:
B1 => C2 D2 => B4 => C5 D5 => B7 …
B2 => C3 D3 => B5 => C6 D6 => B8 …
C1 D1 => B3 => C4 D4 => B6 …
The loop contains three delay elements, so its loop bound = (loop computation time) / (# of delay elements) = 30 / 3 = 10 ut
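To see the loop bounds in action, the sketch below computes an as-soon-as-possible schedule for many iterations of the graph above, assuming unlimited hardware. The edge delays are an assumption reconstructed from the schedule on this slide (A → B: 0, B → A: 1, B → C: 1, C → D: 0, D → B: 2). Under that assumption the A → B loop (13 ut, one delay) dominates the B → C → D loop (10 ut), so the steady-state rate settles at one iteration every 13 ut, the larger of the two loop bounds.

```python
# Assumed edge delays, reconstructed from the schedule on this slide.
NODE_TIME = {"A": 10, "B": 3, "C": 6, "D": 21}  # in ut
EDGES = [("A", "B", 0), ("B", "A", 1), ("B", "C", 1), ("C", "D", 0), ("D", "B", 2)]
ORDER = ["A", "B", "C", "D"]  # a topological order of the zero-delay edges


def asap_finish_times(node_time, edges, order, iterations):
    """ASAP finish time of every node instance X_k with unlimited hardware.

    X_k may start once each predecessor instance Y_(k-w) has finished, where
    w is the number of delays on edge Y -> X. Iterations are numbered from 0
    here (the slide counts from 1); instances before k = 0 do not exist.
    """
    finish = {}
    for k in range(iterations):
        for x in order:
            ready = max((finish[(y, k - w)] for y, dst, w in edges
                         if dst == x and k - w >= 0), default=0)
            finish[(x, k)] = ready + node_time[x]
    return finish


f = asap_finish_times(NODE_TIME, EDGES, ORDER, 100)
print(f[("D", 0)])                        # 27: the zero-delay path C -> D
print((f[("B", 99)] - f[("B", 0)]) / 99)  # 13.0: one iteration every 13 ut
```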
- Algorithms to compute the iteration bound
– Longest Path Matrix (LPM)
– Minimum Cycle Mean (MCM)
- Longest Path Matrix Algorithm
Let d be the number of delays in the DFG.
A series of matrices L(m), m = 1, 2, …, d, is constructed such that $l^{(m)}_{i,j}$ is the longest computation time of all paths from delay element di to dj that pass through exactly (m-1) delays. If such a path does not exist, $l^{(m)}_{i,j} = -1$.
The longest path between any two nodes can be computed using either the Bellman-Ford algorithm or the Floyd-Warshall algorithm (Appendix A). Usually, L(1) is computed using the DFG. The higher-order matrices are computed recursively as follows:
$l^{(m+1)}_{i,j} = \max\left(-1,\; \max_{k \in K}\left(l^{(1)}_{i,k} + l^{(m)}_{k,j}\right)\right)$
where K is the set of integers k in the interval [1, d] such that neither $l^{(1)}_{i,k} = -1$ nor $l^{(m)}_{k,j} = -1$ holds.
The iteration bound is given by
$T_\infty = \max_{i, m \in \{1, 2, \dots, d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\}$
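- Example:
[Figure: DFG with nodes 1 to 6 (execution times of 1 or 2 ut) and four delay elements d1, d2, d3, d4.]
$L^{(1)} = \begin{bmatrix} -1 & 0 & -1 & -1 \\ 4 & -1 & 0 & -1 \\ 5 & -1 & -1 & 0 \\ 5 & -1 & -1 & -1 \end{bmatrix}$
$L^{(2)} = \begin{bmatrix} 4 & -1 & 0 & -1 \\ 5 & 4 & -1 & 0 \\ 5 & 5 & -1 & -1 \\ -1 & 5 & -1 & -1 \end{bmatrix}$
$L^{(3)} = \begin{bmatrix} 5 & 4 & -1 & 0 \\ 8 & 5 & 4 & -1 \\ 9 & 5 & 5 & -1 \\ 9 & -1 & 5 & -1 \end{bmatrix}$
$L^{(4)} = \begin{bmatrix} 8 & 5 & 4 & -1 \\ 9 & 8 & 5 & 4 \\ 10 & 9 & 5 & 5 \\ 10 & 9 & -1 & 5 \end{bmatrix}$
$T_\infty = \max\{4/2, 4/2, 5/3, 5/3, 5/3, 8/4, 8/4, 5/4, 5/4\} = 2$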
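The LPM recursion is short enough to transcribe directly. The sketch below (function name hypothetical, -1 used as the "no path" marker as on the slide) takes L(1), builds L(2), …, L(d) and returns the largest $l^{(m)}_{i,i}/m$; on the example's L(1) it reproduces L(2) through L(4) and the result T∞ = 2.

```python
from fractions import Fraction

NO_PATH = -1  # the slide's marker for "no such path"


def lpm(l1):
    """Longest Path Matrix algorithm.

    l1[i][j] is l(1)_{i,j}: the longest delay-free path time from delay d_i
    to d_j, or NO_PATH. Returns (T_inf, [L(1), ..., L(d)]).
    """
    d = len(l1)
    mats = [l1]
    for _ in range(d - 1):
        lm = mats[-1]
        # l(m+1)_{i,j} = max(-1, max over valid k of l(1)_{i,k} + l(m)_{k,j})
        mats.append([[max([NO_PATH] + [l1[i][k] + lm[k][j] for k in range(d)
                                       if l1[i][k] != NO_PATH and lm[k][j] != NO_PATH])
                      for j in range(d)]
                     for i in range(d)])
    # T_inf = max over i, m of l(m)_{i,i} / m, skipping entries with no path
    t_inf = max((Fraction(mat[i][i], m) for m, mat in enumerate(mats, start=1)
                 for i in range(d) if mat[i][i] != NO_PATH), default=Fraction(0))
    return t_inf, mats


L1 = [[-1, 0, -1, -1],
      [4, -1, 0, -1],
      [5, -1, -1, 0],
      [5, -1, -1, -1]]
t_inf, mats = lpm(L1)
for m, mat in enumerate(mats, start=1):
    print(f"L({m}) =", mat)
print("T_inf =", t_inf)  # T_inf = 2
```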
- Minimum Cycle Mean:
The cycle mean m(c) of a cycle c is the average length of the edges in c, which can be found by simply taking the sum of the edge lengths and dividing by the number of edges in the cycle. The minimum cycle mean is min{m(c)} over all cycles c. The cycle means of a new graph Gd are used to compute the iteration bound. Gd is obtained from the original DFG G for which the iteration bound is being computed, as follows:
– The number of nodes in Gd is equal to the number of delay elements in G.
– The weight w(i, j) of the edge from node i to node j in Gd is the longest path among all paths in G from delay di to dj that do not pass through any delay elements. The construction of Gd is thus the construction of matrix L(1) in LPM.
The cycle mean of a cycle c in Gd is obtained by the usual definition of cycle mean, and it gives the maximum cycle bound of the cycles in G that contain the delays in c. The maximum cycle mean of Gd is therefore the maximum cycle bound of all cycles in G, which is the iteration bound.
- To compute the maximum cycle mean of Gd, the MCM of Gd' is computed and multiplied by -1. Gd' is similar to Gd except that its weights are the negatives of those of Gd.
Algorithm for MCM:
- Construct a series of d+1 vectors f(m), m = 0, 1, …, d, each of dimension d×1.
- An arbitrary reference node s is chosen, and f(0) is formed by setting f(0)(s) = 0 and the remaining entries of f(0) to ∞.
- The remaining vectors f(m), m = 1, 2, …, d, are computed recursively according to
$f^{(m)}(j) = \min_{i \in I}\left(f^{(m-1)}(i) + w'(i, j)\right)$
where I is the set of nodes in Gd' such that there exists an edge from node i to node j.
- The iteration bound is given by:
$T_\infty = -\min_{i \in \{1, 2, \dots, d\}} \left( \max_{m \in \{0, 1, \dots, d-1\}} \frac{f^{(d)}(i) - f^{(m)}(i)}{d - m} \right)$
- Example:
[Figure: Gd with nodes 1 to 4 and edge weights 4, 5, 5, converted to Gd' with edge weights -4, -5, -5.]
With d = 4, the terms (f(d)(i) - f(m)(i)) / (d - m) are:

i | m=0 | m=1 | m=2 | m=3 | max over m
1 | −2 | −∞ | −2 | −3 | −2
2 | −∞ | −5/3 | −∞ | −1 | −1
3 | −∞ | −∞ | −2 | −∞ | −2
4 | ∞−∞ | ∞−∞ | ∞−∞ | ∞ | ∞

$T_\infty = -\min\{-2, -1, -2, \infty\} = 2$
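A direct transcription of the MCM steps, as a sketch. Node numbers are shifted to 0..3, the reference node is s = node 1, and Gd is assumed to be the graph of L(1) from the LPM example (its nonzero weights 4, 5, 5 match the figure; its zero-weight edges are an assumption). Entries of the form ∞ − ∞ are handled the way the table does: a node whose f(d)(i) is ∞ gets a row maximum of ∞ and therefore never determines the minimum.

```python
import math

INF = math.inf


def mcm_iteration_bound(d, weights, s=0):
    """Iteration bound from the minimum cycle mean of Gd'.

    d: number of nodes of Gd (= number of delays in the DFG), numbered 0..d-1.
    weights: {(i, j): w(i, j)} for the edges of Gd.
    s: the arbitrary reference node.
    Returns (T_inf, [f(0), ..., f(d)]).
    """
    w_neg = {edge: -w for edge, w in weights.items()}  # Gd': negated weights
    f = [[INF] * d]
    f[0][s] = 0
    for _ in range(d):
        prev = f[-1]
        # f(m)(j) = min over edges i -> j of f(m-1)(i) + w'(i, j)
        f.append([min((prev[i] + w for (i, j2), w in w_neg.items() if j2 == j),
                      default=INF) for j in range(d)])
    best = INF
    for i in range(d):
        if f[d][i] == INF:
            continue  # row maximum is +inf, so this row cannot be the minimum
        row_max = max((f[d][i] - f[m][i]) / (d - m) for m in range(d))
        best = min(best, row_max)
    return -best, f


# Gd assumed to be the graph of L(1) from the LPM example (nodes 1..4 -> 0..3).
GD = {(0, 1): 0, (1, 0): 4, (1, 2): 0, (2, 0): 5, (2, 3): 0, (3, 0): 5}
t_inf, f = mcm_iteration_bound(4, GD)
print(f[4])   # [-8, -5, -4, inf]
print(t_inf)  # 2.0
```

With these inputs the row maxima are -2, -1, -2 and ∞ as in the table, and T∞ = 2 agrees with the LPM result.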