
SLIDE 1

FFTs

Overview
EECS 360 Notes
Methods descriptions
Hardware Implementations
Direct Implementation
Goertzel
Re-indexing
Chirp-z
Rader

SLIDE 2

Fourier Methods

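CTFS (6.1, 6.2)
Time domain: continuous (t), periodic (T = 1/Δf)
Transform pair:
$x(t) = \sum_{k=-\infty}^{\infty} c_x[k]\, e^{j2\pi kt/T}$
$c_x[k] = \frac{1}{T} \int_{T} x(t)\, e^{-j2\pi kt/T}\, dt$
Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), aperiodic
Transfer function s or z translation: s = j∙2π∙k/T or j∙2π∙k∙Δf

CTFT (6.3-6.6)
Time domain: continuous (t), aperiodic
Transform pair:
$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j2\pi ft}\, df$
$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi ft}\, dt$
Frequency domain: continuous (f), aperiodic
Transfer function s or z translation: s = j∙2π∙f

DTFS
Time domain: discrete (n, Δt = 1/BW, t = n∙Δt), periodic (N, T = N∙Δt)
Transform pair:
$x[n] = \sum_{k=0}^{N-1} c_x[k]\, e^{j2\pi kn/N}$
$c_x[k] = \frac{1}{N} \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), periodic (N, BW = N∙Δf)
Transfer function s or z translation: $z = e^{j2\pi k/T}$ or $e^{j2\pi k \Delta f}$

DTFT
Time domain: discrete (n, t = n∙Δt), aperiodic
Transform pair:
$x[n] = \int_{BW} X(f)\, e^{j2\pi f (n \cdot \Delta t)}\, df$
$X(f) = \frac{1}{BW} \sum_{n=-\infty}^{\infty} x[n]\, e^{-j2\pi f (n \cdot \Delta t)}$
Frequency domain: continuous (f), periodic (BW = 1/Δt)
Transfer function s or z translation: $z = e^{j2\pi f}$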

*unless noted otherwise, Δt is assumed to be 1.

SLIDE 3

Fourier Methods: DTFS variation (The DFT or FFT)

DTFS
Time domain: discrete (n, Δt = 1/BW, t = n∙Δt), periodic (N, T = N∙Δt)
Transform pair:
$x[n] = \sum_{k=0}^{N-1} c_x[k]\, e^{j2\pi kn/N}$
$c_x[k] = \frac{1}{N} \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), periodic (N, BW = N∙Δf)
Transfer function s or z translation: $z = e^{j2\pi k/T}$ or $e^{j2\pi k \Delta f}$

DFT (MATLAB: FFT and IFFT)
Time domain: discrete (n, Δt = 1/BW, t = n∙Δt), periodic (N, T = N∙Δt)
Transform pair:
IFFT: $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}$
FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
Frequency domain: discrete (k, Δf = 1/T, f = k∙Δf), periodic (N, BW = N∙Δf)
Transfer function s or z translation: $z = e^{j2\pi k/T}$ or $e^{j2\pi k \Delta f}$

*unless noted otherwise, Δt is assumed to be 1.

SLIDE 4

DFT equation

DFT (MATLAB: FFT and IFFT)

IFFT: $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}$

FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$
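The DFT pair on this slide can be checked numerically. The following is a minimal sketch in plain Python (the function names `dft` and `idft` are illustrative and not from the notes); it evaluates both sums directly in O(N²) operations and also forms the DTFS coefficients as X[k]/N, which is the only difference between the DTFS and DFT definitions used here.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum over k of X[k] * exp(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)                          # forward transform ("FFT" on the slide)
x_back = idft(X)                    # inverse transform ("IFFT" on the slide)
dtfs = [Xk / len(x) for Xk in X]    # DTFS coefficients c_x[k] = X[k] / N
```

Round-tripping through `dft` and `idft` should return the original samples to within floating-point error.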

SLIDE 5

Goertzel Algorithm

FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$

Expand the sum, with $W_N = e^{-j2\pi/N}$:

$X[k] = W_N^{0k}x[0] + W_N^{1k}x[1] + W_N^{2k}x[2] + \dots + W_N^{(N-2)k}x[N-2] + W_N^{(N-1)k}x[N-1]$

Multiply by $W_N^{-Nk} = 1$:

$X[k] = W_N^{-Nk}\left(W_N^{0k}x[0] + W_N^{1k}x[1] + W_N^{2k}x[2] + \dots + W_N^{(N-2)k}x[N-2] + W_N^{(N-1)k}x[N-1]\right)$

$X[k] = W_N^{-Nk}x[0] + W_N^{-(N-1)k}x[1] + W_N^{-(N-2)k}x[2] + \dots + W_N^{-2k}x[N-2] + W_N^{-k}x[N-1]$

Factor out $W_N^{-k}$ one level at a time:

$X[k] = \left(W_N^{-(N-1)k}x[0] + W_N^{-(N-2)k}x[1] + W_N^{-(N-3)k}x[2] + \dots + W_N^{-k}x[N-2] + x[N-1]\right)W_N^{-k}$

$X[k] = \left(\dots\left(\left(W_N^{-2k}x[0] + W_N^{-k}x[1] + x[2]\right)W_N^{-k} + \dots + x[N-2]\right)W_N^{-k} + x[N-1]\right)W_N^{-k}$

$X[k] = \left(\left(\dots\left(\left(x[0]\,W_N^{-k} + x[1]\right)W_N^{-k} + x[2]\right)W_N^{-k} + \dots + x[N-2]\right)W_N^{-k} + x[N-1]\right)W_N^{-k}$

This is an integrator whose state is multiplied by $W_N^{-k} = e^{j2\pi k/N}$ every iteration.
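The nested form can be evaluated with one running accumulator. The sketch below is an illustration only, with hypothetical function names; it takes the per-iteration multiplier to be e^{j2πk/N}, which is W_N^{-k} under the definition W_N = e^{-j2π/N}, and compares the result with the direct sum.

```python
import cmath

def dft_bin_nested(x, k):
    """One DFT bin via the nested recursion: acc = (acc + x[n]) * w."""
    N = len(x)
    w = cmath.exp(2j * cmath.pi * k / N)   # per-iteration multiplier e^{+j*2*pi*k/N}
    acc = 0j
    for sample in x:
        acc = (acc + sample) * w           # integrate, then rotate
    return acc

def dft_bin_direct(x, k):
    """Reference: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))

x = [1.0, 2.0, 0.0, -1.0]
nested = [dft_bin_nested(x, k) for k in range(len(x))]
direct = [dft_bin_direct(x, k) for k in range(len(x))]
```

After N iterations the sample x[n] has been multiplied by the rotation N-n times, and since a full N rotations equal 1, the two results agree.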

SLIDE 6

DFT equation

$X[k] = \left(\left(\dots\left(\left(x[0]\,W_N^{-k} + x[1]\right)W_N^{-k} + x[2]\right)W_N^{-k} + \dots + x[N-2]\right)W_N^{-k} + x[N-1]\right)W_N^{-k}$

SLIDE 7

Remember the Integrator Filter

Sample Domain Equation
1st order IIR filter with a0 = 1:

y[n] = x[n] + y[n-1]

Z domain

$H(z) = \frac{1}{1 - z^{-1}}$

Pole at z = 1 (Critically Stable)

[Figure: integrator block diagram with a $z^{-1}$ delay]
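The difference equation y[n] = x[n] + y[n-1] can be run sample by sample. This is a small sketch (the function name `integrator` is illustrative) that assumes a zero initial state.

```python
def integrator(x):
    """Run y[n] = x[n] + y[n-1] over a list of samples, starting from y[-1] = 0."""
    y_prev = 0
    out = []
    for sample in x:
        y = sample + y_prev   # adder
        out.append(y)
        y_prev = y            # z^-1 delay register
    return out

step_response = integrator([1, 1, 1, 1])      # grows without bound: 1, 2, 3, 4
impulse_response = integrator([1, 0, 0, 0])   # never decays: 1, 1, 1, 1
```

The impulse response that neither grows nor decays is what a pole exactly at z = 1 looks like in the sample domain.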

SLIDE 8

DFT equation

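$X[k] = \left(\left(\dots\left(\left(x[0]\,W_N^{-k} + x[1]\right)W_N^{-k} + x[2]\right)W_N^{-k} + \dots + x[N-2]\right)W_N^{-k} + x[N-1]\right)W_N^{-k}$

[Figure: integrator loop with a $z^{-1}$ delay and a twiddle-factor multiplier]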

SLIDE 9

DFT equation

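$X[k] = \left(\left(\dots\left(\left(x[0]\,W_N^{-k} + x[1]\right)W_N^{-k} + x[2]\right)W_N^{-k} + \dots + x[N-2]\right)W_N^{-k} + x[N-1]\right)W_N^{-k}$

[Figure: integrator loop with a $z^{-1}$ delay and a twiddle-factor multiplier; an n-counter with a wrap output connected to rst]

Adders: 1+2 = 3. Multipliers: 4.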

SLIDE 10

8-Point DFT

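[Figure: five parallel integrator branches, each with a $z^{-1}$ delay and a constant multiplier]

Multiplier labels: 1 (two branches), 0.7071+j0.7071 (two branches), j1 (one branch)
Outputs: X[0]; X[4]; X[1], conj(X[7]); X[2], conj(X[6]); X[3], conj(X[5])

Adders: 11. Multipliers: 14. Delays: 5.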

SLIDE 11

N-Point DFT (even)

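[Figure: N/2+1 parallel integrator branches, each with a $z^{-1}$ delay and a constant multiplier: 1 for two branches, and one twiddle factor per bin k = 1, 2, ..., N/2-1]

Outputs: X[0]; X[N/2]; X[1], conj(X[N-1]); X[2], conj(X[N-2]); ...; X[N/2-1], conj(X[N/2+1])

In Parallel
Adders: 2+(N/2-1)*3. Multipliers: 2+(N/2-1)*4. Registers: N. Latency: N.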
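For real input only bins 0 through N/2 need to be computed, because the remaining bins are complex conjugates of them. The sketch below (function names are illustrative, not from the notes) computes that half with the nested recursion, fills in the rest by conjugation, and evaluates the adder and multiplier counts quoted on the slide.

```python
import cmath

def real_input_dft(x):
    """DFT of real samples using N/2+1 accumulator branches plus conjugate symmetry."""
    N = len(x)                                   # assumed even
    X = [0j] * N
    for k in range(N // 2 + 1):                  # one branch per computed bin
        w = cmath.exp(2j * cmath.pi * k / N)     # per-iteration multiplier for bin k
        acc = 0j
        for sample in x:
            acc = (acc + sample) * w
        X[k] = acc
    for k in range(1, N // 2):
        X[N - k] = X[k].conjugate()              # X[N-k] = conj(X[k]) for real input
    return X

def parallel_real_counts(N):
    """Resource formulas from the slide for the parallel real-input structure."""
    branches = N // 2 - 1
    return {"adders": 2 + branches * 3, "multipliers": 2 + branches * 4}

counts_8 = parallel_real_counts(8)
spectrum = real_input_dft([1.0, 0.0, 2.0, -1.0, 0.5, 3.0, 0.0, 1.0])
```

For N = 8 the formulas give 11 adders and 14 multipliers, matching the 8-point figures.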

SLIDE 12

N-Point Complex DFT (even)

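[Figure: N parallel integrator branches, one per output X[0], X[1], ..., X[N-1]. Each branch has a $z^{-1}$ delay, a twiddle-factor multiplier for its bin (from ROM), and a rst input. Other blocks: n-counter, Data In Memory, Data Out Memory. The ena connections are marked "???".]

In Parallel
Adders: N*4. Multipliers: N*4. Registers: N. Latency: N.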

SLIDE 13

N-Point DFT (even)

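[Figure: a single integrator branch with a $z^{-1}$ delay and a twiddle-factor multiplier for bin k. Other blocks: k-counter, n-counter, CORDIC (r), ROM, Data In Memory, concat I&Q, Dual Port Data Out Memory; rst and done signals.]

In Series
Adders: 3. Multipliers: 4. Registers: 2. Latency: N*N/2. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms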

SLIDE 14

N-Point complex DFT (even)

[Figure: a single integrator branch with a $z^{-1}$ delay and a twiddle-factor multiplier for bin k. Other blocks: k-counter, n-counter, CORDIC (r, Cos, Sin), ROM, Data In Memory, concat I&Q stages, Data Out Memory; done, rst, ena and addr signals.]

In Series
Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms

SLIDE 15

Trade Offs

In Series
Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms

In Parallel
Adders: N*4. Multipliers: N*4. Registers: N. Latency: N.

Direct Trade: 1024x Resources, 1024x Faster.

SLIDE 16

N-Point complex DFT (even)

[Figure: two integrator branches with $z^{-1}$ delays, one for bin k and one for bin k+1, each with rst, ena and addr signals, a concat I&Q stage and a Data Out Memory. Shared blocks: k-counter, n-counter, CORDIC (r, Cos, Sin), ROM, Data In Memory; done signal.]

Partially Parallel
Adders: 4*2. Multipliers: 4*2. Registers: 2*2. Latency: N*N/2. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms

SLIDE 17

Trade Offs

In Series
Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms

In Parallel
Adders: N*4. Multipliers: N*4. Registers: N. Latency: N.
Direct Trade: 1024x Resources, 1024x Faster.

Partially Parallel
Adders: 4*2. Multipliers: 4*2. Registers: 2*2. Latency: N*N/2. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms

What level of parallelization should we use? Depends on:

1. # of resources.
2. Types of resources (memory access is typically in a serial fashion) (??? problem above)
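The trade-off can be tabulated for any number of parallel branches. The sketch below is an assumption-laden estimate, not the notes' design: `dft_engine_estimate` is a hypothetical helper that applies the 4-adder, 4-multiplier cost per complex branch and the N*N/branches cycle count, and like the slides it excludes CORDIC and storage.

```python
def dft_engine_estimate(n_points, branches=1, clock_hz=100e6):
    """Estimate arithmetic resources and latency for a direct complex DFT engine."""
    cycles = (n_points * n_points) // branches   # N*N multiply-accumulates shared by the branches
    return {
        "adders": 4 * branches,
        "multipliers": 4 * branches,
        "cycles": cycles,
        "latency_ms": 1e3 * cycles / clock_hz,
    }

serial = dft_engine_estimate(1024, branches=1)       # in series
partial = dft_engine_estimate(1024, branches=2)      # partially parallel
parallel = dft_engine_estimate(1024, branches=1024)  # fully parallel
```

Evaluated exactly, 1024*1024 cycles at 100 MHz is about 10.49 ms and 1024*512 cycles is about 5.24 ms, while the fully parallel case needs only 1024 cycles.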

SLIDE 18

Memory Resources (Series 7)

Dual Port 36 Kb.
Can't access more than 2 address values per cycle.

2x Single Port 18 Kb.
Smallest Memory Segment.
Number Formats (single addr).
Concatenated Real and Imag.
18-bit real and 18-bit imag #s.
Results in 2x 512 Complex Values.
For N-Value DFT.
Parallelization by N/512.
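The memory arithmetic above can be written out explicitly. This is a rough sketch with hypothetical helper names; it assumes 1 Kb = 1024 bits and one concatenated 18-bit real plus 18-bit imaginary value per address.

```python
def complex_words(block_kbits, real_bits=18, imag_bits=18):
    """Number of concatenated real:imag values that fit in one memory block."""
    return (block_kbits * 1024) // (real_bits + imag_bits)

def parallel_branches(n_points, words_per_block=512):
    """Parallelization factor N/512 implied by 512 complex values per 18 Kb block."""
    return n_points // words_per_block

words_18k = complex_words(18)            # one 18 Kb block
words_36k = complex_words(36)            # one 36 Kb block (2x 18 Kb)
branches_1024 = parallel_branches(1024)
branches_2048 = parallel_branches(2048)
```

Under these assumptions a 1024-point DFT is split across 2 branches and a 2048-point DFT across 4.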

SLIDE 19

1024-Point 18r:18i complex DFT

[Figure: two integrator branches with $z^{-1}$ delays, one for bin k and one for bin k+1; k-counter and n-counter; memory port signals wea with addr and web with addr+1; rst and done signals.]

Partially Parallel
Adders: 4*2. Multipliers: 4*2. Registers: 2*2. Latency: N*N/2. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms

SLIDE 20

2048-Point 18r:18i complex DFT

[Figure: four integrator branches, for bins k, k+1, k+2 and k+3; k-counter (0 to 511) and n-counter (0 to 2047); done signal.]

Partially Parallel
Adders: 4*4. Multipliers: 4*4. Registers: 2*4. Latency: N*N/4. Excludes CORDIC and storage.
At 100 MHz, 2048 pt DFT in 2048*512/100e6 ≈ 10.49 ms

SLIDE 21

Implementing the accum and mult.

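[Figure: a single integrator branch with a $z^{-1}$ delay and a twiddle-factor multiplier for bin k. Other blocks: k-counter, n-counter, CORDIC (r, Cos, Sin), ROM, Data In Memory, concat I&Q, Data Out Memory; done, rst, ena and addr signals.]

In Series
Adders: 4. Multipliers: 4. Registers: 2. Latency: N*N. Excludes CORDIC and storage.
At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms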

SLIDE 22

Implementing the accum and mult.

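[Figure: the accumulate-and-multiply stage drawn with real arithmetic: two $z^{-1}$ registers and four multipliers, two fed by the real part and two by the imaginary part of the twiddle factor for bin k]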
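Written out in real arithmetic, one accumulate-and-multiply step takes four real multipliers and four real adders, which is where the per-branch counts for complex input come from. The sketch below uses a hypothetical function name and floating-point values purely for illustration; the twiddle values passed in are arbitrary.

```python
def complex_mac_step(acc_re, acc_im, x_re, x_im, w_re, w_im):
    """One step of acc = (acc + x) * w using only real operations."""
    s_re = acc_re + x_re            # adder 1: accumulate real part
    s_im = acc_im + x_im            # adder 2: accumulate imaginary part
    p_rr = s_re * w_re              # multiplier 1
    p_ii = s_im * w_im              # multiplier 2
    p_ri = s_re * w_im              # multiplier 3
    p_ir = s_im * w_re              # multiplier 4
    new_re = p_rr - p_ii            # adder 3 (subtract)
    new_im = p_ri + p_ir            # adder 4
    return new_re, new_im

result = complex_mac_step(1.0, 2.0, 0.5, -1.0, 0.6, 0.8)
reference = (complex(1.0, 2.0) + complex(0.5, -1.0)) * complex(0.6, 0.8)
```

With real-only input the second adder is not needed, which gives the 3-adder figure quoted for the real-input structure.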

SLIDE 23

The DSP48E1

SLIDE 24

Complex Multiply with DSP48E1

[Figure: DSP48E1 slice diagram with a $z^{-1}$ register]

We need two of these for real and imaginary parts. 4x DSP Slices

SLIDE 25

Implementing the accum and mult.

We need two of these for real and imaginary parts. 2x DSP Slices

SLIDE 26

Implementing the accum and mult.

Keeping track of the binary point (BP). The B-input is 18 bits (use BP=17).
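As an illustration of what BP=17 implies for an 18-bit signed input, the sketch below quantizes twiddle-factor components to integers with 17 fractional bits. It assumes BP denotes the binary-point position and a two's-complement format; the helper names and the choice to saturate are assumptions, not something the notes specify.

```python
import math

BITS = 18   # width of the B-input
BP = 17     # fractional bits (binary point position), assumed meaning of "BP"

def to_fixed(value, bits=BITS, bp=BP):
    """Quantize a float to a signed integer with bp fractional bits, saturating."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    q = round(value * (1 << bp))
    return max(lo, min(hi, q))

def to_float(q, bp=BP):
    """Convert the fixed-point integer back to a float."""
    return q / (1 << bp)

cos_q = to_fixed(math.cos(math.pi / 4))   # twiddle component near 0.7071
one_q = to_fixed(1.0)                     # +1.0 does not fit and saturates
neg_one_q = to_fixed(-1.0)                # -1.0 is exactly representable
```

With this format the representable range is -1 to just under +1, so a coefficient of exactly +1 has to be saturated or handled as a special case.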

SLIDE 27

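[Figure: accumulate-and-multiply stage with four multipliers, two fed by the real part and two by the imaginary part of the twiddle factor for bin k]

Implementation with 6 DSP Slices.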

SLIDE 28

Implementing the accum and mult.

48-bit accum is a bit excessive. Can be configured as 2x 24-bit adders using inputs (A:B) and C.
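The idea of splitting one 48-bit adder into two independent 24-bit adders can be modeled in software. The following is a behavioral sketch only, with hypothetical helper names; it does not model the DSP48E1 ports or configuration attributes, just the fact that no carry passes between the two halves.

```python
MASK24 = (1 << 24) - 1

def pack(hi, lo):
    """Pack two signed 24-bit values into one 48-bit word."""
    return ((hi & MASK24) << 24) | (lo & MASK24)

def add_two24(a, b):
    """Add two packed words as two separate 24-bit lanes (no carry across lanes)."""
    lo = ((a & MASK24) + (b & MASK24)) & MASK24
    hi = (((a >> 24) & MASK24) + ((b >> 24) & MASK24)) & MASK24
    return (hi << 24) | lo

def unpack(word):
    """Unpack a 48-bit word into two signed 24-bit values (hi, lo)."""
    def signed(v):
        return v - (1 << 24) if v & (1 << 23) else v
    return signed((word >> 24) & MASK24), signed(word & MASK24)

total = unpack(add_two24(pack(5, -3), pack(-7, 10)))
```

Here the low lane computes -3 + 10 and the high lane computes 5 + (-7), and the carry out of the low lane is discarded rather than added to the high lane.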

SLIDE 29

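[Figure: accumulate-and-multiply stage with four multipliers, two fed by the real part and two by the imaginary part of the twiddle factor for bin k]

Implementation with 5 DSP slices.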

SLIDE 30

1024-Point 18r:18i complex DFT

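[Figure: two integrator branches, one for bin k and one for bin k+1; k-counter and n-counter; memory port signals wea with addr and web with addr+1; done signal.]

Partially Parallel
10 DSP Blocks. 3x 36 Kb Block RAMs. 2 counters.
Latency: N*N/2. At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms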

SLIDE 31

Counter with DSP Block

DSP48E1 can be configured as a wide counter.

Internal accumulator feedback with delay register.

Step Size
Use 1-bit carryin for count-by-one.
Use A:B for variable step size.
Limit set with Parameter or C.

[Figure: counter built from an accumulator with a $z^{-1}$ register and an equality comparator (=)]
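The counter behavior described above can be modeled as an accumulator with a step size and a limit compare. This is a behavioral sketch with a hypothetical class name; it assumes the counter returns to zero on the cycle after it equals the limit, and that the limit is a multiple of the step so the equality compare is always reached.

```python
class WrapCounter:
    """Accumulator-style counter: adds step each tick, wraps after reaching limit."""

    def __init__(self, limit, step=1):
        self.limit = limit
        self.step = step
        self.count = 0          # contents of the delay register

    def tick(self):
        """Advance one clock. Returns True on the tick where the counter wraps."""
        if self.count == self.limit:   # equality compare against the limit
            self.count = 0
            return True
        self.count += self.step        # accumulator feedback
        return False

counter = WrapCounter(limit=3)
history = []
for _ in range(5):
    wrapped = counter.tick()
    history.append((counter.count, wrapped))
```

A wrap flag like the one returned here is the kind of signal that could reset an accumulator branch or advance a second counter.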

SLIDE 32

1024-Point 18r:18i complex DFT

[Figure: two integrator branches, one for bin k and one for bin k+1, driven by a single shared counter; done signal.]

count[19:0] = k_count[9:0] concatenated with n_count[9:0]
6 DSP Blocks. 3 RAMs.
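Sharing one 20-bit counter works because the lower 10 bits roll over once per 1024 samples and the upper 10 bits then advance by one. The sketch below (the function name `split_count` is illustrative) assumes n_count occupies the low bits and k_count the high bits, as the concatenation order suggests.

```python
def split_count(count):
    """Split a 20-bit count into (k_count, n_count), each 10 bits wide."""
    n_count = count & 0x3FF            # count[9:0], the sample index
    k_count = (count >> 10) & 0x3FF    # count[19:10], the bin index
    return k_count, n_count

before_wrap = split_count(1023)          # last sample of bin 0
after_wrap = split_count(1024)           # first sample of bin 1
last = split_count((1 << 20) - 1)        # final count value
```

With this layout the sample index runs fastest and the bin index steps once per pass through the data, so a single wide counter can replace the separate k-counter and n-counter.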