FFTs Overview EECS 360 Notes Methods descriptions Hardware - - PowerPoint PPT Presentation


slide-1
SLIDE 1

FFTs

  • Overview
  • EECS 360 Notes
  • Methods descriptions
  • Hardware Implementations
  • Direct Implementation
  • Goertzel
  • Re-indexing
  • Chirp-z
  • Rader
slide-2
SLIDE 2

Fourier Methods

Table columns: Time Domain (continuous/discrete), Time Domain Periodicity, Transform Method (Tables), Frequency Domain (continuous/discrete), Frequency Domain Periodicity, Transfer Function s or z translation.

CTFS (6.1, 6.2)

  • Time domain: Continuous (t), Periodic (T = 1/Δf)
  • Frequency domain: Discrete (k, Δf = 1/T, f = k∙Δf), Aperiodic
  • Transfer function: s = j∙2π∙k/T or j∙2π∙k∙Δf

$$x(t) = \sum_{k=-\infty}^{\infty} c_x[k]\, e^{j 2\pi (k/T) t}$$

$$c_x[k] = \frac{1}{T} \int_{T} x(t)\, e^{-j 2\pi (k/T) t}\, dt$$

CTFT (6.3-6.6)

  • Time domain: Continuous (t), Aperiodic
  • Frequency domain: Continuous (f), Aperiodic
  • Transfer function: s = j∙2π∙f

$$x(t) = \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df$$

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$

DTFS

  • Time domain: Discrete (n, Δt = 1/BW, t = n∙Δt), Periodic (N, T = N∙Δt)
  • Frequency domain: Discrete (k, Δf = 1/T, f = k∙Δf), Periodic (N, BW = N∙Δf)
  • Transfer function: z = e^(j∙2π∙k/T) or e^(j∙2π∙k∙Δf)

$$x[n] = \sum_{k=0}^{N-1} c_x[k]\, e^{j 2\pi k n / N}$$

$$c_x[k] = \frac{1}{N} \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$$

DTFT

  • Time domain: Discrete (n, t = n∙Δt), Aperiodic
  • Frequency domain: Continuous (f), Periodic (BW = 1/Δt)
  • Transfer function: z = e^(j∙2π∙f)

$$x[n] = \int_{BW} X(f)\, e^{j 2\pi f\, n \Delta t}\, df$$

$$X(f) = \frac{1}{BW} \sum_{n=-\infty}^{\infty} x[n]\, e^{-j 2\pi f\, n \Delta t}$$

*Unless noted otherwise, Δt is assumed to be 1.
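The sampling relationships in the table (T = N∙Δt, Δf = 1/T, BW = N∙Δf = 1/Δt) can be checked numerically. The following is a minimal sketch in Python; the function name `grid_parameters` and the example values are illustrative assumptions, not part of the notes.

```python
def grid_parameters(N, dt=1.0):
    """Return (T, df, BW) for N samples spaced dt apart."""
    T = N * dt      # time-domain period: T = N * dt
    df = 1.0 / T    # frequency spacing: df = 1 / T
    BW = N * df     # frequency-domain period: BW = N * df = 1 / dt
    return T, df, BW

# Example: 8 samples taken every 0.5 time units.
T, df, BW = grid_parameters(N=8, dt=0.5)
print(T, df, BW)
```

With the default Δt = 1, the bandwidth BW comes out as 1, which is the normalization the footnote assumes.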

slide-3
SLIDE 3

Fourier Methods: DTFS variation (The DFT or FFT)

Table columns: Time Domain (continuous/discrete), Time Domain Periodicity, Transform Method (Tables), Frequency Domain (continuous/discrete), Frequency Domain Periodicity, Transfer Function s or z translation.

DTFS

  • Time domain: Discrete (n, Δt = 1/BW, t = n∙Δt), Periodic (N, T = N∙Δt)
  • Frequency domain: Discrete (k, Δf = 1/T, f = k∙Δf), Periodic (N, BW = N∙Δf)
  • Transfer function: z = e^(j∙2π∙k/T) or e^(j∙2π∙k∙Δf)

$$x[n] = \sum_{k=0}^{N-1} c_x[k]\, e^{j 2\pi k n / N}$$

$$c_x[k] = \frac{1}{N} \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$$

DFT (MATLAB: FFT and IFFT)

  • Time domain: Discrete (n, Δt = 1/BW, t = n∙Δt), Periodic (N, T = N∙Δt)
  • Frequency domain: Discrete (k, Δf = 1/T, f = k∙Δf), Periodic (N, BW = N∙Δf)
  • Transfer function: z = e^(j∙2π∙k/T) or e^(j∙2π∙k∙Δf)

IFFT: $$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi k n / N}$$

FFT: $$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$$

*Unless noted otherwise, Δt is assumed to be 1.
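The only difference between the DTFS pair and the DFT pair is where the 1/N factor sits: the DFT moves it to the inverse transform. The following is a minimal sketch of the direct O(N²) sums, written in Python rather than MATLAB; the function names `dft` and `idft` are illustrative assumptions.

```python
import cmath

def dft(x):
    """Forward DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Round trip: idft(dft(x)) recovers x up to floating-point error.
x = [1, 2, 3, 4]
X = dft(x)
x_back = idft(X)
```

For x = [1, 2, 3, 4], X[0] is the plain sum (10) and X[1] is -2 + 2j, which can be confirmed by hand from the FFT equation.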

slide-4
SLIDE 4

DFT equation

DFT (MATLAB: FFT and IFFT)

IFFT: $$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi k n / N}$$

FFT: $$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$$

slide-5
SLIDE 5

Goertzel Algorithm

FFT: $$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$$

  • Expand the sum, $W_N = e^{-j 2\pi / N}$

$$X[k] = W_N^{0k} x[0] + W_N^{1k} x[1] + W_N^{2k} x[2] + \dots + W_N^{(N-2)k} x[N-2] + W_N^{(N-1)k} x[N-1]$$

  • Multiply by $W_N^{-Nk} = 1$

$$X[k] = (W_N^{-Nk})(W_N^{0k} x[0] + W_N^{1k} x[1] + W_N^{2k} x[2] + \dots + W_N^{(N-2)k} x[N-2] + W_N^{(N-1)k} x[N-1])$$

$$X[k] = (W_N^{-Nk} x[0] + W_N^{-(N-1)k} x[1] + W_N^{-(N-2)k} x[2] + \dots + W_N^{-2k} x[N-2] + W_N^{-k} x[N-1])$$

  • Factor out $W_N^{-k}$ repeatedly

$$X[k] = (W_N^{-(N-1)k} x[0] + W_N^{-(N-2)k} x[1] + W_N^{-(N-3)k} x[2] + \dots + W_N^{-k} x[N-2] + x[N-1])\, W_N^{-k}$$

$$X[k] = (\dots((W_N^{-2k} x[0] + W_N^{-k} x[1] + x[2])\, W_N^{-k} + \dots + x[N-2])\, W_N^{-k} + x[N-1])\, W_N^{-k}$$

$$X[k] = ((\dots(((x[0])\, W_N^{-k} + x[1])\, W_N^{-k} + x[2])\, W_N^{-k} + \dots + x[N-2])\, W_N^{-k} + x[N-1])\, W_N^{-k}$$

  • Integrator multiplied by $W_N^{-k}$ every iteration.
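The nested form maps directly onto a loop: add the next sample to the accumulator, then multiply by $W_N^{-k}$. The following is a minimal sketch in Python of that recurrence for a single bin; the function name `dft_bin_recursive` is an illustrative assumption, and floating-point arithmetic stands in for the hardware number format.

```python
import cmath

def dft_bin_recursive(x, k):
    """Compute X[k] with the nested (Horner-style) recurrence.

    Each iteration performs acc = (acc + x[n]) * W_N^{-k},
    where W_N^{-k} = exp(+j*2*pi*k/N).
    """
    N = len(x)
    w = cmath.exp(2j * cmath.pi * k / N)  # W_N^{-k}
    acc = 0j                              # accumulator register, reset to zero
    for sample in x:
        acc = (acc + sample) * w
    return acc

# One bin of a 4-point DFT.
print(dft_bin_recursive([1, 2, 3, 4], 1))
```

Only one complex coefficient per bin is needed, which is why the structure suits a hardware accumulator: no table of N twiddle factors has to be addressed per sample.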

slide-6
SLIDE 6

DFT equation

$$X[k] = ((\dots(((x[0])\, W_N^{-k} + x[1])\, W_N^{-k} + x[2])\, W_N^{-k} + \dots + x[N-2])\, W_N^{-k} + x[N-1])\, W_N^{-k}$$

slide-7
SLIDE 7

Remember the Integrator Filter

  • Sample Domain Equation
  • 1st order IIR filter with a0 = 1;

y[n] = x[n] + y[n-1]

  • Z domain

$H(z) = 1/(1 - z^{-1})$

  • Pole at z = 1 (Critically Stable)

[Figure: integrator block diagram with a $z^{-1}$ delay element.]
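The sample-domain equation can be run directly. The following is a minimal sketch in Python, assuming the delay register starts at zero; the function name `integrator` is illustrative.

```python
def integrator(x):
    """First-order IIR integrator: y[n] = x[n] + y[n-1], with y[-1] = 0."""
    y = []
    prev = 0                   # contents of the z^-1 delay register
    for sample in x:
        prev = sample + prev   # y[n] = x[n] + y[n-1]
        y.append(prev)
    return y

print(integrator([1, 0, 0, 0]))  # impulse response
print(integrator([1, 1, 1, 1]))  # step response
```

The impulse response never decays and the step response grows without bound, which is the behavior expected from a pole sitting exactly at z = 1.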

slide-8
SLIDE 8

DFT equation

$$X[k] = ((\dots(((x[0])\, W_N^{-k} + x[1])\, W_N^{-k} + x[2])\, W_N^{-k} + \dots + x[N-2])\, W_N^{-k} + x[N-1])\, W_N^{-k}$$

[Figure: integrator with a $z^{-1}$ delay and a $W_N^{-k}$ multiplier.]
slide-9
SLIDE 9

DFT equation

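$$X[k] = ((\dots(((x[0])\, W_N^{-k} + x[1])\, W_N^{-k} + x[2])\, W_N^{-k} + \dots + x[N-2])\, W_N^{-k} + x[N-1])\, W_N^{-k}$$

[Figure: integrator with a $z^{-1}$ delay and a $W_N^{-k}$ multiplier. Labels: n-counter, wrap, rst.]

Adders: 1+2 = 3. Multipliers: 4.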

slide-10
SLIDE 10

8-Point DFT

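[Figure: five parallel $z^{-1}$ accumulator branches, one per feedback coefficient.]

  • Coefficient 1: X[0]
  • Coefficient 0.7071+j0.7071: X[1], conj(X[7])
  • Coefficient j1: X[2], conj(X[6])
  • Coefficient -0.7071+j0.7071: X[3], conj(X[5])
  • Coefficient -1: X[4]

Adders: 11. Multipliers: 14. Delays: 5.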
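The five coefficients are $W_8^{-k}$ for k = 0 to 4, and the remaining bins follow from conjugate symmetry when the input is real. The following is a minimal sketch in Python of that idea; the function names are illustrative assumptions, and the real-valued input is an assumption inferred from the conj() labels on the slide.

```python
import cmath

def feedback_coefficients(N):
    """W_N^{-k} = exp(+j*2*pi*k/N) for k = 0 .. N/2 (N even)."""
    return [cmath.exp(2j * cmath.pi * k / N) for k in range(N // 2 + 1)]

def real_dft_half(x):
    """Run one accumulator branch per coefficient; returns X[0] .. X[N/2]."""
    bins = []
    for w in feedback_coefficients(len(x)):
        acc = 0j
        for sample in x:
            acc = (acc + sample) * w   # integrator multiplied by W_N^{-k}
        bins.append(acc)
    return bins

def full_spectrum(half, N):
    """Fill in X[N/2+1] .. X[N-1] as conj(X[N-k]) for a real input."""
    return half + [half[N - k].conjugate() for k in range(N // 2 + 1, N)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = full_spectrum(real_dft_half(x), len(x))
```

For N = 8 this runs five branches instead of eight, matching the five delays counted on the slide.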

slide-11
SLIDE 11

N-Point DFT (even)

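[Figure: parallel $z^{-1}$ accumulator branches. Feedback coefficients: 1 for X[0], -1 for X[N/2], and $W_N$ twiddle factors with exponents 1, 2, and N/2-1 for the remaining branches. Output labels on the slide: X[0], X[N/2], X[1],conj(X[7]), X[2],conj(X[6]), X[3],conj(X[5]).]

In Parallel

  • Adders: 2+(N/2-1)*3.
  • Multipliers: 2+(N/2-1)*4.
  • Registers: N.
  • Latency: N.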

slide-12
SLIDE 12

N-Point Complex DFT (even)

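[Figure: N parallel $z^{-1}$ accumulator branches producing X[0], X[1], ..., X[N-1], with $W_N$ coefficients of exponent 0, 1, ..., N-1 (from ROM). Labels: n-counter, rst, ena, Data In Memory, Data Out Memory. The memory connections are marked "???".]

In Parallel

  • Adders: N*4.
  • Multipliers: N*4.
  • Registers: N.
  • Latency: N.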

slide-13
SLIDE 13

N-Point DFT (even)

[Figure: serial DFT datapath with a single $z^{-1}$ accumulator and a $W_N^{-k}$ multiplier. Labels: CORDIC, ROM, k-counter, n-counter, Data In Memory, concat I&Q, Dual Port Data Out Memory, rst, done.]

In Series

  • Adders: 3.
  • Multipliers: 4.
  • Registers: 2.
  • Latency: N*N/2.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.

slide-14
SLIDE 14

N-Point complex DFT (even)

[Figure: serial complex DFT datapath with a single $z^{-1}$ accumulator and a $W_N^{-k}$ multiplier. Labels: CORDIC, Cos, Sin, ROM, k-counter, n-counter, Concat I&Q, Data In Memory, concat I&Q, Data Out Memory, done, rst, ena, addr.]

In Series

  • Adders: 4.
  • Multipliers: 4.
  • Registers: 2.
  • Latency: N*N.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.

slide-15
SLIDE 15

Trade Offs

In Series

  • Adders: 4.
  • Multipliers: 4.
  • Registers: 2.
  • Latency: N*N.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.

In Parallel

  • Adders: N*4.
  • Multipliers: N*4.
  • Registers: N.
  • Latency: N.

Direct Trade: 1024x Resources, 1024x Faster.

slide-16
SLIDE 16

N-Point complex DFT (even)

[Figure: two accumulator datapaths in parallel, one with coefficient $W_N^{-k}$ and one with $W_N^{-(k+1)}$, each with a $z^{-1}$ register, concat I&Q, and its own Data Out Memory (rst, ena, addr). Shared labels: CORDIC, Cos, Sin, ROM, k-counter, n-counter, Data In Memory, done.]

Partially Parallel

  • Adders: 4*2.
  • Multipliers: 4*2.
  • Registers: 2*2.
  • Latency: N*N/2.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.
slide-17
SLIDE 17

Trade Offs

In Series

  • Adders: 4.
  • Multipliers: 4.
  • Registers: 2.
  • Latency: N*N.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.

In Parallel

  • Adders: N*4.
  • Multipliers: N*4.
  • Registers: N.
  • Latency: N.
  • Direct Trade: 1024x Resources, 1024x Faster.

Partially Parallel

  • Adders: 4*2.
  • Multipliers: 4*2.
  • Registers: 2*2.
  • Latency: N*N/2.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.

What level of parallelization should we use? Depends on:

  • 1. Number of resources.
  • 2. Types of resources (memory access is typically serial; this is the "???" problem above).
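The trade-off can be tabulated for any level of parallelization. The following is a minimal sketch in Python, assuming a complex-input DFT where P accumulator branches run side by side, each using 4 adders and 4 multipliers; the function name `dft_cost` and the parameter `parallel` are illustrative assumptions.

```python
def dft_cost(N, parallel=1, f_clk=100e6):
    """Resource and latency estimate for an N-point complex DFT.

    parallel = 1 is the fully serial case, parallel = N the fully
    parallel case. CORDIC and storage are excluded.
    """
    cycles = N * N // parallel
    return {
        "adders": 4 * parallel,
        "multipliers": 4 * parallel,
        "cycles": cycles,
        "latency_ms": cycles / f_clk * 1e3,
    }

for p in (1, 2, 1024):
    print(p, dft_cost(1024, parallel=p))
```

For N = 1024 the serial case takes 1,048,576 cycles (about 10.49 ms at 100 MHz), two branches halve that to about 5.24 ms, and the fully parallel case finishes in 1024 cycles.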

slide-18
SLIDE 18

Memory Resources (Series 7)

  • Dual Port 36 Kb.
  • Can't access more than 2 address values per cycle.
  • 2x Single Port 18 Kb.
  • Smallest Memory Segment.
  • Number Formats (single addr).
  • Concatenated Real and Imag.
  • 18-bit real and 18-bit imag #s.
  • Results in 2x 512 Complex Values.
  • For N-Value DFT.
  • Parallelization by N/512.
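The 512 figure follows from the word size: an 18 Kb block holds 18*1024 bits, and each complex value occupies 36 bits. The following is a minimal sketch in Python of the packing and of the N/512 rule; the function names are illustrative, and placing the real part in the upper 18 bits is an assumption, since the notes do not state the ordering.

```python
WORD_BITS = 18                      # bits per real or imaginary part
MASK = (1 << WORD_BITS) - 1

def pack_iq(re, im):
    """Concatenate two signed 18-bit values into one 36-bit word.

    Assumption: real part in the upper half, imaginary in the lower half.
    """
    return ((re & MASK) << WORD_BITS) | (im & MASK)

def unpack_iq(word):
    """Split a 36-bit word back into signed (re, im)."""
    def signed(v):
        return v - (1 << WORD_BITS) if v & (1 << (WORD_BITS - 1)) else v
    return signed((word >> WORD_BITS) & MASK), signed(word & MASK)

def parallel_factor(n_points, block_bits=18 * 1024):
    """Number of 18 Kb blocks (and parallel branches) for an N-value DFT."""
    values_per_block = block_bits // (2 * WORD_BITS)   # 512 complex values
    return max(1, n_points // values_per_block)

print(unpack_iq(pack_iq(-5, 7)), parallel_factor(1024), parallel_factor(2048))
```

A 1024-point DFT therefore spans two 18 Kb halves and a 2048-point DFT spans four.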
slide-19
SLIDE 19

1024-Point 18r:18i complex DFT

[Figure: two parallel $z^{-1}$ accumulator branches with coefficients $W_N^{-k}$ and $W_N^{-(k+1)}$. Labels: k-counter, n-counter, done, rst, wea with addr, web with addr+1.]

Partially Parallel

  • Adders: 4*2.
  • Multipliers: 4*2.
  • Registers: 2*2.
  • Latency: N*N/2.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.
slide-20
SLIDE 20

2048-Point 18r:18i complex DFT

[Figure: four parallel accumulator branches with coefficients $W_N^{-k}$, $W_N^{-(k+1)}$, $W_N^{-(k+2)}$, and $W_N^{-(k+3)}$. Labels: k-counter 0 to 512, n-counter 0 to 2048, done.]

Partially Parallel

  • Adders: 4*4.
  • Multipliers: 4*4.
  • Registers: 2*4.
  • Latency: N*N/4.
  • Excludes CORDIC and storage.
  • At 100 MHz, 2048 pt DFT in 2048*512/100e6 ≈ 10.49 ms.
slide-21
SLIDE 21

Implementing the accum and mult.

[Figure: serial complex DFT datapath with a single $z^{-1}$ accumulator and a $W_N^{-k}$ multiplier. Labels: CORDIC, Cos, Sin, ROM, k-counter, n-counter, Data In Memory, concat I&Q, Data Out Memory, done, rst, ena, addr.]

In Series

  • Adders: 4.
  • Multipliers: 4.
  • Registers: 2.
  • Latency: N*N.
  • Excludes CORDIC and storage.
  • At 100 MHz, 1024 pt DFT in 1024*1024/100e6 ≈ 10.49 ms.

slide-22
SLIDE 22

Implementing the accum and mult.

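[Figure: complex accumulate-and-multiply expanded into real arithmetic. Two $z^{-1}$ registers (real and imaginary) and four multipliers with coefficients $\mathrm{real}(W_N^{-k})$, $\mathrm{imag}(W_N^{-k})$, $-\mathrm{imag}(W_N^{-k})$, and $\mathrm{real}(W_N^{-k})$.]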
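Written out in real arithmetic, one iteration of the complex accumulator needs four real multiplies and four real adds, which matches the serial resource count of 4 adders and 4 multipliers. The following is a minimal sketch in Python; the function name `cmac_step` and its argument names are illustrative assumptions.

```python
def cmac_step(acc_re, acc_im, x_re, x_im, w_re, w_im):
    """One iteration of acc = (acc + x) * w using only real operations.

    w_re and w_im are real(W_N^{-k}) and imag(W_N^{-k}).
    """
    s_re = acc_re + x_re                 # adder 1
    s_im = acc_im + x_im                 # adder 2
    new_re = s_re * w_re - s_im * w_im   # multipliers 1, 2 and adder 3
    new_im = s_re * w_im + s_im * w_re   # multipliers 3, 4 and adder 4
    return new_re, new_im

# (1 + 2j + 3 + 4j) * (0.5 - 0.25j)
print(cmac_step(1, 2, 3, 4, 0.5, -0.25))
```

The two return values are what the real and imaginary $z^{-1}$ registers would hold after the clock edge.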
slide-23
SLIDE 23

The DSP48E1

slide-24
SLIDE 24

Complex Multiply with DSP48E1


We need two of these for real and imaginary parts. 4x DSP Slices

slide-25
SLIDE 25

Implementing the accum and mult.

We need two of these for real and imaginary parts. 2x DSP Slices

slide-26
SLIDE 26

Implementing the accum and mult.

Keeping track of the binary point (BP). The B-input is 18 bits (use BP=17).
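With BP = 17 on an 18-bit signed input, one bit is the sign and 17 bits are fractional, so coefficients cover the range -1 to just under +1. The following is a minimal sketch in Python of that bookkeeping; the function names and the choice to saturate rather than wrap are assumptions made for illustration.

```python
BP = 17        # binary point position of the 18-bit coefficient
WIDTH = 18     # width of the B-input

def to_fixed(value, bp=BP, width=WIDTH):
    """Quantize a real number to a signed integer with bp fractional bits."""
    raw = round(value * (1 << bp))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, raw))       # saturate (an assumption)

def fixed_mul(a_raw, a_bp, b_raw, b_bp=BP):
    """Integer multiply; the product's binary point is the sum of both."""
    return a_raw * b_raw, a_bp + b_bp

coeff = to_fixed(0.5)                  # 0.5 with 17 fractional bits
product, product_bp = fixed_mul(3, 0, coeff)
print(coeff, product, product_bp, product / (1 << product_bp))
```

Note that +1.0 itself does not fit and saturates to the largest positive code, which is one reason the binary point has to be tracked at every stage.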

slide-27
SLIDE 27

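[Figure: complex accumulate-and-multiply with coefficients $\mathrm{real}(W_N^{-k})$, $\mathrm{imag}(W_N^{-k})$, $-\mathrm{imag}(W_N^{-k})$, and $\mathrm{real}(W_N^{-k})$.]

Implementation with 6 DSP Slices.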

slide-28
SLIDE 28

Implementing the accum and mult.

48-bit accum is a bit excessive. Can be configured as 2x 24-bit adders using inputs (A:B) and C.
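Splitting the 48-bit adder into two 24-bit lanes means a carry out of the lower lane must not reach the upper lane. The following is a minimal sketch in Python that models this behavior on plain integers; the function name `simd_add_two24` is illustrative, and the model covers only the lane arithmetic, not the DSP48E1 configuration itself.

```python
LANE = 24
MASK = (1 << LANE) - 1

def simd_add_two24(ab, c):
    """Add two 48-bit words as two independent 24-bit lanes.

    ab models the concatenated A:B input and c the C input.
    Each lane wraps modulo 2**24; no carry crosses the lane boundary.
    """
    lo = ((ab & MASK) + (c & MASK)) & MASK
    hi = (((ab >> LANE) & MASK) + ((c >> LANE) & MASK)) & MASK
    return (hi << LANE) | lo

# Lower lane overflows (0xFFFFFF + 1) without disturbing the upper lane (1 + 2).
result = simd_add_two24((1 << LANE) | 0xFFFFFF, (2 << LANE) | 1)
print(result >> LANE, result & MASK)
```

In this arrangement one slice can serve as the adder for both the real and the imaginary accumulators.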

slide-29
SLIDE 29

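[Figure: complex accumulate-and-multiply with coefficients $\mathrm{real}(W_N^{-k})$, $\mathrm{imag}(W_N^{-k})$, $-\mathrm{imag}(W_N^{-k})$, and $\mathrm{real}(W_N^{-k})$.]

Implementation with 5 DSP slices.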

slide-30
SLIDE 30

1024-Point 18r:18i complex DFT

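[Figure: two parallel accumulator branches with coefficients $W_N^{-k}$ and $W_N^{-(k+1)}$. Labels: k-counter, n-counter, done, wea with addr, web with addr+1.]

Partially Parallel

  • 10 DSP Blocks.
  • 3x 36 Kb Block RAMs.
  • 2 counters.
  • Latency: N*N/2.
  • At 100 MHz, 1024 pt DFT in 1024*512/100e6 ≈ 5.24 ms.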
slide-31
SLIDE 31

Counter with DSP Block

  • DSP48E1 can be configured as a wide counter.
  • Internal accumulator feedback with delay register.
  • Step Size
  • Use 1-bit carryin for count-by-one.
  • Use A:B for variable step size.
  • Limit set with Parameter or C.

[Figure: counter built from an accumulator with a $z^{-1}$ delay register and an "=" comparison block.]

slide-32
SLIDE 32

1024-Point 18r:18i complex DFT

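[Figure: two parallel accumulator branches with coefficients $W_N^{-k}$ and $W_N^{-(k+1)}$, driven by a single counter. Labels: n_count[9:0], k_count[9:0], done.]

  • count[19:0] = k_count[9:0], n_count[9:0]
  • 6 DSP Blocks
  • 3 RAMs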
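A single 20-bit counter can replace the two separate counters because the lower 10 bits roll over once per pass through the input data and the upper 10 bits then advance. The following is a minimal sketch in Python of the bit split; the function name `split_count` is an illustrative assumption.

```python
def split_count(count):
    """Split count[19:0] into (k_count[9:0], n_count[9:0])."""
    n_count = count & 0x3FF           # lower 10 bits: sample index n
    k_count = (count >> 10) & 0x3FF   # upper 10 bits: bin index k
    return k_count, n_count

# n wraps from 1023 to 0 exactly when k increments.
print(split_count(1023), split_count(1024))
```

The same slicing applies at any count value, so no extra logic is needed to detect the n wrap.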