A Systolic FFT Architecture for Real Time FPGA Systems
Preston Jackson, Cy Chan, Charles Rader, Jonathan Scalera, and Michael Vai
HPEC 2004, 29 September 2004
This work was sponsored by DARPA ATO under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
MIT Lincoln Laboratory  Systolic Architecture-1  PAJ 9/29/2004

Outline
• Introduction
  – Motivation
  – Evaluation metrics
• Parallel architecture
• Systolic architecture
• Performance summary
• Conclusions

Radar Processing Application
[Figure: block diagram of the correlator. Two 1.2 GSPS ADCs supply the signals x (32K) and y (8K); the processing chain includes I/Q conversion, FIFOs, FFTs, a conjugate, multipliers, and adders.]
Correlation: $\mathrm{Corr}[m] = \sum_n x[n]\, y^*[n-m]$
The 8K FFT is the bottleneck:
• Real-time
• Complex
• 0.6 GSPS input (16-bits)
• 1.2 GSPS output (12-bits)
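The diagram computes the correlation in the frequency domain (FFT, conjugate, multiply). The sketch below checks that identity on a small circular example: for length-N sequences, the circular correlation Corr[m] = Σ_n x[n] y*[(n − m) mod N] equals the inverse DFT of X[k]·conj(Y[k]). It uses a direct O(N²) DFT for clarity; the function names are hypothetical, and the FIFO handling of the 32K and 8K streams is not modeled.

```python
import cmath

def dft(seq, inverse=False):
    """Direct O(N^2) DFT; inverse=True applies the 1/N-scaled inverse transform."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def corr_direct(x, y):
    """Circular correlation: Corr[m] = sum_n x[n] * conj(y[(n - m) mod N])."""
    n = len(x)
    return [sum(x[t] * complex(y[(t - m) % n]).conjugate() for t in range(n))
            for m in range(n)]

def corr_via_fft(x, y):
    """The same correlation computed in the frequency domain: IDFT(X * conj(Y))."""
    X, Y = dft(x), dft(y)
    return dft([a * b.conjugate() for a, b in zip(X, Y)], inverse=True)

if __name__ == "__main__":
    x = [1 + 2j, -1j, 3, 0.5 - 1j, 2j, -2, 1, 0]
    y = [0.5, 1j, -1, 2 - 1j, 0, 1 + 1j, -0.5j, 3]
    err = max(abs(a - b) for a, b in zip(corr_direct(x, y), corr_via_fft(x, y)))
    print(f"max difference: {err:.2e}")
```

Conjugating one spectrum before the multiply is what turns a frequency-domain convolution into a correlation.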

Evaluation Scorecard
• The design changes will be scored based on the following metrics:
  – Size: length of FFT
  – Pins: IO pins
  – Fly: butterflies
  – Mult: multipliers
  – Add: adder/subtractors
  – Shift: shift registers
  – ∆: each metric relative to the previous design
Scorecard   Size 16   Size 8192   ∆
Pins              ?           ?   ?
Fly               ?           ?   ?
Mult              ?           ?   ?
Add               ?           ?   ?
Shift             ?           ?   ?

Outline
• Introduction
• Parallel architecture
  – Data flow graph
  – Effects of serial input
• Systolic architecture
• Performance summary
• Conclusions

Baseline Parallel Architecture
[Figure: 16-point parallel FFT data flow graph, inputs 1 to 16 passing through four butterfly stages]
Parallel FFT
• Butterfly structure
• Removes redundant calculation
Scorecard   Size 16   Size 8192   ∆
Pins            448        229K
Fly              32         53K
Mult              -           -
Add               -           -
Shift             0           0
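To make the butterfly count concrete, the sketch below enumerates the butterfly pairs of a radix-2 flow graph stage by stage. It assumes a decimation-in-frequency indexing in which the first stage pairs indices N/2 apart and each later stage halves the span; the slides do not state which ordering the figure uses, and the function names are hypothetical. The total is (N/2)·log2 N butterflies, which reproduces the scorecard's 32 for N = 16 and 53,248 (about 53K) for N = 8192.

```python
def butterfly_pairs(n):
    """Return, per stage, the (top, bottom) index pairs of a radix-2 DIF flow graph."""
    assert n > 1 and n & (n - 1) == 0, "n must be a power of two"
    stages = []
    span = n // 2
    while span >= 1:
        pairs = []
        for block in range(0, n, 2 * span):
            for i in range(block, block + span):
                pairs.append((i, i + span))
        stages.append(pairs)
        span //= 2
    return stages

def butterfly_count(n):
    """Total butterflies in a fully parallel radix-2 FFT: (n / 2) * log2(n)."""
    return sum(len(stage) for stage in butterfly_pairs(n))

if __name__ == "__main__":
    for s, pairs in enumerate(butterfly_pairs(16), start=1):
        print(f"stage {s}: {pairs}")
    print(butterfly_count(16), butterfly_count(8192))
```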

Complex Butterfly
• Butterfly contains
  – 1 complex addition
  – 1 complex subtraction
  – 1 complex, constant multiply
$u = x + W_N^r\, y, \qquad v = x - W_N^r\, y$
Scorecard   Size 16   Size 8192   ∆
Pins            448        229K
Fly              32         53K
Mult              -           -
Add               -           -
Shift             0           0
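The butterfly equations can be exercised directly. The sketch below implements u = x + W y, v = x − W y and builds a recursive radix-2 decimation-in-time FFT from it, then checks the result against a direct DFT. The recursive form is chosen for brevity and is not the hardware data flow; `butterfly`, `fft`, and `dft` are hypothetical names.

```python
import cmath

def butterfly(x, y, w):
    """Radix-2 butterfly: one complex multiply, one complex add, one complex subtract."""
    t = w * y
    return x + t, x - t

def fft(seq):
    """Recursive radix-2 decimation-in-time FFT; len(seq) must be a power of two."""
    n = len(seq)
    if n == 1:
        return [complex(seq[0])]
    even, odd = fft(seq[0::2]), fft(seq[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out

def dft(seq):
    """Direct O(N^2) DFT used as a reference."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

if __name__ == "__main__":
    data = [complex(i % 5, -i % 3) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(fft(data), dft(data)))
    print(f"16-point FFT vs direct DFT, max difference: {err:.2e}")
```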

Complex Addition
• Complex addition adds the real and imaginary parts separately:
$(a + jb) + (c + jd) = (a + c) + j(b + d)$
• 2 adds
Scorecard   Size 16   Size 8192   ∆
Pins            448        229K
Fly              32         53K
Mult              -           -
Add             128        213K
Shift             0           0

Complex Multiply
• The FOIL method of multiplying complex numbers:
$(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$
• 4 multiplies and 2 adds
Scorecard   Size 16   Size 8192   ∆
Pins            448        229K
Fly              32         53K
Mult            128        213K
Add             192        320K
Shift             0           0

Efficient Complex Multiply
• Another approach requires fewer multiplies:
$ad + bc = c(a + b) - a(c - d)$
$ac - bd = d(a - b) + a(c - d)$
• 3 multiplies and 5 adds
Scorecard   Size 16   Size 8192   ∆
Pins            448        229K
Fly              32         53K
Mult             96        159K   75%
Add             288        480K   150%
Shift             0           0
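The three-multiply identity can be checked numerically. The sketch below forms the product from the three real multiplies d(a − b), c(a + b), and the shared term a(c − d), and compares it with the FOIL result; the comments tally the operations to show the 3-multiply, 5-add cost. The function names are hypothetical.

```python
def foil_multiply(a, b, c, d):
    """(a + jb)(c + jd) with 4 real multiplies and 2 real adds."""
    return a * c - b * d, a * d + b * c

def three_mult_multiply(a, b, c, d):
    """(a + jb)(c + jd) with 3 real multiplies and 5 real adds/subtracts."""
    shared = a * (c - d)          # add 1, multiply 1 (used by both outputs)
    real = d * (a - b) + shared   # adds 2 and 3, multiply 2
    imag = c * (a + b) - shared   # adds 4 and 5, multiply 3
    return real, imag

if __name__ == "__main__":
    for a, b, c, d in [(3, -2, 5, 7), (1, 1, 1, 1), (-4, 6, 0, 9)]:
        assert three_mult_multiply(a, b, c, d) == foil_multiply(a, b, c, d)
    print(three_mult_multiply(3, -2, 5, 7))
```

Integer inputs keep the comparison exact; with floating point the two forms can differ in the last bits.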

Parallel-Pipelined Architecture
[Figure: pipelined 16-point parallel FFT, inputs 1 to 16]
A pipelined version
• IO bound
• 100% efficient
Scorecard   Size 16   Size 8192   ∆
Pins            448        229K
Fly              32         53K
Mult             96        159K
Add             288        480K
Shift             0           0

Serial Input
[Figure: serial-input version of the 16-point parallel FFT, inputs 1 to 16]
A serial version
• IO rate matches the A/D
• 6.25% efficient
Scorecard   Size 16   Size 8192   ∆
Pins             28          28   .01%
Fly              32         53K
Mult             96        159K
Add             288        480K
Shift             0           0

Outline
• Introduction
• Parallel architecture
• Systolic architecture
  – Serial implementation
  – Application-specific optimizations
• Performance summary
• Conclusions

Serial Architecture
• The parallel architecture can be collapsed
  – One butterfly per stage
  – Consumes 1 sample per cycle
  – Same latency and throughput
  – More efficient design
[Figure: four-stage serial pipeline, Stage 1 to Stage 4]
50% efficiency
Scorecard   Size 16   Size 8192   ∆
Pins             28          28
Fly               4          13   .03%
Mult             12          39   .03%
Add              36         117   .03%
Shift            22         12K
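The Fly, Mult, and Add rows follow from a simple per-butterfly cost model: the three-multiply complex multiply (3 multipliers, 5 adders) plus one complex add and one complex subtract (2 adders each), for 3 multipliers and 9 adder/subtractors per butterfly. A fully parallel radix-2 FFT has (N/2)·log2 N butterflies, while the serial architecture has one per stage, log2 N. The sketch below uses hypothetical names and does not model pins or shift registers; note that the slides round the parallel totals (479,232 adders appear as 480K).

```python
MULTS_PER_BUTTERFLY = 3   # three-multiply complex multiply
ADDS_PER_BUTTERFLY = 9    # 5 inside the complex multiply + 2 each for the complex add and subtract

def resources(n, serial):
    """Butterfly, multiplier, and adder counts for a radix-2 FFT of length n."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    fly = stages if serial else (n // 2) * stages
    return {"Fly": fly,
            "Mult": MULTS_PER_BUTTERFLY * fly,
            "Add": ADDS_PER_BUTTERFLY * fly}

if __name__ == "__main__":
    for n in (16, 8192):
        print(n, "parallel:", resources(n, serial=False), "serial:", resources(n, serial=True))
```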

High Level View
• Replace complex structure with an abstract cell which contains:
  – FIFOs
  – Butterfly
  – Switch network
[Figure: four abstract cells in series, Stage 1 to Stage 4]
Scorecard   Size 16   Size 8192   ∆
Pins             28          28
Fly               4          13
Mult             12          39
Add              36         117
Shift            22         12K
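One well-known way to build such a cell is the radix-2 single-path delay-feedback (SDF) stage, in which stage s holds a FIFO of N/2^(s+1) samples and a two-way switch alternates between storing inputs and emitting butterfly results. The sketch below simulates that structure one sample per cycle and checks it against a direct DFT. It is a stand-in chosen for illustration: the cells on the slide may arrange their FIFOs and switch network differently, and all names here are hypothetical.

```python
import cmath
from collections import deque

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def sdf_stage(stream, n, s):
    """One radix-2 DIF SDF stage: FIFO of depth d, one butterfly, and a two-way switch."""
    d = n >> (s + 1)                               # FIFO depth for this stage
    fifo = deque([0j] * d)
    out = []
    for t, x in enumerate(stream + [0j] * d):      # d extra cycles flush the FIFO
        phase = t % (2 * d)
        if phase < d:                              # switch position 1: store input, emit stored difference
            out.append(fifo.popleft())
            fifo.append(x)
        else:                                      # switch position 2: butterfly on (stored, current)
            a = fifo.popleft()
            w = cmath.exp(-2j * cmath.pi * (phase - d) * (1 << s) / n)
            out.append(a + x)
            fifo.append((a - x) * w)
    return out[d:]                                 # drop the d-cycle start-up latency

def sdf_fft(samples):
    """Stream samples through log2(n) SDF stages, then undo the bit-reversed output order."""
    n = len(samples)
    bits = n.bit_length() - 1
    stream = [complex(v) for v in samples]
    for s in range(bits):
        stream = sdf_stage(stream, n, s)
    return [stream[bit_reverse(k, bits)] for k in range(n)]

def direct_dft(seq):
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

if __name__ == "__main__":
    data = [complex((3 * i) % 7 - 3, (5 * i) % 4) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(sdf_fft(data), direct_dft(data)))
    print(f"SDF pipeline vs direct DFT, max difference: {err:.2e}")
```

In this model each stage contributes d cycles of latency, and only one butterfly per stage is ever instantiated, which is the point of collapsing the parallel graph.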

8192-Point Architecture
• Requires 13 stages
• Fixed-point arithmetic
• Varies the dynamic range to increase accuracy
• Overflow replaced with saturated value
[Figure: stages 1 to 13 annotated with 18-bit fixed-point formats ranging from 4 integer + 14 fractional bits to 10 integer + 8 fractional bits; example 4.4 value 0110.0101]
• Multipliers limit design to 18 bits and 150 MHz
• Achieves 70 dB of accuracy
Scorecard   Size 16   Size 8192   ∆
Pins             28          28
Fly               4          13
Mult             12          39
Add              36         117
Shift            22         12K
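The fixed-point formats can be illustrated with a small quantizer. The sketch below assumes a two's-complement word split into integer bits (including the sign) and fractional bits, rounds with Python's `round()` (ties to even), and saturates instead of wrapping on overflow. The sign convention and rounding mode are assumptions, since the slides do not specify them; the function names are hypothetical. Under these assumptions the slide's `0110.0101` example is the 4.4 value 6.3125.

```python
def quantize(value, int_bits, frac_bits):
    """Round value to a signed fixed-point grid, saturating on overflow."""
    total = int_bits + frac_bits
    scale = 1 << frac_bits
    lo, hi = -(1 << (total - 1)), (1 << (total - 1)) - 1   # two's-complement code range
    code = round(value * scale)
    code = max(lo, min(hi, code))                           # saturate rather than wrap
    return code / scale

def to_bits(value, int_bits, frac_bits):
    """Two's-complement bit string with a binary point, e.g. '0110.0101'."""
    total = int_bits + frac_bits
    code = round(quantize(value, int_bits, frac_bits) * (1 << frac_bits))
    bits = format(code & ((1 << total) - 1), f"0{total}b")
    return bits[:int_bits] + "." + bits[int_bits:]

if __name__ == "__main__":
    print(to_bits(6.3125, 4, 4))      # the 4.4 example from the slide
    print(quantize(100.0, 4, 14))     # saturates at the largest 4.14 value
    print(quantize(100.0, 10, 8))     # fits once the format has more integer bits
```

Shifting bits from the fraction to the integer part stage by stage trades resolution for headroom as the butterfly sums grow.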

Increase Parallelism
• Add more pipelines
• Design limited to 150 MHz by multipliers
• I/Q module generates 600 MSPS
• Meets real-time requirement through parallelism
[Figure: four parallel 13-stage pipelines]
Scorecard   Size 16   Size 8192   ∆
Pins            112         112   400%
Fly              16          52   400%
Mult             48         156   400%
Add             144         468   400%
Shift            16         12K   100%
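The parallelism argument is a ceiling division: with the multipliers limiting each pipeline to 150 MHz at one sample per cycle, matching the 600 MSPS from the I/Q module takes ⌈600 / 150⌉ = 4 pipelines. The minimal sketch below, with hypothetical names, also scales the single-pipeline 8192-point counts by that factor, which matches the 400% rows; shift registers are left out because the slide does not scale them by 4.

```python
def pipelines_needed(sample_rate_msps, clock_mhz, samples_per_cycle=1):
    """Smallest number of pipelines whose combined rate meets the required sample rate."""
    per_pipeline = clock_mhz * samples_per_cycle
    return -(-sample_rate_msps // per_pipeline)   # integer ceiling division

def scale(counts, pipelines):
    """Replicate per-pipeline resource counts across all pipelines."""
    return {name: pipelines * value for name, value in counts.items()}

if __name__ == "__main__":
    p = pipelines_needed(600, 150)
    serial_8192 = {"Pins": 28, "Fly": 13, "Mult": 39, "Add": 117}
    print(p, scale(serial_8192, p))
```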

Simplification
Target application allows a specific simplification
• Pads a 4096-point sequence with 4096 zeros
• Removes 1st stage multipliers and adders
• Achieves 100% efficiency in steady state
[Figure: four parallel 13-stage pipelines]
Scorecard   Size 16   Size 8192   ∆
Pins            160         160   143%
Fly              16          52
Mult             36         144   92%
Add             108         432   92%
Shift             4          8K   67%
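Why the padding removes the first stage can be seen in a radix-2 decimation-in-time FFT, an ordering assumed here because the slides do not name one. Its first stage pairs x[n] with x[n + N/2] using a twiddle factor of 1, so when the upper half is zero both butterfly outputs equal x[n] and the stage reduces to copying. The sketch below, with hypothetical names, replaces stage 1 with copies for padded input and checks that the transform is unchanged.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def dit_fft(x, upper_half_zero=False):
    """Iterative radix-2 DIT FFT. With upper_half_zero=True, stage 1 becomes copies."""
    n = len(x)
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    if upper_half_zero:
        assert all(v == 0 for v in x[n // 2:]), "upper half must be zero"
        # Stage 1 pairs a[i] = x[j] with a[i + 1] = x[j + n/2] = 0 and W = 1,
        # so both outputs equal x[j]: no multiplier and no adder is needed.
        for i in range(0, n, 2):
            a[i + 1] = a[i]
        size = 4
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)   # twiddle W_size^k
                u, t = a[start + k], w * a[start + k + half]
                a[start + k], a[start + k + half] = u + t, u - t
        size *= 2
    return a

if __name__ == "__main__":
    padded = [complex(i, -i) for i in range(8)] + [0j] * 8
    err = max(abs(p - q) for p, q in zip(dit_fft(padded), dit_fft(padded, upper_half_zero=True)))
    print(f"full vs first-stage-skipped FFT, max difference: {err:.2e}")
```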

Outline
• Introduction
• Parallel architecture
• Systolic architecture
• Performance summary
  – Power, operations per second
  – FPGA resources, frequency
  – Latency, throughput
• Conclusions

Results
The current implementation has been placed on a Virtex-II 8000 and verified at 150 MHz.
• Power: 22 Watts @ 65 °C
• GOPS: 86 total @ 3.9 GOPS/Watt
• FPGA resources (XC2V8000)
  – Multipliers: 144 (85%)
  – LUTs and SRLs: 39,453 (42%)
  – BlockRAM: 56 (33%)
  – Flip-flops: 35,861 (38%)
• Frequency: 150 MHz
• Latency: 1127 cycles
• Throughput: 1.2 GSPS
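The headline figures can be cross-checked with simple arithmetic. The sketch below assumes the XC2V8000 capacity of 168 multipliers, 168 block RAMs, and 93,184 LUTs and flip-flops (46,592 slices); these totals come from the Virtex-II data sheet, not from the slides. Percentages are truncated, which matches the slide. It also assumes, as our own reading, that GOPS was counted as (multipliers + adders) × clock using the simplified design's 144 multipliers and 432 adders, which gives 86.4 GOPS. Names are hypothetical.

```python
# Assumed device totals for the XC2V8000 (Virtex-II data sheet), not from the slides.
XC2V8000 = {"Multipliers": 168, "BlockRAM": 168, "LUTs and SRLs": 93_184, "Flip-flops": 93_184}
USED = {"Multipliers": 144, "BlockRAM": 56, "LUTs and SRLs": 39_453, "Flip-flops": 35_861}

def utilization(used, capacity):
    """Percent of each resource in use, truncated to an integer."""
    return {name: int(100 * used[name] / capacity[name]) for name in used}

def gops(multipliers, adders, clock_hz):
    """Operation rate if every multiplier and adder does one operation per cycle."""
    return (multipliers + adders) * clock_hz / 1e9

if __name__ == "__main__":
    print(utilization(USED, XC2V8000))
    total = gops(144, 432, 150e6)
    print(f"{total:.1f} GOPS, {total / 22:.2f} GOPS/W at 22 W")
```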

Outline
• Introduction
• Parallel architecture
• Systolic architecture
• Performance summary
• Conclusions
  – Applicability to other platforms
  – Future work

Conclusions
• Created a high-performance, real-time FFT core
  – Low power (3.9 GOPS/Watt)
  – High throughput (1.2 GSPS), low latency (7.6 µsec/sample)
  – Fixed-point (18-bit), high accuracy (70 dB)
• General architecture
  – Extendable to a generic FPGA core
  – Retargetable to ASIC technology
• Future work
  – Develop a parameterizable IP core generator
