
Case Study in 3D FFT


  1. OpenCL for FPGAs/HPC: Case Study in 3D FFT
     Ahmed Sanaullah, Martin Herbordt, Vipin Sachdeva
     Boston University, Silicon Therapeutics

  2. OpenCL for FPGAs/HPC: Case Study in 3D FFT (11/15/2017)
     What gives FPGAs high performance?
     ► Deep pipelines
     ► Block RAMs
     ► Flexible on-chip communication/networks
     ► High utilization

  3. OpenCL for FPGAs/HPC: Case Study in 3D FFT (11/16/2017)
     What gives FPGAs high performance?
     ► Deep pipelines
     ► Block RAMs
     ► Flexible on-chip communication/networks
     ► High utilization
     To sum it up … Application Specific Architecture

  4. What gives FPGAs high performance?
     ► Deep pipelines
     ► Block RAMs
     ► Flexible on-chip communication/networks
     ► High utilization
     To sum it up … Application Specific Architecture
     But creating these designs in HDL is very complex.
     How do we solve the programmability problem?

  5. IP Cores
     ► 3rd-party solutions
     ► Highly optimized
     ► Easy to use
     ► Reduce implementation time

  6. IP Cores
     ► 3rd-party solutions
     ► Highly optimized
     ► Easy to use
     ► Reduce implementation time
     But …
     ► Limited customizability
     ► Implementation specifics hidden to protect intellectual property

  7. IP Cores
     ► 3rd-party solutions
     ► Highly optimized
     ► Easy to use
     ► Reduce implementation time
     But …
     ► Limited customizability
     ► Implementation specifics hidden to protect intellectual property
     Which means … Pseudo Application Specific Architecture

  8. How about OpenCL?
     ► Develop the application in C99 and compile it to hardware
     ► Primitives and pragmas
       ► Further customize the hardware translation
       ► e.g. loop unrolling, compute unit replication, single/multiple work items
     Doesn't OpenCL generate a complete .aocx file?
     ► Compilation does not have to run to completion
     ► The generated HDL can be obtained from the kernel_system folder
     ► Isolate the required modules and integrate them into an existing design

  9. Case Study: 3D FFT
     [Figure: labels "1D FFT", "2D FFT", "3D FFT"; axes X dimension (width), Y dimension (height), Z dimension (depth)]
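
A 3D FFT is separable: it can be computed as three passes of 1D transforms, one pass per dimension. The sketch below illustrates that property only and is not the authors' hardware design. It assumes a tiny cube stored as `cube[z][y][x]` and uses a naive 1D DFT as a stand-in for whatever 1D engine is available; all function names are hypothetical.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) 1D DFT; a stand-in for any 1D FFT engine."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_3d_by_passes(cube):
    """Transform an n*n*n cube with 1D passes along X, then Y, then Z."""
    n = len(cube)
    out = [[list(row) for row in plane] for plane in cube]  # deep copy
    # Pass 1: X dimension (each row is already a contiguous 1D vector).
    for z in range(n):
        for y in range(n):
            out[z][y] = dft_1d(out[z][y])
    # Pass 2: Y dimension (gather a column, transform it, scatter it back).
    for z in range(n):
        for x in range(n):
            column = dft_1d([out[z][y][x] for y in range(n)])
            for y in range(n):
                out[z][y][x] = column[y]
    # Pass 3: Z dimension.
    for y in range(n):
        for x in range(n):
            pillar = dft_1d([out[z][y][x] for z in range(n)])
            for z in range(n):
                out[z][y][x] = pillar[z]
    return out


def dft_3d_direct(cube):
    """Direct triple-sum 3D DFT, used only as a reference for comparison."""
    n = len(cube)

    def w(a, b):
        return cmath.exp(-2j * cmath.pi * a * b / n)

    return [
        [
            [
                sum(
                    cube[z][y][x] * w(x, kx) * w(y, ky) * w(z, kz)
                    for z in range(n) for y in range(n) for x in range(n)
                )
                for kx in range(n)
            ]
            for ky in range(n)
        ]
        for kz in range(n)
    ]
```

The gather and scatter steps in passes 2 and 3 are where a hardware design needs a transpose between passes, which is what the memory architecture slides address.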

  10. 3D FFT Compute Units
      [Figure: two compute-unit designs. OpenCL Radix-2: labels "1D Vector", "Stage 1", "Stage 2", …, "Stage log(N)", "1D Vector". IP Core Radix-4/2: labels "1D Vector", "Individual Complex Values", "FFT IP Core 1", "FFT IP Core 2", "FFT IP Core 3", "FFT IP Core 4", …, "FFT IP Core N".]
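
The "Stage 1 … Stage log(N)" labels match the structure of a textbook radix-2 FFT, in which an N-point vector passes through log2(N) butterfly stages. The following is a minimal software sketch of that staging (iterative decimation in time), not the kernel from the talk; `fft_radix2` and `bit_reverse` are hypothetical names.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(values):
    """Iterative radix-2 FFT: log2(n) stages of butterflies over one vector."""
    n = len(values)
    stages = n.bit_length() - 1
    if n == 0 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    # Reorder the input so every stage can work in place.
    data = [values[bit_reverse(i, stages)] for i in range(n)]
    for stage in range(1, stages + 1):
        span = 1 << stage   # butterfly group size at this stage
        half = span >> 1
        for start in range(0, n, span):
            for k in range(half):
                twiddle = cmath.exp(-2j * cmath.pi * k / span)
                lo = data[start + k]
                hi = data[start + k + half] * twiddle
                data[start + k] = lo + hi
                data[start + k + half] = lo - hi
    return data
```

In a pipelined hardware version each pass of the outer `stage` loop would typically become its own pipeline stage, so a full vector can enter every cycle.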

  11. FPGA: Altera Arria 10-X115
      ► 427K ALMs
      ► 1518 DSP blocks
      ► 53 Mb BRAM
      FFT Size: 64³
      Throughput Constraint: 64
      ► A mix of ALMs and DSPs is used for the FFT IP cores
        ► Insufficient DSP resources
        ► DSPs preferred over ALMs

  12. Resource and Performance Comparison
      • OpenCL FFT has:
        ► ≈ 10x lower ALM usage
        ► ≈ 25x lower on-chip memory usage
        ► ≈ 2x higher frequency
      ► OpenCL FFT can meet the required throughput using DSPs only

  13. Conclusion
      ► OpenCL-based designs can perform better than IP-core-based ones
        ► For a 64³ FFT
      ► FFT IP cores are constrained to a specific computational flow
        ► May not be optimal for all FFT sizes
      ► OpenCL enables more application-specific designs
        ► With less effort than HDL programming

  14. Memory Architecture
      ► Ping-pong Primary Memory buffers
      ► Primary Memory Bank: O(N²) complexity (single read, single write)
      ► Secondary Memory Bank: O(N) complexity (single read, parallel write)
      ► Transpose
        ► Outputs of Compute Unit write to the same Secondary Memory Bank
        ► Secondary Memory Banks write to Primary Memory Banks
        ► New writes to Secondary Memory Bank every N cycles

  15. Can this design source and sink data stall-free?
      $\mathrm{Index}_{3D} = \mathrm{Buffer\#} \times N^2 + \mathrm{Offset} \times N + \mathrm{Loc}$
      IP Core
               Loc  Offset  Buffer#
        FFTx   X    Y       Z
        FFTy   Y    Z       X
        FFTz   Z    X       Y
      ► Buffer# varies for a given cycle
      ► Loc changes every cycle
      ► Offset changes every N cycles
      ► Buffer# → Offset for next FFT dimension

  16. Can this design source and sink data stall-free?
      $\mathrm{Index}_{3D} = \mathrm{Buffer\#} \times N^2 + \mathrm{Offset} \times N + \mathrm{Loc}$
      IP Core
               Loc  Offset  Buffer#
        FFTx   X    Y       Z
        FFTy   Y    Z       X
        FFTz   Z    X       Y
      ► Buffer# varies for a given cycle
      ► Loc changes every cycle
      ► Offset changes every N cycles
      ► Buffer# → Offset for next FFT dimension
      OpenCL Radix-2
               Loc  Offset  Buffer#
        FFTx   Y    Z       X
        FFTy   Z    X       Y
        FFTz   X    Y       Z
      ► Only difference is in initial data locations
      ► Hence, no stalls
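
The address formula and the two role tables on this slide can be checked with a short sketch. It assumes a small cube edge (`N = 4` instead of the talk's 64) and hypothetical names (`index_3d`, `address`, `rotates_buffer_to_offset`); the role assignments are transcribed from the tables, and nothing here models actual memory timing.

```python
N = 4  # illustrative cube edge; the talk uses N = 64

# Which cube axis plays each role in each FFT pass (transcribed from the slide).
IP_CORE_ROLES = {
    "FFTx": {"loc": "X", "offset": "Y", "buffer": "Z"},
    "FFTy": {"loc": "Y", "offset": "Z", "buffer": "X"},
    "FFTz": {"loc": "Z", "offset": "X", "buffer": "Y"},
}
OPENCL_ROLES = {
    "FFTx": {"loc": "Y", "offset": "Z", "buffer": "X"},
    "FFTy": {"loc": "Z", "offset": "X", "buffer": "Y"},
    "FFTz": {"loc": "X", "offset": "Y", "buffer": "Z"},
}


def index_3d(buffer_id, offset, loc, n=N):
    """Index_3D = Buffer# * N^2 + Offset * N + Loc."""
    return buffer_id * n * n + offset * n + loc


def address(coords, roles, n=N):
    """Linear address of the point coords = {"X": x, "Y": y, "Z": z} in one pass."""
    return index_3d(
        coords[roles["buffer"]], coords[roles["offset"]], coords[roles["loc"]], n
    )


def rotates_buffer_to_offset(table):
    """Check the slide's rule: Buffer# of one pass is Offset of the next pass."""
    order = ["FFTx", "FFTy", "FFTz"]
    return all(
        table[cur]["buffer"] == table[nxt]["offset"]
        for cur, nxt in zip(order, order[1:])
    )
```

Both tables satisfy the same rotation rule and differ only in which axis starts in which role, which is consistent with the slide's claim that the OpenCL design differs only in initial data locations.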
