Hardware Acceleration of Pulsar Search on FPGAs using OpenCL

Overview and Task | FT Convolution Decomposition | High-level Techniques and Implementation | Evaluation | What’s Next
Hardware Acceleration of Pulsar Search on FPGAs using OpenCL
Oliver Sinnen, Haomiao Wang & Prabu Thiagaraj (Manchester Uni)
Parallel and Reconfigurable Computing
Department of Electrical and Computer Engineering
University of Auckland
Computing for SKA, 2017
Oliver Sinnen, Haomiao Wang & Prabu Thiagaraj (Manchester Uni) | FDAS on FPGA using OpenCL

Strong-field Test of Gravity using Pulsars
[Figures. Image credit: NASA; NASA/Tod Strohmayer (GSFC)/Dana Berry (Chandra X-Ray Observatory)]

Outline
1 Overview and Task
2 FT Convolution Decomposition
3 High-level Techniques and Implementation
4 Evaluation
5 What’s Next

Pulsar and Pulsar Search
- Observed radiation is a pulse
- Binary pulsar (Doppler effect)
- Acceleration search: 1) time-domain, 2) frequency-domain

Pulsar and Pulsar Search: Frequency-domain
Matched filtering in the Fourier domain recovers the signal into a single bin:
$$A_{r_0} \simeq \sum_{k=[r_0]-m/2}^{[r_0]+m/2} A_k \, A^{*}_{r_0-k},$$
where the frequency $r_0$ is unknown, so the summation is computed over a range of frequencies $r$.
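The summation above can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: it assumes integer trial bins (so that [r] = r), treats the conjugated factor as a precomputed template response T, and uses hypothetical names (`matched_filter_response`, `search`) and a made-up chirp-like template.

```python
import cmath
import math

def matched_filter_response(spectrum, template, r):
    """Sum over k = r-m/2 .. r+m/2 of A_k * conj(T_{r-k}) for an integer trial bin r.

    template[j + m//2] holds T_j for offsets j = -m/2 .. m/2 (an assumed layout).
    """
    half = (len(template) - 1) // 2
    total = 0j
    for k in range(r - half, r + half + 1):
        if 0 <= k < len(spectrum):  # bins outside the spectrum contribute nothing
            total += spectrum[k] * template[(r - k) + half].conjugate()
    return total

def search(spectrum, template):
    """Evaluate the sum at every trial bin r; return the best bin and all powers."""
    powers = [abs(matched_filter_response(spectrum, template, r)) ** 2
              for r in range(len(spectrum))]
    best_r = max(range(len(powers)), key=powers.__getitem__)
    return best_r, powers

# Hypothetical unit-energy template with m = 8 (9 taps).
m = 8
template = [cmath.exp(1j * math.pi * j * j / m) / math.sqrt(m + 1)
            for j in range(-m // 2, m // 2 + 1)]

# Synthetic spectrum: the signal is smeared over neighbouring bins around r_true
# exactly as the template predicts, i.e. A_k = T_{r_true - k}.
n_bins, r_true = 64, 20
spectrum = [0j] * n_bins
for j in range(-m // 2, m // 2 + 1):
    spectrum[r_true - j] = template[j + m // 2]

best_r, powers = search(spectrum, template)
```

With this synthetic input the smeared power is collected back into the single bin `r_true`, which is the effect the slide describes.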

Block Overview of Pulsar Search Engine
[Figure: block diagram of the pulsar search engine. Legend: Common, Single Pulse, Time-domain Acc, Freq-domain Acc.
Data and buffers: Beamformed Data (BFD), Filterbank Data Chunks (FDC), Flagged DB (FDB), Dedispersion Buffers (DB), Dedispersed Data Buffer (DDB), Periodicity Search Buffer (PSB), Filterbank Data for Selected SP Candidates (to SDP), Filterbank Data for Candidate (from SDP), Candidate Data.
Modules: Data Receptor (RCPT), RFI Mitigation (RFIM), Dedispersion Buffer Creator (DDBC), Dedispersion Transform (DDTR), Periodicity Search Buffer Creator (PSBC), Complex Fourier Transform (CXFT), Birdie Zapping (BRDZ), Dereddening (DRED), Single Pulse Detector (SPCT), Single Pulse Optimiser (SPOPT), Single Pulse Sifter (SPSIFT), Full Filterbank Buffer Creator (FFBC), Folding and Optimisation (FLDO), Candidate Data Output Streamer (CDOS), Time Domain Resampler (TDRT), Time Domain Candidate Optimisation (TDAO), Fourier Transform and Power Spectrum (PWFT), Inverse Complex Fourier Transform (iCXFT), Harmonic Summing (HRMS), Candidate Sifting (SIFT), Fourier Domain Acceleration Search (FDAS), Fourier Domain Candidate Optimisation (FDAO).]

Fourier-domain Acceleration Search (FDAS)
The FDAS module searches for (binary) pulsars with constant frequency derivatives in the frequency domain.
- Over 2,000 beams are formed at 4,096 channels/beam.
- Beam i signals are de-dispersed for 6,000 DMs in PSS Engine_i.
- 85 FIR filters; the maximum length is 421 taps.
[Figure: Beam 1 .. Beam N feed PSS Engine_i, which contains the single pulse search modules and either the time-domain acceleration search or the FDAS module. For each of DM 1 .. DM 6000: pre-processing (RFIM, DDTR, PSBC, CXFT, BRDZ, DRED), then the FDAS module (FT convolution module with FIR_1 .. FIR_85, followed by the harmonic-sum module), then post-processing.]

Specification of Task
Parameter   Description                               Value
B           # of beams                                1000 ~ 2000
DM          # of dispersion measure (DM) trials       6000
T_obs       Observation period                        540 s
t_limit     Time of executing one sample group        88 ms
N           # of complex samples per group            2^22
M           # of templates/filters                    85
K           Average template/filter length            > 200
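To give a feel for these numbers, the following back-of-the-envelope sketch estimates the arithmetic rate a direct time-domain evaluation of all filters would need for one sample group. It is an estimate under stated assumptions, not a figure from the slides: K is set to 200, which the table only gives as a lower bound, and one complex multiply-accumulate is counted as 8 real floating-point operations.

```python
N = 2 ** 22          # complex samples per group (from the table)
M = 85               # templates/filters (from the table)
K = 200              # assumed: lower bound on the average filter length (table says > 200)
T_LIMIT_S = 0.088    # time limit for one sample group (from the table)

# Assumed cost model: one complex multiply-accumulate =
# 4 real multiplies + 2 real add/subtracts (product) + 2 real adds (accumulate).
REAL_FLOPS_PER_COMPLEX_MAC = 8

def tdfir_complex_macs(n_samples, n_filters, filter_len):
    """Complex multiply-accumulates for direct FIR filtering of one sample group."""
    return n_samples * n_filters * filter_len

macs = tdfir_complex_macs(N, M, K)
flops = macs * REAL_FLOPS_PER_COMPLEX_MAC
required_tflops = flops / T_LIMIT_S / 1e12

print(f"{macs:,} complex MACs per group")
print(f"at least {required_tflops:.2f} TFLOP/s sustained to meet the limit")
```

Under these assumptions a single group already calls for several TFLOP/s of sustained single-precision throughput, before multiplying by the number of beams and DM trials.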

FT Convolution
- Complex floating-point operations
- Multiple long FIR filters
- Large input size
- Strict time limit
- Number of acceleration devices (CapEx)
- Energy consumption (OpEx)

Basic Element
Time-domain FIR Filter (TDFIR):
$$y_m[i] = \sum_{k=0}^{K-1} x_m[i-k] \, h_m[k], \quad \text{for } i = 0, 1, \ldots, N-1$$
Frequency-domain FIR Filter (FDFIR):
$$\mathcal{F}\{f * h\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{h\}$$
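The two basic elements can be compared with a small sketch. It is illustrative only: the function names are hypothetical, a single filter is used (the index m is dropped), samples before the start of the input are taken as zero, and a simple recursive radix-2 FFT stands in for whatever FFT an actual implementation would use.

```python
import cmath
import math

def tdfir(x, h):
    """y[i] = sum_{k=0}^{K-1} x[i-k] * h[k] for i = 0..N-1, with x[j] = 0 for j < 0."""
    return [sum(x[i - k] * h[k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for i in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * i / n) * odd[i]
        out[i] = even[i] + tw
        out[i + n // 2] = even[i] - tw
    return out

def fdfir(x, h):
    """Same output as tdfir, computed as F^{-1}{ F{x} . F{h} } on zero-padded data."""
    size = 1
    while size < len(x) + len(h) - 1:   # pad so circular convolution equals linear
        size *= 2
    X = fft(list(x) + [0j] * (size - len(x)))
    H = fft(list(h) + [0j] * (size - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v / size for v in y[:len(x)]]   # normalise and keep the first N outputs

# Made-up complex test data.
x = [1 + 1j, 2 - 1j, 0.5j, -1 + 0j, 3 + 2j]
h = [0.5 + 0j, -1j, 0.25 + 0.25j]
y_td = tdfir(x, h)
y_fd = fdfir(x, h)
max_diff = max(abs(a - b) for a, b in zip(y_td, y_fd))
```

Both routes give the same output up to rounding; the difference lies in cost, since the direct form needs about N·K complex multiplications while the transform form needs whole-array FFTs.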

Hardware Limitation
- Naïve time domain
  - DSP blocks: single precision floating-point (SPF) multiplications
  - $(A + iB) \times (C + iD) = (A \times C - B \times D) + i(A \times D + B \times C)$
- Naïve frequency domain
  - Off-chip (global) memory: off-chip memory bandwidth
  - RAM blocks: on-chip (local) memory size
  - 4 million elements = 32 MBytes
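Both limits can be checked with a few lines of arithmetic. The sketch below is illustrative: it assumes each complex element is stored as two 4-byte single precision values and reads "4 million elements" as 2^22, matching the sample group size given earlier.

```python
def complex_multiply(a, b, c, d):
    """(a + ib)(c + id): 4 real multiplies and 2 real add/subtracts."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag

BYTES_PER_SPF = 4                       # one single precision float
BYTES_PER_COMPLEX = 2 * BYTES_PER_SPF   # assumed layout: real part + imaginary part
N_ELEMENTS = 2 ** 22                    # assumed reading of "4 million elements"

footprint_mib = N_ELEMENTS * BYTES_PER_COMPLEX / 2 ** 20

product = complex_multiply(1.0, 2.0, 3.0, 4.0)   # (1 + 2i)(3 + 4i)
```

Each filter tap in the naïve time-domain form therefore costs four SPF multiplications, which is what ties the filter length to the DSP block count, while a whole-array transform has to hold 32 MBytes per buffer, far more than fits in on-chip RAM blocks.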

Decomposition Algorithms
- Overlap-add algorithm (OLA): split the coefficient array -> OLA-TD
- Overlap-save algorithm (OLS): split the input array -> OLS-FD
[Figure (a) OLA: the coefficients are split into groups C_1 .. C_N of length N_coef/N; the input data, zero-padded by N_coef/N - 1, is convolved with coefficient group i to give output data_i; output data_1 .. output data_N are added to form the output data.]
[Figure (b) OLS: the input data, preceded by N_coef - 1 zeros, is split into N small groups ID_1 .. ID_N; each ID_i is convolved with the FIR filter to give PD_i; the first N_coef - 1 elements of each PD_i are discarded, and PD_1 .. PD_N form the output data.]
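The two decompositions can be sketched as follows. This is a minimal illustration, not the kernels from the talk: the function names, group length and FFT size are hypothetical, and the TD and FD suffixes are read here as "time domain" and "frequency domain", which the slides do not spell out.

```python
import cmath
import math

def convolve(x, h):
    """Full linear convolution, length len(x) + len(h) - 1."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

def ola_split_coefficients(x, h, group_len):
    """Split h into groups, convolve x with each group in the time domain, shift and add."""
    y = [0j] * (len(x) + len(h) - 1)
    for start in range(0, len(h), group_len):
        partial = convolve(x, h[start:start + group_len])
        for i, v in enumerate(partial):
            y[start + i] += v   # a group starting at tap `start` is delayed by `start`
    return y[:len(x)]

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for i in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * i / n) * odd[i]
        out[i], out[i + n // 2] = even[i] + tw, even[i] - tw
    return out

def ols_split_input(x, h, fft_size):
    """Split x into overlapping blocks, filter each block via FFT, discard the overlap."""
    overlap = len(h) - 1
    step = fft_size - overlap                 # new input samples consumed per block
    H = fft(list(h) + [0j] * (fft_size - len(h)))
    padded = [0j] * overlap + list(x)         # leading zeros precede the first block
    y = []
    for start in range(0, len(x), step):
        block = padded[start:start + fft_size]
        block += [0j] * (fft_size - len(block))
        Y = fft([a * b for a, b in zip(fft(block), H)], inverse=True)
        y.extend(v / fft_size for v in Y[overlap:])   # drop the first len(h) - 1 outputs
    return y[:len(x)]

# Made-up data: 20 input samples, a 5-tap filter.
x = [complex(i % 5, (i * 3) % 7) for i in range(20)]
h = [1 + 0j, -0.5j, 0.25 + 0j, 0.5 + 0.5j, -1 + 0j]
reference = convolve(x, h)[:len(x)]
err_ola = max(abs(a - b) for a, b in zip(ola_split_coefficients(x, h, 2), reference))
err_ols = max(abs(a - b) for a, b in zip(ols_split_input(x, h, 8), reference))
```

Splitting the coefficients keeps each sub-filter short enough for a fixed multiplier budget, while splitting the input keeps each FFT block small enough for on-chip memory; both reproduce the undivided filter output.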

High-level Techniques
- Maxeler MaxCompiler: uses Java to develop for FPGAs (HPCC2016)
- Open Computing Language (OpenCL): for FPGAs (Intel FPGA cards), GPUs, and CPUs (FPT2016, best paper candidate)
[Figure: OpenCL platform. A host (Core 1 .. Core 4; memory: DDR3 and SSD) connects over PCIe to FPGA_0 .. FPGA_i. Each FPGA has a DDR controller & PHY to global memory (2GB DDR3 x 2), a global memory interconnect, kernel pipelines, and a local memory interconnect to block RAM (local memory).]

Kernel Structures–OLA
