
FPGA vs GPU Performance Comparison on the Implementation of FIR Filters

Abstract

FIR filters find place in digital signal processing applications that require stopping one frequency band while passing another, or removing noise. Due to the complex structure and inherent parallelism of FIR filters, dedicated reconfigurable hardware is preferred for implementation rather than CPUs. Recently, GPGPU has emerged as an effective technique for solving computation-intensive problems with massive levels of parallelism. In this paper, we take the FIR filtering application with different tap sizes and implement it on different FPGA and GPU models using both the OpenCL and CUDA platforms. We evaluate FIR filter performance using two different kernels on GPU and compare it with various FPGA implementations, taking an OpenMP implementation that utilizes all available cores of a single CPU as the baseline performance point. In general, FPGA outperformed GPU in terms of output samples produced per second, but GPU is a life saver when very high order filters are needed, where FPGAs cannot help due to their inadequate logic resources.

Keywords

FIR Filter, GPGPU, FPGA, heterogeneous computing

1. Introduction

FIR (finite impulse response) filters are the most common digital filters used in signal processing applications due to their linear phase response and unconditionally stable characteristics. In signal processing, FIR filters are usually used for stopping one frequency band while passing another, or for removing noise from an information-carrying signal. FIR filters find use in applications varying from radar, satellite and military systems to numerous industrial systems; in fact, whenever an application involves signals, processing operations on them are inevitable, and filtering is the most common such operation. FIR filters are inherently parallel structures, so by using extra resources they can be implemented in a parallel fashion to reduce the operation time. For high order FIR filters, FPGAs have been the common solution to achieve massive levels of parallelism. However, programming FPGAs is not as easy as programming microcontrollers or digital signal processors (DSPs). Recently, GPGPU emerged as an efficient technique for solving computation-intensive problems having massive levels of parallelism with the ease of programmability. OpenCL and CUDA are the two most common frameworks to program GPUs for general-purpose applications [13] [16]. OpenMP is a parallel programming platform for CPUs and can also be used to parallelize FIR filter applications on CPU platforms [19]. However, because CPUs have a small number of cores compared to GPUs, even with OpenMP the performance results are usually not comparable with GPUs and FPGAs.

In this work, we take the FIR filtering application and implement it on different platforms, namely CPU, GPU and FPGA. While comparing their performance, we choose different models for each platform to get a fairer comparison, since performance differs steeply between different models and architectures of CPU, GPU and FPGA. Previous work usually takes one model from each platform and compares the platforms only through the results of that single model, which may misguide researchers. Therefore, 3 different FPGA, 5 GPU, and 4 different CPU models are selected for comparison.

In section two, previous works in which FPGAs and GPUs are compared for performance are reviewed. Section three gives a summary of the GPGPU programming architecture for OpenCL and CUDA. The details of the FIR filter implementations on the different platforms are given in section four. We show comprehensive performance results and discuss them in section five. Finally, in section six the discussion is concluded.

2. Related Work

In their work, Llamocca et al. compare the energy, performance and accuracy of implementations of a 2D difference of Gaussians (DoG) filter for real-time digital video processing applications on FPGA and GPU. The article concludes that for 2D filtering applications GPUs are better for performance and precision, but FPGAs have the advantage of lower power dissipation [1]. Pauwels et al. made a comparison of FPGA and GPU performance on the computation of phase-based optical flow, stereo and local image features. Based on their work, GPUs overcome FPGAs in performance aspects, especially due to the GPU's higher memory bandwidth and clock speed [2]. Kalarot and Morris compare FPGA and GPU implementations of real-time stereo vision applications. Although prior works state that FPGAs outperform GPUs [3], they conclude that GPUs are as effective as FPGAs when graphics processors are utilized efficiently with CUDA [4]. In their work, Zhang et al. take the operation of sparse matrix-vector multiplication (SpMV) for performance comparison between FPGA and GPU. GPU greatly outperforms FPGA when memory transaction operations are considered; however, when FPGA memory performance is scaled to the GPU rates, the FPGA exceeds the performance of the GPU [5]. In the digital video processing field, the dynamic partial reconfiguration method allows designers to control resources based on energy, performance, and accuracy considerations; different approaches utilized dynamic partial reconfiguration in FPGA implementations for digital video processing [6] [7]. Recently, image and video processing applications with OpenCL and CUDA programming have achieved practical performance, as stated in [8] and [9]. In their work, Che et al. compare FPGA, GPU and CPU for three different applications: Gaussian elimination, data encryption standard, and the Needleman-Wunsch algorithm. They conclude that the application characteristics are important for choosing the platform to accelerate specific applications [10]. In their work, Howes et

al. compare Monte-Carlo simulations and FFT operations on GPUs, FPGAs and the Playstation 2 [11].

GPUs were seen as powerful computational units even before OpenCL and CUDA. Accessing the computational resources of graphics cards is now possible through these parallel programming interfaces. However, programming GPUs for general-purpose parallel applications used to be much more difficult than it is today: programmers had to translate their parallel algorithms into native graphics operations using graphics APIs such as OpenGL and DirectX [20]. Smirnov and Chiueh compared the performance of a FIR filter application on GPU and CPU [21]. When this work was proposed, only graphics APIs were available to program general-purpose applications on GPUs, and the devices were not as powerful as today's.

The performance of different platforms for FIR filtering is studied in [1] for real-time digital video processing operations like DoG. The authors implemented a 2-D FIR filter with 8, 12, 16, 20, 24 and 32 coefficients. For the DoG application the filter order is very low compared to our work, and a 2-D FIR filter is utilized; moreover, the weak point of the performance comparison in [1] is the usage of only one GPU and one FPGA model, like most comparison studies on GPUs and FPGAs. However, we show that different GPU and FPGA models have sharp performance differences for FIR filter applications. It is not a fair comparison to choose only one FPGA and one GPU to compare two platforms, each of which has vastly different models with different processing powers. We also used OpenCL to program GPUs along with CUDA, and programmed the CPU with OpenMP to get a fairer baseline performance point.

3. GPGPU Architecture

CUDA and OpenCL are both frameworks for task-based and data-based general-purpose parallel execution, and their architectures show great similarity. The key difference between the two is that OpenCL is a cross-platform framework (implemented on CPUs, GPUs, DSPs, etc.), whereas CUDA is supported only by NVIDIA GPUs. This gives programs written using OpenCL the flexibility to work on a wide range of systems; however, this flexibility brings extra programming effort with it [17]. Although both the OpenCL and CUDA frameworks have proved themselves by accelerating a majority of parallel applications, we generally use CUDA terminology and its model to explain the GPGPU work in this paper, to ease understanding.

3.1. Memory Hierarchy

The memory hierarchy of GPGPU architectures shows similarity to CPU memory hierarchies in the sense that both try to speed up memory transactions by utilizing a hierarchical structure rather than a monolithic memory resource. An overview of the memory hierarchy of CUDA devices is given in Figure 1.

At the bottom level of the memory hierarchy resides the slowest but largest-capacity memory type, named global memory in CUDA terminology. A typical global memory is 2 or 4 gigabytes in size and resides outside of the GPU chip. Global memories are usually manufactured using DRAM technology, like main memories in the CPU memory hierarchy. Although global memory can store a very large amount of data, accesses to it may take hundreds of clock cycles. Therefore, the programmer has to be very careful not to make redundant read/write operations.

Constant memory is another memory type in CUDA devices and is optimized for broadcast operations, so it can be accessed faster than global memory. Its disadvantage is being read-only. If data will not be modified in an application, it can be stored in constant memory instead of global memory to reduce the latency of memory reads. However, the constant memory of graphics cards is very limited: for example, the mobile graphics card GT555M has 64 kilobytes of constant memory, and the GTX560ti, a powerful desktop graphics processor, also has 64 kilobytes. The host can only communicate with the GPGPU through the constant memory and global memory.

Like caches in the CPU memory hierarchy, there is a faster but smaller memory type in the CUDA memory hierarchy called shared memory. It is named "shared" because it is shared among the threads within a block. To increase memory bandwidth and decrease the latency of memory transactions, shared memory can be programmed so that frequently used data is kept there. Maximizing the utilization of shared memory and minimizing accesses to global memory is a very efficient optimization method when writing CUDA programs. Registers are the remaining storage units in the CUDA memory hierarchy and are private to each thread. Registers have the smallest latency and maximum throughput, but their amount is very limited.

The OpenCL memory architecture is not very different from the CUDA memory hierarchy; a similar hierarchical structure applies to OpenCL, yet the terminology is slightly different from CUDA's [18].

Figure 1: CUDA Memory Hierarchy

3.2. Programming Model

In CUDA, the atom of the parallel execution model is called a "thread". Many threads work in parallel when a CUDA program is launched. Sets of threads constitute a block.


Communication between threads in a block is achieved through shared memory, but blocks are totally independent of each other. The number of threads that can be in a block is limited. Blocks come together to construct a grid, which can be 1D, 2D or 3D in a CUDA application.

An important optimization in CUDA is to utilize as many blocks as possible in each SM (streaming multiprocessor). On the hardware side, at most 8 blocks can be issued to an SM. Another hardware limitation is the maximum number of threads in an SM; for the GPUs we used in this work, this maximum is 1536. If there are 512 threads in each block of a grid, only three blocks can be issued. Shared memory utilization may also affect the performance of CUDA applications: if a block allocates nearly all of the shared memory, then only that block can be issued and no other. It is important to fit as many blocks as possible inside an SM in order to give the scheduler more options to hide long latency operations.
4. FIR Filter Implementation

4.1. Overview

The FIR filter structure is constructed from its transfer function and the linear difference equation obtained by taking the inverse Z-transform of the transfer function. An M-order FIR filter's transfer function and the corresponding linear difference equation are given in (1) and (2), respectively:

H(z) = b0 + b1·z^(-1) + … + b(M-1)·z^(-(M-1))   (1)

y(n) = b0·x(n) + b1·x(n-1) + … + b(M-1)·x(n-(M-1))   (2)

The output stream y(n) is calculated by multiplying the input samples [x(n), x(n-1), …, x(n-M+1)] with the corresponding filter coefficients [b0, b1, …, b(M-1)] and adding all the multiplication results together. Digital filters take sampled discrete signals, where the sample rate defines the number of points taken from a continuous signal per unit time. The precision of the magnitude of a sampled point is called its "resolution", and the process of mapping magnitudes to this precision is called "quantization". In an FIR filter application, the continuous signal is first sampled and quantized, usually by an analog-to-digital converter (ADC), then sent to the filter to be processed. A typical ADC has a sample rate of 100 kHz and 16-bit resolution [15].

FIR filters can be implemented in hardware in a variety of ways, but the most preferable method is the direct-form FIR structure. In this form, the linear difference equation in (2) is implemented as a tapped delay line. An M-order direct-form FIR filter implementation is given in Figure 2. The triangle components indicate two-input multipliers that operate on an input sample x[i] and the corresponding coefficient bi. Circles specify two-input adders, and squares denote delay elements, whose output is one sample point behind the input sample point. Delay elements are usually implemented with shift registers.
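As a concrete reference for equation (2), the direct-form computation can be sketched in a few lines of Python (an illustration, not the paper's code; the function name and the zero-padding of samples before x(0) are our assumptions):

```python
def fir_direct(x, b):
    # Direct-form FIR: y(n) = b0*x(n) + b1*x(n-1) + ... + b(M-1)*x(n-M+1),
    # with samples before x(0) taken as zero.
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y
```

For a 4-tap moving-average filter b = [0.25]*4, for example, a constant input settles to its own value once the delay line fills.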

Figure 2: Direct-form FIR Filter Structure

Symmetric FIR filters can be implemented if the coefficients

of the FIR filter show the symmetry property. For an m-tap FIR filter, if b0 = b(m-1), b1 = b(m-2), …, b(i) = b(i+1), where i = m/2 - 1, then the FIR filter can be implemented in the symmetric direct-form structure [22]. An m-tap direct-form symmetric FIR filter implementation is given in Figure 3. The number of multiplier units in an m-tap symmetric FIR filter is half that of a direct-form FIR filter.
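The halving of multipliers can be illustrated with a small sketch (our own illustration, assuming zero-padded early samples) that folds each pair of equal coefficients into a single multiply:

```python
def fir_symmetric(x, b):
    # Symmetric direct-form: samples sharing the same coefficient are
    # added first, so each pair costs one multiply instead of two.
    # Requires b[i] == b[m-1-i].
    m = len(b)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(m // 2):
            xa = x[n - i] if n - i >= 0 else 0.0
            xb = x[n - (m - 1 - i)] if n - (m - 1 - i) >= 0 else 0.0
            acc += b[i] * (xa + xb)
        if m % 2:  # middle tap of an odd-length filter has no pair
            mid = m // 2
            acc += b[mid] * (x[n - mid] if n - mid >= 0 else 0.0)
        y.append(acc)
    return y
```

With b = [1, 2, 2, 1] this produces the same outputs as the plain direct form while performing only two multiplications per output instead of four.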

4.2. GPU Implementation

Three different implementations are designed to compare the performance of GPUs with the FPGAs. Two of the designs are implemented using CUDA, while the third is an OpenCL kernel implementation. The first CUDA design is a naïve, simple kernel without any significant optimization. The second, optimized CUDA kernel uses shared memory and coalesces global memory accesses. The third is an OpenCL version of the highly optimized CUDA FIR filter implementation. We name these implementations the basic CUDA, optimized CUDA and OpenCL FIR filters.

The FIR filtering application fits GPGPU computing due to its inherently parallel execution structure for the expensive multiplication operations, and because the coefficients of the FIR filter can be kept in read-only constant memory, as they stay constant during the execution of the kernel. In section 3 it was stated that the constant memory of a typical CUDA platform is usually limited to 64 kilobytes. However, this has no effect on our FIR filtering application's performance, since the largest FIR filter we use has 4096 coefficients, which allocate only 16 kilobytes of constant memory. On a CUDA architecture having 64 kilobytes of constant memory, a FIR filter of order up to 16384 can be implemented with all coefficient values residing in constant memory as single-precision floating-point numbers.

Figure 3: M-tap symmetric direct-form FIR Filter Structure


Figure 4: Basic CUDA FIR Filter Kernel

Figure 4 shows our basic CUDA kernel code. Here, each thread computes one output element by multiplying all coefficients with the corresponding input elements sequentially. All input elements are accessed from global memory, while the coefficients reside in constant memory. The variables defined in the first four lines in Figure 4 are private to each thread. "lid" and "gid" stand for local and global IDs, respectively: "lid" is the thread index within its block, while "gid" specifies the thread index over all the threads in the grid.
threads in the grid. Optimized CUDA kernel is slightly different than the basic CUDA kernel based on their memory transaction operation

  • handlings. First, in optimized CUDA kernel all the threads

in a block load multiple input elements to the shared

  • memory. Since an input element has to be read multiple

times, up to the order of the FIR filter, during the filtering process, caching these elements in the shared memory greatly increases total performance. The code portion of the

  • ptimized CUDA kernel is given in Figure 5. Input sample

values are stored in shared memory as chunks formed by two sections: New data section and overlapping section. In

  • rder to calculate an output value of the FIR filter, N input

samples are needed where N is the order of the FIR filter. The newer data section of the input samples is brought from global memory to shared memory, while the older N input samples should also exist in there. The older input samples constitute overlapping section. Threads first load an element to the new data section according to their global ids. Input samples are loaded to the shared memory and FIR filtering

  • perations related to these input values are performed. Then

the overlapping section is loaded sequentially with the offset

  • f block size. Figure 6 represents the loading methodology

we implemented for the optimized CUDA kernel. Shared memory is allocated dynamically before the kernel launched to maximize the utilization of SMs for any order FIR filters. By doing so, unnecessary compilation of FIR filter code for any order size every time is avoided. Figure 5: Optimized CUDA FIR Filter Kernel An important issue here is that the number of blocks that can reside on the SM at the same time is limited by the shared memory for large number of taps. Block size does not affect the allocated shared memory too much. Therefore, to truly utilize the GPU, it is important to select block size that will assign as much as treads near to the maximum thread capacity of the SM. For example; 4096 taps with 256 block size allocate 17408 bytes. Because of the maximum shared memory size of the GT 555M is 48KB, the maximum number of blocks per SM is limited to two. It is possible to use 256x2 threads per SM with utilizing two blocks, where the maximum number of threads is 1536. It is obvious that the shared memory will always limit the system utilizing two blocks per SM. Thus, block size of 512 have to be preferred for better occupancy.
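The block-size reasoning can be checked with a small occupancy estimate (our own sketch; the per-block footprint of (taps + block size) floats follows the loading scheme described here, and the resource limits are the GT 555M figures quoted in the text):

```python
def blocks_per_sm(taps, block_size, smem_bytes=48 * 1024,
                  max_threads=1536, max_blocks=8):
    # Shared memory per block: overlapping section (taps floats)
    # plus the new-data section (block_size floats), 4 bytes each.
    smem_per_block = (taps + block_size) * 4
    by_smem = smem_bytes // smem_per_block
    by_threads = max_threads // block_size
    return min(by_smem, by_threads, max_blocks)
```

For 4096 taps, a 256-thread block allocates (4096 + 256) * 4 = 17408 bytes, so two blocks (512 threads) fit per SM; a 512-thread block also fits twice but occupies 1024 threads, hence the preference for block size 512.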

Figure 6: Shared Memory Loading Methodology

The computation phase of the optimized CUDA kernel is not much different from the basic CUDA kernel. Instead of accessing input elements from global memory, the optimized kernel reads elements that were previously loaded into shared memory. Finally, threads write the computed results back to global memory.

float result = 0;
int lid = threadIdx.x;
int gid = blockIdx.x*blockDim.x + lid;
int offset = gid;
//do the computation
for(int i = 0; i <= taps; i++){
    if(offset >= 0)
        result += coeffs[i]*src[offset--];
    else
        break;
}
//write result to global memory
output[gid] = result;
float result = 0;
extern __shared__ float shrdMem[];
int lid = threadIdx.x;
int gid = blockIdx.x*blockDim.x + lid;
int local_offset = lid;
int global_offset = gid - taps;
//load shared mem
if(gid < srcSize) {
    shrdMem[taps + lid] = src[gid];
}
while(local_offset < taps){
    //load overlapping elements
    if(global_offset >= 0)
        shrdMem[local_offset] = src[global_offset];
    else
        shrdMem[local_offset] = 0;
    local_offset += BLOCK_SIZE;
    global_offset += BLOCK_SIZE;
}
__syncthreads();
//do the computation
for(int i = 0; i <= taps; i++){
    result += coeffs[i]*shrdMem[lid + taps - i];
}
//write result to global mem
output[gid] = result;
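To see that the tiled scheme computes the same result as a plain convolution, the kernel's indexing can be emulated sequentially in Python (a behavioral sketch of the listing above, not the CUDA code itself; note the kernel loops over taps + 1 coefficients):

```python
def fir_tiled(src, coeffs, block_size):
    # Sequential emulation of the optimized kernel: each "block" stages
    # block_size new samples plus `taps` older ones (the overlapping
    # section) in a local window, then each "thread" computes one output.
    taps = len(coeffs) - 1               # the kernel loops i = 0 .. taps
    out = [0.0] * len(src)
    for block_start in range(0, len(src), block_size):
        shrd = [0.0] * (taps + block_size)   # [overlap | new data]
        for lid in range(block_size):        # load phase
            gid = block_start + lid
            if gid < len(src):
                shrd[taps + lid] = src[gid]  # new-data section
            local, glob = lid, gid - taps    # overlapping elements,
            while local < taps:              # strided by block size
                if glob >= 0:
                    shrd[local] = src[glob]
                local += block_size
                glob += block_size
        for lid in range(block_size):        # compute phase
            gid = block_start + lid
            if gid < len(src):
                out[gid] = sum(coeffs[i] * shrd[lid + taps - i]
                               for i in range(taps + 1))
    return out
```

For any block size, the staged window reproduces exactly the src[gid - taps] … src[gid] samples each output needs, so the result matches the direct computation of equation (2).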

The OpenCL implementation is a direct translation of the optimized CUDA FIR filter kernel. We implemented it to evaluate the FIR filter application on other devices, such as AMD GPUs.

4.3. FPGA Implementation

Three different implementation techniques are selected to synthesize FIR filters on various FPGAs: direct-form, symmetric-form, and distributed arithmetic. It is possible to achieve massive levels of parallelism by utilizing the multiplier resources of the FPGAs. Most Xilinx FPGAs have DSP48 macro blocks embedded in their chips [12]. These slices have 18x18-bit multiplier units with a pre-accumulator, 48-bit accumulators and selection multiplexers to speed up DSP operations. For the direct-form and symmetric-form FPGA implementations, Xilinx's DSP48 macro slices are utilized.

The distributed arithmetic (DA) technique is an efficient method for implementing multiplication operations without using the DSP macro blocks of the FPGA [23]. In the DA technique, the coefficients of the FIR filter are represented in two's complement binary form, and all possible sums of the filter coefficients are stored in look-up tables (LUTs). Using the classical shift-add method, the multiplication operation can then be performed effectively without using the multiplier units of the FPGA. We used the 4-input LUTs of the FPGA to implement the DA form of the FIR filter structure.

We chose three different FPGAs to compare the performance results of the FIR filters. The utilized FPGAs and their properties are given in Table 1. Xilinx ISE v14.1 software is used to synthesize the circuits.

Table 1: FPGA properties used in implementing FIR filters

Device      | Spartan 3A  | Spartan 6 | Artix 7
Part        | XC3SD1800A  | XC6SLX75  | XC7A200T
Logic Cells | 37440       | 74637     | 215360
DSP Slices  | 84          | 132       | 740
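The DA principle can be sketched for one output sample (an illustration of the technique, not our FPGA code; for simplicity the inputs here are unsigned integers, whereas the hardware uses two's complement):

```python
def da_dot(coeffs, window, bits=8):
    # Distributed arithmetic: precompute a LUT holding every possible
    # sum of coefficients, then combine the input samples bit-serially
    # with shift-adds instead of multipliers.
    taps = len(coeffs)
    lut = [sum(c for k, c in enumerate(coeffs) if (pattern >> k) & 1)
           for pattern in range(1 << taps)]
    acc = 0
    for j in range(bits):                 # one pass per input bit plane
        pattern = 0
        for k, x in enumerate(window):    # gather bit j of every tap
            if (x >> j) & 1:
                pattern |= 1 << k
        acc += lut[pattern] << j          # shift-add of the LUT value
    return acc
```

Note that the LUT has 2^taps entries, which is why high order DA filters exhaust look-up table resources unless the taps are partitioned into small groups (such as the 4-input LUTs used here).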

We used the MATLAB™ Signal Processing Toolbox to generate the VHDL code of the FIR filters. For the addition operations, the tree adder type is selected to reduce latency. Instead of using the single-precision floating-point data type for coefficients, input and output values, as in the GPGPU implementations, we used fixed-point data types for the FIR filter implementations on FPGAs. The reason is to reduce the complexity of the multiply and add operations and thereby increase the performance of the FIR filter. However, precautions should be taken regarding the resolution of the coefficient values to avoid quantization errors. Using 16-bit resolution for input and output signals, 18 bits for coefficients, an 18x18 multiplier and a 40-bit accumulator, the quantization effect is negligible in the magnitude response of the FIR filters. The GPU does not have this optimization option, as CUDA and OpenCL do not allow half-precision arithmetic operations inside the GPU. Note also that a GPU needs a CPU and global memory components to work; in contrast, an FPGA can be used standalone for FIR filtering applications.
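The effect of the 18-bit coefficient format can be illustrated with a small quantization sketch (our own illustration; the choice of 16 fractional bits is an assumed scaling, not necessarily the toolbox's):

```python
def quantize(value, total_bits=18, frac_bits=16):
    # Round to a signed fixed-point grid: total_bits wide with
    # frac_bits fractional bits, saturating at the format's range.
    step = 2.0 ** -frac_bits
    q = round(value / step)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q)) * step
```

Each in-range coefficient then deviates from its double-precision value by at most half a step (2^-17 here), which is why the quantization effect on the magnitude response is negligible at these word lengths.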

5. Results and Discussions

Performance comparison in a FIR filter application is based on the output data rate in terms of the number of samples produced. We assume a 16-bit resolution, 10 kHz sample rate ADC output feeding the FIR filter [15]; therefore, a 10 kHz sample rate is used when calculating the filter coefficients. Coefficients are calculated using the MATLAB Signal Processing Toolbox in double-precision format. In the CPU and GPU implementations of the FIR filter, the double-precision coefficients are converted to single precision, and on FPGA we used fixed point. Even though the output data precision of the FIR filter differs between FPGA and the other platforms, quantization has a negligible effect on the calculations. Since the aim of this paper is to compare FIR filtering performance from an application perspective, GFLOPS or throughput with respect to bit rate is not a fair comparison methodology; for a FIR filtering application, the most important performance criterion is achieving maximum throughput with respect to the number of sample points being filtered.

We wrote a Java program to produce input sample points from a generated sine wave. Noise components are added at frequencies in the stopband. For example, if the stopband starts at 350 Hz, noise from 350 Hz up to 450 Hz is added to a pure 50 Hz sinusoidal signal, where 50 Hz resides in the passband. The sampled data points from the Java program are passed through the FIR filter, and the output data stream is plotted in MATLAB to check whether the filter worked properly and suppressed the noise components of the main signal. We tested the correctness of each filter for every platform and device to be sure that the filter works properly.

Figure 7: FPGA throughput performances obtained from three types of implementations on three different FPGA architectures. DF, S and DA stand for the implementation names Direct Form, Symmetric and Distributed Arithmetic, respectively, while the x-axis indicates filter order and the y-axis throughput in million samples per second.

The throughput rates obtained from the different implementations on the FPGA platform are shown in Figure 7. As the money spent on the FPGA increases, the performance gained also



Table 2: GPU and CPU Performance Results of the FIR Filter Application (Million Samples per Second). Columns are FIR orders 64, 128, 256, 512, 1024, 2048 and 4096; row groups are OpenMP on CPU (i5 3210M, i7 2670QM, i7 3820, i7 860), basic CUDA (GT555M, GT635M, GTX260, GTX560ti), optimized CUDA (GT555M, GT635M, GTX260, GTX560ti) and OpenCL (GT555M, GTX260, GTX560ti, HD7870).

19.70 17.87 13.31 2.18 1.06 0.55 0.29 9.68 4.59 1.98 0.90 0.64 0.41 13.57 4.29 2.07 1.03 0.51 0.36 6.58 3.16 1.54 0.78 0.46 0.32 94.39 71.53 47.70 27.68 15.18 8.12 4.19 70.67 50.96 32.05 18.61 10.15 5.40 2.76 40.21 35.64 28.29 20.01 12.51 7.29 180.24 150.88 108.54 74.19 44.48 24.60 13.05 109.42 90.18 63.44 41.04 24.80 13.28 5.40 92.02 72.17 49.52 30.51 17.15 8.92 3.59 39.19 38.83 33.59 29.00 21.55 13.80 181.14 166.48 136.28 100.89 66.87 38.81 19.89 60.35 54.56 45.39 32.90 21.49 12.29 5.53 18.59 16.89 14.78 12.74 9.48 5.73 106.82 102.93 92.54 75.00 54.86 34.55 20.02 50.71 48.81 45.06 38.67 29.41 20.57 11.27 17.25 8.49 4.42 4.02 N/A N/A

increases for all three implementations. Filters of order 1024 and higher are not implementable on the Spartan 3A in DF or symmetric form due to lack of resources, and 4096-order filters are not implementable on any of the three FPGAs for the same reason. While the DA implementation has the highest throughput, filters of order higher than 512 are not possible on the Spartan 3A due to the steep increase in its look-up table usage. On average, for a 256-order FIR filter, DA performs best with 164 MSamples/sec, while DF executes at only 67 MSamples/sec.

Table 2 shows the throughput obtained from the CPU and GPU devices. Compared to the FPGA results, the best GPU we evaluated (GTX560ti) performs worse than DA on the Artix 7. But GPUs have the advantage of being able to execute FIR filters of order 4096 and higher; only the GTX260 cannot execute a 4096-order filter, due to insufficient shared memory. As the latest GPU architectures have more shared memory, it is possible to run filters of very high order. Also, instead of the straightforward shared memory loading method, the kernel could be optimized to split the shared memory loads so that even higher order FIR filters could be executed independently of the shared memory size; we leave this implementation as future work.

Compared to the CPU, it is obvious that both GPUs and FPGAs are more efficient and have much higher throughput. Peak performance shows that an almost 10x speedup is achieved for a 64-order FIR filter by utilizing an FPGA or GPU with respect to CPU performance, and much higher gains are obtained for higher order FIR filters. Table 2 indicates that GPU performance decreases as the order of the FIR filter increases. By analyzing both the FPGA and the GPU results, we can say that:

  • FPGAs are more efficient than GPUs, especially for higher order filters, when the multiplication operations of the FIR filter can be fully parallelized. Otherwise, GPU performance outperforms FPGA when serialization occurs.

  • Resource allocation is a problem for FPGAs: for example, a 4096-order FIR filter cannot be implemented on any of the three FPGAs, and the Spartan 3A DSP's resources were not sufficient for FIR filters of order higher than 512. On GPUs, higher order FIR filters can be implemented, yet the performance decreases.

  • When comparing FPGA and GPU on the FIR filtering application, it is not fair to select one model of each; the device model affects performance sharply. For a 128-order DA FIR filter, the Artix 7 achieved 224.3 million samples per second of throughput while the Spartan 3A DSP achieved only 85.9. On the other hand, the GTX 560ti throughput was 166.5 million samples per second and the GTX 260 throughput was 38.4.

  • Development and implementation of FIR filters involve different steps on FPGA and GPU. Writing HDL code is very difficult compared to writing GPGPU kernel code. Also, synthesis takes an enormous amount of time, especially for higher order FIR filters.

6. Conclusions

GPGPUs and FPGAs are both platforms for efficiently implementing highly parallelizable applications. We have evaluated the performance of FIR filters across various models of both architectures and showed that both platforms are effective for accelerating FIR filtering applications, each with its own advantages and disadvantages. The performance comparison is made through a single metric: the output data rate in terms of the number of samples produced in unit time (e.g. million samples/sec).

The FIR filter order has a noticeable effect on performance: for lower order FIR filters, both FPGA and GPU achieved better performance than for higher order FIR filters. Serialization due to the lack of enough multiplier units is the main reason for performance decrease on FPGAs. The logic resource capacity of an FPGA is another limiting factor for implementing high order FIR filters; an FPGA must have a relatively substantial amount of logic slices to synthesize FIR filters of order higher than a thousand. GPUs, on the other hand, have to be coded thoroughly: utilizing shared memory and coalescing global memory accesses have significant performance effects, so one should optimize the kernel with these issues in mind to obtain more performance. FPGAs have relatively lower prices than GPUs, yet GPUs enjoy ease of programmability where FPGAs are still tough to program. In general, FPGA performance is higher than GPU when the FIR filter is


fully parallelized on the FPGA device. However, GPU outperforms FPGA when the FIR filter has to be implemented with serial parts on the FPGA.

7. References

[1] D. Llamocca, C. Carranza, and M. Pattichis, "Separable FIR filtering in FPGA and GPU implementations: Energy, performance, and accuracy considerations," 21st International Conference on Field Programmable Logic and Applications (FPL), Sept. 2011.
[2] K. Pauwels, M. Tomasi, J. Diaz, E. Ros, and M.M. Van Hulle, "A comparison of FPGA and GPU for real-time phase-based optical flow, stereo, and local image features," IEEE Transactions on Computers, July 2012.
[3] S. Asano, T. Maruyama, and Y. Yamaguchi, "Performance comparison of FPGA, GPU, and CPU in image processing," IEEE Trans. Image Processing, vol. 16, no. 3, pp. 879-884, Mar. 2007.
[4] R. Kalarot and J. Morris, "Comparison of FPGA and GPU implementations of real-time stereo vision," Computer Vision and Pattern Recognition Workshops (CVPRW), June 2010.
[5] Y. Zhang, Y.H. Shalabi, R. Jain, K.K. Nagar, and J.D. Bakos, "FPGA vs. GPU for sparse matrix vector multiply," International Conference on Field-Programmable Technology (FPT), Dec. 2009.
[6] D. Llamocca, M. Pattichis, and A. Vera, "A dynamically reconfigurable parallel pixel processing system," in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), Sept. 2009.
[7] D. Llamocca and M. Pattichis, "Real-time dynamically reconfigurable 2-D filterbanks," in Proceedings of the IEEE Southwest Symposium on Image Analysis & Interpretation, Austin, TX, May 2010.
[8] V. Podlozhnyuk, "Image convolution with CUDA," NVIDIA, June 2007.
[9] B. Cope, P.Y.K. Cheung, W. Luk, and S. Witt, "Have GPUs made FPGAs redundant in the field of video processing?," IEEE International Conference on Field-Programmable Technology (FPT), Dec. 2005.
[10] S. Che, J. Li, J.W. Sheaffer, K. Skadron, and J. Lach, "Accelerating compute-intensive applications with GPUs and FPGAs," Symposium on Application Specific Processors (SASP), June 2008.
[11] L.W. Howes, P. Price, O. Mencer, O. Beckmann, and O. Pell, "Comparing FPGAs to graphics accelerators and the Playstation 2 using a unified source description," in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), Aug. 2006.
[12] Xilinx, XtremeDSP DSP48A for Spartan-3A DSP FPGAs User Guide, Jul. 2008.
[13] NVIDIA, CUDA C Programming Guide, v5.0, Oct. 2012.
[14] A.V. Oppenheim and R.W. Schafer, Digital Signal Processing, Prentice-Hall, 1981.
[15] Analog Devices, "16-bit 100 kSPS sampling ADC," AD677 datasheet, Rev. A.
[16] Khronos OpenCL Working Group, "The OpenCL specification version 1.2," 2010, http://khronos.org/opencl.
[17] K. Karimi, N.G. Dickson, and F. Hamze, "A performance comparison of CUDA and OpenCL," arXiv e-prints, arXiv:1005.2581.
[18] J. Fang, A.L. Varbanescu, and H. Sips, "A comprehensive performance comparison of CUDA and OpenCL," International Conference on Parallel Processing (ICPP), pp. 216-225, Sept. 2011.
[19] L. Dagum and R. Menon, "OpenMP: an industry standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46-55, Jan.-Mar. 1998.
[20] D.B. Kirk and W.W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010.
[21] A. Smirnov and T. Chiueh, "Implementation of a FIR filter on a GPU," Technical report, ECSL, 2005.
[22] T.C. Denk and K.K. Parhi, "Synthesis of folded pipelined architectures for multirate DSP algorithms," IEEE Transactions on Very Large Scale Integration Systems, vol. 6, no. 4, pp. 595-607, Dec. 1998.
[23] P. Longa and A. Miri, "Area-efficient FIR filter design on FPGAs using distributed arithmetic," in Proc. Int. Symp. on Signal Processing and Information Theory, 2006.