FPGA vs GPU Performance Comparison on the Implementation of FIR Filters
Abstract
FIR filters are used in digital signal processing applications that need to stop one frequency band while passing another, or to remove noise. Because of the complex structure and inherent parallelism of FIR filters, dedicated reconfigurable hardware is preferred over CPUs for their implementation. Recently, GPGPU has emerged as an effective technique for solving computation-intensive problems with a massive level of parallelism. In this paper, we take FIR filtering applications with different tap sizes and implement them on different FPGA and GPU models using both the OpenCL and CUDA platforms. We evaluate the performance of the FIR filters using two different kernels on the GPU and compare it with various FPGA implementations, taking an OpenMP implementation that utilizes all available cores of a single CPU as the baseline. In general, FPGAs outperformed GPUs in terms of output samples produced per second. However, GPUs remain the viable option when very high-order filters are needed, which FPGAs cannot accommodate because of their limited logic units.
Keywords
FIR Filter, GPGPU, FPGA, heterogeneous computing
1. Introduction
FIR (finite impulse response) filters are the most common digital filters in signal processing applications because of their linear phase response and guaranteed stability. In signal processing, FIR filters are usually used to stop one frequency band while passing another, or to remove noise from an information-carrying signal. They appear in applications ranging from radar, satellite and military systems to numerous industrial systems; in fact, whenever an application involves signals, processing them is inevitable, and filtering is the most common such operation. FIR filters are inherently parallel structures, so with extra resources they can be implemented in a parallel fashion to reduce the operation time. For high-order FIR filters, FPGAs have been the common solution for achieving a massive level of parallelism. However, programming FPGAs is not as easy as programming microcontrollers or digital signal processors (DSPs). Recently, GPGPU has emerged as an efficient technique for solving computation-intensive problems with a massive level of parallelism while remaining easy to program. OpenCL and CUDA are the two most common frameworks for programming GPUs for general-purpose applications [13] [16]. OpenMP is a parallel platform for CPUs and can also be used to parallelize FIR filter applications on CPU platforms [19]. However, because CPUs have a small number of cores compared to GPUs, even with OpenMP the performance is usually not comparable with that of GPUs and FPGAs.
In this work, we take the FIR filtering application and implement it on different platforms, namely CPU, GPU and FPGA. To make the comparison fairer, we choose several models for each platform, since performance differs steeply between models and architectures of CPUs, GPUs and FPGAs. Previous work usually takes one model from each platform and compares the platforms using only the results of that single model, which may mislead researchers. Therefore, three FPGA, five GPU, and four CPU models are selected for comparison.
Section two reviews previous work in which FPGAs and GPUs are compared for performance. Section three gives a summary of the GPGPU programming architecture for OpenCL and CUDA. The details of the FIR filter implementations on the different platforms are given in section four. We show comprehensive performance results and discuss them in section five. Finally, section six concludes the discussion.
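To make the inherent parallelism of FIR filters mentioned in this introduction concrete, the following is a minimal sketch of a direct-form FIR filter in C with an OpenMP loop annotation. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function name fir_filter, its parameters, and the zero-padding of samples before the start of the input are all hypothetical choices made for the example.

```c
#include <stddef.h>

/* Illustrative direct-form FIR filter (hypothetical helper, not the paper's code):
 *   y[n] = sum over k = 0 .. taps-1 of h[k] * x[n - k],
 * assuming x[m] = 0 for m < 0.
 * Each output sample depends only on input samples, never on another output,
 * so the iterations of the outer loop are independent and can run in parallel. */
static void fir_filter(const float *x, size_t n_samples,
                       const float *h, size_t taps, float *y)
{
    /* The pragma takes effect only when OpenMP is enabled at compile time;
     * otherwise it is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long n = 0; n < (long)n_samples; ++n) {
        float acc = 0.0f;
        /* Stop at k == n so that samples before the start of x count as zero. */
        for (size_t k = 0; k < taps && k <= (size_t)n; ++k) {
            acc += h[k] * x[(size_t)n - k];
        }
        y[n] = acc;
    }
}
```

As a usage example, calling fir_filter with a 4-tap moving-average filter (all coefficients 0.25) on the input 4, 8, 12, 16, 20, 24 yields 1, 3, 6, 10, 14, 18. The same independence between output samples is what allows one GPU thread per output sample, or replicated multiply-accumulate units on an FPGA.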
2. Related Work
Llamocca et al. compare the energy, performance and accuracy of FPGA and GPU implementations of a 2D difference of Gaussians (DOG) filter for real-time digital video processing applications. They conclude that, for 2D filtering applications, GPUs are better in performance and precision, whereas FPGAs have the advantage of lower power dissipation [1]. Pauwels et al. compare FPGA and GPU performance on the computation of phase-based optical flow, stereo and local image features. According to their work, GPUs outperform FPGAs, mainly owing to the GPU's higher memory bandwidth and clock speed [2]. Kalarot and Morris compare FPGAs and GPUs for the implementation of real-time stereo vision applications. Although prior works state that FPGAs outperform GPUs [3], they conclude that GPUs are as effective as FPGAs when the graphics processors are utilized efficiently with CUDA [4]. Zhang et al. use sparse matrix-vector multiplication (SpMV) to compare the performance of FPGAs and GPUs. The GPU greatly outperforms the FPGA when memory transactions are taken into account; however, when the FPGA memory performance is scaled to GPU rates, the FPGA exceeds the performance of the GPU [5]. In the digital video processing field, the dynamic partial reconfiguration method allows designers to control resources based on energy, performance, and accuracy considerations. FPGA implementations of