
International Conference on Computational Science (ICCS 2013)
Optimization techniques for 3D-FWT on systems with manycore GPUs and multicore CPUs
G. Bernabé†, J. Cuenca† and D. Giménez‡
† Computer Engineering Department, University of Murcia
‡ Computer Science and Systems Department, University of Murcia
5-7 June, 2013

Outline
• Introduction
• The Wavelet Transform
• Optimization techniques for 3D-FWT on a single GPU system
• Optimization techniques for 3D-FWT on hybrid systems
• Conclusions and Future work

Introduction
• Applications of the Wavelet Transform
  – An important area of development, mainly applied to image and video compression
  – Optimally tiled 2D and 3D FWT: almost an order of magnitude reduction in overall execution time with respect to a baseline CPU version
  – CUDA and OpenCL provide mechanisms to optimize general-purpose applications on GPUs (GPGPU)
  – Several implementations of the 3D-FWT in CUDA and OpenCL exist to accelerate it on GPUs
• A method to automatically compute the parameters of the 3D-FWT running on systems with multicore CPUs and manycore GPUs

The Wavelet Transform: 1D-FWT
• The wavelet transform uses simple filters for fast computation
• The filters are applied to the signal, and each filter output is downsampled by two, generating two bands (low-pass and high-pass)
• The total amount of data stays the same at each additional level, with minimal information loss
• The access pattern is determined by the chosen mother wavelet function
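A minimal Python sketch of one and several levels of the 1D-FWT, assuming the Haar filter pair purely for illustration (it is not necessarily the mother wavelet used in this work). Each level filters the signal, downsamples both outputs by two, and the next level transforms only the low band, so the output has as many coefficients as the input. The function names are hypothetical.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_fwt_1d_level(signal):
    """One 1D-FWT level: low-pass and high-pass Haar filters, each
    downsampled by two, giving two bands of half the input length."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    half = len(signal) // 2
    low = [INV_SQRT2 * (signal[2 * i] + signal[2 * i + 1]) for i in range(half)]
    high = [INV_SQRT2 * (signal[2 * i] - signal[2 * i + 1]) for i in range(half)]
    return low, high


def haar_fwt_1d(signal, levels):
    """Multilevel 1D-FWT: each level transforms the previous low band.

    Output layout: [low_L, high_L, high_{L-1}, ..., high_1], i.e. the same
    number of coefficients as input samples."""
    low, details = list(signal), []
    for _ in range(levels):
        low, high = haar_fwt_1d_level(low)
        details = high + details
    return low + details


if __name__ == "__main__":
    print(haar_fwt_1d([1, 2, 3, 4, 5, 6, 7, 8], 3))
```

Because the Haar filters are orthonormal, the sum of squared coefficients equals that of the input signal, which is one way to see that no information is lost at any level.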

The Wavelet Transform: 2D-FWT
• Generalizes the 1D-FWT to an image (2D)
• Applies the 1D-FWT to each row and then to each column of the image
[Figure: original image; rows transformed; columns transformed; ... after a three-level application of the filters]
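A sketch of the row-then-column scheme, repeated on the low-low (top-left) quadrant to obtain a multilevel decomposition like the three-level one in the figure. The Haar filters, the square image whose side is divisible by 2**levels, and the function names are assumptions of this sketch.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_level(v):
    """One Haar 1D-FWT level; returns [low band | high band]."""
    half = len(v) // 2
    low = [INV_SQRT2 * (v[2 * i] + v[2 * i + 1]) for i in range(half)]
    high = [INV_SQRT2 * (v[2 * i] - v[2 * i + 1]) for i in range(half)]
    return low + high


def fwt_2d(image, levels):
    """Multilevel 2D-FWT of a square image (list of rows).

    Each level transforms every row, then every column, of the current
    top-left n x n low-low quadrant, then halves n."""
    out = [list(row) for row in image]  # leave the input untouched
    n = len(out)
    if any(len(row) != n for row in out) or n % (2 ** levels):
        raise ValueError("image must be square with side divisible by 2**levels")
    for _ in range(levels):
        for r in range(n):  # 1D-FWT on each row
            out[r][:n] = haar_level(out[r][:n])
        for c in range(n):  # 1D-FWT on each column
            col = haar_level([out[r][c] for r in range(n)])
            for r in range(n):
                out[r][c] = col[r]
        n //= 2
    return out
```

For example, `fwt_2d([[1.0] * 4 for _ in range(4)], 2)` moves all the energy of a constant 4x4 image into the top-left coefficient (4.0, up to rounding) and leaves every other coefficient at zero.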

The Wavelet Transform: 3D-FWT with tiling
• Generalizes the 1D-FWT to a video sequence (3D)
  1. N rows × N columns calls to the 1D-FWT across the frames (time dimension)
  2. Each of the N frames calls the 2D-FWT with tiling
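The two steps above can be sketched for a single level as follows. The Haar filters, the function names, and the `tile` parameter are illustrative assumptions. Here tiling is shown in one common form: the column pass handles a block of `tile` adjacent columns per sweep down the rows, which in a compiled implementation keeps each row segment contiguous in cache. In Python it only demonstrates the loop structure, not a speed-up.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_level(v):
    """One Haar 1D-FWT level; returns [low band | high band]."""
    half = len(v) // 2
    return ([S * (v[2 * i] + v[2 * i + 1]) for i in range(half)] +
            [S * (v[2 * i] - v[2 * i + 1]) for i in range(half)])


def fwt_2d_tiled(frame, tile):
    """One 2D level: all rows, then columns processed `tile` at a time."""
    n_rows, n_cols = len(frame), len(frame[0])
    out = [haar_level(row) for row in frame]  # row pass
    half = n_rows // 2
    for c0 in range(0, n_cols, tile):
        cols = range(c0, min(c0 + tile, n_cols))
        low = {c: [0.0] * half for c in cols}
        high = {c: [0.0] * half for c in cols}
        # One sweep down the rows serves every column of the tile.
        for i in range(half):
            a, b = out[2 * i], out[2 * i + 1]
            for c in cols:
                low[c][i] = S * (a[c] + b[c])
                high[c][i] = S * (a[c] - b[c])
        for c in cols:  # write the tile back: low band on top
            for i in range(half):
                out[i][c] = low[c][i]
                out[half + i][c] = high[c][i]
    return out


def fwt_3d_level(video, tile=32):
    """video[t][r][c]: temporal 1D-FWT per pixel, then 2D-FWT per frame."""
    n_t, n_r, n_c = len(video), len(video[0]), len(video[0][0])
    out = [[[0.0] * n_c for _ in range(n_r)] for _ in range(n_t)]
    # Step 1: n_r * n_c calls to the 1D-FWT along the time dimension.
    for r in range(n_r):
        for c in range(n_c):
            series = haar_level([video[t][r][c] for t in range(n_t)])
            for t in range(n_t):
                out[t][r][c] = series[t]
    # Step 2: each of the n_t frames calls the tiled 2D-FWT.
    return [fwt_2d_tiled(frame, tile) for frame in out]
```

The tile width changes only the traversal order, not the arithmetic, so any `tile` value yields identical coefficients.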

The Wavelet Transform: 3D-FWT on CUDA and OpenCL
• Our 3D-FWT implementation in CUDA and OpenCL consists of the following three steps:
  1. The host (CPU) allocates the first four video frames in memory
  2. The first four frames are transferred from host to device
     – The 1D-FWT is then applied to these four frames over the time dimension
  3. The 2D-FWT is applied to the detail and reference frames, and the results are sent back to the CPU

The Wavelet Transform: 3D-FWT on CUDA and OpenCL
• Two more frames are read (interleaved) to complete each new step
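The frame streaming described in these steps can be mimicked on the host side with a sliding window: start from four frames, then read two more for each new step. This sketch assumes a 4-tap Daubechies (D4) temporal filter, which is consistent with a four-frame window advanced two frames at a time, but the filter choice is an assumption, boundary handling is omitted, and the host-to-device transfer and the 2D-FWT of each output pair are left out. The name `temporal_stream` is hypothetical.

```python
import math
from collections import deque

# Assumed 4-tap Daubechies (D4) filter pair; not confirmed by the slides.
_R3, _D = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
H = [(1 + _R3) / _D, (3 + _R3) / _D, (3 - _R3) / _D, (1 - _R3) / _D]  # low-pass
G = [H[3], -H[2], H[1], -H[0]]  # high-pass (quadrature mirror of H)


def temporal_stream(frames):
    """Yield (reference, detail) frames along the time dimension.

    `frames` is any iterable of equally sized frames (flat lists of pixels).
    The first step uses four frames; every further step reads two more.
    Frames are not extended at the boundary, so this yields one step fewer
    than a periodically extended transform would.
    """
    window = deque(maxlen=4)  # the oldest frame drops out automatically
    for n, frame in enumerate(frames, start=1):
        window.append(frame)
        if n >= 4 and n % 2 == 0:
            ref = [sum(h * f[p] for h, f in zip(H, window)) for p in range(len(frame))]
            det = [sum(g * f[p] for g, f in zip(G, window)) for p in range(len(frame))]
            # In the GPU version, the 2D-FWT would now be applied to ref and det.
            yield ref, det
```

For eight input frames the generator yields three (reference, detail) pairs. Constant or linearly varying frames produce an all-zero detail frame, since the D4 high-pass filter has two vanishing moments.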

Optimization techniques for 3D-FWT on a single GPU system
The method consists mainly of three stages:
1. Automatically detect the GPU available in the system (Nvidia or ATI)
2. Depending on the GPU, run the CUDA or the OpenCL version of the 3D-FWT
3. Automatically select the key parameter: the block size (CUDA) or the work-group size (OpenCL)
   • The block size is based on the CUDA occupancy calculator:
     1. Select the block size that maximizes the occupancy of each multiprocessor
     2. If two or more values give the same occupancy, select the one with the maximum number of active thread blocks per multiprocessor
   • The work-group size is set to the value of CL_DEVICE_MAX_WORK_GROUP_SIZE
   • The remaining parameters (grid size, shared memory usage, etc.) are also calculated automatically
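The block-size rule above can be sketched as a simplified occupancy calculation. The per-multiprocessor limits (chosen to resemble a Fermi-class GPU) and the kernel's register and shared-memory usage are hypothetical example values, and the real CUDA occupancy calculator also accounts for allocation granularity, which this sketch ignores. On the OpenCL side, the corresponding value would be queried with `clGetDeviceInfo` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class DeviceLimits:
    """Per-multiprocessor limits (hypothetical values; query the real device)."""
    max_threads_per_sm: int
    max_blocks_per_sm: int
    regs_per_sm: int
    smem_per_sm: int  # bytes
    warp_size: int = 32


@dataclass
class KernelUsage:
    """Resources used by the 3D-FWT kernel (hypothetical values)."""
    regs_per_thread: int
    smem_per_block: int  # bytes, assumed independent of the block size


def active_blocks(dev: DeviceLimits, k: KernelUsage, block: int) -> int:
    """Resident blocks per multiprocessor, ignoring allocation granularity."""
    threads = math.ceil(block / dev.warp_size) * dev.warp_size  # whole warps
    limits = [dev.max_blocks_per_sm, dev.max_threads_per_sm // threads]
    if k.regs_per_thread > 0:
        limits.append(dev.regs_per_sm // (k.regs_per_thread * threads))
    if k.smem_per_block > 0:
        limits.append(dev.smem_per_sm // k.smem_per_block)
    return min(limits)


def occupancy(dev: DeviceLimits, k: KernelUsage, block: int) -> Fraction:
    """Active warps divided by the maximum warps per multiprocessor."""
    warps = active_blocks(dev, k, block) * math.ceil(block / dev.warp_size)
    return Fraction(warps, dev.max_threads_per_sm // dev.warp_size)


def select_block_size(dev, k, candidates=(64, 128, 192, 256)) -> int:
    """Maximize occupancy first; break ties by the most active blocks per SM."""
    return max(candidates, key=lambda b: (occupancy(dev, k, b), active_blocks(dev, k, b)))


if __name__ == "__main__":
    fermi_like = DeviceLimits(max_threads_per_sm=1536, max_blocks_per_sm=8,
                              regs_per_sm=32768, smem_per_sm=49152)
    kernel = KernelUsage(regs_per_thread=20, smem_per_block=4096)
    print(select_block_size(fermi_like, kernel))  # 192 with these example values
```

With these example values, block sizes 192 and 256 both reach full occupancy (48 active warps), and the tie-break on active blocks (8 versus 6) selects 192. Occupancy is kept as a `Fraction` so that ties are detected exactly rather than through floating-point comparison.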

Optimization techniques for 3D-FWT on a single GPU system
Experiments with 3D-FWT parameters for 3 GPUs

Execution times, each run on 64 frames, for different block / work-group sizes:

512x512 frames
Block size       64        128       192       256
Tesla C870       58.68     56.28     53.51     58.68
Tesla C2050      35.33     53.17     32.13     33.59
FirePro V5800    130.06    135.87    131.29    114.87

1024x1024 frames
Block size       64        128       192       256
Tesla C870       225.74    214.36    209.01    217.21
Tesla C2050      122.12    115.02    110.88    113.32
FirePro V5800    452.95    346.29    313.35    307.54

2048x2048 frames
Block size       64        128       192       256
Tesla C870       889.83    841.47    840.14    850.26
Tesla C2050      467.50    438.46    427.69    433.84
FirePro V5800    2123.60   1496.27   1284.56   1217.59

• The optimization engine studies the problem for different block or work-group sizes
• It selects 192 for the Tesla C870 and the Tesla C2050 (Fermi), which is optimal
• It selects 256 for the ATI FirePro V5800, which is optimal
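The engine's choices can be cross-checked against the measured times with a short script that picks the fastest block or work-group size for each GPU and frame size. The numbers are copied from the execution-time table for 64 frames; the function names are hypothetical.

```python
BLOCK_SIZES = (64, 128, 192, 256)

# Execution times for 64 frames: GPU -> frame size -> times per block size.
TIMES = {
    "Tesla C870": {
        "512x512": (58.68, 56.28, 53.51, 58.68),
        "1024x1024": (225.74, 214.36, 209.01, 217.21),
        "2048x2048": (889.83, 841.47, 840.14, 850.26),
    },
    "Tesla C2050": {
        "512x512": (35.33, 53.17, 32.13, 33.59),
        "1024x1024": (122.12, 115.02, 110.88, 113.32),
        "2048x2048": (467.50, 438.46, 427.69, 433.84),
    },
    "FirePro V5800": {
        "512x512": (130.06, 135.87, 131.29, 114.87),
        "1024x1024": (452.95, 346.29, 313.35, 307.54),
        "2048x2048": (2123.60, 1496.27, 1284.56, 1217.59),
    },
}


def best_block_size(times):
    """Block size whose measured execution time is the smallest."""
    return BLOCK_SIZES[min(range(len(BLOCK_SIZES)), key=lambda i: times[i])]


def best_per_gpu():
    """Fastest block size for every GPU and frame size."""
    return {gpu: {size: best_block_size(t) for size, t in sizes.items()}
            for gpu, sizes in TIMES.items()}


if __name__ == "__main__":
    for gpu, choice in best_per_gpu().items():
        print(gpu, choice)
```

For every frame size the script returns 192 for both Tesla cards and 256 for the FirePro V5800, which agrees with the selections listed on the slide.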
