An Exploration of General Purpose Programming on GPUs
MSc Dissertation
Michael Fergus McCann
A thesis submitted in part fulfilment of the degree of MSc Advanced Software Engineering in Computer Science under the supervision of Dr. Neil Hurley.
Abstract
The recent emergence of simplified models for general purpose GPU programming has led to an explosion in the popularity of GPU accelerated application development. An overview of this phenomenon is presented along with a GPU based parallel implementation of the well known RANMAR pseudo random number generator. The parallel RANMAR implementation is shown to exhibit up to a 5-fold speed up over the sequential version on one system tested. Exhaustive statistical tests were run on the numbers produced, and a potential weakness in at least two common implementations of the double precision RANMAR is discussed. The implementation is integrated with Corsika (the widely used, FORTRAN based, high-energy cosmic radiation interaction simulation software). Finally, the suitability of generating random numbers using GPU hardware is discussed.
I would like to thank Dr. John Quinn of the UCD School of Physics for providing the real world problem that an investigation such as this requires. I would also like to thank him for providing a GPU enabled hardware environment and an invaluable explanation of the physics behind the simulations that this thesis aimed to (GPU) accelerate. I would also like to thank Dr. Neil Hurley, my project supervisor, for his advice and encouragement for the duration of this project.
Table of Contents
1 Introduction
  1.1 GPGPU Background
  1.2 Project Outline and Goals
2 The Problem Domain
  2.1 The Role of Monte-Carlo Simulations in Cosmic Ray and Gamma Ray Astronomy
    2.1.1 Introduction
    2.1.2 Extensive Air Showers and TeV Gamma-ray Astronomy
  2.2 UCD High Energy Astrophysics Group and VERITAS
  2.3 CORSIKA
  2.4 The RANMAR Pseudo Random Number Generator
    2.4.1 Extension to Double Precision
3 GPU Programming and CUDA
  3.1 Background
  3.2 The CUDA Model
    3.2.1 The CUDA C Coding and Compilation Model
    3.2.2 The CUDA Task and Data Parallelisation Model
    3.2.3 The CUDA Memory Model
    3.2.4 The CUDA Program Flow Model
4 Parallel RANMAR Using a CUDA GPU
  4.1 Parallel RANMAR Design
    4.1.1 RANMAR Parallelisation within a Single Sequence (Leapfrog)
    4.1.2 RANMAR Parallelisation Using Multiple Independent Sequences
  4.2 Comparison of Design with Existing Schemes
  4.3 Implementation Phase
    4.3.1 Corsika Imposed Constraints and Features
    4.3.2 Other High Level Features
    4.3.3 Implementation Details and Challenges
  4.4 Verification of RANMAR Correctness
  4.5 Integration with Corsika
5 Statistical Validation of Generator Output
  5.1 Validation Approach
  5.2 Validation Results
    5.2.1 Sanity Validation of Sequential RANMAR Provided by TestU01
    5.2.2 Sanity Validation of Parallel RANMAR with 1 Instance
    5.2.3 Validation of Parallel RANMAR with 8 Instances
  5.3 Proposed Extension to Current Double Precision RANMAR
    5.3.1 Validation of Extended Parallel RANMAR with 1 Instance
    5.3.2 Validation of Extended Parallel RANMAR with 8 Instances
6 Performance Results
  6.1 Introduction
  6.2 Standalone Testing
    6.2.1 Analysis
  6.3 Corsika Testing
7 Conclusion and Future Work
  7.1 Project Recap
  7.2 Conclusions
  7.3 Future Work
8 References
9 APPENDIX A – Specification of Test Systems
  9.1 Test System 1
    9.1.1 CPU Specification (cat /proc/cpuinfo)
    9.1.2 Operating System (cat /etc/*release)
    9.1.3 GPU Driver (cat /proc/driver/nvidia/version)
    9.1.4 GPU Specification
  9.2 Test System 2
    9.2.1 CPU Specification (cat /proc/cpuinfo)
    9.2.2 Operating System (cat /etc/*release)
    9.2.3 GPU Driver (cat /proc/driver/nvidia/version)
    9.2.4 GPU Specification
1 Introduction
1.1 GPGPU Background
General purpose programming on GPUs began around 2001 with the arrival of the first GPUs with programmable graphics pipelines¹. Researchers were quick to spot the potential of using the raw computational power (figure 1) of the GPU in solving non-graphics problems, but the original programming models were unwieldy. In essence, early GPGPU² programming required the recasting of problems in terms of GPU rendering pipeline stages, such as vertex or fragment shading. This was not only inconvenient but it meant that the set of problems that could be practically solved using GPUs was greatly limited. This in turn meant that for several years, GPGPU programming remained the preserve of determined researchers only.

This picture has been rapidly changing due to a number of parallel developments. Firstly, in recent years simpler GPU programming models have emerged, including Close To Metal (2006), CUDA (2007) and OpenCL (2008). These models allow programmers to concentrate on the problem domain under consideration by treating the GPU as a general purpose parallel SIMD processor and by using familiar high level language bindings such as C or Python. Secondly, over the same period, GPU hardware has advanced broadly in line with Moore's law [3] [18], so much so that it is now possible to buy reasonably priced consumer grade GPU hardware providing teraflop range (peak) performance (e.g.: NVIDIA GeForce GTX 480, Radeon HD 5870).

Figure 1. An NVIDIA illustration [25] of the potential computational power of the GPU, showing the relative ALU densities of CPUs and GPUs
¹ NVIDIA's GeForce 3 in 2001 and ATI's Radeon 9700 in 2002
² General Purpose Programming on Graphics Processing Units (GPGPU)
Thirdly, there has been an ongoing computing technology trend away from sequential programming environments towards multi- and many-core computer systems [7], [8], [9] and [10]. This has further driven the uptake of general purpose GPU usage. The shift away from single core commodity systems began around 2003, when it became harder to fabricate processors that worked reliably at clock speeds in excess of 3.5GHz. Power dissipation also became an issue, and ultimately manufacturers decided that the marginal gains achieved through hard won clock speed increases were not cost efficient. As others soon after observed, "The major chip manufacturers have, for the time being, simply given up trying to make processors run faster" [10]. This marked the beginning of what has been described as two separate microprocessor design trajectories: the multi-core and many-core trajectories [9]. The multi-core trajectory, aimed at optimising sequential programs, involved casting multiple processor cores on the same chip, each one capable of running its own instruction stream. The many-core systems (such as GPUs) aimed at optimising parallel programs by providing a far greater number of smaller cores, each capable of running only one instruction stream. These systems were also designed to specialise in data parallel, compute intensive applications, and so dedicated more of their on-chip transistors to computation and rather less to flow control and caching, as depicted in figure 1.

In summary, the simplified programming models, the performance characteristics and the highly parallel nature of many-core GPU hardware have been driving the increased prevalence of GPU programming in recent years. While it cannot be said that GPU programming is an area familiar to most working software engineers, the current abundance of published material related to general purpose programming on GPUs (e.g.: [3], [4], [5], [16] and the series starting at [6]) as well as the wide application of GPUs to non-graphics problem domains [15] suggest that GPU programming is ready to become mainstream.
1.2 Project Outline and Goals
Primarily, this is a project about general purpose programming on GPUs. Reports abound of orders-of-magnitude speed ups being achieved by porting sequential algorithms to commodity parallel GPU hardware (for some examples see [17]). Motivated by such reports, this project aimed to investigate a real world problem to see if GPU acceleration could provide a cost effective speedup to a particularly lengthy computation. Specifically, a UCD School of Physics research group performing Monte-Carlo simulations of cosmic radiation interactions with the Earth's atmosphere (using Corsika [21] and [22]) found that up to 80% of the simulation time was spent generating random numbers. This project implemented a GPU accelerated version of their pseudo random number generator – the RANMAR, as described by Marsaglia in [12] and, in a slightly modified form, by James in [13]. The RANMAR implementation presented here is original in the sense that no reference could be found to any existing implementation for CUDA (NVIDIA's programmable GPU architecture), and while an implementation for ATI graphics cards was found ([14] and [23]), the approach described here is more sophisticated. The adaptation of the algorithm
for parallel processing and the development process, including all attendant challenges, are presented along with a statistical validation of the random number sequences produced. Next, a performance comparison between the new GPU implementation and a sequential, CPU based implementation is described in detail. Finally, and for further comparison purposes, the statistical validation process is applied to the existing double precision RANMAR algorithm found in the Corsika simulation package. Some suspicions about the approach taken in Corsika (and also in the CERN program library) are presented. Based on the research and implementation experiences, conclusions are drawn about the relative accessibility and applicability of using graphics processors for solving non-graphics problems.
2 The Problem Domain
While the introductory paragraphs have already made clear that the implementation aspects of this thesis involve GPU accelerated random number generation, it is important to understand the motivations behind the choice of problem. In a way, this project began life as a solution looking for a problem; that is, it began with an awareness of the potential power of GPUs and a desire to solve a real world problem. The high energy astrophysics research group in the UCD School of Physics provided such a problem and, at the same time, a collaboration opportunity. In order to fully understand this project's motivations, the following paragraphs explain the motivations of the astrophysics research group and their desire for any possible acceleration of their cosmic radiation Monte-Carlo simulations.
2.1 The Role of Monte-Carlo Simulations in Cosmic Ray and Gamma Ray Astronomy
2.1.1 Introduction

The Earth is continually bombarded by high-energy particles (known as cosmic rays) and photons (gamma rays), with energies orders of magnitude greater than those of CERN's LHC³ experiments. The quest to determine the sources of this cosmic radiation is almost 100 years old, dating from its discovery by Victor Hess in 1912. While the Earth's atmosphere provides shielding which protects the surface from this harmful radiation, it also provides an indirect mechanism to detect and study the very high energy cosmic rays and gamma rays: the secondary showers of relativistic particles ("extensive air showers") that are created when the high energy cosmic radiation interacts with the Earth's atmosphere can themselves be detected and studied. In the last two decades a new branch of astronomy has emerged, that of TeV⁴ gamma ray astronomy – this research field involves the detection
³ Large Hadron Collider – interested readers can get an overview of its history and the kinds of experiments performed at CERN at http://en.wikipedia.org/wiki/Large_Hadron_Collider
⁴ TeV = tera-electronvolt
of the secondary particle showers in the atmosphere, and the development of techniques to discriminate gamma-ray induced showers from cosmic ray induced showers. There are currently three state of the art TeV observatories in operation (VERITAS in Arizona, MAGIC in La Palma and HESS in Namibia). Between them these observatories have detected over 110 astronomical sources of TeV gamma rays, and Monte Carlo simulations of extensive air showers are critical to their success.
2.1.2 Extensive Air Showers and TeV Gamma-ray Astronomy

When a gamma ray of sufficiently high energy (≳1 GeV, about 1,000,000,000 times more energetic than a visible photon) interacts with the atmosphere it induces an electromagnetic cascade, that is, a shower of hundreds or thousands of electron-positron pairs and gamma rays. The development of such a shower of particles in the atmosphere is determined by the physics of pair production (where gamma rays interact with matter to transform into electron-positron pairs) and Bremsstrahlung (where energetic particles interact with matter to emit high-energy photons) and is a statistical process. Cosmic ray particles also induce air showers, but these are quite different in structure due to the different physical mechanisms involved. Even though the shower of particles dies out before it reaches the ground, an air shower can be detected via the Cherenkov radiation [20] that is emitted as the particles travel relativistically through the atmosphere. The early attempts at detecting gamma rays from these Cherenkov flashes of light from air showers proved very difficult, as the small gamma-ray signal is overwhelmed by the background of cosmic rays.

The key breakthrough in the field of TeV gamma ray astronomy came with the development of 'imaging', where a photomultiplier tube (PMT) camera is used to record an image of the shower development in the atmosphere. The images are analysed off-line to produce a set of parameters (e.g.: length, width etc.) that characterise them. Monte-Carlo simulations of thousands of gamma ray and cosmic ray induced air showers are used to derive 'cuts', selection criteria that distinguish between gamma-ray and cosmic ray induced showers. This technique was used for the first detection of a TeV gamma ray source, the Crab Nebula [19], and since then the field has flourished. Modern TeV observatories use arrays of imaging telescopes, and as no calibrated source of TeV gamma rays exists in nature, all information derived about the gamma-ray emission from astronomical sources is totally dependent on Monte-Carlo simulations of:
- The air showers
- Propagation of Cherenkov photons through the atmosphere
- Reflection off the mirrors and detection by the photomultiplier tubes

The response of the trigger and digitisation systems must also be included in these simulations.
2.2 UCD High Energy Astrophysics Group and VERITAS
The Very Energetic Radiation Imaging Telescope Array System (VERITAS) is an array of four 12 metre diameter imaging atmospheric Cherenkov telescopes for TeV gamma-ray astronomy. It is located in southern Arizona, and has been in full scientific operation since autumn 2007. The high energy astrophysics group at UCD is a member of the VERITAS collaboration, which has 94 members from 24 institutions in the USA,
Canada, the UK and Ireland. Each telescope contains a 499-pixel photomultiplier-tube camera, and a three-level trigger system is used to determine whether events should be recorded. Images from the PMT cameras are digitised by a 500 MSPS flash ADC system and analysed off-line. Within the collaboration, the Monte Carlo Simulation Working Group provide simulations of the response of VERITAS to gamma-ray and cosmic ray induced air showers, which are vital to understanding the performance of the array and producing scientific results. Monte Carlo simulations of air showers and instrument response are performed on clusters of computers and take months to run; therefore, the implementation of mechanisms to speed up this process is highly desired by the collaboration, because instrumental or atmospheric changes (e.g.: the atmospheric transmission in Arizona in 2008 changed dramatically due to forest fires) require new Monte Carlo simulations to be generated, which holds up image analysis.
2.3 CORSIKA
The first step in identifying ways to potentially speed up simulations was to look at the simulation software. Corsika [21], [22] is the software package used by the VERITAS collaboration to perform their simulations. It is a very widely used, open source, mainly FORTRAN based, cosmic radiation interaction simulation package. By profiling simulation runs, the UCD astrophysics group have determined that the random number generation aspect of the Monte-Carlo simulations tends to dominate the run-time. Therefore, RANMAR [12], the random number generator used by Corsika, was the chosen area for potential GPU acceleration.
2.4 The RANMAR Pseudo Random Number Generator
The RANMAR pseudo random number generator of Marsaglia, Zaman and Tsang [12] consists of a combination of a lagged Fibonacci generator (LFG) and a simple arithmetic sequence. The general form of an LFG, describing how to generate the $r$th element of the random number sequence for a given binary operation $\odot$ and lags $i$ and $j$, is given by:

$x_r = x_{r-i} \odot x_{r-j}$

For RANMAR, as originally described, we have the following specific LFG, with $i = 97$ and $j = 33$:

$x_r = x_{r-97} - x_{r-33}$ (adding 1 if the result is negative)

When used with 24 bit fractions, this sequence is equivalent in period and structure to an integer LFG with lags $i$ and $j$ with the operation subtraction modulo $2^{24}$. This gives a theoretical maximum period of $(2^{24} - 1) \cdot 2^{96}$. Marsaglia et al point out that this LFG, assuming adequate assignment of the initial 97 seed values, provides a pseudo random number sequence that is almost good enough (i.e.: has a long period and appears sufficiently random) but that it fails the "birthday spacings test". For this reason, the
RANMAR combines the LFG with a simple arithmetic sequence for the prime modulus $16777213 = 2^{24} - 3$. The sequence is defined as follows. For a sequence with a current value $c_n$, the next value in the sequence $c_{n+1}$ is given by:

$c_{n+1} = c_n - d$ (adding $16777213/16777216$ if the result is negative)

where $c_1 = 362436/16777216$ and $d = 7654321/16777216$.

The combination of the LFG and the arithmetic sequence is then performed such that, for an LFG sequence $x_1, x_2, x_3, \ldots$ and an arithmetic sequence $c_1, c_2, c_3, \ldots$, we produce the final (uniformly distributed) random number sequence $U_1, U_2, U_3, \ldots$ where:

$U_n = x_n - c_n$ (adding 1 if the result is negative)

The assignment of the initial 97 values for the LFG is of paramount importance and, rather than impose that responsibility on the end user, Marsaglia et al provide an algorithm to generate these values from just 4 user provided seed values; James [13], while working for CERN, modified it very slightly, reducing the number of required seeds to 2. In an effort to remain machine portable, the algorithm generates these values on a bit by bit basis, up to the supported 24 bits. James further draws attention to a very useful feature of the RANMAR, which is the ease with which a user can generate multiple independent pseudo random number sequences. Considering his system of providing two seeds, there are 31329 possibilities for the first seed and 30082 possibilities for the second, with each combination (over 942 million in total) giving rise to an independent, non-overlapping sequence with an average period of $10^{30}$.

2.4.1 Extension to Double Precision

The RANMAR as just described generates floating point numbers of single precision, that is, numbers using only a 24 bit mantissa (any other available mantissa bits remain zero). This is because the initial values in the LFG and the elements of the arithmetic sequence are all initialised with 24 bits, and consequently all additive and subtractive combinations of these values as per the RANMAR algorithm will still only have 24 bits. The implementation used in Corsika (and in its previous incarnation written by James as part of the CERN program library [24]) generates double precision floating point numbers, using a 48 bit mantissa. The change to support this was trivial, simply requiring the initialisation routine to generate 48 bit fractions for the initial 97 LFG values instead of 24 bit fractions. Given that the LFG initialisation algorithm generates each bit value individually, all that was required was to loop 48 times instead of 24 times.

It is noteworthy that both the CERN program library implementation and the Corsika implementation decided that converting the LFG component of RANMAR to use 48 bits was sufficient. The arithmetic sequence remained 24 bit. This means that the lower order 24 bits of elements of the LFG always remain unperturbed when the LFG is combined with the arithmetic sequence. Given the concerns expressed by Marsaglia et al about the randomness of the LFG on its own, it is possibly of concern that at least two widely used 48 bit RANMAR implementations only apply the arithmetic sequence to the high order 24 bits of the LFG. Further investigations into this aspect of the double precision implementation are presented later.
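To make the preceding definitions concrete, the following is a minimal C sketch of one step of the single precision generator as described by James [13]; the function and variable names are illustrative and are not taken from the thesis code. It assumes the 97 seed values u[0..96] have already been initialised and that i97, j97 and c hold the generator state (initially 96, 32 and 362436/16777216 in this 0-based form).

    /* One step of the single precision RANMAR; a sketch based on [13]. */
    float ranmar_next(float u[97], int* i97, int* j97, float* c)
    {
        float uni = u[*i97] - u[*j97];              /* lagged Fibonacci step */
        if (uni < 0.0f) uni += 1.0f;
        u[*i97] = uni;                              /* circular buffer update */
        if (--(*i97) < 0) *i97 = 96;
        if (--(*j97) < 0) *j97 = 96;
        *c -= 7654321.0f / 16777216.0f;             /* arithmetic sequence step */
        if (*c < 0.0f) *c += 16777213.0f / 16777216.0f;
        uni -= *c;                                  /* combine the two sequences */
        if (uni < 0.0f) uni += 1.0f;
        return uni;
    }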
3 GPU Programming and CUDA
3.1 Background
As discussed in the introduction, GPU programming has been greatly simplified since the introduction of programming models that present the GPU as a general purpose SIMD⁵ like processor with a rich API. While some GPU software architectures have come and gone, at the time of writing there are really only two options available: CUDA and OpenCL. CUDA is the NVIDIA proprietary model that works with NVIDIA GPUs only⁶. OpenCL is an open GPU programming framework originally developed by Apple but currently belonging to the collaborative standards organisation, the Khronos Group. Despite OpenCL being standardised and supported by both NVIDIA and AMD/ATI GPU drivers, it has not yet achieved the same popularity as CUDA. This is primarily because CUDA has been around longer and is therefore mature, stable and trusted, especially among the HPC community. For these reasons, and because of the ready availability of documentation and an internet support ecosystem, this project chose to implement the RANMAR using CUDA.
⁵ Although the CUDA model is frequently described as SIMD, it is in reality SPMD, which allows different instructions from a single instruction stream to be executed concurrently on different processing units.
⁶ CUDA has recently added support for x86 CPUs, exploiting available multi-cores or SSE. It will still not support AMD/ATI GPUs however.
3.2 The CUDA Model
Figure 2. The CUDA Architecture (taken from [25])

The CUDA architecture supports the development of GPU applications using a choice of languages or APIs. CUDA C, which is based on C (with some small extensions), is of most interest in this project as it is used in the RANMAR implementation, but Fortran would also have been possible, as would use of OpenCL (also based on C). The CUDA programming model can be understood at a high level by considering the areas outlined in the following sections.

3.2.1 The CUDA C Coding and Compilation Model

A CUDA C program utilising a GPU will consist of C code to be run on the CPU and C code to be run on the GPU. The code for both can reside in the same translation unit or they can be separated according to developer preference. The CPU (host) code provides the program entry point and overall program logic, while the GPU (device) code consists of one or many "kernels", which are C functions that can be run on the GPU. The overall program execution must begin and end on the CPU, and the CPU code is responsible for making calls to the GPU kernels. Kernels may call other kernels, but they cannot make calls back to the CPU.

The extensions to the C language to enable this model are minimal. A device kernel is differentiated from a normal host function by adding a simple prefix ("__global__") to its signature. Host code calls kernel functions using a new triple-chevron notation that specifies the kernel name and the number of kernel instances that are to be concurrently run (see the sample code in figure 3). When ready to compile, the CUDA C compiler separates the device code from the host code and generates either architecture specific binary files (cubin format) or architecture independent PTX assembler code that can be run on the GPU. The compiler also inserts kernel calls into the host code, binding the two code domains. Once the host and device code are separated, the host code is compiled with the normal native C/C++ compiler and linked against the CUDA runtime library.
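As a concrete illustration of this compilation model, typical nvcc invocations might look like the following (the flags shown are illustrative; sm_13 is assumed here as it was the first architecture to support double precision):

    # compile a mixed host/device source file; nvcc splits out the device
    # code and hands the host portion to the native C/C++ compiler
    nvcc -arch=sm_13 -o ranmar_demo ranmar_demo.cu

    # alternatively, emit architecture independent PTX for the device code only
    nvcc -arch=sm_13 -ptx ranmar_demo.cu -o ranmar_demo.ptx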
An Exploration of General Purpose Programming on GPUs Michael Fergus McCann 15 of 47 15 April 2011
3.2.2 The CUDA Task and Data Parallelisation Model

As can be seen from the sample code in figure 3, the kernel invocation call specifies the number of threads to run. In that example, 100 threads were specified (while the other argument specifies that there should be 1 thread block). A form of task parallelisation is achieved by specifying that a kernel be run in multiple blocks, but the tasks in each block must be independent because there are no synchronisation primitives available across blocks. For each block, the number of required threads is specified, and synchronisation mechanisms are available between threads in the block. This allows for the implementation of solutions to typical data parallel problems.

Kernel execution instances have access to their individual thread index via the built-in variable threadIdx. They also have access to the index of the thread block in which they are running via the built-in blockIdx variable. This allows kernel threads, in classic data parallel fashion, to access the specific area of a problem for which they have responsibility. When the RANMAR implementation is explained in the next chapter, multiple blocks and multiple threads are both used in order to maximise GPU utilisation and, in turn, RANMAR performance.
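For example, the canonical idiom for deriving a per-thread global index combines these built-in variables with blockDim (the built-in giving the number of threads per block); a minimal sketch:

    // Sketch: each thread computes a unique global index into the data.
    __global__ void perElement(float* data)
    {
        int gid = blockIdx.x * blockDim.x + threadIdx.x;  // unique across all blocks
        data[gid] = 2.0f * data[gid];                     // illustrative per-element work
    }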
3.2.3 The CUDA Memory Model

One of the largest performance related GPU programming factors is the pattern and type of memory usage. The following briefly explains the CUDA memory hierarchy, because an understanding of this is of importance in the RANMAR implementation.

3.2.3.1 CPU RAM

In the context of a CUDA program, CPU RAM is available to the GPU in one of two ways:

- Data residing in CPU RAM is explicitly copied to GPU global memory by host code before making kernel calls. This is the most common pattern.
- Host code can explicitly allocate page-locked CPU memory which can then be accessed from the GPU using DMA.
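A brief sketch of the second pattern using CUDA runtime calls (the surrounding names are illustrative, and a device supporting mapped host memory is assumed):

    #include <cuda_runtime.h>

    /* Allocate page-locked host memory and obtain a device-side alias so
       that kernels can access it directly over DMA; a minimal sketch. */
    float* allocMappedBuffer(size_t bytes, float** devView)
    {
        cudaSetDeviceFlags(cudaDeviceMapHost);  /* enable mapping (before first CUDA call) */
        float* pinned;
        cudaHostAlloc((void**)&pinned, bytes, cudaHostAllocMapped);
        cudaHostGetDevicePointer((void**)devView, pinned, 0);
        return pinned;                          /* release later with cudaFreeHost() */
    }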
Figure 3. Sample code showing a kernel that adds 2 vectors and its invocation:

    // Kernel definition
    __global__ void myKernel(double* x, double* y, double* z)
    {
        int tid = threadIdx.x;
        z[tid] = x[tid] + y[tid];
    }

    // Usual CPU entry point
    int main()
    {
        ...
        // Execute myKernel on GPU with 100 threads
        myKernel<<<1, 100>>>(a, b, c);
    }
3.2.3.2 GPU Global memory

This is the largest area of NVIDIA GPU memory - it is typically implemented with off-chip DRAM and it is the slowest GPU memory (hundreds of clock cycles). Any copy issued from host code will copy data into this area of memory. Global memory is readable and writable by all thread blocks and threads running on the GPU. Data in global memory persists across kernel calls and so can be used to maintain state with a lifetime exceeding a kernel execution.

3.2.3.3 GPU Local memory

Local memory is somewhat of a misnomer as it is actually global memory. Device code with automatic array variables (called local memory) will actually use the same DRAM global memory as per section 3.2.3.2, except that the scope of the variable is thread local and the lifetime is that of the kernel invocation.

3.2.3.4 GPU Shared memory

Shared memory is low latency on-chip memory that is shared among all threads in the same thread block. It therefore offers a performance viable mechanism for collaboration between threads in a thread block. Variables in shared memory persist only for the duration of the kernel execution and, as already explained, have thread block scope.

3.2.3.5 GPU registers

GPU register memory is also low latency on-chip memory. It has thread scope and it persists for the duration of the kernel only. All automatic scalar variables in kernels are placed in registers by default.

3.2.4 The CUDA Program Flow Model

While CUDA programming may vary considerably according to problem complexity, the most common program flow is as follows:
1. Launch the host program.
2. Host code allocates memory on the GPU device using CUDA allocation primitives.
3. Host code copies any data required by the GPU from CPU RAM to GPU global memory (although, as already discussed, DMA from the GPU is also possible).
4. The host invokes a kernel, specifying the number of thread blocks and the number of threads in each block.
5. The device kernel executes, according to its algorithm, potentially using shared memory, constant memory or texture memory⁷.
6. The device kernel makes its results available in GPU global memory before returning control back to the CPU.
7. The host code copies result data from GPU global memory to CPU RAM.

⁷ Constant and texture memory are other off-chip memory options designed for specialised access patterns. For a detailed explanation of these and all CUDA related issues, refer to [1] and [2].
The GPU RANMAR implementation presented in this thesis follows this standard program flow almost exactly.
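As a minimal sketch of that flow (illustrative only: error checking is omitted and the kernel body is a stand-in, not the RANMAR kernel):

    #include <stdlib.h>

    // Step 5: per-thread work (a placeholder kernel)
    __global__ void scaleKernel(float* data)
    {
        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        data[gid] *= 0.5f;
    }

    int main()                                            // step 1
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float* hostBuf = (float*)malloc(bytes);
        /* ... fill hostBuf ... */
        float* devBuf;
        cudaMalloc((void**)&devBuf, bytes);               // step 2
        cudaMemcpy(devBuf, hostBuf, bytes,
                   cudaMemcpyHostToDevice);               // step 3
        scaleKernel<<<n / 256, 256>>>(devBuf);            // step 4
        cudaMemcpy(hostBuf, devBuf, bytes,
                   cudaMemcpyDeviceToHost);               // steps 6 and 7
        cudaFree(devBuf);
        free(hostBuf);
        return 0;
    }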
4 Parallel RANMAR Using a CUDA GPU
In order to implement RANMAR for a CUDA GPU environment, the sequential algorithm described by Marsaglia et al must be parallelised in some fashion. There would be little point in replicating the algorithm exactly on the GPU because that would be akin to treating the GPU as another (slower) CPU. The following sections describe the approach that was taken and detail the challenges and limitations encountered.
4.1 Parallel RANMAR Design
There are three mechanisms for parallelising random number generators discussed by Coddington in [26]. These are:
1. Leapfrog: This is a mechanism where the random number sequence generated is the same no matter how many processors are employed. It is useful where programs utilising a random number sequence wish to get the same numbers every test run despite the number of processors potentially varying. Note that it is also useful in the current context, when implementing a new parallel algorithm, where one wishes to validate the algorithm by comparing the generated values with those generated by a reference sequential implementation. The method works in a parallel system of $N$ processors, where each processor generates every $N$th sequence element. This can be expressed more succinctly by stating that the $P$th processor generates the sub-sequence:

$X_P, X_{P+N}, X_{P+2N}, \ldots$

Of course this approach is only suitable when it is possible for processors to skip ahead in the random number sequence to those numbers for which they have responsibility. As we shall see, an LFG with sufficient lags and an arithmetic sequence are both candidates in this regard.
2. Sequence Splitting: This is a mechanism similar to the leapfrog except that each processor generates a block of contiguous sequence elements before skipping ahead and generating its next block. In a system of $N$ processors and for a block size of $L$, the $P$th processor generates the sub-sequence:

$X_{PL+1}, X_{PL+2}, \ldots, X_{(P+1)L}, \ldots$

As with the leapfrog method, sequence splitting requires a number generator that allows processors to skip ahead in the random number sequence. This method shares the same advantage as the leapfrog in that the same sequence is generated regardless of the number of processors in the system.
3. Independent Sequences: This is perhaps the most intuitive approach because it simply involves using multiple independently seeded random number sequences presented to the caller as a single sequence. Each processor either populates some target random number array in a fashion similar to the leapfrog method (that is, every $N$th element) or it populates the target array in blocks similar to the sequence splitting method. Ultimately, because an independent sequence is generated by each processor, the combined random number sequence will depend both on the number of processors in the system and on the manner in which the sub-sequences are combined.

The parallelisation of RANMAR was achieved using both the leapfrog and independent sequences methods. The following sections describe the specific nature of the design.

4.1.1 RANMAR Parallelisation within a Single Sequence (Leapfrog)

In this section the design employed to realise a leapfrog style parallelisation of the RANMAR is presented. It is noteworthy that no example of such an implementation could be found during the research phase of this project, and it is therefore this aspect of the parallel RANMAR that differentiates it from the ATI implementation found in [14] and [23].

For any RANMAR sequence's current state, it is always possible to generate a certain number of subsequent random numbers concurrently. This is because of the specific properties of the RANMAR generator - in particular, the fact that it is made up of an LFG with lags convenient for parallelisation and that the other component is a simple, parallelisable arithmetic sequence. Taking the LFG component first of all - with lags of 97 and 33, it is possible to generate the next 33 elements of the LFG component concurrently because there is no dependency between these next 33 values and the values needed to compute them. In order to see this, consider the initial state of the LFG, which is a buffer of size 97 that can be thought of as holding the first 97 values. In order to calculate the 98th value, the values at position 1 and position 65 must be combined (i.e.: using the particular lags as mandated by the RANMAR definition, 98 – 97 and 98 – 33). It is easy then to see that there is no reason why the 99th and 100th elements in this sequence could not be calculated at the same time as the 98th, because there are no inter-dependencies in these calculations.
Figure 4. The parallelisation of a single RANMAR sequence: (i) the 98th element is computed by combining the 1st and the 65th elements; (ii) the 98th, 99th, 100th, ... element computations have no inter-dependencies; (iii) the possibilities for concurrent computation extend to the 130th element.

In fact the parallelisation in the LFG can be extended all the way up to the 130th element (which depends on the 97th and 33rd elements). However, this exhausts the opportunities for parallelising the LFG, because generating the 131st element in parallel would require the 98th element, which may or may not be available. In implementation terms, the buffer illustrated in figure 4 is actually circular, so that the newly computed 98th element of the sequence is placed in the array in position 1, the 99th value is placed in position 2, and so on. The essential aspect is that at any time, the next 33 elements in the LFG sequence can be computed simultaneously and an individual thread can be assigned to each of these computations.
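A sketch of one such leapfrog step as a CUDA kernel (illustrative only: the real kernel also combines the arithmetic sequence and handles multiple instances; oldest denotes the circular buffer slot holding the element 97 positions back):

    /* 32 threads each compute one of the next 32 LFG elements concurrently. */
    __global__ void lfgLeapfrogStep(float* u, int oldest)
    {
        int t = threadIdx.x;                        /* 0..31 */
        /* element (98+t) combines positions (1+t) and (65+t) */
        float y = u[(oldest + t) % 97] - u[(oldest + t + 64) % 97];
        if (y < 0.0f) y += 1.0f;                    /* keep the value in [0, 1) */
        u[(oldest + t) % 97] = y;                   /* overwrite the slot just consumed */
    }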
The second component of the RANMAR is the arithmetic sequence. While it is described by Marsaglia et al [12] as a "simple arithmetic sequence for the prime modulus $16777213 = 2^{24} - 3$", it actually presents some difficulties for parallelisation. The specific parameters characterising the sequence have already been explained, and a sequential realisation is trivial to implement, because in order to generate the next element in the sequence all that is required is a simple manipulation of the current element. For a parallel implementation, however, we need a method such that, for any current element $c_n$, we can generate some later element $c_{n+x}$. Initial attempts to find such a method using floating point arithmetic (as in the sequential version) proved fruitless because they necessarily involved floating point multiplication and modulus operations
which introduced the usual floating point inaccuracies⁸. Instead, it was necessary to use integer arithmetic for the sequence calculations (none of which require a division) and to divide by $2^{24}$ at the end to convert back to a 24 bit fraction. For the integer calculations, all values are the same as per the RANMAR specification, except that they are shifted 24 binary places to remove their fractional part. The eventual solution to the parallelisation of the arithmetic sequence is described in the algorithm presented in figure 5.

Figure 5. Algorithm for the first pass of the (single precision) RANMAR arithmetic sequence:

- The current thread identifies the current sequence value (that is, the value $c_n$) and the offset to the element that it is expected to compute (that is, $x$).
- The current thread then performs the following on $c_n$:
  - Subtracts $x \cdot d$ (equivalent to $x$ subtractions)
  - Performs the modulus $2^{24} - 3$ operation
  - If the value is less than zero, adds $2^{24} - 3$
- Finally the answer is converted to floating point and divided by $2^{24}$.

Note that the algorithm presented describes the first pass calculation. To understand the complete algorithm, say we have 32 threads (as we shall see later, this is in fact the chosen number of threads) and we wish for each thread to calculate 1 of the next 32 elements of the sequence simultaneously, such that thread 1 computes the next value, thread 2 computes the next plus one, and so on. When the generation process starts, these first 32 values constitute the first pass, whereby all threads calculate relative to some common starting value ($c_n$ in the algorithm in figure 5). For subsequent elements, however, each thread can calculate its next assigned element relative to the value it just computed. In other words, it works by always calculating the value of the sequence 32 positions from the thread's current position (assuming there are 32 threads of course). Taking some thread, say thread $j$, we would have the following:

(i) Thread $j$ in its first pass will calculate relative to $c_n$ using a value of $x = j$, as per the algorithm in figure 5.
(ii) Thread $j$ will then store privately the value computed and, from its point of view, it will view this as the current value of the sequence.
(iii) Thread $j$ will calculate its next sequence element relative to its own "current value" but using a value of $x = 32$ (that is, every thread is computing every 32nd value of the overall sequence).

This distinction between the first and subsequent passes constitutes an important optimisation, because without it, each thread would have to calculate relative to a common sequence value. This would mean having to synchronise all threads after each had computed its value and then having the last thread update some shared memory location with the "furthest" last value computed. Apart from concerns over the synchronisation aspect, the need to write to (and read from) shared memory means a performance penalty (over using purely thread local registers), and only having a single thread do this requires thread divergence (threads taking different code paths), which in GPU programming is generally highly undesirable, again for performance reasons. Instead, with the described algorithm, once started, each thread can keep generating sequence values without synchronising or sharing data with other threads. While this approach describes an embarrassingly parallel computation, the arithmetic sequence cannot be considered in isolation. It must be combined with the LFG component of the RANMAR and therefore synchronisation points are still required. However, the optimisation is still valid because of the savings in shared memory access.

⁸ These approaches may have actually yielded acceptable random number generators, but they would not have been bit for bit RANMAR generators, so they were not pursued.
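In code, the integer skip-ahead of figure 5 reduces to a few operations; a minimal sketch (names are illustrative, and values are pre-scaled by $2^{24}$ as described above):

    #define CM 16777213L                  /* the prime modulus, 2^24 - 3     */
    #define CD 7654321L                   /* the decrement d, scaled by 2^24 */

    /* Advance the integer sequence value c by x steps in one operation. */
    long skipAhead(long c, long x)
    {
        long next = (c - x * CD) % CM;    /* x subtractions performed at once */
        if (next < 0)
            next += CM;                   /* fold back into the range [0, CM) */
        return next;                      /* divide by 2^24 to recover a fraction */
    }

    /* Thread j's first pass uses x = j; each later pass uses x = 32,
       relative to the value the thread itself computed previously. */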
4.1.2 RANMAR Parallelisation Using Multiple Independent Sequences

The second method of parallelising the RANMAR is to have several independent random number sequences generated simultaneously and to have their results gathered into a single resultant sequence. The RANMAR, as pointed out by James in [13], lends itself to this because it is extremely easy to generate large numbers of independent, disjoint sequences. In particular, and as mentioned earlier, there are approximately 942 million sequences available with an average period of $10^{30}$.

There are no particular challenges in implementing such an approach apart from two implementation considerations. Firstly, a decision was required on the method to be used for combining the independent sequences into the single output sequence. One element influencing this decision was the ultimate statistical quality of the combined sequence. Coddington [26] points out that combining multiple high-quality random number sequences into a single sequence will produce another high-quality random number sequence as long as the sub-sequences are seeded correctly and do not overlap (thereby introducing possible correlations). Given that RANMAR's strength is in easily generating disjoint sub-sequences, this issue was less of a concern (although, as discussed later, the theory was validated using statistical software packages for testing random number sequence quality). The combination strategy ultimately chosen was: in a scenario where $N$ random numbers are required from $P$ processors, each processor $P_i$ would generate $N/P$ random numbers from its own sequence and, assuming $N$ is divisible by $P$, would populate an output random number array $X$ as follows:

$P_i$ populates $X_{(i-1)N/P + 1}, \ldots, X_{iN/P}$, where $i = 1 \ldots P$
Or, by example: in a system where 10 processors are required to generate 100 random numbers, each processor generates 10 numbers from its own sequence. Processor 1 populates the first 10 numbers in the output array, processor 2 populates the second 10, processor 3 populates the third 10, and so on.

The second implementation consideration was around the seeding mechanism. In order to generate independent sequences for each processor, seeds are required for each one. It seemed unwieldy to require users to provide all these seeds; therefore a design decision was taken to keep the existing interface (that is, as per the RANMAR implementation in Corsika) and only require that the user provides 2 seeds, regardless of the number of independent sequences they require. The approach then taken was that the second user provided seed would be incremented by 1 internally (wrapping as necessary) before initialising each independent sequence.
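A sketch of the seeding scheme just described (the exact wrap-around arithmetic shown is illustrative; the legal seed ranges are those given by James [13]):

    /* Derive the seed pair for a given instance from the two user seeds;
       the second seed is incremented per instance, wrapping as necessary. */
    void instanceSeeds(int seed1, int seed2, int instance, int* s1, int* s2)
    {
        *s1 = seed1;
        *s2 = (seed2 + instance) % 30082;   /* 30082 legal second-seed values */
    }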
4.2 Comparison of Design with Existing Schemes
As already discussed, an existing ATI graphics card implementation (Demchik, [14] and [23]) was discovered during the research phase of this project, so it is perhaps instructive to compare his design with the current design. Quite apart from the fact that the design presented here is based on the CUDA (NVIDIA) architecture, there are other design differences that warrant further discussion. The primary difference is that the application of the "leapfrog" parallelisation strategy is unique to this implementation. Demchik uses the multiple independent sequences approach alone but, because of the nature of his problem, this was reasonable. His random number generator was part of a larger GPU computation where each GPU thread was allocated its own RANMAR sequence. Because of this, other differences arise between the two designs – for example, Demchik had no need to store the generated random numbers on the GPU and neither was he required to transfer them to CPU RAM. In contrast, the goal of the project presented here is to produce a general purpose random number generator for use by a CPU based simulation. Therefore there was no way to avoid the transfer of random numbers back to the CPU. The leapfrog method was deemed appropriate for this project for the following reasons:

1. Because the GPU based RANMAR implementation is intended to replace an existing sequential implementation, there are likely repeatability requirements whereby the GPU RANMAR will be expected to generate exactly the same random number sequence as the sequential version. This is only possible when using a single RANMAR sequence. A single GPU thread generating that sequence would be prohibitively slow, because it would have the normal RANMAR overhead along with the GPU call and data transfer overhead. Additionally, GPU clock-speeds are generally substantially lower than those of CPUs. A single GPU thread would also be very wasteful when one considers the processing power generally available on a GPU. On the other hand, a leapfrog approach to the RANMAR allows 32 GPU threads to cooperate in the generation of a single RANMAR sequence, which at least means better use of the GPU computational capacity.
2. There is a larger memory footprint required of the purely independent sequences approach. This is because each sequence (which equates to each thread) needs to maintain the LFG array of 97 (presumably double precision) floating-point numbers as well as the current LFG index and the current value of the arithmetic sequence. A group of 32 threads performing the leapfrog method means that, for a fixed number of threads, the overall memory footprint is reduced by a factor of 32.
4.3 Implementation Phase
The design phase was exclusively concerned with understanding the RANMAR itself and devising acceptable strategies for parallelising the algorithm. The implementation phase concerned itself with the details. The following sections describe the features and constraints required of the implementation. The various implementation challenges are then presented.

4.3.1 Corsika Imposed Constraints and Features

Mindful of the fact that the implementation would ultimately be integrated with the Corsika package, this imposed some particular requirements, including:

- Quite aside from the independent (sub)sequences approach to parallelising a RANMAR sequence, Corsika requires that it can initialise several independent sequences of the RANMAR and be able to later draw random numbers from a particular sequence of interest. All sequences are maintained internally by the RANMAR implementation and the caller only refers to them by id.
- During the initialisation phase, apart from providing the RANMAR seeds, Corsika requires that it can specify a particular number of random numbers to skip – this is to allow simulations to be started at any point in a given random number sequence.
- James [13] introduced the interface that allows RANMAR clients to request random numbers in batches (by providing an array and an array size argument). Corsika also requires this.
- There is an overhead associated with calling the GPU kernels, generating the random numbers and copying the results back to CPU memory. In order to absorb this overhead, each trip to the GPU should generate a substantial number of random numbers (at least 1 million). Unfortunately, an analysis of Corsika during simulations showed that it only requests random numbers in very small batches (~10). For this reason, an interface was required that allows RANMAR clients to specify, during initialisation, a pre-fetch size, which constitutes the number of random numbers generated with each call to the GPU. The RANMAR caches the numbers in CPU memory and client requests are serviced from there until the cache is exhausted and another GPU kernel call is required (see the sketch after this list).
- The original RANMAR algorithm generates floating point random numbers with a 24 bit mantissa (single precision). Corsika requires a 48 bit (double precision) mantissa. Fortunately, within the last two years, commodity NVIDIA CUDA enabled GPUs have introduced support for double precision floating-point arithmetic; otherwise this requirement may have presented a problem. There is a cost to pay over and above single precision performance, so the parallel RANMAR implementation presented here provides a compilation option to generate random numbers with the preferred precision.
- The RANMAR specification generates uniform random numbers over the range [0, 1); however, Corsika does not want any zero random numbers. A modification to the original RANMAR algorithm was therefore required which checks for a generated zero and replaces it with the smallest possible number according to the configured precision (that is, 2.0E-24 or 2.0E-48).
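A sketch of the pre-fetch cache just described (illustrative names; gpuGenerate stands in for the GPU kernel call and is hypothetical):

    #include <stddef.h>

    void gpuGenerate(double* buf, size_t n);  /* hypothetical GPU refill call */

    typedef struct {
        double* cache;   /* pre-fetch buffer filled by each GPU trip */
        size_t  size;    /* the configured pre-fetch size            */
        size_t  next;    /* index of the next unconsumed number      */
    } RmCache;

    /* Service a client request for n numbers from the CPU-side cache,
       refilling it from the GPU whenever it is exhausted. */
    void rmGet(RmCache* rc, double* out, size_t n)
    {
        for (size_t i = 0; i < n; ++i) {
            if (rc->next == rc->size) {
                gpuGenerate(rc->cache, rc->size);
                rc->next = 0;
            }
            out[i] = rc->cache[rc->next++];
        }
    }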
4.3.2 Other High Level Features

From a feature point of view, the Corsika requirements were the primary drivers. However, some more general requirements were also identified:

- The RANMAR implementation should not be coupled to Corsika (or any other client) in such a way that assumptions are made about how the RANMAR is used.
- Related to the previous point, the implementation should be shipped as a shared library with a well defined public interface.
- Two interfaces for generating random numbers are required: one that utilises the pre-fetching mechanism already described, and a second where all requests for random numbers are serviced by a GPU kernel call – that is, without a pre-fetching mechanism.

4.3.3 Implementation Details and Challenges

The implementation of the GPU RANMAR used the sequential algorithm as its starting point and, by considering the parallelisation strategies already discussed, the basis of the CUDA implementation was formed. Some early implementation decisions were made and they, along with the various algorithms used, are described in the following sections.

4.3.3.1 Early Decisions

Because it is a sequential task, the initialisation of RANMAR state (that is, the initial LFG buffer population and the setting of the starting value for the arithmetic sequence) is performed on the CPU and copied to GPU global memory. Because GPU global memory persists across kernel calls, this state can remain on the GPU until the RANMAR instance is destroyed. There is little else to be said about the initialisation of a RANMAR instance because, once the seeds are provided, it is performed exactly as described in James [13] (apart of course from this being a C implementation).
In the discussion about parallelising a single RANMAR sequence it was noted that, in theory, 33 random numbers could be calculated simultaneously. However, scheduling 33 threads on a CUDA GPU is not optimal because the warp size (that is, the number of threads that are created, managed and scheduled by the GPU hardware as a single entity) is 32. A CUDA program that requested 33 threads in a thread block would make very poor use of the available processing power in the GPU: the first 32 threads making up the first warp would run simultaneously as desired, but the second warp, separately scheduled, would only contain 1 thread, and while it would reserve the same amount of hardware as the 32 thread warp, it would still only have 1 thread using it. For this reason, the RANMAR implementation hard-codes the use of 32 threads per RANMAR instance.

The multiple independent sequences strategy already discussed would be realised using an algorithm to allocate the generation of each sequence among CUDA thread blocks. The simplest model would be to allocate a separate thread block for each sequence computation, but given that each sequence computation uses 32 threads, there would only ever be 32 threads per thread block. This would not make optimal use of available GPU resources. Instead, an algorithm would be employed to decide how many sequences should be allocated to each CUDA thread block.

4.3.3.2 High Level Algorithm

The implementation was then required to bring all the individual requirements, strategies and decisions together into a unifying algorithm. Putting aside for a moment the aforementioned approach of caching random numbers generated by the GPU in CPU memory, and assuming also that the RANMAR state has already been initialised on the GPU, the high level algorithm for retrieving a specified number of random numbers from the GPU is as follows:

- From the CPU, an array (of a size to hold the requested number of random numbers) is allocated in GPU global memory using the standard CUDA memory allocation API.
- Again from the CPU, two arrays (each sized according to the number of RANMAR instances) are allocated in GPU global memory. These arrays will be used later from the GPU kernel, when each RANMAR instance will get a start index from one array and an end index from the other, telling it the range of random numbers for which it is responsible.
- Based on the number of instances (independent sequences) and on the number of required random numbers, the random number array is divided up (as evenly as possible) among the RANMAR instances by populating the start and end index arrays.
- The RANMAR GPU kernel is launched with the necessary number of thread blocks (based on the algorithm referred to in section 4.3.3.1), each having some multiple of 32 threads (that is, 32 threads for each sequence computation assigned to the thread block).
5. Finally, the random numbers in the GPU random number array, having been populated by the kernel invocation, are copied back into the CPU memory array provided by the RANMAR client. The GPU memory that was allocated is freed at this time: GPUs tend to be limited in memory compared to CPUs, so a memory leak, even a small one, can quickly be catastrophic.

Figure 6 illustrates the algorithm by way of an example. Here the RANMAR has been initialised with three instances (sequences), so during initialisation 3 LFG buffers would have been allocated in GPU global memory and populated according to the RANMAR initialisation procedure. At some later point, 300 random numbers are requested, so the algorithm allocates a 300 element array in GPU memory and assigns index ranges in this array to the three instances (in this case the total is evenly divisible, so each instance gets a range of 100) by allocating and populating the index arrays shown. When the kernel is launched, each GPU thread works out which sequence it is part of (by reference to the usual CUDA grid and block dimension variables). It then reads its own start and end indices and, using its own LFG state (and arithmetic sequence state, which is not shown), generates its assigned random numbers. The large arrow indicates the copying of the entire random number array to CPU RAM when the kernel returns control to the CPU.

Figure 6. Example memory usage when RANMAR has been initialised with 3 instances and later 300 random numbers are requested. The start and end index buffers inform each sequence which areas of the random number buffer to populate.
[Figure 6 diagram: three per-instance LFG state buffers and the 300 element random number array in GPU global memory; start indices 0, 100, 200 and end indices 99, 199, 299 mark the region populated by each instance before the whole array is copied to the target array in CPU RAM.]
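Putting these steps together, a minimal host-side sketch of the fetch path might look as follows. It is a sketch only: gpu_ranmar_fetch and the kernel signature are hypothetical, error checking is omitted, and one instance is launched per thread block for clarity (the delivered library packs several instances per block). The kernel body itself is sketched at the end of the next section.

#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void ranmar_kernel(float* out, const int* start, const int* end,
                              float* lfg, float* cSeq, int rounds);

/* Fetch 'count' random numbers using 'numInstances' independent sequences,
   whose state buffers dLfg and dCSeq already live in GPU global memory. */
void gpu_ranmar_fetch(float* hostOut, int count, int numInstances,
                      float* dLfg, float* dCSeq)
{
    float* dOut;
    int *dStart, *dEnd;
    cudaMalloc((void**)&dOut,   count        * sizeof(float));
    cudaMalloc((void**)&dStart, numInstances * sizeof(int));
    cudaMalloc((void**)&dEnd,   numInstances * sizeof(int));

    /* Divide the output array as evenly as possible among the instances. */
    int* start = (int*)malloc(numInstances * sizeof(int));
    int* end   = (int*)malloc(numInstances * sizeof(int));
    int base = count / numInstances, rem = count % numInstances, pos = 0;
    for (int k = 0; k < numInstances; ++k) {
        int n = base + (k < rem ? 1 : 0);
        start[k] = pos;
        end[k]   = pos + n - 1;
        pos += n;
    }
    cudaMemcpy(dStart, start, numInstances * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dEnd,   end,   numInstances * sizeof(int), cudaMemcpyHostToDevice);

    /* One instance (32 threads) per block; every lane of an instance runs
       the same number of rounds so in-kernel barriers stay aligned. */
    int maxPer = base + (rem ? 1 : 0);
    int rounds = (maxPer + 31) / 32;
    ranmar_kernel<<<numInstances, 32>>>(dOut, dStart, dEnd, dLfg, dCSeq, rounds);

    /* Copy the results back and free everything: GPU memory is scarce. */
    cudaMemcpy(hostOut, dOut, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut); cudaFree(dStart); cudaFree(dEnd);
    free(start); free(end);
}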
4.3.3.3 Kernel Algorithm
The final aspect of the RANMAR implementation is the GPU kernel itself. Although much of what happens here, such as the key parallelisation approaches, has already been explained, many details remain. Given that kernel execution (and its performance) is so central to the project's goals, it deserves some detailed explanation. Perhaps the most appropriate place to start is with an overview of the primary performance considerations that informed the RANMAR kernel (or indeed any kernel) implementation.

One of the most important performance considerations is how a kernel accesses memory, or more particularly, global memory. Global memory access, especially on early generation GPUs without a cache hierarchy, is extremely slow, so strategies for minimising global memory use are generally recommended (even on next generation GPUs). The RANMAR kernel, for example, performs all its global memory reads at kernel start-up and copies performance critical data to shared memory (for example, the LFG state array). Any data that does not need to be shared across threads is copied into thread local registers (for example, the arithmetic sequence state). Unfortunately, given that the purpose of RANMAR is to generate random numbers, nothing can be done about the writes to global memory. On the plus side, the RANMAR helps with memory store coalescing (servicing concurrent memory accesses from multiple threads with a single load or store operation) because contiguous threads store their random numbers to contiguous global memory addresses. While it was possible in the RANMAR case to move all critical data to thread registers and shared memory, other situations may require more sophisticated solutions, for example using other areas of memory, such as constant or texture memory. Sometimes algorithms may need to be completely redesigned so as to have a more performance friendly memory access profile.

Another critical factor in kernel code is minimising thread divergence. This happens when "if-then-else" (or equivalent) constructs cause some threads to take different code paths. Any threads not taking a particular path become idle and must wait for the other threads to finish their execution path; then another code pass is required for the threads that take the other path. In effect, execution moves from parallel threads to serialised threads. This arises because the thread warp (32 threads) is scheduled as a unit, so all threads within the warp must execute the same instruction. While it is often impossible to completely remove thread divergence, minimising it is critical.

Some operations are particularly expensive, such as floating point division and integer modulus. Experimentation during RANMAR testing revealed the large impact a modulus operation can have on overall performance when comparing two options for wrapping an incrementing index around a circular buffer:
// Option 1: avoids thread divergence so might be preferred
r = (r + 32) % BUFFER_LENGTH;

// Option 2: despite the thread divergence, the entire RANMAR
// runs 4% quicker this way
r += 32;
if (r >= BUFFER_LENGTH) r = r - BUFFER_LENGTH;
Making full use of the available computational power of the GPU is obviously important. In the RANMAR case we are constrained somewhat because the nature of the random number generator is such that only 33 numbers can be computed simultaneously within a single RANMAR instance. With mid level GPUs now having up to 500 CUDA cores, a single RANMAR instance with only 32 threads would make very poor use of the available horsepower. The solution in the RANMAR case was to use the multiple independent sequences approach and to assign more than one such sequence to the same CUDA thread block. Multiple thread blocks were also used. With each sequence being calculated by 32 threads, with say 5 such calculations per thread block, and with 10 such thread blocks, there would be 32 x 5 x 10 = 1,600 threads.

The CUDA C Programming Guide [25] and the NVIDIA published CUDA programming books [1] and [2] were invaluable resources in determining the best strategies for maximising kernel performance. The kernel source code delivered with this thesis is heavily commented and describes many of the performance aspects in specific detail, but for the sake of brevity, only a high level algorithm is described here:
- The kernel is launched from the CPU with a one dimensional grid of thread blocks. Each thread block has a two dimensional thread structure: the first dimension represents a grouping of the threads working on the same RANMAR instance; the second is the hard-coded, size 32 grouping of threads.
- Inside the kernel, each thread first establishes which RANMAR instance it belongs to. It calculates this by multiplying the block dimension (how many instances per block) by the block index and adding the first component of its thread index; the second thread index component indicates which of the 32 threads (0 to 31) is executing:

instanceId = (blockIdx.x * blockDim.x) + threadIdx.x

- If the instance id is greater than or equal to the total number of instances (the id being zero based), the thread exits immediately. This can happen because a fixed number of instances is allowed per thread block, but the end user may request a number of instances that is not an exact multiple of that figure, ultimately leaving some threads with nothing to do.
- Each kernel thread calculates the index into the output random number array that it will populate, based on the start and end index arrays and on the thread id. Once number generation starts, this thread specific index is incremented by 32 (the leap-frog) on each iteration.
- One of the threads for each RANMAR instance copies the LFG buffer for that instance into shared memory for performance reasons. Threads are synchronised at this point.
- Each kernel thread calculates the index into the LFG buffer that it will write to next and, based on that, works out the initial lag indices for the LFG calculation. Each thread then works out how many random numbers it must produce (not all threads produce the same number).
- Next the RANMAR loop begins (a sketch follows this list). The logic is similar to the sequential algorithm except that all indices increment by 32 instead of 1, and the arithmetic sequence is implemented as per the algorithm described in Figure 5. Each number generated is copied into the global random number array; in this manner all threads from all sequences populate different regions of this shared array.
- At loop exit, one thread for each RANMAR sequence copies the LFG state and arithmetic sequence state back to global memory so that the next invocation of the kernel restarts from the same place in each sequence.
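Bringing those steps together, the following sketch captures the shape of the generation loop for a single precision build. It is a sketch under simplifying assumptions, not the delivered kernel: names are hypothetical, one instance is mapped to each thread block (the delivered code packs several via a second thread dimension), the zero-replacement and the exact state write-back are elided, and the arithmetic sequence is advanced naively rather than with the closed form leap of Figure 5.

#define LFG_LEN 97   /* lagged-Fibonacci buffer length (lags 97 and 33) */

/* Advance the arithmetic sequence by n steps. Exact (it replicates the
   sequential updates bit for bit) but naive compared with Figure 5. */
__device__ float advance_c(float c, int n)
{
    const float cd = 7654321.0f  / 16777216.0f;
    const float cm = 16777213.0f / 16777216.0f;
    for (int k = 0; k < n; ++k) {
        c -= cd;
        if (c < 0.0f) c += cm;
    }
    return c;
}

__global__ void ranmar_kernel(float* out, const int* start, const int* end,
                              float* lfg, float* cSeq, int rounds)
{
    __shared__ float u[LFG_LEN];
    const int inst = blockIdx.x;    /* one instance per block (sketch) */
    const int lane = threadIdx.x;   /* 0..31: position within the leap */

    /* Stage this instance's LFG state in fast shared memory. */
    if (lane == 0)
        for (int k = 0; k < LFG_LEN; ++k)
            u[k] = lfg[inst * LFG_LEN + k];
    __syncthreads();

    /* Each lane's write slot and lag read slot; both walk backwards
       through the circular buffer in steps of 32 (the leapfrog). */
    int i = (96 - lane + LFG_LEN) % LFG_LEN;
    int j = (32 - lane + LFG_LEN) % LFG_LEN;
    float c   = advance_c(cSeq[inst], lane + 1);
    int   idx = start[inst] + lane;

    for (int r = 0; r < rounds; ++r) {
        /* LFG step: safe in parallel because the short lag (33) exceeds
           the 32 lanes, so no lane reads a slot another lane is writing. */
        float uni = u[i] - u[j];
        if (uni < 0.0f) uni += 1.0f;
        u[i] = uni;

        /* Combine with the arithmetic sequence. */
        float x = uni - c;
        if (x < 0.0f) x += 1.0f;

        if (idx <= end[inst])        /* coalesced store across lanes */
            out[idx] = x;
        idx += 32;
        c = advance_c(c, 32);
        i -= 32; if (i < 0) i += LFG_LEN;
        j -= 32; if (j < 0) j += LFG_LEN;
        __syncthreads();             /* next round reads this round's writes */
    }
    /* Write-back of u[] and c to global memory (one lane) elided; the
       delivered code also takes care to restore the exact sequential
       state for lanes that ran past the end of their output range. */
}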
4.4 Verification of RANMAR Correctness
The first task post implementation was to verify that the algorithm, and the random numbers it produced, were correct. This was done by first implementing a sequential RANMAR and verifying its correctness by reference to Marsaglia et al [12], which gives, for a particular seeded RANMAR sequence, the values of the random numbers at positions 20,001 to 20,006. A simple comparison is sufficient to have confidence in the sequential algorithm.

Next, the sequential algorithm was used to verify the values of a number of different RANMAR sequences generated by the GPU implementation. The first 100 billion numbers in each sequence were checked; this, it was felt, was more than sufficient to have confidence in the correctness of the algorithm. By definition, this verification could only be performed on the GPU RANMAR when it used a single RANMAR sequence. For multiple RANMAR sequences combined into a single sequence, it is the quality, or randomness, of the numbers produced that must be verified; this is discussed separately in section 5.
4.5 Integration with Corsika
The final implementation task was the integration of the GPU RANMAR with Corsika. This had its own challenges because Corsika is written in FORTRAN, an area with which the author was unfamiliar. However, all that was really required was to replace the internal RANMAR implementation with a call to the GPU version contained in a shared library. Fortunately, calling C functions from FORTRAN subroutines is very straightforward, apart from a couple of quirks (such as passing arguments by pointer only). Interestingly, the Corsika build environment runs the C pre-processor on its FORTRAN source code to allow the familiar #ifdef, #else, #endif paradigm. This made the changes that call GPU code in place of the Corsika RANMAR easy to swap in and out during development. One slight annoyance was that rather than there being one random number generator subroutine to change, there were in fact a dozen, all almost identical and with the kind of differences that could easily have been parameterised. This multiplicity of implementations made it harder to consider making larger changes to the Corsika code base beyond making the call out to the GPU RANMAR.
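As an illustration of the binding, the C side of such a call might look like the sketch below. The exported name and signature are hypothetical; the sketch simply shows FORTRAN's pass-by-reference convention and the trailing underscore name mangling used by g77/gfortran.

extern void ranmar_fill(double* buf, int n);   /* hypothetical library call */

/* FORTRAN passes all arguments by reference, and g77/gfortran append a
   trailing underscore to external names, hence this C signature. */
void gpu_ranmar_(double* buf, int* n)
{
    ranmar_fill(buf, *n);   /* delegate to the GPU RANMAR shared library */
}

/* Called from a FORTRAN subroutine as:  CALL GPU_RANMAR(RVEC, LEN) */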
5 Statistical Validation of Generator Output
5.1 Validation Approach
Statistical analysis of the output of the parallel RANMAR was deemed necessary because of the decision to use multiple independent sequences and to combine their output into a single sequence. While it may seem intuitive that multiple random number sequences spliced together in this way should yield another statistically valid random number sequence, this is not necessarily so (as mentioned in [26]). The combination process, perhaps due to poor seed choices, could introduce correlations not present in the original sequences. For this reason it was decided to apply thorough statistical testing using TestU01 [27], a well respected random number generator test suite.

TestU01 supports the easy application of predefined batteries of statistical tests to random number generators. The available test batteries are known as small crush, crush and big crush (in order of increasing strictness). TestU01 requires that a test program be written to enable the tester to draw numbers directly from the random number generator under test. The alternative approach, as used in some earlier testers such as Diehard (also by Marsaglia), is to draw the sample of random numbers from a user provided file. However, the file based approach is impractical for the stricter randomness tests, because these typically require a very large quantity of random numbers. The programmatic approach, on the other hand, once initially configured, can cater for the most thorough of statistical tests, and the code required is not onerous. Some sample code is presented in figure 7.

Before discussing the tests that were performed with TestU01, it is worth mentioning that another popular random number generator tester, Dieharder (the tongue-in-cheek successor to Diehard), was also tried. Unfortunately, attempts to get any tests to pass, even when run on Dieharder's own RANMAR implementation, failed. Time did not permit any investigation into the cause of the problems, and Dieharder was abandoned in favour of TestU01.
For an exhaustive description of the actual statistical tests performed by TestU01, see [27]. The high level test plan using TestU01 was as follows:
- Perform a sanity validation of the double precision sequential RANMAR provided by TestU01.
- Perform a sanity validation of the double precision parallel RANMAR initialised with 1 instance (which, as verified previously, generates the same numbers as an equally seeded sequential RANMAR).
- Perform a validation of the double precision parallel RANMAR initialised with 8 instances.

Figure 7. Driver program to run big crush on a RANMAR instance

#include "unif01.h"     /* TestU01 generator interface */
#include "bbattery.h"   /* TestU01 test batteries */
#include "ranmar.h"     /* GPU RANMAR shared library (header name assumed) */

double get_double() {
    static ranmar_t r;
    return (double)ranmar_get(0, &r, 1);
}

int main(int argc, char* argv[]) {
    /* Initialise RANMAR with 4 instances and a pre-fetch of 100000 */
    int ij = 1802;
    int kl = 9373;
    ranmar_initialise(0, ij, kl, 4, 100000);

    /* Run big crush on the parallel RANMAR */
    unif01_Gen* gen = unif01_CreateExternGen01("RANMAR", get_double);
    bbattery_BigCrush(gen);
    return 0;
}

5.2 Validation Results

5.2.1 Sanity Validation of Sequential RANMAR Provided by TestU01
In an attempt to set a baseline expectation for the statistical quality of the RANMAR according to TestU01, the TestU01 internal RANMAR implementation was tested first. The initial results were surprisingly poor, even on the small crush (most forgiving) battery. Some investigation uncovered a problem in the TestU01 RANMAR implementation whereby the double precision RANMAR was really only generating single precision values (initialising only 24 bits). With half of the mantissa bits always zero, unsurprisingly, several tests failed. The problem was rectified with a small code change and TestU01 itself was recompiled. The test was then run again and the results of the crush battery were more promising:

========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: Sequential RANMAR
Number of statistics: 144
The following tests gave p-values outside [0.001, 0.9990]:
 Test                            p-value
 22 ClosePairsBitMatch, t = 4    2.2e-4
 34 Gap, r = 22                  5.3e-6
 54 WeightDistrib, r = 24        eps
All other tests were passed
5.2.2 Sanity Validation of Parallel RANMAR with 1 Instance
The sanity check on the single instance parallel RANMAR was, as expected, identical to the sequential result.
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: Parallel RANMAR 1 instance
Number of statistics: 144
The following tests gave p-values outside [0.001, 0.9990]:
 Test                            p-value
 22 ClosePairsBitMatch, t = 4    2.2e-4
 34 Gap, r = 22                  5.3e-6
 54 WeightDistrib, r = 24        eps
All other tests were passed
5.2.3 Validation of Parallel RANMAR with 8 Instances
The first test of the parallel RANMAR with multiple instances was performed using 8 instances and run 10 times with different seed values each time (each run being the crush test battery and taking a little over an hour and a half). Every run yielded the same test failure profile. Interestingly, this failure profile was actually better than that of the sequential RANMAR because it passed the ClosePairsBitMatch test. This was an important result: it validated the decision to parallelise the RANMAR using multiple independent sequences (instances), and it also validated the method of seeding the RANMAR instances (by incrementing the second of the two user provided seeds).
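That seeding scheme amounts to something like the following (a sketch; ranmar_seed_instance is a hypothetical name):

/* Instance k reuses the first seed and increments the second,
   giving reproducible, disjoint sequences. */
for (int k = 0; k < numInstances; ++k)
    ranmar_seed_instance(k, ij, kl + k);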
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: Parallel RANMAR 8 instances
Number of statistics: 144
The following tests gave p-values outside [0.001, 0.9990]:
 Test                        p-value
 34 Gap, r = 22              6.4e-5
 54 WeightDistrib, r = 24    eps
All other tests were passed
5.3 Proposed Extension to Current Double Precision RANMAR
While the validation of the independent sequences approach was welcome, there was a suspicion that improvements could be made. The RANMAR as originally described is a single precision random number generator: it generates 24 bit random numbers by combining 24 bit elements from an LFG sequence with 24 bit elements from an arithmetic sequence. As mentioned in section 2.4.1, when RANMAR was extended to double precision in the CERN program library, the only change made was to extend the initial values of the LFG buffer to 48 bits (thereby making all subsequent LFG values 48 bit also). The initial value of the arithmetic sequence, its decrement value and its modulus remain 24 bit. Consequently, the process of combining the 48 bit LFG values with the 24 bit arithmetic sequence (a subtractive and additive operation) changes only the high order 24 bits of the LFG values; the low order 24 bits remain unaffected. This means that the low order 24 bits of the RANMAR sequence are derived purely from the LFG. Given the concerns expressed by Marsaglia et al about the randomness of the LFG on its own, there is valid reason to be concerned about the double precision RANMAR implementations present in the CERN program library and in Corsika.

In order to test this hypothesis, an alternative arithmetic sequence was implemented, one where the parameters defining the arithmetic sequence were converted from 24 bit fractions to 48 bit fractions. The GPU RANMAR implementation used 64 bit longs in its computations before converting the final arithmetic sequence value to a 48 bit fraction. The following sections describe the TestU01 results of the new implementation.
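The following sketch illustrates the idea of the widened sequence: the standard 24-bit RANMAR constants rescaled to 48-bit fractions, with the running value carried in a 64 bit long. It is a sketch of the extension described above, not the delivered code (which ties into the state management already discussed; the initial value, 362436 rescaled the same way, is set elsewhere).

#include <stdint.h>

/* Standard RANMAR arithmetic-sequence constants, rescaled from 24-bit
   to 48-bit fractions by multiplying the numerators by 2^24. */
static const int64_t CD48 = 7654321LL  << 24;   /* decrement */
static const int64_t CM48 = 16777213LL << 24;   /* modulus   */

/* One step of the extended sequence; only the final value is converted
   to a 48-bit double precision fraction. */
double arith_step_48(int64_t* c)
{
    *c -= CD48;
    if (*c < 0) *c += CM48;
    return (double)*c / 281474976710656.0;      /* divide by 2^48 */
}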
5.3.1 Validation of Extended Parallel RANMAR with 1 Instance
This test of the new extended RANMAR implementation used the TestU01 big crush battery, as it was necessary to give any change to the generally accepted form of RANMAR the sternest possible examination. Once again a comparison baseline test was performed first, using the standard (non-extended) parallel RANMAR.

========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: Double precision parallel RANMAR 1 instance
Number of statistics: 160
The following tests gave p-values outside [0.001, 0.9990]:
 Test                           p-value
 35 Gap, r = 25                 eps
 61 WeightDistrib, r = 28       eps
 64 WeightDistrib, r = 26       eps
 98 HammingIndep, L=300, r=26   1.2e-12
All other tests were passed
The next test was of the newly extended double precision RANMAR, again using the big crush battery. The results were generally very encouraging (although only two such tests were possible because each run took up to 18 hours). In summary, the extended RANMAR passes more tests than the standard double precision RANMAR found in the CERN program library and in Corsika. More investigation would be required to establish whether this trend is repeated across many different RANMAR sequences, but time did not permit this. It is worth noting, however, that the testing of random number generators is itself a prime candidate for parallelising and implementing on a GPU.
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: Double precision extended parallel RANMAR 1 instance
Number of statistics: 160
The following tests gave p-values outside [0.001, 0.9990]:
 Test                        p-value
 60 WeightDistrib, r = 20    9.5e-6
 61 WeightDistrib, r = 28    1.2e-5
All other tests were passed
5.3.2 Validation of Extended Parallel RANMAR with 8 Instances
The final test executed the big crush test battery against the multiple instances version of the extended RANMAR. Again, a baseline test against the non-extended parallel RANMAR was performed first. The results were almost identical to those of the single instance parallel RANMAR.
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: Double precision parallel RANMAR 8 instances
Number of statistics: 160
The following tests gave p-values outside [0.001, 0.9990]:
 Test                           p-value
 35 Gap, r = 25                 eps
 61 WeightDistrib, r = 28       eps
 64 WeightDistrib, r = 26       eps
 98 HammingIndep, L=300, r=26   5.2e-14
All other tests were passed
The results for the extended RANMAR with multiple instances again show an improvement over the non-extended version: in the tests performed, the extended version passed the Gap and Hamming independence tests that were observed failing in the non-extended RANMAR test.
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: Double precision extended parallel RANMAR 8 instances
Number of statistics: 160
The following tests gave p-values outside [0.001, 0.9990]:
 Test                        p-value
 60 WeightDistrib, r = 20    3.2e-4
 61 WeightDistrib, r = 28    4.8e-6
All other tests were passed
6 Performance Results
6.1 Introduction
The RANMAR implementation was tested on two systems (the details of which are presented in Appendix A). The first was a desktop personal computer with an Intel Pentium 4 CPU (at 3.2GHz) and an NVIDIA GeForce GTX 460 GPU. The second was a modern server machine with a Xeon X5670 CPU (at 2.93GHz) and an NVIDIA Tesla C2050 GPU. Before any tests were run, the server machine was expected to outperform the personal computer, both because the Xeon is a more modern CPU and because the personal computer has only a PCI Express 1.0 bus (for graphics card connectivity). The transfer of random numbers from the GPU to the CPU can dominate the runtime, and PCI Express 2.0 offers a potential doubling of throughput compared with version 1.0. Despite the differences, it was deemed instructive to test the RANMAR on more than one system.
6.2 Standalone Testing
The first and most important performance testing was standalone testing, as this would provide the data to predict the kind of speedup possible with RANMAR integrated into Corsika. These tests used a client program to drive the random number generator in various scenarios in order to characterise the performance of the GPU parallel RANMAR and to compare it with the sequential RANMAR. In each scenario, the RANMAR is asked to generate 1 billion random numbers. The scenarios tested were as follows:
1. For the non-buffered API (each client request involves a call to the GPU), the quantity of random numbers requested on each trip to the GPU is varied.
2. For the buffered API (where numbers are cached by the RANMAR in CPU RAM), the size of the pre-fetch is varied.
3. For each of the first two scenarios, the number of RANMAR instances is varied.
4. For the first three scenarios, the scheme for allocating instances to CUDA thread blocks is varied: either 1 instance per thread block or 4 instances per thread block.

Some other characteristics of the standalone tests should be made clear before presenting results:
- In the sequential tests on both systems, each individual call to the sequential RANMAR requested 10 random numbers (therefore 100 million calls were needed to generate the required 1 billion numbers). The same buffer was reused for each call, so there was no repeated memory allocation overhead. The number 10 was chosen by profiling Corsika's use of the RANMAR and observing that the average RANMAR call requested 10 numbers.
- For the reason outlined in the previous point, the buffered parallel tests on both systems also requested 10 random numbers per call. This was practical because the requests could usually be serviced from the CPU RAM cache without needing to visit the GPU.
- In the non-buffered parallel tests, because each call to RANMAR requires a call to the GPU, the tests requested far larger batches of random numbers.

Figure 8. Varying RANMAR instances on PC system with 1 instance per thread block
Figure 9. Varying RANMAR instances on PC system with 4 instances per thread block

Figure 10. Varying RANMAR instances on server system with 1 instance per thread block
Figure 11. Varying RANMAR instances on server system with 4 instances per thread block

6.2.1 Analysis
The results in figures 8 to 11 enable a number of conclusions to be drawn:
1. The straight-line sequential (CPU) speed of the server system is greatly superior to that of the PC system. The indicative time to generate 1 billion numbers on the PC was 34.65 seconds, whereas on the server it was 12.83 seconds.
2. The best observed RANMAR performance on the PC system was:
   - Best non-buffered speedup over sequential: 4.82
   - Best buffered speedup over sequential: 2.69
   - Best scenario: non-buffered, fetch size = 10,000,000
3. The best observed RANMAR performance on the server system was:
   - Best non-buffered speedup over sequential: 4.85
   - Best buffered speedup over sequential: 2.68
   - Best scenario: non-buffered, fetch size = 10,000,000
4. There was no definitive performance difference between allocating multiple instances to the same thread block and having only one instance (one warp) per block. This suggests that the performance of CUDA kernels does not change whether warps are scheduled as part of the same thread block or one warp per thread block (at least within the bounds actually tested here).
5. As might be expected, the best GPU RANMAR performance is achieved using the non-buffered interface with the fetch size maximised. The non-buffered interface outperforms its buffered counterpart because the buffered interface has extra memory copying to perform: once from the GPU to the RANMAR cache, and again to a client provided array when the request for random numbers is made. The non-buffered interface copies random numbers directly into the client provided array.
6. The graphs show that in the best case scenario (on both systems), using the non-buffered interface with the largest fetch size, performance quickly reaches a point of diminishing marginal returns as the number of instances is increased. An analysis of the kernel runtime using the CUDA profiler for one particular test run (figure 12) provided the reason:

Figure 12. CUDA profile of best RANMAR scenario on the server test system
Total time for program execution : 2.926 sec
Time copying data from GPU to CPU : 2.165 sec
Time copying data from CPU to GPU : 0.019 sec
Time spent executing kernel : 0.512 sec
Time spent in all other processing : 0.230 sec

From this it is clear that the performance of the parallel RANMAR is dominated by the transfer of the generated random numbers from GPU memory to CPU memory. This profile alone suggests that only 17% of execution time is spent generating the numbers, whereas 74% is spent transferring data from GPU to CPU. This suggests that the kernel implementation is close to optimal (in the current system context); any further performance gains will come from improvements in GPU/CPU transfer throughput.
7. Given that little performance gain is possible from further maximising use of the GPU, there appears little point in initialising parallel RANMAR generators with large numbers of instances (say more than 25).
8. In the worst performing scenarios, where smaller quantities of random numbers are generated per call to the GPU, adding extra instances means that each instance has even less to do, and extra instances can actually degrade overall performance. In figure 8, the buffered scenario with a pre-fetch of 100,000 spread over 50 instances means that each instance generates only 2,000 random numbers. GPU algorithms perform best when there is enough computational workload to absorb the cost of the CUDA overhead.

6.3 Corsika Testing
The second area of performance testing compared Corsika simulations using the internally provided RANMAR implementation against simulations using the GPU based RANMAR. As already mentioned, Corsika generally requests random numbers in quite small batches, averaging about 10 for the simulations profiled, and frequently it asks for one random number at a time. This means that Corsika was forced to use the buffered interface to the GPU RANMAR. The results from the standalone performance testing in section 6.2.1 showed that the best expected speedup on the server test system using the buffered interface would be approximately 2.68.

Given the time invested in integrating the GPU RANMAR with the Corsika system, not enough time remained to fully explore all possible Corsika simulations; Corsika is quite large and includes many possible interaction models for use in cosmic radiation simulations. Some analysis was of course possible, but ultimately the GPU RANMAR implementation will be presented to Corsika domain experts for further analysis. With the performance of the GPU RANMAR implementation already characterised by the standalone testing, it only remained to find a Corsika simulation spending the largest proportion of its time generating random numbers; by so doing, the best possible overall simulation speedup would be realised. In the time available, using the GNU FORTRAN profiler, a simulation spending 32% of its time in RANMAR was found. (Note: initial discussions with the UCD School of Physics suggested there are simulations spending 80% of their time generating random numbers, but this has yet to be verified.) The best results achieved are presented in figure 13.

Figure 13. Best overall Corsika simulation speedup observed.
Chosen simulation RANMAR % time : 32%
RANMAR instances : 20
RANMAR pre-fetch buffer size : 10,000,000
Simulation speedup due to GPU RANMAR : 17%
RANMAR speedup as part of simulation : ~2x

The results of the Corsika performance profiling show that the overall simulation speedup achieved using the GPU RANMAR is broadly in line with that predicted by the standalone testing. While the 2.68 speedup was not observed, the GPU RANMAR imposes an extra function call overhead on Corsika which may explain the disparity.
7 Conclusion and Future Work
7.1 Project Recap
In this project, an overview of the technology trends driving mainstream software engineering towards general purpose programming on GPUs was presented. The NVIDIA CUDA architecture was researched (primarily from [1] and [2]) and used to produce a novel and successful implementation of the well known RANMAR random number generator. The problem domain and the motivation behind the development of a GPU based RANMAR were explained through a description of the Monte Carlo simulations that use it. The implementation was novel in that no other implementation of the RANMAR using the CUDA architecture on NVIDIA GPUs could be found. Moreover, the implementation used two levels of parallelisation: multiple independent sequences, and leapfrog parallelisation within each sequence. This two fold approach provided a more sophisticated implementation than any other RANMAR implementation that was found. The challenges involved in realising the parallelisation strategies, in particular the leapfrogging of the arithmetic sequence aspect of the RANMAR, were detailed. The algorithms in the control program and the GPU kernel were also explained.

Given that the overriding goal of the project was to increase the speed of an existing CPU RANMAR implementation, several strategies were employed to maximise the performance of the GPU based implementation. Through program timing and CUDA profiling, it was shown that the best case tests of the implemented GPU RANMAR were constrained by the transfer of random numbers from GPU to CPU memory and not by the RANMAR algorithm itself. Almost a 5 fold speedup over a sequential RANMAR implementation was achieved during standalone testing. When integrated with Corsika (with an extra function call and random number copy overhead), the observed speedup was 2. This was the target set during the initial planning stages of the project, which from a performance point of view renders the project a success.

The statistical quality of the random numbers produced by the GPU RANMAR using multiple independent sequences was verified using TestU01. Finally, statistical evidence was found of a possible weakness in the double precision RANMAR implementations in the CERN program library and in Corsika. An improved double precision RANMAR was proposed and implemented, and shown to pass more tests from the most stringent of the TestU01 test batteries ("big crush").
7.2 Conclusions
The advertised strength of GPUs is in computation on data parallel problems. For the best possible performance there should be a high proportion of computational complexity relative to the amount of data transferred between CPU and GPU memories (and to the amount of GPU global memory access). For this reason, there was always a concern that random number generation might not be a viable problem to implement on a GPU. The evidence from this project suggests that while speed-ups of multiple orders of magnitude will not be seen, a speed-up of up to 5 is still possible. The performance of the RANMAR was shown to be constrained by the transfer of the generated random numbers from GPU back to CPU, so this is the area in which further improvements will be made: possibly via a next generation bus (a successor to PCI Express), or more probably through a unification of the CPU and GPU such that the transfer is not necessary at all. Hybrid CPU/GPU chips are already in production, for example AMD Fusion and Intel Sandy Bridge, with NVIDIA's Project Denver in development. More generally, these hybrid chips provide the most likely trajectory for data parallel SIMD style programming to enter software engineering's mainstream.

The CUDA programming model proved very useful and does not distract the programmer from the actual problem domain: it does not have a complicated API, nor does it introduce any tricky syntax. Some aspects were of slight concern, such as being able to extract significant performance gains by simply reordering some source code. This sort of trial and error programming is not normally seen on CPUs, thanks to mature, aggressive optimising compilers and CPU optimisation strategies such as multi-level caches, hyper-threading, branch prediction, instruction reordering, register renaming and so on. It is likely that GPU compilers and chips will improve in these respects in future.

As would be expected with any parallel architecture, programming with CUDA requires a particular data parallel mindset. In addition, algorithm design using CUDA on current generation GPUs requires the programmer to be very aware of the underlying hardware architecture. The relative performance of the various GPU memories, and the constraints on aspects such as registers, shared memory size and maximum threads per thread block, must always be considered. Sometimes tradeoffs are necessary: for example, moving computation onto the GPU despite it running slower there, in order to save an expensive data transfer between GPU and CPU.

The RANMAR implementation was successful in that it reached its target of a two fold speed up when integrated with Corsika. However, the RANMAR lent itself to parallelisation by (a) having suitable LFG lags, (b) using an arithmetic sequence, and (c) being amenable to the generation of multiple disjoint sequences. Other random number generators may not have these advantages, so not all generators will be parallelisable in the same way.

The suggested weakness in the double precision RANMAR in the CERN program library and in Corsika requires further investigation. This project makes a compelling logical argument as to why these implementations are questionable, and it produces some statistical evidence supporting this claim. The extended RANMAR implementation that addresses these concerns has been shown, in certain tests, to produce a statistically superior random number sequence; however, a much larger series of tests would be required to assert this confidently.
7.3 Future Work
While a doubling of RANMAR performance in the Corsika context was welcome, a further increase in performance could be achieved by implementing the following:
- Dedicated CPU thread for generating random numbers: assuming there are available CPU cores, it may make sense to have a dedicated CPU thread populating a CPU buffer of random numbers, ensuring there are always numbers available when simulations need them. A single reader/single writer thread queue mechanism such as this can be implemented very efficiently with so called lock free methods (see the sketch after this list).
- Change the way Corsika draws down its random numbers: Corsika itself could maintain a single large array that it populates only as needed with calls to the GPU (effectively the non-buffered approach, except managed by Corsika itself). Corsika simulations that require a certain quantity of random numbers simply request that quantity and are returned a pointer to the next random number in the array (assuming enough numbers remain to satisfy the request, otherwise a GPU call is necessary). The result would be a significant reduction in memory copying and a reduced function call overhead.
- Move simulations onto the GPU: if Corsika Monte Carlo simulations ran on the GPU then, quite apart from the potential gains that would bring, the RANMAR performance would also increase because the need to transfer the generated random numbers from GPU to CPU would disappear.

In addition, given that the future of general purpose GPU programming appears likely to trend towards the aforementioned hybrid CPU/GPU chips, a RANMAR implementation using that technology should be undertaken.
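As a sketch of the lock free single reader/single writer queue mentioned in the first item (illustrative only, using C11 atomics; the names and capacity are assumptions):

#include <stdatomic.h>
#include <stdbool.h>

#define QCAP 4096U                     /* ring capacity (power of two) */

typedef struct {
    double buf[QCAP];
    _Atomic unsigned head;             /* advanced only by the consumer */
    _Atomic unsigned tail;             /* advanced only by the producer */
} spsc_queue_t;

/* Producer (the dedicated generator thread): returns false when full. */
bool spsc_push(spsc_queue_t* q, double x)
{
    unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == QCAP) return false;   /* full */
    q->buf[t % QCAP] = x;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer (the simulation thread): returns false when empty. */
bool spsc_pop(spsc_queue_t* q, double* x)
{
    unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t) return false;          /* empty */
    *x = q->buf[h % QCAP];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}

Because each index is written by exactly one thread, no locks are needed: a release store by one side paired with an acquire load by the other is enough to make the buffered values visible.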
8 References
[1] David Kirk, Wen-mei Hwu (2010), Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann.
[2] Jason Sanders, Edward Kandrot (2010), CUDA by Example: An Introduction to General Purpose GPU Programming, Addison Wesley Professional.
[3] David Blythe (2008), Rise of the Graphics Processor, Proceedings of the IEEE, Volume 96, Number 5.
[4] B. Neelima, P.S. Raghavendra (2010), Recent Trends in Software and Hardware for GPGPU Computing: A Comprehensive Survey, 5th International Conference on Industrial and Information Systems.
[5] Enhua Wu, Youquan Liu (2008), Emerging Technology about GPGPU, IEEE Asia Pacific Conference on Circuits and Systems, 2008, pages 618-622.
[6] Rob Farber (2008), CUDA, Supercomputing for the Masses, Part 1, Dr. Dobb's Journal, April 2008. Available online at http://www.drdobbs.com/architecture-and-design/207200659
[7] Herb Sutter, James Larus (2005), Software and the Concurrency Revolution, ACM Queue, Volume 3, Number 7.
[8] Herb Sutter (2005), The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, 30 (3). Available online at http://www.gotw.ca/publications/concurrency-ddj.htm
[9] Wen-mei Hwu, Kurt Keutzer (2008), The Concurrency Challenge, IEEE Design and Test of Computers, July/August 2008, pages 312-320.
[10] Maurice Herlihy (2007), The Multicore Revolution, FSTTCS 2007: Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science, Volume 4855/2007, pages 1-8.
[11] Michael Macedonia (2003), The GPU Enters Computing's Mainstream, IEEE Computer, Volume 36, Issue 10, pages 106-108.
[12] George Marsaglia, Arif Zaman, Wai Wan Tsang (1990), Toward a Universal Random Number Generator, Statistics & Probability Letters, Volume 9, Issue 1, pages 35-39.
[13] F. James (1990), A Review of Pseudorandom Number Generators, Computer Physics Communications, Volume 60, Issue 3, pages 329-344.
[14] Vadim Demchik (2010), Pseudo-random Number Generators for Monte Carlo Simulations on ATI Graphics Processing Units, Computer Physics Communications, Volume 182, Issue 3.
[15] Home page for GPGPU.org: General Purpose Computation on Graphics Hardware, http://gpgpu.org
[16] Kayvon Fatahalian, Mike Houston (2008), A Closer Look at GPUs, Communications of the ACM, 51(10), October 2008.
[17] NVIDIA Research Summit 2010, Poster Listing. Available online at http://www.nvidia.com/object/research_summit_posters_2010.html
[18] G. Moore (1965), Cramming More Components onto Integrated Circuits, Electronics Magazine, Volume 38, Number 8. Available online at ftp://download.intel.com/research/silicon/moorespaper.pdf
[19] T. C. Weekes, et al. (1989), Observation of TeV Gamma Rays from the Crab Nebula Using the Atmospheric Cerenkov Imaging Technique, Astrophysical Journal, 342, pages 379-395.
[20] Wikipedia definition of Cherenkov radiation (for the interested reader), available online at http://en.wikipedia.org/wiki/Cherenkov_radiation
[21] D. Heck, J. Knapp, J.N. Capdevielle, G. Schatz, T. Thouw (1998), CORSIKA: A Monte Carlo Code to Simulate Extensive Air Showers, Forschungszentrum Karlsruhe Report FZKA 6019.
[22] Home page for Corsika, an Air Shower Simulation Program: http://www-ik.fzk.de/corsika/
[23] Vadim Demchik, A. Strelchenko (2009), Monte Carlo Simulations on Graphics Processing Units, arXiv:0903.3053 [hep-lat].
[24] Home page for the CERN Program Library: http://cernlib.web.cern.ch/cernlib/
[25] NVIDIA CUDA C Programming Guide, Version 3.2. Available online at http://developer.nvidia.com/object/cuda_3_2_downloads.html
[26] Paul D. Coddington (1997), Random Number Generators for Parallel Computers, National HPCC Software Exchange Review, 1.1.
[27] Pierre L'Ecuyer, Richard Simard (2007), TestU01: A C Library for Empirical Testing of Random Number Generators, ACM Transactions on Mathematical Software, Volume 33, Issue 4.
9 APPENDIX A – Specification of Test Systems
9.1 Test System1
9.1.1 CPU Specification (cat /proc/cpuinfo)
Model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
Stepping   : 3
Cache size : 2048 KB
9.1.2 Operating System (cat /etc/*release)
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.1 LTS"
9.1.3 GPU Driver (cat /proc/driver/nvidia/version)
NVRM version: NVIDIA UNIX x86 Kernel Module 260.19.21 Thu Nov 4 20:24:24 PDT 2010
GCC version: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
9.1.4 GPU Specification
Device 0: "GeForce GTX 460" CUDA Driver Version: 3.20 CUDA Runtime Version: 3.20 CUDA Capability Major/Minor version number: 2.1 Total amount of global memory: 1072889856 bytes Multiprocessors x Cores/MP = Cores: 7 (MP) x 48 (Cores/MP) = 336 (Cores) Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 32768 Warp size: 32 Maximum number of threads per block: 1024 Maximum sizes of each dimension of a block: 1024 x 1024 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 1 Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Clock rate: 1.40 GHz Concurrent copy and execution: Yes Run time limit on kernels: Yes Integrated: No Support host page-locked memory mapping: Yes Compute mode: Default (multiple host threads can use this device simultaneously) Concurrent kernel execution: Yes Device has ECC support enabled: No Device is using TCC driver mode: No
9.2 Test System2
9.2.1 CPU Specification (cat /proc/cpuinfo)
Model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
Stepping   : 2
Cache size : 12288 KB
9.2.2 Operating System (cat /etc/*release)
SUSE Linux Enterprise Desktop 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1
SGI Foundation Software 2SP1, Build 701r3.sles11-1005252113
SUSE Linux Enterprise Desktop 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1
9.2.3 GPU Driver (cat /proc/driver/nvidia/version)
NVRM version: NVIDIA UNIX x86_64 Kernel Module 256.35 Wed Jun 16 18:42:44 PDT 2010
GCC version: gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
9.2.4 GPU Specification
Device 0: "Tesla C2050" CUDA Driver Version: 3.10 CUDA Runtime Version: 3.10 CUDA Capability Major revision number: 2 CUDA Capability Minor revision number: 0 Total amount of global memory: 2817982464 bytes Number of multiprocessors: 14 Number of cores: 448 Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 32768 Warp size: 32 Maximum number of threads per block: 1024 Maximum sizes of each dimension of a block: 1024 x 1024 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 1 Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Clock rate: 1.15 GHz Concurrent copy and execution: Yes Run time limit on kernels: No Integrated: No Support host page-locked memory mapping: Yes Compute mode: Default (multiple host threads can use this device simultaneously) Concurrent kernel execution: Yes Device has ECC support enabled: Yes