GPU Computing at the Netherlands eScience Center Ben van Werkhoven

GPU Computing at the Netherlands eScience Center Ben van Werkhoven NIRICT – GPU Applications Workshop Utrecht, June 8 2017

GPU Applications: Climate Modeling, Radio Astronomy, Super-resolution Microscopy, Astro-particle Physics, Life Sciences, Computational Linguistics, Digital Forensics

How we work
Yearly calls for proposals
Accepted projects receive:
- 250K to hire a postdoc or PhD student
- 2.5 FTE of eScience Research Engineers

Projects started in 2017
• Data mining tools for abrupt climate change
• DIRAC - Distributed Radio Astronomical Computing
• Methodology and ecosystem for many-core programming
• Accelerating Astronomical Applications 2

Real-time detection of neutrinos from the distant Universe

KM3NeT – Neutrino Telescope
• Huge instrument at the bottom of the Mediterranean Sea
• Pretty high data rate due to background noise from bioluminescence and Potassium-40 decay
• Current event detection / reconstruction happens on pre-filtered data (so-called L1 hits)
• Our goal: work towards event detection based on unfiltered data (so-called L0 hits)

Correlating hits
• Hits are correlated based on their time and location
• Correlations can only occur in a small window of time
• Density of the narrow band depends on the correlation criterion in use
[Figure: correlation matrix, hit no. on both axes]
Try out two designs:
• Dense pipeline that stores the narrow band as a table
• Sparse pipeline that stores the matrix in compressed sparse row (CSR) form

Data representation on the GPU
– Dense: the N x N correlation matrix is stored as an N x 1500 correlations table
– Sparse: the N x N correlation matrix is stored in CSR format, as an array of column indices (length: # correlations) and an array with the start of each row (length: N)
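The two representations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the correlation criterion here is a hypothetical time-difference threshold (`max_dt`), whereas the slides say hits are correlated on both time and location, and the function names `build_dense` and `build_csr` are made up for this sketch. The `band` parameter plays the role of the table width (1500 on the slide). The CSR row array uses the common N + 1 offsets convention so that the end of the last row is stored too.

```python
def build_dense(times, band, max_dt):
    """Dense form: table[i][k] == 1 if hit i correlates with hit i + 1 + k."""
    n = len(times)
    table = [[0] * band for _ in range(n)]
    for i in range(n):
        for k in range(band):
            j = i + 1 + k
            # Hypothetical criterion: hits close enough in time correlate.
            if j < n and times[j] - times[i] <= max_dt:
                table[i][k] = 1
    return table


def build_csr(times, band, max_dt):
    """Sparse form: CSR row offsets and column indices of the same band."""
    n = len(times)
    row_start = [0]   # row_start[i] .. row_start[i + 1] delimits row i
    col_idx = []      # column index of every stored correlation
    for i in range(n):
        for j in range(i + 1, min(i + 1 + band, n)):
            if times[j] - times[i] <= max_dt:
                col_idx.append(j)
        row_start.append(len(col_idx))
    return row_start, col_idx


# Hit times sorted in ascending order (made-up values).
times = [0, 3, 4, 10, 11, 30]
table = build_dense(times, band=3, max_dt=5)
row_start, col_idx = build_csr(times, band=3, max_dt=5)
```

The dense table always costs N x band entries regardless of how many hits correlate, while the CSR arrays grow with the number of correlations, which is why the density of the band matters when choosing between the two pipelines.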

Comparing performance

Super-resolution microscopy

Super-resolution microscopy
• Collect a large number of images from fluorescence microscope
• Localize fluorophores using fitting code
• Create single super-resolution image from all localized fluorophores
• Segment all individual molecules in the image
• Create single reconstruction by combining identical copies in the data
[Figure: fluorescence microscope]

Existing GPU code
• GPU code for maximum likelihood estimation developed in 2009-2010
  – "Fast, single-molecule localization that achieves theoretically minimum uncertainty", Smith et al., Nature Methods (2010)
• Estimates the locations and several other parameters of points in noisy image data for various fitting schemes and pixel area sizes
• State of the code:
  – Each thread worked on exactly one fitting
  – Pixel area analyzed by single thread is 7x7, 19x19, and expected to grow in future
  – Requires many registers and a lot of shared memory per thread block
  – Results in low utilization on modern GPUs
  – Multiple fitting schemes implemented with lots of code duplication

New parallelization
• One fitting is now computed by a whole thread block cooperatively
• Used CUB library for thread block-wide reductions
• Code quality
  – Used function templates to de-duplicate code between different fitting methods
  – Wrote scripts for testing and tuning of device functions and kernels
• Results
  – Currently, speedup of 5.8x to 6.6x over old GPU code on Nvidia GTX Titan X
  – Code can handle arbitrary pixel area per fitting
  – Makes it possible to do termination detection
  – Easier to maintain and extend the code with new fitting schemes
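To illustrate what a block-wide reduction does, here is a sequential Python simulation of the tree pattern in which each "thread" starts with one value (for instance one pixel's contribution to a fit) and the partial results are combined pairwise in a logarithmic number of steps. It is a sketch only: it does not use or mirror the CUB API, the name `block_reduce` is hypothetical, and on a GPU the inner loop would run in parallel with a barrier between steps.

```python
def block_reduce(values, op=lambda a, b: a + b):
    """Simulate a tree reduction over one value per thread.

    Assumes `values` is non-empty. `shared` stands in for shared memory.
    """
    shared = list(values)
    n = len(shared)
    stride = 1
    while stride < n:
        # On a GPU every iteration of this loop is a different thread,
        # and all threads synchronize before the stride doubles.
        for t in range(0, n - stride, 2 * stride):
            shared[t] = op(shared[t], shared[t + stride])
        stride *= 2
    return shared[0]


# One value per pixel of a 7x7 area, reduced by a single simulated block.
pixel_terms = list(range(1, 50))
total = block_reduce(pixel_terms)
largest = block_reduce(pixel_terms, op=max)
```

Because the block size no longer has to match a fixed per-thread pixel area, the same pattern works for any number of pixels per fitting.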

Lessons Learned

Software Engineering Practice
"Throw all good practices out of the window for the sake of high performance"
• Examples:
  – Thousands of code lines in a single function
  – Only acronyms as variable names
  – No comments or external documentation about the code
  – Unnecessary optimization
• Recommendations:
  – Start GPU code from simple code
  – Write and use tests
  – Write C++ and not C, whenever possible
  – Trust the compiler to handle simple stuff

Evaluating results
Results from the CPU and GPU codes are not bit-for-bit the same
• GPUs today implement the IEEE standard just like CPUs
• CPU compilers are sometimes more aggressive than GPU compilers
• Fused multiply-add rounds differently
• Floating-point arithmetic is not associative
Things to keep in mind
• It depends on the application whether bit-for-bit difference is a problem
• Testing with random input can give a false sense of correctness
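A small Python sketch of the non-associativity point, and of comparing results with a tolerance instead of bit-for-bit. The helper `results_match` and its default tolerance are assumptions made for this illustration; a suitable tolerance depends on the application. The last lines show one possible way in which hand-picked input can expose an ordering difference that inputs of similar magnitude would rarely trigger.

```python
import math

# The same three numbers summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
bitwise_equal = left == right


def results_match(cpu, gpu, rel_tol=1e-9, abs_tol=0.0):
    """Compare two result lists element-wise within a tolerance."""
    return len(cpu) == len(gpu) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(cpu, gpu)
    )


# Values of very different magnitude: the order of summation decides
# whether the small term survives.
big, small = 1e16, 1.0
sequential = (big + small) - big   # small is rounded away first
reordered = (big - big) + small    # small is added last
```

With the tolerance-based check, `left` and `right` count as matching, while `sequential` and `reordered` do not, which is the kind of case a test suite should include on purpose.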

Talking about performance
• Many computer scientists I know think
  – The only proper way to discuss GPU performance is to fully optimize and tune for both CPU and GPU
  – Then (and only then) you are allowed to say anything about GPU performance
  – Answering the question: "Which architecture performs the best for this application?"
• Many scientists from other fields that I work with just want to know:
  – "How much faster is that Matlab/Python code I gave you on the GPU?"

Summary
• Choose your starting point carefully
• High-performance and high-quality software can co-exist
• Whether small differences in results are a problem depends on the application
• When talking about performance, be very clear on what is compared to what
www.esciencecenter.nl
Ben van Werkhoven, b.vanwerkhoven@esciencecenter.nl

Project Partners
