Hybrid Parallelization of Assembly in DUNE
Jö Fahlke, Christian Engwer
Westfälische Wilhelms-Universität Münster
July 16, 2014

ExaDune
Dune+PDELab:
◮ Framework for solving PDEs
◮ Already good at MPI parallelization
Exa-scale computers:
◮ Arriving around 2018
◮ Many processing units
◮ Little memory per processing unit
◮ Accelerator hardware (e.g. GPU, MIC)

Outline
◮ Intro
◮ Threading
◮ Vectorization
◮ Conclusions
◮ Outlook

Anatomy of a PDE solver
Two components eat most of the CPU cycles and benefit most from parallelization:
◮ Linear algebra (Steffen Müthing)
◮ Assembling the linear system (this talk)

DUNE-PDELab design
[Figure: layered architecture diagram. Labels: applications; pdelab (user code, local function space, local operator, assembler, grid function space, grid operator, time integrator); linear algebra (petsc); core modules (grid, geometry, istl, localfunctions, common)]

Challenges for PDE frameworks
◮ Assembly kernels often written by the framework user
◮ User is not an expert in programming
  ⇒ We can't rely on obscure languages.
◮ Avoid multiple versions of a kernel
  ⇒ Kernels must be portable.
◮ Keep it open
  ⇒ Avoid relying on proprietary languages and libraries.

Threading

Why Threading?
◮ Reduced memory overhead compared to message passing.
◮ Reduced communication overhead.

Challenges in threading
◮ Choosing a partitioning scheme.
◮ Choosing a race avoidance strategy.

Partitioning Strategies
[Figure: example partitionings; panels: strided, ranged, sliced, tensor]
◮ Strided: calculated on the fly, but only efficient with random access iterators.
◮ Ranged: memory efficient, needs preprocessing or random access iterators.
◮ General (sliced, tensor): not memory efficient, but enables coloring or tuning for small surface area.
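The difference between strided and ranged partitioning can be sketched on plain element indices. This is a minimal illustration, not the DUNE implementation: the helpers `strided_indices` and `ranged_block` and the `Range` struct are hypothetical names, and the mesh is reduced to the index set 0..n-1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: indices visited by thread `tid` out of `nthreads`
// under strided partitioning (tid, tid + nthreads, tid + 2*nthreads, ...).
// Stepping by `nthreads` is cheap only if the underlying iterator is
// random access; here plain indices stand in for mesh elements.
std::vector<std::size_t> strided_indices(std::size_t n, std::size_t nthreads,
                                         std::size_t tid) {
  std::vector<std::size_t> out;
  for (std::size_t i = tid; i < n; i += nthreads) out.push_back(i);
  return out;
}

// Hypothetical helper: half-open range [begin, end) of contiguous indices
// owned by thread `tid` under ranged partitioning. Only two numbers per
// thread need to be stored. The first n % nthreads threads get one extra.
struct Range {
  std::size_t begin;
  std::size_t end;
};

Range ranged_block(std::size_t n, std::size_t nthreads, std::size_t tid) {
  const std::size_t base = n / nthreads;
  const std::size_t extra = n % nthreads;
  const std::size_t begin = tid * base + (tid < extra ? tid : extra);
  const std::size_t end = begin + base + (tid < extra ? 1 : 0);
  return {begin, end};
}
```

With 10 elements and 3 threads, thread 1 visits 1, 4, 7 in the strided scheme and the contiguous block [4, 7) in the ranged scheme.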

Data Access Strategies
Races can occur when accumulating into global data structures. Strategies to avoid races:
◮ batched: batched writeback with a global lock.
◮ elock: one lock per mesh entity.
◮ coloring: partitions of the same color do not “touch”.
Other strategies not considered here:
◮ global locking
◮ race-free schemes
  ◮ not considered since they are often not possible,
  ◮ but tried by R. Klöfkorn (Proceedings of ALGORITMY 2012).
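As a rough sketch of the elock idea, the following assumes a 1D mesh in which element e touches vertices e and e+1, and in which the locked "entities" are the vertices. The function name `assemble_elock` and the unit contribution per element are illustrative assumptions, not taken from the talk or from DUNE.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Assumed toy problem: every element adds 1.0 to each of its two vertices.
// Each thread handles a contiguous range of elements; a vertex shared by
// two ranges is protected by its own mutex, so no global lock is needed.
std::vector<double> assemble_elock(std::size_t n_elems, std::size_t n_threads) {
  std::vector<double> global(n_elems + 1, 0.0);
  std::vector<std::mutex> locks(n_elems + 1);  // one lock per vertex

  auto work = [&](std::size_t begin, std::size_t end) {
    for (std::size_t e = begin; e < end; ++e) {
      const std::size_t verts[2] = {e, e + 1};
      for (std::size_t v : verts) {
        std::lock_guard<std::mutex> guard(locks[v]);
        global[v] += 1.0;  // accumulate the local contribution
      }
    }
  };

  std::vector<std::thread> pool;
  for (std::size_t t = 0; t < n_threads; ++t) {
    pool.emplace_back(work, n_elems * t / n_threads,
                      n_elems * (t + 1) / n_threads);
  }
  for (auto& th : pool) th.join();
  return global;
}
```

Interior vertices end up with the value 2.0 and the two boundary vertices with 1.0, regardless of the number of threads.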

Test Setup
◮ Stationary advection problem in $\Omega$: $\nabla \cdot (-A(x)\nabla u + b(x)u) + c(x)u = f$, with appropriate Dirichlet and outflow boundary conditions.
◮ DG (weighted SIPG).
◮ Orthonormalized $P_k$ basis.
◮ Wall time for assembly of residual and Jacobian.
◮ As many threads as possible.

Results
Table: PHI, degree=1, Jacobian, threads=240. Runtime per DoF and (efficiency).
          batched         elock          colored
strided   1048 ns (7%)    350 ns (22%)   -
ranged     712 ns (10%)   209 ns (37%)   -
sliced     712 ns (10%)   212 ns (37%)   209 ns (36%)
tensor     715 ns (10%)   208 ns (37%)   211 ns (36%)
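The slides do not state how the efficiency in parentheses is computed. Assuming the usual definition of parallel efficiency (single-thread runtime divided by thread count times parallel runtime), and assuming the single-thread PHI runtime of 18.92 µs listed for degree 1 on the CPU vs. PHI slide is the right reference value, the sliced/elock entry can be reproduced as follows:

```latex
E = \frac{T_1}{p \, T_p}
  = \frac{18.92\,\mu\mathrm{s}}{240 \times 0.212\,\mu\mathrm{s}}
  \approx 0.37
```

Here $T_1$ is the runtime per DoF with one thread, $p = 240$ the number of threads, and $T_p = 212$ ns the runtime per DoF with $p$ threads.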

CPU vs. PHI
Runtimes per DoF for degree k, Jacobian, sliced partitioning, entity-wise locking; t_n is the runtime with n threads.
k   CPU t_1   CPU t_10   CPU t_20   PHI t_1     PHI t_60   PHI t_120   PHI t_240
0   4.59 µs   0.74 µs    0.54 µs    59.57 µs    1.33 µs    1.17 µs     1.20 µs
1   1.38 µs   0.22 µs    0.17 µs    18.92 µs    0.37 µs    0.27 µs     0.26 µs
2   1.10 µs   0.15 µs    0.12 µs    17.12 µs    0.32 µs    0.21 µs     0.19 µs
3   1.29 µs   0.16 µs    0.13 µs    19.84 µs    0.36 µs    0.23 µs     0.20 µs
4   1.52 µs   0.18 µs    0.15 µs    -           -          -           -
5   1.81 µs   0.21 µs    0.18 µs    -           -          -           -
◮ CPU still better than PHI.
◮ Unfair comparison: SIMD units are 128 bit for CPU and 512 bit for PHI.
  ⇒ Requires vectorization.
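The SIMD widths quoted above translate directly into lane counts for double precision. A small sketch, assuming 8-byte doubles; the function name `simd_lanes` is made up for illustration:

```cpp
#include <cstddef>

// Number of scalars that fit into one SIMD register.
// register_bits: width of the SIMD unit in bits (128 for the CPU and
//                512 for the PHI, as quoted on the slide).
// scalar_bytes:  size of one scalar in bytes (8 for an IEEE double).
constexpr std::size_t simd_lanes(std::size_t register_bits,
                                 std::size_t scalar_bytes) {
  return register_bits / (8 * scalar_bytes);
}
```

`simd_lanes(128, 8)` gives 2 and `simd_lanes(512, 8)` gives 8, so scalar code leaves a larger share of the PHI's arithmetic capability unused than of the CPU's.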

Conclusions
◮ Useful partitionings:
  ranged: memory efficient
  general: as a backup, allows coloring
◮ Useful data access strategies:
  entity-wise locking: general, good performance
  coloring: good performance, needs particular partitioning
◮ Need vectorization.
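Why coloring needs a particular partitioning can be seen in a toy setting. The sketch below assumes a 1D mesh (element e touches vertices e and e+1) cut into contiguous slices, colors the slices alternately, and processes one color at a time. The name `assemble_colored`, the two-color scheme, and the one-thread-per-slice launch are simplifications for illustration, not the scheme used in the talk.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Precondition (assumed): 1 <= n_slices <= n_elems, so every slice is
// non-empty. Slices of the same color are then separated by at least one
// slice of the other color and share no vertex, so they can accumulate
// into `global` concurrently without any lock.
std::vector<double> assemble_colored(std::size_t n_elems, std::size_t n_slices) {
  std::vector<double> global(n_elems + 1, 0.0);

  auto work = [&](std::size_t begin, std::size_t end) {
    for (std::size_t e = begin; e < end; ++e) {
      global[e] += 1.0;      // left vertex of element e
      global[e + 1] += 1.0;  // right vertex of element e
    }
  };

  for (std::size_t color = 0; color < 2; ++color) {
    std::vector<std::thread> pool;
    for (std::size_t s = color; s < n_slices; s += 2) {
      pool.emplace_back(work, n_elems * s / n_slices,
                        n_elems * (s + 1) / n_slices);
    }
    // Joining here acts as a barrier between the two colors.
    for (auto& th : pool) th.join();
  }
  return global;
}
```

A strided partitioning could not be colored this way, because every partition would share vertices with every other one.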

Vectorization

Zoology of Devices
◮ CPU
  ◮ Lots of smart heuristics
  ◮ Good single-thread performance
  Typical stats (1 UMA node):
  ◮ 10 cores, 20 threads
  ◮ 2 SIMD lanes (double precision)
  ◮ 48 GiB memory
◮ Phi
  ◮ Simplified CPU
  ◮ Needs a host system for housing
  ◮ Main program can run on host (with offloading) or native on device
  Typical stats:
  ◮ 60 cores, 240 threads
  ◮ 8 SIMD lanes (double precision)
  ◮ 8 GiB memory
◮ GPU
  ◮ Very basic processor
  ◮ Needs host system for housing and scheduling
  ◮ Main program runs on host, offloading to device
  Typical stats:
  ◮ 2000–3000 cores currently*
  ◮ 32 SIMT lanes
  ◮ 5–12 GiB memory
  *Meaning of core differs from CPU/PHI

Programming Approaches I
Unsuitable approaches:
◮ Intrinsics (non-portable)
◮ Special language (needs special compiler)
◮ Autovectorizer (difficult to drive portably)
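To illustrate the portability problem with intrinsics, here is a hedged sketch of y += a*x written with SSE2 intrinsics. The function name `axpy` and the choice of kernel are illustrative and not taken from the talk. The vector path only exists where the compiler defines `__SSE2__`, and it is hard-wired to 2 lanes; a 512-bit device would need a second version of the same kernel.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics, x86 only
#endif

// Computes y[i] += a * x[i] for i in [0, n).
void axpy(double a, const double* x, double* y, std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE2__)
  // 128-bit path: two doubles per iteration, unaligned loads and stores.
  const __m128d va = _mm_set1_pd(a);
  for (; i + 2 <= n; i += 2) {
    const __m128d vx = _mm_loadu_pd(x + i);
    const __m128d vy = _mm_loadu_pd(y + i);
    _mm_storeu_pd(y + i, _mm_add_pd(vy, _mm_mul_pd(va, vx)));
  }
#endif
  // Scalar remainder, and the only path on targets without SSE2.
  for (; i < n; ++i) y[i] += a * x[i];
}
```

The scalar fallback keeps the function correct everywhere, but the kernel author now maintains two code paths, which is the "multiple versions of a kernel" situation the talk wants to avoid.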
