EDGE: Extreme-scale Discontinuous Galerkin Environment Alexander - PowerPoint PPT Presentation

EDGE: Extreme-scale Discontinuous Galerkin Environment Alexander Breuer, Alexander Heinecke (Intel), Yifeng Cui

Getting Started: Advection Equation q ( x, t ) t + v · q ( x, t ) x = 0 , v ∈ R • “Simplest” hyperbolic Partial Differential Equation (PDE) • Elastic wave equations similar: Linear system with variable coefficients

Getting Started: Fused Solver q ( x, t ) t + v · q ( x, t ) x = 0 , v ∈ R • Non-Fused: o 1 = s ( i 1 ) o 4 = s ( i 4 ) o 3 = s ( i 3 ) o 2 = s ( i 2 ) • Fused: O 4 = ( o 1 , o 2 , o 3 , o 4 ) = S 4 ( I 4 ) = S 4 ( i 1 , i 2 , i 3 , i 4 )

DOFs: Non-Fused vs. Fused fused runs fused runs fused runs 0 1 2 3 0 1 2 3 0 1 2 3 0 0 3 6 0 0 1 2 3 12 13 14 15 12 13 14 15 modes modes 1 1 4 7 1 4 5 6 7 16 17 18 19 16 17 18 19 2 2 5 8 2 8 9 10 11 20 21 22 23 20 21 22 23 0 1 2 0 1 2 elements elements

Key Advantages • Full vector operations, even 6.8 7 for sparse matrix operators relative arithmetic 6 4.9 5 4.0 intensity • Automatic memory alignment 3.3 3.1 4 2.7 2.5 2.4 3 2.0 2.0 • Read-only data shared 1.8 1.9 1.7 1.7 1.5 1.4 2 1.0 1.0 1.0 1.0 among all runs 1 0 1 2 4 8 16 1 2 4 8 16 1 2 4 8 16 1 2 4 8 16 1 • Lower sensitivity to latency (memory & network) Relative arithmetic intensities. Shown are convergence rates 2-5 and fusion of 2,4,8,16 simulations vs. non-fused for the elastic wave equations, using an ADER-DG solver. [ISC17]

� � � � � � � � � � � � � � � � � � � � � � � � “Similar Enough”: EDGE’s Approach 1 2 1. Identical mesh for all fused simulations 2. Identical simulations parameters: 3 4 1. Start and end time 2. Convergence rate 3. “Frequency” of wave field output, “frequency” and location of seismic receivers 5 6 3. Identical material parameters (velocity model) 4. “Sources”: � 1. Arbitrary initial DOFs 7 8 2. Kinematic sources: Fused or non-fused point sources 3. Spontaneous rupture: Identical friction law, other parameters � (e.g., nucleation, initial stresses, coefficients) arbitrary � mulations (SoA) with point sources at di � erent locations

Performance: LOH.1 • Orders: 2-6 (non-fused), 2-4 (fused) • Unstructured tetrahedral mesh: 350,264 elements LOH.1 Benchmark: Example mesh • Single node of Cori-II (68 core Intel and material regions [ISC16_1] Xeon Phi x200,   1 code-named Knights   0.8 0.6 Landing) 0.4 0.2 u (m/s) 0 • EDGE vs. SeisSol (GTS, git-tag -0.2 -0.4 201511) -0.6 reference -0.8 EDG Ǝ O4 -1 -1.2 0 1 2 3 4 5 6 7 8 9 time (s) Synthetic seismogram of EDGE for quantity u at the ninth seismic receiver located at (8647 m, 5764 m, 0) in red. The reference solution is shown in black. Detailed setup: [ISC17]

Fused Simulations: Speedup 4.60 EDGE over speedup: 4 SeisSol 2.87 3 1.82 2 1.24 0.91 0.96 0.80 0.74 1 0 O2C1 O2C8 O3C1 O3C8 O4C1 O4C8 O5C1 O6C1 configuration (order, #fused simulations) Speedup of EDGE over SeisSol (GTS, git-tag 201511). Convergence rates O2 − O6: single non-fused forward simulations (O2C1-O6C1). Additionally, per-simulation speedups for orders O2 − O4 when using EDGE’s full capabilities by fusing eight simulations (O2C8-O4C8). [ISC17]

Weak: Setup • Regular cubic mesh, 5 Tets per Cube, 4th order (P3) and 6th order (P5) 1 10 • Imitates convergence 0 10 -1 benchmark 10 O1 Q8 C1 O1 Q8 C4 -2 10 O1 Q8 C8 • 276K elements per node O2 Q8 C1 linf error -3 10 O2 Q8 C4 O2 Q8 C8 • 1-9000 nodes of Cori-II (9000 -4 O3 Q8 C1 10 O3 Q8 C4 nodes = 612,000 cores) O3 Q8 C8 -5 10 O4 Q8 C1 O4 Q8 C4 -6 10 O4 Q8 C8 O5 Q8 C1 -7 10 O5 Q8 C4 Convergence of EDGE in the L ∞ -norm. Shown are orders O1 − O5 for v (Q8) when utilizing O5 Q8 C8 -8 EDGE’s fusion capabilities with shifted initial conditions. For clarity, from the total of eight fused 10 50 25 20 10 5 3 1/3 2.5 2 simulations, only errors of the first (C1), fourth (C4) and last simulation (C8) are shown. [ISC17] edge length (m)

Weak: Results �� • O6C1 @ 9K nodes: �� 10.4 PFLOPS (38% �� of peak) �� • O4C8 vs. O4C1 @ �� 9K nodes:   �� 2.0x speedup �� Weak scaling study on Cori-II. Shown are hardware and non-zero peak efficiencies �� in flat mode. O denotes the order and C the number of fused simulations. [ISC17]

Strong: LOH.1 • Orders: 4 & 6 (non-fused), 4 (fused) • Unstructured tetrahedral LOH.1 Benchmark: Example mesh and material regions [ISC16_1] mesh: 172,386,915 elements 0.02 • 32-3200 nodes of Theta (64 core Intel Xeon Phi x200,   2 0.01 frequency (Hz) 1 code-named Knights Landing) 0 0.4 • 3200 nodes = 204,800 cores -0.01 -0.02 0 1 2 3 4 5 6 7 8 time (s) Time-frequency misfit for quantity u at the ninth seismic receiver located at (8647 m, 5764 m, 0) and in a frequency range between 0.13Hz and 5Hz. Detailed setup: [ISC17], Visualization: TF-MISFIT_GOF_CRITERIA, http://nuquake.eu

Strong: Results �� • O6C1 @ 3.2K nodes: �� 3.4 PFLOPS (40% of �� peak) �� • O4C8 vs. O4C1 @ �� 3.2K nodes:   �� 2.0x speedup �� Strong scaling study on Theta. Shown are hardware and non-zero peak efficiencies �� in flat mode. O denotes the order and C the number of fused simulations. [ISC17]

EDGE: Current and Upcoming • Sparse, fused assembly   • Elements: Line, rectangular quads, 3-node triangles, rectangular hexes, 4-node tets kernels for orders 5+ • Equations: Advection (FV+ADER-DG: 1D, • Kinematic Sources   2D, 3D), Shallow Water (FV: 1D), Elastic (Standard Rupture Format): Wave Equations (FV+ADER-DG: 2D, 3D) Support for fused and   • Parallelization: Assembly kernels for non-fused source descriptions WSM, SNB, HSW, KNC (non-fused), KNL • Spontaneous Rupture (fused & non-fused), OpenMP (custom), MPI (overlapping) Simulations • Continuity: Continuous Integration (sanity • Grouped Local Time Stepping checks), Continuous Delivery (automated • EDGEcut: Automated surface   convergence + benchmarks runs), and volume meshing automated code coverage, automated license checks, container bootstrap • Public in next few weeks:   http://dial3343.org • License: 3-clause BSD

EDGE: Extreme-scale Discontinuous Galerkin Environment Alexander - PowerPoint PPT Presentation

EDGE: Extreme-scale Discontinuous Galerkin Environment Alexander Breuer, Alexander Heinecke (Intel), Yifeng Cui Getting Started: Advection Equation q ( x, t ) t + v q ( x, t ) x = 0 , v R Simplest hyperbolic Partial

Cloud Cloud Cloud Cloud network Edge Edge Edge Edge as a Edge Edge Edge Edge Edge

Get the edge Get the edge Get the edge Get the edge Get the edge Get the edge Get the edge

EDGE: Extreme Scale Fused Seismic Simulations with the Discontinuous Galerkin Method Alexander

Novel non-hydrostatic numerical schemes based on discontinuous Galerkin finite element method

The Discontinuous Galerkin Method for the Compressible Navier-Stokes Equations Miloslav

A reconstruction-enhanced discontinuous Galerkin method for hyperbolic problems V aclav Ku

Adaptive discontinuous Galerkin method for tsunami modeling and prediction on a global scale Jan

ExaDG: High-order discontinuous Galerkin for the exa-scale Guido Kanschat 1 Katharina Kormann 2

Edge-based Segmentation Transform Hough Edge Tracking Linking Edge Detection Canny Edge

Extreme Heat Preparedness Objectives What is extreme heat ? How does it impact SF? What are the

2014: Extreme territories 2 2015: Extreme territories 3 2016: Extreme territories 4 2018:

Application of the discontinuous Galerkin method to 3D compressible RANS simulations of a high

A Multiscale Discontinuous Galerkin Method P. Bochev Computational Mathematics and Algorithms

An h -adaptive asynchronous spacetime discontinuous Galerkin method for TD analysis of complex

Spectral/Discontinuous Galerkin approach to fully kinetic simulations of plasma turbulence with

Adaptive Discontinuous Galerkin Methods on Polytopic Meshes Paul Houston School of Mathematical

The Developing Regulatory Style of the CNSC A Personal View Mike Taylor This Talk A quick

The Nuclear Option: Decommissioning (and waste) Climate Change: the Energy Conundrum College of

Update on RegCM4 developments and plans Filippo Giorgi Abdus Salam ICTP, Trieste, Italy Eighth

AllHandsMee2ng ESnet:NetworkingforScience March10,2010

Ramin Raziperchikolaei and Miguel A. Carreira-Perpi n an Electrical Engineering and

Ramin Raziperchikolaei Electrical Engineering and Computer Science University of California,

Source (A) Print: 3 01=NYT 02=WAPO 03=Chicago Tribune 04=Honolulu Advertiser 05=Wall St

Researching Television News in the Era Before Videotape: Election-Night Forecasting, 1952 Ira