Benson K. Muite, (1) Arvutiteaduse Institute, Tartu Ülikool, Estonia & (2) Pole Pole Enterprises, Kenya
Samar A. Aseeri, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

§ Motivation
§ Klein-Gordon equation
§ Method of Implementation
§ Hardware Description
§ Numerical Experiments
§ Outlook

§ While there has been significant effort on the numerical analysis of different computational methods, there is less work comparing the effectiveness of particular parallel numerical methods.
§ In this work, several numerical methods for solving the one dimensional Klein-Gordon equation on a single core are reviewed and their effectiveness is evaluated.
§ The Klein-Gordon equation is chosen as a mini-application because it is relatively simple, can be used to evaluate different time stepping and spatial discretization methods, and is representative of seismic wave solvers and weather codes, both of which use a large amount of high performance computing time.
§ As a prelude to a three dimensional study of parallel solvers, a comparison of solvers for the one dimensional Klein-Gordon equation on five architectures is presented, showing the effect of the discretization method on time to solution for a specified accuracy on a single core.
§ Such a comparison can be informative in choosing where to run an application to get the most cost efficient results.

§ The Klein-Gordon equation occurs as a modification of the linear Schrödinger equation that is consistent with special relativity.
§ The one dimensional Klein-Gordon equation takes the form:
$$u_{tt} = \Delta u - u + u^3$$
§ The cubic Klein-Gordon equation is a simple but non-trivial partial differential equation whose numerical solution has the main building blocks required for the solution of many other partial differential equations.
§ In our previous study, "Solving the Klein-Gordon equation using Fourier spectral methods: A benchmark test for computer performance", the library 2DECOMP&FFT was used in a Fourier spectral scheme to solve the Klein-Gordon equation, and strong scaling of the code was examined on thirteen different machines for a problem size of $512^3$.
§ The problem was chosen to be large enough to solve on a workstation, yet also of interest to solve quickly on a supercomputer, in particular for parametric studies.
§ We concluded that, unlike the LINPACK benchmark, a high ranking will not be obtained by simply building a bigger computer.

§ Aseeri et al. examined the performance of a Fourier pseudospectral solver for the three dimensional Klein-Gordon equation using a second order time stepping scheme:
$$\frac{u^{n+1} - 2u^n + u^{n-1}}{\delta t^2} = (\Delta - 1)\frac{u^{n+1} + 2u^n + u^{n-1}}{4} + |u^n|^2 u^n$$
where the nonlinear term equals $(u^n)^3$ for real valued $u$.
§ In high performance computing, finite difference methods on uniform grids are often used because they are easy to parallelize and have good scalability properties.
§ Typically, low order finite difference methods are used.
§ These may not be the most efficient choice.
§ The Klein-Gordon equation is a model problem on which one can test the efficiency of different solution methods.
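As an illustration of the second order scheme above, the following is a minimal one dimensional sketch in Python, not the authors' Fortran code. In Fourier space the operator $\Delta - 1$ becomes multiplication by $-(k^2 + 1)$, so each step is a pointwise update. The small recursive FFT, the function names (`fft`, `ifft`, `exact`, `solve`) and the default grid size and time step are assumptions made for this sketch; the test solution is the travelling wave $\sqrt{2}\,\mathrm{sech}((x - ct)/\sqrt{1 - c^2})$ with $c = 0.5$ on $[-9\pi, 9\pi)$.

```python
import cmath
import math

def fft(a, sign=-1):
    """Recursive radix-2 transform (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(a):
    """Inverse transform, scaled by 1/n."""
    return [z / len(a) for z in fft(a, sign=1)]

def exact(x, t, c=0.5):
    """Travelling wave solution of u_tt = u_xx - u + u^3."""
    return math.sqrt(2.0) / math.cosh((x - c * t) / math.sqrt(1.0 - c * c))

def solve(n=128, dt=0.01, t_end=1.0, half_length=9.0 * math.pi):
    """Advance to t_end and return the maximum error against exact()."""
    dx = 2.0 * half_length / n
    x = [-half_length + i * dx for i in range(n)]
    # Fourier symbol of (Laplacian - 1) for each mode: -(k^2 + 1)
    sym = []
    for j in range(n):
        k = (j if j < n // 2 else j - n) * math.pi / half_length
        sym.append(-(k * k) - 1.0)
    a = dt * dt / 4.0
    # Two starting levels taken from the exact solution (an assumption of this sketch)
    u_old = fft([exact(xi, -dt) for xi in x])
    u_now = fft([exact(xi, 0.0) for xi in x])
    steps = round(t_end / dt)
    for _ in range(steps):
        # Nonlinear term is evaluated in real space, then transformed
        real = [z.real for z in ifft(u_now)]
        nl = fft([v ** 3 for v in real])
        u_new = [
            ((2.0 + 2.0 * a * s) * un - (1.0 - a * s) * uo + dt * dt * f)
            / (1.0 - a * s)
            for s, un, uo, f in zip(sym, u_now, u_old, nl)
        ]
        u_old, u_now = u_now, u_new
    u = [z.real for z in ifft(u_now)]
    return max(abs(ui - exact(xi, steps * dt)) for ui, xi in zip(u, x))

if __name__ == "__main__":
    print("max error at t = 1:", solve())
```

Calling `solve` with a smaller `dt` or a larger `n` gives one way to see how the time and space discretizations each contribute to the error.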

§ This work gives a comparison of 4th, 6th and 8th order finite difference approximations, measured against an exact solution of the one dimensional Klein-Gordon equation.

Order   Approximation for $u_{xx}$
2nd     $\frac{1}{\delta x^2}\left(u_{i-1} - 2u_i + u_{i+1}\right)$
4th     $\frac{1}{\delta x^2}\left(-\frac{1}{12}u_{i-2} + \frac{4}{3}u_{i-1} - \frac{5}{2}u_i + \frac{4}{3}u_{i+1} - \frac{1}{12}u_{i+2}\right)$
6th     $\frac{1}{\delta x^2}\left(\frac{1}{90}u_{i-3} - \frac{3}{20}u_{i-2} + \frac{3}{2}u_{i-1} - \frac{49}{18}u_i + \frac{3}{2}u_{i+1} - \frac{3}{20}u_{i+2} + \frac{1}{90}u_{i+3}\right)$
8th     $\frac{1}{\delta x^2}\left(-\frac{1}{560}u_{i-4} + \frac{8}{315}u_{i-3} - \frac{1}{5}u_{i-2} + \frac{8}{5}u_{i-1} - \frac{205}{72}u_i + \frac{8}{5}u_{i+1} - \frac{1}{5}u_{i+2} + \frac{8}{315}u_{i+3} - \frac{1}{560}u_{i+4}\right)$

§ The sparse linear system is solved using a conjugate gradient algorithm, with the previous iterate as the initial guess.
§ For these programs, memory bandwidth is a limiting factor. To minimize the number of memory accesses, the coefficients are programmed using a matrix free approach.
§ The example programs are written in Fortran.
§ Accuracy is evaluated by comparing the numerical solution with the exact travelling wave solution
$$u = \sqrt{2}\,\mathrm{sech}\left(\frac{x - ct}{\sqrt{1 - c^2}}\right) \quad \text{for } x \in [-9\pi, 9\pi), \text{ and } c = 0.5,\ t \in [0, 5]$$
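The central difference weights for $u_{xx}$ can be checked numerically. The sketch below is an illustration in Python rather than the Fortran of the example programs. It stores the standard 2nd, 4th, 6th and 8th order weights as exact fractions, applies them matrix free on a periodic grid, and estimates the observed order of accuracy on $\sin(x)$. The names `STENCILS`, `second_derivative` and `max_error` and the grid sizes are assumptions of this sketch.

```python
from fractions import Fraction
import math

# Central-difference weights for u_xx: centre first, then offsets 1, 2, ...
# (each off-centre weight applies to both u[i-k] and u[i+k]).
STENCILS = {
    2: [Fraction(-2), Fraction(1)],
    4: [Fraction(-5, 2), Fraction(4, 3), Fraction(-1, 12)],
    6: [Fraction(-49, 18), Fraction(3, 2), Fraction(-3, 20), Fraction(1, 90)],
    8: [Fraction(-205, 72), Fraction(8, 5), Fraction(-1, 5),
        Fraction(8, 315), Fraction(-1, 560)],
}

def second_derivative(u, dx, order):
    """Matrix-free periodic approximation of u_xx; no matrix is stored."""
    w = [float(c) for c in STENCILS[order]]
    n = len(u)
    out = []
    for i in range(n):
        s = w[0] * u[i]
        for k in range(1, len(w)):
            s += w[k] * (u[(i - k) % n] + u[(i + k) % n])
        out.append(s / dx ** 2)
    return out

def max_error(n, order):
    """Maximum error for u = sin(x) on [0, 2*pi), where u_xx = -sin(x)."""
    dx = 2.0 * math.pi / n
    u = [math.sin(i * dx) for i in range(n)]
    uxx = second_derivative(u, dx, order)
    return max(abs(uxx[i] + u[i]) for i in range(n))

if __name__ == "__main__":
    for order in (2, 4, 6, 8):
        coarse, fine = max_error(16, order), max_error(32, order)
        # Halving dx should reduce the error by roughly 2**order
        print(order, coarse, fine, math.log2(coarse / fine))
```

Keeping the weights as fractions makes it easy to confirm that each stencil sums to zero, which is required for constants to have zero second derivative.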

§ The equations are discretized first in time, and then in space.
§ The time stepping algorithms used are:
§ Semi-Implicit Second Order Leap Frog Method
$$\frac{u^{n+1} - 2u^n + u^{n-1}}{\delta t^2} = (\Delta - 1)\frac{u^{n+1} + 2u^n + u^{n-1}}{4} + (u^n)^3$$
§ Semi-Implicit Fourth Order Leap Frog Method
$$\frac{u^{n+1} - 2u^n + u^{n-1}}{\delta t^2} = (\Delta - 1)u^n + (u^n)^3 + \frac{1}{12}\left[\left(\Delta - 1 + 3(u^n)^2\right)\left(u^{n+1} - 2u^n + u^{n-1}\right) + 6u^n\left(u^{n+1} - u^n\right)\left(u^n - u^{n-1}\right)\right]$$
§ Spatial Discretization:
§ In schemes that use the fast Fourier transform, time stepping is done in Fourier space and the nonlinear term is calculated in real space. Derivatives in spectral space are calculated by multiplying by the wave number. For the time stepping scheme, fixed point iteration is used to calculate the nonlinear term.
§ High order finite difference discretizations of the one dimensional Laplacian operator are given in the previous table. A second iteration is not required to compute the nonlinear term, since the time discretization requires a variable coefficient elliptic equation to be solved at each time step. The iterative conjugate gradient method is well suited to this, though multigrid methods can also be used.
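The combination of semi-implicit time stepping, finite differences and a matrix free conjugate gradient solve can be sketched as follows. This is an illustrative Python version of the second order leap frog scheme with the 4th order stencil, not the authors' Fortran implementation. Each step solves $\left[(1 + a)I - aD\right]u^{n+1} = \text{rhs}$ with $a = \delta t^2/4$ and $D$ the periodic finite difference Laplacian, starting the iteration from $u^n$. For this second order scheme the operator has constant coefficients. The function names, tolerance and default parameters are assumptions of this sketch.

```python
import math

W4 = (-5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0)  # 4th order u_xx weights: centre, +-1, +-2

def laplacian(u, dx, w=W4):
    """Periodic finite difference Laplacian, applied without storing a matrix."""
    n = len(u)
    return [(w[0] * u[i] + sum(w[k] * (u[i - k] + u[(i + k) % n])
                               for k in range(1, len(w)))) / dx ** 2
            for i in range(n)]

def helmholtz(u, dx, a):
    """Apply (1 + a) I - a D, the symmetric positive definite step operator."""
    d = laplacian(u, dx)
    return [(1.0 + a) * ui - a * di for ui, di in zip(u, d)]

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def conjugate_gradient(apply_a, b, x0, tol=1e-12, max_iter=200):
    """Unpreconditioned CG; x0 is the initial guess."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, apply_a(x))]
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def exact(x, t, c=0.5):
    """Travelling wave solution used as the reference."""
    return math.sqrt(2.0) / math.cosh((x - c * t) / math.sqrt(1.0 - c * c))

def solve(n=512, dt=0.01, t_end=1.0, half_length=9.0 * math.pi):
    """Advance to t_end and return the maximum error against exact()."""
    dx = 2.0 * half_length / n
    x = [-half_length + i * dx for i in range(n)]
    a = dt * dt / 4.0
    # Two starting levels taken from the exact solution (an assumption of this sketch)
    u_old = [exact(xi, -dt) for xi in x]
    u_now = [exact(xi, 0.0) for xi in x]
    steps = round(t_end / dt)
    for _ in range(steps):
        s = [2.0 * un + uo for un, uo in zip(u_now, u_old)]
        ds = laplacian(s, dx)
        rhs = [2.0 * un - uo + a * (dsi - si) + dt * dt * un ** 3
               for un, uo, dsi, si in zip(u_now, u_old, ds, s)]
        # Previous time level is the starting guess for the iteration
        u_new = conjugate_gradient(lambda v: helmholtz(v, dx, a), rhs, u_now)
        u_old, u_now = u_now, u_new
    return max(abs(ui - exact(xi, steps * dt)) for ui, xi in zip(u_now, x))

if __name__ == "__main__":
    print("max error at t = 1:", solve())
```

Because the step operator is close to the identity for small $\delta t$, the conjugate gradient iteration typically needs only a few passes per step in this setting.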

§ This work focuses on comparing the speed and accuracy of several high order finite difference spatial discretizations using a conjugate gradient linear solver and a fast Fourier transform based spatial discretization.
§ In addition, implementations using second and fourth order time stepping are included in the comparison.
§ The work uses accuracy-efficiency frontiers to compare the effectiveness of five hardware platforms: an ARM CPU, an AMD x86-64 CPU, two Intel x86-64 CPUs and a NEC SX-ACE vector processor.
§ The example programs are written in Fortran and can be found at https://github.com/bkmgit/KleinGordon1D
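One way to read an accuracy-efficiency frontier is as the set of runs that no other run beats in both compute time and error. The Python sketch below is an illustration under that assumption, not the authors' plotting code. The function names `frontier` and `fastest_within`, and the sample values in the usage section, are hypothetical placeholders and are not measurements from any of the machines.

```python
def frontier(runs):
    """Return the runs not dominated in (time, error), sorted by time.

    Each run is a tuple (compute_time_seconds, error, label).
    A run is kept only if its error is below that of every faster run.
    """
    best = []
    lowest_error = float("inf")
    for time, error, label in sorted(runs):
        if error < lowest_error:
            best.append((time, error, label))
            lowest_error = error
    return best

def fastest_within(runs, tolerance):
    """Return the fastest run whose error is at most tolerance, or None."""
    ok = [run for run in runs if run[1] <= tolerance]
    return min(ok) if ok else None

if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only
    sample = [
        (1.0, 1e-3, "run A"),
        (2.0, 1e-5, "run B"),
        (3.0, 1e-4, "run C"),  # slower and less accurate than run B
        (0.5, 1e-2, "run D"),
        (4.0, 1e-7, "run E"),
    ]
    for run in frontier(sample):
        print(run)
    print(fastest_within(sample, 1e-4))
```

`fastest_within` corresponds to the time to solution at a specified accuracy, which is the quantity compared across architectures in this work.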

§ Hazelhen is a Cray XC40 supercomputer with Intel Haswell E5-2680v3 chips with a nominal speed of 2.5 GHz and 30 MB L3 cache. Each node has 24 cores (2 chips with 12 cores each) with 136 GB/s bandwidth and 960 Gflops peak performance.
§ Kabuki is a NEC SX-ACE supercomputer. Each node has 4 cores with 256 GB/s bandwidth and 256 Gflops peak performance. Each chip has a nominal speed of 1 GHz and each core has 1 MB cache.
§ Ibex is a heterogeneous cluster with a mix of AMD and Intel CPUs and NVIDIA GPUs. It is made up of 864 nodes.
§ Isambard is a Cray XC50 supercomputer with Marvell ThunderX2 ARM chips with a nominal speed of 2.1 GHz and 32 MB L3 cache. Each node has 64 cores (2 chips with 32 cores each) with 320 GB/s bandwidth and 1130 Gflops peak performance.
§ Sites: Hazelhen and Kabuki at HLRS, Ibex at KAUST, Isambard in the UK.

[Figure panels: Hazelhen; Kabuki; Ibex (AMD); Ibex (Intel Skylake); Isambard (ARM)]

[Figure: L2 error at final time versus compute time (s) for Hazelhen, Kabuki, Ibex AMD, Ibex Skylake and Isambard]

§ High order methods can take advantage of multiple floating point units, so they do not require much more computation time than low order methods and give a smaller error.
§ Their use should be encouraged in the numerical approximation of partial differential equations. This also holds for spectral element methods.
§ For benchmarks based on mini-applications, compute resources to solution at a specified accuracy may be a better metric for evaluating performance than the speed of performing a fixed set of operations.
§ This would allow for architecture specific flexibility and can minimize cost to solution, though it may require some programming effort.

§ Questions, comments and collaborations are welcome
§ Websites: http://www.fft.report/, http://www.parallelbenchmark.com/, http://parallel.computer/
§ Email: samar.aseeri@kaust.edu.sa
§ Twitter: @samar_hpc
§ Upcoming Benchmark venues:
