REXI: breaking the time step constraint David Acreman, Jemma - PowerPoint PPT Presentation

REXI: breaking the time step constraint
David Acreman, Jemma Shipton, Colin Cotter and Beth Wingate

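Why REXI?
• Trends in processor design are towards an increasing number of cores
• Strong scaling of domain decomposition is limited
• The time step limits weak scaling: refining the mesh forces a smaller time step, so more steps are needed even when cores are added in proportion
• We need to find parallelism elsewhere
(Figure: 40 years of microprocessor trend data, https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/)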

Rational approximation of exponential integrator (REXI)
• Apply n forward Euler time steps
• Approximate the exponential
• α_k and β_k are pre-computed complex numbers
• Terms in the summation can be calculated in parallel
Schreiber et al, 2017, Beyond spatial scalability limitations with a massively parallel method for linear oscillatory problems, International Journal of High Performance Computing Applications
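The slide's equations did not survive extraction. A minimal sketch of the first idea, assuming a single oscillatory mode du/dt = iλu with u(0) = 1: n forward Euler steps give (1 + iλt/n)^n, which approaches exp(iλt) only slowly, and every step slightly amplifies the mode because |1 + iλΔt| > 1. The function names are illustrative, and λ = 0.0015 1/s is borrowed from the λ_MAX estimate on the later slides.

```python
import cmath


def forward_euler_exp(lam, t, n):
    """Apply n forward Euler steps of du/dt = 1j*lam*u to u(0) = 1.

    Each step multiplies u by (1 + 1j*lam*dt), so the result is
    (1 + 1j*lam*t/n)**n, which tends to exp(1j*lam*t) as n grows.
    """
    dt = t / n
    return (1 + 1j * lam * dt) ** n


def exact_exp(lam, t):
    """Exact solution exp(1j*lam*t) of du/dt = 1j*lam*u with u(0) = 1."""
    return cmath.exp(1j * lam * t)


if __name__ == "__main__":
    lam = 0.0015   # 1/s, illustrative value (lambda_MAX estimate from the slides)
    t = 30_000.0   # s, one REXI time step
    for n in (1_000, 10_000, 100_000):
        err = abs(forward_euler_exp(lam, t, n) - exact_exp(lam, t))
        growth = abs(1 + 1j * lam * t / n)  # > 1: forward Euler amplifies oscillations
        print(f"n={n:7d}  error={err:.2e}  |growth per step|={growth:.9f}")
```

With λt = 45, even n = 10 000 sequential steps leave an error of about 0.1. Approximating the exponential directly, as REXI does, avoids this long sequential chain of small steps.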

Rational approximation of exponential integrator (REXI)
• Approximate the exponential using Gaussian basis functions (parameters: number of Gaussians M and width of Gaussian h)
• Approximate the Gaussians as a sum of rational terms; a_l and μ are pre-computed constants (Haut et al, 2015)
• Terms in the sum over M can be calculated in parallel
• Expect convergence when hM > |tλ_MAX|
Haut et al, 2015, A high-order time-parallel scheme for solving wave propagation problems via the direct construction of an approximate time-evolution operator, IMA Journal of Numerical Analysis (2016) 36, 688–716
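A minimal sketch of the first step (the exponential built from Gaussians). It assumes the Gaussian of width h is exp(−x²/(4h²))/√(4π) and derives the coefficients b_m by Poisson summation; this normalisation and the helper names are assumptions of the sketch, and the second step (the rational approximation of each Gaussian with the pre-computed a_l and μ) is not reproduced. The 2M + 1 Gaussians are centred at the points mh for m = −M to M, so the approximation of exp(ix) only holds for |x| comfortably inside hM, which is the origin of the condition hM > |tλ_MAX|.

```python
import cmath
import math


def gaussian(x, h):
    """Gaussian of width h (normalisation assumed for this sketch)."""
    return math.exp(-x * x / (4.0 * h * h)) / math.sqrt(4.0 * math.pi)


def exp_ix_from_gaussians(x, h, M):
    """Approximate exp(1j*x) by 2M+1 Gaussians of width h centred at -m*h.

    With this normalisation, Poisson summation gives
    sum_m exp(-1j*m*h) * gaussian(x + m*h, h) ~= exp(-h**2) * exp(1j*x),
    so the coefficients are b_m = exp(h**2) * exp(-1j*m*h).
    The result is only accurate where the centres cover x (|x| < h*M, minus
    a few widths h).
    """
    total = 0j
    for m in range(-M, M + 1):
        b_m = math.exp(h * h) * cmath.exp(-1j * m * h)
        total += b_m * gaussian(x + m * h, h)
    return total


if __name__ == "__main__":
    h, M = 0.2, 224  # h*M = 44.8
    for x in (0.0, 40.0, 60.0):
        err = abs(exp_ix_from_gaussians(x, h, M) - cmath.exp(1j * x))
        print(f"x={x:5.1f}  error={err:.1e}")
```

Inside the covered interval (x = 40 < 44.8) the error is at rounding level; at x = 60 no Gaussian reaches x and the approximation collapses to zero.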

REXI study
• REXI results were presented in Schreiber et al (2017) for benchmark problems applied to the shallow water equations
• We also solve the shallow water equations, but with some significant differences:
  • Finite difference or spectral → finite elements (Firedrake)
  • Regular unit square → icosahedral sphere in physical co-ordinates
• We are looking for a speed-up over conventional time stepping
Schreiber et al, 2017, Beyond spatial scalability limitations with a massively parallel method for linear oscillatory problems, International Journal of High Performance Computing Applications

Convergence tests
• Initial conditions: polar wave
• Run REXI with a varying number of terms (M) and h = 0.2 (width of Gaussian)
• Check the L2 error norm against a reference solution (implicit midpoint method with a 25 s time step)
• Increase the REXI time step (t) and determine the number of terms (M) required to achieve convergence
• Expect: hM > |tλ_MAX|

h = 0.2, refinement level = 3
(Figure: U L2 error norm (log scale) vs number of REXI terms M for t = 7 500 s, 15 000 s, 30 000 s, 60 000 s and 120 000 s, with hM > |tλ_MAX| indicated)

t (ks)    M      λ_MAX (1/s)
7.5       64     0.0017
15        112    0.0015
30        224    0.0015
60        432    0.0014
120       864    0.0016

• Increasing t requires larger M (linear) ✅
• Increasing t increases error ✅
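The bound hM > |tλ_MAX| can be used in two directions: to predict the smallest M for a given t, or to read off the λ_MAX implied by an observed minimum M. A minimal sketch; the function names are illustrative, and treating λ_MAX as a single known number is an assumption.

```python
import math


def min_rexi_terms(t, lam_max, h):
    """Smallest integer M satisfying the expected bound h*M > |t*lam_max|."""
    return math.floor(abs(t * lam_max) / h) + 1


def inferred_lam_max(h, M, t):
    """lam_max implied if M is the smallest number of terms that converged: h*M/t."""
    return h * M / t


if __name__ == "__main__":
    h = 0.2
    # The required M grows linearly with the REXI time step t.
    for t in (7_500.0, 15_000.0):
        print(t, min_rexi_terms(t, 0.0015, h))
    # Converged (t, M) pairs from the h = 0.2 table.
    print(round(inferred_lam_max(h, 64, 7_500.0), 4))    # 0.0017
    print(round(inferred_lam_max(h, 112, 15_000.0), 4))  # 0.0015
```

For the first two rows of the table, 0.2 × 64 / 7 500 ≈ 0.0017 and 0.2 × 112 / 15 000 ≈ 0.0015, matching the λ_MAX column.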

hM is constrained, but what about h on its own?
(Figure: U L2 error norm vs number of REXI terms M for t = 30 000 s, refinement level = 3, and h = 0.1, 0.2, 0.4, 0.8, 1.6, 2.4 and 3.2)

h      M      h×M
0.2    224    44.8
0.4    112    44.8
0.8    64     51.2
1.6    32     51.2

hM > |tλ_MAX| ≈ 45 ⇒ λ_MAX ≈ 0.0015
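One way to see why h cannot be increased freely, in a scalar toy model only: with the Gaussian normalisation assumed in this sketch (exp(−y²/(4h²))/√(4π), coefficients exp(h²)·exp(−imh)), Poisson summation leaves a leading error term of size exp(h² − (2π − h)²) = exp(4π(h − π)). This is negligible for small h but reaches order one as h approaches π. The sketch keeps hM ≈ 51 fixed and evaluates the error at x = 30, inside the covered range. It omits the rational approximation of the Gaussians and the spatial discretisation, so it is not a model of the Firedrake runs; the names are illustrative.

```python
import cmath
import math


def gaussian_sum_exp(x, h, M):
    """exp(1j*x) from 2M+1 shifted Gaussians exp(-y**2/(4h**2))/sqrt(4*pi)."""
    norm = 1.0 / math.sqrt(4.0 * math.pi)
    total = 0j
    for m in range(-M, M + 1):
        b_m = math.exp(h * h) * cmath.exp(-1j * m * h)
        total += b_m * norm * math.exp(-((x + m * h) ** 2) / (4.0 * h * h))
    return total


def aliasing_estimate(h):
    """Leading Poisson-summation error: exp(h**2 - (2*pi - h)**2) = exp(4*pi*(h - pi))."""
    return math.exp(4.0 * math.pi * (h - math.pi))


if __name__ == "__main__":
    x = 30.0  # well inside the covered interval |x| < h*M
    for h, M in [(0.2, 256), (0.4, 128), (0.8, 64), (1.6, 32), (2.4, 22), (3.2, 16)]:
        err = abs(gaussian_sum_exp(x, h, M) - cmath.exp(1j * x))
        print(f"h={h:3.1f} M={M:3d} hM={h * M:5.1f} "
              f"error={err:.1e} estimate={aliasing_estimate(h):.1e}")
```

In this toy model h = 1.6 still gives an error of order 1e-9, while h = 3.2 gives an error of order one.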

Can we use h = 1.6 with a larger t?
(Figure, four panels for t = 30 000 s, 60 000 s, 120 000 s and 240 000 s: U L2 error norm vs number of REXI terms M for h = 0.1, 0.2, 0.4, 0.8 and 1.6)

What about resolution (λ_MAX)?
(Figure, four panels for refinement levels 2, 3, 4 and 5: U L2 error norm vs number of REXI terms M for h = 0.1, 0.2, 0.4, 0.8 and 1.6)
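Finer meshes resolve faster waves, so λ_MAX, and with it the required M, should grow with refinement level. A rough sketch under two stated assumptions: each refinement of the icosahedral mesh splits every triangle into four (so a level-r mesh has 20·4^r cells and the grid spacing roughly halves per level), and λ_MAX scales like 1/Δx. The reference value λ_MAX ≈ 0.0015 at level 3 comes from the h study above; the scaling law itself is an assumption, not a result from the slides, and the function names are illustrative.

```python
import math


def icosahedral_cells(level):
    """Triangles in an icosahedral sphere mesh: 20 faces, each refinement splits a triangle into 4."""
    return 20 * 4 ** level


def scaled_lam_max(lam_ref, level_ref, level):
    """ASSUMPTION: lam_max ~ 1/dx, and dx halves with each refinement level."""
    return lam_ref * 2.0 ** (level - level_ref)


def min_rexi_terms(t, lam_max, h):
    """Smallest integer M satisfying h*M > |t*lam_max|."""
    return math.floor(abs(t * lam_max) / h) + 1


if __name__ == "__main__":
    lam_ref, level_ref = 0.0015, 3  # estimate from the h study at refinement level 3
    t, h = 30_000.0, 0.2
    for level in (2, 3, 4, 5):
        lam = scaled_lam_max(lam_ref, level_ref, level)
        print(level, icosahedral_cells(level), lam, min_rexi_terms(t, lam, h))
```

Under these assumptions each extra refinement level doubles the minimum M at fixed t and h.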

Scaling tests
• Measure the time for a single REXI step using a PyOP2 timed stage (averaged over three runs, no I/O in the timed region)
• h = 0.2 and 1.6, minimum M for convergence, refinement level 3
• Single-node scaling on Archer: 24 cores per node (2 sockets × 12 cores)
• Specify process placement to ensure MPI processes are distributed evenly between sockets
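The measurement above can be sketched as a small serial harness: time one step several times, average, and convert to the "model time / wallclock time" metric plotted on the scaling slides. The study used a PyOP2 timed stage under MPI; this sketch uses time.perf_counter instead, and fake_rexi_step is a hypothetical stand-in for the real solve.

```python
import statistics
import time


def time_step(step, n_runs=3):
    """Average wallclock time of step() over n_runs calls (no I/O inside the timed region)."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def model_time_ratio(model_time, wallclock):
    """Model time advanced per second of wallclock time."""
    return model_time / wallclock


def speedup_and_efficiency(ratio_1, ratio_p, p):
    """Parallel speed-up and efficiency on p processes relative to one process."""
    speedup = ratio_p / ratio_1
    return speedup, speedup / p


if __name__ == "__main__":
    def fake_rexi_step():
        # Hypothetical placeholder for one REXI step of t = 30 000 s.
        sum(i * i for i in range(100_000))

    wall = time_step(fake_rexi_step)
    print(f"model time / wallclock: {model_time_ratio(30_000.0, wall):.0f}")
    s, e = speedup_and_efficiency(115.0, 1300.0, 24)
    print(f"speed-up {s:.1f}, efficiency {e:.0%}")
```

Applied to the reference-solution figures quoted on the scaling plots (115 on 1 process, 1300 on 24), this gives a speed-up of about 11.3, or roughly 47% parallel efficiency.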

h = 0.2, refinement level = 3
(Figure: model time / wallclock time vs number of processors (up to 24) for t = 7 500 s, M = 64; t = 15 000 s, M = 112; t = 30 000 s, M = 224; t = 60 000 s, M = 432; t = 120 000 s, M = 864)
Reference solution: 115 (1 proc) → 1300 (24 procs)

h = 1.6, refinement level = 3
(Figure: model time / wallclock time vs number of processors (up to 24) for t = 30 000 s, M = 32; t = 60 000 s, M = 64; t = 120 000 s, M = 112; t = 240 000 s, M = 224)
Reference solution: 115 (1 proc) → 1300 (24 procs)

Future work
• What value of h should we use? Does it depend on the initial conditions (or other factors)?
• How do we trade off speed and accuracy?
  • For a given spatial resolution (which affects λ_MAX) and t:
    • Determine the maximum h and minimum M for convergence (hM > |tλ_MAX|)
    • Measure the error against the reference solution and the time to solution
• Improve time to solution by reducing MPI overhead: examine in more detail with a profiler (e.g. determine load balance)
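The speed/accuracy trade-off can be framed as a small parameter sweep: for each candidate h, take the minimum M from hM > |tλ_MAX| and keep the cheapest choice whose error is acceptable. In this sketch the error is a toy stand-in, the scalar Gaussian-sum aliasing term exp(4π(h − π)); a real study would replace it with the measured error against the reference solution. M, the number of independent terms, is used as a proxy for cost. All names and the tolerance are illustrative assumptions.

```python
import math


def min_rexi_terms(t, lam_max, h):
    """Smallest integer M satisfying h*M > |t*lam_max|."""
    return math.floor(abs(t * lam_max) / h) + 1


def toy_error(h):
    """ASSUMED error model: leading aliasing term of the scalar Gaussian sum."""
    return math.exp(4.0 * math.pi * (h - math.pi))


def choose_h(t, lam_max, candidate_hs, tol):
    """Return (h, M) with the fewest terms among h whose toy error is below tol, else None."""
    feasible = [(min_rexi_terms(t, lam_max, h), h)
                for h in candidate_hs if toy_error(h) < tol]
    if not feasible:
        return None
    M, h = min(feasible)
    return h, M


if __name__ == "__main__":
    hs = [0.1, 0.2, 0.4, 0.8, 1.6, 2.4, 3.2]
    print(choose_h(30_000.0, 0.0015, hs, tol=1e-6))  # (1.6, 29)
```

Loosening the tolerance lets a larger h through and lowers M; the ranking would change once the toy error is replaced by measured errors and wallclock times.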

Build with the Intel toolchain and run the DG advection example under an MPI profiler
(Figure: profiler timeline; each line is an MPI process, showing time in MPI_Bcast and communication between processes)
