Multiple Right-hand-side Implementation for DDαAMG
Shuhei Yamamoto
s.yamamoto@cyi.ac.cy
Simone Bacchio, Jacob Finkenrath
September 22, 2020
Outline
Introduction and Motivation
Overview of DDαAMG
SAP, Coarse-grid correction, Performance
Motivation, Implementation details, Scaling results
Basics, Parameters, Tuning plots
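\[ D = \frac{1}{2}\sum_{\mu=0}^{3}\left(\gamma_\mu(\Delta_\mu + \Delta^{*}_\mu) - a\,\Delta^{*}_\mu\Delta_\mu\right) + m \]
where
\[ \Delta_\mu\psi(x) = \frac{1}{a}\left(U_\mu(x)\,\psi(x+\hat{\mu}) - \psi(x)\right), \qquad \Delta^{*}_\mu\psi(x) = \frac{1}{a}\left(\psi(x) - U^{\dagger}_\mu(x-\hat{\mu})\,\psi(x-\hat{\mu})\right) \]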
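\[ \sum_{n=0}^{3} U^{n}_{\mu\nu}(x) \]
and U_{\mu\nu}(x) is the product of the four links U_\nu(x+\hat{\mu}), U_\mu(x+\hat{\nu}), U_\nu(x), U_\mu(x) around the plaquette at x in the \mu\nu-plane.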
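\[ x \leftarrow x + P D_c^{-1} R\,(b - Dx), \qquad \varepsilon \leftarrow (I - P D_c^{-1} R D)\,\varepsilon \]
\[ x_l \leftarrow x_l + P_l D_{l+1}^{-1} R_l\,(b - D_l x_l), \qquad \varepsilon_l \leftarrow (I_l - P_l D_{l+1}^{-1} R_l D_l)\,\varepsilon_l \]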
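The coarse-grid correction can be checked on a toy problem. The following is a minimal sketch, not the DDαAMG implementation: it assumes a prolongation P (pairwise aggregation), the restriction R = P^T and the Galerkin coarse operator D_c = R D P, none of which are specified on the slide. The 4x4 matrix, the helper names and the use of exact rational arithmetic are all illustrative choices.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def solve(A, b):
    """Solve A y = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

def coarse_grid_correction(D, P, b, x):
    """Return x + P Dc^{-1} R (b - D x), assuming R = P^T and Dc = R D P."""
    R = [list(col) for col in zip(*P)]
    Dc = matmul(matmul(R, D), P)
    r = [bi - di for bi, di in zip(b, matvec(D, x))]   # fine-grid residual
    ec = solve(Dc, matvec(R, r))                       # coarse-grid solve
    return [xi + ci for xi, ci in zip(x, matvec(P, ec))]

# Toy data (hypothetical): 1D Laplacian and pairwise aggregation.
D = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
P = [[1, 0], [1, 0], [0, 1], [0, 1]]

x_true = [3, 3, -2, -2]                # lies in the range of P
b = matvec(D, x_true)
x_new = coarse_grid_correction(D, P, b, [0, 0, 0, 0])
error = [xt - xn for xt, xn in zip(x_true, x_new)]
```

Because the initial error lies in the range of P, one correction removes it completely, and `error` is the zero vector. For a generic error only the restricted residual R(b - D x) vanishes after the correction, which is why the correction is combined with a smoother.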
\[ \varepsilon_l \leftarrow (I_l - P_l D_{l+1}^{-1} R_l D_l)\,(I_l - M_l D_l)^{j}\,\varepsilon_l, \]
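The product form of this error propagator follows from a short calculation. The sketch below assumes that \varepsilon_l = x_l^{*} - x_l is the error with respect to the exact solution D_l x_l^{*} = b, that M_l denotes one application of the smoother, that the superscript j counts smoothing steps, and that P_l is the prolongation (a symbol introduced here for illustration). The last line additionally assumes the Galerkin choice D_{l+1} = R_l D_l P_l.

```latex
\begin{align*}
\text{smoothing step:}\quad
  & x_l \leftarrow x_l + M_l\,(b - D_l x_l)
  \;\Longrightarrow\;
  \varepsilon_l \leftarrow (I_l - M_l D_l)\,\varepsilon_l,
  \qquad\text{since } b - D_l x_l = D_l \varepsilon_l, \\
\text{coarse-grid correction:}\quad
  & x_l \leftarrow x_l + P_l D_{l+1}^{-1} R_l\,(b - D_l x_l)
  \;\Longrightarrow\;
  \varepsilon_l \leftarrow (I_l - P_l D_{l+1}^{-1} R_l D_l)\,\varepsilon_l, \\
\text{$j$ smoothing steps, then correction:}\quad
  & \varepsilon_l \leftarrow (I_l - P_l D_{l+1}^{-1} R_l D_l)\,(I_l - M_l D_l)^{j}\,\varepsilon_l, \\
\text{Galerkin case:}\quad
  & \Pi_l := P_l D_{l+1}^{-1} R_l D_l
  \;\Longrightarrow\;
  \Pi_l^{2} = P_l D_{l+1}^{-1}\,(R_l D_l P_l)\,D_{l+1}^{-1} R_l D_l = \Pi_l .
\end{align*}
```

In the Galerkin case I_l - \Pi_l is therefore a projection: repeating the coarse-grid correction without smoothing in between gives no further reduction, and the remaining error components have to be handled by the smoother.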
[Figure: DDalphaAMG - Strong scaling - V = 160x80x80x80. Horizontal axis: SuperMUC-NG - SKL Nodes (log scale). Curves: single rhs - native; ideal scaling.]
Instruction Mix         1 rhs                  4 rhs                  8 rhs
                  SP Flops   DP Flops    SP Flops   DP Flops    SP Flops   DP Flops
128-bit           95.26%     86.59%      23.41%     4.99%       24.92%     3.60%
256-bit           2.58%      1.26%       60.68%     78.13%      74.02%     94.76%
Total             97.26%                 84.03%                 98.81%
[Figure, two panels, log-log axes: speed up vs. # nodes, and speed up of coarse grid vs. # nodes. Curves in each panel: 1 rhs, 4 rhs, 8 rhs.]
[Figure: time (s) vs. Orthoscheme iter. Curves: CGS, Block CGS, MGS, ICGS, Block ICGS, IMGS, Block IMGS.]
[Figure: time (s) vs. Deflating Eigvecs. Curves: BGMRES, GCR, DR, QR, QRDR.]
[Figure: time (s) vs. Residual at the bottom (log scale). Curves: r1 = 5.1e-01, 2.6e-01, 1.4e-01, 6.9e-02, 3.6e-02, 1.8e-02, 9.4e-03, 4.8e-03, 2.5e-03, 1.3e-03.]
[Figure: time (s) vs. Max Krylov Space Size. Curves: #restarts = 2, 4, 8, 9, 20, 40.]
[Figure: time (s) vs. Mu Factor. Curve: BGMRES with QR.]
[Figure: time (s) for 4 rhs and 8 rhs. Series: Non-block Solver, Block Solver.]