Challenges in achieving scalable and robust linear solvers
- L. Grigori
Alpines Inria Paris and LJLL, Sorbonne University with H. Al Daas, MPI Magdebourg
- P. Jolivet, CNRS IRIT
- P. H. Tournier, LJLL, CNRS, Alpines, Sorbonne University
Challenges in achieving scalable and robust linear solvers L. - - PowerPoint PPT Presentation
Challenges in achieving scalable and robust linear solvers L. Grigori Alpines Inria Paris and LJLL, Sorbonne University with H. Al Daas, MPI Magdebourg P. Jolivet, CNRS IRIT P. H. Tournier, LJLL, CNRS, Alpines, Sorbonne University September
2 of 49
Motivation of our work
Typically, each iteration of a Krylov solver requires Sparse matrix vector product
Dot products for orthogonalization
When solving complex linear systems arising, e.g. from large discretized
wrt jumps in coefficients / partitioning into irregular subdomains, e.g. one
A few small eigenvalues hinder the convergence of iterative methods 3 of 49
Motivation of our work
Robust preconditioners that guarantee the condition number of
Robust multilevel Additive Schwarz, using Geneo framework Enlarged Krylov methods reduce communication, increase arithmetic intensity - compute sparse matrix-set of vectors product. 4 of 49
Motivation of our work
Robust preconditioners that guarantee the condition number of
Robust multilevel Additive Schwarz, using Geneo framework Enlarged Krylov methods reduce communication, increase arithmetic intensity - compute sparse matrix-set of vectors product. 4 of 49
Recap on Additive Schwarz methods
DOFs partitioned into {Ω1j}N1
j=1 overlapping domains of dimensions
R1j ∈ Rn1j ×n restriction operator, R1j = In(Ω1j, :) A1j ∈ Rn1j ×n1j : restriction of A to domain j, A1j = R1jART
1j
{D1j}N1
j=1: algebraic partition of unity , In = N1 j=1 RT 1j D1jR1j
A1
1 1 1 1 1/2 1 1 1/2
5 of 49
Recap on Additive Schwarz methods
Original idea from Schwarz algorithm at the continuous level (Schwarz
Symmetric formulation, Additive Schwarz (1989) defined as
AS := N1
1j A−1 1j R1j
Restricted Additive Schwarz (Cai & Sarkis 1999) defined as
RAS := N1
1j D1jA−1 1j R1j
In practice, RAS more efficient than AS 6 of 49
Recap on Additive Schwarz methods
Consider an Alternating Min Max layers ordering for IC0 Duplicate data on domain j, include all DOFs at distance 2 plus a
With LjLT
j the IC0 factor of
IC0 := N1
1j D1j(LjLT j )−1R1j
13 14 15 16 17 18 19 20 2 47 97 52 63 64 65 66 67 68 69 70 21 22 23 24 25 26 27 28 3 48 98 53 71 72 73 74 75 76 77 78 29 30 31 32 33 34 35 36 4 49 99 54 79 80 81 82 83 84 85 86 5 6 7 8 9 10 11 12 1 46 96 51 55 56 57 58 59 60 61 62 38 39 40 41 42 43 44 45 37 50 100 87 88 89 90 91 92 93 94 95 138 139 140 141 142 143 144 145 137 150 200 187 188 189 190 191 192 193 194 195 105 106 107 108 109 110 111 112 101 146 196 151 155 156 157 158 159 160 161 162 113 114 115 116 117 118 119 120 102 147 197 152 163 164 165 166 167 168 169 170 121 122 123 124 125 126 127 128 103 148 198 153 171 172 173 174 175 176 177 178 129 130 131 132 133 134 135 136 104 149 199 154 179 180 181 182 183 184 185 186 + ghost data for backward substitution + ghost data for forward substitution
For structured 2D grids, RAS with IC0 in subdomains and overlap 2
with S. Moufawad and S. Cayrols (proofs in their Phds thesis) 7 of 49
Recap on Additive Schwarz methods
AS,1A) ≤ k1c
8 of 49
Recap on Additive Schwarz methods
1 AV1.
AS,2 := V1 (A2)−1 V T 1 + N1
1j (A1j)−1 R1j
1j }}1≤i≤N1 of
AS,2) ≤ k1c + 1
9 of 49
Recap on Additive Schwarz methods
Nicolaides 87 (CG): kernel of the operator (constant vectors)
1j D1jR1j1
Other early references: [Morgan 92] (GMRES), [Chapman, Saad 92],
Estimations of eigenvectors corresponding to smallest eigenvalues /
Geneo [Spillane et al., 2014]: through solving local Gen EVPs, bounds
(V1): size of the coarse space AS-ZEM Nicolaides with rigid body motions, 6 per subdomain Results for 3D elasticity problem provided by F. Nataf
10 of 49
Recap on Additive Schwarz methods
Nicolaides 87 (CG): kernel of the operator (constant vectors)
1j D1jR1j1
Other early references: [Morgan 92] (GMRES), [Chapman, Saad 92],
Estimations of eigenvectors corresponding to smallest eigenvalues /
Geneo [Spillane et al., 2014]: through solving local Gen EVPs, bounds
(V1): size of the coarse space AS-ZEM Nicolaides with rigid body motions, 6 per subdomain Results for 3D elasticity problem provided by F. Nataf
10 of 49
A robust multilevel additive Schwarz preconditioner
11 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
nBBvnB,
nBBvnB ≤ (RvnB)⊤ A (RvnB) = v ⊤Av
12 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
1j u1jk = λ1jkD1jA1jD1ju1jk
1j
N1
1j Z1j, Z1j = span {u1jk | λ1jk < 1/τ}
AS,2Geneo
1 AV1
1 + N1
1j A−1 1j R1j
AS,2GeneoA
13 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
Challenge: can we find an algebraic stable decomposition ? We call { ˜
j=1, ˜
N1
First condition means that ˜
I,Γ ∈ Rn1j ×n1j
j
I,Γ
If these two conditions are satisfied, the construction of the coarse space can be
In Geneo, k = k1, max number of domains that share a vertex. 14 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
1j u1jk = λ1jkD1jA1jD1ju1jk
N1
1j Z1j, Z1j = span {u1jk | λ1jk < 1/τ}
AS,2ALSP
1 AV1
1 + N1
1j A−1 1j R1j
AS,2ALSPA
15 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
For each domain j, we impose the condition
˜
Consider domain 1, where B11 corresponds to interior DOFs, B22 the
We note S(B22) the Schur complement with respect to B22,
11 B12 − B23B−1 33 B32.
16 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
11 B12u ≤ uT ˜
33 B32
11 B12 − B23B−1 33 B32.
17 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
11 B12u ≤ uT ˜
33 B32
11 B12,
11 B12 − B23B−1 33 B32. Then we have,
11 B12 = ˜
33 B32
18 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
j , ˜
j be two SPSD splittings of A associated to domain j and
ALSP, S2 ALSP associated coarse subspaces, eq (2). If
j u ≤ uT ˜
j u, ∀u ∈ Rn, j = 1, . . . N1,
ALSP) ≥ dim(S2 ALSP)
dim(S1,ALSP) associated to ˜
33 B32
dim(S1,ALSP) associated to ˜
11 B12
19 of 49
A robust multilevel additive Schwarz preconditioner Theory of a class of robust two level methods in algebraic setting
Number of deflated vectors per subdomain in GenEO (black) versus
3D elasticity problem, 128 domains
20 40 60 80 100 120 Subdomain number 2 4 6 8 10 12 14 16 Number ofdeflated vectors El3d 128 subdomains uC GenEO
20 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Multispace and multilevel BDDC [Mandel, Sousedik, Dohrmann’08] Algebraic multilevel additive Schwarz method [Borzi, De Simone, Di
Multilevel Schwarz domain decomposition solver for elasticity problems
Multilevel balancing domain decomposition at extreme scales [Badia,
Three level method based on applying recursively the two-level
21 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Given coarse space S1, S1 = span (V1), coarse grid matrix A2 = V ⊤
1 AV1,
MAS = M−1 A1,MAS = V1A−1 2 V T 1 + N1
1j A−1 1j R1j,
For level i = 2 : L, define preconditioner M−1
i
Ai,MAS = ViA−1 i+1V T i
Ni
ij A−1 ij Rij,
Coarse space Si chosen such that condition number of M−1
i,MASAi is
Si = span (Vi), coarse grid matrix Ai+1 = V T
i AiVi
22 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Ωi = [1 : ni] is the set of unknowns at level i, partitioned into Ni
Ri,j ∈ Rni,j×ni restriction operator, Ri,j = Ini(Ωi,j, :) Ri,j,∆ restriction operator to complimentary unknowns, ∆i,j = Ωi \ Ωi,j,
Define superdomains Gi,k, k = 1 : Ni+1, as the concatenation of d
k=1 Gi,k = {1, . . . , Ni}.
23 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
A1 D1;3Z1;3 D1;1Z1;1 V1
1j (local matrix associated to domain j)
1j
τ
˜ ANeu
1j u1jk = λ1jkD1jA1jD1ju1jk.
j=1 D1jR⊤ 1j Z1j, S1 = span (V1), A2 = V T 1 A1V1, A2 ∈ Rn2×n2
A1,MAS = V1A−1 2 V T 1 + N1 j=1 R⊤ 1j A−1 1j R1j
24 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
~ A2;1 =
P4
j=1 V > 1 ~
A1;jV1
A1 (R> 1 D1Z1)>A(R> 1 D1Z1) (R> 11D11Z11)>A(R> 6 D6Z6)
~ A2;1 =
P4j=1 V > 1 ~ A1;jV1
k=(j−1)d+1 V T i−1 ˜
Aij = RijAiRT
ij (local matrix associated to domain j)
Solve Gen EVP, Zij = span
1 τ
Rij ˜ AijR⊤
ij uijk = λijkDijAijDijuijk
Ai ,MAS = ViA−1 i+1V T i
j=1 R⊤ ij A−1 ij Rij
Let Si = Ni
j=1 DijR⊤ ij Zij, Si = span (Vi), Ai+1 = V T i AiVi, Ai+1 ∈ Rni+1×ni+1
end for end for
25 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Let ˜
Ni
Let Gi,1, . . . , Gi,Ni+1 be a set of superdomains at level i associated with
The matrices ˜
i
j
I,Γ
Ni+1
26 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Correspondence between the columns of Vi (on the left) and the rows
No overlapping in Vi is possible through a nonoverlapping partition of
27 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Ai,MAS = ViA−1 i+1V T i
Ni
ij A−1 ij Rij
MAS = M−1 A1,MAS and,
Ai,MASAi) ≤ (kic + 1) (2 + (2kic + 1)kiτ) ,
ki ≤ k1 for i = 2 : L If Neumann matrices are used at the first level, k1 is bounded by the
kic is the minimum number of colors required to colour the graph of Ai Constants independent of N1, number of subdomains at level 1 28 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Ai,MAS = ViA−1 i+1V T i
Ni
ij A−1 ij Rij
MAS = M−1 A1,MAS and,
Ai,MASAi) ≤ (kic + 1) (2 + (2kic + 1)kiτ) ,
ki ≤ k1 for i = 2 : L If Neumann matrices are used at the first level, k1 is bounded by the
kic is the minimum number of colors required to colour the graph of Ai Constants independent of N1, number of subdomains at level 1 28 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Construction of M−1
MAS preconditioner requires O(logd N1) messages.
Application of M−1
MAS preconditioner requires O((log2 N1)logd N1) messages
29 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Heterogeneous difusion problem, FreeFem++ using P2 FE in 3D and P4 in 2D.
Heterogeneous linear elasticity problem, FreeFem++ using P2 FE in 3D and P3 in 2D.
Young’s modulus E and Poisson’s ratio ν,
30 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Number of outer
iterations: 32
Dimensions of A2:
n2 = 25×2, 048 = 51, 200 A3: n3 = 20 × N2
Number of outer
iterations: 19
Dimensions of A2:
n2 = 25×2, 048 = 51, 200 A3: n3 = 20 × N2
31 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Number of outer
iterations: 73
Dimensions of A2: n2 =
50 × 2, 048 = 1.02 · 105 A3: n3 = 20 × N2
Number of outer
iterations: 45
Dimensions of A2:
n2 = 25×2, 048 = 51, 200 A3: n3 = 20 × N2
32 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Machine: IRENE (Genci), Intel Skylake 8168,
2,7 GHz, 24 cores each
Stopping criterion: 10−5 Young’s modulus E and Poisson’s ratio ν take
two values, (E1, ν1) = (2 · 1011, 0.35), and (E2, ν2) = (107, 0.45)
33 of 49
A robust multilevel additive Schwarz preconditioner Extension to multilevel methods
Machine: IRENE (Genci), Intel Skylake 8168,
2,7 GHz, 24 cores each
Stopping criterion: 10−5 (10−2 for 2nd level) Young’s modulus E and Poisson’s ratio ν take
two values, (E1, ν1) = (2 · 1011, 0.35), and (E2, ν2) = (107, 0.45) Linear elasticity, 616 · 106 unknowns, GenEO versus GenEO multilevel # P Deflation Domain Coarse Solve Total # iter subspace factor matrix GenEO 8192 113.3 11.9 27.5 52.0 152.8 53 GenEO multilevel 8192 113.3 11.9 13.2 52.0 138.5 53 . A2 of dimension 328 · 103 × 328 · 103, 40 vectors per subdomain, 10−2 tolerance. A3 of dimension 5120 × 5120, 128 procs MUMPS for factoring subdomains, Arpack, Pardiso for coarse grids.
34 of 49
Enlarged Krylov methods
35 of 49
Enlarged Krylov methods
Partition the matrix into N domains Split the residual r0 into t vectors corresponding to the N domains,
r0 Re
Generate t new basis vectors, obtain an enlarged Krylov subspace
0, ARe 0, A2Re 0, ..., Ak−1Re 0}
Search for the solution of the system Ax = b in Kt,k(A, r0) 36 of 49
Enlarged Krylov methods
At each iteration, the new approximate solution xk is found by
2(xtAx) − btx over x0 + Kt,k:
Can be seen as a particular case of a block Krylov method AX = S(b), such that S(b)ones(t, 1) = b; Re
0 = AX0 − S(b)
Orthogonality condition involves the block residual Rk ⊥ Kt,k 37 of 49
Enlarged Krylov methods
At each iteration, the new approximate solution xk is found by
2(xtAx) − btx over x0 + Kt,k:
Can be seen as a particular case of a block Krylov method AX = S(b), such that S(b)ones(t, 1) = b; Re
0 = AX0 − S(b)
Orthogonality condition involves the block residual Rk ⊥ Kt,k 37 of 49
Enlarged Krylov methods
Block Krylov methods [O’Leary, 1980]: solve systems with multiple rhs
k (A, R0),
k (A, R0) = block − span{R0, AR0, A2R0, ..., Ak−1R0}.
coopCG (Bhaya et al, 2012): solve one system by starting with t different
BRRHS-CG [Nikishin and Yeremin, 1995]: use a block method with t-1
Multiple preconditioners GMRES with multiple preconditioners [Greif, Rees, Szyld, 2011] AMPFETI [Rixen, 97], [Gosselet et al, 2015] And to reduce communication: s-step methods, pipelined methods 38 of 49
Enlarged Krylov methods
A is an SPD matrix, x∗ is the solution of Ax = b ||x∗ − xk||A is the kth error of CG, e0 = x∗ − x0 ||x∗ − xk||A is the kth error of ECG
λmin(A)
where κt = λmax (A)
λt (A) , ˆ
e0 ≡ E0(Φ⊤
1 E0)−1 ... 1
denotes the t eigenvectors associated to the smallest eigenvalues, and E0 is the initial enlarged error.
39 of 49
Enlarged Krylov methods
0 Ar0)−1/2
αk = p⊤
k rk−1
xk = xk−1 + pkαk
rk = rk−1 − Apkαk
zk+1 = rk − pk(p⊤
k Ark)
pk+1 = zk+1(z⊤
k+1Azk+1)−1/2
P ) ← BLAS 1 & 2
0 (Re ⊤ARe 0 )−1/2
i=1 R(i) k ||2 < ε||b||2 do
αk = P⊤
k Rk−1
⊲ t × t matrix
Xk = Xk−1 + Pkαk
Rk = Rk−1 − APkαk
Zk+1 = APk − Pk(P⊤
k AAPk) −
Pk−1(P⊤
k−1AAPk)
Pk+1 = Zk+1(Z ⊤
k+1AZk+1)−1/2
i=1 X (i) k
P ) ← BLAS 3
40 of 49
Enlarged Krylov methods
k+1APi = 0, ∀i ≤ k by using:
k ARk)
k AAPk) − Pk−1(P⊤ k−1AAPk)
k+1AZk+1)−1/2
41 of 49
Enlarged Krylov methods
3 of 5 largest SPD matrices of Tim Davis’ collection Heterogeneous linear elasticity problem discretized with FreeFem++
u ∈ Rd is the unknown displacement field, f is
Young’s modulus E and Poisson’s ratio ν,
Name Size Nonzeros Problem Hook 1498 1,498,023 59,374,451 Structural problem Flan 1565 1,564,794 117,406,044 Structural problem Queen 4147 4,147,110 316,548,962 Structural problem Ela 4 4,615,683 165,388,197 Linear elasticity
42 of 49
Enlarged Krylov methods
1
2
4
5
(+1) (+2) (+2) (+2)
43 of 49
Enlarged Krylov methods
Run on Kebnekaise, Ume˚
Compiled with Intel Suite 18 PETSc 3.7.6 (linked with the MKL) Pure MPI (no threading) Stopping criterion tolerance is set to 10−5 (PETSc default value) Block diagonal preconditioner, number blocks equals number of MPI
Cholesky factorization on the block with MKL-PARDISO solver 44 of 49
Enlarged Krylov methods
252 504 1008 2016 0.0 0.5 1.0 5.0
Runtime (s)
x1.6 x1.6 x1.5 x2.5
Hook, D-Odir(10)
PETSc PCG ECG
250 500 750 1000 1250 1500 252 504 1008 2016 1 5 10
x1.9 x2.2 x2.2 x2.3
Flan, D-Odir(14)
500 1000 1500 2000 2500 3000
# iter
252 504 1008 2016 1 5 10 100
x0.7 x0.7 x0.7 x0.7
Queen, Omin(6)
1200 1400 1600 1800 252 504 1008 2016 10 100 250
x6.9 x6.4 x6.2 x6.3
Ela_4, D-Odir(20)
5000 10000 15000 20000
44 of 49
Conclusions
45 of 49
Conclusions
Multilevel Additive Schwarz available in HPDDM as multilevel Geneo (P. Jolivet) code for reproducing the results available at
Krylov subspace methods:
Enlarged CG: Reverse Communication Interface Enlarged GMRES will be available as well
NLAFET H2020 european project, EMC2 ERC Synergy project, ANR,
46 of 49
Conclusions
Multilevel Additive Schwarz from theory to practice, find an efficient local algebraic splitting that allows
47 of 49
Conclusions
Ashby, S. F., Manteuffel, T. A., and Saylor, P. E. (1990). A taxonomy for conjugate gradient methods. SIAM Journal on Numerical Analysis, 27(6):1542–1568. Daas, H. A. and Grigori, L. (2019). A class of efficient locally constructed preconditioners based on coarse spaces. SIAM Journal on Matrix Analysis and Applications, 40:66–91. Daas, H. A., Grigori, L., Jolivet, P., and Tournier, P. H. (2019). A multilevel Schwarz preconditioner based on a hierarchy of robust coarse spaces. Technical report, Inria, https://hal.archives-ouvertes.fr/hal-02151184. Grigori, L., Moufawad, S., and Nataf, F. (2016). Enlarged Krylov Subspace Conjugate Gradient Methods for Reducing Communication. SIAM Journal on Scientific Computing, 37(2):744–773. Grigori, L. and Tissot, O. (2019). Scalable linear solvers based on enlarged Krylov subspaces with dynamic reduction of search directions. SIAM Journal on Scientific Computing. Hestenes, M. R. and Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards., 49:409–436. OLeary, D. P. (1980). The block conjugate gradient algorithm and related methods. Linear Algebra and Its Applications, 29:293–322. 48 of 49
Conclusions
Spillane, N., Dolean, V., Hauret, P., Nataf, F., Pechstein, C., and Scheichl, R. (2014). Abstract robust coarse spaces for systems of pdes via gereralized eigenproblems in the overlap. Numerische Mathematik, 126(4):741–770. 49 of 49