Mitglied der Helmholtz-Gemeinschaft
An optimized subspace iteration eigensolver applied to sequences
- f dense eigenproblems in ab initio
simulations
PMAA14, Lugano, Switzerland. July 4th
- E. Di Napoli and M. Berljafa
An optimized subspace iteration eigensolver applied to sequences of - - PowerPoint PPT Presentation
Mitglied der Helmholtz-Gemeinschaft An optimized subspace iteration eigensolver applied to sequences of dense eigenproblems in ab initio simulations PMAA14, Lugano, Switzerland. July 4th E. Di Napoli and M. Berljafa Motivation: electronic
Mitglied der Helmholtz-Gemeinschaft
Full-potential Linearized Augmented Plane Waves (FLAPW) self-consistent field cycle
Initial guess for charge density
Compute discretized Kohn-Sham equations Solve a set of eigenproblems
k1 ...P(ℓ) kN
Compute new charge density
Converged?
OUTPUT Electronic structure,
1 every P(ℓ)
k
k x = B(ℓ) k λx is a generalized eigenvalue problem;
2 A and B are DENSE and hermitian (B is positive definite); 3 required: lower 2÷10 % of eigenpairs; 4 momentum vector index: k = 1 : 10÷100; 5 iteration cycle index: ℓ = 1 : 20÷50.
PMAA14, Lugano, Switzerland. July 4th
Folie 2
PMAA14, Lugano, Switzerland. July 4th
Folie 3
PMAA14, Lugano, Switzerland. July 4th
Folie 4
Definitions and solving strategies
PMAA14, Lugano, Switzerland. July 4th
Folie 5
Definitions and solving strategies
PMAA14, Lugano, Switzerland. July 4th
Folie 5
Adjacent iteration cycles
k1
k1 ,Λ(ℓ) k1 )
k2
k2 ,Λ(ℓ) k2 )
kN
kN ,Λ(ℓ) kN )
direct solver direct solver direct solver
k1
k1
k1
k2
k2
k2
kN
kN
kN
direct solver direct solver direct solver
PMAA14, Lugano, Switzerland. July 4th
Folie 6
Adjacent iteration cycles
k1
k1 ,Λ(ℓ) k1 )
k2
k2 ,Λ(ℓ) k2 )
kN
kN ,Λ(ℓ) kN )
direct solver direct solver direct solver
k1
k1
k1
k2
k2
k2
kN
kN
kN
direct solver direct solver direct solver
PMAA14, Lugano, Switzerland. July 4th
Folie 6
Adjacent iteration cycles
k1
k1 ,Λ(ℓ) k1 )
k2
k2 ,Λ(ℓ) k2 )
kN
kN ,Λ(ℓ) kN )
direct solver direct solver direct solver
k1
k1
k1
k2
k2
k2
kN
kN
kN
direct solver direct solver direct solver
PMAA14, Lugano, Switzerland. July 4th
Folie 6
Definition and solving strategies
PMAA14, Lugano, Switzerland. July 4th
Folie 7
Definition and solving strategies
ki ≡ {θ1,...,θn} = diag
ki
ki
j
j
j
j
j
PMAA14, Lugano, Switzerland. July 4th
Folie 7
Adjacent cycles
k1
k1 ,Λ(ℓ) k1 )
k2
k2 ,Λ(ℓ) k2 )
kN
kN ,Λ(ℓ) kN )
iterative solver iterative solver iterative solver
k1
k1
k1
k2
k2
k2
kN
kN
kN
iterative solver iterative solver iterative solver
PMAA14, Lugano, Switzerland. July 4th
Folie 8
PMAA14, Lugano, Switzerland. July 4th
Folie 9
Properties and algorithm evolution
ki
PMAA14, Lugano, Switzerland. July 4th
Folie 10
PMAA14, Lugano, Switzerland. July 4th
Folie 11
1 Lanczos step. Identify the bounds for the eigenspectrum interval
PMAA14, Lugano, Switzerland. July 4th
Folie 11
1 Lanczos step. Identify the bounds for the eigenspectrum interval
2 Chebyshev filter. Filter a block of vectors W ←
3 Re-orthogonalize the vectors outputted by the filter; W = QR. 4 Compute the Rayleigh quotient G = Q†HQ. 5 Compute the primitive Ritz pairs (Λ,Y) by solving for GY = YΛ. 6 Compute the approximate Ritz pairs (Λ,W ← QY). 7 Check which one among the Ritz vectors converged. 8 Deflate and lock the converged vectors.
PMAA14, Lugano, Switzerland. July 4th
Folie 11
The basic principle
Let |γ| > 1 and Pm denote the set of polynomials of degree smaller or equal to m. Then the extremum
p∈Pm,p(γ)=1 max t∈[−1,1]|p(t)|
is reached by
where Cm is the Chebyshev polynomial of the first kind of order m, defined as
PMAA14, Lugano, Switzerland. July 4th
Folie 12
Chebyshev polynomials
A generic vector v = ∑n
i=1 sixi is very quickly aligned in the direction of the
eigenvector corresponding to the extremal eigenvalue λ1
n
i=1
n
i=1
n
i=2
e )
e
PMAA14, Lugano, Switzerland. July 4th
Folie 13
In practice
PMAA14, Lugano, Switzerland. July 4th
Folie 14
Convergence ratio and residuals
The convergence ratio for the eigenvector xi corresponding to eigenvalue λi / ∈ [α,β] is defined as τ(λi) = |ρi|−1 = min
±
e ± λi −c e 2 −1
The further away λi is from the interval [α,β] the smaller is |ρi|−1 and the faster the convergence to xi is.
i ) ∼
ρi
i
i )
ρi
Res(v
m0 i
)
PMAA14, Lugano, Switzerland. July 4th
Folie 15
1 Chebyshev filter. Initial filter W ←
2 Re-orthogonalize W = QR & compute the Rayleigh quotient G = Q†HQ. 3 Solve the reduced problem GY = YΛ and compute the approximate Ritz
PMAA14, Lugano, Switzerland. July 4th
Folie 16
1 Chebyshev filter. Initial filter W ←
2 Re-orthogonalize W = QR & compute the Rayleigh quotient G = Q†HQ. 3 Solve the reduced problem GY = YΛ and compute the approximate Ritz
4 Optimizer. Compute the polynomial degrees mi ≥ ln
Res(wi)
5 Chebyshev filter. Filter W ←
6 Re-orthogonalize W = QR & compute the Rayleigh quotient G = Q†HQ. 7 Solve the reduced problem GY = YΛ and compute the approximate Ritz
8 lock the converged vectors. 9 Store the residuals Res(wi) of the unconverged vectors.
PMAA14, Lugano, Switzerland. July 4th
Folie 16
PMAA14, Lugano, Switzerland. July 4th
Folie 17
THEORETICAL PEAK PERFORMANCE/CORE=11.71 GFLOPS;
PMAA14, Lugano, Switzerland. July 4th
Folie 18
PMAA14, Lugano, Switzerland. July 4th
Folie 19
PMAA14, Lugano, Switzerland. July 4th
Folie 19
( 1 , 1 6 ) ( 2 , 8 ) ( 4 , 4 ) ( 8 , 2 ) ( 1 6 , 1 ) ( 1 , 3 2 ) ( 2 , 1 6 ) ( 4 , 8 ) ( 8 , 4 ) ( 1 6 , 2 ) ( 3 2 , 1 ) ( 1 , 6 4 ) ( 2 , 3 2 ) ( 4 , 1 6 ) ( 8 , 8 ) ( 1 6 , 4 ) ( 3 2 , 2 ) ( 6 4 , 1 ) Grid Shape 100 200 300 400 500 Time [sec] 16 Cores 32 Cores 64 Cores The interplay between algorithmic block size and grid shape. Algorithmic blocksize 64 Algorithmic blocksize 128 Algorithmic blocksize 256
PMAA14, Lugano, Switzerland. July 4th
Folie 19
As a function of iteration cycles
PMAA14, Lugano, Switzerland. July 4th
Folie 20
Speed-up =
CPU time (input random vectors) CPU time (input approximate eigenvectors)
PMAA14, Lugano, Switzerland. July 4th
Folie 21
PMAA14, Lugano, Switzerland. July 4th
Folie 22
PMAA14, Lugano, Switzerland. July 4th
Folie 23
620 1358 2037 25 50 75 100 125 150 175 200 time [s] ℓ =4 620 1358 2037 number of eigenpairs sought after ℓ =11 620 1358 2037 ℓ =18 EleMRRR ChFSI, 1e-10 ChFSI, 1e-08
PMAA14, Lugano, Switzerland. July 4th
Folie 23
10 20 30 40 50 60 10
−11
10
−10
10
−9
10
−8
10
−7
10
−6
Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 60) Eigenpairs residuals
Y0 = Y
sol + 1e−09 * randn1
TOL = 1e−10 nev = 100 nex = 20 10 20 30 40 50 60 10
−11
10
−10
10
−9
10
−8
10
−7
10
−6
Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 60) Eigenpairs residuals
Y0 = Y
sol + 1e−09 * randn2
TOL = 1e−10 nev = 100 nex = 20 10 20 30 40 50 60 10
−16
10
−14
10
−12
10
−10
10
−8
10
−6
Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 60) Eigenpairs residuals
Y0 = Y
sol + 1e−09 * randn1
TOL = 1e−14 nev = 100 nex = 20 5 10 15 20 25 30 35 10
−16
10
−14
10
−12
10
−10
10
−8
10
−6
Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 37) Eigenpairs residuals
Y0 = Y
sol + 1e−09 * randn1
TOL = 1e−10 nev = 100 nex = 40
PMAA14, Lugano, Switzerland. July 4th
Folie 24
5 10 15 20 25 30 35 40 45 50 55 60 10
−11
10
−10
10
−9
10
−8
10
−7
10
−6 Evolution of residuals for 4 eigenpairs among the lowest 100
Polynomial degree (5 −> 60) Eigenpairs residuals
Res(vm) = a * 1 / |i|m + b * |1|m/|i|m
5 10 15 20 25 30 35 40 45 50 55 60 10
−11
10
−10
10
−9
10
−8
10
−7
10
−6 Evolution of residuals for 4 eigenpairs among the lowest 100
Polynomial degree (5 −> 60) Eigenpairs residuals
Res(vm) = a * 1 / |i|m * sqrt(1 + b * |1|2m )
|ρi|m ≫ 1
PMAA14, Lugano, Switzerland. July 4th
Folie 24
1 Fine-tuning filter optimization by profiling eigenvector degrees to
2 Exploiting different architectures for parallelization (GPUs, Xeon Phi).
PMAA14, Lugano, Switzerland. July 4th
Folie 25
PMAA14, Lugano, Switzerland. July 4th
Folie 26