Member of the Helmholtz Association
A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW
PPAM 2013 Warsaw, Poland, Sept. 10th
M. Berljafa and E. Di Napoli
Motivation and Goals
Electronic structure: band energy gap, conductivity, forces, etc.
Self-consistent cycle
Initial guess for charge density
→ Compute discretized Kohn-Sham equations
→ Solve a set of eigenproblems $P^{(\ell)}_{k_1}, \dots, P^{(\ell)}_{k_N}$
→ Compute new charge density
→ Converged? If not, start the next cycle $\ell + 1$; if so:
OUTPUT: electronic structure, …
1. Every $P^{(\ell)}_{k}$: $A^{(\ell)}_{k} x = B^{(\ell)}_{k} \lambda x$ is a generalized eigenvalue problem;
2. $A$ and $B$ are DENSE and Hermitian ($B$ is also positive definite);
3. $P$s with different $k$ index have different sizes and are independent from each other;
4. $k = 1, \dots, N$ with $N$ between 10 and 100.
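The loop above can be sketched on a deliberately tiny toy model. Everything in it is hypothetical: real symmetric matrices stand in for the Hermitian FLAPW matrices, a random map `Phi` from each k-point basis to a common grid stands in for the basis functions, and `COUPLING` and `mixing` are invented parameters. `scipy.linalg.eigh(A, B)` plays the role of the direct solver applied to each $P^{(\ell)}_{k}$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N_GRID = 30             # hypothetical common grid on which the density lives
N_OCC = 3               # hypothetical number of occupied states per k-point
COUPLING = 1e-3         # hypothetical strength of the density-dependent term
K_SIZES = [12, 16, 20]  # each k-point has its own (different) matrix size


def make_k_point(n):
    """Density-independent toy data for one k-point."""
    M = rng.standard_normal((n, n))
    A0 = np.diag(np.arange(n, dtype=float)) + 0.05 * (M + M.T)  # dense, symmetric
    C = rng.standard_normal((n, n))
    B = np.eye(n) + 0.1 * (C @ C.T) / n                           # symmetric positive definite
    Phi = rng.standard_normal((N_GRID, n)) / np.sqrt(N_GRID)     # basis -> grid map
    return A0, B, Phi


K_POINTS = [make_k_point(n) for n in K_SIZES]


def scf(max_cycles=200, tol=1e-8, mixing=0.5):
    rho = np.zeros(N_GRID)                      # initial guess for the charge density
    for cycle in range(1, max_cycles + 1):
        rho_out = np.zeros(N_GRID)
        for A0, B, Phi in K_POINTS:             # the P_k are independent of each other
            # toy "discretized Kohn-Sham" matrix depending on the current density
            A = A0 + COUPLING * Phi.T @ (rho[:, None] * Phi)
            _, X = eigh(A, B)                   # direct dense solver for A x = lambda B x
            X_occ = X[:, :N_OCC]                # lowest eigenvectors (B-orthonormal)
            rho_out += np.sum((Phi @ X_occ) ** 2, axis=1) / len(K_POINTS)
        if np.linalg.norm(rho_out - rho) < tol:  # converged?
            return rho_out, cycle, True
        rho = (1.0 - mixing) * rho + mixing * rho_out  # simple linear mixing
    return rho, max_cycles, False
```

Calling `rho, cycles, converged = scf()` runs one full sequence of cycles; each cycle solves one dense generalized eigenproblem per k-point from scratch, which is the cost the rest of the talk aims to reduce.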
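Adjacent cycles
[Figure: two adjacent cycles of the self-consistent loop. In each cycle every eigenproblem $P^{(\ell)}_{k_1}, P^{(\ell)}_{k_2}, \dots, P^{(\ell)}_{k_N}$ is solved independently by a direct solver, yielding the eigenpairs $(X^{(\ell)}_{k_i}, \Lambda^{(\ell)}_{k_i})$.]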
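Collected data on angles between eigenvectors of adjacent eigenproblems:
$\Theta^{(\ell)}_{k_i} \equiv \{\theta_1, \dots, \theta_n\} = \mathrm{diag}\,\angle\big(X^{(\ell-1)}_{k_i}, \tilde{X}^{(\ell)}_{k_i}\big)$
For fixed $k_i$ (superscripts denote the cycle index): $\theta^{(2)}_j \geq \theta^{(3)}_j \geq \cdots \geq \theta^{(N)}_j$, with $\theta^{(2)}_j \gg \theta^{(N)}_j$.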
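A minimal sketch of how the angles $\theta_j$ could be computed, assuming that column $j$ of $X^{(\ell-1)}_{k_i}$ is paired with column $j$ of $\tilde{X}^{(\ell)}_{k_i}$ (same eigenvalue index) and that the plain Euclidean inner product is used; the slide does not say which inner product underlies the collected data. The function name `eigvec_angles` is hypothetical.

```python
import numpy as np


def eigvec_angles(X_prev, X_curr):
    """Angle theta_j between column j of X_prev and column j of X_curr.

    Columns are normalized first. sin(theta_j) is taken as the norm of the part
    of the current vector orthogonal to the previous one, which stays accurate
    for very small angles (unlike arccos of the overlap).
    """
    U = X_prev / np.linalg.norm(X_prev, axis=0)
    V = X_curr / np.linalg.norm(X_curr, axis=0)
    overlap = np.sum(U.conj() * V, axis=0)   # u_j^H v_j for every column j
    residual = V - U * overlap               # component of v_j orthogonal to u_j
    sin_theta = np.clip(np.linalg.norm(residual, axis=0), 0.0, 1.0)
    return np.arcsin(sin_theta)              # angles in [0, pi/2]
```

Sign (or phase) flips of an eigenvector do not change the result, which matters because eigensolvers fix the sign of each eigenvector arbitrarily.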
[Figure (AuAg, fixed k): evolution of the subspace angle for the eigenvectors of k-point 1 and the lowest 75 eigenvalues. x-axis: iterations (2 → 22); y-axis (log scale, $10^{-10}$ to $10^{-2}$): angle between eigenvectors of adjacent iterations.]
Adjacent cycles
[Figure: the same two adjacent cycles, with each $P^{(\ell)}_{k_i}$ now solved by an iterative solver instead of a direct solver.]
1. The ability to receive as input a sizable set of approximate eigenvectors $X^{(\ell-1)}_{k_i}$;
2. The capacity to solve simultaneously for a substantial portion of …
1. Lanczos step. Identify the bounds for the eigenspectrum interval.
2. Chebyshev filter. Filter a block of vectors $W \leftarrow \ldots$
3. Re-orthogonalize the vectors output by the filter: $W = QR$.
4. Compute the Rayleigh quotient $G = Q^\dagger H Q$.
5. Compute the primitive Ritz pairs $(\Lambda, Y)$ by solving $GY = Y\Lambda$.
6. Compute the approximate Ritz pairs $(\Lambda, W \leftarrow QY)$.
7. Check which of the Ritz vectors have converged.
8. Deflate and lock the converged vectors.
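The eight steps can be sketched in NumPy for a real symmetric standard eigenproblem $Hx = \lambda x$; the sketch handles only that form, and reducing $Ax = \lambda Bx$ to it (e.g. via a Cholesky factorization of $B$) is not shown. The number of extra vectors `nex`, the filter degree, the 40 Lanczos steps, the initial guess for $\alpha$ and the tolerance are illustrative assumptions, not values from the presentation; `W0` marks where approximate eigenvectors from a previous cycle would be passed in.

```python
import numpy as np


def lanczos_bounds(H, steps, rng):
    """A few Lanczos steps (no reorthogonalization) on a real symmetric H.

    Returns sorted Ritz values and a heuristic upper bound of the spectrum
    (largest Ritz value plus the norm of the last Lanczos residual).
    """
    n = H.shape[0]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    v_old, b = np.zeros(n), 0.0
    alphas, betas = [], []
    for _ in range(min(steps, n)):
        w = H @ v - b * v_old
        a = v @ w
        w -= a * v
        b = np.linalg.norm(w)
        alphas.append(a)
        betas.append(b)
        if b < 1e-12:
            break
        v_old, v = v, w / b
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    ritz = np.linalg.eigvalsh(T)
    return ritz, ritz[-1] + betas[-1]


def cheb_filter(H, W, m, alpha, beta, lam1):
    """Scaled degree-m Chebyshev filter: damps [alpha, beta], amplifies below alpha."""
    e, c = (beta - alpha) / 2.0, (beta + alpha) / 2.0
    sigma1 = e / (lam1 - c)
    sigma = sigma1
    Z_prev, Z = W, (H @ W - c * W) * (sigma1 / e)
    for _ in range(1, m):
        sigma_new = 1.0 / (2.0 / sigma1 - sigma)
        Z_new = (H @ Z - c * Z) * (2.0 * sigma_new / e) - (sigma * sigma_new) * Z_prev
        Z_prev, Z, sigma = Z, Z_new, sigma_new
    return Z


def chfsi(H, nev, nex=10, degree=12, tol=1e-10, max_iter=100, W0=None, seed=0):
    """Lowest nev eigenpairs of a real symmetric H by Chebyshev-filtered subspace iteration."""
    rng = np.random.default_rng(seed)
    n, block = H.shape[0], nev + nex
    # W0 (optional, nev + nex columns): e.g. eigenvectors from the previous cycle
    W = rng.standard_normal((n, block)) if W0 is None else np.array(W0, dtype=float)
    # Step 1: Lanczos estimates of the spectral bounds.
    ritz, beta = lanczos_bounds(H, 40, rng)
    lam1 = ritz[0]
    alpha = lam1 + (beta - lam1) * block / n  # hypothetical initial cut-off guess
    scale = max(abs(lam1), abs(beta))
    evals = np.zeros(block)
    locked = 0
    for it in range(1, max_iter + 1):
        # Step 2: filter the active (not yet locked) vectors only.
        Wa = cheb_filter(H, W[:, locked:], degree, alpha, beta, lam1)
        # Step 3: orthogonalize against the locked vectors, then W = QR.
        Wl = W[:, :locked]
        Wa -= Wl @ (Wl.T @ Wa)
        Q, _ = np.linalg.qr(Wa)
        # Steps 4-6: Rayleigh quotient, primitive Ritz pairs, Ritz vectors.
        G = Q.T @ (H @ Q)
        lam, Y = np.linalg.eigh(G)
        Wa = Q @ Y
        W[:, locked:], evals[locked:] = Wa, lam
        # Step 7: relative residual norms of the active Ritz pairs.
        res = np.linalg.norm(H @ Wa - Wa * lam, axis=0) / scale
        # Step 8: lock the leading run of converged pairs.
        nconv = 0
        while nconv < len(lam) and res[nconv] < tol:
            nconv += 1
        locked += nconv
        if locked >= nev:
            return evals[:nev], W[:, :nev], it
        # Update the filter interval from the current Ritz values.
        lam1, alpha = min(lam1, lam[0]), lam[-1]
    raise RuntimeError("ChFSI did not converge")
```

Locking only a contiguous run of the lowest converged pairs keeps the locked eigenvalues sorted; the filter is then applied only to the vectors that still need work.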
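Chebyshev polynomials
The Chebyshev polynomial $C_m$ of the first kind of order $m$ is defined as $C_m(x) = \cos(m \arccos x)$ for $x \in [-1, 1]$ and $C_m(x) = \cosh(m\,\mathrm{arccosh}\,x)$ for $x > 1$, with $C_m(-x) = (-1)^m C_m(x)$; equivalently, $C_0(x) = 1$, $C_1(x) = x$ and $C_{m+1}(x) = 2x\,C_m(x) - C_{m-1}(x)$.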
[Figure: $C_m(x)$ on $[-3, 3]$ for degrees 5, 10, 15 and 20; the polynomials stay within $[-1, 1]$ on $[-1, 1]$ and grow rapidly outside it, the faster the higher the degree.]
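A quick numerical check of the definition: the sketch below evaluates $C_m$ both from the closed form ($\cos$ inside $[-1,1]$, $\cosh$ outside) and from the three-term recurrence, and reproduces the behaviour seen in the plots. The function names are hypothetical.

```python
import numpy as np


def cheb_recurrence(m, x):
    """C_m(x) via C_0 = 1, C_1 = x, C_{k+1} = 2x C_k - C_{k-1}."""
    x = np.asarray(x, dtype=float)
    c_prev, c = np.ones_like(x), x.copy()
    if m == 0:
        return c_prev
    for _ in range(1, m):
        c_prev, c = c, 2.0 * x * c - c_prev
    return c


def cheb_closed_form(m, x):
    """cos(m arccos x) on [-1, 1]; sign(x)^m cosh(m arccosh |x|) outside."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= 1.0
    out = np.empty_like(x)
    out[inside] = np.cos(m * np.arccos(x[inside]))
    xo = x[~inside]
    out[~inside] = np.sign(xo) ** m * np.cosh(m * np.arccosh(np.abs(xo)))
    return out
```

The recurrence is the form used in practice, since applied to a matrix it needs only matrix-vector products; the closed form explains why the values explode outside $[-1, 1]$, where $\cosh$ grows exponentially in $m$.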
The pseudocode
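A simple linear transformation maps $[-1, 1]$ onto $[\alpha, \beta] \subset \mathbb{R}$; define $c = \frac{\beta + \alpha}{2}$ as the center of the interval and $e = \frac{\beta - \alpha}{2}$ as its half-width. With $\lambda_1$ an estimate of the smallest eigenvalue and $Z_0 = W$ the input block:
1. $\sigma_1 \leftarrow e/(\lambda_1 - c)$
2. $Z_1 \leftarrow \frac{\sigma_1}{e}(H - cI)\,Z_0$
3. $\sigma_{i+1} \leftarrow 1/(2/\sigma_1 - \sigma_i)$
4. $Z_{i+1} \leftarrow \frac{2\sigma_{i+1}}{e}(H - cI)\,Z_i - \sigma_i \sigma_{i+1} Z_{i-1}$
Steps 3 and 4 are repeated for $i = 1, \dots, m-1$; $Z_m$ is the filtered block.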
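The four pseudocode steps translate directly into a block recurrence. In this sketch `lam1` is assumed to be an estimate of the smallest eigenvalue, lying below $\alpha$; the result equals $p_m(H)W$ with $p_m(t) = C_m\big((t-c)/e\big) / C_m\big((\lambda_1-c)/e\big)$, so $p_m(\lambda_1) = 1$, eigencomponents in $[\alpha, \beta]$ are damped, and those below $\alpha$ are amplified relative to them. The function name is hypothetical.

```python
import numpy as np


def scaled_cheb_filter(H, W, m, alpha, beta, lam1):
    """Return p_m(H) W, damping the interval [alpha, beta] (steps 1-4)."""
    c = (beta + alpha) / 2.0   # center of [alpha, beta]
    e = (beta - alpha) / 2.0   # half-width of [alpha, beta]
    sigma1 = e / (lam1 - c)                                # step 1
    sigma = sigma1
    Z_prev, Z = W, (sigma1 / e) * (H @ W - c * W)          # step 2, Z_0 = W
    for _ in range(1, m):
        sigma_next = 1.0 / (2.0 / sigma1 - sigma)          # step 3
        # step 4: scaled three-term Chebyshev recurrence
        Z_next = (2.0 * sigma_next / e) * (H @ Z - c * Z) - sigma * sigma_next * Z_prev
        Z_prev, Z, sigma = Z, Z_next, sigma_next
    return Z
```

Applying it to a diagonal `H` with `W` the identity puts the filter values $p_m(t)$ on the diagonal of the result, which makes the damping easy to inspect; the $\sigma$ scaling keeps the iterates of moderate size instead of letting them grow like $C_m$ itself.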
Theoretical peak performance per core: 11.71 GFLOPS.
[Figure: the interplay between algorithmic block size and grid shape. Time [sec] versus process-grid shape for 16 cores (grids (1,16), (2,8), (4,4), (8,2), (16,1)), 32 cores ((1,32) through (32,1)) and 64 cores ((1,64) through (64,1)), for algorithmic block sizes 64, 128 and 256.]
$\text{Speed-up} = \dfrac{\text{CPU time (input: random vectors)}}{\text{CPU time (input: approximate eigenvectors)}}$
1. Finalizing the filter optimization by adjusting the degree of the polynomial so …
2. Parallelization of ChFSI for GPUs.