Member of the Helmholtz Association
(Preconditioning) Chebyshev subspace iteration applied to sequences of dense eigenproblems in ab initio simulations
NASCA 2013 Calais, France, June 24th
- M. Berljafa and E. Di Napoli
Motivation and Goals
Electronic structure: band energy gap, conductivity, forces, etc.
Self-consistent cycle
Initial guess for charge density
Compute discretized Kohn-Sham equations
Solve a set of eigenproblems $P^{(\ell)}_{k_1}, \dots, P^{(\ell)}_{k_N}$
Compute new charge density
Converged?
OUTPUT Electronic structure,
1 every $P^{(\ell)}_k : A^{(\ell)}_k x = \lambda B^{(\ell)}_k x$ is a generalized eigenvalue problem;
2 A and B are DENSE and Hermitian (B is also positive definite);
3 the $P_k$ with different k index have different sizes and are independent from each other;
4 k = 1 : 10–100
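Because A is Hermitian and B is Hermitian positive definite, each generalized problem can be reduced to a standard Hermitian one through a Cholesky factorization of B. The following is a minimal NumPy sketch of that common reduction, not the code used in the presented work; the function name `solve_generalized` and the small random test pair are illustrative assumptions.

```python
import numpy as np

def solve_generalized(A, B):
    """Solve A x = lambda B x for Hermitian A and Hermitian positive definite B."""
    L = np.linalg.cholesky(B)                # B = L L^H, L lower triangular
    # H = L^{-1} A L^{-H}, built with two solves rather than explicit inverses
    LinvA = np.linalg.solve(L, A)
    H = np.linalg.solve(L, LinvA.conj().T)
    H = (H + H.conj().T) / 2                 # remove rounding-level asymmetry
    lam, Y = np.linalg.eigh(H)               # standard problem H y = lambda y
    X = np.linalg.solve(L.conj().T, Y)       # back-transform: x = L^{-H} y
    return lam, X

# Illustrative data only: a small random Hermitian pair (A, B)
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)

lam, X = solve_generalized(A, B)
residual = np.linalg.norm(A @ X - (B @ X) * lam)   # should be at rounding level
```

The eigenvectors returned this way are B-orthonormal, which is the natural normalization for the generalized problem.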
- numerical simulations analyzed employing a parameter-based inverse problem method;
- collected data on angles between eigenvectors of adjacent eigenproblems;
- discovered evolution of eigenvectors along the sequence.
[Figure: Evolution of subspace angle for eigenvectors of k-point 1 and lowest 75 eigenvalues (AuAg, fixed k). Horizontal axis: iterations (2 -> 22); vertical axis: angle between eigenvectors of adjacent iterations, logarithmic scale.]
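One way to collect such angle data is sketched below, assuming the eigenproblems of one k-point are available as a list of Hermitian matrices. The helper names `eigvec_angle` and `adjacent_angles` are hypothetical, and the sketch pairs eigenvectors by index, which is only meaningful when the corresponding eigenvalues are well separated. The angle is obtained from the sine rather than the cosine so that very small angles remain resolvable in double precision.

```python
import numpy as np

def eigvec_angle(x, y):
    """Angle between the lines spanned by x and y (insensitive to sign or phase)."""
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # The part of y orthogonal to x has norm sin(angle)
    s = np.linalg.norm(y - x * np.vdot(x, y))
    return float(np.arcsin(min(s, 1.0)))

def adjacent_angles(matrices, nev):
    """angles[l, i]: angle between the i-th eigenvectors of matrices[l] and matrices[l+1]."""
    vecs = [np.linalg.eigh(H)[1][:, :nev] for H in matrices]
    return np.array([[eigvec_angle(V0[:, i], V1[:, i]) for i in range(nev)]
                     for V0, V1 in zip(vecs[:-1], vecs[1:])])
```

Plotting each column of the returned array against the iteration index on a logarithmic axis gives a picture of the same kind as the figure above.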
Note: the mathematical model alone does not imply the correlation; the correlation emerges from the numerical analysis of the simulation.
1 Approximate eigenvectors can speed up iterative solvers (EDN,
2 Development of a block iterative eigensolver (ChFSI) that can maximally
3 ChFSI is competitive with direct methods for dense problems in ab initio
$$A^{m} \sum_j \gamma_j x_j = \lambda_1^{m} \sum_j \gamma_j \left(\frac{\lambda_j}{\lambda_1}\right)^{m} x_j$$
1 The ability to receive as input a sizable set Z0 of approximate
2 The capacity to solve simultaneously for a substantial portion of
Chebyshev polynomials
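The Chebyshev polynomial $C_m$ of the first kind of order $m$ is defined as
$$C_m(t) = \begin{cases} \cos(m \cos^{-1} t), & t \in [-1,1]; \\ \cosh(m \cosh^{-1} t), & |t| > 1. \end{cases}$$
Equivalently, $C_0(t) = 1$, $C_1(t) = t$ and $C_{m+1}(t) = 2t\,C_m(t) - C_{m-1}(t)$.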
[Figure: four plots of Chebyshev polynomials of degree 5, 10, 15 and 20 over the interval [−3, 3].]
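The behaviour shown in the plots (bounded by 1 inside [−1, 1], very fast growth outside) can be checked numerically with the three-term recurrence. This is a small illustrative sketch; the function name `chebyshev` is an assumption.

```python
import math

def chebyshev(m, t):
    """Evaluate C_m(t) with the recurrence C_{k+1}(t) = 2 t C_k(t) - C_{k-1}(t)."""
    c_prev, c_curr = 1.0, float(t)      # C_0 and C_1
    if m == 0:
        return c_prev
    for _ in range(m - 1):
        c_prev, c_curr = c_curr, 2.0 * t * c_curr - c_prev
    return c_curr

inside = [abs(chebyshev(20, x / 10.0)) for x in range(-10, 11)]   # samples in [-1, 1]
outside = chebyshev(20, 3.0)                                      # a point with |t| > 1
```

Every value in `inside` stays at or below 1 (up to rounding), while `outside` is many orders of magnitude larger; this contrast is what the filter exploits.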
The basic principle
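Let $|\gamma| > 1$ and let $\mathbb{P}_m$ denote the set of polynomials of degree smaller than or equal to $m$. Then the extremum
$$\min_{p \in \mathbb{P}_m,\; p(\gamma) = 1} \; \max_{t \in [-1,1]} |p(t)|$$
is reached by
$$p_m(t) = \frac{C_m(t)}{C_m(\gamma)}.$$
A generic vector $v$ is very quickly aligned in the direction of the eigenvector corresponding to the extremal eigenvalue $\lambda_1$. With $c$ and $e$ the center and half-width of the interval $[\alpha,\beta]$ mapped onto $[-1,1]$:
$$v = \sum_{i=1}^{n} s_i x_i, \qquad p_m(H)\,v = \sum_{i=1}^{n} s_i\, p_m(\lambda_i)\, x_i = s_1 x_1 + \sum_{i=2}^{n} s_i \frac{C_m\!\left(\frac{\lambda_i - c}{e}\right)}{C_m\!\left(\frac{\lambda_1 - c}{e}\right)} x_i \approx s_1 x_1 .$$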
The pseudocode
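A simple linear transformation mapping $[-1,1] \to [\alpha,\beta] \subset \mathbb{R}$ defines $c = \frac{\beta+\alpha}{2}$ as the center of the interval and $e = \frac{\beta-\alpha}{2}$ as the half-width of the interval.
1 $\sigma_1 \leftarrow e/(\lambda_1 - c)$
2 $Z_1 \leftarrow \frac{\sigma_1}{e}(H - cI_n)Z_0$
3 $\sigma_{i+1} \leftarrow \frac{1}{2/\sigma_1 - \sigma_i}$
4 $Z_{i+1} \leftarrow \frac{2\sigma_{i+1}}{e}(H - cI_n)Z_i - \sigma_{i+1}\sigma_i Z_{i-1}$
Steps 3 and 4 are repeated for $i = 1, \dots, m-1$.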
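The scaled three-term recurrence can be written in a few lines of NumPy. The sketch below assumes the standard form of the scaled Chebyshev filter, in which the recurrence produces $C_m((H - cI)/e)\,Z_0 / C_m((\lambda_1 - c)/e)$; the function name `chebyshev_filter` and the 4x4 diagonal test matrix are illustrative assumptions.

```python
import numpy as np

def chebyshev_filter(H, Z0, m, lam1, c, e):
    """Apply the scaled Chebyshev filter of degree m >= 1 to the block Z0."""
    shifted = H - c * np.eye(H.shape[0])
    sigma1 = e / (lam1 - c)                           # step 1
    sigma, Z_prev = sigma1, Z0
    Z = (sigma1 / e) * (shifted @ Z0)                 # step 2
    for _ in range(1, m):
        sigma_next = 1.0 / (2.0 / sigma1 - sigma)     # step 3
        # step 4: the right-hand side uses the old Z and Z_prev
        Z_prev, Z = Z, (2.0 * sigma_next / e) * (shifted @ Z) - sigma_next * sigma * Z_prev
        sigma = sigma_next
    return Z

# Illustrative check: eigenvalues -1, 0, 1 lie in [alpha, beta] = [-1, 1]; lam1 = -3 lies outside
H = np.diag([-3.0, -1.0, 0.0, 1.0])
Z0 = np.ones((4, 1))
Z = chebyshev_filter(H, Z0, m=4, lam1=-3.0, c=0.0, e=1.0)
# The component along lam1 is kept at 1; the others are damped by 1 / C_4(-3) = 1/577
```

Because of the scaling by $\sigma_i$, the wanted component stays of order one, which avoids overflow for high degrees.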
1 Lanczos step. Identify the bounds for the interval to be filtered out.
2 Chebyshev filter. Filter a block of vectors W ←
3 QR decomposition. Re-orthogonalize the vectors outputted by the filter;
4 Compute the Rayleigh quotient G = Q†HQ.
5 Compute the primitive Ritz pairs (Λ,Y) by solving for GY = YΛ.
6 Compute the approximate Ritz pairs (Λ,W ← QY).
7 Check which of the Ritz vectors have converged.
8 Deflate and lock the converged vectors.
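The eight steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the Lanczos step is replaced by the cheap bound `np.linalg.norm(H, 1)` on the spectrum, the lower end of the filtered interval is taken from the largest Ritz value of the block, and deflation and locking (step 8) are omitted, so the whole block is filtered until the lowest `nev` residuals fall below `tol`. All function and parameter names are hypothetical.

```python
import numpy as np

def chebyshev_filter(H, Z0, m, lam1, c, e):
    """Scaled Chebyshev filter of degree m >= 1 (three-term recurrence)."""
    shifted = H - c * np.eye(H.shape[0])
    sigma1 = e / (lam1 - c)
    sigma, Z_prev = sigma1, Z0
    Z = (sigma1 / e) * (shifted @ Z0)
    for _ in range(1, m):
        sigma_next = 1.0 / (2.0 / sigma1 - sigma)
        Z_prev, Z = Z, (2.0 * sigma_next / e) * (shifted @ Z) - sigma_next * sigma * Z_prev
        sigma = sigma_next
    return Z

def rayleigh_ritz(H, W):
    Q, _ = np.linalg.qr(W)             # step 3: re-orthogonalize
    G = Q.conj().T @ H @ Q             # step 4: Rayleigh quotient
    lam, Y = np.linalg.eigh(G)         # step 5: primitive Ritz pairs
    return lam, Q @ Y                  # step 6: approximate Ritz pairs

def chfsi_sketch(H, W0, nev, degree=12, tol=1e-8, max_iter=100):
    """Return the lowest nev Ritz pairs and the number of filter applications."""
    beta = np.linalg.norm(H, 1)        # step 1 stand-in: upper bound on the spectrum
    lam, W = rayleigh_ritz(H, W0)
    for it in range(max_iter):
        res = np.linalg.norm(H @ W - W * lam, axis=0)    # step 7: residual check
        if np.all(res[:nev] < tol):
            return lam[:nev], W[:, :nev], it
        alpha = lam[-1]                # filter out [alpha, beta]
        c, e = (beta + alpha) / 2.0, (beta - alpha) / 2.0
        W = chebyshev_filter(H, W, degree, lam[0], c, e)  # step 2
        lam, W = rayleigh_ritz(H, W)
    return lam[:nev], W[:, :nev], max_iter
```

Passing approximate eigenvectors from the previous problem of the sequence as `W0`, instead of random vectors, is what allows the solver to exploit the correlation discussed earlier. The block is assumed to hold a few more columns than `nev`, since the last Ritz vectors of the block converge slowly.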
Theoretical peak performance per core = 11.71 GFLOPS;
$$\text{Speed-up} = \frac{\text{CPU time (input random vectors)}}{\text{CPU time (input approximate eigenvectors)}}$$
[Figure: Speed-up vs. iteration index for Nev = 256 and three distinct matrix sizes (NaCl, n = 3893, 6217 and 9273).]
[Figure: Speed-up vs. iteration index for Nev = 972 and two distinct matrix sizes (AuAg, n = 5638 and 8970).]
[Figure: pie chart with slices labelled Residuals convergence, Rayleigh-Ritz, Chebyshev filter and Lanczos; percentages shown: less than 1%, 90%, 6%, 4%.]
1 Approximate vs. Random:
- sequential ChFSI achieves speed-ups in the range 1.5X to 3.5X;
- parallel versions of ChFSI can achieve speed-ups up to 5X.
2 Parallel ChFSI:
- the algorithm scales extremely well even for medium-size eigenproblems;
- depending on the number of cores and the percentage of the eigenspectrum sought, ChFSI is competitive with direct eigensolvers;
- having access to more cores enables the investigation of larger physical systems (larger eigenproblems).
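The convergence ratio for the eigenvector $w_i$ corresponding to eigenvalue $\lambda_i \notin [\alpha,\beta]$ is defined as
$$\tau(\lambda_i) = |\rho_i|^{-1} = \min_{\pm} \left| \frac{\lambda_i - c}{e} \pm \sqrt{\left(\frac{\lambda_i - c}{e}\right)^{2} - 1} \right|.$$
The further away $\lambda_i$ is from the interval $[\alpha,\beta]$, the smaller $|\rho_i|^{-1}$ is and the faster the convergence to $w_i$.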
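As a quick numerical illustration of this definition, the ratio can be evaluated directly from $\lambda_i$, $\alpha$ and $\beta$. The function name `convergence_ratio` and the sample values are assumptions chosen for the example.

```python
import math

def convergence_ratio(lam, alpha, beta):
    """tau(lam) = |rho|^{-1} for an eigenvalue lam outside [alpha, beta]."""
    c = (beta + alpha) / 2.0          # center of the interval
    e = (beta - alpha) / 2.0          # half-width of the interval
    t = (lam - c) / e
    if abs(t) <= 1.0:
        raise ValueError("lam must lie outside [alpha, beta]")
    root = math.sqrt(t * t - 1.0)
    return min(abs(t + root), abs(t - root))

near = convergence_ratio(-1.25, -1.0, 1.0)   # eigenvalue close to the interval
far = convergence_ratio(-3.0, -1.0, 1.0)     # eigenvalue further away
```

Here `near` evaluates to 0.5 and `far` to roughly 0.17, so each additional degree of the filter reduces the error for the more distant eigenvalue about three times as strongly.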
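$$\mathrm{Res}(v_1^{m}) \approx \frac{|\text{const.}|}{|\rho_1|^{m}}, \qquad \mathrm{Res}(v_i^{m}) \approx \frac{|\text{const.}|}{|\rho_i|^{m}} + |\text{const.}|\,\frac{|\rho_1|^{m}}{|\rho_i|^{m}}\,\text{eps}$$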
$$\mathrm{Res}(v_i^{m}) = \frac{\mathrm{Res}(v_i^{m_0})}{|\rho_i|^{(m - m_0)}}.$$
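Solving this relation for $m$ gives an estimate of how much additional polynomial degree a vector needs to reach a target residual: $m - m_0 \ge \ln(\mathrm{Res}(v_i^{m_0})/\text{tol}) / \ln|\rho_i|$. The sketch below is one possible use of the formula, not necessarily the optimization used in the talk; the function name `extra_degree` and the numbers are illustrative assumptions.

```python
import math

def extra_degree(res_m0, rho_abs, tol):
    """Smallest integer d = m - m0 >= 0 with res_m0 / rho_abs**d <= tol."""
    if rho_abs <= 1.0:
        raise ValueError("|rho| must exceed 1 for an eigenvalue outside the interval")
    if res_m0 <= tol:
        return 0                      # already converged, no further filtering needed
    return math.ceil(math.log(res_m0 / tol) / math.log(rho_abs))

d = extra_degree(res_m0=1e-2, rho_abs=2.0, tol=1e-8)
predicted = 1e-2 / 2.0 ** d           # residual predicted by the relation above
```

With these sample numbers `d` is 20: nineteen extra degrees would leave the predicted residual just above the tolerance.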
[Figure: Standard ChFSI vs. Optimized ChFSI, time [sec] against iteration index (2 to 13), in three panels: ChFSI total, polynomial filter, and Rayleigh-Ritz. Each panel compares the total, filter and RR times of the standard and optimized versions.]
1 Finalizing filter optimization by adjusting the degree of the polynomial so
2 Parallelization of ChFSI for GPUs by using OpenACC directives;