An optimized subspace iteration eigensolver applied to sequences of - - PowerPoint PPT Presentation

an optimized subspace iteration eigensolver applied to
SMART_READER_LITE
LIVE PREVIEW

An optimized subspace iteration eigensolver applied to sequences of - - PowerPoint PPT Presentation

Mitglied der Helmholtz-Gemeinschaft An optimized subspace iteration eigensolver applied to sequences of dense eigenproblems in ab initio simulations PMAA14, Lugano, Switzerland. July 4th E. Di Napoli and M. Berljafa Motivation: electronic


slide-1
SLIDE 1

Mitglied der Helmholtz-Gemeinschaft

An optimized subspace iteration eigensolver applied to sequences

  • f dense eigenproblems in ab initio

simulations

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa
slide-2
SLIDE 2

Motivation: electronic structure computation

Full-potential Linearized Augmented Plane Waves (FLAPW) self-consistent field cycle

Initial guess for charge density

nstart(r)

Compute discretized Kohn-Sham equations Solve a set of eigenproblems

P(ℓ)

k1 ...P(ℓ) kN

Compute new charge density

n(ℓ)(r)

Converged?

|n(ℓ) −n(ℓ−1)| < η

OUTPUT Electronic structure,

... No Yes

1 every P(ℓ)

k

: A(ℓ)

k x = B(ℓ) k λx is a generalized eigenvalue problem;

2 A and B are DENSE and hermitian (B is positive definite); 3 required: lower 2÷10 % of eigenpairs; 4 momentum vector index: k = 1 : 10÷100; 5 iteration cycle index: ℓ = 1 : 20÷50.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 2

slide-3
SLIDE 3

Outline

Sequences of correlated eigenproblems The algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) ChFSI parallelization and numerical tests

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 3

slide-4
SLIDE 4

Outline

Sequences of correlated eigenproblems The algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) ChFSI parallelization and numerical tests

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 4

slide-5
SLIDE 5

Sequences of eigenproblems

Definitions and solving strategies

Definition: Eigenproblem sequence

A sequence of eigenproblems is a finite and index-ordered set of problems {P}N . = P(1) ···P(ℓ) ···P(N) with same size = n such that the eigenpairs of P(ℓ) are used (directly or indirectly) to initialize P(ℓ+1).

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 5

slide-6
SLIDE 6

Sequences of eigenproblems

Definitions and solving strategies

Definition: Eigenproblem sequence

A sequence of eigenproblems is a finite and index-ordered set of problems {P}N . = P(1) ···P(ℓ) ···P(N) with same size = n such that the eigenpairs of P(ℓ) are used (directly or indirectly) to initialize P(ℓ+1).

Current solving strategy

The set of generalized eigenproblems P(1) ...P(ℓ)P(ℓ+1) ...P(N) is handled as a set of disjoint problems N ×P; Each problem P(ℓ) is solved independently using a direct solver as a black-box from a standard library (i.e. ScaLAPACK).

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 5

slide-7
SLIDE 7

Sequences of Eigenproblems

Adjacent iteration cycles

ITERATION (ℓ) P(ℓ)

k1

(X(ℓ)

k1 ,Λ(ℓ) k1 )

P(ℓ)

k2

(X(ℓ)

k2 ,Λ(ℓ) k2 )

P(ℓ)

kN

(X(ℓ)

kN ,Λ(ℓ) kN )

X ≡ {x1,...,xn}

direct solver direct solver direct solver

ITERATION (ℓ+1) P(ℓ+1)

k1

(X(ℓ+1)

k1

,Λ(ℓ+1)

k1

) P(ℓ+1)

k2

(X(ℓ+1)

k2

,Λ(ℓ+1)

k2

) P(ℓ+1)

kN

(X(ℓ+1)

kN

,Λ(ℓ+1)

kN

) Λ ≡ diag(λ1,...,λn)

direct solver direct solver direct solver

Next cycle

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 6

slide-8
SLIDE 8

Sequences of Eigenproblems

Adjacent iteration cycles

ITERATION (ℓ) P(ℓ)

k1

(X(ℓ)

k1 ,Λ(ℓ) k1 )

P(ℓ)

k2

(X(ℓ)

k2 ,Λ(ℓ) k2 )

P(ℓ)

kN

(X(ℓ)

kN ,Λ(ℓ) kN )

X ≡ {x1,...,xn}

direct solver direct solver direct solver

ITERATION (ℓ+1) P(ℓ+1)

k1

(X(ℓ+1)

k1

,Λ(ℓ+1)

k1

) P(ℓ+1)

k2

(X(ℓ+1)

k2

,Λ(ℓ+1)

k2

) P(ℓ+1)

kN

(X(ℓ+1)

kN

,Λ(ℓ+1)

kN

) Λ ≡ diag(λ1,...,λn)

direct solver direct solver direct solver

Next cycle

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 6

slide-9
SLIDE 9

Sequences of Eigenproblems

Adjacent iteration cycles

ITERATION (ℓ) P(ℓ)

k1

(X(ℓ)

k1 ,Λ(ℓ) k1 )

P(ℓ)

k2

(X(ℓ)

k2 ,Λ(ℓ) k2 )

P(ℓ)

kN

(X(ℓ)

kN ,Λ(ℓ) kN )

X ≡ {x1,...,xn}

direct solver direct solver direct solver

ITERATION (ℓ+1) P(ℓ+1)

k1

(X(ℓ+1)

k1

,Λ(ℓ+1)

k1

) P(ℓ+1)

k2

(X(ℓ+1)

k2

,Λ(ℓ+1)

k2

) P(ℓ+1)

kN

(X(ℓ+1)

kN

,Λ(ℓ+1)

kN

) Λ ≡ diag(λ1,...,λn)

direct solver direct solver direct solver

Next cycle

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 6

slide-10
SLIDE 10

Correlation between eigenproblems

Definition and solving strategies

Definition: Correlation

Two adjacent problems P(ℓ+1) and P(ℓ) are said to be correlated when the eigenpairs (X(ℓ+1),Λ(ℓ+1)) are in some relation with the eigenpairs (X(ℓ),Λ(ℓ)).

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 7

slide-11
SLIDE 11

Correlation between eigenproblems

Definition and solving strategies

Definition: Correlation

Two adjacent problems P(ℓ+1) and P(ℓ) are said to be correlated when the eigenpairs (X(ℓ+1),Λ(ℓ+1)) are in some relation with the eigenpairs (X(ℓ),Λ(ℓ)).

Alternative solving strategy

Consider the entire sequence of eigenproblems P(1),...,P(N) as one self-contained problem; Exploit information provided from the correlation; In FLAPW, correlation appears in the way angles b/w eigenvectors of adjacent eigenproblems evolve [EDN, Blügel and Bientinesi 2012] Θ(ℓ)

ki ≡ {θ1,...,θn} = diag

  • 1 −X(ℓ−1)

ki

, ˜ X(ℓ)

ki

  • for fixed ki

θ(2)

j

θ(3)

j

··· θ(N)

j

: θ(2)

j

≫ θ(N)

j

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 7

slide-12
SLIDE 12

An alternative solving strategy

Adjacent cycles

ITERATION (ℓ) P(ℓ)

k1

(X(ℓ)

k1 ,Λ(ℓ) k1 )

P(ℓ)

k2

(X(ℓ)

k2 ,Λ(ℓ) k2 )

P(ℓ)

kN

(X(ℓ)

kN ,Λ(ℓ) kN )

X ≡ {x1,...,xn}

iterative solver iterative solver iterative solver

ITERATION (ℓ+1) P(ℓ+1)

k1

(X(ℓ+1)

k1

,Λ(ℓ+1)

k1

) P(ℓ+1)

k2

(X(ℓ+1)

k2

,Λ(ℓ+1)

k2

) P(ℓ+1)

kN

(X(ℓ+1)

kN

,Λ(ℓ+1)

kN

) Λ ≡ diag(λ1,...,λn)

iterative solver iterative solver iterative solver

Next cycle

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 8

slide-13
SLIDE 13

Outline

Sequences of correlated eigenproblems The algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) ChFSI parallelization and numerical tests

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 9

slide-14
SLIDE 14

Chebyshev Filtered Subspace Iteration method (ChFSI)

Properties and algorithm evolution

Iterative solver musts input: the full set of multiple starting vectors Z0 ≡ X(ℓ−1)

ki

(:,1 : NEV); needed: it can efficiently use dense linear algebra kernels (i.e. xGEMM); needed: it avoids stalling when facing small clusters of eigenvalues; Chebyshev Subspace Iteration Firstly introduced in [Rutishauser 1969] A version (called CheFSI) tailored to electronic structure computation in [Zhou, Saad, Tiago and Chelikowski 2006] for sparse eigenvalue problems. Our ChFSI: 1) is tailored for dense eigenproblem sequences, 2) introduces a locking mechanism, 3) contains a refining inner loop, and 4) optimizes the polynomial degree.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 10

slide-15
SLIDE 15

ChFSI pseudocode

INPUT: Hamiltonian, approximate eigenvectors – Z0, extreme eigenvalues {λ1,λNEV}, TOL, DEG. OUTPUT: NEV wanted eigenpairs (Λ,W).

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 11

slide-16
SLIDE 16

ChFSI pseudocode

INPUT: Hamiltonian, approximate eigenvectors – Z0, extreme eigenvalues {λ1,λNEV}, TOL, DEG. OUTPUT: NEV wanted eigenpairs (Λ,W).

1 Lanczos step. Identify the bounds for the eigenspectrum interval

corresponding to the wanted eigenspace.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 11

slide-17
SLIDE 17

ChFSI pseudocode

INPUT: Hamiltonian, approximate eigenvectors – Z0, extreme eigenvalues {λ1,λNEV}, TOL, DEG. OUTPUT: NEV wanted eigenpairs (Λ,W).

1 Lanczos step. Identify the bounds for the eigenspectrum interval

corresponding to the wanted eigenspace. REPEAT UNTIL CONVERGENCE:

2 Chebyshev filter. Filter a block of vectors W ←

− Z0.

3 Re-orthogonalize the vectors outputted by the filter; W = QR. 4 Compute the Rayleigh quotient G = Q†HQ. 5 Compute the primitive Ritz pairs (Λ,Y) by solving for GY = YΛ. 6 Compute the approximate Ritz pairs (Λ,W ← QY). 7 Check which one among the Ritz vectors converged. 8 Deflate and lock the converged vectors.

END REPEAT

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 11

slide-18
SLIDE 18

The core of the algorithm: Chebyshev filter

The basic principle

Theorem

Let |γ| > 1 and Pm denote the set of polynomials of degree smaller or equal to m. Then the extremum

min

p∈Pm,p(γ)=1 max t∈[−1,1]|p(t)|

is reached by

pm(t) . = Cm(t) Cm(γ).

where Cm is the Chebyshev polynomial of the first kind of order m, defined as

Cm(t) = cos(marccos(t)), t ∈ [−1,1]; cosh(marccosh(t)), |t| > 1.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 12

slide-19
SLIDE 19

The core of the algorithm: Chebyshev filter

Chebyshev polynomials

A generic vector v = ∑n

i=1 sixi is very quickly aligned in the direction of the

eigenvector corresponding to the extremal eigenvalue λ1

vm = pm(A)v =

n

i=1

si pm(A)xi =

n

i=1

si pm(λi)xi = s1x1 +

n

i=2

si Cm( λi−c

e )

Cm( λ1−c

e

) xi ∼ s1x1

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 13

slide-20
SLIDE 20

The core of the algorithm: Chebyshev filter

In practice

Three-terms recurrence relation

Cm+1 (t) = 2xCm (t)−Cm−1 (t); m ∈ N, C0 (t) = 1, C1 (t) = x Zm . = pm( ˜ H) Z0 with ˜ H = H −cIn FOR: i = 1 → DEG −1 Zi+1 ← 2σi+1 e ˜ H × Zi −σi+1σi Zi−1 xGEMM END FOR.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 14

slide-21
SLIDE 21

Polynomial degree optimization

Convergence ratio and residuals

Definition

The convergence ratio for the eigenvector xi corresponding to eigenvalue λi / ∈ [α,β] is defined as τ(λi) = |ρi|−1 = min

±

  • λi −c

e ± λi −c e 2 −1

  • .

The further away λi is from the interval [α,β] the smaller is |ρi|−1 and the faster the convergence to xi is.

For a set of input vectors V = {v1,v2,...,vnev}

Residuals are a function of m and |ρ|

Res(vm

i ) ∼

Const×

  • 1

ρi

  • m

1 ≤ i ≤ k. Res(vm+m0

i

) ≈ Res(vm0

i )

  • 1

ρi

  • m

⇒ mi ≥ ln

  • TOL

Res(v

m0 i

)

  • /lnρi

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 15

slide-22
SLIDE 22

ChFSI Single Optimization pseudocode

1 Chebyshev filter. Initial filter W ←

− Z0. with DEG= m0.

2 Re-orthogonalize W = QR & compute the Rayleigh quotient G = Q†HQ. 3 Solve the reduced problem GY = YΛ and compute the approximate Ritz

pairs (Λ,W ← QY) and store their residuals Res(wi).

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 16

slide-23
SLIDE 23

ChFSI Single Optimization pseudocode

1 Chebyshev filter. Initial filter W ←

− Z0. with DEG= m0.

2 Re-orthogonalize W = QR & compute the Rayleigh quotient G = Q†HQ. 3 Solve the reduced problem GY = YΛ and compute the approximate Ritz

pairs (Λ,W ← QY) and store their residuals Res(wi). REPEAT UNTIL CONVERGENCE:

4 Optimizer. Compute the polynomial degrees mi ≥ ln

  • TOL

Res(wi)

  • /lnρi.

5 Chebyshev filter. Filter W ←

− Z0 with DEG= mi.

6 Re-orthogonalize W = QR & compute the Rayleigh quotient G = Q†HQ. 7 Solve the reduced problem GY = YΛ and compute the approximate Ritz

pairs (Λ,W ← QY).

8 lock the converged vectors. 9 Store the residuals Res(wi) of the unconverged vectors.

END REPEAT

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 16

slide-24
SLIDE 24

Outline

Sequences of correlated eigenproblems The algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI) ChFSI parallelization and numerical tests

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 17

slide-25
SLIDE 25

Experimental tests setup

C++ implementation of ChFSI

OMPChFSI – OpenMP parallelization for shared memory EleChFSI – Elemental (MPI) parallelization for distributed memory Matrix sizes: 2,600 ÷ 29,500. Tests were performed on the JUROPA cluster. 2 Intel Xeon 5570 (Nehalem-EP) quad-core processors/node, 2.93GHz; 24 GB/node;

THEORETICAL PEAK PERFORMANCE/CORE=11.71 GFLOPS;

Minimum absolute tolerance of residuals Res(xi) = 10−08 ÷10−10; All numerical data are MEDIAN values over 12 distinct measurements.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 18

slide-26
SLIDE 26

Distributed memory: “Elemental” framework

The core of the library is the two-dimensional cyclic element-wise matrix distribution. p MPI processes are logically viewed as a two-dimensional r ×c process grid with p = r ×c. The matrix A = [aij] is distributed over the grid in such a way that the process (s,t) owns the matrix. As,t =    aγ,δ aγ,δ+c ... aγ+r,δ aγ+r,δ+c ... . . . . . .   , γ ≡ (s+σr) mod r and δ ≡ (t +σc) mod c, σr and σc are arbitrarily chosen alignment parameters.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 19

slide-27
SLIDE 27

Distributed memory: “Elemental” framework

The overall performance is influenced by grid shape and algorithmic block size. Grid Shape

For a given number p > 1 of processors there are several possible choices for r and c making a grid shape (r,c) = r ×c. It can have a significant impact on the overall performance and is usually influenced by the algorithm.

Algorithmic block size

Algorithmic block size: size of blocks of input data, correlated to the square root of the L2 cache. Not only depends on the algorithm itself, but it is also affected by the architecture.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 19

slide-28
SLIDE 28

Distributed memory: “Elemental” framework

ℓ = 20, size n = 13,379, and number of sought after eigenpairs nev = 972. The problem was solved with EleChFSI using 16, 32, and 64 cores, all possible grid shapes (r, c) and three distinct algorithmic block sizes.

( 1 , 1 6 ) ( 2 , 8 ) ( 4 , 4 ) ( 8 , 2 ) ( 1 6 , 1 ) ( 1 , 3 2 ) ( 2 , 1 6 ) ( 4 , 8 ) ( 8 , 4 ) ( 1 6 , 2 ) ( 3 2 , 1 ) ( 1 , 6 4 ) ( 2 , 3 2 ) ( 4 , 1 6 ) ( 8 , 8 ) ( 1 6 , 4 ) ( 3 2 , 2 ) ( 6 4 , 1 ) Grid Shape 100 200 300 400 500 Time [sec] 16 Cores 32 Cores 64 Cores The interplay between algorithmic block size and grid shape. Algorithmic blocksize 64 Algorithmic blocksize 128 Algorithmic blocksize 256

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 19

slide-29
SLIDE 29

ChFSI time profile

As a function of iteration cycles

Time spent in each stage of the algorithm as a function of the iteration index ℓ for a system of size n = 9,273. 3 6 9 12 iteration index ℓ 5 10 15 20 25 time [s] convergence filter Rayleigh - Ritz QR

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 20

slide-30
SLIDE 30

Speed-up

Speed-up =

CPU time (input random vectors) CPU time (input approximate eigenvectors)

3 7 11 15 19 23 Iteration Index

  • 1

2 3 4 Speed-up Au98Ag10 - n =13,379 - 128 cores.

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 21

slide-31
SLIDE 31

Scalability

Strong Scalability (the size of the eigenproblems are kept fixed while the number of cores is progressively increased) for EleChFSI over two systems of size n = 13,379−12,455−9,273 respectively. 16 32 64 128 256 number of cores 10 102 103 time [s] 196.0 138.0 36.0 102.0 71.0 19.2 55.7 38.2 10.8 384.0 269.0 69.8 30.2 21.4 6.5 Au98Ag10 TiO2 Na15Cl14Li

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 22

slide-32
SLIDE 32

EleChFSI versus direct solvers (parallel MRRR)

For the size of eigenproblems here tested the ScaLAPACK implementation

  • f BXINV or MRRR is on par of worse than EleMRRR. For this reason a

direct comparison with ScaLAPACK is not included. 3 6 9 12 eigensystem index ℓ 5 10 15 20 25 30 time [s] EleMRRR EleChFSI, 1e-10 EleChFSI OPT, 1e-10 64 cores 128 cores

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 23

slide-33
SLIDE 33

EleChFSI versus direct solvers (parallel MRRR)

For the size of eigenproblems here tested the ScaLAPACK implementation

  • f BXINV or MRRR is on par of worse than EleMRRR. For this reason a

direct comparison with ScaLAPACK is not included.

620 1358 2037 25 50 75 100 125 150 175 200 time [s] ℓ =4 620 1358 2037 number of eigenpairs sought after ℓ =11 620 1358 2037 ℓ =18 EleMRRR ChFSI, 1e-10 ChFSI, 1e-08

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 23

slide-34
SLIDE 34

Some numerical instabilities

10 20 30 40 50 60 10

−11

10

−10

10

−9

10

−8

10

−7

10

−6

Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 60) Eigenpairs residuals

Y0 = Y

sol + 1e−09 * randn1

TOL = 1e−10 nev = 100 nex = 20 10 20 30 40 50 60 10

−11

10

−10

10

−9

10

−8

10

−7

10

−6

Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 60) Eigenpairs residuals

Y0 = Y

sol + 1e−09 * randn2

TOL = 1e−10 nev = 100 nex = 20 10 20 30 40 50 60 10

−16

10

−14

10

−12

10

−10

10

−8

10

−6

Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 60) Eigenpairs residuals

Y0 = Y

sol + 1e−09 * randn1

TOL = 1e−14 nev = 100 nex = 20 5 10 15 20 25 30 35 10

−16

10

−14

10

−12

10

−10

10

−8

10

−6

Evolution of residuals for the lowest 100 eigenpairs for 10−9 perturbation Polynomial degree (5 −> 37) Eigenpairs residuals

Y0 = Y

sol + 1e−09 * randn1

TOL = 1e−10 nev = 100 nex = 40

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 24

slide-35
SLIDE 35

Some numerical instabilities

5 10 15 20 25 30 35 40 45 50 55 60 10

−11

10

−10

10

−9

10

−8

10

−7

10

−6 Evolution of residuals for 4 eigenpairs among the lowest 100

Polynomial degree (5 −> 60) Eigenpairs residuals

Res(vm) = a * 1 / |i|m + b * |1|m/|i|m

5 10 15 20 25 30 35 40 45 50 55 60 10

−11

10

−10

10

−9

10

−8

10

−7

10

−6 Evolution of residuals for 4 eigenpairs among the lowest 100

Polynomial degree (5 −> 60) Eigenpairs residuals

Res(vm) = a * 1 / |i|m * sqrt(1 + b * |1|2m )

There are probably two effects that contribute to these instabilities: Early locking deprives the search space of needed components for the convergence of the remaining eigenpairs In the re-orthogonalization there are inexact cancellations against small vanishing values. Such “approximate” zeros are multiplied by |ρ1|m

|ρi|m ≫ 1

In the above example: b ∼ 10−10 or b ∼ 10−13

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 24

slide-36
SLIDE 36

Conclusions and future work

Algorithmic strategy

Sequences of “correlated” eigenproblems ⇒ Tailored algorithms Exploiting the correlation of the eigenproblem sequence to speedup the solution of each P(ℓ) is a successful strategy; Combining iterative methods with kernels for dense linear algebra can pay off. The parallelization shows great potential for scalability and parallel efficiency; Uncovering information can lead to further algorithmic optimizations; ONGOING AND FUTURE WORK

1 Fine-tuning filter optimization by profiling eigenvector degrees to

convergence along the sequence so as to further reduce complexity;

2 Exploiting different architectures for parallelization (GPUs, Xeon Phi).

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 25

slide-37
SLIDE 37

References

1 EDN and M.Berliafa An Optimized and Scalable Eigensolver for Sequences of Eigenvalue Problems Under review for Concurrency and Computation: Practice and Experience. 2 EDN, and M. Berljafa A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science, Vol 8385. [arXiv:1305.5120] 3 EDN, and M. Berljafa Block Iterative Eigensolvers for Sequences of Correlated Eigenvalue Problems

  • Comp. Phys. Comm. 184 (2013), pp. 2478-2488, [arXiv:1206.3768].

4 EDN, P . Bientinesi, and S. Blügel, Correlation in sequences of generalized eigenproblems arising in Density Functional Theory,

  • Comp. Phys. Comm. 183 (2012), pp. 1674-1682, [arXiv:1108.2594].

PMAA14, Lugano, Switzerland. July 4th

  • E. Di Napoli and M. Berljafa

Folie 26