A MATLAB interface for PRIMME for solving Eigenvalue and Singular - - PowerPoint PPT Presentation

a matlab interface for primme for solving eigenvalue and
SMART_READER_LITE
LIVE PREVIEW

A MATLAB interface for PRIMME for solving Eigenvalue and Singular - - PowerPoint PPT Presentation

SIAM CSE 2013 A MATLAB interface for PRIMME for solving Eigenvalue and Singular Value problems Andreas Stathopoulos and Lingfei Wu Computer Science Department College of William and Mary Acknowledgment: National Science Foundation, DOE SciDAC


slide-1
SLIDE 1

SIAM CSE 2013

A MATLAB interface for PRIMME for solving Eigenvalue and Singular Value problems Andreas Stathopoulos and Lingfei Wu Computer Science Department College of William and Mary

Acknowledgment: National Science Foundation, DOE SciDAC

slide-2
SLIDE 2

The problems A large, sparse, Hermitian matrix: Find nev eigenvalues and corresponding eigenvectors Axi = λixi A large, sparse, N ×M matrix Find nev singular values and corresponding left and right singular vectors Avi = σiui SVD problem solved as a Hermitian eigenvalue problem on Normal equations ATA or AAT, Augmented matrix [0 A;AT 0]

[ 2 ]

slide-3
SLIDE 3

Available software Lanczos-based, no preconditioning

  • ARPACK (Sorensen, Lehoucq)
  • TRLAN (Wu, Simon)
  • Industrial strength Lanczos (Grimes, Lewis, Simon)

Preconditioned eigensolver packages with various methods

  • ANASAZI (Baker, Thornquist, Lehoucq, Hetmaniuk)
  • PRIMME (AS, J.M.)
  • SLEPc (Roman et al.)

Specific method preconditioned eigensolver software

  • BLOPEX (Knyazev)
  • JADAMILU (Bollhoefer, Notay)

[ 3 ]

slide-4
SLIDE 4

A.S. and J. R. McCombs (2006) PRIMME: PReconditioned Iterative MultiMethod Eigensolver

  • Near optimality through GD+k and JDQMR methods
  • Over 12 methods accessible through PRIMME.
  • Dynamic choice between best methods
  • Block versions of methods
  • Interior eigenvalues too
  • Full set of defaults for non expert users
  • Full customizability for expert users
  • Parallel, high performance implementation
  • C and Fortran interfaces, real and complex
  • Accessible also in SLEPc

Download: www.cs.wm.edu/∼andreas

[ 4 ]

slide-5
SLIDE 5

PRIMME shown robust and efficient

1 2 3 4 5 6 7 8 9 10

Matvec ratios over PRIMME’s GD+1 Andrews Lap7p1M Plate33K cfd1

  • r56f

PRIMME JDQMR PRIMME GD+k Anasazi IRTR Anasazi LOBPCG Anasazi BD

2 4 6 8 10 12

Time ratios over JDqmr Andrews Lap7p1M Plate33K cfd1

  • r56f

PRIMME JDQMR PRIMME GD+k Anasazi IRTR Anasazi LOBPCG Anasazi BD

Typically: GD+1 smallest number of matrix-vector ops JDMQR lowest time (if matrix sparse enough)

[ 5 ]

slide-6
SLIDE 6

Dynamic method chooses between the fastest two algorithms

5 10 15 20 25 30 35 40 45 50 10

1

10

2

Matrix Fillet_13K: Dynamic matches best method Number of smallest eigenvalues found Time in seconds GD+k JDQMR Dynamic

[ 6 ]

slide-7
SLIDE 7

PRIMME multi-layer interface – End user #include "primme.h" primme_params primme; primme_Initialize(&primme); primme.n = n; primme.numEvals = 20; primme.matrixMatvec = MV(x,y,k) primme.applyPreconditioner = PR(x,y,k) ierr = dprimme(evals, evecs, rnorms, &primme); Usually achieves full potential of the method

[ 7 ]

slide-8
SLIDE 8

PRIMME multi-layer interface – Advanced user

#include "primme.h" primme_params primme; primme.

  • utputFile

= stdout printLevel = 5 numEvals = 10 aNorm = 1.0 eps = 1.0e-12 maxBasisSize = 15 minRestartSize = 7 maxBlockSize = 1 maxOuterIterations = 10000 maxMatvecs = 300000 target = primme_smallest numTargetShifts = 0 targetShifts = 1.0 2.0 locking = 1 initSize = 0 numOrthoConst = 0; iseed = -1 restarting.scheme = primme_thick restarting.maxPrevRetain = 1 correction.precondition = 1 correction.robustShifts = 1 correction.maxInnerIterations = -1 correction.relTolBase = 1.5 correction.convTest = adaptive_ETolerance correction.projectors.LeftQ = 1 correction.projectors.LeftX = 1 correction.projectors.RightQ = 0 correction.projectors.SkewQ = 0 correction.projectors.RightX = 1 correction.projectors.SkewX = 1 matrixMatvec = MV(x,y,k) applyPreconditioner = PR(x,y,k) ierr = dprimme(evals, evecs, rnorms, &primme);

[ 8 ]

slide-9
SLIDE 9

PRIMME in MATLAB Benefits to PRIMME users

  • Optimized Sparse, Block Matrix-Vector (MV) function
  • Optimized libraries for Sparse matrix inversion and ILU preconditioing
  • Optimized BLAS/LAPACK libraries
  • Ease of use/development
  • Easier to build and experiment on a SVD solver

Benefits to MATLAB users

  • Availability of a preconditioned eigensolver
  • Availability of a preconditioned singular value solver
  • As robust and easy to use as eigs() but much faster

[ 9 ]

slide-10
SLIDE 10

Interface similar to eigs() [evals] = [evecs, evals] = [evecs, evals, resnorms] = [evecs, evals, resnorms, primmeStats] = primme_eigs(A) primme_eigs(A, numEvals) primme_eigs(A, numEvals, target) primme_eigs(A, numEvals, target, opts) primme_eigs(A, numEvals, target, opts, eigsMethod) primme_eigs(A, numEvals, target, opts, eigsMethod, P) primme_eigs(A, numEvals, target, opts, eigsMethod, P1,P2) primme_eigs(A, numEvals, target, opts, eigsMethod, Pfun) primme_eigs(Afun, dim,...) We allow Afun even for smallest magnitude eigenvalues

[ 10 ]

slide-11
SLIDE 11

PRIMME vs PRIMME MEX vs eigs() No preconditioner

  • Five smallest eigenvalues — eigs(Afun) to avoid inversion
  • PRIMME compiled -O3 including SparseMatVec,BLAS,LAPACK
  • PRIMME MEX uses MATLAB’s SparseMatVec, BLAS, LAPACK

0.5 1 1.5 2 2.5 3

Time ratios over MEX JDqmr A n d r e w s C

  • n

e A P l a t e 3 3 K f i n a n 5 1 2 c f d 1 3 D s p e c t r a l w a v e 2

PRIMME JDQMR PRIMME GD+k PRIMME MEX JDQMR PRIMME MEX GD+k eigs()

  • MATLAB Library benefits
  • PRIMME MEX faster than

eigs()

[ 11 ]

slide-12
SLIDE 12

Performance of MATLAB’s Block Sparse Matvec

5 10 15 20 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 Blocksize Time ratio Time(k MVs) / Time(block k MV) Andrews Cone_A finan512 Plate33K_A0 cfd1 3Dspectralwave2

Block size of 2 biggest gain. Larger block no much better.

[ 12 ]

slide-13
SLIDE 13

PRIMME MEX with block size of 2 In Block Krylov methods # Matvecs increase almost linearly with block size Using the MATLAB routines, block=2 offers better robustness with only small additional expense

5 10 15 20 25 0.8 1 1.2 1.4 1.6 1.8 2 Number of eigenvalues of 3dspectralwave2 Ratio over non−blocking PRIMME mex: Time(block=k)/Time(block=1) block = 2 block = 3 block = 4 5 10 15 20 25 1 1.2 1.4 1.6 1.8 2 2.2 Number of eigenvalues of Andrews Ratio over non−blocking PRIMME mex: Time(block=k)/Time(block=1) block = 2 block = 3 block = 4 [ 13 ]

slide-14
SLIDE 14

Using PRIMME for SVD

  • Allow access to full functionality of PRIMME
  • Allow choice of normal equations (ATA) or augmented matrix B (OAAO)
  • Allow for various preconditioning techniques:

– P ≈ (ATA)−1 or P ≈ B−1 =

  • A−1

A−T

  • directly

– P ≈ A−1 explicitly or through ILU: P = U−1L−1 ≈ A−1. Use as: PPT ≈ (ATA)−1 or as 0 P PT 0

  • ≈ B−1

(matrix products not formed) – Pfun user provided function for implementing any of the above P

[ 14 ]

slide-15
SLIDE 15

Using PRIMME for SVD [S] = [U, S, V] = [U, S, V, norms, primmeout] = primme_svds(A) primme_svds(A, numSvs) primme_svds(A, numSvs, target) primme_svds(A, numSvs, target, opts) primme_svds(A, numSvs, target, opts, eigsMethod) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, P) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, P1,P2) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, Pfun) primme_svds(Afun, M, N,...)

[ 15 ]

slide-16
SLIDE 16

What threshold to converge to? Let (ui,σi,vi) approximate singular triplet. Define: rv = Avi −σiui ru = ATui −σivi rATA = ATAvi −σ 2

i vi

rB = B v u

  • −σi

v u

  • If vi = 1, ui = Avi/σi = 1, then rv = 0 and

ru = rATA σi = rB √ 2 Want to guarantee ru < δA, so converge each method to rATA < σi δ A rB < √ 2 δ A

[ 16 ]

slide-17
SLIDE 17

Which SVD method? Convergence speed issue ATA very fast for largest SVs (squared gaps) slow for smallest but still much faster than OAAO OAAO slower for largest eigenvalues extremely slow and not robust for smallest (interior) SVs Accuracy issue OAAO can converge up to Aεmach ATA can only converge up to A2εmach ⇒ for small σi cannot reach needed σiδA, if δ < εmachA/σi Our PRIMME SVDS solution: Use ATA to residual threshold max

  • σiδA,A2εmach
  • Use OAAO to improve the ATA approximations to the required threshold

[ 17 ]

slide-18
SLIDE 18

Accuracy limit and dynamic switching

500 1000 1500 2000 2500 3000 10

−10

10

−5

10 10

5

Matvecs ||AT u − σ v||

A=diag([1:10 1000:100:1e6]), Prec=A+rand(1,10000)*1e4

OAAO ATA to limit ATA to eps||A||

2

switch to OAAO

[ 18 ]

slide-19
SLIDE 19

Performance of primme svds small matrices

2 3 4 5 6 Ratio Time of svds(B−1) / primme_svds(A−1A−T) p d e 2 9 6 1 d w 2 4 8 b w m 2 c − 1 9 s a y l

  • r

4 1 4 10 numSvals

When A = LU or A ≈ LU, store LT,UT for faster memory access. Still less memory than svds inverting B.

[ 19 ]

slide-20
SLIDE 20

Performance of primme svds medium size matrices — 1 smallest sv

10

−1

10 10

1

10

2

10

3

Seconds Larger matrices where inversion possible −− nsvs = 1 cage11 κ=12 N=39K finan512 κ=98 N=75K torso3 κ=5.8e2 N=259K wang3 κ=6.2e3 N=26K wang4 κ=4.0e5 N=26K chipcool0 κ=8.3e6 N=20K SVDS(B−1) primme_svds(A−1A−T) primme_svds(ILU precond)

Shift-invert speedup factor 4 When preconditioning effective speedup 1000.

[ 20 ]

slide-21
SLIDE 21

Performance of primme svds medium size matrices — 10 smallest svs

10 10

1

10

2

10

3

Seconds Larger matrices where inversion possible −− nsvs = 10 cage11 κ=12 N=39K finan512 κ=98 N=75K torso3 κ=5.8e2 N=259K wang3 κ=6.2e3 N=26K wang4 κ=4.0e5 N=26K chipcool0 κ=8.3e6 N=20K SVDS(B−1) primme_svds(A−1A−T) primme_svds(ILU precond)

Preconditioned benefits decrease when more values needed Shift-invert still speedup 2-4

[ 21 ]

slide-22
SLIDE 22

Performance of primme svds large size matrices — 4 smallest svs Factorization not possible, PRIMME the only alternative matrix: cage14 dielFilterV2real G3 circuit N 1,505,785 1,157,456 1,585,478 κ(A) 12 6.0E7 2.2E7 MV Sec MV Sec MV Sec ilu(A) – 1.3 – 5.8 – 0.2 primme svds 128 61.2 3494 4198 101595 33077

[ 22 ]

slide-23
SLIDE 23

Conclusions

  • MATLAB interface to PRIMME
  • MATLAB’s optimized libraries and sparse Matvec improve PRIMME
  • PRIMME svds() more flexible and efficient than svds()
  • Dynamic switching of ATA and OAAO methods in primme svds()

[ 23 ]

slide-24
SLIDE 24

Performance of primme svds No preconditioning — 10 largest svs

0.1 0.2 0.3 0.4 0.5 0.6 0.7 A n d r e w s f i n a n 5 1 2 m a t r i x 9 n l p k k t 8 Ratio over eigs() Time of primme_svds() / time of eigs()

[ 24 ]

slide-25
SLIDE 25

Performance of primme svds medium size matrices — 4 smallest svs

10 10

1

10

2

10

3

Seconds Larger matrices where inversion possible −− nsvs = 4 cage11 κ=12 N=39K finan512 κ=98 N=75K torso3 κ=5.8e2 N=259K wang3 κ=6.2e3 N=26K wang4 κ=4.0e5 N=26K chipcool0 κ=8.3e6 N=20K SVDS(B−1) primme_svds(A−1A−T) primme_svds(ILU precond)

[ 25 ]