SLIDE 1
A MATLAB interface for PRIMME for solving Eigenvalue and Singular - - PowerPoint PPT Presentation
A MATLAB interface for PRIMME for solving Eigenvalue and Singular - - PowerPoint PPT Presentation
SIAM CSE 2013 A MATLAB interface for PRIMME for solving Eigenvalue and Singular Value problems Andreas Stathopoulos and Lingfei Wu Computer Science Department College of William and Mary Acknowledgment: National Science Foundation, DOE SciDAC
SLIDE 2
SLIDE 3
Available software Lanczos-based, no preconditioning
- ARPACK (Sorensen, Lehoucq)
- TRLAN (Wu, Simon)
- Industrial strength Lanczos (Grimes, Lewis, Simon)
Preconditioned eigensolver packages with various methods
- ANASAZI (Baker, Thornquist, Lehoucq, Hetmaniuk)
- PRIMME (AS, J.M.)
- SLEPc (Roman et al.)
Specific method preconditioned eigensolver software
- BLOPEX (Knyazev)
- JADAMILU (Bollhoefer, Notay)
[ 3 ]
SLIDE 4
A.S. and J. R. McCombs (2006) PRIMME: PReconditioned Iterative MultiMethod Eigensolver
- Near optimality through GD+k and JDQMR methods
- Over 12 methods accessible through PRIMME.
- Dynamic choice between best methods
- Block versions of methods
- Interior eigenvalues too
- Full set of defaults for non expert users
- Full customizability for expert users
- Parallel, high performance implementation
- C and Fortran interfaces, real and complex
- Accessible also in SLEPc
Download: www.cs.wm.edu/∼andreas
[ 4 ]
SLIDE 5
PRIMME shown robust and efficient
1 2 3 4 5 6 7 8 9 10
Matvec ratios over PRIMME’s GD+1 Andrews Lap7p1M Plate33K cfd1
- r56f
PRIMME JDQMR PRIMME GD+k Anasazi IRTR Anasazi LOBPCG Anasazi BD
2 4 6 8 10 12
Time ratios over JDqmr Andrews Lap7p1M Plate33K cfd1
- r56f
PRIMME JDQMR PRIMME GD+k Anasazi IRTR Anasazi LOBPCG Anasazi BD
Typically: GD+1 smallest number of matrix-vector ops JDMQR lowest time (if matrix sparse enough)
[ 5 ]
SLIDE 6
Dynamic method chooses between the fastest two algorithms
5 10 15 20 25 30 35 40 45 50 10
1
10
2
Matrix Fillet_13K: Dynamic matches best method Number of smallest eigenvalues found Time in seconds GD+k JDQMR Dynamic
[ 6 ]
SLIDE 7
PRIMME multi-layer interface – End user #include "primme.h" primme_params primme; primme_Initialize(&primme); primme.n = n; primme.numEvals = 20; primme.matrixMatvec = MV(x,y,k) primme.applyPreconditioner = PR(x,y,k) ierr = dprimme(evals, evecs, rnorms, &primme); Usually achieves full potential of the method
[ 7 ]
SLIDE 8
PRIMME multi-layer interface – Advanced user
#include "primme.h" primme_params primme; primme.
- utputFile
= stdout printLevel = 5 numEvals = 10 aNorm = 1.0 eps = 1.0e-12 maxBasisSize = 15 minRestartSize = 7 maxBlockSize = 1 maxOuterIterations = 10000 maxMatvecs = 300000 target = primme_smallest numTargetShifts = 0 targetShifts = 1.0 2.0 locking = 1 initSize = 0 numOrthoConst = 0; iseed = -1 restarting.scheme = primme_thick restarting.maxPrevRetain = 1 correction.precondition = 1 correction.robustShifts = 1 correction.maxInnerIterations = -1 correction.relTolBase = 1.5 correction.convTest = adaptive_ETolerance correction.projectors.LeftQ = 1 correction.projectors.LeftX = 1 correction.projectors.RightQ = 0 correction.projectors.SkewQ = 0 correction.projectors.RightX = 1 correction.projectors.SkewX = 1 matrixMatvec = MV(x,y,k) applyPreconditioner = PR(x,y,k) ierr = dprimme(evals, evecs, rnorms, &primme);
[ 8 ]
SLIDE 9
PRIMME in MATLAB Benefits to PRIMME users
- Optimized Sparse, Block Matrix-Vector (MV) function
- Optimized libraries for Sparse matrix inversion and ILU preconditioing
- Optimized BLAS/LAPACK libraries
- Ease of use/development
- Easier to build and experiment on a SVD solver
Benefits to MATLAB users
- Availability of a preconditioned eigensolver
- Availability of a preconditioned singular value solver
- As robust and easy to use as eigs() but much faster
[ 9 ]
SLIDE 10
Interface similar to eigs() [evals] = [evecs, evals] = [evecs, evals, resnorms] = [evecs, evals, resnorms, primmeStats] = primme_eigs(A) primme_eigs(A, numEvals) primme_eigs(A, numEvals, target) primme_eigs(A, numEvals, target, opts) primme_eigs(A, numEvals, target, opts, eigsMethod) primme_eigs(A, numEvals, target, opts, eigsMethod, P) primme_eigs(A, numEvals, target, opts, eigsMethod, P1,P2) primme_eigs(A, numEvals, target, opts, eigsMethod, Pfun) primme_eigs(Afun, dim,...) We allow Afun even for smallest magnitude eigenvalues
[ 10 ]
SLIDE 11
PRIMME vs PRIMME MEX vs eigs() No preconditioner
- Five smallest eigenvalues — eigs(Afun) to avoid inversion
- PRIMME compiled -O3 including SparseMatVec,BLAS,LAPACK
- PRIMME MEX uses MATLAB’s SparseMatVec, BLAS, LAPACK
0.5 1 1.5 2 2.5 3
Time ratios over MEX JDqmr A n d r e w s C
- n
e A P l a t e 3 3 K f i n a n 5 1 2 c f d 1 3 D s p e c t r a l w a v e 2
PRIMME JDQMR PRIMME GD+k PRIMME MEX JDQMR PRIMME MEX GD+k eigs()
- MATLAB Library benefits
- PRIMME MEX faster than
eigs()
[ 11 ]
SLIDE 12
Performance of MATLAB’s Block Sparse Matvec
5 10 15 20 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 Blocksize Time ratio Time(k MVs) / Time(block k MV) Andrews Cone_A finan512 Plate33K_A0 cfd1 3Dspectralwave2
Block size of 2 biggest gain. Larger block no much better.
[ 12 ]
SLIDE 13
PRIMME MEX with block size of 2 In Block Krylov methods # Matvecs increase almost linearly with block size Using the MATLAB routines, block=2 offers better robustness with only small additional expense
5 10 15 20 25 0.8 1 1.2 1.4 1.6 1.8 2 Number of eigenvalues of 3dspectralwave2 Ratio over non−blocking PRIMME mex: Time(block=k)/Time(block=1) block = 2 block = 3 block = 4 5 10 15 20 25 1 1.2 1.4 1.6 1.8 2 2.2 Number of eigenvalues of Andrews Ratio over non−blocking PRIMME mex: Time(block=k)/Time(block=1) block = 2 block = 3 block = 4 [ 13 ]
SLIDE 14
Using PRIMME for SVD
- Allow access to full functionality of PRIMME
- Allow choice of normal equations (ATA) or augmented matrix B (OAAO)
- Allow for various preconditioning techniques:
– P ≈ (ATA)−1 or P ≈ B−1 =
- A−1
A−T
- directly
– P ≈ A−1 explicitly or through ILU: P = U−1L−1 ≈ A−1. Use as: PPT ≈ (ATA)−1 or as 0 P PT 0
- ≈ B−1
(matrix products not formed) – Pfun user provided function for implementing any of the above P
[ 14 ]
SLIDE 15
Using PRIMME for SVD [S] = [U, S, V] = [U, S, V, norms, primmeout] = primme_svds(A) primme_svds(A, numSvs) primme_svds(A, numSvs, target) primme_svds(A, numSvs, target, opts) primme_svds(A, numSvs, target, opts, eigsMethod) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, P) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, P1,P2) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, Pfun) primme_svds(Afun, M, N,...)
[ 15 ]
SLIDE 16
What threshold to converge to? Let (ui,σi,vi) approximate singular triplet. Define: rv = Avi −σiui ru = ATui −σivi rATA = ATAvi −σ 2
i vi
rB = B v u
- −σi
v u
- If vi = 1, ui = Avi/σi = 1, then rv = 0 and
ru = rATA σi = rB √ 2 Want to guarantee ru < δA, so converge each method to rATA < σi δ A rB < √ 2 δ A
[ 16 ]
SLIDE 17
Which SVD method? Convergence speed issue ATA very fast for largest SVs (squared gaps) slow for smallest but still much faster than OAAO OAAO slower for largest eigenvalues extremely slow and not robust for smallest (interior) SVs Accuracy issue OAAO can converge up to Aεmach ATA can only converge up to A2εmach ⇒ for small σi cannot reach needed σiδA, if δ < εmachA/σi Our PRIMME SVDS solution: Use ATA to residual threshold max
- σiδA,A2εmach
- Use OAAO to improve the ATA approximations to the required threshold
[ 17 ]
SLIDE 18
Accuracy limit and dynamic switching
500 1000 1500 2000 2500 3000 10
−10
10
−5
10 10
5
Matvecs ||AT u − σ v||
A=diag([1:10 1000:100:1e6]), Prec=A+rand(1,10000)*1e4
OAAO ATA to limit ATA to eps||A||
2
switch to OAAO
[ 18 ]
SLIDE 19
Performance of primme svds small matrices
2 3 4 5 6 Ratio Time of svds(B−1) / primme_svds(A−1A−T) p d e 2 9 6 1 d w 2 4 8 b w m 2 c − 1 9 s a y l
- r
4 1 4 10 numSvals
When A = LU or A ≈ LU, store LT,UT for faster memory access. Still less memory than svds inverting B.
[ 19 ]
SLIDE 20
Performance of primme svds medium size matrices — 1 smallest sv
10
−1
10 10
1
10
2
10
3
Seconds Larger matrices where inversion possible −− nsvs = 1 cage11 κ=12 N=39K finan512 κ=98 N=75K torso3 κ=5.8e2 N=259K wang3 κ=6.2e3 N=26K wang4 κ=4.0e5 N=26K chipcool0 κ=8.3e6 N=20K SVDS(B−1) primme_svds(A−1A−T) primme_svds(ILU precond)
Shift-invert speedup factor 4 When preconditioning effective speedup 1000.
[ 20 ]
SLIDE 21
Performance of primme svds medium size matrices — 10 smallest svs
10 10
1
10
2
10
3
Seconds Larger matrices where inversion possible −− nsvs = 10 cage11 κ=12 N=39K finan512 κ=98 N=75K torso3 κ=5.8e2 N=259K wang3 κ=6.2e3 N=26K wang4 κ=4.0e5 N=26K chipcool0 κ=8.3e6 N=20K SVDS(B−1) primme_svds(A−1A−T) primme_svds(ILU precond)
Preconditioned benefits decrease when more values needed Shift-invert still speedup 2-4
[ 21 ]
SLIDE 22
Performance of primme svds large size matrices — 4 smallest svs Factorization not possible, PRIMME the only alternative matrix: cage14 dielFilterV2real G3 circuit N 1,505,785 1,157,456 1,585,478 κ(A) 12 6.0E7 2.2E7 MV Sec MV Sec MV Sec ilu(A) – 1.3 – 5.8 – 0.2 primme svds 128 61.2 3494 4198 101595 33077
[ 22 ]
SLIDE 23
Conclusions
- MATLAB interface to PRIMME
- MATLAB’s optimized libraries and sparse Matvec improve PRIMME
- PRIMME svds() more flexible and efficient than svds()
- Dynamic switching of ATA and OAAO methods in primme svds()
[ 23 ]
SLIDE 24
Performance of primme svds No preconditioning — 10 largest svs
0.1 0.2 0.3 0.4 0.5 0.6 0.7 A n d r e w s f i n a n 5 1 2 m a t r i x 9 n l p k k t 8 Ratio over eigs() Time of primme_svds() / time of eigs()
[ 24 ]
SLIDE 25