
slide-1
SLIDE 1

PMAA

July 4, 2014

Accurate computation of smallest singular values using the PRIMME eigensolver

Lingfei Wu, Andreas Stathopoulos
Computer Science Department, College of William and Mary

Acknowledgment: National Science Foundation, DOE Scidac

slide-3
SLIDE 3

The ubiquitous SVD

Mostly for largest singular values

  • Graph/Data/Text mining
  • Image-processing
  • Model reduction in PDEs
  • Variance reduction in Monte Carlo

But also for smallest singular values

  • similar applications, but for functions of matrices (e.g., A⁻¹):
    – low-rank preconditioning
    – variance reduction
  • computation of the pseudospectrum
  • sensitivity analysis

[ 3 ]

slide-4
SLIDE 4

Solving the large, sparse SVD problem

Find the k smallest singular values and corresponding left and right singular vectors

  • of a large, sparse A ∈ ℜ^{m×n}:

A vi = σi ui,  Aᵀ ui = σi vi,  σ1 ≤ ... ≤ σk

Solution approaches:

  • A Hermitian eigenvalue problem on
    – the normal equations matrix C = AᵀA or C = AAᵀ
    – the augmented matrix B = [0 Aᵀ; A 0]
  • Lanczos bidiagonalization method (LBD):
    A = P Bd Qᵀ, where Bd = X Σ Yᵀ, U = PX and V = QY
  • Jacobi-Davidson for SVD (JDSVD):
    uses B, but the projection is based on two separate subspaces
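The two eigenproblem formulations can be checked numerically. The following is my own illustration (matrix and singular values made up), with NumPy's dense solvers standing in for a sparse eigensolver:

```python
# Sketch: recover the smallest singular value of A from the two
# eigenproblem formulations. NumPy dense solvers stand in for a
# sparse eigensolver; the test matrix is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 6
s = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])       # prescribed singular values
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U[:, :n] @ np.diag(s) @ V.T                    # A has singular values s

# Normal equations C = A^T A: eigenvalues are sigma_i^2
C = A.T @ A
sigma_min_from_C = np.sqrt(np.linalg.eigvalsh(C)[0])

# Augmented matrix B = [0 A^T; A 0]: eigenvalues are +/- sigma_i,
# plus m - n zeros when A is rectangular
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
evals = np.linalg.eigvalsh(B)
sigma_min_from_B = evals[evals > 1e-8].min()       # smallest positive eigenvalue

print(sigma_min_from_C, sigma_min_from_B)          # both ~0.5
```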

[ 4 ]

slide-5
SLIDE 5

Differences between methods: asymptotic Krylov convergence

  • Theorem. A Krylov method computing σn² on C is asymptotically twice as fast as one computing σn on B.
  • Theorem. A Krylov method computing σ1² on C is always asymptotically faster than one computing σ1 on B.

[ 5 ]

slide-6
SLIDE 6

Differences between methods: Krylov degree

Lanczos on C builds:  Vk = Kk(AᵀA, v1)
LBD builds:           Vk = Kk(AᵀA, v1),  Uk = Kk(AAᵀ, A v1)
JDSVD builds:         Uk = Kk/2(AAᵀ, u1) ⊕ Kk/2(AAᵀ, A v1),
                      Vk = Kk/2(AᵀA, v1) ⊕ Kk/2(AᵀA, Aᵀ u1)
Lanczos on B builds:  Kk(B, [v1; u1]).  If u1 = 0, then
                      Vk = Kk/2(AᵀA, v1)  and  Uk = Kk/2(AAᵀ, A v1).

⇒ LBD and Lanczos on C reach twice the Krylov degree of the augmented approaches
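The halved Krylov degree of the augmented approach can be verified numerically. A small dense check of my own (not from the slides): starting Lanczos on B with u1 = 0, the v-components of k Krylov vectors of B span only the degree-k/2 Krylov space of AᵀA.

```python
# Numerical check: Krylov vectors of B = [0 A^T; A 0] started at [v1; 0]
# advance the V-subspace at only half the Krylov degree.
import numpy as np

rng = np.random.default_rng(1)
m, n = 7, 5
A = rng.standard_normal((m, n))                  # made-up test matrix
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])

k = 6
v1 = rng.standard_normal(n)
x = np.concatenate([v1, np.zeros(m)])            # starting vector [v1; 0]
X = [x]
for _ in range(k - 1):                           # k Krylov vectors of B
    X.append(B @ X[-1])
X = np.array(X).T                                # (n+m) x k basis of K_k(B, [v1; 0])

# v-components of that basis span only K_{k/2}(A^T A, v1)
Vpart = X[:n, :]
C = A.T @ A
K_half = np.array([np.linalg.matrix_power(C, j) @ v1 for j in range(k // 2)]).T

rank_v = np.linalg.matrix_rank(Vpart)
rank_joint = np.linalg.matrix_rank(np.hstack([Vpart, K_half]))
print(rank_v, rank_joint)                        # both k//2: the same subspace
```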

[ 6 ]

slide-7
SLIDE 7

Differences between methods: accuracy

Squaring (C = AᵀA, even implicitly) causes accuracy problems:

  • LBD on A achieves a residual ‖r‖ of O(‖A‖ εmach)
  • An eigenmethod on B achieves ‖r‖ of O(‖A‖ εmach)
  • An eigenmethod on C achieves ‖r‖ only up to O(κ(A) ‖A‖ εmach)

Only LBD seems to be both efficient and accurate...
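The κ(A) accuracy limit of working on C shows up in a small experiment of my own (the singular values are made up): for an ill-conditioned A, σ_min recovered from C = AᵀA is far less accurate than σ_min computed from A directly.

```python
# Demo of the accuracy loss from squaring: sigma_min from eig(A^T A)
# versus sigma_min from an SVD that works on A directly.
import numpy as np

rng = np.random.default_rng(2)
n = 6
s = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 1e-8])    # kappa(A) = 1e8
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(s) @ V.T

sigma_svd = np.linalg.svd(A, compute_uv=False)[-1]   # works on A
lam_min = np.linalg.eigvalsh(A.T @ A)[0]             # works on C = A^T A
sigma_C = np.sqrt(abs(lam_min))                      # lam_min may even be negative

err_svd = abs(sigma_svd - 1e-8) / 1e-8
err_C = abs(sigma_C - 1e-8) / 1e-8
print(err_svd, err_C)    # err_C is larger by many orders of magnitude
```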

[ 7 ]

slide-8
SLIDE 8

LBD

Advantages of LBD:

  • Same space as Lanczos on C, but accurate since it works on A
  • Lanczos-style global convergence to many singular triplets

Drawbacks of LBD:

  • Orthogonality loss, large memory demands ⇒ restarting needed
  • Interior spectral info ⇒ harmonic/refined procedures
    ⇒ irregular and slow convergence
  • No general-purpose software for recent advances in LBD
  • Cannot use preconditioning

[ 8 ]

slide-9
SLIDE 9

JDSVD

Advantages of JDSVD:

  • Can take advantage of preconditioning
  • More flexible than eigenmethods on B
  • Accurate

Drawbacks of JDSVD:

  • Correction equation on B maximally indefinite
  • Outer iteration slow
  • Only MATLAB implementation

[ 9 ]

slide-10
SLIDE 10

What is missing in real-world software

Extremely challenging task for smallest SVs:

  • large sparse matrix ⇒ no shift-invert
  • slow convergence ⇒ effective restarting and preconditioning needed
  • currently only unpreconditioned SVD software:
    – SVDPACK: Lanczos and trace-min on B or C, only largest SVs
    – PROPACK: LBD for largest SVs, shift-invert for smallest SVs
    – SLEPc: no preconditioning, still in development

⇒ calls for a full-functionality, highly optimized SVD solver:
PRIMME: PReconditioned Iterative MultiMethod Eigensolver

[ 10 ]

slide-11
SLIDE 11

Building an SVD solver on PRIMME

PRIMME: PReconditioned Iterative MultiMethod Eigensolver

  • Over 12 eigenmethods, including the near-optimal GD+k and JDQMR methods
  • Extensive support for interior eigenvalues
    – accepts initial guesses for all required eigenvectors
    – accepts many shifts and finds the closest eigenvalue to each shift
  • Accepts any preconditioner M ≈ C⁻¹ or M ≈ B⁻¹; moreover,
    – if M ≈ A⁻¹, it uses MMᵀ ≈ C⁻¹ and [0 M; Mᵀ 0] ≈ B⁻¹
    – if M ≈ (AᵀA)⁻¹ (e.g., from RIF), it uses M ≈ C⁻¹ and [0 MAᵀ; AM 0] ≈ B⁻¹
  • Robust framework: subspace acceleration, locking, block methods
  • Parallel, high-performance implementation
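The preconditioner mappings above can be sketched as follows. This is a hypothetical illustration of my own using SciPy's LinearOperator, not PRIMME's actual interface; the matrix and the crude approximate inverse M are made up.

```python
# Sketch (not PRIMME's API): wrapping a preconditioner M ~ A^{-1} into
# preconditioners for C = A^T A and for the augmented B = [0 A^T; A 0].
import numpy as np
from scipy.sparse.linalg import LinearOperator

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + 3 * np.eye(n)             # made-up square matrix
M = np.linalg.inv(A + 0.001 * rng.standard_normal((n, n)))  # crude M ~ A^{-1}

# M M^T ~ C^{-1} = (A^T A)^{-1}
prec_C = LinearOperator((n, n), matvec=lambda x: M @ (M.T @ x))

# [0 M; M^T 0] ~ B^{-1}, acting on vectors [v; u]
def prec_B_matvec(x):
    v, u = x[:n], x[n:]
    return np.concatenate([M @ u, M.T @ v])

prec_B = LinearOperator((2 * n, 2 * n), matvec=prec_B_matvec)

# sanity check: applying prec_C to C x should roughly recover x
C = A.T @ A
x = rng.standard_normal(n)
resC = np.linalg.norm(prec_C.matvec(C @ x) - x) / np.linalg.norm(x)
print(resC)   # small relative error, since M is a good approximate inverse
```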

[ 11 ]

slide-12
SLIDE 12

Methods to use. Why not LBD?

[Figure: pde2961, smallest singular values without restarting. x-axis: matrix-vector products (500 to 3500); y-axis: ‖Aᵀu − σv‖ on a log scale (1e−10 to 1e2). Methods: GD on C, GD on B, JDSVD, LANSVD; reference line at ‖A‖·1e−10.]

[ 12 ]

slide-13
SLIDE 13

Restarting changes the game

[Figure: pde2961, smallest singular values with restarting. x-axis: matrix-vector products (1000 to 7000); y-axis: ‖Aᵀu − σv‖ on a log scale (1e−10 to 1e2). Methods: GD+1 on C, GD+1 on B, JDSVD+1-Refined, IRRHLB; reference line at ‖A‖·1e−10.]

[ 13 ]

slide-14
SLIDE 14

primme_svds: a two-stage strategy

  • GD+k on C (extremal eigenvalues, near-optimal restarting): best for a few smallest SVs
  • or JDQMR on C if A is very sparse
  • Limited by accuracy

Our solution: a hybrid, two-stage singular value method

  • Stage I: work on C to residual tolerance max(σi δuser ‖A‖, ‖A‖² εmach)
  • Stage II: work on B to improve the approximations from C to δuser ‖A‖

[ 14 ]

slide-15
SLIDE 15

primme_svds: an example

[Figure: convergence for A = diag([1:10 1000:100:1e6]) with preconditioner Prec = A + rand(1,10000)*1e4. x-axis: matrix-vector products (500 to 3000); y-axis: ‖Aᵀu − σv‖ on a log scale. The method on AᵀA converges fast but stalls at the eps·‖A‖² limit; it then switches to the augmented matrix [0 Aᵀ; A 0] to reach full accuracy.]

[ 15 ]

slide-16
SLIDE 16

Stage I of primme_svds (C)

  • Flexible stopping criterion

Let (σ̃, ũ, ṽ) be a targeted singular triplet of A, and define the residuals
rv = A ṽ − σ̃ ũ,   ru = Aᵀ ũ − σ̃ ṽ,   rC = C ṽ − σ̃² ṽ,   rB = B [ṽ; ũ]/√2 − σ̃ [ṽ; ũ]/√2.
If ‖ṽ‖ = 1 and ũ = A ṽ/σ̃ (so that ‖ũ‖ = 1), then rv = 0 and
‖ru‖ = ‖rC‖/σ̃ = √2 ‖rB‖.
Thus, the stopping criterion for the methods on C becomes δC = max(δuser σ̃/‖A‖, εmach).

  • Explicit Rayleigh-Ritz of converged eigenpairs
    improves the directions of individual eigenvectors
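The residual identities underlying this stopping criterion can be verified numerically. A small self-contained check of my own, with an arbitrary made-up test vector:

```python
# Numerical check of the residual identities: with u = Av/sigma,
# ||r_u|| = ||r_C|| / sigma = sqrt(2) ||r_B||.
import numpy as np

rng = np.random.default_rng(4)
m, n = 9, 6
A = rng.standard_normal((m, n))

# an approximate right singular vector (any unit vector works)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
sigma = np.linalg.norm(A @ v)        # so that u below has unit norm
u = A @ v / sigma                    # unit left vector, hence r_v = 0

r_u = A.T @ u - sigma * v
C = A.T @ A
r_C = C @ v - sigma**2 * v
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
x = np.concatenate([v, u]) / np.sqrt(2.0)   # normalized augmented vector
r_B = B @ x - sigma * x

print(np.linalg.norm(r_u),
      np.linalg.norm(r_C) / sigma,
      np.sqrt(2.0) * np.linalg.norm(r_B))   # all three agree
```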

[ 16 ]

slide-17
SLIDE 17

Stage II of primme_svds (B)

  • Excellent initial guesses (shifts, eigenvectors) from C ⇒ use JDQMR
  • Subspace acceleration and optimal stopping in JDQMR ⇒ better than iterative refinement
  • Irregular convergence of Rayleigh-Ritz (RR) on B ⇒ enhance PRIMME with a refined projection (min ‖B V y − σ̃ V y‖)
    – only one QR column update per step; full QR factorization only at restart

[ 17 ]

slide-18
SLIDE 18

Implementation of primme_svds

  • Developed PRIMME MEX, a MATLAB interface for PRIMME
  • UI extends MATLAB’s eigs() and svds() to PRIMME’s full functionality
  • External stopping criterion, refined projection implemented in PRIMME

[ 18 ]

slide-19
SLIDE 19

Evaluation: test matrices

Square matrices:

  Matrix     dw2048   fidap4   jagmesh8   lshp3025
  order      2048     1601     1141       3025
  κ(A)       5.3e3    5.2e3    5.9e4      2.2e5
  ‖A‖2       1.0e0    1.6e0    6.8e0      7.0e0

Rectangular matrices:

  Matrix     well1850   plddb    lp_bnl2
  rows m     1850       3049     2324
  cols n     712        5069     4486
  κ(A)       1.1e2      1.2e4    7.8e3
  ‖A‖2       1.8e0      1.4e2    2.1e2

We compare against:

  • JDSVD (Hochstenbach, 2001)
  • IRRHLB (Jia, 2010)
  • SVDIFP (Ye, 2014)
  • IRLBA (Baglama, 2005)
  • MATLAB svds() (ARPACK, 1998)

[ 19 ]

slide-20
SLIDE 20

No preconditioning — first stage only — 1 or 10 smallest triplets

[Figure: number of matrix-vector products, tol = 1e−8, for lshp3025, jagmesh8, dw2048, lp_bnl2, plddb, well1850. Left panel: 1 smallest SV; right panel: 10 smallest. Methods: primme_svds, IRRHLB, JDSVD, IRLBA, SVDIFP.]

primme_svds: fastest and most robust

[ 20 ]

slide-21
SLIDE 21

No preconditioning — 1e-14 accuracy — 1 or 10 smallest triplets

[Figure: number of matrix-vector products, tol = 1e−14, for lshp3025, jagmesh8, dw2048, lp_bnl2, plddb, well1850. Left panel: 1 smallest SV; right panel: 10 smallest. Methods: primme_svds(gd+k), primme_svds(jdqmr), IRRHLB, JDSVD, IRLBA, SVDIFP.]

primme_svds: fastest and most robust at high accuracy

[ 21 ]

slide-22
SLIDE 22

With preconditioning — 1e-8/1e-14 — 10 smallest triplets

[Figure: number of matrix-vector products with P = ilutp(1e−3) or RIF(1e−3) for fidap4, jagmesh8, lshp3025, deter4, plddb, lp_bnl2. Left panel: tol = 1e−8 (primme_svds, JDSVD, SVDIFP); right panel: tol = 1e−14 (primme_svds(gd+k), primme_svds(jdqmr), JDSVD, SVDIFP).]

primme_svds: significantly more robust and efficient.
Note JDSVD's problem with rectangular matrices.

[ 22 ]

slide-23
SLIDE 23

With shift-and-invert operator

[Figure: number of matrix-vector products with shift-and-invert for the 10 smallest SVs on fidap4, jagmesh8, lshp3025, deter4, plddb, lp_bnl2. Methods: primme_svds(C⁻¹), MATLAB svds (B⁻¹), svdifp(C⁻¹).]

primme_svds on C is faster than MATLAB svds

[ 23 ]

slide-24
SLIDE 24

Smallest triplet of very large matrices at low tolerance

  Matrix     rows m    cols n    σ1        κ(A)      Preconditioner
  debr       1048576   1048576   1.11e-20  3.6e+20   No
  cage14     1505785   1505785   9.52e-02  1.01e+1   ILU(0)
  thermal2   1228045   1228045   1.61e-06  7.48e+6   ILU(1e-3)
  sls        1748122   62729     9.99e-1   1.30e+3   RIF(1e-3)
  Rucci1     1977885   109900    1.04e-03  6.74e+3   RIF(1e-3)

Results for δ = 1e-12 (MV = matrix-vector products, Sec = seconds):

  Matrix     Precond. time   primme_svds MV/Sec   SVDIFP MV/Sec   JDSVD MV/Sec
  debr       —               539 / 84.3           * / *           1971 / 474.6
  cage14     2e0             19 / 11              33 / 28         111 / 185
  thermal2   3.69e+3         419 / 506            – / –           309 / 535
  sls        3.29e+3         1779 / 170           * / *           – / –
  Rucci1     6.92e+4         4728 / 1087          4649 / 6464     – / –

primme_svds: far more robust and more efficient

[ 24 ]

slide-25
SLIDE 25

Conclusions

  • primme_svds: computes a few singular triplets based on PRIMME
  • Key idea: a two-stage strategy
    – take advantage of faster convergence on the normal equations matrix
    – resolve the remaining accuracy by exploiting the power of PRIMME on the augmented matrix
    – many existing or new methods could be applied at each stage
  • Robust and efficient both with and without preconditioning
  • Highly optimized production software!

[ 25 ]

slide-26
SLIDE 26

PRIMME

PRIMME: PReconditioned Iterative MultiMethod Eigensolver

  • primme_svds and PRIMME's MATLAB interface will be available soon
  • A C implementation of primme_svds will be released with the next version of PRIMME

Download: www.cs.wm.edu/~andreas/software

[ 26 ]