A Robust and Efficient Parallel SVD Solver Based on Restarted - - PowerPoint PPT Presentation

a robust and efficient parallel svd solver based on
SMART_READER_LITE
LIVE PREVIEW

A Robust and Efficient Parallel SVD Solver Based on Restarted - - PowerPoint PPT Presentation

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation A Robust and Efficient Parallel SVD Solver Based on Restarted Lanczos Bidiagonalization Jose E. Roman Universidad Polit ecnica de Valencia, Spain (joint work with V.


slide-1
SLIDE 1

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

A Robust and Efficient Parallel SVD Solver Based on Restarted Lanczos Bidiagonalization

Jose E. Roman

Universidad Polit´ ecnica de Valencia, Spain (joint work with V. Hernandez and A. Tomas)

Harrachov 2007

slide-2
SLIDE 2

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Outline

1

Overview of SLEPc Summary of Functionality The SVD in SLEPc

2

Restarted Lanczos Bidiagonalization Lanczos Bidiagonalization Dealing with Loss of Orthogonality Restarted Bidiagonalization Enhancements for Parallel Efficiency

3

Evaluation Numerical Robustness Parallel Performance

slide-3
SLIDE 3

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

What Users Need

Provided by PETSc

◮ Abstraction of mathematical objects: vectors and matrices ◮ Efficient linear solvers (direct or iterative) ◮ Easy programming interface ◮ Run-time flexibility, full control over the solution process ◮ Parallel computing, mostly transparent to the user

Provided by SLEPc

◮ State-of-the-art eigensolvers and SVD solvers ◮ Spectral transformations

slide-4
SLIDE 4

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Summary

PETSc: Portable, Extensible Toolkit for Scientific Computation Software for the scalable (parallel) solution of algebraic systems arising from partial differential equation (PDE) simulations

◮ Developed at Argonne National Lab since 1991 ◮ Usable from C, C++, Fortran77/90 ◮ Focus on abstraction, portability, interoperability ◮ Extensive documentation and examples ◮ Freely available and supported through email

Current version: 2.3.3 (released May 2007) http://www.mcs.anl.gov/petsc

slide-5
SLIDE 5

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Summary

SLEPc: Scalable Library for Eigenvalue Problem Computations A general library for solving large-scale sparse eigenproblems on parallel computers

◮ For standard and generalized eigenproblems ◮ For real and complex arithmetic ◮ For Hermitian or non-Hermitian problems

Current version: 2.3.3 (released June 2007) http://www.grycap.upv.es/slepc

slide-6
SLIDE 6

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

PETSc/SLEPc Numerical Components

PETSc

Vectors Index Sets

Indices Block Indices Stride Other

Matrices

Compressed Sparse Row Block Compressed Sparse Row Block Diagonal Dense Other

Preconditioners

Additive Schwarz Block Jacobi Jacobi ILU ICC LU Other

Krylov Subspace Methods

GMRES CG CGS Bi-CGStab TFQMR Richardson Chebychev Other

Nonlinear Systems

Line Search Trust Region Other

Time Steppers

Euler Backward Euler Pseudo Time Step Other

SLEPc

SVD Solvers

Cross Product Cyclic Matrix Lanczos Thick Res. Lanczos

Eigensolvers

Krylov-Schur Arnoldi Lanczos Other

Spectral Transform

Shift Shift-and-invert Cayley Fold

slide-7
SLIDE 7

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Singular Value Decomposition (SVD)

A = UΣV ∗ where

◮ A is an m × n rectangular matrix ◮ U = [u1, u2, . . . , um] is a m × m unitary matrix ◮ V = [v1, v2, . . . , vn] is a n × n unitary matrix ◮ Σ is a m × n diagonal matrix with entries Σii = σi ◮ σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0 ◮ If A is real, U and V are real and orthogonal ◮ Each (σi, ui, vi) is called a singular triplet

slide-8
SLIDE 8

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thin SVD

A = Un Σn V ∗

n

In SLEPc we compute a partial SVD, that is, only a subset of the singular triplets

slide-9
SLIDE 9

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

SVD Solvers Based on EPS

Cross product matrix

A∗Ax = λx AA∗y = λy The eigenvalues are λi = σ2

i and the eigenvectors xi = vi or

yi = ui

Cyclic matrix

H(A)x = λx H(A) = A A∗

  • The eigenvalues are ±σi and the eigenvectors

1 √ 2

±ui vi

slide-10
SLIDE 10

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Basic Usage

Usual steps for solving an SVD problem with SLEPc:

  • 1. Create an SVD object
  • 2. Define the SVD problem
  • 3. (Optionally) Specify options for the solution
  • 4. Run the SVD solver
  • 5. Retrieve the computed solution
  • 6. Destroy the SVD object

All these operations are done via a generic interface, common to all the SVD solvers

slide-11
SLIDE 11

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Simple Example

SVD svd; /* singular value solver context */ Mat A; /* matrix */ Vec u, v; /* singular vectors */ PetscReal sigma; /* singular value */ SVDCreate(PETSC_COMM_WORLD, &svd); SVDSetOperator(svd, A); SVDSetFromOptions(svd); SVDSolve(svd); SVDGetConverged(svd, &nconv); for (i=0; i<nconv; i++) { SVDGetSingularTriplet(svd, i, &sigma, u, v); } SVDDestroy(svd);

slide-12
SLIDE 12

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Run-Time Examples

% program -svd_type lanczos -svd_tol 1e-12 -svd_max_it 200 % program -svd_type trlanczos -svd_nsv 4 % program -svd_type cross -svd_eps_type krylovschur

  • svd_ncv 30 -svd_smallest
  • svd_monitor_draw

% program -svd_type cyclic -svd_eps_type arpack

  • svd_st_type sinvert -svd_st_shift 1

% mpirun -np 16 program ...

slide-13
SLIDE 13

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Bidiagonalization

Compute the SVD in two stages [Golub and Kahan, 1965]:

  • 1. A = PBQ∗

A = P B Q∗

  • 2. B = XΣY ∗, with U = PX and V = QY

Lanczos bidiagonalization computes part of the info: Pk, Bk, Qk → Ritz approximations: ˜ σi, ˜ ui = Pkxi, ˜ vi = Qkyi

slide-14
SLIDE 14

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Lanczos Bidiagonalization

Equating the first k columns AQk = PkBk A∗Pk = QkB∗

k + βkqk+1e∗ k

Bk =          α1 β1 α2 β2 α3 β3 ... ... αk−1 βk−1 αk          αj =p∗

jAqj

βj =p∗

jAqj+1

Double recursion: αjpj = Aqj − βj−1pj−1, βjqj+1 = A∗pj − αjqj

slide-15
SLIDE 15

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Golub-Kahan-Lanczos Bidiagonalization

Golub-Kahan-Lanczos algorithm

Choose a unit-norm vector q1 Set β0 = 0 For j = 1, 2, . . . , k pj = Aqj − βj−1pj−1 αj = pj2 pj = pj/αj qj+1 = A∗pj − αjqj βj = qj+12 qj+1 = qj+1/βj end Loss of orthogonality is an issue

slide-16
SLIDE 16

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Algorithm with Orthogonalization

Lanczos bidiagonalization with orthogonalization

Choose a unit-norm vector q1 For j = 1, 2, . . . , k pj = Aqj Orthogonalize pj with respect to Pj−1 αj = pj2 pj = pj/αj qj+1 = A∗pj Orthogonalize qj+1 with respect to Qj βj = qj+12 qj+1 = qj+1/βj end

slide-17
SLIDE 17

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

One-Sided Orthogonalization

Orthogonalizing right vectors is enough [Simon and Zha, 2000]

One-Sided Lanczos bidiagonalization

Choose a unit-norm vector q1 Set β0 = 0 For j = 1, 2, . . . , k pj = Aqj − βj−1pj−1 αj = pj2 pj = pj/αj qj+1 = A∗pj Orthogonalize qj+1 with respect to Qj βj = qj+12 qj+1 = qj+1/βj end

slide-18
SLIDE 18

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Restarted Bidiagonalization

Required k can be arbitrarily large (slow convergence, many singular triplets) Problems: storage and computational effort Solution: restart the computation when a certain k is reached Explicit restart: re-run with a “better” q1 (e.g. use Ritz vector associated to the first value) Thick restart: a better alternative that avoids to explicitly compute a new initial vector Idea: keep ℓ-dimensional subspace with relevant spectral information [Baglama and Reichel, 2005]

slide-19
SLIDE 19

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thick-Restart Lanczos Bidiagonalization

Compact Lanczos Bidiagonalization of order ℓ + 1: A ˜ Qℓ+1 = ˜ Pℓ+1 ˜ Bℓ+1 A∗ ˜ Pℓ+1 = ˜ Qℓ+1 ˜ B∗

ℓ+1 + ˜

βℓ+1˜ qℓ+2e∗

k+1

˜ Qℓ+1 = [˜ v1, ˜ v2, . . . , ˜ vℓ, qk+1] ˜ vi = Qkyi right Ritz vectors Residual of full decomposition ˜ Pℓ+1 = [˜ u1, ˜ u2, . . . , ˜ uℓ, ˜ pℓ+1] ˜ ui = Pkxi left Ritz vectors New left initial vector ˜ pℓ+1 = f/f, f = Aqk+1 − ℓ

i=1 ˜

ρi˜ ui, ˜ αℓ+1 = f ˜ qℓ+2 = g/g, g = A∗˜ pℓ+1 − ˜ αℓ+1qk+1, ˜ βℓ+1 = g

slide-20
SLIDE 20

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thick-Restart Lanczos Bidiagonalization

Compact Lanczos Bidiagonalization of order ℓ + 1: A ˜ Qℓ+1 = ˜ Pℓ+1 ˜ Bℓ+1 A∗ ˜ Pℓ+1 = ˜ Qℓ+1 ˜ B∗

ℓ+1 + ˜

βℓ+1˜ qℓ+2e∗

k+1

˜ Bℓ+1 =        ˜ σ1 ˜ ρ1 ˜ σ2 ˜ ρ2 ... . . . ˜ σℓ ˜ ρℓ ˜ αℓ+1        ˜ σi are Ritz values ˜ pℓ+1 = f/f, f = Aqk+1 − ℓ

i=1 ˜

ρi˜ ui, ˜ αℓ+1 = f

slide-21
SLIDE 21

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thick-Restart Lanczos Bidiagonalization

The decomposition can be extended to order k and the Lanczos relations are maintained ˜ Bk =               ˜ σ1 ˜ ρ1 ˜ σ2 ˜ ρ2 ... . . . ˜ σℓ ˜ ρℓ ˜ αℓ+1 βℓ+1 ... ... αk−1 βk−1 αk              

slide-22
SLIDE 22

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thick-Restart Lanczos Bidiagonalization

Thick-restarted Lanczos bidiagonalization

Input: Matrix A, initial unit-norm vector q1, and number of steps k Output: ℓ ≤ k Ritz triplets

  • 1. Build an initial Lanczos bidiagonalization of order k
  • 2. Compute Ritz approximations of the singular triplets
  • 3. Truncate to a Lanczos bidiagonalization of order ℓ
  • 4. Extend to a Lanczos bidiagonalization of order k
  • 5. If not satisfied, go to step 2
slide-23
SLIDE 23

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thick-Restart Lanczos Bidiagonalization

One-Sided Lanczos bidiagonalization – restarted

pℓ+1 = Aqℓ+1 − B1:ℓ,ℓ+1Pℓ αℓ+1 = pℓ+12, pℓ+1 = pℓ+1/αℓ+1 For j = ℓ + 1, ℓ + 2, . . . , k qj+1 = A∗pj Orthogonalize qj+1 with respect to Qj βj = qj+12 qj+1 = qj+1/βj If j < k pj+1 = Aqj+1 − βjpj αj+1 = pj+12 pj+1 = pj+1/αj+1 end end

slide-24
SLIDE 24

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Enhancements for Parallel Efficiency

In general, eigensolvers require high-quality orthogonality for numerical robustness

◮ Classical Gram-Schmidt with selective refinement (DGKS

criterion) In parallel computations, the number of synchronizations should be reduced to a minimum [Hernandez et al., 2007]

◮ Estimation of the norm ◮ Delayed normalization

slide-25
SLIDE 25

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Enhancements for Parallel Efficiency

For j = ℓ + 1, ℓ + 2, . . . , k qj+1 = A∗pj c = Q∗

jqj+1

ρ = qj+12 αj = pj2 pj = pj/αj qj+1 = qj+1/αj c = c/αj ρ = ρ/αj qj+1 = qj+1 − Qjc βj =

  • ρ2 − j

i=1 c2 j

If βj < ηρ c = Q∗

jqj+1

ρ = qj+12 qj+1 = qj+1 − Qjc βj =

  • ρ2 − j

i=1 c2 j

end qj+1 = qj+1/βj If j < k pj+1 = Aqj+1 − βjpj end end DGKS criterion Estimation

  • f the norm

Delayed normalization

slide-26
SLIDE 26

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Numerical Robustness Evaluation

Compute the 10 largest singular values of 232 matrices available at MatrixMarket site Solver settings:

◮ restarting with a maximum of 30 basis vectors ◮ stopping criteria with a tolerance of 10−7

slide-27
SLIDE 27

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Maximum Relative Residual

ξ =

  • Av − σu2

2 + AT u − σv2 2

σ

  • u2

2 + v2 2

10-15 10-10 10-5 100 105 Cross product (Krylov-Schur) 10-15 10-10 10-5 100 105 Thick restarted Lanczos 10-15 10-10 10-5 100 105 Thick restarted Lanczos one-side

slide-28
SLIDE 28

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Parallel Performance

Compute the 10 largest singular values of two matrices, restarting with a maximum of 30 basis vectors

Speed-up

Calculated as the ratio of elapsed time with p processors to elapsed time of the fastest algorithm with one processor Sp = Tp T1

Computer used

◮ Cluster of 55 Pentium Xeon biprocessors with SCI interconnect

Only one processor per node was used

slide-29
SLIDE 29

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Performance in Xeon cluster (1)

AF23560 matrix

Order 23,560 Non-zeros 484,256 Sparsity 0.0087 % Largest matrix from Matrix Market NEP collection

10 20 30 40 50 10 20 30 40 50 Speed-up Number of nodes Cross product (KS) TR Lanczos one-side TR Lanczos Ideal

slide-30
SLIDE 30

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Performance in Xeon cluster (2)

PRE2 matrix

Order 659,033 Non-zeros 5,834,044 Sparsity 0.0013 % Non-symmetric matrix from University of Florida sparse matrix collection

10 20 30 40 50 10 20 30 40 50 Speed-up Number of nodes Cross product (KS) TR Lanczos one-side TR Lanczos Ideal

slide-31
SLIDE 31

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Conclusion

Two specific SVD solvers in SLEPc

◮ lanczos: explicit-restart Lanczos bidiagonalization ◮ trlanczos: thick-restart Lanczos bidiagonalization

One-sided orthogonalization available in both cases Performance:

◮ As efficient as Krylov-Schur on cross-product matrix A∗A ◮ Slightly more robust numerically ◮ Presumably more accurate in small singular values

However, small singular values are difficult to converge: need to implement harmonic or refined projection

slide-32
SLIDE 32

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

References

◮ G. H. Golub and W. Kahan (1965).

Calculating the Singular Values and Pseudo-Inverse of a Matrix.

  • J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2:205–224.

◮ J. Baglama and L. Reichel (2005).

Augmented Implicitly Restarted Lanczos Bidiagonalization Methods. SIAM J. Sci. Comput., 27(1):19–42.

◮ H. D. Simon and H. Zha (2000).

Low-Rank Matrix Approximation Using the Lanczos Bidiagonalization Process with Applications. SIAM J. Sci. Comput., 21(6):2257–2274.

◮ V. Hernandez, J. E. Roman, and A. Tomas (2007).

Parallel Arnoldi Eigensolvers with Enhanced Scalability via Global ... Parallel Comput., 33(7–8):521–540.

slide-33
SLIDE 33

Overview of SLEPc Restarted Lanczos Bidiagonalization Evaluation

Thanks! http://www.grycap.upv.es/slepc slepc-maint@grycap.upv.es