SLIDE 1

LSRN: A Parallel Iterative Solver for Strongly Over- or Under-Determined Systems

Xiangrui Meng

Joint with Michael A. Saunders and Michael W. Mahoney Stanford University

June 19, 2012

Meng, Saunders, Mahoney (Stanford) LSRN: A Parallel Iterative Solver June 19, 2012 1 / 27

slide-2
SLIDE 2

Strongly over- or under-determined least squares

We are interested in computing the unique min-length solution, denoted by x∗, to

    minimize_{x∈R^n} ‖Ax − b‖₂,

where A ∈ R^{m×n} with m ≫ n or m ≪ n, and b ∈ R^m. A could be rank-deficient. When the system is under-determined, we may want to solve it with Tikhonov regularization:

    minimize_{x∈R^n} (1/2)‖Ax − b‖₂² + (λ/2)‖x‖₂²,

where λ > 0 is a regularization parameter.
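As a concrete anchor for the notation, here is a tiny NumPy sketch (ours, not part of the slides) of the min-length solution for a rank-deficient A; both `np.linalg.pinv` and `np.linalg.lstsq` return it:

```python
import numpy as np

# Build a strongly over-determined, rank-deficient A (rank r < n).
rng = np.random.default_rng(0)
m, n, r = 100, 10, 6
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

# x* = A^+ b is the unique min-length least-squares solution.
x_star = np.linalg.pinv(A) @ b

# lstsq with rcond=None also returns the min-length solution; they agree.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)
```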

SLIDE 3

Strongly rectangular data

                 m                                      n
SNP              number of SNPs (10^6)                  number of subjects (10^3)
TinyImages       number of pixels in each image (10^3)  number of images (10^8)
PDE              number of degrees of freedom           number of time steps
sensor network   size of sensing data                   number of sensors
NLP              number of words and n-grams            number of principal components

SLIDE 4

Traditional algorithms: singular value decomposition

It is well known that x∗ = VΣ⁻¹Uᵀb, where UΣVᵀ is A's economy-sized SVD. The time complexity is O(mn² + n³). Pros:

◮ High precision and robust to rank deficiency.
◮ Implemented in LAPACK.

Cons:

◮ Hard to take advantage of sparsity.
◮ Hard to implement in a parallel environment.
◮ Incapable of implicit inputs.
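The SVD solve above as a minimal NumPy sketch (ours, not from the slides); the singular values are pseudoinverted with a cutoff so the solve stays robust to rank deficiency:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

# Economy-sized SVD: A = U @ diag(s) @ Vt, with U 200x5, s (5,), Vt 5x5.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Pseudoinvert the singular values with a relative cutoff.
s_pinv = np.where(s > 1e-12 * s[0], 1.0 / s, 0.0)
x_star = Vt.T @ (s_pinv * (U.T @ b))   # x* = V S^+ U^T b

assert np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0])
```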

SLIDE 5

Traditional algorithms: iterative methods

Iterative methods are widely used to solve large-scale sparse linear systems. Practical methods include, for example, CGLS and LSQR.

Pros:

◮ Low cost per iteration.
◮ Takes A as a linear operator.
◮ Easy to implement in a parallel environment.
◮ Capable of computing approximate solutions.

Cons:

◮ Hard to predict the number of iterations needed.

SLIDE 6

Traditional algorithms: iterative methods

The convergence rate of an iterative method is affected by the condition number of A, denoted by κ(A) = σ_max(A)/σ⁺_min(A), where σ⁺_min(A) is the smallest nonzero singular value. For a family of iterative methods, we have

    ‖x^(k) − x∗‖_{AᵀA} / ‖x^(0) − x∗‖_{AᵀA} ≤ 2 ((κ(A) − 1)/(κ(A) + 1))^k.

However, estimating κ(A) is generally as hard as solving the least squares problem itself. For ill-conditioned problems (e.g., κ(A) ≈ 10^6), convergence would be very slow.
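The dependence on κ(A) is easy to see numerically. A small experiment (ours, not from the slides) with SciPy's LSQR: run a fixed 50 iterations on two problems of identical shape but very different condition number, and compare the normal-equation residual ‖Aᵀr‖ that LSQR reports (the helper name is ours):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2)
m, n = 500, 100

def problem_with_cond(cond):
    # Assemble A = U diag(s) V^T with a prescribed condition number.
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(cond), n)
    return (U * s) @ V.T

b = rng.standard_normal(m)
# Index 7 of lsqr's return tuple is arnorm = ||A^T r|| at termination.
arnorm_good = lsqr(problem_with_cond(5.0), b, atol=0, btol=0, iter_lim=50)[7]
arnorm_bad = lsqr(problem_with_cond(1e6), b, atol=0, btol=0, iter_lim=50)[7]
assert arnorm_good < arnorm_bad   # ill-conditioning slows convergence
```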

SLIDE 7

Before LSRN: random sampling (Drineas, Mahoney, and Muthukrishnan 2006)

Randomly sample rows of A based on its leverage scores:

    u_i = ‖U_{i∗}‖₂², i = 1, …, m,

where U is an orthonormal basis matrix of range(A). If the sample size s > O(n² log(1/δ)/ε⁴), then with probability at least 1 − δ the subsampled solution x̂ gives a (1 + ε)-approximation:

    ‖Ax̂ − b‖₂ ≤ (1 + ε)‖Ax∗ − b‖₂.

How do we compute or estimate the leverage scores?
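A small illustration (ours, not from the slides) of leverage-score row sampling; the leverage scores are the squared row norms of an orthonormal basis U of range(A), and the sample size s below is just a demo choice:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2000, 10
A = rng.standard_normal((m, n))
A[0] *= 100.0                        # one highly influential row
b = rng.standard_normal(m)

U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U**2, axis=1)           # u_i = ||U_{i*}||_2^2; sums to n
p = lev / lev.sum()                  # sampling probabilities

s = 400
idx = rng.choice(m, size=s, replace=True, p=p)
w = 1.0 / np.sqrt(s * p[idx])        # rescaling keeps the sketch unbiased
x_hat = np.linalg.lstsq(A[idx] * w[:, None], b[idx] * w, rcond=None)[0]
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

res_hat = np.linalg.norm(A @ x_hat - b)
res_star = np.linalg.norm(A @ x_star - b)
assert res_hat <= 1.1 * res_star     # near-optimal residual from the sample
```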

SLIDE 8

Before LSRN: row mixing (Drineas, Mahoney, Muthukrishnan, and Sarlós 2007)

Mix the rows of A via a randomized Hadamard transform to make the leverage scores distributed uniformly: Ã = HDA, where D is a diagonal matrix with diagonal elements chosen randomly from {+1, −1}, and H is the Hadamard matrix. Sample s = O(d log(nd)/ε) rows of Ã uniformly and solve the subsampled problem: x̂ = (SHDA)†(SHD)b, where S is the sampling matrix. With probability, say, at least 0.8, x̂ gives a (1 + ε)-approximation. Running time: O(mn log m + n³ log(mn)/ε).
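A sketch (ours, not from the slides) of why the mixing step helps: start from a matrix with one extreme leverage score, apply HD, and the largest leverage score drops close to the uniform value n/m, so uniform sampling becomes safe:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(4)
m, n = 1024, 8                      # m must be a power of 2 for hadamard()
A = np.zeros((m, n))
A[:n] = np.eye(n)                   # extremely nonuniform leverage scores

D = rng.choice([-1.0, 1.0], size=m)
HDA = hadamard(m) @ (D[:, None] * A) / np.sqrt(m)   # normalized HD mixing

def max_leverage(M):
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.sum(U**2, axis=1).max()

# Before mixing, one row carries a full leverage score of 1; after mixing,
# every leverage score is near the uniform value n/m = 8/1024.
assert max_leverage(A) > 0.999
assert max_leverage(HDA) < 0.1
```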

SLIDE 9

Before LSRN: iteratively solving (Rokhlin and Tygert 2008)

Combine random Givens rotations and the randomized Fourier transform for row mixing. Instead of solving the subsampled problem, use the QR factorization of the subsampled matrix to create a preconditioner. Solve the preconditioned problem min_y ‖AR⁻¹y − b‖₂ iteratively. In theory, choosing s ≥ 4n² guarantees that the condition number is at most 3 with high probability. In practice, choosing s = 4n produced a condition number less than 3 in all their test problems. Running time: O(mn log n + mn log(1/ε) + n³).

SLIDE 10

Before LSRN: Blendenpik (Avron, Maymounkov, and Toledo 2010)

High-performance black-box solver. Extensive experiments on selecting randomized projections. Outperforms LAPACK on strongly over-determined problems.

SLIDE 11

Before LSRN: what is missing?

◮ Rank deficiency.
◮ Sparse A or implicit A.
◮ Under-determined problems (with Tikhonov regularization).
◮ Parallel implementation.

SLIDE 12

Algorithm LSRN (for strongly over-determined systems)

1: Choose an oversampling factor γ > 1, e.g., γ = 2. Set s = ⌈γn⌉.
2: Generate G = randn(s, m), a Gaussian matrix.
3: Compute Ã = GA.
4: Compute Ã's economy-sized SVD: Ũ Σ̃ Ṽᵀ.
5: Let N = Ṽ Σ̃⁻¹.
6: Iteratively compute the min-length solution ŷ to minimize_{y∈R^r} ‖ANy − b‖₂.
7: Return x̂ = Nŷ.
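The seven steps above can be sketched serially in NumPy/SciPy (ours, for illustration; the real LSRN is parallel, and the variable names are ours):

```python
import numpy as np
from scipy.sparse.linalg import lsqr, LinearOperator

def lsrn_overdetermined(A, b, gamma=2.0, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = int(np.ceil(gamma * n))                # step 1
    G = rng.standard_normal((s, m))            # step 2
    At = G @ A                                 # step 3: A~ = GA
    _, sig, Vt = np.linalg.svd(At, full_matrices=False)   # step 4
    r = int(np.sum(sig > 1e-12 * sig[0]))      # numerical rank of A~
    N = Vt[:r].T / sig[:r]                     # step 5: N = V~ Sigma~^{-1}
    AN = LinearOperator((m, r), matvec=lambda y: A @ (N @ y),
                        rmatvec=lambda z: N.T @ (A.T @ z))
    y = lsqr(AN, b, atol=1e-12, btol=1e-12)[0]  # step 6: well-conditioned LSQR
    return N @ y                               # step 7: x^ = N y^

rng = np.random.default_rng(6)
A = rng.standard_normal((3000, 40)) * np.logspace(0, -3, 40)  # ill-conditioned
b = rng.standard_normal(3000)
x = lsrn_overdetermined(A, b, seed=6)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.linalg.norm(x - x_ref) <= 1e-5 * np.linalg.norm(x_ref)
```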

SLIDE 13

Algorithm LSRN (for strongly under-determined systems)

1: Choose an oversampling factor γ > 1, e.g., γ = 2. Set s = ⌈γm⌉.
2: Generate G = randn(n, s), a Gaussian matrix.
3: Compute Ã = AG.
4: Compute Ã's economy-sized SVD: Ũ Σ̃ Ṽᵀ.
5: Let M = Ũ Σ̃⁻¹.
6: Iteratively compute the min-length solution x̂ to minimize_{x∈R^n} ‖MᵀAx − Mᵀb‖₂.
7: Return x̂.
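A companion sketch (ours) for the under-determined case: precondition on the left with M = Ũ Σ̃⁻¹ and let LSQR, started from zero, return the min-length solution of the preconditioned problem:

```python
import numpy as np
from scipy.sparse.linalg import lsqr, LinearOperator

def lsrn_underdetermined(A, b, gamma=2.0, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = int(np.ceil(gamma * m))
    G = rng.standard_normal((n, s))
    At = A @ G                                 # A~ = AG
    U, sig, _ = np.linalg.svd(At, full_matrices=False)
    r = int(np.sum(sig > 1e-12 * sig[0]))
    M = U[:, :r] / sig[:r]                     # M = U~ Sigma~^{-1}
    MA = LinearOperator((r, n), matvec=lambda x: M.T @ (A @ x),
                        rmatvec=lambda z: A.T @ (M @ z))
    # LSQR from x0 = 0 converges to the min-length solution.
    return lsqr(MA, M.T @ b, atol=1e-12, btol=1e-12)[0]

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 2000))
b = rng.standard_normal(30)
x = lsrn_underdetermined(A, b, seed=7)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]   # min-length reference
assert np.linalg.norm(x - x_ref) <= 1e-8 * np.linalg.norm(x_ref)
```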

SLIDE 14

Why we choose Gaussian random projection

Gaussian random projection:
◮ has the best theoretical result on conditioning,
◮ can be generated super fast,
◮ uses level 3 BLAS on dense matrices,
◮ speeds up automatically on sparse matrices and fast operators,
◮ still works (with an extra “allreduce” operation) when A is partitioned along its bigger dimension.

SLIDE 15

Preconditioning for linear least squares

Given a matrix N ∈ R^{n×p}, consider the least squares problem

    minimize_{y∈R^p} ‖ANy − b‖₂.

Denote its min-length solution by y∗. We proved that x∗ = Ny∗ if range(N) = range(Aᵀ). Similarly, we proved that x∗ is the min-length solution to

    minimize_{x∈R^n} ‖MᵀAx − Mᵀb‖₂,

if range(M) = range(A).

SLIDE 16

Theoretical properties of LSRN

In exact arithmetic, x̂ = x∗ almost surely. The distribution of the spectrum of AN is the same as that of the pseudoinverse of a Gaussian matrix of size s × r, so κ(AN) is independent of all the entries of A and hence of κ(A). For any α ∈ (0, 1 − √(r/s)), we have

    P[ κ(AN) ≤ (1 + α + √(r/s)) / (1 − α − √(r/s)) ] ≥ 1 − 2e^(−α²s/2),

where r is the rank of A. This means that if we choose s = 2n ≥ 2r for a large-scale problem, we have κ(AN) < 6 with high probability, and hence we need only around 100 iterations to reach machine precision.
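A quick empirical check (ours, not from the slides) of this claim: with s = 2n, κ(AN) stays small no matter how ill-conditioned A is:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 2000, 50
s = 2 * n
A = rng.standard_normal((m, n)) * np.logspace(0, -8, n)  # kappa(A) huge

G = rng.standard_normal((s, m))
_, sig, Vt = np.linalg.svd(G @ A, full_matrices=False)
N = Vt.T / sig                       # N = V~ Sigma~^{-1}
kappa = np.linalg.cond(A @ N)

# Expected around (1 + sqrt(r/s)) / (1 - sqrt(r/s)) ~ 5.8 for r = n, s = 2n,
# matching the slide's "kappa(AN) < 6 with high probability".
assert np.linalg.cond(A) > 1e6
assert kappa < 10
```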

SLIDE 17

Tikhonov regularization

If we want to solve an under-determined system with Tikhonov regularization,

    minimize_{x∈R^n} (1/2)‖Ax − b‖₂² + (λ/2)‖x‖₂²,

we can first rewrite it as

    minimize (1/2)‖(z; r)‖₂²   subject to   (A/√λ, I)(z; r) = b,

which asks for the min-length solution of an under-determined system. We have x∗ = z∗/√λ, where (z∗, r∗) solves the above problem.

SLIDE 18

Implementation

Shared memory (C++ with MATLAB interface):

◮ Multi-threaded ziggurat random number generator (Marsaglia and Tsang 2000), generating 10^9 numbers in less than 2 seconds using 12 CPU cores.
◮ A naïve implementation of multi-threaded dense-sparse matrix multiplications.

Message passing (Python):

◮ Single-threaded BLAS for matrix-matrix and matrix-vector products.
◮ Multi-threaded BLAS/LAPACK for SVD.
◮ Uses the Chebyshev semi-iterative method (Golub and Varga 1961) instead of LSQR.

SLIDE 19

A comparison of least squares solvers

solver               min-length solution to       taking advantage of
                     under-det?   rank-def?       sparse A   operator A
LAPACK's DGELSD      yes          yes             no         no
MATLAB's backslash   no           no              yes        no
Blendenpik           yes          no              no         no
LSRN                 yes          yes             yes        yes

Table: MATLAB's backslash uses different algorithms for different problem types. For sparse rectangular systems, it uses SuiteSparseQR, which tries to compute a sparse QR decomposition of A by analyzing A's pattern.

SLIDE 20

κ(AN) and number of iterations

Figure: Left: κ+(A) vs. κ(AN) for different choices of r and s. A ∈ R^{10^4×10^3} is randomly generated with rank r. For each (r, s) pair, we take the largest value of κ(AN) in 10 independent runs for each κ+(A) and connect them using a solid line. The estimate (1 + √(r/s))/(1 − √(r/s)) is drawn as a dotted line for each (r, s) pair, if not overlapped with the corresponding solid line. Right: number of LSQR iterations vs. r/s. The number of LSQR iterations is merely a function of r/s, independent of the condition number of the original system.

SLIDE 21

Tuning the oversampling factor

Figure: The overall running time of LSRN and the running time of each LSRN stage with different oversampling factors γ for a randomly generated problem of size 10^5 × 10^3. For this particular problem, the optimal γ that minimizes the overall running time lies in [1.8, 2.2].

SLIDE 22

Solving dense least squares

Figure: Running times on m × 1000 dense over-determined problems with full rank (left) and on 1000 × n dense under-determined problems with full rank (right). Note that DGELS and DGELSD almost overlap. When m > 3e4, we have Blendenpik > LSRN > DGELS/DGELSD > DGELSY > A\b in terms of speed. On under-determined problems, LAPACK's performance decreases significantly compared with the over-determined cases. Blendenpik's performance decreases as well. LSRN doesn't change much.

SLIDE 23

Solving sparse least squares

Figure: Running times on m × 1000 sparse over-determined problems with full rank (left) and on 1000 × n sparse under-determined problems with full rank (right). DGELS and DGELSD overlap with each other. LAPACK’s solvers and Blendenpik perform almost the same as in the dense case. MATLAB’s backslash speeds up on sparse problems, and performs a little better than Blendenpik, but it is still slower than LSRN. LSRN leads by a huge margin on sparse problems.

SLIDE 24

Solving real-world problems

matrix     m       n      nnz     rank   cond    DGELSD   A\b       Blendenpik   LSRN
landmark   71952   2704   1.15e6  2671   1.0e8   29.54    0.6498∗   –            17.55
rail4284   4284    1.1e6  1.1e7   full   400.0   > 3600   1.203∗    OOM          136.0
tnimg_1    951     1e6    2.1e7   925    –       630.6    1067∗     –            36.02
tnimg_2    1000    2e6    4.2e7   981    –       1291     > 3600∗   –            72.05
tnimg_3    1018    3e6    6.3e7   1016   –       2084     > 3600∗   –            111.1
tnimg_4    1019    4e6    8.4e7   1018   –       2945     > 3600∗   –            147.1
tnimg_5    1023    5e6    1.1e8   full   –       > 3600   > 3600∗   OOM          188.5

Table: Real-world problems and corresponding running times. DGELSD doesn't take advantage of sparsity. Though MATLAB's backslash may not give the min-length solutions to rank-deficient or under-determined problems, we still report its running times (marked with ∗). Blendenpik either doesn't apply to rank-deficient problems (–) or runs out of memory (OOM). LSRN's running time is mainly determined by the problem size and the sparsity.

SLIDE 25

Scalability and choice of iterative solvers on clusters

solver         Nnodes   Nprocesses   m      n     nnz     Niter   Titer   Ttotal
LSRN w/ CS     2        4            1024   4e6   8.4e7   106     34.03   170.4
LSRN w/ LSQR   2        4            1024   4e6   8.4e7   84      41.14   178.6
LSRN w/ CS     5        10           1024   1e7   2.1e8   106     50.37   193.3
LSRN w/ LSQR   5        10           1024   1e7   2.1e8   84      68.72   211.6
LSRN w/ CS     10       20           1024   2e7   4.2e8   106     73.73   220.9
LSRN w/ LSQR   10       20           1024   2e7   4.2e8   84      102.3   249.0
LSRN w/ CS     20       40           1024   4e7   8.4e8   106     102.5   255.6
LSRN w/ LSQR   20       40           1024   4e7   8.4e8   84      137.2   290.2

Table: Test problems on an Amazon EC2 cluster and corresponding running times in seconds. When we enlarge the problem scale by a factor of 10 and increase the number of cores accordingly, the running time increases by only about 50%, which shows LSRN's good scalability. Though the CS method takes more iterations, it actually runs faster than LSQR by making only two cluster-wide synchronizations per iteration.

SLIDE 26

LSQR vs. the Chebyshev semi-iterative method

LSQR code snippet:

    u = A.matvec(v) - alpha*u
    beta = sqrt(comm.allreduce(np.dot(u, u)))
    ...
    v = comm.allreduce(A.rmatvec(u)) - beta*v

Chebyshev code snippet:

    v = comm.allreduce(A.rmatvec(r)) - beta*v
    x += alpha*v
    r -= alpha*A.matvec(v)

SLIDE 27

Pros and cons of LSRN

Pros:

◮ A high-precision iterative solver with predictable running time.
◮ Accelerates automatically on sparse matrices and fast operators.
◮ Capable of solving rank-deficient problems, and even takes advantage of rank deficiency.
◮ Embarrassingly parallel (multi-threading or MPI) and scalable.
◮ Possible to use the Chebyshev semi-iterative method.
◮ Capable of handling Tikhonov regularization.

Cons:

◮ Large overhead on small-scale problems.
◮ Randomized normal projection is slow on dense problems.
◮ The smaller dimension cannot be too large.