Investigation into a Parallel Singular Value Decomposition Travis - PowerPoint PPT Presentation

Investigation into a Parallel Singular Value Decomposition Travis Askham Steven Delong Michael Lewis Courant Institute, New York December 18, 2012 Askham, Delong, Boston Mike Parallel SVD

The Singular Value Decomposition Given an m × n matrix A , its singular value decomposition is A = U Σ V T where U ∈ R m × m and V ∈ R n × n are orthogonal matrices and Σ ∈ R m × n is diagonal with nonnegative entries. The SVD is ubiquitous in numerical computing. Features of the decomposition The SVD can provide an optimal low-rank approximation to a matrix. Given an SVD you can form a pseudoinverse to your matrix. Example application areas Data compression. Signal processing. Pattern recognition. Certain least squares problems. Askham, Delong, Boston Mike Parallel SVD

The Bidiagonalization Step Given a matrix A , a bidiagonalization of A is a decomposition A = UBV T where, again, U ∈ R m × m and V ∈ R n × n are orthogonal matrices but the matrix B ∈ R m × n is bidiagonal. What’s the point: All further calculations can then be done on the sparse matrix B . Can be done stably using Householder reflectors. A finite time algorithm (old fashioned). Askham, Delong, Boston Mike Parallel SVD

Householder Reflectors and the Bidiagonalization Algorithm Left Householder     ˜ ˜ ˜ ˜ ˜ ˜ x x x x x x x x x x x x     x x x x x x 0 ˜ x ˜ x ˜ x x ˜ x ˜         x x x x x x 0 ˜ x x ˜ ˜ x ˜ x ˜ x     →     x x x x x x 0 x ˜ x ˜ ˜ x ˜ x ˜ x         x x x x x x 0 ˜ x ˜ x x ˜ x ˜ ˜ x 0 ˜ ˜ ˜ ˜ ˜ x x x x x x x x x x x Right Householder     x x x x x x 0 0 0 0 ˜ x ˜ x x ˜ x ˜ ˜ x ˜ x     0 ˜ ˜ ˜ ˜ ˜ 0 x x x x x         0 ˜ x x ˜ x ˜ x ˜ ˜ x 0     →     � 0 ˜ x ˜ x ˜ x x ˜ x ˜ 0 A         0 x ˜ x ˜ ˜ x ˜ x x ˜ 0 0 ˜ x ˜ x ˜ x ˜ x x ˜ 0 Askham, Delong, Boston Mike Parallel SVD

Computational Considerations Features of the algorithm Dense calculations. The application of a Housholder reflector is independent for each column. The size of the working set reduces with each step. Coding the algorithm for a GPU Reduction: calculating vector norm to get the reflector Both vector level and matrix level parallelization in applying the reflector (to do) Askham, Delong, Boston Mike Parallel SVD

Formulation of Problem We consider the following equation: B = U Σ V T for B ∈ R n × ( n +1) where   b 11 b 21 0 · · · 0   . ... .   0 b 12 b 22 .   B =  .  ... ... ... .   . 0 0 · · · 0 b 1 n b 2 n is a bidiagonal matrix. Our goal: To obtain the singular values (1) quickly, with (2) low memory usage Askham, Delong, Boston Mike Parallel SVD

Reduction of Problem We notice that this system can be reduced to solving similar subproblems, namely   B 1 0   B = b 1 k e k b 2 k e 1 0 B 2 where (1) B i are themselves bidiagonal matrices, and (2) e j denotes the standard unit vectors with 1 in the j th component This is the fundamental design of any divide and conquer method. What makes this process different is that we maintain low memory usage by not retaining the singular vectors. Askham, Delong, Boston Mike Parallel SVD

Formulation of Recursion Given that B i = U i (Σ i 0) ( V i v i ) T are the decomposition of the sub matrices, it can be shown that � � T B = ˜ ˜ U ( M 0) V ˜ v where   r 0 b 1 k l 1 b 2 k f 2   M = 0 Σ 1 0 0 0 Σ 2 where (1) l 1 is the last row in V 1 , (2) f 2 the first row in V 2 , and (3) r 0 can be similarly derived from results of the lower levels. Askham, Delong, Boston Mike Parallel SVD

Formulation of Recursion (cont’d) For simplicity, let z = ( r 0 b 1 k l 1 b 2 k f 2 ) D i = diag d i M = U M (Σ 0) V M Observe that � � T B = ˜ ˜ = U (Σ 0) ( V v ) T UU M (Σ 0) V V M ˜ v implies the singular values of M are the singular values of B the first row of V = ˜ f V M and the last row of V = ˜ lV M (and thus require O ( n ) space at any given time) Askham, Delong, Boston Mike Parallel SVD

Conclusion of Recursion This implies that by holding the (1) singular values, and (2) first and last rows from lower levels, we can backsolve our desired singular values. (Assuming we can solve for Σ and V M for a given M *cough*) Askham, Delong, Boston Mike Parallel SVD

Solving for Σ , V M Given our z and d = ( d 1 , d 2 , . . . , d n ), with d 1 ≡ 0, we observe � z 2 j f ( σ i ) = 1 + = 0 d 2 j − σ 2 i j This is known as the secular equation. Observe (assuming we permute the system to order the d i ) 0 < σ 1 < d 2 < σ 2 < . . . < σ n Given our σ i , there exists a solution to find the columns v i of V M . Askham, Delong, Boston Mike Parallel SVD

Observation on Computation The singular values can be computed independently The individual components of f = ˜ f V M and l = ˜ lV M can be computed independently by calculating each column of V M and doing the dot product. This suggests the above calculations are amenable to OpenMP (and they are) Askham, Delong, Boston Mike Parallel SVD

Finding Singular Vectors Given the singular values σ i , right singular vectors are found using a “Twisted factorization”, which is essentially an inverse power iteration. Left singular vectors are found from right singular vectors with a matrix multiply. Now we have the singular vectors for B , apply Householder reflections to get U and V for A . Recall the inverse power iteration to find the eigenvector v of a matrix A corresponding to it’s smallest eigenvalue. We pick an initial x 0 and then solve repeatedly x n +1 ˜ x n +1 = x n , x n +1 = A ˜ x n +1 || || ˜ With shift we can find eigenvectors of the eigenvalue closest to some µ , in our case σ 2 i . We solve ( B t B − σ 2 i I )˜ x = γ k e k Askham, Delong, Boston Mike Parallel SVD

Twisted Factorization The twisted factoriation allows us to do an inverse power iteration accurately and quickly. First we compute the “forward” and “backward” factorizations for each singular value. i I = P ( D 1) P t = Q ( D 2) Q t B t B − σ 2     1 0 0 0 1 0 0 . . . q 1 . . .     . . . .     1 0 . 0 1 . p 1 . . . q 2 . . .      .    ... . P = Q =     0 1 . 0 0 1 0 p 2 . . .     . .  ... ...   ... ...  . .  0 . 0   0 .  q m − 1 0 . . . 0 p m − 1 1 0 0 . . . . . . 1 D 1 = diag( D 1 1 , D 1 2 , . . . , D 1 m ) , D 2 = diag( D 2 1 , D 2 2 , . . . , D 2 m ) Askham, Delong, Boston Mike Parallel SVD

Twisted Factorization 2 Given the Forward and Backward factorizations of B t B − σ 2 i I , the Twisted factorization at k is B t B − σ 2 i I = N k D k N t k Where   1   ...   p 1     ...   1     N k = p k − 1 1 q k     ...   1     ...   q m − 1 1 D k = diag( D 1 1 , . . . , D 1 k − 1 , γ k , D 2 k +1 , . . . , D 2 m ) Where γ k = D 1 k + D 2 k − ( α 2 k + β 2 k − 1 − σ 2 i ) Askham, Delong, Boston Mike Parallel SVD

Solving the Twisted Factorization for singular vectors We solve the system N k D k N t k ˜ x = γ k e k first, which through the black magic of linear algebra is equivalent to N t k ˜ x = e k . For additional accuracy (needed specifically with clustered singular values), we do one more backsolve N k D k N t k x = ˜ x to get our final singular value x for the reduced bidiagonal matrix. y = Bx σ i gives our left singular vector for the reduced bidiagonal matrix. We still need to include the effects of the householder reflections from the bidiagonalization. A = H 1 BH t 2 = H 1 ( Y Σ X t ) H t 2 We apply the reflectors stored from the bidiagonalization part to the outputs X and Y . Askham, Delong, Boston Mike Parallel SVD

Parallelization Ideas This process is completely independent for each σ . Do the whole thing on the GPU ABANDONED IMMEDIATELY - lots of O(n) sequential calculations per work item. Things have to be stored in global, lots of global access. Can we at least do the Gamma Calculation on the GPU? O ( N 2 ) independent entries ABANDONED EVENTUALLY - way too much data transfer from the CPU to the device, O( N 2 ), not THAT many flops. Ok, let’s use OpenMP. This seemed to work. Askham, Delong, Boston Mike Parallel SVD

Investigation into a Parallel Singular Value Decomposition Travis - PowerPoint PPT Presentation

Investigation into a Parallel Singular Value Decomposition Travis Askham Steven Delong Michael Lewis Courant Institute, New York December 18, 2012 Askham, Delong, Boston Mike Parallel SVD The Singular Value Decomposition Given an m n

[11] The Singular Value Decomposition The Singular Value Decomposition Gene Golubs license

Singular Value Decomposition Presented by Matthew Motoki 1 What is a singular value

Singular Value Decomposition (matrix factorization) Singular Value Decomposition The SVD is a

Eigenvalue Problems and Singular Value Decomposition Sanzheng Qiao Department of Computing and

1 Singular Value Decomposition The singular vector decomposition allows us to write any matrix A

The Singular Value Decomposition COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate

Chapter 5 Singular value decomposition and principal component analysis In A Practical Approach to

CS475 / CS675 Lecture 19: July 5, 2016 Singular value decomposition Reading: [TB] Chapter 31

Investigation of Thermal Decomposition Investigation of Thermal Decomposition Process of

Singular Value Decomposition and Digital Image Compression Chris Bingham December 12, 2016

Thermal decomposition of the Thermal decomposition of the Thermal decomposition of the Thermal

Polar Decomposition of a Matrix Garrett Buffington May 4, 2014 The Polar Decomposition SVD and

Laboratory Investigation of Laboratory Investigation of Laboratory Investigation of Laboratory

SYMBOLIC LOGIC UNIT 10: SINGULAR SENTENCES Singular Sentences (monadic) Paris is beautiful

/ Link Invariants from Braided Monoidal On the PROB of Singular Braids Categories Singular

Embodiment 4-15-16 Good old-fashioned artificial intelligence In the early days of AI, symbol

Elastic Electron Scattering for Proton Charge Radius Determination Toshimi Suda Tohoku University

Engaging the Public through Content Marketing Welcome! The webinar will begin at 2:00 p.m. CT

Outline American Foundation for Suicide Preventions Interactive Screening Program

escape from a tower. He found a rope in his cell that was half as long enough to permit him to

A F RAMEWORK FOR C OST - EFFECTIVE D EPENDENCE -B ASED D YNAMIC I MPACT A NALYSIS Haipeng Cai and

Influencing Future Of Procurement Event by CIPS Northern Emirates Agenda: - Welcome Note -

C H E M I C A L A N D A D V A N C E D M A T E R I A L I N D U S T R I E S D I V I S I O N 1