  1. Low Rank Approximation Lecture 3
     Daniel Kressner
     Chair for Numerical Algorithms and HPC
     Institute of Mathematics, EPFL
     daniel.kressner@epfl.ch

  2. Basic randomized algorithm for low-rank approx
     Must read: Halko/Martinsson/Tropp'2010: Finding Structure with Randomness...
     Randomized Algorithm:
     1. Draw Gaussian random matrix Ω ∈ R^{n×r}.
     2. Perform block mat-vec Y = AΩ.
     3. Compute (economic) QR decomposition Y = QR.
     4. Form B = Q^T A.
     5. Return Â = QB (in factorized form).
     Exact recovery: If A has rank r, we recover Â = A with probability 1.
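As a concrete illustration, here is a minimal NumPy sketch of these five steps. This is my own code, not from the lecture; the function name and the assumption that A is a dense array are mine.

```python
import numpy as np

def randomized_lowrank(A, r, rng=np.random.default_rng(0)):
    """Basic randomized low-rank approximation; A_hat = Q @ B."""
    n = A.shape[1]
    Omega = rng.standard_normal((n, r))  # 1. Gaussian random matrix
    Y = A @ Omega                        # 2. block mat-vec
    Q, _ = np.linalg.qr(Y)               # 3. economic QR (Q is n x r)
    B = Q.T @ A                          # 4. small r x n factor
    return Q, B                          # 5. factorized form

# Exact recovery: a rank-r matrix is reproduced up to roundoff.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))  # rank 5
Q, B = randomized_lowrank(A, r=5)
print(np.linalg.norm(A - Q @ B))  # ~ 1e-14
```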

  3. Three test matrices
     (a) The 100 × 100 Hilbert matrix A defined by A(i, j) = 1/(i + j − 1).
     (b) The matrix A defined by A(i, j) = exp(−γ|i − j|/n) with γ = 0.1.
     (c) 30 × 30 diagonal matrix with diagonal entries
         1, 0.99, 0.98, 1/10, 0.99/10, 0.98/10, 1/100, 0.99/100, 0.98/100, ...
     [Figure: singular values of the three test matrices, decaying from 10^0 down to 10^{−20}.]
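In code, the three matrices might be built as follows. This is a sketch: the slide gives only the formulas, so the size n = 100 for matrix (b) and the exact ordering of the diagonal in (c) are my reading of the slide.

```python
import numpy as np

i, j = np.ogrid[1:101, 1:101]
A_hilbert = 1.0 / (i + j - 1)               # (a) 100 x 100 Hilbert matrix

n, gamma = 100, 0.1                          # size for (b) not stated; assumed 100
i, j = np.ogrid[1:n + 1, 1:n + 1]
A_exp = np.exp(-gamma * np.abs(i - j) / n)   # (b) slower singular value decay

d = [c * 10.0 ** (-k) for k in range(10) for c in (1.0, 0.99, 0.98)]
A_stair = np.diag(d)                         # (c) 30 x 30 "staircase" diagonal
```

Plotting np.linalg.svd(A, compute_uv=False) for each matrix on a logarithmic scale reproduces the singular value figure.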

  4. Randomized algorithm applied to test matrices
     Errors measured in spectral norm:
     (a) Hilbert matrix, r = 5:
         Exact     mean      std
         0.0019    0.0092    0.0099
     (b) Matrix with slower decay, r = 25:
         Exact     mean      std
         0.0034    0.012     0.002
     (c) Matrix with staircase sv, r = 7:
         Exact     mean      std
         0.010     0.038     0.025

  5. Randomized algorithm applied to test matrices
     Errors measured in Frobenius norm:
     (a) Hilbert matrix, r = 5:
         Exact     mean      std
         0.0019    0.0093    0.0099
     (b) Matrix with slower decay, r = 25:
         Exact     mean      std
         0.011     0.024     0.001
     (c) Matrix with staircase sv, r = 7:
         Exact     mean      std
         0.014     0.041     0.024

  6. Basic randomized algorithm for low-rank approx
     Add oversampling: choose a (usually small) integer p.
     Randomized Algorithm:
     1. Draw standard Gaussian random matrix Ω ∈ R^{n×(r+p)}.
     2. Perform block mat-vec Y = AΩ.
     3. Compute (economic) QR decomposition Y = QR.
     4. Form B = Q^T A.
     5. Return Â = QB (in factorized form).
     Problem: Â has rank r + p > r.
     Solution: Compress B ≈ T_r(B) ⇒ Q·T_r(B) has rank r.
     Error (triangle inequality, using QB = QQ^T A and the orthonormality of Q):
         ‖Q·T_r(B) − A‖ = ‖Q·T_r(B) − QB + QB − A‖ ≤ ‖T_r(B) − B‖ + ‖(I − QQ^T)A‖.

  7. Basic randomized algorithm for low-rank approx
     Add oversampling: choose a (usually small) integer p.
     Randomized Algorithm:
     1. Draw standard Gaussian random matrix Ω ∈ R^{n×(r+p)}.
     2. Perform block mat-vec Y = AΩ.
     3. Compute (economic) QR decomposition Y = QR.
     4. Form B = Q^T A.
     5. Return Â = Q·T_r(B) (in factorized form).
     Gold standard, the best rank-r approximation error:
     ◮ spectral norm: σ_{r+1}
     ◮ Frobenius norm: √(σ_{r+1}^2 + · · · + σ_n^2).
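A sketch of this oversampled variant, with the truncation T_r realized through an SVD of the small factor B (the function name and the default p are my choices):

```python
import numpy as np

def randomized_lowrank_truncated(A, r, p=2, rng=np.random.default_rng(0)):
    """Oversampled randomized approximation, returned as a truncated SVD."""
    n = A.shape[1]
    Omega = rng.standard_normal((n, r + p))   # n x (r+p) Gaussian matrix
    Q, _ = np.linalg.qr(A @ Omega)            # Q is n x (r+p)
    B = Q.T @ A                               # (r+p) x n
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # T_r(B) = U[:, :r] @ diag(s[:r]) @ Vt[:r], so
    # A_hat = Q @ T_r(B) = (Q @ U[:, :r]) @ diag(s[:r]) @ Vt[:r].
    return Q @ U[:, :r], s[:r], Vt[:r]
```

The SVD of the (r+p) × n matrix B is cheap compared to an SVD of A whenever r + p ≪ n, which is the point of the factorized form.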

  8. Randomized algorithm applied to test matrices
     Errors measured in spectral norm:
     (a) Hilbert matrix, r = 5:
               Exact     mean      std
     p = 0     0.0019    0.0092    0.0099
     p = 1     0.0019    0.0026    0.0019
     p = 2     0.0019    0.0019    0.0001
     (b) Matrix with slower decay, r = 25:
               Exact     mean      std
     p = 0     0.0034    0.012     0.002
     p = 1     0.0034    0.011     0.0017
     p = 2     0.0034    0.010     0.0015
     p = 10    0.0034    0.0064    0.0008
     p = 25    0.0034    0.0037    0.0002
     (c) Matrix with staircase sv, r = 7:
               Exact     mean      std
     p = 0     0.010     0.038     0.025
     p = 1     0.010     0.021     0.012
     p = 2     0.010     0.012     0.005
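Tables like these can be reproduced in spirit with a loop along the following lines (my own experiment scaffold; the trial count and seed are arbitrary, so the digits will differ slightly):

```python
import numpy as np

def error_stats(A, r, p, norm_ord, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        Omega = rng.standard_normal((A.shape[1], r + p))
        Q, _ = np.linalg.qr(A @ Omega)
        B = Q.T @ A
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        A_hat = Q @ (U[:, :r] * s[:r]) @ Vt[:r]        # Q @ T_r(B)
        errs.append(np.linalg.norm(A - A_hat, ord=norm_ord))
    # "Exact" column: best rank-r approximation error via a full SVD
    sv = np.linalg.svd(A, compute_uv=False)
    exact = sv[r] if norm_ord == 2 else np.sqrt(np.sum(sv[r:] ** 2))
    return exact, np.mean(errs), np.std(errs)

# e.g. error_stats(A_hilbert, r=5, p=2, norm_ord=2)      # spectral norm
#      error_stats(A_hilbert, r=5, p=2, norm_ord='fro')  # Frobenius norm
```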

  9. Analysis: basic setting
     Goal: Say something sensible about ‖(I − QQ^T)A‖: expected value, tail bounds, ...
     The multivariate normal distribution X ∼ N(µ, Σ) with mean µ ∈ R^n and (positive
     definite) covariance matrix Σ ∈ R^{n×n} has density
         f_X(x) = (2π)^{−n/2} det(Σ)^{−1/2} exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ)).
     X ∼ N(0, I_m) is called a Gaussian random vector.
     Lemma. Let V ∈ R^{m×n} be orthonormal. If X ∼ N(0, I_m) then V^T X ∼ N(0, I_n).
     This invariance of Gaussian random vectors allows us to assume w.l.o.g. in the
     analysis that A = Σ is diagonal (and square).
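A quick Monte Carlo sanity check of the lemma (my own snippet; the sizes m, n and the sample count are arbitrary choices): the rows of V^T X should have mean ≈ 0 and covariance ≈ I_n.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 8, 3, 200_000
V, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal V in R^{m x n}
X = rng.standard_normal((m, N))                   # N samples of N(0, I_m)
Z = V.T @ X                                       # should behave like N(0, I_n)
print(np.round(Z.mean(axis=1), 3))                # ~ [0, 0, 0]
print(np.round(np.cov(Z), 3))                     # ~ identity matrix
```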

  10. Analysis: r = 1, p = 0
      Partition
          Ω = [ω_1; ω_2],   A = [σ_1, 0; 0, Σ_2].
      Then
          Y = AΩ = [σ_1 ω_1; Σ_2 ω_2] = σ_1 ω_1 · [1; (σ_1 ω_1)^{−1} Σ_2 ω_2].
      Problem: Need to control the norm of ω_1^{−1} Σ_2 ω_2, but the entries of
      ω_1^{−1} ω_2 (ratios of independent standard normal random variables) follow
      a Cauchy distribution, with undefined mean and variance. Need to consider
      p ≥ 2.
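The difficulty is easy to see empirically (my own illustration): sample means of the Cauchy-distributed ratio ω_2/ω_1 never settle down, no matter how many samples are drawn.

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (10**3, 10**5, 10**7):
    ratio = rng.standard_normal(N) / rng.standard_normal(N)  # Cauchy samples
    print(N, ratio.mean())  # keeps jumping around instead of converging
```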

  11. Analysis: r = 1, p ≥ 2
      Lemma. Let Π_Ỹ, Π_Y be orthogonal projectors onto subspaces Ỹ ⊂ Y. Then
          ‖Π_Ỹ A‖_F ≤ ‖Π_Y A‖_F,   ‖(I − Π_Ỹ)A‖_F ≥ ‖(I − Π_Y)A‖_F
      for any matrix A. Proof: EFY.
      Consider
          Y = AΩ = [σ_1 Ω_1; Σ_2 Ω_2],   Ỹ = [1; σ_1^{−1} Σ_2 Ω_2 Ω_1^†],
      where Ω_1^† is the pseudoinverse of Ω_1. Because Ω_1 is a row vector and
      nonzero (with probability one), we obtain Ω_1^† = Ω_1^T / ‖Ω_1‖_2^2. Thus,
      range(Ỹ) ⊂ range(Y). Hence, with f = σ_1^{−1} Σ_2 Ω_2 Ω_1^†, we get
          ‖(I − QQ^T)A‖_F ≤ ‖(I − Q̃Q̃^T)A‖_F,   Q̃ = (1 + ‖f‖_2^2)^{−1/2} [1; f].

  12. Analysis: r = 1, p ≥ 2
          ‖(I − Q̃Q̃^T)A‖_F^2 = ‖A‖_F^2 − ‖Q̃^T A‖_F^2,
          ‖Q̃^T A‖_F^2 = (σ_1^2 + ‖Σ_2 f‖_2^2) / (1 + ‖f‖_2^2)
                       ≥ σ_1^2 / (1 + ‖f‖_2^2)
                       ≥ σ_1^2 (1 − ‖f‖_2^2) = σ_1^2 − ‖Σ_2 Ω_2 Ω_1^†‖_2^2.
      In summary, we have:
      Lemma. ‖(I − Q̃Q̃^T)A‖_F^2 ≤ ‖Σ_2‖_F^2 + ‖Σ_2 Ω_2 Ω_1^†‖_F^2.
      To analyze the term ‖Σ_2 Ω_2 Ω_1^†‖_F^2, we use
          E‖Σ_2 Ω_2 Ω_1^†‖_F^2 = E[ E[ ‖Σ_2 Ω_2 Ω_1^†‖_F^2 | Ω_1 ] ]
                               = ‖Σ_2‖_F^2 · E‖Ω_1^†‖_F^2.
      (See exercises for a proof that E‖AΩB‖_F^2 = ‖A‖_F^2 ‖B‖_F^2 for a Gaussian
      matrix Ω and constant matrices A, B.)

  13. Analysis: r = 1, p ≥ 2
      It remains to control ‖Ω_1‖_F^2 and ‖Ω_1^†‖_F^2 = 1/‖Ω_1‖_F^2.
      ‖Ω_1‖_F^2 is a sum of p + 1 squared independent standard normal random
      variables. The pdf of X ∼ N(0, 1) is f_X(x) = (1/√(2π)) e^{−x^2/2}. The pdf
      of Y = X^2 is zero for nonpositive values. For y > 0, we obtain
          Pr(0 ≤ Y ≤ y) = Pr(−√y ≤ X ≤ √y) = (2/√(2π)) ∫_0^{√y} e^{−x^2/2} dx
                        = (1/√(2π)) ∫_0^y t^{−1/2} e^{−t/2} dt;
      Y is said to have a chi-squared distribution (1 degree of freedom): Y ∼ χ²_1.
      ‖Ω_1‖_F^2 ∼ χ²_{p+1}, the chi-squared distribution with p + 1 d.o.f., with pdf
          f(x) = (2^{−(p+1)/2} / Γ((p+1)/2)) x^{(p+1)/2 − 1} exp(−x/2),   x > 0.
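A short numerical check (my own snippet, using SciPy) that the squared norm of a Gaussian row vector with p + 1 entries indeed follows χ²_{p+1}:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, N = 2, 100_000
samples = (rng.standard_normal((N, p + 1)) ** 2).sum(axis=1)  # ||Omega_1||_F^2
print(samples.mean(), "vs", p + 1)                       # E chi2_{p+1} = p + 1
print(stats.kstest(samples, stats.chi2(df=p + 1).cdf))   # large p-value expected
```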

  14. Analysis: r = 1, p ≥ 2
          ‖Ω_1^†‖_F^2 = 1/‖Ω_1‖_F^2 = (∑_{i=1}^{p+1} ω_i^2)^{−1} ∼ Inv-χ²(p + 1),
      the inverse-chi-squared distribution with p + 1 degrees of freedom. Pdf given by
          (2^{−(p+1)/2} / Γ((p+1)/2)) x^{−(p+1)/2 − 1} exp(−1/(2x)).
      [Figure: pdf of Inv-χ²(p + 1) for p = 1, p = 3, p = 9.]
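The lost plot can be recreated with a few lines of matplotlib (grid and styling are my choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gamma

x = np.linspace(1e-3, 1.0, 400)
for p in (1, 3, 9):
    k = p + 1  # degrees of freedom
    pdf = 2.0 ** (-k / 2) / gamma(k / 2) * x ** (-k / 2 - 1) * np.exp(-1 / (2 * x))
    plt.plot(x, pdf, label=f"p = {p}")
plt.title("pdf of the inverse-chi-squared distribution")
plt.legend()
plt.show()
```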

  15. Analysis: r = 1, p ≥ 2
      Textbook results:
      ◮ E‖Ω_1‖_F^2 = p + 1,   E‖Ω_1^†‖_F^2 = (p − 1)^{−1}.
      ◮ Tail bound by [Laurent/Massart'2000]:
            P( ‖Ω_1‖_F^2 ≤ p + 1 − t ) ≤ exp(−t^2 / (4(p + 1))).
      Theorem. For r = 1, p ≥ 2, we have
          E‖(I − QQ^T)A‖_F ≤ √(1 + 1/(p − 1)) · ‖Σ_2‖_F.
      The probability of deviating from this upper bound decays exponentially, as
      indicated by the tail bound for χ²_{p+1}.
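A Monte Carlo check of the theorem for r = 1 (my own experiment; the singular value profile, dimension, and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0 ** -np.arange(20.0)   # arbitrary decaying singular values
A = np.diag(sigma)                 # w.l.o.g. A diagonal
r, p, trials = 1, 2, 2000
errs = []
for _ in range(trials):
    Omega = rng.standard_normal((20, r + p))
    Q, _ = np.linalg.qr(A @ Omega)
    errs.append(np.linalg.norm(A - Q @ (Q.T @ A), "fro"))
bound = np.sqrt(1 + 1 / (p - 1)) * np.linalg.norm(sigma[1:])  # sqrt(2)*||Sigma_2||_F
print(np.mean(errs), "<=", bound)
```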

  16. Analysis: general r, p ≥ 2
      Lemma. ‖(I − QQ^T)A‖_F^2 ≤ ‖Σ_2‖_F^2 + ‖Σ_2 Ω_2 Ω_1^†‖_F^2.
      Proof. Consider
          Y = AΩ = [Σ_1 Ω_1; Σ_2 Ω_2],   Ỹ = [I_r; F],   F = Σ_2 Ω_2 Ω_1^† Σ_1^{−1},
      and the ONB
          Q̃ = [I_r; F] (I_r + F^T F)^{−1/2}.
      Then
          ‖(I − QQ^T)A‖_F^2 ≤ ‖(I − Q̃Q̃^T)A‖_F^2 = ‖A‖_F^2 − ‖Q̃^T A‖_F^2,
          ‖Q̃^T A‖_F^2 = ‖(I_r + F^T F)^{−1/2} Σ_1‖_F^2 + ‖(I_r + F^T F)^{−1/2} F^T Σ_2‖_F^2
                       ≥ ‖(I_r + F^T F)^{−1/2} Σ_1‖_F^2 = trace(Σ_1 (I_r + F^T F)^{−1} Σ_1)
                       ≥ trace(Σ_1 (I_r − F^T F) Σ_1) = ‖Σ_1‖_F^2 − ‖Σ_2 Ω_2 Ω_1^†‖_F^2.
      (Proof of trace(Σ_1 (I_r + F^T F)^{−1} Σ_1) ≥ trace(Σ_1 (I_r − F^T F) Σ_1) on the
      next slide.)
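The bound in the lemma is deterministic, so it can be checked on any single draw (my own snippet; dimensions are arbitrary, and Ω_1 has full row rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 12, 3, 2
sigma = np.sort(rng.uniform(0, 1, n))[::-1]   # decreasing singular values
A = np.diag(sigma)
Omega = rng.standard_normal((n, r + p))
Q, _ = np.linalg.qr(A @ Omega)
Omega1, Omega2 = Omega[:r], Omega[r:]         # partition conformal with Sigma_1, Sigma_2
Sigma2 = np.diag(sigma[r:])
lhs = np.linalg.norm(A - Q @ (Q.T @ A), "fro") ** 2
rhs = (np.linalg.norm(sigma[r:]) ** 2
       + np.linalg.norm(Sigma2 @ Omega2 @ np.linalg.pinv(Omega1), "fro") ** 2)
print(lhs <= rhs, lhs, rhs)
```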

  17. Let G = F^T F. Then we have
          (I + G) ( (I + G)^{−1} − (I − G) ) = G^2
      and hence
          (I + G)^{−1} − (I − G) = (I + G)^{−1} G^2 = G (I + G)^{−1} G
      is symmetric positive semidefinite. This implies that
          Σ_1 ( (I + G)^{−1} − (I − G) ) Σ_1
      is symmetric positive semidefinite as well, and hence
          0 ≤ trace( Σ_1 ( (I + G)^{−1} − (I − G) ) Σ_1 )
      or, equivalently,
          trace(Σ_1 (I_r + F^T F)^{−1} Σ_1) ≥ trace(Σ_1 (I_r − F^T F) Σ_1).
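A quick numerical confirmation of this trace inequality for a randomly chosen F and Σ_1 (my own snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4
F = rng.standard_normal((6, r))
G = F.T @ F                                   # symmetric positive semidefinite
Sigma1 = np.diag(rng.uniform(0, 1, r))
lhs = np.trace(Sigma1 @ np.linalg.inv(np.eye(r) + G) @ Sigma1)
rhs = np.trace(Sigma1 @ (np.eye(r) - G) @ Sigma1)
print(lhs >= rhs, lhs, rhs)
```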

  18. Analysis: general r, p ≥ 2
      Again use E‖Σ_2 Ω_2 Ω_1^†‖_F^2 = ‖Σ_2‖_F^2 · E‖Ω_1^†‖_F^2. By standard results
      in multivariate statistics, we have
          E‖Ω_1^†‖_F^2 = r / (p − 1).
      Sketch of argument (checked numerically below):
      ◮ Ω_1 Ω_1^T ∼ W_r(r + p) (Wishart distribution with r + p degrees of freedom)
      ◮ (Ω_1 Ω_1^T)^{−1} ∼ W_r^{−1}(r + p) (inverse Wishart distribution with r + p
        degrees of freedom)
      ◮ E (Ω_1 Ω_1^T)^{−1} = (1/(p − 1)) I_r; see page 96 in [Muirhead'1982]
      ◮ The result follows from ‖Ω_1^†‖_F^2 = ‖Ω_1^T (Ω_1 Ω_1^T)^{−1}‖_F^2
        = trace((Ω_1 Ω_1^T)^{−1}).
      R. J. Muirhead, Aspects of Multivariate Statistical Theory, Wiley, New York, NY, 1982.
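A Monte Carlo check of E‖Ω_1^†‖_F^2 = r/(p − 1) (my own snippet; r, p, and the trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
r, p, trials = 5, 3, 20_000
est = np.mean([np.sum(np.linalg.pinv(rng.standard_normal((r, r + p))) ** 2)
               for _ in range(trials)])
print(est, "vs", r / (p - 1))  # both should be close to 2.5
```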
