1. Low Rank Approximation, Lecture 5
   Daniel Kressner
   Chair for Numerical Algorithms and HPC
   Institute of Mathematics, EPFL
   daniel.kressner@epfl.ch

2. Randomized column/row sampling
   Aim: Obtain a rank-$r$ approximation from randomly selected rows and columns of $A$.
   Popular sampling strategies:
   ◮ Uniform sampling.
   ◮ Sampling based on row/column norms.
   ◮ Sampling based on more complicated quantities.

3. Preliminaries on randomized sampling
   Exponential function example from Lecture 4 (Slide 14): comparison between the best approximation, the greedy approximation, and the approximation obtained by randomly selecting rows.
   [Figure: four semilogarithmic panels plotting approximation errors from $10^0$ down to $10^{-10}$ against ranks $0$ to $10$.]

4. Preliminaries on randomized sampling
   A simple way to fool uniformly random row selection:
   $$U = \begin{bmatrix} 0_{(n-r) \times r} \\ I_r \end{bmatrix}$$
   for $n$ very large and $r \ll n$.
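A minimal numerical sketch of this failure mode (the sizes $n$, $r$ and the sample count $k$ are chosen for illustration): all information in $U$ sits in its last $r$ rows, so uniform sampling almost surely returns only zero rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 100_000, 5, 50           # illustrative sizes with r << n

# U = [0_{(n-r) x r}; I_r]: all of the mass sits in the last r rows.
U = np.zeros((n, r))
U[-r:, :] = np.eye(r)

rows = rng.integers(0, n, size=k)  # uniform row sampling with replacement
print(np.linalg.norm(U[rows, :]))  # 0.0 with overwhelming probability
```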

5. Column sampling
   Basic algorithm aiming at a rank-$r$ approximation:
   1. Sample (and possibly rescale) $k > r$ columns of $A$, yielding an $m \times k$ matrix $C$.
   2. Compute the SVD $C = U \Sigma V^T$ and set $Q = U_r \in \mathbb{R}^{m \times r}$.
   3. Return the low-rank approximation $QQ^T A$.
   ◮ Can be combined with a streaming algorithm [Liberty'2007] to limit the memory and cost of working with $C$.
   ◮ The quality of the approximation crucially depends on the sampling strategy.
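Steps 2 and 3 in NumPy, as a sketch; the function name is ours, and $C$ is assumed to have been formed by whatever sampling strategy is in use.

```python
import numpy as np

def low_rank_from_columns(A, C, r):
    """Given sampled columns C (m x k, k > r), return Q = U_r and Q Q^T A."""
    U, _, _ = np.linalg.svd(C, full_matrices=False)  # C = U Sigma V^T
    Q = U[:, :r]                                     # r dominant left sing. vecs
    return Q, Q @ (Q.T @ A)                          # rank-r approximation
```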

6. Column sampling
   Lemma. For any matrix $C \in \mathbb{R}^{m \times k}$, let $Q$ be the matrix computed above. Then
   $$\|A - QQ^T A\|_2^2 \le \sigma_{r+1}(A)^2 + 2\,\|AA^T - CC^T\|_2.$$
   Proof. We have
   $$(A - QQ^T A)(A - QQ^T A)^T = (I - QQ^T)\,AA^T\,(I - QQ^T) = (I - QQ^T)\,CC^T\,(I - QQ^T) + (I - QQ^T)(AA^T - CC^T)(I - QQ^T).$$
   Hence,
   $$\|A - QQ^T A\|_2^2 = \lambda_{\max}\big((A - QQ^T A)(A - QQ^T A)^T\big) \le \lambda_{\max}\big((I - QQ^T)\,CC^T\,(I - QQ^T)\big) + \|AA^T - CC^T\|_2 = \sigma_{r+1}(C)^2 + \|AA^T - CC^T\|_2,$$
   where the last equality holds because $Q = U_r$ contains the $r$ dominant left singular vectors of $C$. The proof is completed by applying Weyl's inequality:
   $$\sigma_{r+1}(C)^2 = \lambda_{r+1}(CC^T) \le \lambda_{r+1}(AA^T) + \|AA^T - CC^T\|_2 = \sigma_{r+1}(A)^2 + \|AA^T - CC^T\|_2.$$
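The bound is easy to sanity-check numerically; a small sketch with a random $A$ and an arbitrary (here: unscaled, randomly selected) $C$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, k = 30, 50, 4, 10
A = rng.standard_normal((m, n))
C = A[:, rng.choice(n, size=k, replace=False)]    # any m x k matrix works

U, _, _ = np.linalg.svd(C, full_matrices=False)
Q = U[:, :r]
lhs = np.linalg.norm(A - Q @ (Q.T @ A), 2) ** 2   # squared spectral norm
rhs = (np.linalg.svd(A, compute_uv=False)[r] ** 2           # sigma_{r+1}(A)^2
       + 2 * np.linalg.norm(A @ A.T - C @ C.T, 2))
assert lhs <= rhs + 1e-10                         # the inequality of the lemma
```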

7. Random column sampling
   Using the lemma, the goal now becomes to approximate the matrix product $AA^T$ using column samples of $A$. Notation:
   $$A = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}, \qquad C = \begin{bmatrix} c_1 & \cdots & c_k \end{bmatrix}.$$
   General sampling method:
   Input: $A \in \mathbb{R}^{m \times n}$, probabilities $p_1, \ldots, p_n \ne 0$, integer $k$.
   Output: $C \in \mathbb{R}^{m \times k}$ containing selected columns of $A$.
   1: for $t = 1, \ldots, k$ do
   2:    Pick $j_t \in \{1, \ldots, n\}$ with $\mathbb{P}[j_t = \ell] = p_\ell$, $\ell = 1, \ldots, n$, independently and with replacement.
   3:    Set $c_t = a_{j_t} / \sqrt{k\,p_{j_t}}$.
   4: end for
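A direct NumPy translation of the general sampling method (the function name and the use of `rng.choice` are our choices):

```python
import numpy as np

def sample_columns(A, p, k, rng):
    """Draw k column indices i.i.d. with P[j_t = l] = p_l, with replacement,
    and rescale each selected column by 1 / sqrt(k * p_{j_t})."""
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    return A[:, idx] / np.sqrt(k * p[idx])
```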

8. Random column sampling
   Lemma. For the matrix $C$ returned by the algorithm, it holds that
   $$\mathbb{E}[CC^T] = AA^T, \qquad \mathrm{Var}[(CC^T)_{ij}] = \frac{1}{k} \sum_{\ell=1}^{n} \frac{a_{i\ell}^2\, a_{j\ell}^2}{p_\ell} - \frac{1}{k}\,(AA^T)_{ij}^2.$$
   Proof. For fixed $i, j$, consider $X_t = (c_t c_t^T)_{ij} = \frac{1}{k\,p_{j_t}}\, a_{i,j_t}\, a_{j,j_t}$, for which
   $$\mathbb{E}[X_t] = \sum_{\ell=1}^{n} p_\ell\, \frac{1}{k\,p_\ell}\, a_{i\ell}\, a_{j\ell} = \frac{1}{k}\,(AA^T)_{ij}.$$
   Analogously,
   $$\mathrm{Var}(X_t) = \mathbb{E}[(X_t - \mathbb{E}[X_t])^2] = \mathbb{E}[X_t^2] - \mathbb{E}[X_t]^2 = \frac{1}{k^2} \sum_{\ell=1}^{n} \frac{a_{i\ell}^2\, a_{j\ell}^2}{p_\ell} - \frac{1}{k^2}\,(AA^T)_{ij}^2.$$
   Because of independence, it follows that $\mathbb{E}[\sum_t X_t] = k \cdot \mathbb{E}[X_t] = (AA^T)_{ij}$, and analogously for the variance.
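A quick Monte Carlo check of the unbiasedness claim $\mathbb{E}[CC^T] = AA^T$; the sizes, probabilities, and trial count below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, trials = 6, 40, 20, 20_000
A = rng.standard_normal((m, n))
p = np.full(n, 1.0 / n)            # any nonzero probabilities work here

est = np.zeros((m, m))
for _ in range(trials):
    idx = rng.choice(n, size=k, replace=True, p=p)
    C = A[:, idx] / np.sqrt(k * p[idx])
    est += C @ C.T / trials        # running average of C C^T

# relative deviation from AA^T; shrinks like 1/sqrt(trials)
print(np.linalg.norm(est - A @ A.T) / np.linalg.norm(A @ A.T))
```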

9. Random column sampling
   As a consequence of the lemma,
   $$\mathbb{E}[\|AA^T - CC^T\|_F^2] = \sum_{ij} \mathbb{E}[(AA^T - CC^T)_{ij}^2] = \sum_{ij} \mathrm{Var}[(CC^T)_{ij}] = \frac{1}{k} \sum_{ij} \sum_{\ell=1}^{n} \frac{a_{i\ell}^2\, a_{j\ell}^2}{p_\ell} - \frac{1}{k}\,\|AA^T\|_F^2 = \frac{1}{k} \left( \sum_{\ell=1}^{n} \frac{\|a_\ell\|_2^4}{p_\ell} - \|AA^T\|_F^2 \right).$$
   Lemma. The choice $p_\ell = \|a_\ell\|_2^2 / \|A\|_F^2$ minimizes $\mathbb{E}[\|AA^T - CC^T\|_F^2]$ and yields
   $$\mathbb{E}[\|AA^T - CC^T\|_F^2] = \frac{1}{k} \left( \|A\|_F^4 - \|AA^T\|_F^2 \right).$$
   Proof. Established by showing that this choice of $p_\ell$ satisfies the first-order conditions of the constrained optimization problem.
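The closed-form expectation can also be checked by simulation; a sketch comparing the empirical mean of $\|AA^T - CC^T\|_F^2$ under norm-based probabilities with $(\|A\|_F^4 - \|AA^T\|_F^2)/k$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k, trials = 6, 40, 20, 20_000
A = rng.standard_normal((m, n))
p = np.sum(A**2, axis=0) / np.sum(A**2)    # p_l = ||a_l||_2^2 / ||A||_F^2

mean_err2 = 0.0
for _ in range(trials):
    idx = rng.choice(n, size=k, replace=True, p=p)
    C = A[:, idx] / np.sqrt(k * p[idx])
    mean_err2 += np.linalg.norm(A @ A.T - C @ C.T, 'fro')**2 / trials

predicted = (np.linalg.norm(A, 'fro')**4
             - np.linalg.norm(A @ A.T, 'fro')**2) / k
print(mean_err2, predicted)                # the two values should be close
```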

10. Random column sampling
   Norm-based sampling:
   Input: $A \in \mathbb{R}^{m \times n}$, integer $k$.
   Output: rank-$r$ approximation $QQ^T A$.
   1: Set $p_\ell = \|a_\ell\|_2^2 / \|A\|_F^2$ for $\ell = 1, \ldots, n$.
   2: for $t = 1, \ldots, k$ do
   3:    Pick $j_t \in \{1, \ldots, n\}$ with $\mathbb{P}[j_t = \ell] = p_\ell$, $\ell = 1, \ldots, n$, independently and with replacement.
   4:    Set $c_t = a_{j_t} / \sqrt{k\,p_{j_t}}$.
   5: end for
   6: Compute the SVD $C = U \Sigma V^T$ and set $Q = U_r \in \mathbb{R}^{m \times r}$.
   7: Return the low-rank approximation $QQ^T A$.
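Putting the pieces together, one possible end-to-end implementation of norm-based sampling (the function name and defaults are ours):

```python
import numpy as np

def norm_based_low_rank(A, r, k, rng=None):
    """Rank-r approximation Q Q^T A from k norm-sampled, rescaled columns."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.sum(A**2, axis=0) / np.sum(A**2)       # p_l = ||a_l||^2 / ||A||_F^2
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    C = A[:, idx] / np.sqrt(k * p[idx])           # sampled, rescaled columns
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    Q = U[:, :r]                                  # r dominant left sing. vecs
    return Q @ (Q.T @ A)
```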

11. Random column sampling
   Lemma. For the matrix $C$ returned by the algorithm, it holds with probability $1 - \delta$ that
   $$\|AA^T - CC^T\|_F \le \frac{\eta}{\sqrt{k}}\, \|A\|_F^2, \qquad \text{where } \eta = 1 + \sqrt{8 \log(1/\delta)}.$$
   Proof. Aim at applying the Azuma-Hoeffding inequality. Define
   $$F(i_1, i_2, \ldots, i_k) = \|AA^T - CC^T\|_F \quad \text{with } C = \begin{bmatrix} a_{i_1} & \cdots & a_{i_k} \end{bmatrix} \text{ (rescaled as in the algorithm).}$$
   Quantify the effect of varying an index (w.l.o.g. the first one) on $F$:
   $$|F(i_1, i_2, \ldots, i_k) - F(i_1', i_2, \ldots, i_k)| = \big|\, \|AA^T - CC^T\|_F - \|AA^T - C'C'^T\|_F \big| \le \|CC^T - C'C'^T\|_F \le \frac{\|a_{i_1}\|_2^2}{k\,p_{i_1}} + \frac{\|a_{i_1'}\|_2^2}{k\,p_{i_1'}} = \frac{2}{k}\,\|A\|_F^2 =: \Delta,$$
   using $p_\ell = \|a_\ell\|_2^2 / \|A\|_F^2$ in the last step.

12. Random column sampling
   This implies that the Doob martingale $g_\ell = \mathbb{E}[F(i_1, \ldots, i_k) \mid i_1, \ldots, i_\ell]$ for $0 \le \ell \le k$ satisfies $|g_{\ell+1} - g_\ell| \le \Delta$. Note that $g_k = \|AA^T - CC^T\|_F$, while $g_0 = \mathbb{E}[\|AA^T - CC^T\|_F] \le \|A\|_F^2 / \sqrt{k}$ by the lemma and Jensen's inequality. Applying the Azuma-Hoeffding inequality yields
   $$\mathbb{P}\left[ \|AA^T - CC^T\|_F \ge \frac{\|A\|_F^2}{\sqrt{k}} + \gamma \right] \le \exp\big(-\gamma^2 / (2 k \Delta^2)\big) =: \delta.$$
   Setting $\gamma = \sqrt{8 \log(1/\delta)}\, \|A\|_F^2 / \sqrt{k}$ completes the proof.
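An empirical look at the concentration bound; for a given $\delta$, the observed failure fraction over many trials should stay below $\delta$ (in practice the bound is conservative, so failures are essentially never seen):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, trials, delta = 6, 40, 20, 2_000, 0.1
A = rng.standard_normal((m, n))
p = np.sum(A**2, axis=0) / np.sum(A**2)
eta = 1 + np.sqrt(8 * np.log(1 / delta))
bound = eta / np.sqrt(k) * np.linalg.norm(A, 'fro')**2

fails = 0
for _ in range(trials):
    idx = rng.choice(n, size=k, replace=True, p=p)
    C = A[:, idx] / np.sqrt(k * p[idx])
    fails += np.linalg.norm(A @ A.T - C @ C.T, 'fro') > bound
print(fails / trials, "<=", delta)   # failure fraction vs. delta
```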

13. Random column sampling
   Theorem (Drineas/Kannan/Mahoney'2006). For the matrix $Q$ returned by the algorithm above it holds that
   $$\mathbb{E}\big[\|A - QQ^T A\|_2^2\big] \le \sigma_{r+1}^2(A) + \varepsilon\,\|A\|_F^2 \qquad \text{for } k \ge 4/\varepsilon^2.$$
   With probability at least $1 - \delta$,
   $$\|A - QQ^T A\|_2^2 \le \sigma_{r+1}^2(A) + \varepsilon\,\|A\|_F^2 \qquad \text{for } k \ge 4\big(1 + \sqrt{8 \log(1/\delta)}\big)^2 / \varepsilon^2.$$
   Proof. Follows from combining the very first lemma with the last two lemmas.
   Remarks:
   ◮ The dependence of $k$ on $\varepsilon$ is pretty bad. One is unlikely to achieve something significantly better with sampling based on row/column norms only, without assuming further properties of $A$ (e.g., incoherence of singular vectors).
   ◮ Simple "counter example":
   $$A = \begin{bmatrix} \frac{1}{\sqrt{n}} e_1 & \frac{1}{\sqrt{n}} e_1 & \cdots & \frac{1}{\sqrt{n}} e_1 & \frac{1}{\sqrt{n}} e_2 \end{bmatrix} \in \mathbb{R}^{n \times (n+1)}.$$
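A small sketch illustrating why this matrix defeats norm-based sampling: all columns share the same norm, so the sampling is effectively uniform, and the single $e_2$ column, which carries all of $\sigma_2(A)$, is selected only with probability about $k/(n+1)$. Although $\mathrm{rank}(A) = 2$, so the best rank-2 error is exactly zero, the sampled approximation typically misses $e_2$ entirely.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, k = 10_000, 2, 100
A = np.zeros((n, n + 1))
A[0, :n] = 1 / np.sqrt(n)                  # n copies of e_1 / sqrt(n)
A[1, n] = 1 / np.sqrt(n)                   # one copy of e_2 / sqrt(n)

p = np.sum(A**2, axis=0) / np.sum(A**2)    # all columns have norm 1/sqrt(n)
idx = rng.choice(n + 1, size=k, replace=True, p=p)
print((idx == n).any())                    # e_2 column sampled? ~1% chance

# Without column n, C carries no information about e_2, so any Q built from
# C leaves an error of at least sigma_2(A) = 1/sqrt(n).
C = A[:, idx] / np.sqrt(k * p[idx])
U, s, _ = np.linalg.svd(C, full_matrices=False)
Q = U[:, :np.sum(s > 1e-12)][:, :r]        # numerically meaningful directions
print(np.linalg.norm(A - Q @ (Q.T @ A), 'fro'))   # ~ 1/sqrt(n), not 0
```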

14. Random column sampling
   [Drineas/Mahoney/Muthukrishnan'2007]: Let $V_k$ contain the $k$ dominant right singular vectors of $A$. Setting
   $$p_\ell = \|V_k(\ell, :)\|_2^2 / k, \qquad \ell = 1, \ldots, n,$$
   and sampling $O(k^2 (\log 1/\delta) / \varepsilon^2)$ columns¹ yields
   $$\|A - QQ^T A\|_F \le (1 + \varepsilon)\, \|A - T_k(A)\|_F$$
   with probability $1 - \delta$, where $T_k(A)$ denotes the best rank-$k$ approximation of $A$. Relative error bound!
   A CUR decomposition can be obtained by applying these ideas to rows and columns (yielding $R$ and $C$, respectively) and choosing $U$ appropriately.
   ¹ There are variants that improve this to $O(k \log k \log(1/\delta) / \varepsilon^2)$.
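A sketch of these subspace-sampling probabilities; computing the exact SVD as done here defeats the purpose at scale (in practice the leverage scores would be approximated), but it makes the definition concrete. The function name is ours.

```python
import numpy as np

def subspace_sampling_probabilities(A, k):
    """p_l = ||V_k(l, :)||_2^2 / k from the k dominant right singular vectors
    of A; the p_l sum to 1 because the rows of Vt are orthonormal."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :]**2, axis=0) / k
```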
