

  1. Using Friendly Tail Bounds for Sums of Random Matrices ❦ Joel A. Tropp, Computing + Mathematical Sciences, California Institute of Technology, jtropp@cms.caltech.edu. Research supported in part by NSF, DARPA, ONR, and AFOSR.

  2. Matrix Rademacher Series

Joel A. Tropp, Using Friendly Tail Bounds, IMA, 27 September 2011

  3. The Norm of a Matrix Rademacher Series

Theorem 1 [Oliveira 2010, T 2010]. Suppose
❧ B_1, B_2, ... are fixed matrices with dimensions d_1 × d_2, and
❧ ε_1, ε_2, ... are independent Rademacher RVs.
Define d := d_1 + d_2, and introduce the matrix variance

    \sigma^2 := \max\left\{ \Big\| \sum_j B_j B_j^* \Big\|,\ \Big\| \sum_j B_j^* B_j \Big\| \right\}

Then

    \mathbb{E}\Big\| \sum_j \varepsilon_j B_j \Big\| \le \sqrt{2 \sigma^2 \log d}

    \mathbb{P}\Big\{ \Big\| \sum_j \varepsilon_j B_j \Big\| \ge t \Big\} \le d \cdot e^{-t^2 / 2\sigma^2}
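As a quick numerical sanity check of Theorem 1, the sketch below draws some arbitrary fixed matrices B_j (the data, dimensions, and trial count are illustrative assumptions, not from the slides), computes the matrix variance σ², and compares a Monte Carlo estimate of the expected norm against the bound √(2σ² log d):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 8, 12, 30
d = d1 + d2

# Hypothetical fixed matrices B_1, ..., B_n
B = rng.standard_normal((n, d1, d2)) / np.sqrt(n)

# Matrix variance: the larger of the two norms in the theorem
sigma2 = max(
    np.linalg.norm(sum(Bj @ Bj.T for Bj in B), 2),
    np.linalg.norm(sum(Bj.T @ Bj for Bj in B), 2),
)

# Monte Carlo estimate of E || sum_j eps_j B_j ||
trials = 200
norms = []
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(np.tensordot(eps, B, axes=1), 2))
mean_norm = np.mean(norms)

bound = np.sqrt(2 * sigma2 * np.log(d))
# The sampled mean should sit below the theoretical bound
assert mean_norm <= bound
```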

  4. Example: Modulation by Random Signs

Fixed matrix, in captivity:

    C = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \cdots \\ c_{21} & c_{22} & c_{23} & \cdots \\ c_{31} & c_{32} & c_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}   (d_1 × d_2)

Random matrix, formed by randomly flipping the signs of the entries:

    Z = \begin{bmatrix} \varepsilon_{11} c_{11} & \varepsilon_{12} c_{12} & \varepsilon_{13} c_{13} & \cdots \\ \varepsilon_{21} c_{21} & \varepsilon_{22} c_{22} & \varepsilon_{23} c_{23} & \cdots \\ \varepsilon_{31} c_{31} & \varepsilon_{32} c_{32} & \varepsilon_{33} c_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}   (d_1 × d_2)

The family {ε_jk} consists of independent Rademacher random variables.

[Q] What is the typical value of ‖Z‖?

  5. The Random Matrix, qua Rademacher Series

Rewrite the random matrix:

    Z = [\varepsilon_{jk} c_{jk}] = \sum_{jk} \varepsilon_{jk} c_{jk} E_{jk}   (d_1 × d_2)

The symbol E_jk denotes the d_1 × d_2 matrix unit, which has a one in the (j, k) position and zeros elsewhere.
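The rewriting above is a bookkeeping identity, and it can be verified directly: flipping signs entrywise gives the same matrix as summing the Rademacher series over matrix units. The dimensions and example matrix below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 3, 4
C = rng.standard_normal((d1, d2))             # a fixed matrix C (example data)
eps = rng.choice([-1.0, 1.0], size=(d1, d2))  # independent Rademacher signs

# Entrywise sign flip ...
Z_flip = eps * C

# ... equals the Rademacher series over the matrix units E_jk
Z_series = np.zeros((d1, d2))
for j in range(d1):
    for k in range(d2):
        E_jk = np.zeros((d1, d2))
        E_jk[j, k] = 1.0            # the (j, k) matrix unit
        Z_series += eps[j, k] * C[j, k] * E_jk

assert np.allclose(Z_flip, Z_series)
```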

  6. Computing the Matrix Variance

The first term in the matrix variance σ² satisfies

    \Big\| \sum_{jk} (c_{jk} E_{jk})(c_{jk} E_{jk})^* \Big\|
      = \Big\| \sum_{jk} |c_{jk}|^2 E_{jk} E_{kj} \Big\|
      = \Big\| \sum_j \Big( \sum_k |c_{jk}|^2 \Big) E_{jj} \Big\|
      = \Big\| \mathrm{diag}\Big( \sum_k |c_{1k}|^2,\ \sum_k |c_{2k}|^2,\ \dots \Big) \Big\|
      = \max_j \sum_k |c_{jk}|^2

The same argument applies to the second term. Thus,

    \sigma^2 = \max\Big\{ \max_j \sum_k |c_{jk}|^2,\ \max_k \sum_j |c_{jk}|^2 \Big\}
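The closed form for σ², the largest row or column sum of squares, can be checked against the raw definition. The example matrix is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 5, 7
C = rng.standard_normal((d1, d2))

# sigma^2 from the definition: norms of sum (cE)(cE)^* and sum (cE)^*(cE)
sum1 = np.zeros((d1, d1))
sum2 = np.zeros((d2, d2))
for j in range(d1):
    for k in range(d2):
        E = np.zeros((d1, d2))
        E[j, k] = C[j, k]           # the summand c_jk * E_jk
        sum1 += E @ E.T
        sum2 += E.T @ E
sigma2_def = max(np.linalg.norm(sum1, 2), np.linalg.norm(sum2, 2))

# sigma^2 from the closed form: largest row / column sum of squares
sigma2_formula = max((C**2).sum(axis=1).max(), (C**2).sum(axis=0).max())

assert np.isclose(sigma2_def, sigma2_formula)
```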

  7. The Norm of a Randomly Modulated Matrix

Theorem 2 [T 2010]. Suppose Z = \sum_{jk} \varepsilon_{jk} c_{jk} E_{jk}, where
❧ C is a fixed d_1 × d_2 matrix, and
❧ {ε_jk} is an independent family of Rademacher RVs.
Define d := d_1 + d_2, and compute the matrix variance

    \sigma^2 = \max\Big\{ \max_j \sum_k |c_{jk}|^2,\ \max_k \sum_j |c_{jk}|^2 \Big\}

Then

    \mathbb{E}\|Z\| \le \sqrt{2 \sigma^2 \log d}

    \mathbb{P}\{\|Z\| \ge t\} \le d \cdot e^{-t^2/2\sigma^2}

This result also holds when {ε_jk} is an iid family of standard normal RVs.

  8. Comparison with the Literature

For the random matrix Z = [ε_jk c_jk] ...

[T 2010], obtained via the matrix Rademacher bound:

    \mathbb{E}\|Z\| \le \sqrt{2 \log d} \cdot \sigma

[Seginer 2000], obtained with path-counting arguments:

    \mathbb{E}\|Z\| \le \mathrm{const} \cdot \sqrt[4]{\log d} \cdot \sigma

[Latała 2005], obtained with chaining arguments:

    \mathbb{E}\|Z\| \le \mathrm{const} \cdot \Big( \sigma + \sqrt[4]{\textstyle\sum_{jk} |c_{jk}|^4} \Big)

  9. Matrix Chernoff Inequality

  10. The Matrix Chernoff Bound

Theorem 3 [T 2010]. Suppose Y = \sum_j X_j, where
❧ X_1, X_2, ... are independent random psd matrices with dimension d, and
❧ λ_max(X_j) ≤ R almost surely.
Define μ_min := λ_min(E Y) and μ_max := λ_max(E Y). Then

    \mathbb{E}\,\lambda_{\min}(Y) \ge 0.6\,\mu_{\min} - R \log d

    \mathbb{E}\,\lambda_{\max}(Y) \le 1.8\,\mu_{\max} + R \log d

    \mathbb{P}\{\lambda_{\min}(Y) \le (1-t)\,\mu_{\min}\} \le d \cdot \Big[ \frac{e^{-t}}{(1-t)^{1-t}} \Big]^{\mu_{\min}/R}

    \mathbb{P}\{\lambda_{\max}(Y) \ge (1+t)\,\mu_{\max}\} \le d \cdot \Big[ \frac{e^{t}}{(1+t)^{1+t}} \Big]^{\mu_{\max}/R}
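A minimal illustration of the expectation bounds, using a toy psd sum not taken from the slides: each summand is X_j = e_K e_K^T for a uniformly random coordinate K, so Y is diagonal with "balls in bins" counts, E Y = (n/d) I, and R = 1. The dimensions and trial count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10, 1000

# X_j = e_K e_K^T, K uniform on {0, ..., d-1}: psd, lambda_max(X_j) = 1
# E Y = (n/d) I, so mu_min = mu_max = n/d
mu = n / d
R = 1.0

trials = 100
lmax, lmin = [], []
for _ in range(trials):
    counts = np.bincount(rng.integers(0, d, size=n), minlength=d)
    # Y = sum_j X_j is diagonal, with the bin counts on its diagonal
    lmax.append(counts.max())
    lmin.append(counts.min())

# The Monte Carlo means respect the Chernoff expectation bounds
assert np.mean(lmax) <= 1.8 * mu + R * np.log(d)
assert np.mean(lmin) >= 0.6 * mu - R * np.log(d)
```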

  11. Example: Random Submatrices

Fixed matrix, in captivity:

    C = [c_1\ c_2\ c_3\ c_4\ \cdots\ c_n]   (d × n, with columns c_k)

Random matrix, formed by picking random columns (the rest are zeroed out):

    Z = [0\ c_2\ c_3\ 0\ \cdots\ c_n]   (d × n)

[Q] What is the typical value of σ_1(Z)? What about σ_d(Z)?

  12. Model for Random Submatrix

❧ Let C be a fixed d × n matrix with columns c_1, ..., c_n
❧ Let δ_1, ..., δ_n be independent 0–1 random variables with mean s/n
❧ Define ∆ = diag(δ_1, ..., δ_n)
❧ Form a random submatrix Z by turning off columns from C:

    Z = C ∆ = [c_1\ c_2\ \cdots\ c_n] \cdot \mathrm{diag}(\delta_1, \delta_2, \dots, \delta_n)   (d × n)

❧ Note that Z typically consists of about s columns from C
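The model above can be sketched directly; the dimensions and example matrix are illustrative assumptions. Multiplying by ∆ zeroes out exactly the columns with δ_k = 0 and leaves the rest untouched:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, s = 4, 12, 6
C = rng.standard_normal((d, n))

delta = (rng.random(n) < s / n).astype(float)  # independent 0-1 RVs, mean s/n
Z = C @ np.diag(delta)                          # turn off columns of C

# Surviving columns of Z are exactly those with delta_k = 1
kept = np.nonzero(delta)[0]
assert np.count_nonzero(np.any(Z != 0, axis=0)) == len(kept)
assert np.allclose(Z[:, kept], C[:, kept])
```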

  13. The Random Submatrix, qua PSD Sum

❧ The largest and smallest singular values of Z satisfy

    \sigma_1(Z)^2 = \lambda_{\max}(ZZ^*)   and   \sigma_d(Z)^2 = \lambda_{\min}(ZZ^*)

❧ Define the psd matrix Y = ZZ^*, and observe that

    Y = ZZ^* = C \Delta^2 C^* = C \Delta C^* = \sum_{k=1}^n \delta_k\, c_k c_k^*

  (using ∆² = ∆ because each δ_k is 0 or 1)
❧ We have expressed Y as a sum of independent psd random matrices
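The algebraic identity Y = ZZ^* = Σ_k δ_k c_k c_k^* is easy to confirm numerically (same illustrative model as above, with assumed dimensions):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, s = 4, 12, 6
C = rng.standard_normal((d, n))
delta = (rng.random(n) < s / n).astype(float)
Z = C @ np.diag(delta)

# Y = Z Z^* equals the psd sum over the surviving rank-one terms
Y = Z @ Z.T
Y_sum = sum(delta[k] * np.outer(C[:, k], C[:, k]) for k in range(n))
assert np.allclose(Y, Y_sum)
```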

  14. Preparing to Apply the Chernoff Bound

❧ Consider the random matrix Y = \sum_k \delta_k c_k c_k^*
❧ The maximal eigenvalue of each summand is bounded as

    R = \max_k \lambda_{\max}(\delta_k c_k c_k^*) \le \max_k \|c_k\|^2

❧ The expectation of the random matrix Y is

    \mathbb{E}(Y) = \frac{s}{n} \sum_{k=1}^n c_k c_k^* = \frac{s}{n}\, CC^*

❧ The mean parameters satisfy

    \mu_{\max} = \lambda_{\max}(\mathbb{E} Y) = \frac{s}{n}\, \sigma_1(C)^2   and   \mu_{\min} = \lambda_{\min}(\mathbb{E} Y) = \frac{s}{n}\, \sigma_d(C)^2
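The parameter computations above can be checked directly: the eigenvalues of E Y = (s/n) CC^* are (s/n) times the squared singular values of C. The example data and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, s = 4, 12, 6
C = rng.standard_normal((d, n))

R = max(np.linalg.norm(C[:, k])**2 for k in range(n))  # bound on each summand
EY = (s / n) * (C @ C.T)                               # E Y = (s/n) CC^*

svals = np.linalg.svd(C, compute_uv=False)             # sorted descending
eigs = np.linalg.eigvalsh(EY)
mu_max, mu_min = eigs.max(), eigs.min()

assert np.isclose(mu_max, (s / n) * svals[0]**2)   # mu_max = (s/n) sigma_1(C)^2
assert np.isclose(mu_min, (s / n) * svals[-1]**2)  # mu_min = (s/n) sigma_d(C)^2
```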

  15. What the Chernoff Bound Says

Applying the Chernoff bound, we reach

    \mathbb{E}\big[\sigma_1(Z)^2\big] = \mathbb{E}\,\lambda_{\max}(Y) \le 1.8 \cdot \frac{s}{n}\, \sigma_1(C)^2 + \max_k \|c_k\|^2 \cdot \log d

    \mathbb{E}\big[\sigma_d(Z)^2\big] = \mathbb{E}\,\lambda_{\min}(Y) \ge 0.6 \cdot \frac{s}{n}\, \sigma_d(C)^2 - \max_k \|c_k\|^2 \cdot \log d

❧ Matrix C has n columns; the random submatrix Z includes about s of them
❧ The singular value σ_i(Z)^2 inherits an s/n share of σ_i(C)^2 for i = 1, d
❧ The additive correction reflects the number d of rows of C and the maximum column norm
❧ [Gittens, T 2011] The remaining singular values have similar behavior

  16. Key Example: Unit-Norm Tight Frame

❧ A d × n unit-norm tight frame C satisfies

    CC^* = \frac{n}{d}\, I   and   \|c_k\|^2 = 1 for k = 1, 2, \dots, n

❧ Specializing the inequalities from the previous slide...

    \mathbb{E}\big[\sigma_1(Z)^2\big] \le 1.8 \cdot \frac{s}{d} + \log d

    \mathbb{E}\big[\sigma_d(Z)^2\big] \ge 0.6 \cdot \frac{s}{d} - \log d

❧ Choose s ≥ 1.67 d log d columns for a nontrivial lower bound
❧ The sharp condition s > d log d also follows from the matrix Chernoff bound
❧ Earlier work: [Rudelson 1999, Rudelson–Vershynin 2007, T 2008]
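One concrete unit-norm tight frame, chosen here as an illustration (the slides do not specify a construction): take d rows of the n × n unitary DFT matrix and rescale so every column has unit norm. Both frame properties can then be verified:

```python
import numpy as np

d, n = 8, 64

# d rows of the unitary DFT matrix, rescaled to unit-norm columns
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # n x n unitary DFT
C = F[:d, :] * np.sqrt(n / d)

# Tightness: CC^* = (n/d) I
assert np.allclose(C @ C.conj().T, (n / d) * np.eye(d))
# Unit-norm columns: ||c_k|| = 1 for every k
assert np.allclose(np.linalg.norm(C, axis=0), np.ones(n))
```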

  17. Matrix Bernstein Inequality

  18. The Matrix Bernstein Inequality

Theorem 4 [Oliveira 2010, T 2010]. Suppose Z = \sum_j W_j, where
❧ W_1, W_2, ... are independent random matrices with dimension d_1 × d_2,
❧ E W_j = 0, and
❧ ‖W_j‖ ≤ R almost surely.
Define d := d_1 + d_2, and introduce the matrix variance

    \sigma^2 := \max\Big\{ \Big\| \sum_j \mathbb{E}(W_j W_j^*) \Big\|,\ \Big\| \sum_j \mathbb{E}(W_j^* W_j) \Big\| \Big\}

Then

    \mathbb{E}\|Z\| \le \sqrt{2 \sigma^2 \log d} + \tfrac{1}{3} R \log d

    \mathbb{P}\{\|Z\| \ge t\} \le d \cdot \exp\Big( \frac{-t^2/2}{\sigma^2 + Rt/3} \Big)
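A small numerical sketch of Theorem 4 with assumed example data: the W_j are iid centered bounded random matrices, σ² is computed by averaging over many draws of a single summand, and a Monte Carlo estimate of E‖Z‖ is compared against the Bernstein bound:

```python
import numpy as np

rng = np.random.default_rng(8)
d1, d2, n = 6, 9, 40
d = d1 + d2

def draw_W():
    # Centered, bounded summand: a random sign times a fixed-norm rank-one matrix
    u = rng.standard_normal(d1); u /= np.linalg.norm(u)
    v = rng.standard_normal(d2); v /= np.linalg.norm(v)
    return rng.choice([-1.0, 1.0]) * np.outer(u, v)   # ||W|| = 1, E W = 0

R = 1.0

# Estimate sigma^2 = max{ ||n E(W W^*)||, ||n E(W^* W)|| } by averaging draws
m = 2000
S1 = np.zeros((d1, d1)); S2 = np.zeros((d2, d2))
for _ in range(m):
    W = draw_W()
    S1 += W @ W.T; S2 += W.T @ W
sigma2 = n * max(np.linalg.norm(S1 / m, 2), np.linalg.norm(S2 / m, 2))

# Monte Carlo estimate of E || sum_j W_j ||
trials = 100
mean_norm = np.mean([
    np.linalg.norm(sum(draw_W() for _ in range(n)), 2) for _ in range(trials)
])

bound = np.sqrt(2 * sigma2 * np.log(d)) + R * np.log(d) / 3
assert mean_norm <= bound
```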

  19. Example: Randomized Matrix Multiplication

Product of two matrices, in captivity:

    BC^* = [b_1\ b_2\ \cdots\ b_n]\, [c_1\ c_2\ \cdots\ c_n]^* = \sum_{k=1}^n b_k c_k^*   (d_1 × d_2)

[Idea] Approximate the multiplication by random sampling.

First reference (?): [Drineas–Mahoney–Kannan 2004]
Some recent work: [Magen–Zouzias 2010], [Magdon-Ismail 2010], [Hsu–Kakade–Zhang 2011]

  20. A Sampling Model for Tutorial Purposes

❧ Assume ‖b_k‖_2 = 1 and ‖c_k‖_2 = 1 for k = 1, 2, ..., n
❧ Construct a random variable W whose value is a d_1 × d_2 matrix:
    ❧ Draw K ~ uniform{1, 2, ..., n}
    ❧ Set W = n · b_K c_K^*
❧ The random matrix W is an unbiased estimator of the product BC^*:

    \mathbb{E} W = \sum_{k=1}^n (n \cdot b_k c_k^*) \cdot \mathbb{P}\{K = k\} = \sum_{k=1}^n b_k c_k^* = BC^*

❧ Approximate BC^* by averaging s independent copies of W:

    Z = \frac{1}{s} \sum_{j=1}^s W_j \approx BC^*
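The unbiasedness calculation above can be replayed exactly in code (summing over all k with probability 1/n), and averaging many copies of W does approach BC^*. The matrices B and C below are illustrative assumptions with unit-norm columns, as the model requires:

```python
import numpy as np

rng = np.random.default_rng(7)
d1, d2, n = 3, 4, 10

# Example factors with unit-norm columns (the model's assumption)
B = rng.standard_normal((d1, n)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((d2, n)); C /= np.linalg.norm(C, axis=0)

# E W = sum_k (n b_k c_k^*) * P{K = k}, with P{K = k} = 1/n
EW = sum((n * np.outer(B[:, k], C[:, k])) * (1 / n) for k in range(n))
assert np.allclose(EW, B @ C.T)   # the estimator is unbiased

# Averaging s iid copies gives Z ≈ BC^*
s = 20000
K = rng.integers(0, n, size=s)
Z = sum(n * np.outer(B[:, k], C[:, k]) for k in K) / s
err = np.linalg.norm(Z - B @ C.T, 2)
assert err < 1.0   # generous sanity threshold; the typical error is much smaller
```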
