Column Subset Selection
Joel A. Tropp, Applied & Computational Mathematics, California Institute of Technology
(Presented at MMDS, Stanford, June 2008)


  1. Column Subset Selection ❦

  Joel A. Tropp
  Applied & Computational Mathematics
  California Institute of Technology
  jtropp@acm.caltech.edu

  Thanks to B. Recht (Caltech, IST)
  Research supported in part by NSF, DARPA, and ONR

  2. Column Subset Selection

  [Figure: a matrix A, a set τ of column indices, and the submatrix A_τ formed by those columns]

  3. Spectral Norm Reduction

  Theorem 1. [Kashin–Tzafriri] Suppose the n columns of A have unit ℓ₂ norm. There is a set τ of column indices for which

      |τ| ≥ n / ‖A‖²   and   ‖A_τ‖ ≤ C.

  Examples:
  ❧ A has identical columns. Then |τ| ≥ 1.
  ❧ A has orthonormal columns. Then |τ| ≥ n.

  4. Spectral Norm Reduction

  Theorem 1. [Kashin–Tzafriri] Suppose the n columns of A have unit ℓ₂ norm. There is a set τ of column indices for which

      |τ| ≥ n / ‖A‖²   and   ‖A_τ‖ ≤ C.

  Theorem 2. [T 2007] There is a randomized, polynomial-time algorithm that produces the set τ.

  Overview:
  ❧ Randomly select columns
  ❧ Remove redundant columns

  5. Random Column Selection: Intuitions

  ❧ Random column selection reduces norms
  ❧ A random submatrix gets "its share" of the total norm
  ❧ Submatrices with small norm are ubiquitous
  ❧ Random selection is a form of regularization
  ❧ Added benefit: dimension reduction

  6. Example: What Can Go Wrong

  Take A block-diagonal with identical pairs of unit-norm columns, e.g.

      A = (1/√2) · ⎡ 1 1 . . . . ⎤
                   ⎢ 1 1 . . . . ⎥
                   ⎢ . . 1 1 . . ⎥
                   ⎢ . . 1 1 . . ⎥
                   ⎢ . . . . 1 1 ⎥
                   ⎣ . . . . 1 1 ⎦

  If the selected subset τ retains both copies of a column, say A_τ = (first three columns), then

      ‖A‖ = ‖A_τ‖ = √2   ⟹   No reduction!
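
A quick numerical check of this example (a sketch, assuming the block matrix reconstructed above; the 1/√2 scaling makes the columns unit-norm):

```python
import numpy as np

block = np.ones((2, 2)) / np.sqrt(2)       # two identical unit-norm columns
A = np.kron(np.eye(3), block)              # 6 x 6 block-diagonal example
tau = [0, 1, 2]                            # keeps both copies of the first column
print(np.linalg.norm(A, 2))                # sqrt(2)
print(np.linalg.norm(A[:, tau], 2))        # also sqrt(2): no reduction
```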

  7. The (∞, 2) Operator Norm

  Definition 3. The (∞, 2) operator norm of a matrix B is

      ‖B‖_{∞,2} = max { ‖Bx‖₂ : ‖x‖_∞ = 1 }.

  Proposition 4. If B has s columns, then the best general bound is

      ‖B‖_{∞,2} ≤ √s · ‖B‖.
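
Not part of the talk, but as a concrete reference point: for a small matrix the (∞, 2) norm can be computed by brute force, since the convex function x ↦ ‖Bx‖₂ is maximized over the cube at a sign vector. A minimal numpy sketch:

```python
import itertools
import numpy as np

def inf_2_norm(B):
    """||B||_{inf,2} = max{ ||B x||_2 : ||x||_inf = 1 }, by enumerating sign vectors.

    Feasible only for a small number of columns s (2^s candidates)."""
    s = B.shape[1]
    return max(np.linalg.norm(B @ np.array(x))
               for x in itertools.product((-1.0, 1.0), repeat=s))

# Sanity check of Proposition 4 on a random matrix:
B = np.random.default_rng(0).standard_normal((6, 4))
assert inf_2_norm(B) <= np.sqrt(B.shape[1]) * np.linalg.norm(B, 2) + 1e-9
```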

  8. Random Reduction of the (∞, 2) Norm

  Lemma 5. Suppose the n columns of A have unit ℓ₂ norm. Draw a uniformly random subset σ of columns whose cardinality is

      |σ| = 2n / ‖A‖².

  Then

      E ‖A_σ‖_{∞,2} ≤ C √|σ|.

  ❧ Problem: How can we use this information?
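
A minimal sketch of the random selection step in Lemma 5 (assuming the columns of A are already normalized; the rounding of 2n/‖A‖² to an integer, floored at 1, is an illustrative choice):

```python
import numpy as np

def random_column_subset(A, rng=None):
    """Draw a uniformly random subset sigma of columns with |sigma| ~ 2n / ||A||^2."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    size = max(1, int(2 * n / np.linalg.norm(A, 2) ** 2))
    return rng.choice(n, size=size, replace=False)
```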

  9. Pietsch Factorization

  Theorem 6. [Pietsch, Grothendieck] Every matrix B can be factorized as B = T D where

  ❧ D is diagonal and nonnegative with trace(D²) = 1, and
  ❧ ‖B‖_{∞,2} ≤ ‖T‖ ≤ √(π/2) · ‖B‖_{∞,2}.

  10. Pietsch and Norm Reduction

  Lemma 7. Suppose B has s columns. There is a set τ of column indices for which

      |τ| ≥ s/2   and   ‖B_τ‖ ≤ √π · ‖B‖_{∞,2} / √s.

  Proof. Consider a Pietsch factorization B = T D. Select

      τ = { j : d²_jj ≤ 2/s }.

  Since Σ_j d²_jj = 1, Markov's inequality implies |τ| ≥ s/2. Calculate

      ‖B_τ‖ = ‖T D_τ‖ ≤ ‖T‖ · ‖D_τ‖ ≤ √(π/2) · ‖B‖_{∞,2} · √(2/s).
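
A sketch of the selection rule from the proof, assuming the diagonal of a Pietsch factor D is available as a vector d with Σ_j d_j² = 1:

```python
import numpy as np

def prune_by_pietsch_weights(d):
    """Keep the columns whose Pietsch weight d_j^2 is at most 2/s.

    By Markov's inequality at least s/2 columns survive, and Lemma 7
    bounds the spectral norm of the surviving submatrix."""
    s = len(d)
    return np.flatnonzero(np.asarray(d) ** 2 <= 2.0 / s)
```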

  11. Proof of Kashin–Tzafriri

  ❧ Suppose the n columns of A have unit ℓ₂ norm.
  ❧ Lemma 5 provides a (random) σ for which

      |σ| = 2n / ‖A‖²   and   ‖A_σ‖_{∞,2} ≤ C √|σ|.

  ❧ Lemma 7 applied to B = A_σ yields a subset τ ⊂ σ for which

      |τ| ≥ |σ|/2   and   ‖B_τ‖ ≤ √π · ‖B‖_{∞,2} / √|σ|.

  ❧ Simplify:

      |τ| ≥ n / ‖A‖²   and   ‖A_τ‖ ≤ C √π.

  ❧ Note: This is almost an algorithm.
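
Putting the two steps together gives the "almost an algorithm" of this slide. A sketch, assuming a routine `pietsch_diagonal(B)` that returns the diagonal d of a Pietsch factor of B (for instance via the convex program on the following slides); the routine name and the rounding of the subset size are illustrative only:

```python
import numpy as np

def kashin_tzafriri_subset(A, pietsch_diagonal, rng=None):
    """Random selection (Lemma 5) followed by Pietsch-based pruning (Lemma 7)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    sigma = rng.choice(n, size=max(1, int(2 * n / np.linalg.norm(A, 2) ** 2)),
                       replace=False)
    d = pietsch_diagonal(A[:, sigma])          # weights with sum(d**2) == 1
    return sigma[np.asarray(d) ** 2 <= 2.0 / len(sigma)]
```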

  12. Pietsch and Eigenvalues

  ❧ Consider a matrix B with Pietsch factorization B = T D.
  ❧ Suppose ‖T‖ ≤ α.
  ❧ Calculate:

      B = T D   ⟹   ‖Bx‖₂² = ‖T D x‖₂² ≤ α² ‖Dx‖₂²          for all x
                ⟹   x∗(B∗B)x ≤ α² · x∗ D² x                  for all x
                ⟹   x∗(B∗B − α² D²) x ≤ 0                    for all x
                ⟹   λ_max(B∗B − α² D²) ≤ 0.
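
A small numerical illustration of the final implication (a sketch with a random T and D, not from the slides): if B = T D and α = ‖T‖, then λ_max(B∗B − α²D²) is nonpositive up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 4))
d = rng.random(4)
d /= np.linalg.norm(d)                         # so that trace(D^2) = 1
D = np.diag(d)
B = T @ D
alpha = np.linalg.norm(T, 2)                   # spectral norm of T
print(np.linalg.eigvalsh(B.T @ B - alpha**2 * D**2).max())   # <= 0 up to rounding
```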

  13. Pietsch is Convex

  ❧ Key new idea: Pietsch factorizations can be found by convex programming:

      min λ_max(B∗B − α² F)   subject to   F diagonal, F ≥ 0, trace(F) = 1.

  ❧ If the value at F⋆ is nonpositive, then we have a factorization

      B = (B F⋆^{−1/2}) · F⋆^{1/2}   with   ‖B F⋆^{−1/2}‖ ≤ α.

  ❧ The proof of Kashin–Tzafriri offers a target value for α.
  ❧ One can also perform a binary search to approximate the minimal value of α.
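
One way to solve this eigenvalue program is with an off-the-shelf convex solver. A sketch using cvxpy (an assumption of this note, not a tool mentioned in the talk; it needs an SDP-capable solver such as the bundled SCS, and real B is assumed):

```python
import cvxpy as cp
import numpy as np

def pietsch_weights_via_cvxpy(B, alpha):
    """min lambda_max(B*B - alpha^2 diag(f)) over the probability simplex.

    If the optimal value is nonpositive, B = (B F^{-1/2}) F^{1/2} with
    F = diag(f) is a Pietsch factorization and ||B F^{-1/2}|| <= alpha."""
    G = B.T @ B
    s = G.shape[0]
    f = cp.Variable(s, nonneg=True)
    problem = cp.Problem(cp.Minimize(cp.lambda_max(G - alpha**2 * cp.diag(f))),
                         [cp.sum(f) == 1])
    problem.solve()
    return f.value, problem.value
```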

  14. An Optimization over the Simplex

  ❧ Express F = diag(f).
  ❧ The constraints delineate the probability simplex:

      ∆ = { f : Σ_j f_j = 1 and f ≥ 0 }.

  ❧ Objective function and its subdifferential (here |u|² denotes the entrywise squares of u):

      J(f) = λ_max(B∗B − α² diag(f))
      ∂J(f) = conv { −α² |u|² : u a top eigenvector of B∗B − α² diag(f) with ‖u‖₂ = 1 }

  ❧ Obtain:

      min J(f)   subject to   f ∈ ∆.
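
A minimal numpy sketch of evaluating J and one subgradient from the formula above (G stands for B∗B):

```python
import numpy as np

def objective_and_subgradient(G, f, alpha):
    """Return J(f) = lambda_max(G - alpha^2 diag(f)) and the subgradient -alpha^2 |u|^2,
    where u is a unit-norm eigenvector of the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(G - alpha**2 * np.diag(f))
    u = eigvecs[:, -1]                           # eigenvector of the largest eigenvalue
    return eigvals[-1], -alpha**2 * np.abs(u) ** 2
```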

  15. Entropic Mirror Descent

  1. Initialize f^(1) ← s^{−1} e and k ← 1.
  2. Compute a subgradient: θ ∈ ∂J(f^(k)).
  3. Determine the step size:

      β_k ← √( 2 log s / (k ‖θ‖_∞²) ).

  4. Update the variable:

      f^(k+1) ← (f^(k) ∘ exp{−β_k θ}) / ⟨e, f^(k) ∘ exp{−β_k θ}⟩.

  5. Increment k ← k + 1, and return to step 2.

  References: [Eggermont 1991, Beck–Teboulle 2003]
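
A self-contained sketch of the iteration (the iteration count is an arbitrary choice, and the subgradient is recomputed inline as in the sketch after the previous slide; G stands for B∗B):

```python
import numpy as np

def entropic_mirror_descent(G, alpha, num_iters=500):
    """min_{f in simplex} lambda_max(G - alpha^2 diag(f)) by entropic mirror descent."""
    s = G.shape[0]
    f = np.full(s, 1.0 / s)                      # f^(1) = e / s
    best_f, best_val = f.copy(), np.inf
    for k in range(1, num_iters + 1):
        eigvals, eigvecs = np.linalg.eigh(G - alpha**2 * np.diag(f))
        if eigvals[-1] < best_val:               # track the best iterate seen
            best_f, best_val = f.copy(), eigvals[-1]
        theta = -alpha**2 * np.abs(eigvecs[:, -1]) ** 2   # subgradient of J at f
        beta = np.sqrt(2.0 * np.log(s) / k) / np.max(np.abs(theta))
        f = f * np.exp(-beta * theta)            # multiplicative update
        f /= f.sum()                             # renormalize onto the simplex
    return best_f, best_val
```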

  16. Other Formulations

  ❧ Modified primal that simultaneously identifies α:

      min λ_max(B∗B − α² F) + α²   subject to   F diagonal, F ≥ 0, trace(F) = 1, α ≥ 0.

  ❧ The dual problem is the famous maxcut SDP:

      max ⟨B∗B, Z⟩   subject to   diag(Z) = e, Z ⪰ 0.
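
A sketch of this dual SDP in cvxpy (again an assumed tool, not from the talk; it needs an SDP-capable solver, and real B is assumed):

```python
import cvxpy as cp
import numpy as np

def maxcut_style_dual(B):
    """max <B*B, Z>  subject to  diag(Z) = e, Z positive semidefinite."""
    G = B.T @ B
    n = G.shape[0]
    Z = cp.Variable((n, n), PSD=True)
    problem = cp.Problem(cp.Maximize(cp.trace(G @ Z)), [cp.diag(Z) == 1])
    problem.solve()
    return Z.value, problem.value
```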

  17. Related Results

  Theorem 8. [Bourgain–Tzafriri 1991] Suppose the n columns of A have unit ℓ₂ norm. There is a set τ of column indices for which

      |τ| ≥ c n / ‖A‖²   and   κ(A_τ) ≤ √3.

  Examples:
  ❧ A has identical columns. Then |τ| ≥ 1.
  ❧ A has orthonormal columns. Then |τ| ≥ c n.

  18. Related Results

  Theorem 8. [Bourgain–Tzafriri 1991] Suppose the n columns of A have unit ℓ₂ norm. There is a set τ of column indices for which

      |τ| ≥ c n / ‖A‖²   and   κ(A_τ) ≤ √3.

  Theorem 9. [T 2007] There is a randomized, polynomial-time algorithm that produces the set τ.

  19. To learn more...

  E-mail: jtropp@acm.caltech.edu
  Web: http://www.acm.caltech.edu/~jtropp

  Papers in Preparation:
  ❧ T, "Column subset selection, matrix factorization, and eigenvalue optimization"
  ❧ T, "Paved with good intentions: Computational applications of matrix column partitions"
  ❧ ...
