
Column Subset Selection

Joel A. Tropp
Applied & Computational Mathematics
California Institute of Technology
jtropp@acm.caltech.edu

Thanks to B. Recht (Caltech, IST)
Research supported in part by NSF, DARPA, and ONR

Column Subset Selection

[Figure: a matrix $A$, a set $\tau$ of column indices, and the submatrix $A_\tau$ formed by the columns of $A$ indexed by $\tau$.]

Column Subset Selection, MMDS, Stanford, June 2008

Spectral Norm Reduction

Theorem 1. [Kashin–Tzafriri] Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. There is a set $\tau$ of column indices for which

$$|\tau| \geq \frac{n}{\|A\|^2} \quad\text{and}\quad \|A_\tau\| \leq C.$$

Examples:
❧ $A$ has identical columns. Then $|\tau| \geq 1$.
❧ $A$ has orthonormal columns. Then $|\tau| \geq n$.
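The two examples can be checked numerically. The sketch below only evaluates the size bound $n / \|A\|^2$ from Theorem 1; it does not construct $\tau$. The helper name `subset_size_bound` and the choice $n = 8$ are illustrative assumptions, not part of the talk.

```python
import numpy as np

def subset_size_bound(A):
    """Return n / ||A||^2, the lower bound on |tau| in Theorem 1 (hypothetical helper)."""
    n = A.shape[1]
    spectral_norm = np.linalg.norm(A, 2)  # largest singular value
    return n / spectral_norm**2

n = 8
# n copies of the same unit-norm column: ||A||^2 = n, so the bound is 1.
identical = np.tile(np.ones((n, 1)) / np.sqrt(n), (1, n))
# Orthonormal columns: ||A|| = 1, so the bound is n.
orthonormal = np.eye(n)

bound_identical = subset_size_bound(identical)
bound_orthonormal = subset_size_bound(orthonormal)
```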

Theorem 2. [T 2007] There is a randomized, polynomial-time algorithm that produces the set $\tau$ of Theorem 1.

Overview:
❧ Randomly select columns
❧ Remove redundant columns

Random Column Selection: Intuitions

❧ Random column selection reduces norms
❧ A random submatrix gets “its share” of the total norm
❧ Submatrices with small norm are ubiquitous
❧ Random selection is a form of regularization
❧ Added benefit: Dimension reduction

Example: What Can Go Wrong

[Matrix display not recoverable from the extraction: a 0–1 matrix $A$ and a column submatrix $A_\tau$.]

$$\|A\| = \|A_\tau\| = \sqrt{2} \quad\Longrightarrow\quad \text{No reduction!}$$

The $(\infty, 2)$ Operator Norm

Definition 3. The $(\infty, 2)$ operator norm of a matrix $B$ is

$$\|B\|_{\infty,2} = \max\{\|Bx\|_2 : \|x\|_\infty = 1\}.$$

Proposition 4. If $B$ has $s$ columns, then the best general bound is

$$\|B\|_{\infty,2} \leq \sqrt{s}\,\|B\|.$$
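For a small real matrix, Definition 3 can be evaluated by brute force: $x \mapsto \|Bx\|_2$ is convex, so its maximum over the cube is attained at a sign vector. The sketch below assumes a real $B$ with few columns (the loop visits $2^s$ vectors); the function name is hypothetical.

```python
import itertools
import numpy as np

def inf2_norm_bruteforce(B):
    """Max of ||Bx||_2 over x in {-1, +1}^s. Real B and small s only."""
    s = B.shape[1]
    best = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=s):
        best = max(best, np.linalg.norm(B @ np.array(signs)))
    return best

# Orthonormal columns attain the bound of Proposition 4: sqrt(s) * ||B||.
identity_norm = inf2_norm_bruteforce(np.eye(4))

# A random matrix stays below the bound.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 4))
random_norm = inf2_norm_bruteforce(B)
random_bound = np.sqrt(B.shape[1]) * np.linalg.norm(B, 2)
```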

Random Reduction of $(\infty, 2)$ Norm

Lemma 5. Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. Draw a uniformly random subset $\sigma$ of columns whose cardinality is

$$|\sigma| = \frac{2n}{\|A\|^2}.$$

Then

$$\mathbb{E}\,\|A_\sigma\|_{\infty,2} \leq C\sqrt{|\sigma|}.$$

❧ Problem: How can we use this information?
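The sampling step in Lemma 5 might be sketched as follows. Rounding $2n / \|A\|^2$ up to an integer and clipping it to the range $[1, n]$ are assumptions made here so that the code runs; the slide does not say how a non-integer cardinality is handled. The function name and the test matrix are illustrative.

```python
import numpy as np

def draw_random_columns(A, rng):
    """Draw a uniformly random column subset of size about 2n / ||A||^2."""
    n = A.shape[1]
    size = int(np.ceil(2 * n / np.linalg.norm(A, 2) ** 2))  # rounding is an assumption
    size = min(max(size, 1), n)                             # clipping is an assumption
    sigma = np.sort(rng.choice(n, size=size, replace=False))
    return sigma, A[:, sigma]

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
A /= np.linalg.norm(A, axis=0)  # give every column unit l2 norm
sigma, A_sigma = draw_random_columns(A, rng)
```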

Pietsch Factorization

Theorem 6. [Pietsch, Grothendieck] Every matrix $B$ can be factorized as $B = TD$ where
❧ $D$ is diagonal and nonnegative with $\operatorname{trace}(D^2) = 1$, and
❧ $\|B\|_{\infty,2} \leq \|T\| \leq \sqrt{\pi/2}\,\|B\|_{\infty,2}$

Pietsch and Norm Reduction

Lemma 7. Suppose $B$ has $s$ columns. There is a set $\tau$ of column indices for which

$$|\tau| \geq \frac{s}{2} \quad\text{and}\quad \|B_\tau\| \leq \sqrt{\pi}\cdot\frac{1}{\sqrt{s}}\,\|B\|_{\infty,2}.$$

Proof. Consider a Pietsch factorization $B = TD$. Select

$$\tau = \{\, j : d_{jj}^2 \leq 2/s \,\}.$$

Since $\sum_j d_{jj}^2 = 1$, Markov’s inequality implies $|\tau| \geq s/2$. Calculate

$$\|B_\tau\| = \|T D_\tau\| \leq \|T\| \cdot \|D_\tau\| \leq \sqrt{\pi/2}\,\|B\|_{\infty,2} \cdot \sqrt{2/s}.$$
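The selection rule in the proof of Lemma 7 is short enough to sketch directly. In the snippet below, `T` and `d` are random stand-ins rather than a genuine Pietsch factorization, so it only illustrates the counting step and the inequality $\|B_\tau\| \leq \|T\| \sqrt{2/s}$; the function name is hypothetical.

```python
import numpy as np

def select_small_diagonal(d):
    """Return tau = {j : d_j^2 <= 2/s} for the diagonal d of D (0-based indices)."""
    d = np.asarray(d, dtype=float)
    s = d.size
    return np.flatnonzero(d**2 <= 2.0 / s)

rng = np.random.default_rng(0)
s = 10
T = rng.standard_normal((6, s))  # stand-in for the factor T (assumption)
d = rng.random(s)
d /= np.linalg.norm(d)           # enforce trace(D^2) = 1
B = T * d                        # B = T D: column j of T scaled by d_j

tau = select_small_diagonal(d)
norm_B_tau = np.linalg.norm(B[:, tau], 2)
bound = np.linalg.norm(T, 2) * np.sqrt(2.0 / s)
```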

Proof of Kashin–Tzafriri

❧ Suppose the $n$ columns of $A$ have unit $\ell_2$ norm
❧ Lemma 5 provides a (random) $\sigma$ for which
$$|\sigma| = \frac{2n}{\|A\|^2} \quad\text{and}\quad \|A_\sigma\|_{\infty,2} \leq C\sqrt{|\sigma|}$$
❧ Lemma 7 applied to $B = A_\sigma$ yields a subset $\tau \subset \sigma$ for which
$$|\tau| \geq \frac{|\sigma|}{2} \quad\text{and}\quad \|B_\tau\| \leq \sqrt{\pi}\cdot\frac{1}{\sqrt{|\sigma|}}\cdot\|B\|_{\infty,2}$$
❧ Simplify
$$|\tau| \geq \frac{n}{\|A\|^2} \quad\text{and}\quad \|A_\tau\| \leq C\sqrt{\pi}$$
❧ Note: This is almost an algorithm

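Pietsch and Eigenvalues

❧ Consider a matrix $B$ with Pietsch factorization $B = TD$
❧ Suppose $\|T\| \leq \alpha$
❧ Calculate

$$
\begin{aligned}
B = TD \quad &\Longrightarrow\quad \|Bx\|_2^2 = \|TDx\|_2^2 && \forall x \\
&\Longrightarrow\quad \|Bx\|_2^2 \leq \alpha^2 \|Dx\|_2^2 && \forall x \\
&\Longrightarrow\quad x^*(B^*B)\,x \leq \alpha^2 \cdot x^* D^2 x && \forall x \\
&\Longrightarrow\quad x^*\left(B^*B - \alpha^2 D^2\right)x \leq 0 && \forall x \\
&\Longrightarrow\quad \lambda_{\max}\left(B^*B - \alpha^2 D^2\right) \leq 0
\end{aligned}
$$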
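The chain of implications can be checked numerically on a small real example. The factors `T` and `d` below are random stand-ins chosen for illustration, with $\alpha$ set to $\|T\|$ so that the hypothesis $\|T\| \leq \alpha$ holds with equality.

```python
import numpy as np

rng = np.random.default_rng(1)
s = 6
T = rng.standard_normal((4, s))  # stand-in factor (assumption)
d = rng.random(s) + 0.1
d /= np.linalg.norm(d)           # trace(D^2) = 1
B = T * d                        # B = T D

alpha = np.linalg.norm(T, 2)     # smallest alpha with ||T|| <= alpha
M = B.T @ B - alpha**2 * np.diag(d**2)
lam_max = np.linalg.eigvalsh(M)[-1]  # eigvalsh returns ascending eigenvalues
```

Up to rounding error, `lam_max` should be nonpositive, in line with the last line of the derivation.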

Pietsch is Convex

❧ Key new idea: Can find Pietsch factorizations by convex programming

$$\min\ \lambda_{\max}\left(B^*B - \alpha^2 F\right) \quad\text{subject to}\quad F \text{ diagonal},\ F \geq 0,\ \operatorname{trace}(F) = 1$$

❧ If the value at $F_\star$ is nonpositive, then we have a factorization

$$B = \left(B F_\star^{-1/2}\right) \cdot F_\star^{1/2} \quad\text{with}\quad \left\|B F_\star^{-1/2}\right\| \leq \alpha$$

❧ Proof of Kashin–Tzafriri offers target value for $\alpha$
❧ Can also perform binary search to approximate minimal value of $\alpha$
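Recovering the factors from a feasible $F$ is a one-line computation. The sketch below assumes a real $B$ and strictly positive diagonal entries, so that $F^{-1/2}$ exists; the function name and the diagonal test matrix are hypothetical, and the weights `f` were chosen by hand for this example rather than produced by a solver.

```python
import numpy as np

def factorization_from_weights(B, f):
    """Return (T, D) with T = B F^{-1/2} and D = F^{1/2}, where F = diag(f).

    Assumes every entry of f is strictly positive.
    """
    root = np.sqrt(np.asarray(f, dtype=float))
    T = B / root       # divides column j of B by sqrt(f_j)
    D = np.diag(root)
    return T, D

B = np.diag(np.sqrt([1.0, 2.0, 3.0]))
alpha = np.sqrt(6.0)
f = np.array([1.0, 2.0, 3.0]) / 6.0  # hand-picked point of the simplex

T, D = factorization_from_weights(B, f)
value = np.linalg.eigvalsh(B.T @ B - alpha**2 * np.diag(f))[-1]
```

For this choice the objective value is zero up to rounding, and $\|T\| = \sqrt{6} = \alpha$.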

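An Optimization over the Simplex

❧ Express $F = \operatorname{diag}(f)$
❧ Constraints delineate the probability simplex (here $\operatorname{trace}(f)$ denotes the sum of the entries of the vector $f$):

$$\Delta = \{\, f : \operatorname{trace}(f) = 1 \text{ and } f \geq 0 \,\}$$

❧ Objective function and its subdifferential, where $|u|^2$ is taken entrywise:

$$J(f) = \lambda_{\max}\left(B^*B - \alpha^2 \operatorname{diag}(f)\right)$$

$$\partial J(f) = \operatorname{conv}\left\{ -\alpha^2 |u|^2 : u \text{ top eigenvector of } B^*B - \alpha^2 \operatorname{diag}(f),\ \|u\|_2 = 1 \right\}$$

❧ Obtain

$$\min\ J(f) \quad\text{subject to}\quad f \in \Delta$$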
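The objective and one element of its subdifferential can be computed from a single symmetric eigendecomposition. This is a minimal sketch using `numpy.linalg.eigh`; the function name and the test data are assumptions made for illustration.

```python
import numpy as np

def objective_and_subgradient(B, alpha, f):
    """Return J(f) and one subgradient, -alpha^2 |u|^2, for a unit top eigenvector u."""
    M = B.conj().T @ B - alpha**2 * np.diag(f)
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending eigenvalues, unit eigenvectors
    u = eigvecs[:, -1]
    return eigvals[-1], -alpha**2 * np.abs(u) ** 2

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 4))
alpha = 2.0
f = np.full(4, 0.25)  # barycenter of the simplex
J_f, theta = objective_and_subgradient(B, alpha, f)

# Subgradient inequality J(g) >= J(f) + <theta, g - f> at random simplex points.
gaps = []
for g in rng.dirichlet(np.ones(4), size=20):
    J_g, _ = objective_and_subgradient(B, alpha, g)
    gaps.append(J_g - (J_f + theta @ (g - f)))
```

Every entry of `gaps` should be nonnegative up to rounding, which is the defining property of a subgradient of a convex function.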

Entropic Mirror Descent

1. Initialize $f^{(1)} \leftarrow s^{-1} e$ and $k \leftarrow 1$
2. Compute a subgradient: $\theta \in \partial J(f^{(k)})$
3. Determine step size:
$$\beta_k \leftarrow \sqrt{\frac{2 \log s}{k\,\|\theta\|_\infty^2}}$$
4. Update variable:
$$f^{(k+1)} \leftarrow \frac{f^{(k)} \circ \exp\{-\beta_k \theta\}}{\operatorname{trace}\left(f^{(k)} \circ \exp\{-\beta_k \theta\}\right)}$$
5. Increment $k \leftarrow k + 1$, and return to 2.

References: [Eggermont 1991, Beck–Teboulle 2003]
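The five steps translate almost line by line into NumPy. The sketch below makes two assumptions that the slide does not state: it stops after a fixed number of iterations, and it returns the best iterate seen, since subgradient steps need not decrease the objective monotonically. The function name and the diagonal test problem are illustrative.

```python
import numpy as np

def entropic_mirror_descent(B, alpha, num_iters=200):
    """Minimize J(f) = lambda_max(B*B - alpha^2 diag(f)) over the simplex."""
    s = B.shape[1]
    gram = B.conj().T @ B
    f = np.full(s, 1.0 / s)                      # step 1: f = e / s
    best_f, best_val = f.copy(), np.inf
    for k in range(1, num_iters + 1):
        eigvals, eigvecs = np.linalg.eigh(gram - alpha**2 * np.diag(f))
        if eigvals[-1] < best_val:               # track best iterate (assumption)
            best_f, best_val = f.copy(), eigvals[-1]
        theta = -alpha**2 * np.abs(eigvecs[:, -1]) ** 2           # step 2
        beta = np.sqrt(2.0 * np.log(s) / (k * np.max(np.abs(theta)) ** 2))  # step 3
        w = f * np.exp(-beta * theta)            # step 4: multiplicative update
        f = w / w.sum()                          # renormalize onto the simplex
    return best_f, best_val

# Diagonal example: B*B = diag(1, 2, 3) and alpha^2 = 6.
B = np.diag(np.sqrt([1.0, 2.0, 3.0]))
f_best, J_best = entropic_mirror_descent(B, alpha=np.sqrt(6.0))
```

For this diagonal problem the minimum value of $J$ is $0$, attained at $f = (1/6, 1/3, 1/2)$, so `J_best` should fall from its starting value of $1$ toward zero.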

Other Formulations

❧ Modified primal to simultaneously identify $\alpha$

$$\min\ \lambda_{\max}\left(B^*B - \alpha^2 F\right) + \alpha^2 \quad\text{subject to}\quad F \text{ diagonal},\ F \geq 0,\ \operatorname{trace}(F) = 1,\ \alpha \geq 0$$

❧ Dual problem is the famous maxcut SDP:

$$\max\ \langle B^*B, Z \rangle \quad\text{subject to}\quad \operatorname{diag}(Z) = e,\ Z \succeq 0$$

Related Results

Theorem 8. [Bourgain–Tzafriri 1991] Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. There is a set $\tau$ of column indices for which

$$|\tau| \geq \frac{c\,n}{\|A\|^2} \quad\text{and}\quad \kappa(A_\tau) \leq \sqrt{3}.$$

Examples:
❧ $A$ has identical columns. Then $|\tau| \geq 1$.
❧ $A$ has orthonormal columns. Then $|\tau| \geq c\,n$.

Theorem 9. [T 2007] There is a randomized, polynomial-time algorithm that produces the set $\tau$ of Theorem 8.

To learn more...

E-mail:
❧ jtropp@acm.caltech.edu

Web: http://www.acm.caltech.edu/~jtropp

Papers in Preparation:
❧ T, “Column subset selection, matrix factorization, and eigenvalue optimization”
❧ T, “Paved with good intentions: Computational applications of matrix column partitions”
❧ . . .
