SLIDE 1

Column Subset Selection

Joel A. Tropp

Applied & Computational Mathematics
California Institute of Technology
jtropp@acm.caltech.edu

Thanks to B. Recht (Caltech, IST)

Research supported in part by NSF, DARPA, and ONR

SLIDE 2

Column Subset Selection

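[Figure: a matrix $A$, an index set $\tau = \{\ldots\}$, and the submatrix $A_\tau$ formed from the columns of $A$ listed in $\tau$.]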

Column Subset Selection, MMDS, Stanford, June 2008

SLIDE 3

Spectral Norm Reduction

Theorem 1. [Kashin–Tzafriri] Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. There is a set $\tau$ of column indices for which $|\tau| \ge n/\|A\|^2$ and $\|A_\tau\| \le C$.

Examples:

❧ $A$ has $n$ identical columns, so $\|A\|^2 = n$. Then $|\tau| \ge 1$.
❧ $A$ has orthonormal columns, so $\|A\| = 1$. Then $|\tau| \ge n$.
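The guaranteed size $n/\|A\|^2$ in Theorem 1 is easy to evaluate numerically. The following minimal sketch is an illustration and is not taken from the talk. It assumes $A$ is a small real matrix stored as a plain list of its columns. It estimates $\|A\|^2 = \lambda_{\max}(A^*A)$ by power iteration and reproduces the two examples. The helper names `gram`, `spectral_norm_sq`, and `kt_size_bound` are hypothetical.

```python
import math
import random


def gram(cols):
    """Gram matrix A*A for a real matrix A stored as a list of its columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def spectral_norm_sq(cols, iters=200, seed=0):
    """Estimate ||A||^2 = lambda_max(A*A) by power iteration on the PSD Gram matrix."""
    G = gram(cols)
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in G]  # random start: unlikely to miss the top eigenvector
    for _ in range(iters):
        v = [sum(g * x for g, x in zip(row, u)) for row in G]
        nrm = math.sqrt(sum(x * x for x in v))
        if nrm == 0.0:  # A is the zero matrix
            return 0.0
        u = [x / nrm for x in v]
    Gu = [sum(g * x for g, x in zip(row, u)) for row in G]
    return sum(x * y for x, y in zip(u, Gu))  # Rayleigh quotient u*Gu


def kt_size_bound(cols):
    """Subset size n / ||A||^2 guaranteed by Theorem 1 (columns assumed unit norm)."""
    return len(cols) / spectral_norm_sq(cols)


if __name__ == "__main__":
    n = 4
    identical = [[1.0, 0.0] for _ in range(n)]  # n copies of e_1
    orthonormal = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    print(kt_size_bound(identical))    # about 1
    print(kt_size_bound(orthonormal))  # about n = 4
```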

SLIDE 4

Spectral Norm Reduction

Theorem 1. [Kashin–Tzafriri] Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. There is a set $\tau$ of column indices for which $|\tau| \ge n/\|A\|^2$ and $\|A_\tau\| \le C$.

Theorem 2. [T 2007] There is a randomized, polynomial-time algorithm that produces the set $\tau$.

Overview:

❧ Randomly select columns
❧ Remove redundant columns

SLIDE 5

Random Column Selection: Intuitions

❧ Random column selection reduces norms
❧ A random submatrix gets “its share” of the total norm
❧ Submatrices with small norm are ubiquitous
❧ Random selection is a form of regularization
❧ Added benefit: Dimension reduction

SLIDE 6

Example: What Can Go Wrong

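[Figure: a matrix $A$ whose displayed nonzero entries are all 1 (twelve of them), and a column submatrix $A_\tau$ (six nonzero entries).]

$\|A\| = \|A_\tau\| = \sqrt{2} \implies$ No reduction!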
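The slide's own matrix cannot be recovered from this copy. As an assumed illustration, take one matrix that behaves this way: $A = [\,I_m\ \ I_m\,]$, where every standard basis vector appears twice. Then $\|A\| = \sqrt{2}$ and $n/\|A\|^2 = m$. A uniformly random set of $m$ columns usually contains some column together with its copy (about 93% of the time when $m = 6$), and then $\|A_\tau\| = \sqrt{2}$ as well. The sketch below computes that probability exactly and checks it by simulation. All function names are hypothetical.

```python
import math
import random


def duplicated_identity(m):
    """Hypothetical example A = [I_m I_m]; column j equals e_(j mod m), stored by its label."""
    return [j % m for j in range(2 * m)]


def subset_norm(labels):
    """Spectral norm of a submatrix whose columns are the basis vectors e_label.

    A_tau A_tau^* is diagonal, with entries equal to how often each label repeats,
    so ||A_tau|| = sqrt(largest multiplicity).
    """
    counts = {}
    for j in labels:
        counts[j] = counts.get(j, 0) + 1
    return math.sqrt(max(counts.values()))


def prob_no_reduction(m, k):
    """Exact probability that k of the 2m columns, drawn uniformly, include a duplicated pair."""
    pair_free = math.comb(m, k) * 2 ** k  # choose k distinct pairs, then one column from each
    return 1 - pair_free / math.comb(2 * m, k)


def simulate(m, k, trials=10_000, seed=0):
    """Monte Carlo estimate of P(||A_tau|| = sqrt(2)) for a random k-column subset."""
    rng = random.Random(seed)
    cols = duplicated_identity(m)
    hits = sum(subset_norm(rng.sample(cols, k)) > 1.0 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(subset_norm(duplicated_identity(6)))  # sqrt(2) for the full matrix
    print(prob_no_reduction(6, 6))              # 1 - 64/924, about 0.93
    print(simulate(6, 6))                       # close to the exact value
```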

SLIDE 7

The (∞, 2) Operator Norm

Definition 3. The $(\infty, 2)$ operator norm of a matrix $B$ is $\|B\|_{\infty,2} = \max\{\|Bx\|_2 : \|x\|_\infty = 1\}$.

Proposition 4. If $B$ has $s$ columns, then the best general bound is $\|B\|_{\infty,2} \le \sqrt{s}\,\|B\|$.
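For small $s$, Definition 3 can be evaluated by brute force. The map $x \mapsto \|Bx\|_2$ is convex, so its maximum over the cube $\|x\|_\infty \le 1$ is attained at a vertex, that is, at a sign vector $x \in \{\pm 1\}^s$. The sketch below relies on that fact. It is exponential in $s$ and meant only as an illustration, and the function name is hypothetical. With $B = I_s$ it returns $\sqrt{s}$ while $\|I_s\| = 1$, so the bound in Proposition 4 is attained.

```python
import itertools
import math


def infty2_norm(B):
    """||B||_{inf,2} for a small real matrix B given as a list of rows.

    ||Bx||_2 is convex in x, so its maximum over ||x||_inf <= 1 sits at a
    vertex x in {-1, +1}^s. Enumerating all 2^s vertices is exact but exponential.
    """
    s = len(B[0])
    best = 0.0
    for x in itertools.product((-1.0, 1.0), repeat=s):
        Bx = [sum(b * xi for b, xi in zip(row, x)) for row in B]
        best = max(best, math.sqrt(sum(v * v for v in Bx)))
    return best


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    print(infty2_norm(identity))           # sqrt(3) = sqrt(s) * ||I||
    print(infty2_norm([[1.0, 1.0, 1.0]]))  # 3.0 for the 1x3 all-ones row
```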

SLIDE 8

Random Reduction of (∞, 2) Norm

Lemma 5. Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. Draw a uniformly random subset $\sigma$ of columns whose cardinality is $|\sigma| = 2n/\|A\|^2$. Then $\mathbb{E}\,\|A_\sigma\|_{\infty,2} \le C\sqrt{|\sigma|}$.

❧ Problem: How can we use this information?

SLIDE 9

Pietsch Factorization

Theorem 6. [Pietsch, Grothendieck] Every matrix $B$ can be factorized as $B = TD$ where
❧ $D$ is diagonal and nonnegative with $\operatorname{trace}(D^2) = 1$, and
❧ $\|B\|_{\infty,2} \le \|T\| \le \sqrt{\pi/2}\,\|B\|_{\infty,2}$.
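Here is a quick sanity check of Theorem 6, as a worked example not taken from the slides. Suppose $B$ has $s$ orthonormal columns. Then $\|Bx\|_2 = \|x\|_2 \le \sqrt{s}$ whenever $\|x\|_\infty = 1$, with equality at every sign vector. Uniform weights then give a factorization that meets the lower bound exactly:

```latex
\|B\|_{\infty,2} = \sqrt{s}, \qquad
D = s^{-1/2} I_s \;\Rightarrow\; \operatorname{trace}(D^2) = s \cdot \tfrac{1}{s} = 1, \qquad
T = B D^{-1} = \sqrt{s}\, B \;\Rightarrow\; \|T\| = \sqrt{s} = \|B\|_{\infty,2}.
```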

SLIDE 10

Pietsch and Norm Reduction

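Lemma 7. Suppose $B$ has $s$ columns. There is a set $\tau$ of column indices for which $|\tau| \ge s/2$ and $\|B_\tau\| \le \sqrt{\pi} \cdot \frac{1}{\sqrt{s}}\,\|B\|_{\infty,2}$.

Proof. Consider a Pietsch factorization $B = TD$. Select
$$\tau = \{\, j : d_{jj}^2 \le 2/s \,\}.$$
Since $\sum_j d_{jj}^2 = 1$, Markov's inequality implies that fewer than $s/2$ indices have $d_{jj}^2 > 2/s$, so $|\tau| \ge s/2$. Calculate, using $\|D_\tau\| = \max_{j \in \tau} d_{jj} \le \sqrt{2/s}$,
$$\|B_\tau\| = \|T D_\tau\| \le \|T\| \cdot \|D_\tau\| \le \sqrt{\pi/2}\,\|B\|_{\infty,2} \cdot \sqrt{2/s} = \sqrt{\pi} \cdot \frac{1}{\sqrt{s}}\,\|B\|_{\infty,2}.$$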
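The selection rule in the proof is a one-line filter on the Pietsch weights. The minimal sketch below assumes the diagonal $d_{jj}$ is given as a list. The random weights in the demo are a hypothetical stand-in for a real Pietsch factorization; they only satisfy $\sum_j d_{jj}^2 = 1$. The demo checks the Markov bound $|\tau| \ge s/2$ on several sizes.

```python
import random


def lemma7_subset(d):
    """tau = {j : d_jj^2 <= 2/s} for Pietsch weights d_jj >= 0 with sum(d_jj^2) = 1."""
    s = len(d)
    if abs(sum(x * x for x in d) - 1.0) > 1e-9:
        raise ValueError("weights must satisfy trace(D^2) = 1")
    return [j for j, x in enumerate(d) if x * x <= 2.0 / s]


def random_weights(s, rng):
    """Hypothetical stand-in weights: nonnegative and normalized so that sum(d_jj^2) = 1."""
    raw = [rng.expovariate(1.0) for _ in range(s)]
    total = sum(x * x for x in raw) ** 0.5
    return [x / total for x in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    for s in (2, 5, 16, 101):
        tau = lemma7_subset(random_weights(s, rng))
        # Markov: fewer than s/2 weights can have d_jj^2 > 2/s, so |tau| >= s/2.
        # Every kept weight obeys d_jj <= sqrt(2/s), i.e. ||D_tau|| <= sqrt(2/s).
        assert 2 * len(tau) >= s
        print(s, len(tau))
```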

SLIDE 11

Proof of Kashin–Tzafriri

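❧ Suppose the $n$ columns of $A$ have unit $\ell_2$ norm
❧ Lemma 5 provides (random) $\sigma$ for which $|\sigma| = 2n/\|A\|^2$ and $\|A_\sigma\|_{\infty,2} \le C\sqrt{|\sigma|}$
❧ Lemma 7 applied to $B = A_\sigma$ yields a subset $\tau \subset \sigma$ for which $|\tau| \ge |\sigma|/2$ and $\|B_\tau\| \le \sqrt{\pi} \cdot \frac{1}{\sqrt{|\sigma|}} \cdot \|B\|_{\infty,2}$
❧ Simplify: $|\tau| \ge n/\|A\|^2$ and $\|A_\tau\| \le C\sqrt{\pi}$
❧ Note: This is almost an algorithm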
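The "Simplify" step can be written out in full. Fix a realization of $\sigma$ with $\|A_\sigma\|_{\infty,2} \le C\sqrt{|\sigma|}$; one exists because Lemma 5 bounds the expectation. Also note that $A_\tau = B_\tau$, because $\tau \subset \sigma$ and $B = A_\sigma$:

```latex
\begin{aligned}
|\tau| &\ge \tfrac{1}{2}\,|\sigma| = \frac{n}{\|A\|^2}, \\
\|A_\tau\| = \|B_\tau\| &\le \sqrt{\pi}\cdot\frac{1}{\sqrt{|\sigma|}}\,\|A_\sigma\|_{\infty,2}
\le \sqrt{\pi}\cdot\frac{1}{\sqrt{|\sigma|}}\cdot C\sqrt{|\sigma|} = C\sqrt{\pi}.
\end{aligned}
```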

SLIDE 12

Pietsch and Eigenvalues

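❧ Consider a matrix $B$ with Pietsch factorization $B = TD$
❧ Suppose $\|T\| \le \alpha$
❧ Calculate
$$\begin{aligned}
B = TD &\implies \|Bx\|_2^2 = \|TDx\|_2^2 && \forall x \\
&\implies \|Bx\|_2^2 \le \alpha^2\,\|Dx\|_2^2 && \forall x \\
&\implies x^*(B^*B)\,x \le \alpha^2 \cdot x^* D^2 x && \forall x \\
&\implies x^*\bigl(B^*B - \alpha^2 D^2\bigr)x \le 0 && \forall x \\
&\implies \lambda_{\max}\bigl(B^*B - \alpha^2 D^2\bigr) \le 0
\end{aligned}$$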
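Here is a small worked example, not taken from the slides. Take $B = \operatorname{diag}(1, 2)$, so $B^*B = \operatorname{diag}(1, 4)$ and $\|B\|_{\infty,2}^2 = 1 + 4 = 5$. The weights $D^2 = \operatorname{diag}(1/5, 4/5)$ have trace 1. The choice $\alpha = \sqrt{5}$ makes the final matrix vanish, so its largest eigenvalue is exactly $0$:

```latex
T = B D^{-1} = \operatorname{diag}\!\left(\tfrac{1}{\sqrt{1/5}},\ \tfrac{2}{\sqrt{4/5}}\right) = \sqrt{5}\, I_2,
\qquad \|T\| = \sqrt{5} = \alpha,
\qquad B^*B - \alpha^2 D^2 = \operatorname{diag}(1 - 1,\ 4 - 4) = 0 .
```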

SLIDE 13

Pietsch is Convex

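❧ Key new idea: Can find Pietsch factorizations by convex programming
$$\min\ \lambda_{\max}(B^*B - \alpha^2 F) \quad\text{subject to}\quad F \text{ diagonal},\ F \ge 0,\ \operatorname{trace}(F) = 1$$
❧ If the value at a minimizer $F_\star$ is nonpositive, then we have a factorization
$$B = \bigl(B F_\star^{-1/2}\bigr) \cdot F_\star^{1/2} \quad\text{with}\quad \bigl\|B F_\star^{-1/2}\bigr\| \le \alpha,$$
since reversing the argument on the previous slide gives $\|B F_\star^{-1/2} y\|_2^2 \le \alpha^2 \|y\|_2^2$ for every $y$ (when $F_\star$ has a positive diagonal)
❧ Proof of Kashin–Tzafriri offers target value for $\alpha$
❧ Can also perform binary search to approximate minimal value of $\alpha$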
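Feasibility (an optimal value $\le 0$) is monotone in $\alpha$. Raising $\alpha$ only subtracts more of the nonnegative matrix $F$, so $\lambda_{\max}(B^*B - \alpha^2 F)$ cannot increase, and bisection applies. An upper starting point always exists: $F = s^{-1} I$ is feasible once $\alpha \ge \sqrt{s}\,\|B\|$.

In the sketch below, `feasible` stands for any routine that decides whether the optimal value is nonpositive. In practice this would be an approximate solver such as entropic mirror descent. To keep the example exact, the sketch plugs in a hypothetical closed-form oracle that is valid only when $B$ has orthogonal columns. In that case the program is feasible exactly when $\alpha^2 \ge \sum_j \|b_j\|_2^2$.

```python
import math


def min_feasible_alpha(feasible, hi, tol=1e-10):
    """Bisection for the smallest alpha >= 0 with feasible(alpha) True.

    Assumes feasibility is monotone in alpha and that feasible(hi) holds,
    e.g. hi = sqrt(s) * ||B||.
    """
    if not feasible(hi):
        raise ValueError("hi must be feasible")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def orthogonal_columns_oracle(col_norms_sq):
    """Hypothetical exact oracle when B*B = diag(c), i.e. B has orthogonal columns.

    min over the simplex of max_i (c_i - alpha^2 f_i) is <= 0 exactly when some
    f in the simplex has f_i >= c_i / alpha^2, i.e. when alpha^2 >= sum(c).
    """
    total = sum(col_norms_sq)
    return lambda alpha: alpha * alpha >= total


if __name__ == "__main__":
    c = [1.0, 4.0]  # B = diag(1, 2): squared column norms 1 and 4
    hi = math.sqrt(len(c)) * math.sqrt(max(c))  # sqrt(s) * ||B||, with ||B|| = 2 here
    print(min_feasible_alpha(orthogonal_columns_oracle(c), hi))  # about sqrt(5)
```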

SLIDE 14

An Optimization over the Simplex

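❧ Express $F = \operatorname{diag}(f)$
❧ Constraints delineate the probability simplex: $\Delta = \{f : \operatorname{trace}(f) = 1 \text{ and } f \ge 0\}$
❧ Objective function and its subdifferential:
$$J(f) = \lambda_{\max}\bigl(B^*B - \alpha^2 \operatorname{diag}(f)\bigr)$$
$$\partial J(f) = \operatorname{conv}\bigl\{-\alpha^2 |u|^2 : u \text{ a top eigenvector of } B^*B - \alpha^2\operatorname{diag}(f),\ \|u\|_2 = 1\bigr\}$$
where $|u|^2$ denotes the entrywise squared modulus of $u$
❧ Obtain
$$\min\ J(f) \quad\text{subject to}\quad f \in \Delta$$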

SLIDE 15

Entropic Mirror Descent

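1. Initialize $f^{(1)} \leftarrow s^{-1} e$ and $k \leftarrow 1$
2. Compute a subgradient: $\theta \in \partial J(f^{(k)})$
3. Determine step size: $\beta_k \leftarrow \sqrt{2 \log s / k}\,\big/\,\|\theta\|_2$
4. Update variable: $f^{(k+1)} \leftarrow \dfrac{f^{(k)} \circ \exp\{-\beta_k \theta\}}{\operatorname{trace}\bigl(f^{(k)} \circ \exp\{-\beta_k \theta\}\bigr)}$
5. Increment $k \leftarrow k + 1$, and return to 2.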

References: [Eggermont 1991, Beck–Teboulle 2003]
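The following is a minimal pure-Python sketch of the five steps above, assuming small dense real matrices stored as lists of rows. Two choices are assumptions of this sketch, not details from the talk. First, the top eigenvector in step 2 is approximated by power iteration on $B^*B - \alpha^2\operatorname{diag}(f) + \alpha^2 \max_i f_i\, I$. That matrix is positive semidefinite, so its dominant eigenvector is the top eigenvector required. Second, subgradient methods do not decrease $J$ monotonically, so the sketch returns the best iterate seen. Function names are hypothetical.

```python
import math
import random


def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def top_eigpair(M, shift, iters=500, seed=0):
    """Power iteration on M + shift*I (assumed PSD); returns (unit u, Rayleigh quotient u*Mu)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in M]
    for _ in range(iters):
        v = [mv + shift * ui for mv, ui in zip(mat_vec(M, u), u)]
        nrm = math.sqrt(sum(vi * vi for vi in v))
        u = [vi / nrm for vi in v]
    return u, sum(ui * mi for ui, mi in zip(u, mat_vec(M, u)))


def entropic_mirror_descent(B, alpha, iters=200):
    """Approximately minimize J(f) = lambda_max(B*B - alpha^2 diag(f)) over the simplex (alpha > 0)."""
    s = len(B[0])
    G = [[sum(row[i] * row[j] for row in B) for j in range(s)] for i in range(s)]  # B*B
    a2 = alpha * alpha
    f = [1.0 / s] * s                                   # step 1: f^(1) = e / s
    best_f, best_J = f[:], math.inf
    for k in range(1, iters + 1):
        M = [[G[i][j] - (a2 * f[i] if i == j else 0.0) for j in range(s)] for i in range(s)]
        # The shift makes M PSD; the tiny extra term keeps M + shift*I strictly positive definite.
        u, J = top_eigpair(M, shift=a2 * max(f) + 1e-12)
        if J < best_J:
            best_f, best_J = f[:], J
        theta = [-a2 * ui * ui for ui in u]             # step 2: subgradient -alpha^2 |u|^2
        norm_theta = math.sqrt(sum(t * t for t in theta))
        beta = math.sqrt(2.0 * math.log(s) / k) / norm_theta  # step 3
        w = [fi * math.exp(-beta * t) for fi, t in zip(f, theta)]
        total = sum(w)
        f = [wi / total for wi in w]                    # step 4: multiplicative update, renormalized
    return best_f, best_J


if __name__ == "__main__":
    # B = diag(1, 2), alpha = 2: J(f) = max(1 - 4 f_1, 4 f_1), minimized at f_1 = 1/8 with value 1/2.
    f, J = entropic_mirror_descent([[1.0, 0.0], [0.0, 2.0]], alpha=2.0)
    print(f, J)
```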

SLIDE 16

Other Formulations

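❧ Modified primal to simultaneously identify $\alpha$:
$$\min\ \lambda_{\max}(B^*B - \alpha^2 F) + \alpha^2 \quad\text{subject to}\quad F \text{ diagonal},\ F \ge 0,\ \operatorname{trace}(F) = 1,\ \alpha \ge 0$$
❧ Dual problem is the famous maxcut SDP:
$$\max\ \langle B^*B, Z\rangle \quad\text{subject to}\quad \operatorname{diag}(Z) = e,\ Z \succeq 0$$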
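A standard observation, not stated on the slide, shows one way the maxcut SDP connects to the $(\infty, 2)$ norm. Every sign vector $x \in \{\pm 1\}^s$ gives a feasible rank-one point $Z = xx^*$, since $\operatorname{diag}(xx^*) = e$ and $xx^* \succeq 0$. Moreover, $\langle B^*B, xx^*\rangle = \|Bx\|_2^2$. Maximizing over these points alone recovers $\|B\|_{\infty,2}^2$, so the SDP value is at least that large. The brute-force sketch below evaluates that rank-one lower bound; the helper name is hypothetical, and it is only practical for small $s$.

```python
import itertools


def rank_one_maxcut_value(B):
    """max <B*B, x x^*> over x in {-1, +1}^s: the SDP objective restricted to rank-one feasible Z.

    Each Z = x x^* has diag(Z) = e and is PSD, and <B*B, x x^*> = ||Bx||_2^2,
    so the result equals ||B||_{inf,2}^2 and lower-bounds the SDP optimum.
    """
    s = len(B[0])
    G = [[sum(row[i] * row[j] for row in B) for j in range(s)] for i in range(s)]  # B*B
    best = 0.0
    for x in itertools.product((-1, 1), repeat=s):
        # Frobenius inner product <G, x x^*> = sum_ij G_ij x_i x_j
        value = sum(G[i][j] * x[i] * x[j] for i in range(s) for j in range(s))
        best = max(best, value)
    return best


if __name__ == "__main__":
    print(rank_one_maxcut_value([[1.0, 1.0, 1.0]]))         # 9.0 for the all-ones row
    print(rank_one_maxcut_value([[1.0, 0.0], [0.0, 2.0]]))  # 5.0 for B = diag(1, 2)
```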

SLIDE 17

Related Results

Theorem 8. [Bourgain–Tzafriri 1991] Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. There is a set $\tau$ of column indices for which $|\tau| \ge cn/\|A\|^2$ and $\kappa(A_\tau) \le \sqrt{3}$.

Examples:

❧ $A$ has $n$ identical columns, so $\|A\|^2 = n$. Then $|\tau| \ge 1$.
❧ $A$ has orthonormal columns, so $\|A\| = 1$. Then $|\tau| \ge cn$.

SLIDE 18

Related Results

Theorem 8. [Bourgain–Tzafriri 1991] Suppose the $n$ columns of $A$ have unit $\ell_2$ norm. There is a set $\tau$ of column indices for which $|\tau| \ge cn/\|A\|^2$ and $\kappa(A_\tau) \le \sqrt{3}$.

Theorem 9. [T 2007] There is a randomized, polynomial-time algorithm that produces the set $\tau$.

SLIDE 19

To learn more...

E-mail:
❧ jtropp@acm.caltech.edu

Web:
❧ http://www.acm.caltech.edu/~jtropp

Papers in Preparation:
❧ T, “Column subset selection, matrix factorization, and eigenvalue optimization”
❧ T, “Paved with good intentions: Computational applications of matrix column partitions”
❧ . . .
