Coordinate Descent for mixed-norm NMF: presentation by Vamsi K. Potluru




SLIDE 1

Coordinate Descent for mixed-norm NMF

Vamsi K. Potluru

Dept. of Computer Science, UNM and Mitsubishi Electric Research Labs, Cambridge, MA

December, 2013

Joint work with Jonathan Le Roux, Barak A. Pearlmutter, John R. Hershey and Matthew E. Brand

SLIDE 2

Contents

SLIDE 3

Nonnegative Matrix Factorization

Factor a nonnegative matrix as follows:

$$\underset{(m \times n)}{X} \;\approx\; \underset{(m \times r)}{W}\;\underset{(r \times n)}{H}$$

Applications: collaborative filtering, hyperspectral image analysis, and music transcription, among others.

Prior information: the problem is under-determined. Additional requirements imposed by the problem domain:

  • Sparsity
  • Orthogonality
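To make the shapes in the factorization concrete, here is a minimal sketch using the classic multiplicative-update rules of Lee and Seung. This is a stand-in baseline, not the coordinate-descent method of this talk, and it imposes no sparsity or orthogonality constraints. The function name `nmf_multiplicative` and its parameters are hypothetical.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (m x n) by W @ H, with W (m x r) and H (r x n).

    Plain multiplicative updates for the Frobenius objective; an illustrative
    baseline only, not the algorithm presented in these slides.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Elementwise rescaling keeps every entry of W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(0.5 * np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, errors

# Synthetic nonnegative data with an exact rank-4 factorization.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
W, H, errors = nmf_multiplicative(X, r=4)
print(W.shape, H.shape, errors[0], errors[-1])
```

Because the problem is under-determined, different seeds can return different (W, H) pairs with similar reconstruction error, which is the motivation for the extra constraints listed above.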

SLIDE 4

Sparsity measures

The L0 norm corresponds to our intuitive notion of sparsity.

Axioms (Hurley and Rickard 2009):

  • Robin Hood: taking from the rich and giving to the poor decreases sparsity.
  • Scaling: sparsity is scale-invariant.
  • Rising tide: adding a constant to every entry decreases sparsity.
  • Cloning: sparsity is invariant under cloning.
  • Bill Gates: a single very wealthy individual increases sparsity.
  • Babies: newborns (added zero entries) increase sparsity.

Hoyer's sparsity measure, for a vector x of dimension d:

$$\mathrm{sp}(x) = \frac{1}{\sqrt{d} - 1}\left(\sqrt{d} - \frac{\|x\|_1}{\|x\|_2}\right)$$

Observe that sp(x) lies between 0 and 1. Higher values correspond to sparser vectors.
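Hoyer's measure is simple to compute directly from the L1 and L2 norms. The sketch below is illustrative; the function name `hoyer_sparsity` is an assumption, and the input checks are a design choice rather than part of the definition.

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer's sparsity measure: 0 for a uniform vector, 1 for a 1-sparse vector."""
    x = np.asarray(x, dtype=float)
    d = x.size
    l2 = np.sqrt((x ** 2).sum())
    if d < 2 or l2 == 0.0:
        raise ValueError("need a nonzero vector with at least two entries")
    l1 = np.abs(x).sum()
    return (np.sqrt(d) - l1 / l2) / (np.sqrt(d) - 1.0)

print(hoyer_sparsity([0.0, 0.0, 1.0, 0.0]))  # 1.0: a single nonzero entry
print(hoyer_sparsity([1.0, 1.0, 1.0, 1.0]))  # 0.0: all entries equal
```

Multiplying the input by any positive constant leaves the value unchanged, which matches the scaling axiom in the list above.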

SLIDE 5

Sparse NMF

Sparse NMF formulation (Hoyer 2004, Heiler and Schnorr 2006):

$$\min_{W,H}\; f(W,H) = \frac{1}{2}\,\|X - WH\|_F^2 \quad \text{s.t.} \quad W \ge 0,\; H \ge 0,\; \|W_i\|_2 = 1,\; \|W_i\|_1 = \alpha \quad \forall\, i \in \{1, \dots, r\} \tag{1}$$

Figure: 25 features each. Sparsity of 0.5 (left), 0.6 (middle) and 0.75 (right).
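The figure quotes sparsity levels, while formulation (1) fixes the L1 norm α of each unit-L2-norm column. Assuming the sparsity levels refer to Hoyer's measure, the two are linked by substituting ‖W_i‖_2 = 1 into sp(·), which gives α = √m − sp · (√m − 1) for columns of length m. The helper below is a hypothetical sketch of that conversion.

```python
import math

def l1_target_from_sparsity(sparsity, m):
    """L1 norm of a unit-L2-norm vector of length m with the given Hoyer sparsity.

    Assumes the sparsity value is Hoyer's measure; this link is an assumption
    made for illustration, not something the slide states explicitly.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    if m < 2:
        raise ValueError("m must be at least 2")
    root_m = math.sqrt(m)
    return root_m - sparsity * (root_m - 1.0)

print(l1_target_from_sparsity(0.5, 4))  # 1.5
```

Higher sparsity corresponds to a smaller α: sparsity 1 gives α = 1 (a single nonzero entry) and sparsity 0 gives α = √m (all entries equal).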

SLIDE 6

Group sparse NMF

Our sparse NMF formulation (includes Mørup et al., 2008):

$$\min_{W,H}\; f(W,H) = \frac{1}{2}\,\|X - WH\|_F^2 \quad \text{s.t.} \quad W \ge 0,\; H \ge 0,\; \|W_i\|_2 = 1 \;\; \forall\, i \in \{1, \dots, r\},\quad \sum_{i \in I_g} \|W_i\|_1 = \alpha_g \;\; \forall\, g \in \{1, \dots, G\}$$

User-friendly sparsity formulation (implicit version: Kim et al., 2012).

Optimizing a column at a time:

$$\max_{y \ge 0}\; b^\top y \quad \text{s.t.} \quad \mathbf{1}^\top y = k,\quad \|y\|_2 = 1$$

where dim(b) = m.

Sparsity does not mix!
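Because ‖y‖_2 is fixed at 1, maximizing b⊤y over the feasible set is the same as finding the feasible point closest to b in Euclidean distance. One well-known heuristic for that projection is the alternating scheme from Hoyer (2004). The sketch below follows that scheme as a stand-in; it is not the solver presented in this talk, and the name `project_l1_l2` and its error handling are assumptions.

```python
import numpy as np

def project_l1_l2(b, k, l2=1.0):
    """Nonnegative y near b with sum(y) = k and ||y||_2 = l2 (Hoyer-style heuristic)."""
    b = np.asarray(b, dtype=float)
    m = b.size
    if not (l2 <= k <= np.sqrt(m) * l2):
        raise ValueError("need l2 <= k <= sqrt(m) * l2 for a feasible y")
    active = np.ones(m, dtype=bool)
    s = b + (k - b.sum()) / m  # shift onto the hyperplane sum(s) = k
    for _ in range(m):
        # Center of the hyperplane restricted to the active coordinates.
        mid = np.where(active, k / active.sum(), 0.0)
        w = s - mid  # sums to zero, so moving along w preserves sum(s) = k
        w_sq = w @ w
        gap = l2 ** 2 - mid @ mid
        if w_sq == 0.0 or gap < 0.0:
            raise ValueError("degenerate input for this heuristic")
        # Rescale the offset from the center so that ||s||_2 = l2.
        s = mid + np.sqrt(gap / w_sq) * w
        if (s >= 0).all():
            return s
        # Zero out violated coordinates and restore the sum on the rest.
        active &= s > 0
        s = np.where(active, s, 0.0)
        s = np.where(active, s + (k - s.sum()) / active.sum(), 0.0)
    raise RuntimeError("projection did not terminate")

y = project_l1_l2([3.0, 1.0, 0.5, 0.2], k=1.2)
print(y, y.sum(), np.linalg.norm(y))
```

In this example the smallest coordinate is driven to exactly zero, which shows how a tighter L1 budget k produces a sparser column.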

SLIDE 7

Update Schemes for W

Sparsity does not mix:

[Diagram: matrix of ∗ entries, labeled s1]

This paper:

[Diagram: matrix of ∗ entries, labeled s1 and s2]

SLIDE 8

Results on ORL faces dataset

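Optimizing two columns at a time:

$$\max_{y \ge 0}\; b^\top y \quad \text{s.t.} \quad \mathbf{1}^\top y = k,\quad \|y_1\|_2 = 1,\quad \|y_2\|_2 = 1$$

where $y = [y_1^\top, y_2^\top]^\top$ and $b = [b_1^\top, b_2^\top]^\top$, with dim(b_1) = m_1 and dim(b_2) = m_2.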
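In the two-column problem each block keeps its own unit L2 norm, but the L1 budget k is shared across the stacked vector, so the two columns can end up with different individual sparsities. The sketch below only checks these constraints for a candidate pair and does not solve the optimization; the name `check_two_column_constraints` is hypothetical.

```python
import numpy as np

def check_two_column_constraints(y1, y2, k, tol=1e-9):
    """Report which constraints of the two-column problem a candidate pair satisfies."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    y = np.concatenate([y1, y2])  # stacked vector y = [y1; y2]
    return {
        "nonnegative": bool((y >= 0).all()),
        "shared_l1_budget": bool(abs(y.sum() - k) <= tol),
        "unit_l2_first": bool(abs(np.linalg.norm(y1) - 1.0) <= tol),
        "unit_l2_second": bool(abs(np.linalg.norm(y2) - 1.0) <= tol),
    }

# The first column spends 1.0 of the budget, the second spends 1.4.
y1 = np.array([1.0, 0.0, 0.0])
y2 = np.array([0.6, 0.8, 0.0])
report = check_two_column_constraints(y1, y2, k=2.4)
print(report)
```

Here both columns are feasible under the joint budget k = 2.4 even though their L1 norms differ, which would not be allowed if each column had to meet the same fixed L1 norm on its own.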

Figure: 25 features each. Sparsity of 0.4 (left), 0.6 (middle) and {0.2, 0.5, 0.8} (right).
