Coordinate Descent for mixed-norm NMF


  1. Coordinate Descent for mixed-norm NMF
     Vamsi K. Potluru, Dept. of Computer Science, UNM, and Mitsubishi Electric Research Labs, Cambridge, MA
     December 2013
     Joint work with Jonathan Le Roux, Barak A. Pearlmutter, John R. Hershey, and Matthew E. Brand

  2. Contents

  3. Nonnegative Matrix Factorization
     Factor a nonnegative matrix as
         X ≈ W H,
     where X is m × n, W is m × r, and H is r × n.
     Applications: collaborative filtering, hyperspectral image analysis, and music transcription, among others.
     Prior information: the problem is under-determined, so additional requirements are imposed by the problem domain, such as sparsity and orthogonality.
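For background, here is a minimal NMF sketch in NumPy using the classic multiplicative updates of Lee and Seung for the Frobenius objective. This is the plain baseline without sparsity constraints, not the coordinate-descent scheme of this talk; the function name and parameters are illustrative.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-9, seed=0):
    """Minimal NMF sketch: reduce ||X - WH||_F^2 with the classic
    Lee-Seung multiplicative updates (background only; not the
    coordinate-descent scheme of this talk)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative
        # given a nonnegative initialization.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Example: factor a random nonnegative 100 x 80 matrix with rank 10.
X = np.abs(np.random.default_rng(1).normal(size=(100, 80)))
W, H = nmf_multiplicative(X, r=10)
print(np.linalg.norm(X - W @ H, "fro"))
```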

  4. Sparsity measures
     The L0 norm corresponds to our intuitive notion of sparsity.
     Axioms (Hurley and Rickard 2009):
     - Robin Hood: stealing from the rich decreases sparsity.
     - Scaling: sparsity is scale-invariant.
     - Rising tide: adding a constant to every entry decreases sparsity.
     - Cloning: sparsity is invariant under cloning.
     - Bill Gates: one very wealthy individual increases sparsity.
     - Babies: newborns increase sparsity.
     Hoyer's sparsity measure:
         sp(x) = (√d − ‖x‖₁ / ‖x‖₂) / (√d − 1),  where d = dim(x).
     Observe that sp(·) lies between 0 and 1; higher values correspond to sparser vectors.
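Hoyer's measure is easy to check numerically. A direct transcription in NumPy, with the two endpoint cases from the slide:

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer's sparsity: sp(x) = (sqrt(d) - ||x||_1/||x||_2) / (sqrt(d) - 1).
    Returns 0 for a constant vector and 1 for a 1-sparse vector."""
    d = x.size
    sqrt_d = np.sqrt(d)
    return (sqrt_d - np.linalg.norm(x, 1) / np.linalg.norm(x, 2)) / (sqrt_d - 1)

print(hoyer_sparsity(np.ones(16)))    # 0.0: maximally dense (constant vector)
print(hoyer_sparsity(np.eye(16)[0]))  # 1.0: maximally sparse (one-hot vector)
```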

  5. Sparse NMF
     Sparse NMF formulation (Hoyer 2004; Heiler and Schnörr 2006):
         min_{W,H} f(W, H) = (1/2) ‖X − WH‖²_F                    (1)
         s.t. W ≥ 0, H ≥ 0,
              ‖Wᵢ‖₂ = 1, ‖Wᵢ‖₁ = α  ∀ i ∈ {1, …, r}.
     Figure: learned features, 25 each, at sparsity 0.5 (left), 0.6 (middle), and 0.75 (right).
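With ‖Wᵢ‖₂ = 1 fixed, the L1 target α pins each column of W to a single Hoyer sparsity level via sp = (√m − α)/(√m − 1). A small helper inverting this relation (the name l1_target and the dimension 1024 are illustrative choices, not from the slides):

```python
import numpy as np

def l1_target(m, sp):
    """Given column dimension m and a desired Hoyer sparsity sp in [0, 1],
    return the L1 target alpha for a unit-L2-norm column, by inverting
    sp = (sqrt(m) - alpha) / (sqrt(m) - 1)."""
    return np.sqrt(m) - sp * (np.sqrt(m) - 1.0)

# E.g. for hypothetical 1024-dimensional columns at the sparsity
# levels shown in the figure:
for sp in (0.5, 0.6, 0.75):
    print(sp, l1_target(1024, sp))
```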

  6. Group sparse NMF
     Our sparse NMF formulation (includes Mørup et al. 2008):
         min_{W,H} f(W, H) = (1/2) ‖X − WH‖²_F
         s.t. W ≥ 0, H ≥ 0,
              ‖Wᵢ‖₂ = 1  ∀ i ∈ {1, …, r},
              Σ_{i ∈ I_g} ‖Wᵢ‖₁ = α_g  ∀ g ∈ {1, …, G}.
     This is a user-friendly sparsity formulation (an implicit version appears in Kim et al. 2012).
     Optimizing one column at a time yields the subproblem
         max_{y ≥ 0} bᵀ y  s.t. 1ᵀ y = k, ‖y‖₂ = 1,
     where dim(b) = m (a generic-solver sketch follows below).
     Sparsity does not mix!
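The per-column subproblem can be checked with a generic constrained solver. Below is a sketch using SciPy's SLSQP; the paper derives a dedicated coordinate-descent update instead, and the helper name solve_column_subproblem is mine. Note that a nonnegative unit-L2 vector satisfies ‖y‖₁ ∈ [1, √m], so k must lie in that range for feasibility.

```python
import numpy as np
from scipy.optimize import minimize

def solve_column_subproblem(b, k):
    """Sketch: solve max_{y>=0} b^T y  s.t.  1^T y = k, ||y||_2 = 1
    with a generic SLSQP solver (illustrates the subproblem only,
    not the paper's dedicated update)."""
    m = b.size
    y0 = np.full(m, k / m)  # start on the L1 plane; SLSQP handles the rest
    cons = [
        {"type": "eq", "fun": lambda y: y.sum() - k},         # 1^T y = k
        {"type": "eq", "fun": lambda y: np.dot(y, y) - 1.0},  # ||y||_2 = 1
    ]
    res = minimize(lambda y: -(b @ y), y0, method="SLSQP",
                   bounds=[(0, None)] * m, constraints=cons)
    return res.x

b = np.random.default_rng(0).normal(size=8)
y = solve_column_subproblem(b, k=2.0)  # k in [1, sqrt(8)] for feasibility
print(y, np.linalg.norm(y), y.sum())   # constraints hold up to solver tolerance
```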

  7. Update Schemes for W
     Figure: update patterns for W, comparing the "sparsity does not mix" scheme (left) with this paper's scheme (right); column groups s₁ and s₂ are marked.

  8. Results on the ORL faces dataset
     Optimizing two columns at a time yields the subproblem
         max_{y ≥ 0} bᵀ y  s.t. 1ᵀ y = k, ‖y₁‖₂ = 1, ‖y₂‖₂ = 1,
     where y = [y₁ᵀ, y₂ᵀ]ᵀ, b = [b₁ᵀ, b₂ᵀ]ᵀ, dim(b₁) = m₁, and dim(b₂) = m₂ (sketch below).
     Figure: learned features, 25 each, at sparsity 0.4 (left), 0.6 (middle), and {0.2, 0.5, 0.8} (right).
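The same generic-solver check extends to the stacked two-column case, where the shared L1 budget couples the two unit-norm blocks. Again a sketch under the same assumptions, not the paper's update; solve_two_column_subproblem is an illustrative name.

```python
import numpy as np
from scipy.optimize import minimize

def solve_two_column_subproblem(b1, b2, k):
    """Sketch: max_{y>=0} b^T y  s.t. 1^T y = k, ||y1||_2 = 1, ||y2||_2 = 1,
    with y = [y1; y2], b = [b1; b2], via generic SLSQP."""
    m1, m2 = b1.size, b2.size
    b = np.concatenate([b1, b2])
    y0 = np.full(m1 + m2, k / (m1 + m2))
    cons = [
        {"type": "eq", "fun": lambda y: y.sum() - k},                    # 1^T y = k
        {"type": "eq", "fun": lambda y: np.dot(y[:m1], y[:m1]) - 1.0},   # ||y1||_2 = 1
        {"type": "eq", "fun": lambda y: np.dot(y[m1:], y[m1:]) - 1.0},   # ||y2||_2 = 1
    ]
    res = minimize(lambda y: -(b @ y), y0, method="SLSQP",
                   bounds=[(0, None)] * (m1 + m2), constraints=cons)
    return res.x[:m1], res.x[m1:]

rng = np.random.default_rng(0)
# k must lie in [2, sqrt(m1) + sqrt(m2)] since each block is a
# nonnegative unit-L2 vector.
y1, y2 = solve_two_column_subproblem(rng.normal(size=8), rng.normal(size=8), k=3.5)
print(np.linalg.norm(y1), np.linalg.norm(y2), y1.sum() + y2.sum())
```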
