  1. An introduction to Nonnegative Matrix Factorisation. Slim ESSID, Telecom ParisTech, June 2015.

  2. Credits. Some illustrations, slides and demos are reproduced courtesy of: A. Ozerov, C. Févotte, N. Seichepine, R. Hennequin, F. Vallet, A. Liutkus.

  3. ◮ Introduction ◮ NMF models ◮ Algorithms for solving NMF ◮ Applications ◮ Conclusion

  4. Introduction / Motivation. Explaining data by factorisation: general formulation. The data matrix $V$ ($F \times N$), with columns $v_n$, is approximated by the product $W$ ($F \times K$) $\times$ $H$ ($K \times N$), i.e. $v_n \approx \sum_{k=1}^{K} h_{kn} w_k$. Illustration by C. Févotte.

  5. Introduction / Motivation. Explaining data by factorisation: general formulation (continued). $V$ ($F \times N$) is the data matrix; the columns $w_k$ of $W$ are the "explanatory variables" ("regressors", "basis", "dictionary", "patterns", "topics"); the entries of $H$ are the "activation coefficients" ("expansion coefficients"). Illustration by C. Févotte.

  6. Introduction / Motivation. Data is often nonnegative by nature: pixel intensities; amplitude spectra; occurrence counts; food or energy consumption; user scores; stock market values; ... For the sake of interpretability of the results, optimal processing of nonnegative data may call for processing under nonnegativity constraints. (Slide adapted from Févotte, 2012.)

  7. Introduction / Motivation. The Nonnegative Matrix Factorisation model: NMF provides an unsupervised linear representation of the data, $V \approx WH$, with $W = [w_{fk}]$ s.t. $w_{fk} \ge 0$ and $H = [h_{kn}]$ s.t. $h_{kn} \ge 0$. Illustration by N. Seichepine.
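As a concrete illustration (not part of the slides), such a nonnegative factorisation can be computed with scikit-learn; the sizes F, N, K, the random data and the solver settings below are arbitrary placeholder choices.

```python
# Minimal sketch: compute V ~= W H with W, H >= 0 using scikit-learn's NMF.
import numpy as np
from sklearn.decomposition import NMF

F, N, K = 100, 400, 10                    # arbitrary dimensions for the example
rng = np.random.default_rng(0)
V = rng.random((F, N))                    # nonnegative data matrix (F features x N observations)

model = NMF(n_components=K, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(V)                # dictionary, shape (F, K), entries >= 0
H = model.components_                     # activations, shape (K, N), entries >= 0

print(np.linalg.norm(V - W @ H, "fro") / np.linalg.norm(V, "fro"))  # relative approximation error
```

By default scikit-learn minimises the Euclidean (Frobenius) cost; its beta_loss argument (with solver="mu") switches to the KL or IS divergences discussed below.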

  8. Introduction / Motivation. Explaining face images by NMF: image example, 49 images among the 2429 of MIT's CBCL face dataset. (Slide adapted from Févotte, 2012.)

  9. Introduction / Motivation. Explaining face images by NMF: method. Vectorised images ($V$) $\approx$ facial features ($W$) $\times$ importance of features in each image ($H$).

  10. Introduction / Motivation. NMF outputs: image example. Illustration by C. Févotte.

  11. Introduction / Motivation. Notations I. $V$: the $F \times N$ data matrix, with $F$ features (rows) and $N$ observations/examples/feature vectors (columns). $v_n = (v_{1n}, \cdots, v_{Fn})^T$: the $n$-th feature vector observation among a collection of $N$ observations $v_1, \cdots, v_N$; $v_n$ is a column vector in $\mathbb{R}_+^F$, and $v_n^T$ is a row vector. $W$: the $F \times K$ dictionary matrix; $w_{fk}$ is one of its coefficients, and $w_k$ a dictionary/basis vector among $K$ elements.

  12. Introduction / Motivation. Notations II. $H$: the $K \times N$ activation/expansion matrix; $h_n$ is the column vector of activation coefficients for observation $v_n$, so that $v_n \approx \sum_{k=1}^{K} h_{kn} w_k$; $h_{k:}$ is the row vector of activation coefficients relating to basis vector $w_k$.
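A quick numerical restatement of these notations (a toy sketch with arbitrary matrices, not from the slides): the n-th column of V is approximated by a nonnegative combination of the basis vectors w_k, weighted by the n-th column of H.

```python
import numpy as np

F, K, N = 4, 2, 3
rng = np.random.default_rng(0)
W = rng.random((F, K))                 # dictionary: columns w_k
H = rng.random((K, N))                 # activations: columns h_n, rows h_k:
V = W @ H                              # here the factorisation is exact by construction

n = 1
v_n = V[:, n]                                       # n-th observation (column of V)
recon = sum(H[k, n] * W[:, k] for k in range(K))    # sum_k h_kn * w_k
print(np.allclose(v_n, recon))                      # True
```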

  13. NMF models. ◮ Introduction ◮ NMF models – Cost functions – Weighted NMF schemes ◮ Algorithms for solving NMF ◮ Applications ◮ Conclusion

  14. NMF models / Cost functions. NMF optimization criteria. The NMF approximation $V \approx WH$ is usually obtained through $\min_{W,H \ge 0} D(V \mid WH)$, where $D(V \mid \hat{V})$ is a separable matrix divergence, $D(V \mid \hat{V}) = \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn} \mid \hat{v}_{fn})$, and $d(x \mid y)$, defined for all $x, y \ge 0$, is a scalar divergence such that: $d(x \mid y)$ is continuous in $x$ and $y$; $d(x \mid y) \ge 0$ for all $x, y \ge 0$; $d(x \mid y) = 0$ if and only if $x = y$.

  15. NMF models / Cost functions. Popular (scalar) divergences. Euclidean (EUC) distance (Lee and Seung, 1999): $d_{EUC}(x \mid y) = (x - y)^2$. Kullback-Leibler (KL) divergence (Lee and Seung, 1999): $d_{KL}(x \mid y) = x \log\frac{x}{y} - x + y$. Itakura-Saito (IS) divergence (Févotte et al., 2009): $d_{IS}(x \mid y) = \frac{x}{y} - \log\frac{x}{y} - 1$.
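These scalar divergences, and the separable matrix divergence built from them, are easy to write down; a sketch in NumPy, assuming strictly positive entries so the ratios and logarithms are well defined:

```python
import numpy as np

def d_euc(x, y):
    """Squared Euclidean distance: (x - y)^2."""
    return (x - y) ** 2

def d_kl(x, y):
    """(Generalised) Kullback-Leibler divergence: x log(x/y) - x + y."""
    return x * np.log(x / y) - x + y

def d_is(x, y):
    """Itakura-Saito divergence: x/y - log(x/y) - 1."""
    return x / y - np.log(x / y) - 1

def D(V, V_hat, d):
    """Separable matrix divergence: sum of d over all (f, n) entries."""
    return np.sum(d(V, V_hat))

# Example on arbitrary positive matrices
rng = np.random.default_rng(0)
V, V_hat = rng.random((3, 4)) + 0.1, rng.random((3, 4)) + 0.1
print(D(V, V_hat, d_euc), D(V, V_hat, d_kl), D(V, V_hat, d_is))
```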

  16. NMF models / Cost functions. Convexity properties of the divergence $d(x \mid y)$: EUC is convex in $x$ and convex in $y$; KL is convex in $x$ and convex in $y$; IS is convex in $x$ but not convex in $y$.

  17. NMF models / Cost functions. Scale invariance properties: $d_{EUC}(\lambda x \mid \lambda y) = \lambda^2 d_{EUC}(x \mid y)$; $d_{KL}(\lambda x \mid \lambda y) = \lambda\, d_{KL}(x \mid y)$; $d_{IS}(\lambda x \mid \lambda y) = d_{IS}(x \mid y)$. The IS divergence is scale-invariant, so it provides higher accuracy in the representation of data with large dynamic range (e.g. audio spectra). (Slide adapted from Févotte, 2012.)
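A quick numerical check of these three scaling relations (arbitrary positive values x, y and scale λ, not from the slides):

```python
import numpy as np

d_euc = lambda x, y: (x - y) ** 2
d_kl  = lambda x, y: x * np.log(x / y) - x + y
d_is  = lambda x, y: x / y - np.log(x / y) - 1

x, y, lam = 3.0, 5.0, 10.0
print(np.isclose(d_euc(lam * x, lam * y), lam ** 2 * d_euc(x, y)))  # True
print(np.isclose(d_kl(lam * x, lam * y), lam * d_kl(x, y)))         # True
print(np.isclose(d_is(lam * x, lam * y), d_is(x, y)))               # True: IS is scale-invariant
```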

  18. NMF models / Weighted NMF schemes. Weighted NMF. Conventional NMF optimization criterion: $\min_{W,H \ge 0} \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn} \mid \hat{v}_{fn})$. Weighted NMF optimization criterion: $\min_{W,H \ge 0} \sum_{f=1}^{F} \sum_{n=1}^{N} b_{fn}\, d(v_{fn} \mid \hat{v}_{fn})$, where the $b_{fn}$ ($f = 1, \ldots, F$, $n = 1, \ldots, N$) are nonnegative weights representing the contribution of data point $v_{fn}$ to NMF learning.
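Concretely, for the Euclidean cost the weighted criterion is just an entrywise weighted sum; a short sketch (the data, factors and weight matrix B below are arbitrary placeholders):

```python
import numpy as np

def weighted_euc_cost(V, W, H, B):
    """Weighted NMF objective: sum_{f,n} b_fn * (v_fn - [WH]_fn)^2."""
    return np.sum(B * (V - W @ H) ** 2)

rng = np.random.default_rng(0)
F, K, N = 6, 2, 8
V = rng.random((F, N))
W, H = rng.random((F, K)), rng.random((K, N))
B = (rng.random((F, N)) > 0.3).astype(float)   # e.g. b_fn = 1 observed, 0 missing
print(weighted_euc_cost(V, W, H, B))
```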

  19. NMF models / Weighted NMF schemes. Weighted NMF application example I: learning from partial observations (e.g. for image inpainting, as in Mairal et al., 2010): $b_{fn} = 1$ for an observed value, $b_{fn} = 0$ for a missing value.
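One common way to handle such partial observations (not detailed on this slide) is to fold the binary mask B into multiplicative updates for the weighted Euclidean cost; a hedged sketch of that standard masked variant of the Lee-Seung update rules:

```python
import numpy as np

def masked_nmf_euc(V, B, K, n_iter=200, eps=1e-9, seed=0):
    """Weighted (masked) Euclidean NMF via multiplicative updates:
    only entries with b_fn = 1 contribute to the fit; missing entries are ignored."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ (B * V)) / (W.T @ (B * (W @ H)) + eps)
        W *= ((B * V) @ H.T) / ((B * (W @ H)) @ H.T + eps)
    return W, H

# Toy inpainting: hide 30% of a low-rank nonnegative matrix, then reconstruct it
rng = np.random.default_rng(1)
V_true = rng.random((20, 5)) @ rng.random((5, 30))
B = (rng.random(V_true.shape) > 0.3).astype(float)
W, H = masked_nmf_euc(V_true * B, B, K=5)
print(np.mean(np.abs((W @ H - V_true)[B == 0])))   # error on the held-out entries
```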

  20. NMF models / Weighted NMF schemes. Weighted NMF application example II: face feature extraction (example and figure from Blondel et al., 2008). Data $V$ and weights $B = \{b_{fn}\}_{f,n}$: image-centered weights vs. face-centered weights.

  21. Algorithms for solving NMF. ◮ Introduction ◮ NMF models ◮ Algorithms for solving NMF – Preliminaries – Difficulties in NMF – Multiplicative update rules ◮ Applications ◮ Conclusion

  22. Algorithms for solving NMF / Preliminaries. Optimization problem. An efficient solution of the NMF optimization problem $\min_{W,H \ge 0} D(V \mid WH) \;\Leftrightarrow\; \min_{\theta} C(\theta)$, with $C(\theta) \stackrel{\text{def}}{=} D(V \mid WH)$ and $\theta \stackrel{\text{def}}{=} \{W, H\}$ denoting the NMF parameters, must cope with the following difficulties: the nonnegativity constraints must be taken into account; the solution is not unique...

  23. Algorithms for solving NMF / Difficulties in NMF. NMF is ill-posed: the solution is not unique. Given $V = WH$ with $W \ge 0$, $H \ge 0$, any matrix $Q$ such that $WQ \ge 0$ and $Q^{-1}H \ge 0$ provides an alternative factorisation $V = \tilde{W}\tilde{H} = (WQ)(Q^{-1}H)$. In particular, $Q$ can be any nonnegative generalised permutation matrix; e.g., in $\mathbb{R}^3$: $Q = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 3 & 0 \\ 1 & 0 & 0 \end{pmatrix}$. This case is not so problematic: it merely accounts for scaling and permutation of the basis vectors $w_k$.
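This scaling-and-permutation indeterminacy is easy to verify numerically (a sketch using the Q given on the slide; W and H are arbitrary nonnegative matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((4, 3))              # F x K, nonnegative
H = rng.random((3, 5))              # K x N, nonnegative

Q = np.array([[0., 0., 2.],         # the generalised permutation matrix from the slide
              [0., 3., 0.],
              [1., 0., 0.]])

W_tilde = W @ Q                     # still nonnegative
H_tilde = np.linalg.inv(Q) @ H      # still nonnegative (inverse of a generalised permutation)

print(np.allclose(W @ H, W_tilde @ H_tilde))          # True: same product V
print((W_tilde >= 0).all() and (H_tilde >= 0).all())  # True: constraints still satisfied
```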

  24. Algorithms for solving NMF / Difficulties in NMF. Geometric interpretation and ill-posedness. NMF assumes the data is well described by a simplicial convex cone $\mathcal{C}_W$ generated by the columns of $W$: $\mathcal{C}_W = \left\{ \sum_{k=1}^{K} \lambda_k w_k \,;\; \lambda_k \ge 0 \right\}$. Problem: which $\mathcal{C}_W$? Several such cones can enclose the same data points $v_i$, so one needs to impose constraints on the set of possible solutions to select the most "useful" ones.
