An introduction to Nonnegative Matrix Factorisation Slim ESSID Telecom ParisTech June 2015 Slim ESSID (Telecom ParisTech) Introduction to NMF TPT - UPS – June 2015 1 / 53

Credits
Some illustrations, slides and demos are reproduced courtesy of:
• A. Ozerov,
• C. Févotte,
• N. Seichepine,
• R. Hennequin,
• F. Vallet,
• A. Liutkus.

Outline
◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
◮ Applications
◮ Conclusion

Introduction, Motivation: Explaining data by factorisation
General formulation: $V \approx W H$, where $V$ is $F \times N$, $W$ is $F \times K$ and $H$ is $K \times N$. Each column of $V$ is approximated as $v_n \approx \sum_{k=1}^{K} h_{kn} w_k$.
• $V$: data matrix;
• $W$: "explanatory variables", "basis", "dictionary", "patterns", "topics";
• $H$: "regressors", "activation coefficients", "expansion coefficients".
Illustration by C. Févotte
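The column-wise reading of the factorisation can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly drawn nonnegative factors; the sizes `F`, `N`, `K` and the random seed are arbitrary choices made for illustration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
F, N, K = 6, 5, 3                # illustrative sizes (assumption)

W = rng.random((F, K))           # nonnegative dictionary, F x K
H = rng.random((K, N))           # nonnegative activations, K x N
V_hat = W @ H                    # model of the data matrix, F x N

# Column n of WH is a nonnegative combination of the K dictionary columns.
n = 2
v_n = sum(H[k, n] * W[:, k] for k in range(K))
print(np.allclose(v_n, V_hat[:, n]))
```

Because every entry of `W` and `H` is nonnegative, every entry of `V_hat` is nonnegative as well: the model can only add dictionary columns, never subtract them.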

Introduction, Motivation: Data is often nonnegative by nature
• pixel intensities;
• amplitude spectra;
• occurrence counts;
• food or energy consumption;
• user scores;
• stock market values;
• ...
For the sake of interpretability of the results, optimal processing of nonnegative data may call for processing under nonnegativity constraints.
(Slide adapted from Févotte, 2012.)

Introduction, Motivation: The Nonnegative Matrix Factorisation model
NMF provides an unsupervised linear representation of the data: $V \approx WH$, where
− $W = [w_{fk}]$ s.t. $w_{fk} \geq 0$, and
− $H = [h_{kn}]$ s.t. $h_{kn} \geq 0$.
Illustration by N. Seichepine

Introduction, Motivation: Explaining face images by NMF
Image example: 49 images among 2429 from MIT's CBCL face dataset.
(Slide adapted from Févotte, 2012.)

Introduction, Motivation: Explaining face images by NMF
Method: vectorised images ($V$) $\approx$ facial features ($W$) $\times$ importance of features in each image ($H$).

Introduction, Motivation: NMF outputs
Image example. Illustration by C. Févotte

Introduction, Motivation: Notations I
• $V$: the $F \times N$ data matrix:
  − $F$ features (rows),
  − $N$ observations/examples/feature vectors (columns);
• $v_n = (v_{1n}, \cdots, v_{Fn})^T$: the $n$-th feature vector observation among a collection of $N$ observations $v_1, \cdots, v_N$;
• $v_n$ is a column vector in $\mathbb{R}_+^F$; $v_n^T$ is a row vector;
• $W$: the $F \times K$ dictionary matrix:
  − $w_{fk}$ is one of its coefficients,
  − $w_k$ is a dictionary/basis vector among $K$ elements;

Introduction, Motivation: Notations II
• $H$: the $K \times N$ activation/expansion matrix:
  − $h_n$: the column vector of activation coefficients for observation $v_n$: $v_n \approx \sum_{k=1}^{K} h_{kn} w_k$;
  − $h_{k:}$: the row vector of activation coefficients relating to basis vector $w_k$.

NMF models
◮ Introduction
◮ NMF models
  – Cost functions
  – Weighted NMF schemes
◮ Algorithms for solving NMF
◮ Applications
◮ Conclusion

NMF models, Cost functions: NMF optimization criteria
The NMF approximation $V \approx WH$ is usually obtained through:
$\min_{W, H \geq 0} D(V \mid WH)$,
where $D(V \mid \hat{V})$ is a separable matrix divergence:
$D(V \mid \hat{V}) = \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn} \mid \hat{v}_{fn})$,
and $d(x \mid y)$, defined for all $x, y \geq 0$, is a scalar divergence such that:
• $d(x \mid y)$ is continuous in $x$ and $y$;
• $d(x \mid y) \geq 0$ for all $x, y \geq 0$;
• $d(x \mid y) = 0$ if and only if $x = y$.
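Separability means the matrix divergence is just an entry-wise sum of a scalar divergence. Below is a minimal sketch, assuming NumPy; the helper name `separable_divergence` is hypothetical, and the squared difference is used only as a stand-in scalar divergence that satisfies the three listed properties.

```python
import numpy as np

def separable_divergence(V, V_hat, d):
    """Sum a scalar divergence d over all (f, n) entries (hypothetical helper)."""
    return float(np.sum(d(V, V_hat)))

def squared_difference(x, y):
    """Stand-in scalar divergence: nonnegative, zero iff x == y."""
    return (x - y) ** 2

V = np.array([[1.0, 2.0],
              [3.0, 4.0]])
V_hat = np.array([[1.0, 2.5],
                  [2.0, 4.0]])

cost = separable_divergence(V, V_hat, squared_difference)
print(cost)
```

Only the two mismatched entries contribute to the cost here, and the cost of comparing `V` with itself is zero.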

NMF models, Cost functions: Popular (scalar) divergences
• Euclidean (EUC) distance (Lee and Seung, 1999): $d_{EUC}(x \mid y) = (x - y)^2$
• Kullback-Leibler (KL) divergence (Lee and Seung, 1999): $d_{KL}(x \mid y) = x \log \frac{x}{y} - x + y$
• Itakura-Saito (IS) divergence (Févotte et al., 2009): $d_{IS}(x \mid y) = \frac{x}{y} - \log \frac{x}{y} - 1$
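The three scalar divergences translate directly into vectorised functions. This is a minimal sketch, assuming NumPy and strictly positive arguments (the boundary cases $x = 0$ or $y = 0$ are deliberately not handled); the function names are illustrative.

```python
import numpy as np

def d_euc(x, y):
    """Squared Euclidean distance."""
    return (x - y) ** 2

def d_kl(x, y):
    """Generalised Kullback-Leibler divergence, assuming x, y > 0."""
    return x * np.log(x / y) - x + y

def d_is(x, y):
    """Itakura-Saito divergence, assuming x, y > 0."""
    return x / y - np.log(x / y) - 1.0

x = np.array([0.5, 1.0, 2.0])
y = np.array([1.0, 1.0, 1.0])
for d in (d_euc, d_kl, d_is):
    print(d.__name__, d(x, y))
```

Each function returns zero where `x` equals `y` (the middle entry) and a positive value elsewhere, in line with the divergence properties.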

NMF models, Cost functions: Convexity properties
Divergence d(x|y)    EUC    KL     IS
Convex on x          yes    yes    yes
Convex on y          yes    yes    no
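The convexity entries can be verified from second derivatives, taking $d_{EUC}(x \mid y) = (x-y)^2$, $d_{KL}(x \mid y) = x \log\frac{x}{y} - x + y$ and $d_{IS}(x \mid y) = \frac{x}{y} - \log\frac{x}{y} - 1$ for $x, y > 0$. The short derivation below is a worked check rather than part of the original slides.

```latex
\begin{align*}
\frac{\partial^2 d_{EUC}}{\partial x^2} &= 2 > 0, &
\frac{\partial^2 d_{EUC}}{\partial y^2} &= 2 > 0, \\
\frac{\partial^2 d_{KL}}{\partial x^2} &= \frac{1}{x} > 0, &
\frac{\partial^2 d_{KL}}{\partial y^2} &= \frac{x}{y^2} > 0, \\
\frac{\partial^2 d_{IS}}{\partial x^2} &= \frac{1}{x^2} > 0, &
\frac{\partial^2 d_{IS}}{\partial y^2} &= \frac{2x}{y^3} - \frac{1}{y^2} = \frac{2x - y}{y^3}.
\end{align*}
```

The last expression is negative whenever $y > 2x$, so $d_{IS}(x \mid y)$ is convex in $y$ only on $(0, 2x]$ and concave beyond, which is why the IS column reads "no" for convexity on $y$.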

NMF models, Cost functions: Scale invariance properties
$d_{EUC}(\lambda x \mid \lambda y) = \lambda^2 \, d_{EUC}(x \mid y)$
$d_{KL}(\lambda x \mid \lambda y) = \lambda \, d_{KL}(x \mid y)$
$d_{IS}(\lambda x \mid \lambda y) = d_{IS}(x \mid y)$
The IS divergence is scale-invariant: low-energy and high-energy coefficients carry the same relative weight in the cost, so it provides higher accuracy in the representation of data with large dynamic range (e.g. audio spectra).
(Slide adapted from Févotte, 2012.)
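The three scaling laws are easy to confirm numerically. This is a minimal sketch, assuming NumPy; the values of `x`, `y` and `lam` are arbitrary positive numbers chosen for illustration.

```python
import numpy as np

def d_euc(x, y):
    return (x - y) ** 2

def d_kl(x, y):
    return x * np.log(x / y) - x + y

def d_is(x, y):
    return x / y - np.log(x / y) - 1.0

x, y, lam = 3.0, 2.0, 10.0   # arbitrary positive test values (assumption)

# Ratio of the divergence after scaling both arguments by lam.
ratio_euc = d_euc(lam * x, lam * y) / d_euc(x, y)   # expected lam ** 2
ratio_kl = d_kl(lam * x, lam * y) / d_kl(x, y)      # expected lam
ratio_is = d_is(lam * x, lam * y) / d_is(x, y)      # expected 1
print(ratio_euc, ratio_kl, ratio_is)
```

With the Euclidean cost, an error on a coefficient ten times larger counts a hundred times more; with IS, the same relative error costs the same at any scale.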

NMF models, Weighted NMF schemes: Weighted NMF
Conventional NMF optimization criterion:
$\min_{W, H \geq 0} \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn} \mid \hat{v}_{fn})$.
Weighted NMF optimization criterion:
$\min_{W, H \geq 0} \sum_{f=1}^{F} \sum_{n=1}^{N} b_{fn} \, d(v_{fn} \mid \hat{v}_{fn})$,
where $b_{fn}$ ($f = 1, \ldots, F$, $n = 1, \ldots, N$) are some nonnegative weights representing the contribution of data point $v_{fn}$ to NMF learning.
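The weighted criterion only changes how entry-wise divergences are summed. Below is a minimal sketch, assuming NumPy, a squared-difference scalar divergence, and a hypothetical helper `weighted_divergence`; entries with zero weight are skipped explicitly so that placeholders such as NaN in `V` do not contaminate the cost.

```python
import numpy as np

def d_euc(x, y):
    return (x - y) ** 2

def weighted_divergence(V, V_hat, B, d):
    """Sum of b_fn * d(v_fn | v_hat_fn); zero-weight entries are ignored."""
    contrib = np.where(B > 0, B * d(V, V_hat), 0.0)
    return float(np.sum(contrib))

V = np.array([[1.0, np.nan],     # NaN marks an entry with no usable value
              [3.0, 4.0]])
V_hat = np.array([[1.5, 2.0],
                  [3.0, 3.0]])
B = np.array([[1.0, 0.0],        # weight 0 removes the entry from the cost
              [1.0, 2.0]])       # weight 2 doubles the entry's contribution

cost = weighted_divergence(V, V_hat, B, d_euc)
print(cost)
```

Setting all weights to one recovers the conventional criterion, while a binary `B` restricts learning to a subset of the entries.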

NMF models, Weighted NMF schemes: Weighted NMF application example I
Learning from partial observations (e.g., for image inpainting as in (Mairal et al., 2010)):
• observed value: $b_{fn} = 1$;
• missing value: $b_{fn} = 0$.

NMF models, Weighted NMF schemes: Weighted NMF application example II
Face feature extraction (example and figure from (Blondel et al., 2008)).
Figure panels: data $V$; weights $B = \{b_{fn}\}_{f,n}$, shown for image-centered weights and for face-centered weights.

Algorithms for solving NMF
◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
  – Preliminaries
  – Difficulties in NMF
  – Multiplicative update rules
◮ Applications
◮ Conclusion

Algorithms for solving NMF, Preliminaries: Optimization problem
An efficient solution of the NMF optimization problem
$\min_{W, H \geq 0} D(V \mid WH) \Leftrightarrow \min_{\theta} C(\theta)$; $C(\theta) \stackrel{\text{def}}{=} D(V \mid WH)$,
where $\theta \stackrel{\text{def}}{=} \{W, H\}$ denotes the NMF parameters, must cope with the following difficulties:
• the nonnegativity constraints must be taken into account;
• the solution is not unique ...

Algorithms for solving NMF, Difficulties in NMF: NMF is ill-posed
The solution is not unique. Given $V = WH$; $W \geq 0$, $H \geq 0$; any invertible matrix $Q$ such that:
• $WQ \geq 0$
• $Q^{-1} H \geq 0$
provides an alternative factorisation $V = \tilde{W} \tilde{H} = (WQ)(Q^{-1}H)$.
In particular, $Q$ can be any nonnegative generalised permutation matrix; e.g., in $\mathbb{R}^3$:
$Q = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 3 & 0 \\ 1 & 0 & 0 \end{pmatrix}$
This case is not so problematic: it merely accounts for scaling and permutation of the basis vectors $w_k$.
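The non-uniqueness can be reproduced with the generalised permutation matrix from the slide. This is a minimal sketch, assuming NumPy and randomly drawn nonnegative factors with $K = 3$; the matrix sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
W = rng.random((4, 3))                  # F = 4, K = 3 (illustrative sizes)
H = rng.random((3, 5))                  # K = 3, N = 5

# Nonnegative generalised permutation matrix from the slide.
Q = np.array([[0.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0]])
Q_inv = np.linalg.inv(Q)

W_alt = W @ Q                           # columns of W, permuted and rescaled
H_alt = Q_inv @ H                       # rows of H, permuted and rescaled inversely

print(np.allclose(W @ H, W_alt @ H_alt))            # same product
print((W_alt >= 0).all(), (H_alt >= -1e-12).all())  # both factors stay nonnegative
```

Here the first column of `W_alt` is the third column of `W`, and the third column of `W_alt` is twice the first column of `W`: the two factorisations differ only by the ordering and scaling of the basis vectors.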

Algorithms for solving NMF, Difficulties in NMF: Geometric interpretation and ill-posedness
NMF assumes the data is well described by a simplicial convex cone $C_w$ generated by the columns of $W$:
$C_w = \left\{ \sum_{k=1}^{K} \lambda_k w_k \; ; \; \lambda_k \geq 0 \right\}$
(Figure: two different cones $C_w$, each spanned by $w_1$ and $w_2$ and each containing the same data points $v_i$.)
Problem: which $C_w$?
→ Need to impose constraints on the set of possible solutions to select the most "useful" ones.
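The ambiguity in the choice of cone can be illustrated in two dimensions. This is a minimal sketch, assuming NumPy and dictionaries with linearly independent columns (so that the coefficients of a point are unique); the helper `in_cone` and the toy data are hypothetical.

```python
import numpy as np

def in_cone(W, v, tol=1e-9):
    """True if v = W @ lam for some lam >= 0 (W must have independent columns)."""
    lam, *_ = np.linalg.lstsq(W, v, rcond=None)
    return bool(np.allclose(W @ lam, v, atol=1e-8) and (lam >= -tol).all())

# Toy data: columns are the observations v_i.
V = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

W_wide = np.array([[1.0, 0.0],    # cone = whole nonnegative quadrant
                   [0.0, 1.0]])
W_tight = np.array([[2.0, 1.0],   # narrower cone spanned by (2, 1) and (1, 2)
                    [1.0, 2.0]])

for name, W in (("wide", W_wide), ("tight", W_tight)):
    print(name, all(in_cone(W, V[:, i]) for i in range(V.shape[1])))

# A point covered by the wide cone but not by the tight one.
v_out = np.array([3.0, 1.0])
print(in_cone(W_wide, v_out), in_cone(W_tight, v_out))
```

Both nonnegative dictionaries represent the toy data exactly, so the data alone cannot decide between them; an additional criterion is needed to prefer one cone over the other.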
