Robust nonnegative matrix factorisation with the β-divergence and applications in imaging
Cédric Févotte
Institut de Recherche en Informatique de Toulouse
Imaging & Machine Learning, Institut Henri Poincaré, April 2019
Outline
◮ Generalities: matrix factorisation models; nonnegative matrix factorisation (NMF)
◮ Optimisation for NMF: measures of fit; majorisation-minimisation
◮ Applications in imaging: hyperspectral unmixing in remote sensing; factor analysis in dynamic PET
Data is often available in matrix form. Examples:
◮ features × samples: each entry is a coefficient
◮ movies × users: each entry is a movie rating
◮ words × text documents: each entry is a word count
◮ frequencies × time: each entry is a Fourier coefficient
data X ≈ dictionary W × activations H
Known under many names: dictionary learning, low-rank approximation, factor analysis, latent semantic analysis.
Matrix factorisation is used:
◮ for dimensionality reduction (coding, low-dimensional embedding)
◮ for unmixing (source separation, latent topic discovery)
◮ for interpolation (collaborative filtering, image inpainting)
Nonnegative matrix factorisation (NMF): V ≈ WH, where V has F features (rows) and N samples (columns), and W contains K patterns.
◮ data V and factors W, H have nonnegative entries.
◮ nonnegativity of W ensures interpretability of the dictionary, because the patterns w_k and the samples v_n belong to the same space.
◮ nonnegativity of H tends to produce part-based representations, because subtractive combinations are forbidden.
Early work by Paatero and Tapper (1994); landmark Nature paper by Lee and Seung (1999).
(Lee and Seung, 1999; Hofmann, 1999)
Encyclopedia entry 'Constitution of the United States', represented by its word counts: president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49).
The entry is approximated as v_n ≈ W h_n, a nonnegative combination of learnt semantic patterns (topics) such as: {court, government, council, culture, supreme, constitutional, rights, justice}, {president, served, governor, secretary, senate, congress, presidential, elected}, {flowers, leaves, plant, perennial, flower, plants, growing, annual}, {disease, behaviour, glands, contact, symptoms, skin, pain, infection}.
[Figure reproduced from (Lee and Seung, 1999)]
(Smaragdis and Brown, 2003)
[Figure: input music passage spectrogram (frequency in Hz vs time in sec) and its NMF decomposition into 4 components; reproduced from (Smaragdis, 2013)]
(Berry, Browne, Langville, Pauca, and Plemmons, 2007)
[Figure: hyperspectral imaging illustration, reproduced from (Bioucas-Dias et al., 2012)]
Outline
◮ Generalities: matrix factorisation models; nonnegative matrix factorisation (NMF)
◮ Optimisation for NMF: measures of fit; majorisation-minimisation
◮ Applications in imaging: hyperspectral unmixing in remote sensing; factor analysis in dynamic PET
Minimise a measure of fit between V and WH, subject to nonnegativity:
min_{W,H ≥ 0} D(V|WH) = Σ_{f,n} d([V]_{fn} | [WH]_{fn}),
where d(x|y) is a scalar cost function, e.g.,
◮ squared Euclidean distance (Paatero and Tapper, 1994; Lee and Seung, 2001)
◮ Kullback-Leibler divergence (Lee and Seung, 1999; Finesso and Spreij, 2006)
◮ Itakura-Saito divergence (Févotte, Bertin, and Durrieu, 2009)
◮ α-divergence (Cichocki et al., 2008)
◮ β-divergence (Cichocki et al., 2006; Févotte and Idier, 2011)
◮ Bregman divergences (Dhillon and Sra, 2005)
◮ and more in (Yang and Oja, 2011)
Regularisation terms are often added to D(V|WH) to promote sparsity, smoothness, dynamics, etc. The problem is nonconvex.
◮ Let V ∼ p(V|WH) such that
  ◮ E[V|WH] = WH
  ◮ p(V|WH) = Π_{f,n} p(v_{fn} | [WH]_{fn})
◮ then the following correspondences apply, with D(V|WH) = −log p(V|WH) + cst:

data support          | distribution/noise   | divergence        | examples
real-valued           | additive Gaussian    | squared Euclidean | many
integer               | multinomial⋆         | weighted KL       | word counts
integer               | Poisson              | generalised KL    | photon counts
nonnegative           | multiplicative Gamma | Itakura-Saito     | spectrogram
generally nonnegative | Tweedie              | β-divergence      | generalises above models

⋆ conditional independence over f does not apply
A popular measure of fit in NMF (Basu et al., 1998; Cichocki and Amari, 2010):

d_β(x|y) :=
  (1/(β(β−1))) (x^β + (β−1) y^β − β x y^(β−1))   for β ∈ ℝ \ {0, 1}
  x log(x/y) + (y − x)                           for β = 1
  x/y − log(x/y) − 1                             for β = 0

Special cases:
◮ squared Euclidean distance (β = 2)
◮ generalised Kullback-Leibler (KL) divergence (β = 1)
◮ Itakura-Saito (IS) divergence (β = 0)

Properties:
◮ homogeneity: d_β(λx|λy) = λ^β d_β(x|y)
◮ d_β(x|y) is a convex function of y for 1 ≤ β ≤ 2
◮ d_β is a Bregman divergence
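As a concrete illustration, here is a minimal NumPy sketch of d_β summed over a matrix (the function name and signature are mine, not from the talk; it assumes strictly positive entries when β ≤ 1):

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Elementwise beta-divergence d_beta(x|y), summed over all entries.

    Special cases: beta=2 (squared Euclidean), beta=1 (generalised KL),
    beta=0 (Itakura-Saito). Assumes positive entries for beta <= 1.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if beta == 1:  # generalised Kullback-Leibler
        return np.sum(x * np.log(x / y) + (y - x))
    if beta == 0:  # Itakura-Saito
        return np.sum(x / y - np.log(x / y) - 1)
    return np.sum((x**beta + (beta - 1) * y**beta
                   - beta * x * y**(beta - 1)) / (beta * (beta - 1)))
```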
[Figure: d_β(x = 1|y) as a function of y, plotted successively for β = 2 (Euc), β = 1 (KL), β = 0 (IS), β = −1 and β = 3]
◮ Block-coordinate update of H given W^(i−1), then of W given H^(i).
◮ Updates of W and H are equivalent by transposition: V ≈ WH ⇔ Vᵀ ≈ HᵀWᵀ
◮ The objective function is separable in the columns of H (and the rows of W): D(V|WH) = Σ_n D(v_n | Wh_n)
◮ We are essentially left with nonnegative linear regression: min_{h ≥ 0} C(h) := D(v|Wh)
Numerous references in the image restoration literature, e.g., (Richardson, 1972; Lucy, 1974; Daube-Witherspoon and Muehllehner, 1986; De Pierro, 1993).
Block-descent algorithm for a nonconvex problem: initialisation is an issue.
Majorisation-minimisation (MM): build G(h|h̃) such that G(h|h̃) ≥ C(h) and G(h̃|h̃) = C(h̃), then optimise (iteratively) G(h|h̃) instead of C(h).
[Figure: the objective function C(h) together with the successive auxiliary functions G(h|h^(0)), G(h|h^(1)), G(h|h^(2)), ...; minimising each auxiliary function produces the iterates h^(0), h^(1), h^(2), h^(3), ..., which descend towards a minimiser h*]
◮ Finding a good & workable local majorisation is the crucial point.
◮ Treating convex and concave terms separately with Jensen and tangent inequalities usually works. E.g., for the IS divergence: C_IS(h) = Σ_f [ v_f / (Σ_k w_fk h_k) + log(Σ_k w_fk h_k) ] + cst, where the first term is convex and the second concave in h.
◮ In most cases, this leads to nonnegativity-preserving multiplicative algorithms:
h_k = h̃_k [ ∇⁻_{h_k} C(h̃) / ∇⁺_{h_k} C(h̃) ]^γ
◮ ∇_{h_k} C(h) = ∇⁺_{h_k} C(h) − ∇⁻_{h_k} C(h), where the two summands are nonnegative.
◮ if ∇_{h_k} C(h̃) > 0, the ratio of summands is < 1 and h_k moves left (decreases).
◮ γ is a divergence-specific scalar exponent.
◮ Details in (Févotte and Idier, 2011; Yang and Oja, 2011; Zhao and Tan, 2018)
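To make the recipe concrete, here is a bare-bones NumPy sketch of the resulting multiplicative updates for the β-divergence (naming, initialisation and iteration count are my own choices; see Févotte and Idier (2011) for the actual derivation and convergence analysis):

```python
import numpy as np

def gamma_exponent(beta):
    """Divergence-specific exponent ensuring monotonic descent
    (Fevotte and Idier, 2011)."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0  # 1 <= beta <= 2

def nmf_beta(V, K, beta=1.0, n_iter=200, seed=0):
    """Sketch of multiplicative-update NMF for the beta-divergence.

    No regularisation, no stopping criterion, naive random
    initialisation. Assumes V has positive entries.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, N)) + 1e-3
    g = gamma_exponent(beta)
    for _ in range(n_iter):
        WH = W @ H
        # H <- H .* (grad^- / grad^+)^gamma
        H *= (W.T @ (WH ** (beta - 2) * V) / (W.T @ WH ** (beta - 1))) ** g
        WH = W @ H
        # same update for W, by transposition
        W *= ((WH ** (beta - 2) * V) @ H.T / (WH ** (beta - 1) @ H.T)) ** g
    return W, H
```

For β = 2 this reduces to the classical Euclidean updates of Lee and Seung (2001), and for β = 1 to the generalised-KL updates.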
Outline
◮ Generalities: matrix factorisation models; nonnegative matrix factorisation (NMF)
◮ Optimisation for NMF: measures of fit; majorisation-minimisation
◮ Applications in imaging: hyperspectral unmixing in remote sensing; factor analysis in dynamic PET
(Févotte and Dobigeon, 2015)
◮ Data: two unfolded hyperspectral cubes, F ≈ 150, N = 50 × 50:
  ◮ Aviris instrument over Moffett Field (CA): lake, soil & vegetation.
  ◮ Hyspex/Madonna instrument over Villelongue (FR): forested area.
◮ A percentage p of the pixels is randomly removed.
◮ W and H are estimated with K = 3 (≈ ground truth) and various values of β.
◮ Missing pixels are reconstructed from V̂ = WH.
◮ Evaluation using the average spectral angle mapper (aSAM):
aSAM(V) = (1/N) Σ_{n=1}^{N} acos( ⟨v_n, v̂_n⟩ / (‖v_n‖ ‖v̂_n‖) )
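For reference, a small sketch of this criterion (the helper name is mine; pixels are assumed stored as columns of V and V_hat):

```python
import numpy as np

def asam(V, V_hat):
    """Average spectral angle mapper between observed pixels v_n
    (columns of V) and reconstructions v_hat_n (columns of V_hat)."""
    cos = np.sum(V * V_hat, axis=0) / (
        np.linalg.norm(V, axis=0) * np.linalg.norm(V_hat, axis=0))
    # clip guards against round-off pushing |cos| slightly above 1
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))
```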
(Févotte and Dobigeon, 2015)
[Figure: aSAM as a function of β for the Moffett data (p = 0.25, 0.5, 0.75) and the Madonna data (p = 0.25, 0.5, 0.75)]
Recommended value β ≈ 1.5 (a compromise between Poisson and additive Gaussian noise).
(Févotte and Dobigeon, 2015)
◮ Variants of the linear mixing model account for "non-linear" effects: v_n ≈ Wh_n + r_n
◮ Often, r_n has a parametric form, e.g., a linear combination of the quadratic components {w_k ⊙ w_j}_{kj} (Nascimento and Bioucas-Dias, 2009; Fan et al., 2009; Altmann et al., 2012).
◮ Nonlinear effects usually affect only a few pixels.
◮ We treat them as non-parametric sparse outliers:
min_{W,H,R ≥ 0} D_β(V | WH + R) + λ ‖R‖_{2,1}
where ‖R‖_{2,1} = Σ_{n=1}^{N} ‖r_n‖_2 induces sparsity at the group level.
◮ Optimised with majorisation-minimisation (the objective is sketched in code below).
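As a sanity check on the notation, a NumPy sketch of this objective (it reuses the hypothetical beta_divergence helper sketched earlier; the function name is mine):

```python
import numpy as np

def robust_nmf_objective(V, W, H, R, beta, lam):
    """D_beta(V | WH + R) + lam * ||R||_{2,1}, where the group-sparse
    penalty sums the Euclidean norms of the outlier columns r_n."""
    penalty = np.sum(np.linalg.norm(R, axis=0))  # ||R||_{2,1}
    return beta_divergence(V, W @ H + R, beta) + lam * penalty
```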
(Févotte and Dobigeon, 2015)
Moffett Field data
[Figure reproduced from (Dobigeon, 2007)]
(Févotte and Dobigeon, 2015)
Unmixing results: spectral endmembers & activation maps
[Figure: endmembers (red: β = 1, black: β = 2) and activation maps (β = 1) for the Vegetation, Water and Soil components]
The outlier term captures specific water/soil interactions.
(Févotte and Dobigeon, 2015)
Villelongue/Madonna data (forested area)
[Figure: 50 × 50 pixel scene]
(Févotte and Dobigeon, 2015)
Unmixing results: spectral endmembers & activation maps
[Figure: endmembers (red: β = 1, black: β = 2) and activation maps (β = 1), including Chestnut tree and Oak tree components]
The outlier term seems to capture patterns due to sensor miscalibration.
Outline
◮ Generalities: matrix factorisation models; nonnegative matrix factorisation (NMF)
◮ Optimisation for NMF: measures of fit; majorisation-minimisation
◮ Applications in imaging: hyperspectral unmixing in remote sensing; factor analysis in dynamic PET
(Cavalcanti, Oberlin, Dobigeon, Févotte, Stute, Ribeiro, and Tauber, 2019)
◮ 3D functional imaging.
◮ Observes the temporal evolution of brain activity after injection of a radiotracer (a biomarker of a specific compound).
◮ v_n is the time-activity curve (TAC) in voxel n.
◮ Neuroimaging: each voxel mixes contributions of 4 TAC signatures (specific-binding gray matter, blood, non-specific gray matter, white matter).
[Figure: concentration of radiotracer over time for each signature; dynamic PET voxel decomposition reproduced from (Cavalcanti, 2018)]
(Cavalcanti, Oberlin, Dobigeon, Févotte, Stute, Ribeiro, and Tauber, 2019)
Mixing model
◮ the specific-binding TAC signature varies in space:
v_n ≈ [w_1 + δ_n] h_1n + Σ_{k=2}^{K} w_k h_kn ≈ [w_1 + D b_n] h_1n + Σ_{k=2}^{K} w_k h_kn ≈ Wh_n + h_1n D b_n
◮ D is fixed and pre-trained using labeled or simulated data.
Estimation:
min_{W,H,B ≥ 0} D_β(V | WH + (1 h_1) ⊙ DB) + λ ‖B‖_{2,1}
where h_1 denotes the first row of H. Optimised with majorisation-minimisation. The outlier term is unpacked in the sketch below.
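To unpack the notation (1 h_1) ⊙ DB, a small sketch (the helper is hypothetical, mine rather than from the paper): the outlier term is DB with column n scaled by h_1n:

```python
import numpy as np

def pet_outlier_term(D, B, H):
    """Column n of the result equals h_{1n} * D @ b_n, i.e. (1 h_1) o (DB)."""
    return (D @ B) * H[0, :]  # broadcasting scales column n by h_1n
```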
Unmixing results
◮ Real dynamic PET image of a stroke subject injected with a tracer for neuroinflammation.
◮ MRI ground-truth region of the stroke.
[Fig.: specific-binding activation (h_1n) and variability maps (‖b_n‖_2) in three different planes and for three values of β]
Conclusions:
◮ NMF can efficiently unmix composite data in imaging problems.
◮ Application-specific variants have been proposed.
◮ The β-divergence can be adjusted to the statistics of the noise.
◮ Majorisation-minimisation works well in this setting.

ERC-funded postdoc positions in machine learning & signal processing:
◮ Multimodal data processing for multimedia artistic creation (with Tim Van de Cruys)
◮ Learning with low-rank models (with Emmanuel Soubies)
◮ Bayesian deep learning (with Nicolas Dobigeon)
http://projectfactory.irit.fr/
Plenary speakers: Yuejie Chi (CMU), Pier Luigi Dragotti (ICL), Emilie Chouzenoux (Univ. Paris-Est), Bhaskar Rao (UC San Diego), Mark Davenport (Georgia Tech), Simon Thorpe (CNRS), Monika Dörfler, Lenka Zdeborová (CNRS).
Special talk by Michael I. Jordan (UC Berkeley).
http://spars-workshop.org/
References

Y. Altmann, A. Halimi, N. Dobigeon, and J.-Y. Tourneret. Supervised nonlinear spectral unmixing using a post-nonlinear mixing model for hyperspectral imagery. IEEE Transactions on Image Processing, 21(6):3017–3025, June 2012.

A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, Sep. 1998.

M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1):155–173, Sep. 2007.

J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379, 2012.

Y. C. Cavalcanti, T. Oberlin, N. Dobigeon, C. Févotte, S. Stute, M. Ribeiro, and C. Tauber. Factor analysis of dynamic PET images: beyond Gaussian noise. IEEE Transactions on Medical Imaging, 2019. ISSN 0278-0062. doi: 10.1109/TMI.2019.2906828.

A. Cichocki and S. Amari. Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy, 12(6):1532–1568, June 2010.

A. Cichocki, R. Zdunek, and S. Amari. Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In Proc. International Conference on Independent Component Analysis and Blind Signal Separation (ICA), pages 32–39, Charleston SC, USA, Mar. 2006.

A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi. Non-negative matrix factorization with α-divergence. Pattern Recognition Letters, 29(9):1433–1440, July 2008.

M. E. Daube-Witherspoon and G. Muehllehner. An iterative image space reconstruction algorithm suitable for volume ECT. IEEE Transactions on Medical Imaging, 5(5):61–66, 1986. doi: 10.1109/TMI.1986.4307748.

I. S. Dhillon and S. Sra. Generalized nonnegative matrix approximations with Bregman divergences. In Advances in Neural Information Processing Systems (NIPS), 2005.

N. Dobigeon, J.-Y. Tourneret, C. Richard, J. C. M. Bermudez, S. McLaughlin, and A. O. Hero. Nonlinear unmixing of hyperspectral images: Models and algorithms. IEEE Signal Processing Magazine, 31(1):89–94, Jan. 2014.

W. Fan, B. Hu, J. Miller, and M. Li. Comparative study between a new nonlinear model and common linear model for analysing laboratory simulated-forest hyperspectral data. International Journal of Remote Sensing, 30(11):2951–2962, June 2009.

C. Févotte and N. Dobigeon. Nonlinear hyperspectral unmixing with robust nonnegative matrix factorization. IEEE Transactions on Image Processing, 24(12):4810–4819, Dec. 2015. doi: 10.1109/TIP.2015.2468177. URL https://www.irit.fr/~Cedric.Fevotte/publications/journals/tip2015.pdf.

C. Févotte and J. Idier. Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9):2421–2456, Sep. 2011. doi: 10.1162/NECO_a_00168. URL https://www.irit.fr/~Cedric.Fevotte/publications/journals/neco11.pdf.

C. Févotte, N. Bertin, and J.-L. Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation, 21(3):793–830, Mar. 2009. doi: 10.1162/neco.2008.04-08-771. URL https://www.irit.fr/~Cedric.Fevotte/publications/journals/neco09_is-nmf.pdf.

L. Finesso and P. Spreij. Nonnegative matrix factorization and I-divergence alternating minimization. Linear Algebra and its Applications, 416:270–287, 2006.

T. Hofmann. Probabilistic latent semantic indexing. In Proc. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1999. URL http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf.

D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrix factorization. Nature, 401:788–791, 1999.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562, 2001.

L. B. Lucy. An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79:745–754, 1974. doi: 10.1086/111605.

P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.

P. Smaragdis. Keynote presentation, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013. URL http://web.engr.illinois.edu/~paris/pubs/smaragdis-waspaa2013keynote.pdf.

P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003.

Z. Yang and E. Oja. Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Transactions on Neural Networks, 22:1878–1891, Dec. 2011.

R. Zhao and V. Y. F. Tan. A unified convergence analysis of the multiplicative update algorithm for regularized nonnegative matrix factorization. IEEE Transactions on Signal Processing, 66(1):129–138, Jan. 2018. ISSN 1053-587X. doi: 10.1109/TSP.2017.2757914.
[Backup figure: red pixels indicate negative values]
[Backup figure: experiment reproduced from (Lee and Seung, 1999)]