Nonnegative matrix factorization and applications in audio signal processing
Cédric Févotte
Laboratoire Lagrange, Nice
Machine Learning Crash Course, Genova, June 2015
Outline
◮ Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
◮ Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF
Data often available in matrix form.
[Figure: features × samples matrix, entries are coefficients.]
[Figure: movies × users matrix, entries are movie ratings.]
[Figure: words × text documents matrix, entries are word counts.]
[Figure: frequencies × time matrix, entries are Fourier coefficients.]
data X ≈ dictionary W × activations H
Related paradigms: dictionary learning, low-rank approximation, factor analysis, latent semantic analysis.
◮ for dimensionality reduction (coding, low-dimensional embedding)
◮ for unmixing (source separation, latent topic discovery)
◮ for interpolation (collaborative filtering, image inpainting)
V ≈ WH, with F features, N samples, K patterns.
◮ data V and factors W, H have nonnegative entries.
◮ nonnegativity of W ensures interpretability of the dictionary, because patterns w_k and samples v_n belong to the same space.
◮ nonnegativity of H tends to produce part-based representations, because subtractive combinations are forbidden.
Early work by Paatero and Tapper (1994); landmark Nature paper by Lee and Seung (1999).
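As a concrete illustration of the model shapes (a minimal NumPy sketch, not from the slides; dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 100, 500, 8           # F features, N samples, K patterns

# Nonnegative dictionary W (F x K) and activations H (K x N)
W = rng.random((F, K))
H = rng.random((K, N))
V = W @ H                       # rank-K nonnegative data, V = W H exactly here

# Each sample v_n is an additive (subtraction-free) combination of the
# K dictionary patterns w_k, which live in the same space as v_n.
assert (W >= 0).all() and (H >= 0).all() and (V >= 0).all()
```

In practice V is given and W, H are estimated so that V ≈ WH only approximately.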
[Figure: red pixels indicate negative values.]
[Figure: experiment reproduced from (Lee and Seung, 1999).]
(Lee and Seung, 1999; Hofmann, 1999)
Encyclopedia entry: 'Constitution of the United States'
Top words by count: president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49).
[Figure: v_n ≈ W h_n; dictionary columns group semantically related words (government/courts, presidency/elections, plants/flowers, disease/symptoms); reproduced from (Lee and Seung, 1999).]
(Berry, Browne, Langville, Pauca, and Plemmons, 2007)
[Figure: hyperspectral unmixing example, reproduced from (Bioucas-Dias et al., 2012).]
(Smaragdis and Brown, 2003)
[Figure: spectrogram of an input music passage decomposed into 4 components, each with its frequency profile and time activation; reproduced from (Smaragdis, 2013).]
Minimise a measure of fit between V and WH, subject to nonnegativity:

  min_{W,H≥0} D(V|WH) = ∑_{f,n} d([V]_{fn} | [WH]_{fn}),

where d(x|y) is a scalar cost function, e.g.,
◮ squared Euclidean distance (Paatero and Tapper, 1994; Lee and Seung, 2001)
◮ Kullback-Leibler divergence (Lee and Seung, 1999; Finesso and Spreij, 2006)
◮ Itakura-Saito divergence (Févotte, Bertin, and Durrieu, 2009)
◮ α-divergence (Cichocki et al., 2008)
◮ β-divergence (Cichocki et al., 2006; Févotte and Idier, 2011)
◮ Bregman divergences (Dhillon and Sra, 2005)
◮ and more in (Yang and Oja, 2011)

Regularisation terms often added to D(V|WH) for sparsity, smoothness, dynamics, etc.
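For concreteness, the three classical scalar costs d(x|y) can be sketched as follows (hypothetical helper functions, using the conventions of Févotte and Idier, 2011, where the Euclidean cost carries a 1/2 factor):

```python
import numpy as np

def d_euc(x, y):
    """Squared Euclidean distance (beta = 2, up to the 1/2 convention)."""
    return 0.5 * (x - y) ** 2

def d_kl(x, y):
    """(Generalised) Kullback-Leibler divergence (beta = 1)."""
    return x * np.log(x / y) - x + y

def d_is(x, y):
    """Itakura-Saito divergence (beta = 0); scale-invariant: d(ax|ay) = d(x|y)."""
    return x / y - np.log(x / y) - 1

def fit(V, W, H, d):
    """Measure of fit D(V|WH) = sum over f,n of d([V]_fn | [WH]_fn)."""
    return d(V, W @ H).sum()
```

The scale invariance of the Itakura-Saito divergence is what makes it natural for audio spectra, whose entries span many orders of magnitude.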
◮ Block-coordinate update of H given W(i−1), then of W given H(i).
◮ Updates of W and H equivalent by transposition:

  V ≈ WH ⇔ V^T ≈ H^T W^T

◮ Objective function separable in the columns of H (or the rows of W):

  D(V|WH) = ∑_n D(v_n | W h_n)

◮ Essentially left with nonnegative linear regression:

  min_{h≥0} C(h) := D(v|Wh)

Numerous references in the image restoration literature, e.g., (Richardson, 1972; Lucy, 1974; Daube-Witherspoon and Muehllehner, 1986; De Pierro, 1993)
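For the squared Euclidean case, this per-column subproblem min_{h≥0} ||v − Wh||² is exactly nonnegative least squares, which SciPy solves directly (a sketch with synthetic data, not tied to the slides):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
F, K = 20, 4
W = rng.random((F, K))
h_true = np.array([0.0, 2.0, 0.0, 1.5])   # sparse nonnegative ground truth
v = W @ h_true

# min_{h >= 0} ||v - W h||_2, solved by an active-set NNLS algorithm
h, rnorm = nnls(W, v)
```

With noiseless data and a full-column-rank W, the nonnegative ground truth is recovered exactly (rnorm ≈ 0); NMF algorithms instead solve many such subproblems approximately, for all columns at once.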
Majorisation-minimisation (MM): build G(h|h̃) such that G(h|h̃) ≥ C(h) and G(h̃|h̃) = C(h̃). Optimise (iteratively) G(h|h̃) instead of C(h).

[Figure: objective function C(h) with the successive auxiliary functions G(h|h(0)), G(h|h(1)), G(h|h(2)), …; minimising each auxiliary function yields the next iterate, h(0) → h(1) → h(2) → h(3) → … → h*.]
◮ Finding a good & workable local majorisation is the crucial point.
◮ For most of the divergences mentioned, the Jensen and tangent inequalities are usually enough.
◮ In many cases, this leads to multiplicative algorithms such that

  h_k = h̃_k [ ∇⁻_{h_k} C(h̃) / ∇⁺_{h_k} C(h̃) ]^γ

  where
  ◮ ∇_{h_k} C(h) = ∇⁻_{h_k} C(h) − ∇⁺_{h_k} C(h), and the two summands are nonnegative,
  ◮ γ is a divergence-specific scalar exponent.
◮ More details about MM in (Lee and Seung, 2001; Févotte and Idier, 2011; Yang and Oja, 2011).
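For the generalised KL divergence the recipe above gives the well-known multiplicative updates of Lee and Seung (2001), with γ = 1. A minimal implementation sketch (not the authors' code; initialisation and iteration count are arbitrary):

```python
import numpy as np

def kl_nmf(V, K, n_iter=100, seed=0, eps=1e-12):
    """KL-NMF by multiplicative majorisation-minimisation updates.

    Each entry is multiplied by the ratio of the negative over the
    positive part of the gradient, which preserves nonnegativity and
    decreases D(V|WH) monotonically.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # H update: H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / (W @ H + eps))) / W.sum(axis=0)[:, None]
        # W update: W <- W * ((V / WH) H^T) / (1 H^T)
        W *= ((V / (W @ H + eps)) @ H.T) / H.sum(axis=1)[None, :]
    return W, H

def kl_div(V, WH):
    """Generalised KL divergence D(V|WH)."""
    return (V * np.log(V / WH) - V + WH).sum()
```

Note that the updates never subtract anything, so a factor initialised nonnegative stays nonnegative throughout.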
◮ Squared Euclidean distance is a common default choice.
◮ It underlies a Gaussian additive noise model such that v_{fn} = [WH]_{fn} + ε_{fn}, which can generate negative values – not very natural for nonnegative data.
◮ Many other options.

Select the right divergence (for a specific problem) by
◮ comparing performances, given ground-truth data.
◮ assessing the ability to predict missing/unseen data (interpolation, cross-validation).
◮ probabilistic modelling:

  D(V|WH) = − log p(V|WH) + cst
◮ Let V ∼ p(V|WH) such that E[V|WH] = WH.
◮ Then the following correspondences apply, with D(V|WH) = − log p(V|WH) + cst:

  data support           distribution/noise     divergence          examples
  real-valued            additive Gaussian      squared Euclidean   many
  integer                multinomial            Kullback-Leibler    word counts
  integer                Poisson                generalised KL      photon counts
  nonnegative            multiplicative Gamma   Itakura-Saito       spectral data
  generally nonnegative  Tweedie                β-divergence        generalises above models
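The Poisson row, for instance, can be checked numerically: the Poisson negative log-likelihood −log p(v|λ) differs from the generalised KL divergence d(v|λ) only by a term depending on v, not on λ = [WH]_{fn} (a small sketch):

```python
import math

def poisson_nll(v, lam):
    """Poisson negative log-likelihood: -log p(v | lam)."""
    return lam - v * math.log(lam) + math.lgamma(v + 1)

def gen_kl(v, lam):
    """Generalised Kullback-Leibler divergence d(v | lam)."""
    return v * math.log(v / lam) - v + lam

v = 7
# The gap is constant in lam, so maximum likelihood under Poisson noise
# is equivalent to minimum generalised-KL fit.
gaps = [poisson_nll(v, lam) - gen_kl(v, lam) for lam in (0.5, 1.0, 3.0, 10.0)]
```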
Figure: Three representations of data.
IS-NMF on power spectrogram with K = 8
[Figure: dictionary W (log scale), coefficients H, and reconstructed time-domain components for K = 1, …, 8.]
Pitch estimates: 65.0, 68.0, 61.0, 72.0 (true values: 61, 65, 68, 72)
KL-NMF on magnitude spectrogram with K = 8
[Figure: dictionary W (log scale), coefficients H, and reconstructed time-domain components for K = 1, …, 8.]
Pitch estimates: 65.2, 68.2, 61.0, 72.2, 56.2 (true values: 61, 65, 68, 72)
Louis Armstrong and His Hot Five
[Figure: log-power spectrogram and original time-domain waveform.]
Decomposition: original mono = accompaniment + brass + trombone + noise.
[Audio examples: original mono denoised; original denoised & upmixed to stereo.]
(Sun and Mazumder, 2013)
Y = [ full-band training samples | band-limited samples ]
[Figure: adapted from (Sun and Mazumder, 2013).]
(Sun and Mazumder, 2013)
AC/DC example: band-limited data (Back in Black), training data (Highway to Hell), bandwidth-extended result, ground truth.
Examples from http://statweb.stanford.edu/~dlsun/bandwidth.html, used with permission from the author.
(Ozerov and Févotte, 2010)
Multichannel NMF problem: sources S modelled by NMF (W, H), mixing system A, mixture X observed with noise; estimate W, H and A from X.
◮ Best scores on the underdetermined speech and music separation task at the Signal Separation Evaluation Campaign (SiSEC) 2008.
◮ IEEE Signal Processing Society 2014 Best Paper Award.
(Ozerov, Févotte, Blouet, and Durrieu, 2011)
◮ the decomposition is guided by the operator: source activation time-codes are input to the separation system.
◮ forced zeros are set in H when a source is silent.
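This constraint is cheap to enforce because multiplicative updates preserve zeros: once an entry of H is set to zero it stays zero, so zeroing H where the time-codes say a source is silent holds at every iteration without any projection step. An illustrative sketch (standard KL-NMF updates on synthetic data, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 16, 50, 3
V = rng.random((F, N)) + 0.1
W = rng.random((F, K))
H = rng.random((K, N))

# Operator-supplied time-codes: source 0 (row 0 of H) is silent from frame 20 on.
H[0, 20:] = 0.0

eps = 1e-12
for _ in range(50):  # multiplicative KL-NMF updates
    H *= (W.T @ (V / (W @ H + eps))) / W.sum(axis=0)[:, None]
    W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)

# Multiplying a zero entry by any finite ratio leaves it at zero,
# so the forced zeros survive every update.
```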
References

M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1):155–173, Sep. 2007.

J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379, 2012.

A. Cichocki, R. Zdunek, and S. Amari. Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In Proc. International Conference on Independent Component Analysis and Blind Signal Separation (ICA), pages 32–39, Charleston SC, USA, Mar. 2006.

A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi. Non-negative matrix factorization with α-divergence. Pattern Recognition Letters, 29(9):1433–1440, July 2008.

M. E. Daube-Witherspoon and G. Muehllehner. An iterative image space reconstruction algorithm suitable for volume ECT. IEEE Transactions on Medical Imaging, 5(5):61–66, 1986. doi: 10.1109/TMI.1986.4307748.

I. S. Dhillon and S. Sra. Generalized nonnegative matrix approximations with Bregman divergences. In Advances in Neural Information Processing Systems (NIPS), 2005.

C. Févotte and J. Idier. Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation, 23(9):2421–2456, Sep. 2011. doi: 10.1162/NECO_a_00168. URL http://www.unice.fr/cfevotte/publications/journals/neco11.pdf.

C. Févotte, N. Bertin, and J.-L. Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation, 21(3):793–830, Mar. 2009. doi: 10.1162/neco.2008.04-08-771. URL http://www.unice.fr/cfevotte/publications/journals/neco09_is-nmf.pdf.

L. Finesso and P. Spreij. Nonnegative matrix factorization and I-divergence alternating minimization. Linear Algebra and its Applications, 416:270–287, 2006.

T. Hofmann. Probabilistic latent semantic indexing. In Proc. ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1999. URL http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf.

D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562, 2001.

L. B. Lucy. An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79:745–754, 1974. doi: 10.1086/111605.

A. Ozerov and C. Févotte. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing, 18(3):550–563, Mar. 2010. doi: 10.1109/TASL.2009.2031510. URL http://www.unice.fr/cfevotte/publications/journals/ieee_asl_multinmf.pdf.

A. Ozerov, C. Févotte, R. Blouet, and J.-L. Durrieu. Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011.

P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.

P. Smaragdis. Keynote address, 2013. URL http://web.engr.illinois.edu/~paris/pubs/smaragdis-waspaa2013keynote.pdf.

D. L. Sun and R. Mazumder. Non-negative matrix completion for bandwidth extension: A convex optimization approach. In Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2013.

Z. Yang and E. Oja. Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Transactions on Neural Networks, 22:1878–1891, Dec. 2011.