

slide-1
SLIDE 1

An introduction to Nonnegative Matrix Factorisation

Slim ESSID

Telecom ParisTech

June 2015

Slim ESSID (Telecom ParisTech) Introduction to NMF TPT - UPS – June 2015 1 / 53

slide-2
SLIDE 2

Credits

Some illustrations, slides and demos are reproduced courtesy of:

  • A. Ozerov,
  • C. Févotte,
  • N. Seichepine,
  • R. Hennequin,
  • F. Vallet,
  • A. Liutkus.


slide-3
SLIDE 3

◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
◮ Applications
◮ Conclusion

slide-4
SLIDE 4

Introduction Motivation

Explaining data by factorisation

General formulation

[Figure: the $F \times N$ data matrix $V$ is approximated by the product of $W$ ($F \times K$) and $H$ ($K \times N$); each column $v_n$ of $V$ is explained by the columns $w_k$ of $W$. Illustration by C. Févotte]

$v_n \approx \sum_{k=1}^{K} h_{kn} w_k$

slide-5
SLIDE 5

Introduction Motivation

Explaining data by factorisation

General formulation

  • $V$ ($F \times N$): data matrix;
  • $W$ ($F \times K$): “explanatory variables”, “regressors”, “basis”, “dictionary”, “patterns”, “topics”;
  • $H$ ($K \times N$): “activation coefficients”, “expansion coefficients”.

Illustration by C. Févotte

slide-6
SLIDE 6

Introduction Motivation

Data is often nonnegative by nature¹

  • pixel intensities;
  • amplitude spectra;
  • occurrence counts;
  • food or energy consumption;
  • user scores;
  • stock market values;
  • ...

For the sake of interpretability of the results, optimal processing of nonnegative data may call for processing under nonnegativity constraints.

¹ Slide adapted from (Févotte, 2012).

slide-7
SLIDE 7

Introduction Motivation

The Nonnegative Matrix Factorisation model

NMF provides an unsupervised linear representation of the data:

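$V \approx WH$, where:

  − $W = [w_{fk}]$ s.t. $w_{fk} \geq 0$, and
  − $H = [h_{kn}]$ s.t. $h_{kn} \geq 0$.

[Figure: the data matrix $V$ approximated by the product of $W$ and $H$.]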

Illustration by N. Seichepine
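As a minimal sketch of the model on this slide (NumPy; the sizes F, N, K and the random values are illustrative assumptions, not taken from the slides), the snippet below builds nonnegative factors W and H and checks that each data column v_n is the nonnegative combination of dictionary columns w_k weighted by h_kn.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 5, 2  # illustrative sizes (assumed)

W = rng.random((F, K))  # dictionary, all w_fk >= 0
H = rng.random((K, N))  # activations, all h_kn >= 0
V = W @ H               # nonnegative data that factorises exactly

# Column n of V equals sum_k h_kn * w_k.
n = 3
v_n = sum(H[k, n] * W[:, k] for k in range(K))
print(np.allclose(V[:, n], v_n))
```

In practice V is given and W, H are unknown; the sketch only illustrates the direction "factors to data".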


slide-8
SLIDE 8

Introduction Motivation

Explaining face images by NMF²

Image example: 49 images among 2429 from MIT’s CBCL face dataset

² Slide adapted from (Févotte, 2012).

slide-9
SLIDE 9

Introduction Motivation

Explaining face images by NMF

Method

[Figure: $V \approx WH$, where the columns of $V$ are vectorised images, the columns of $W$ are facial features, and $H$ gives the importance of the features in each image.]

slide-10
SLIDE 10

Introduction Motivation

NMF outputs

Image example

Illustration by C. Févotte


slide-11
SLIDE 11

Introduction Motivation

Notations I

  • $V$: the $F \times N$ data matrix:
    − $F$ features (rows),
    − $N$ observations/examples/feature vectors (columns);
  • $v_n = (v_{1n}, \cdots, v_{Fn})^T$: the $n$-th feature vector observation among a collection of $N$ observations $v_1, \cdots, v_N$;
  • $v_n$ is a column vector in $\mathbb{R}_+^F$; its transpose $v_n^T$ is a row vector;
  • $W$: the $F \times K$ dictionary matrix:
    − $w_{fk}$ is one of its coefficients,
    − $w_k$ is a dictionary/basis vector among $K$ elements.

slide-12
SLIDE 12

Introduction Motivation

Notations II

  • $H$: the $K \times N$ activation/expansion matrix:
    − $h_n$: the column vector of activation coefficients for observation $v_n$:
      $v_n \approx \sum_{k=1}^{K} h_{kn} w_k$;
    − $h_{k:}$: the row vector of activation coefficients relating to basis vector $w_k$.

slide-13
SLIDE 13

NMF models

◮ Introduction
◮ NMF models
  – Cost functions
  – Weighted NMF schemes
◮ Algorithms for solving NMF
◮ Applications
◮ Conclusion

slide-14
SLIDE 14

NMF models Cost functions

NMF optimization criteria

The NMF approximation $V \approx WH$ is usually obtained through:

$\min_{W, H \geq 0} D(V | WH)$,

where $D(V | \hat{V})$ is a separable matrix divergence:

$D(V | \hat{V}) = \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn} | \hat{v}_{fn})$,

and $d(x|y)$, defined for all $x, y \geq 0$, is a scalar divergence such that:

  • $d(x|y)$ is continuous over $x$ and $y$;
  • $d(x|y) \geq 0$ for all $x, y \geq 0$;
  • $d(x|y) = 0$ if and only if $x = y$.

slide-15
SLIDE 15

NMF models Cost functions

Popular (scalar) divergences

Euclidean (EUC) distance (Lee and Seung, 1999):
$d_{EUC}(x|y) = (x - y)^2$

Kullback-Leibler (KL) divergence (Lee and Seung, 1999):
$d_{KL}(x|y) = x \log \frac{x}{y} - x + y$

Itakura-Saito (IS) divergence (Févotte et al., 2009):
$d_{IS}(x|y) = \frac{x}{y} - \log \frac{x}{y} - 1$
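The three scalar divergences and the separable matrix divergence built from them can be sketched as follows. This is an illustrative NumPy sketch, not code from the slides; it assumes strictly positive entries, so the limiting cases at zero (such as 0 log 0) are not handled, and the function names are hypothetical.

```python
import numpy as np

def d_euc(x, y):
    """Euclidean distance (x - y)^2."""
    return (x - y) ** 2

def d_kl(x, y):
    """Generalised Kullback-Leibler divergence (x, y > 0 assumed)."""
    return x * np.log(x / y) - x + y

def d_is(x, y):
    """Itakura-Saito divergence (x, y > 0 assumed)."""
    return x / y - np.log(x / y) - 1.0

def divergence(V, V_hat, d):
    """Separable matrix divergence: sum over all entries of d(v_fn | v_hat_fn)."""
    return float(np.sum(d(V, V_hat)))

V = np.array([[1.0, 2.0], [3.0, 4.0]])
V_hat = np.array([[1.5, 2.0], [2.0, 5.0]])
for name, d in [("EUC", d_euc), ("KL", d_kl), ("IS", d_is)]:
    # Each divergence is positive for V != V_hat and zero for V == V_hat.
    print(name, divergence(V, V_hat, d), divergence(V, V, d))
```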


slide-16
SLIDE 16

NMF models Cost functions

Convexity properties

Divergence d(x|y)    EUC    KL     IS
Convex in x          yes    yes    yes
Convex in y          yes    yes    no
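The "convex in y" row can be checked from second derivatives. The short derivation below is a standard calculus check added for illustration (it is not part of the original slides), taken for fixed $x > 0$ and $y > 0$:

```latex
\begin{align*}
\frac{\partial^2}{\partial y^2}\, d_{EUC}(x|y) &= 2 > 0, \\
\frac{\partial^2}{\partial y^2}\, d_{KL}(x|y)  &= \frac{x}{y^2} > 0, \\
\frac{\partial^2}{\partial y^2}\, d_{IS}(x|y)  &= \frac{2x - y}{y^3},
  \quad \text{which is negative whenever } y > 2x.
\end{align*}
```

So the IS divergence is convex in $y$ only on $(0, 2x]$, which is why the table reports it as not convex in $y$.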


slide-17
SLIDE 17

NMF models Cost functions

Scale invariance properties³

$d_{EUC}(\lambda x | \lambda y) = \lambda^2 \, d_{EUC}(x|y)$
$d_{KL}(\lambda x | \lambda y) = \lambda \, d_{KL}(x|y)$
$d_{IS}(\lambda x | \lambda y) = d_{IS}(x|y)$

The IS divergence is scale-invariant → it provides higher accuracy in the representation of data with large dynamic range (e.g. audio spectra).
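These scaling identities are easy to confirm numerically. The sketch below uses arbitrary positive test values (an assumption made for illustration) and redefines the three divergences so that it runs on its own.

```python
import numpy as np

def d_euc(x, y):
    return (x - y) ** 2

def d_kl(x, y):
    return x * np.log(x / y) - x + y

def d_is(x, y):
    return x / y - np.log(x / y) - 1.0

x, y, lam = 3.0, 5.0, 10.0  # arbitrary positive values (assumed)

euc_ok = np.isclose(d_euc(lam * x, lam * y), lam ** 2 * d_euc(x, y))
kl_ok = np.isclose(d_kl(lam * x, lam * y), lam * d_kl(x, y))
is_ok = np.isclose(d_is(lam * x, lam * y), d_is(x, y))
print(euc_ok, kl_ok, is_ok)
```

With the Euclidean cost, a mismatch on a large coefficient therefore weighs far more than the same relative mismatch on a small one, whereas the IS divergence treats both equally.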

³ Slide adapted from (Févotte, 2012).

slide-18
SLIDE 18

NMF models Weighted NMF schemes

Weighted NMF

Conventional NMF optimization criterion:

$\min_{W, H \geq 0} \sum_{f=1}^{F} \sum_{n=1}^{N} d(v_{fn} | \hat{v}_{fn})$.

Weighted NMF optimization criterion:

$\min_{W, H \geq 0} \sum_{f=1}^{F} \sum_{n=1}^{N} b_{fn} \, d(v_{fn} | \hat{v}_{fn})$,

where $b_{fn}$ ($f = 1, \ldots, F$, $n = 1, \ldots, N$) are nonnegative weights representing the contribution of data point $v_{fn}$ to NMF learning.

slide-19
SLIDE 19

NMF models Weighted NMF schemes

Weighted NMF application example I

Learning from partial observations (e.g., for image inpainting as in (Mairal et al., 2010)):

  • observed value: $b_{fn} = 1$;
  • missing value: $b_{fn} = 0$.
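A binary mask of this kind can be plugged into a weighted Euclidean NMF. The sketch below is an illustration under stated assumptions, not the method of the cited papers: it uses the commonly described weighted variant of the Lee and Seung multiplicative updates, synthetic rank-3 data, and a hypothetical function name `weighted_euc_nmf`.

```python
import numpy as np

def weighted_euc_nmf(V, B, K, n_iter=500, eps=1e-12, seed=0):
    """Minimise sum_fn b_fn * (v_fn - [WH]_fn)^2 with multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1  # strictly positive initialisation
    H = rng.random((K, N)) + 0.1
    BV = B * V
    for _ in range(n_iter):
        # Entries with b_fn = 0 contribute to neither numerator nor denominator.
        H *= (W.T @ BV) / (W.T @ (B * (W @ H)) + eps)
        W *= (BV @ H.T) / ((B * (W @ H)) @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 10))   # synthetic rank-3 nonnegative data
B = (rng.random(V.shape) > 0.2).astype(float)  # 1 = observed, 0 = missing

W, H = weighted_euc_nmf(V, B, K=3)
observed_err = float(np.sum(B * (V - W @ H) ** 2))
print(observed_err)
```

The product `W @ H` is defined everywhere, so its entries at positions where `B` is zero can serve as estimates of the missing values.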


slide-20
SLIDE 20

NMF models Weighted NMF schemes

Weighted NMF application example II

Face feature extraction (example and figure from (Blondel et al., 2008)):

[Figure: data $V$ shown with two choices of weights $B = \{b_{fn}\}_{f,n}$: image-centered weights and face-centered weights.]

slide-21
SLIDE 21

Algorithms for solving NMF

◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
  – Preliminaries
  – Difficulties in NMF
  – Multiplicative update rules
◮ Applications
◮ Conclusion

slide-22
SLIDE 22

Algorithms for solving NMF Preliminaries

Optimization problem

An efficient solution of the NMF optimization problem

$\min_{W, H \geq 0} D(V|WH) \Leftrightarrow \min_{\theta} C(\theta)$; $C(\theta) \overset{def}{=} D(V|WH)$,

where $\theta \overset{def}{=} \{W, H\}$ denotes the NMF parameters, must cope with the following difficulties:

  • the nonnegativity constraints must be taken into account;
  • the solution is not unique...


slide-23
SLIDE 23

Algorithms for solving NMF Difficulties in NMF

NMF is ill-posed

The solution is not unique

Given $V = WH$ with $W \geq 0$ and $H \geq 0$, any matrix $Q$ such that:

  • $WQ \geq 0$,
  • $Q^{-1}H \geq 0$

provides an alternative factorisation $V = \tilde{W}\tilde{H} = (WQ)(Q^{-1}H)$.

In particular, $Q$ can be any nonnegative generalised permutation matrix; e.g., in $\mathbb{R}^3$, a matrix $Q$ with exactly one nonzero entry per row and per column, with values 2, 3 and 1. This case is not so problematic: it merely accounts for scaling and permutation of the basis vectors $w_k$.
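The non-uniqueness argument can be checked numerically. In the sketch below the factors are random and the positions of the nonzero entries of Q are an illustrative assumption; only the values 2, 3 and 1 come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 3))
H = rng.random((3, 4))
V = W @ H

# Nonnegative generalised permutation matrix: one positive entry per row
# and per column (positions assumed for illustration).
Q = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 0.0, 0.0]])

W_alt = W @ Q                 # columns of W permuted and rescaled
H_alt = np.linalg.inv(Q) @ H  # rows of H permuted and rescaled inversely

same_product = np.allclose(V, W_alt @ H_alt)
print(same_product, W_alt.min() >= 0, H_alt.min() >= -1e-12)
```

Both factorisations reproduce V exactly and remain nonnegative, so the data alone cannot distinguish them.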


slide-24
SLIDE 24

Algorithms for solving NMF Difficulties in NMF

Geometric interpretation and ill-posedness

NMF assumes the data is well described by a simplicial convex cone $C_w$ generated by the columns of $W$:

$C_w = \left\{ \sum_{k=1}^{K} \lambda_k w_k ;\ \lambda_k \geq 0 \right\}$

[Figure: data points $v_i$ lying in the cone $C_w$ spanned by $w_1$ and $w_2$.]

slide-26
SLIDE 26

Algorithms for solving NMF Difficulties in NMF

Geometric interpretation and ill-posedness

Problem: which $C_w$?

→ Need to impose constraints on the set of possible solutions to select the most “useful” ones.

slide-27
SLIDE 27

Algorithms for solving NMF Multiplicative update rules

Alternating optimization strategy

The problem is usually easier to optimize over one matrix (say $H$) when the other matrix (say $W$) is known and fixed.

Indeed, for several divergences, $D(V|WH)$ is even convex separately w.r.t. $H$ and w.r.t. $W$, but not jointly w.r.t. $\{W, H\}$. For this reason, many state-of-the-art NMF optimization algorithms rely on the following iterative alternating optimization strategy.

Alternating optimization, a.k.a. block-coordinate descent (one iteration):

  • update W, given H fixed,
  • update H, given W fixed.

slide-28
SLIDE 28

Algorithms for solving NMF Multiplicative update rules

Multiplicative update rules

A heuristic approach introduced by (Lee and Seung, 2001) to solve $\min_\theta C(\theta)$.

The multiplicative update (MU) rule for $H$ (similarly for $W$) is defined as:

$h_{kn} \leftarrow h_{kn} \, \frac{[\nabla_{h_{kn}} C(\theta)]_-}{[\nabla_{h_{kn}} C(\theta)]_+}$,

where $\nabla_{h_{kn}} C(\theta) = [\nabla_{h_{kn}} C(\theta)]_+ - [\nabla_{h_{kn}} C(\theta)]_-$, and the summands are both nonnegative.

NOTE: The nonnegativity of $W$ and $H$ is guaranteed by construction.
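For the Euclidean cost, the gradient with respect to H is proportional to WᵀWH − WᵀV, so one natural positive-negative split is WᵀWH (positive part) and WᵀV (negative part), which gives the Lee and Seung (2001) update. The sketch below is a minimal NumPy illustration; the random data, the rank K = 4, the small constant `eps` guarding the division and the function name `nmf_mu_euc` are all assumptions made for this example.

```python
import numpy as np

def nmf_mu_euc(V, K, n_iter=200, eps=1e-12, seed=0):
    """Euclidean NMF by alternating multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1  # strictly positive initialisation
    H = rng.random((K, N)) + 0.1
    costs = []
    for _ in range(n_iter):
        # h_kn <- h_kn * [grad]_- / [grad]_+  with  [grad]_- = W^T V, [grad]_+ = W^T W H
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Same rule for W: [grad]_- = V H^T, [grad]_+ = W H H^T
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        costs.append(float(np.sum((V - W @ H) ** 2)))
    return W, H, np.array(costs)

rng = np.random.default_rng(1)
V = rng.random((20, 30))  # synthetic nonnegative data
W, H, costs = nmf_mu_euc(V, K=4)
print(costs[0], costs[-1])
```

Because each update multiplies a nonnegative entry by a ratio of nonnegative quantities, W and H stay nonnegative without any projection step.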


slide-29
SLIDE 29

Algorithms for solving NMF Multiplicative update rules

Intuitive explanation

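We consider for simplicity $\nabla_h C(h) = \nabla_+ - \nabla_-$, with $\nabla_+, \nabla_- \geq 0$ and $h > 0$, so that the update reads $h \leftarrow h \, \nabla_- / \nabla_+$:

  • if $\nabla_h C(h) > 0$, then $\nabla_- / \nabla_+ < 1$ and $h$ decreases;
  • if $\nabla_h C(h) < 0$, then $\nabla_- / \nabla_+ > 1$ and $h$ increases;
  • if $\nabla_h C(h) = 0$, then $h$ is left unchanged.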

slide-30
SLIDE 30

Algorithms for solving NMF Multiplicative update rules

Discussion

The only two things guaranteed by this approach:

  • the newly updated value moves in the direction opposite to the sign of the partial derivative, i.e. a direction of decrease;
  • the newly updated value is always nonnegative.

Nothing more can be guaranteed in general, and all the other algorithm properties depend on the “positive-negative” decomposition chosen:

$\nabla_{h_{kn}} C(\theta) = [\nabla_{h_{kn}} C(\theta)]_+ - [\nabla_{h_{kn}} C(\theta)]_-$.

slide-31
SLIDE 31

Algorithms for solving NMF Multiplicative update rules

Majorisation-minimisation viewpoint

For many divergences and certain “positive-negative” decompositions, each MU rule can be interpreted as a Majorisation-Minimisation (MM) procedure (Hunter and Lange, 2004).

To minimise $C(s)$, e.g., $s = w_{fk}$ or $s = h_{kn}$:

  • build $G(s|\tilde{s})$ such that $G(s|\tilde{s}) \geq C(s)$ and $G(\tilde{s}|\tilde{s}) = C(\tilde{s})$;
  • iteratively optimize $G(s|\tilde{s})$ instead of $C(s)$.

[Figure: objective function $C(s)$ with successive auxiliary functions $G(s|s^{(t)})$ and iterates $s^{(0)}, s^{(1)}, s^{(2)}, s^{(3)}$ approaching $s^*$. Illustration by C. Févotte]

slide-36
SLIDE 36

Algorithms for solving NMF Multiplicative update rules

Majorisation-minimisation viewpoint

NOTE: The MM procedure guarantees the cost is non-increasing at each iteration:

$C(s^{(t+1)}) \leq G(s^{(t+1)}|s^{(t)}) \leq G(s^{(t)}|s^{(t)}) = C(s^{(t)})$.
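The non-increasing cost can be observed on the KL multiplicative updates of Lee and Seung (2001), which are known to admit an MM interpretation. The sketch below is an illustration only: the strictly positive random data (chosen to avoid log(0)), the rank K = 3 and the function names are assumptions of this example.

```python
import numpy as np

def kl_div(V, V_hat):
    """Generalised KL divergence D(V | V_hat) for strictly positive entries."""
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))

def nmf_mu_kl(V, K, n_iter=200, eps=1e-12, seed=0):
    """KL-NMF by alternating multiplicative updates; returns the cost history."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, N)) + 0.1
    ones = np.ones_like(V)
    costs = [kl_div(V, W @ H)]
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H))) / (W.T @ ones + eps)
        W *= ((V / (W @ H)) @ H.T) / (ones @ H.T + eps)
        costs.append(kl_div(V, W @ H))
    return W, H, np.array(costs)

rng = np.random.default_rng(2)
V = rng.random((15, 25)) + 0.1  # strictly positive synthetic data (assumed)
W, H, costs = nmf_mu_kl(V, K=3)

# Up to floating-point tolerance, the cost should never increase.
monotone = bool(np.all(np.diff(costs) <= 1e-8))
print(monotone, costs[0], costs[-1])
```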


slide-37
SLIDE 37

Algorithms for solving NMF Multiplicative update rules

Summary

Multiplicative update rules:

Advantages:

  • easy to implement;
  • nonnegativity of W and H is guaranteed.

Drawbacks:

  • monotonicity is not always guaranteed;
  • the convergence rate is lower than that of some other algorithms.

slide-38
SLIDE 38

Applications

◮ Introduction
◮ NMF models
◮ Algorithms for solving NMF
◮ Applications
  – Text analysis
  – Music transcription
  – Video structuring
◮ Conclusion

slide-39
SLIDE 39

Applications Text analysis

Topics recovery

Assume $V = [v_{fn}]$ is a term-document co-occurrence matrix: $v_{fn}$ is the frequency of occurrences of word $m_f$ in document $d_n$.

[Figure: $V \approx WH$, where the rows of $V$ correspond to words and its columns to documents; the columns of $W$ are topics and $H$ holds the topic importance indicators.]

slide-40
SLIDE 40

Applications Text analysis

Text document analysis example

After sklearn topics extraction demo (Pedregosa et al., 2011)

Analysing the 20 newsgroups dataset with NMF, the following topics are automatically determined:

  • Topic #0: god people bible israel jesus christian true moral think christians believe don say human israeli church life children jewish
  • Topic #1: drive windows card drivers video scsi software pc thanks vga graphics help disk uni dos file ide controller work
  • Topic #2: game team nhl games ca hockey players buffalo edu cc year play university teams baseball columbia league player toronto
  • Topic #3: window manager application mit motif size display widget program xlib windows user color event information use events values
  • Topic #4: pitt gordon banks cs science pittsburgh univ computer soon disease edu reply pain health david article medical medicine

Topics are described by the most frequent words in each dictionary element $w_k$.

slide-42
SLIDE 42

Applications Music transcription

NMF-based music transcription

Demo slide courtesy of C. Févotte (Févotte et al., 2009)

[Figure: three representations of the data; notes with MIDI numbers 61, 65, 68, 72.]

slide-43
SLIDE 43

Applications Music transcription

Spectral analysis

Short-Term Fourier Transform (STFT)

Drawing by J. Laroche
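To make the link with NMF explicit: the magnitudes of the STFT form a nonnegative F × N matrix that can play the role of V. The sketch below is a minimal NumPy illustration; the Hann window, the frame length of 512, the hop of 256, the 8 kHz sampling rate and the 440 Hz test tone are all assumptions made for this example.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Return the F x N matrix of STFT magnitudes, with F = n_fft // 2 + 1."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

fs = 8000                          # sampling rate in Hz (assumed)
t = np.arange(fs) / fs             # one second of signal
x = np.sin(2 * np.pi * 440.0 * t)  # synthetic 440 Hz tone (assumed)

V = magnitude_spectrogram(x)
peak_hz = np.argmax(V[:, 0]) * fs / 512  # frequency of the strongest bin
print(V.shape, peak_hz)
```

Each column of `V` is an amplitude spectrum, one of the nonnegative data types listed earlier in the talk.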

slide-45
SLIDE 45

Applications Music transcription

Music transcription demo

Demo slide courtesy of C. Févotte (Févotte et al., 2009)

NMF decomposition with K = 8

[Figure: for each component k = 1, ..., 8, the dictionary column of W, the coefficients in H, and the reconstructed component.]

Pitch estimates: 65.0 68.0 61.0 72.0 (True values: 61, 65, 68, 72)
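The pitch values above are MIDI note numbers. As a small worked example, the sketch below converts them to frequencies using the usual equal-temperament convention (A4 = MIDI note 69 = 440 Hz, an assumption that is not stated on the slide) and checks that the estimates match the true notes up to ordering, which is consistent with the permutation ambiguity of NMF components.

```python
def midi_to_hz(m, a4_hz=440.0):
    """Equal-tempered frequency of MIDI note m, assuming A4 is note 69."""
    return a4_hz * 2.0 ** ((m - 69) / 12.0)

true_notes = [61, 65, 68, 72]
estimates = [65.0, 68.0, 61.0, 72.0]  # in the order listed on the slide

for m in true_notes:
    print(m, round(midi_to_hz(m), 2))

# The components come out in arbitrary order, so compare sorted values.
match = sorted(round(e) for e in estimates) == true_notes
print(match)
```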


slide-47
SLIDE 47

Applications Video structuring

The video structuring problem

Goal: automatically extract a temporal organization of a document into units conveying a homogeneous type of (audio/video) content.


slide-48
SLIDE 48

Applications Video structuring

Video Structuring

Using NMF for temporal segmentation and soft-clustering (Essid and Fevotte, 2013)

Discovering the video editing structure (Essid and Fevotte, 2012)

[Figure: video frames labelled "Full group", "Multiple participants", "Participant 1", "Participant 2", "Participant 3", "Participant 4" and "Participant 5".]

Performing speaker diarization (Seichepine et al., 2013): “Who spoke when?”

Illustration by N. Seichepine

slide-49
SLIDE 49

Applications Video structuring

A generic video structuring system using NMF

Challenge: perform the task in a non-supervised fashion.

Proposed approach: a generic structuring scheme using NMF (Essid and Fevotte, 2013):

[Figure: system diagram with stage 1, “Bag of words representation”, highlighted. Elements: A/V frames (images/audio segments), word vocabulary extraction, vocabulary, histograms of words, NMF, activation thresholding, structure extracted.]

  1. Create a low-level (visual/audio) vocabulary and use it to extract histograms of (visual/audio) words from the sequence of observation frames.

slide-50
SLIDE 50

Applications Video structuring

A generic video structuring system using NMF

[Figure: the same system diagram with stage 2, “Data factorisation”, highlighted; the factorisation block is KL-NMF.]

  2. Apply a variant of smooth NMF using the Kullback-Leibler divergence to extract latent structuring events and their activations across the duration of the document.

slide-52
SLIDE 52

Applications Video structuring

A generic video structuring system using NMF

Activations should be temporally smooth: structuring events naturally exhibit a “certain” temporal continuity.

slide-53
SLIDE 53

Applications Video structuring

Smooth KL-NMF

Using the Kullback-Leibler (KL) divergence as a measure of fit

Given histogram data $V$ (whose columns are frame-wise descriptors), we seek a factorization $V \approx WH$, with $w_{fk} \geq 0$ and $h_{kn} \geq 0$, that minimises

$C(W, H) = D(V|WH) + \beta S(H)$;

  • $D(V|WH) = \sum_{f,n} d_{KL}(v_{fn} | \sum_k w_{fk} h_{kn})$: fit-to-data term, with $d_{KL}(x|y) = x \log \frac{x}{y} - x + y$;
  • $S(H)$ is a regularisation term that controls the temporal smoothness of the activation coefficients:

    $S(H) = \frac{1}{2} \sum_{k=1}^{K} \sum_{n=2}^{N} (h_{kn} - h_{k(n-1)})^2$.
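The cost above can be evaluated directly. The sketch below is an illustration only, not the authors' implementation: it assumes strictly positive entries in V (so the 0 log 0 case that occurs with empty histogram bins is not handled), and the function names and toy matrices are hypothetical.

```python
import numpy as np

def smoothness(H):
    """S(H) = 1/2 * sum_k sum_{n>=2} (h_kn - h_k(n-1))^2."""
    return 0.5 * float(np.sum(np.diff(H, axis=1) ** 2))

def smooth_kl_cost(V, W, H, beta):
    """C(W, H) = D_KL(V | WH) + beta * S(H), for strictly positive V and WH."""
    V_hat = W @ H
    fit = float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    return fit + beta * smoothness(H)

W = np.array([[1.0], [2.0]])                # one basis vector (toy example)
H_smooth = np.array([[1.0, 1.0, 1.0, 1.0]])  # constant activation
H_jumpy = np.array([[1.0, 3.0, 1.0, 3.0]])   # oscillating activation

print(smoothness(H_smooth), smoothness(H_jumpy))

# With a perfect fit, only the smoothness penalty remains in the cost.
V = W @ H_jumpy
cost = smooth_kl_cost(V, W, H_jumpy, beta=0.1)
print(cost)
```

Larger values of β make oscillating activations more expensive, which favours piecewise-steady activations over time.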


slide-54
SLIDE 54

Applications Video structuring

Applications

Onscreen person-oriented structuring

Discover the video editing structure: label the video frames in a non-supervised fashion with shot types such as "Full group", "Multiple participants", "Participant 1", "Participant 2", "Participant 3", "Participant 4" and "Participant 5".

Using the Canal9 political debates database (Vinciarelli et al., 2009).

slide-55
SLIDE 55

Applications Video structuring

Visual features

Visual vocabulary creation

− PHOW features (Bosch et al., 2007): histograms of orientation gradients over 3 scales, on an 8-pixel step grid; extracted from face and clothing regions, determined automatically for the current video;

− quantization over 128 bins using K-means.

slide-56
SLIDE 56

Applications Video structuring

Results

Visualising the activations

[Figure: activations over time, with segments labelled Full group, Speaker 1, Speaker 2, Speaker 3, Speaker 4, Speaker 5 and MP.]

MP: Multiple Participants

slide-57
SLIDE 57

Applications Video structuring

Experimental validation

Canal9 political debates database (Vinciarelli et al., 2009)

− broadcasts featuring a moderator and 2 to 4 guests;
− moderators, guests and backgrounds vary;
− 7 hours of video content: 10 minutes from each of the first 41 shows;
− 189 distinct persons; 28521 video shots.

slide-58
SLIDE 58

Applications Video structuring

Results

Shot-type classification error rates

[Figure: shot-type classification error rates for NMF with β = 0, β = 0.1 and β = 1, compared with an HMM reference.]

slide-59
SLIDE 59

Conclusion Summary and future challenges

Take-home messages I

  • NMF is a versatile data decomposition technique that has proven effective for diverse applications across numerous disciplines:
    − it tends to provide “meaningful” and “natural” part-based data representations,
    − it can be used for feature learning, topic extraction, clustering, segmentation, source separation, coding...
  • For NMF to be successful, it has to be estimated using appropriate cost functions reflecting prior knowledge about the data.

slide-60
SLIDE 60

Conclusion Summary and future challenges

Take-home messages II

  • Many algorithms are available to estimate NMF, mostly alternating updates of W and H; variants include:
    − multiplicative updates: heuristic, simple and easy to implement, but slow and unstable,
    − majorisation-minimisation: well-founded for a variety of cost functions, stable, still slow,
    − gradient-descent and Newton: fast but unstable.
  • NMF is a state-of-the-art technique for a number of audio-processing tasks (transcription, source separation...).
  • It has great potential for video analysis tasks, especially temporal structure analysis.

slide-61
SLIDE 61

Conclusion Summary and future challenges

Ongoing and future research

  • How to properly estimate the model-order K?
  • How to achieve better and faster “convergence”?
  • How to perform non-linear data decompositions?
  • How to handle big data?


slide-62
SLIDE 62

Conclusion NMF software & bibliography

A selection of NMF software

Software                     Language   Main features
beta_ntf                     Python     Weighted tensor decomposition, all β-divergences, MM
sklearn.decomposition.NMF    Python     ℓ2-norm, gradient-descent, sparsity
IMM DTU NMF toolbox          Matlab     ℓ2-norm, MM, gradient-descent, ALS
Févotte’s matlab scripts     Matlab     ℓ2-norm, KL and IS-div, MM, probabilistic
Seichepine’s matlab scripts  Matlab     Soft co-factorisation, ℓ2-norm, KL and IS-div, ℓ1/ℓ2-norm temporal smoothing, MM
svmnmf                       Matlab     Geometric SVM-based NMF, kernel-based non-linear decompositions, fast
libNMF                       C          ℓ2-norm, MM, gradient-descent, ALS, multi-core, fast

slide-63
SLIDE 63

Conclusion NMF software & bibliography

Bibliography I

  • V. D. Blondel, N.-D. Ho, and P. V. Dooren. Weighted non-negative matrix factorization and face feature extraction. In Image and Vision Computing, 2008.
  • A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns. In IEEE 11th International Conference on Computer Vision. IEEE, 2007. URL http://www.computer.org/portal/web/csdl/doi/10.1109/ICCV.2007.4409066.
  • S. Essid and C. Fevotte. Decomposing the Video Editing Structure of a Talk-show using Nonnegative Matrix Factorization. In International Conference on Image Processing (ICIP), Orlando, FL, USA, 2012.
  • S. Essid and C. Fevotte. Smooth Nonnegative Matrix Factorization for Unsupervised Audiovisual Document Structuring. IEEE Transactions on Multimedia, 15(2):415–425, 2013. ISSN 1520-9210. doi: 10.1109/TMM.2012.2228474.
  • C. Févotte, N. Bertin, and J.-L. Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation, 21(3):793–830, 2009.
  • D. R. Hunter and K. Lange. A tutorial on MM algorithms. Amer. Stat., 58(1):30–37, Feb. 2004.
  • D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrix factorization. Nature, 401:788–791, 1999.
  • D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562, 2001.
  • J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11(10-60), 2010.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

slide-64
SLIDE 64

Conclusion NMF software & bibliography

Bibliography II

  • N. Seichepine, S. Essid, C. Fevotte, and O. Cappe. Soft nonnegative matrix co-factorization with application to multimodal speaker diarization. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, 2013.
  • A. Vinciarelli, A. Dielmann, S. Favre, and H. Salamin. Canal9: A database of political debates for analysis of social interactions. In IEEE International Workshop on Social Signal Processing, Amsterdam, 2009. IEEE. ISBN 978-1-4244-4800-5. doi: 10.1109/ACII.2009.5349466. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5349466.