Nonnegative matrix factorization and applications in audio signal processing (PowerPoint presentation)


SLIDE 1

Nonnegative matrix factorization and applications in audio signal processing

Cédric Févotte

Laboratoire Lagrange, Nice. Machine Learning Crash Course, Genova, June 2015.

SLIDE 2

Outline

Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF

SLIDE 3

Matrix factorisation models

Data often available in matrix form.

(Figure: a features × samples matrix of coefficients)

SLIDE 4

Matrix factorisation models

Data often available in matrix form.

(Figure: a movies × users matrix of movie ratings)

SLIDE 5

Matrix factorisation models

Data often available in matrix form.

(Figure: a words × text documents matrix of word counts)

SLIDE 6

Matrix factorisation models

Data often available in matrix form.

(Figure: a frequencies × time matrix of Fourier coefficients)

SLIDE 7

Matrix factorisation models

X ≈ WH, with data X, dictionary W, activations H.

Also known as dictionary learning, low-rank approximation, factor analysis, latent semantic analysis.

SLIDE 9

Matrix factorisation models

for dimensionality reduction (coding, low-dimensional embedding)

SLIDE 10

Matrix factorisation models

for unmixing (source separation, latent topic discovery)

SLIDE 11

Matrix factorisation models

for interpolation (collaborative filtering, image inpainting)

SLIDE 12

Nonnegative matrix factorisation

V ≈ WH, with V of size F × N, W of size F × K and H of size K × N (F features, N samples, K patterns).

◮ data V and factors W, H have nonnegative entries.
◮ nonnegativity of W ensures interpretability of the dictionary, because patterns wk and samples vn belong to the same space.
◮ nonnegativity of H tends to produce part-based representations, because subtractive combinations are forbidden.

Early work by Paatero and Tapper (1994); landmark Nature paper by Lee and Seung (1999).
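The factorisation above can be computed with the multiplicative updates referenced on this slide (Lee and Seung, 1999; 2001). What follows is an illustrative NumPy sketch of the classic updates for the squared Euclidean objective, not code from the talk:

```python
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-9, seed=0):
    """Sketch of NMF via Lee-Seung multiplicative updates for
    min_{W,H >= 0} ||V - WH||_F^2. Each update multiplies by a ratio
    of nonnegative terms, so a nonnegative initialisation stays
    nonnegative without any projection step."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps  # nonnegative random init
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W with H fixed
    return W, H

V = np.random.default_rng(1).random((20, 30))  # nonnegative toy data
W, H = nmf(V, K=5)
```

The alternating structure (update H with W fixed, then W with H fixed) is the block-coordinate scheme discussed later in the deck.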

SLIDE 13

49 images among 2429 from MIT’s CBCL face dataset

SLIDE 14

PCA dictionary with K = 25

red pixels indicate negative values

SLIDE 15

NMF dictionary with K = 25

experiment reproduced from (Lee and Seung, 1999)

SLIDE 16

NMF for latent semantic analysis

(Lee and Seung, 1999; Hofmann, 1999)

Encyclopedia entry: 'Constitution of the United States'

Top words of the entry, with counts: president (148), congress (124), power (120), united (104), constitution (81), amendment (71), government (57), law (49)

Four semantic features (columns of W), each shown by its top words:
◮ court, government, council, culture, supreme, constitutional, rights, justice
◮ president, served, governor, secretary, senate, congress, presidential, elected
◮ flowers, leaves, plant, perennial, flower, plants, growing, annual
◮ disease, behaviour, glands, contact, symptoms, skin, pain, infection

vn ≈ W hn

reproduced from (Lee and Seung, 1999)

SLIDE 17

NMF for hyperspectral unmixing

(Berry, Browne, Langville, Pauca, and Plemmons, 2007)

reproduced from (Bioucas-Dias et al., 2012)

SLIDE 18

NMF for audio spectral unmixing

(Smaragdis and Brown, 2003)

(Figure: spectrogram of the input music passage, 0–3 s, 100 Hz–20 kHz, and the frequency profiles of four extracted NMF components)

reproduced from (Smaragdis, 2013)

SLIDE 19

Outline

Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF

SLIDE 20

NMF as a constrained minimisation problem

Minimise a measure of fit between V and WH, subject to nonnegativity:

min_{W,H ≥ 0} D(V|WH) = Σ_{f,n} d([V]_{fn} | [WH]_{fn}),

where d(x|y) is a scalar cost function, e.g.,

◮ squared Euclidean distance (Paatero and Tapper, 1994; Lee and Seung, 2001)
◮ Kullback-Leibler divergence (Lee and Seung, 1999; Finesso and Spreij, 2006)
◮ Itakura-Saito divergence (Févotte, Bertin, and Durrieu, 2009)
◮ α-divergence (Cichocki et al., 2008)
◮ β-divergence (Cichocki et al., 2006; Févotte and Idier, 2011)
◮ Bregman divergences (Dhillon and Sra, 2005)
◮ and more in (Yang and Oja, 2011)

Regularisation terms are often added to D(V|WH) to promote sparsity, smoothness, dynamics, etc.
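Several of the costs listed are special cases of the β-divergence (Févotte and Idier, 2011): β = 2 gives the squared Euclidean distance, β = 1 the generalised KL divergence, β = 0 the Itakura-Saito divergence. A sketch of the elementwise cost, assuming NumPy (illustrative, not from the slides):

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Elementwise beta-divergence d_beta(x|y), following the usual
    definition: beta=2 squared Euclidean, beta=1 generalised KL,
    beta=0 Itakura-Saito; other beta use the general closed form."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if beta == 2:
        return 0.5 * (x - y) ** 2
    if beta == 1:
        return x * np.log(x / y) - x + y
    if beta == 0:
        return x / y - np.log(x / y) - 1
    # general case, beta not in {0, 1}
    return (x**beta + (beta - 1) * y**beta
            - beta * x * y**(beta - 1)) / (beta * (beta - 1))

d = beta_divergence(2.0, 2.0, 0)  # zero: d(x|x) = 0 for every beta
```

The full objective D(V|WH) is then the sum of `beta_divergence(V, W @ H, beta)` over all entries.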

SLIDE 21

Common NMF algorithm design

◮ Block-coordinate updates: H given W(i−1), then W given H(i).
◮ Updates of W and H are equivalent by transposition:

V ≈ WH ⇔ Vᵀ ≈ HᵀWᵀ

◮ The objective function is separable in the columns of H (or the rows of W):

D(V|WH) = Σ_n D(vn | W hn)

◮ So we are essentially left with nonnegative linear regression:

min_{h ≥ 0} C(h) := D(v|Wh)

Numerous references in the image restoration literature, e.g., (Richardson, 1972; Lucy, 1974; Daube-Witherspoon and Muehllehner, 1986; De Pierro, 1993).

SLIDE 22

Majorisation-minimisation (MM)

Build G(h|h̃) such that G(h|h̃) ≥ C(h) for all h and G(h̃|h̃) = C(h̃). Optimise (iteratively) G(h|h̃) instead of C(h).

(Figure: objective function C(h) and successive auxiliary functions G(h|h(i)), whose minimisers generate the descending iterates h(0), h(1), h(2), ..., converging to h*)
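A toy one-dimensional illustration of the principle (my example, not from the slides): for C(h) = (h − 3)² + log(1 + h), the concave log term is majorised by its tangent at h̃, and minimising the resulting G(h|h̃) has a closed form:

```python
import math

def C(h):
    # objective: convex quadratic plus a concave log term
    return (h - 3) ** 2 + math.log(1 + h)

def mm_step(h_t):
    # G(h|h_t) = (h-3)^2 + log(1+h_t) + (h - h_t)/(1 + h_t) majorises C
    # (tangent upper bound of the concave log term); its minimiser solves
    # 2(h-3) + 1/(1+h_t) = 0:
    return 3 - 1 / (2 * (1 + h_t))

h = 0.5
for _ in range(50):
    h_next = mm_step(h)
    assert C(h_next) <= C(h) + 1e-12  # the MM monotone-descent guarantee
    h = h_next
```

Because G touches C at h̃ and lies above it everywhere else, each exact minimisation of G cannot increase C; the iterates settle at a stationary point of C.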

SLIDE 27

Majorisation-minimisation (MM)

◮ Finding a good & workable local majorisation is the crucial point.
◮ For most of the divergences mentioned, the Jensen and tangent inequalities are usually enough.
◮ In many cases, MM leads to multiplicative algorithms such that

hk = h̃k [ ∇⁻hk C(h̃) / ∇⁺hk C(h̃) ]^γ

where
◮ ∇hk C(h) = ∇⁺hk C(h) − ∇⁻hk C(h), with both terms nonnegative;
◮ γ is a divergence-specific scalar exponent.

◮ More details about MM in (Lee and Seung, 2001; Févotte and Idier, 2011; Yang and Oja, 2011).
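For example, for C(h) = D_KL(v|Wh) the gradient splits as ∇⁺hk = Σ_f wfk and ∇⁻hk = Σ_f wfk vf/[Wh]f, and γ = 1 recovers the classic multiplicative KL update. A NumPy sketch (illustrative, not code from the talk):

```python
import numpy as np

def kl_cost(v, W, h, eps=1e-12):
    # generalised KL divergence D_KL(v | Wh)
    vh = W @ h + eps
    return float(np.sum(v * np.log((v + eps) / vh) - v + vh))

def mm_update_kl(v, W, h, eps=1e-12):
    """One multiplicative MM step for min_{h >= 0} D_KL(v | Wh):
    h <- h * (grad_minus / grad_plus)^gamma with gamma = 1."""
    grad_plus = W.sum(axis=0)                # sum_f w_fk
    grad_minus = W.T @ (v / (W @ h + eps))   # sum_f w_fk * v_f / [Wh]_f
    return h * grad_minus / (grad_plus + eps)

rng = np.random.default_rng(0)
W = rng.random((30, 4))
v = W @ rng.random(4)  # v exactly reachable, so the cost can shrink towards 0
h = np.ones(4)
for _ in range(300):
    h = mm_update_kl(v, W, h)
```

Since both gradient parts are nonnegative, the update is a multiplication by a nonnegative ratio: nonnegativity of h is preserved automatically, which is precisely what makes these MM schemes so convenient for NMF.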

SLIDE 28

How to choose the right measure of fit?

◮ The squared Euclidean distance is a common default choice.
◮ It underlies a Gaussian additive noise model such that vfn = [WH]fn + εfn, which can generate negative values, not very natural for nonnegative data.
◮ Many other options exist.

Select the right divergence (for a specific problem) by
◮ comparing performances, given ground-truth data;
◮ assessing the ability to predict missing/unseen data (interpolation, cross-validation);
◮ probabilistic modelling:

D(V|WH) = − log p(V|WH) + cst

SLIDE 29

How to choose the right measure of fit?

◮ Let V ∼ p(V|WH) such that E[V|WH] = WH;
◮ then the following correspondences apply, with D(V|WH) = − log p(V|WH) + cst:

data support           distribution/noise     divergence          examples
real-valued            additive Gaussian      squared Euclidean   many
integer                multinomial            Kullback-Leibler    word counts
integer                Poisson                generalised KL      photon counts
nonnegative            multiplicative Gamma   Itakura-Saito       spectral data
generally nonnegative  Tweedie                β-divergence        generalises the above models
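The Poisson row can be checked numerically: under independent Poisson entries, − log p(V|WH) equals the generalised KL divergence up to a term that depends only on V, so differences of the two objectives across candidate models agree. A small illustrative check (not from the slides):

```python
import numpy as np
from math import lgamma

def poisson_nll(V, L):
    # -log p(V|L) for independent Poisson entries with means L:
    # sum of (lambda - v*log(lambda) + log(v!))
    return float(np.sum(L - V * np.log(L))
                 + sum(lgamma(v + 1) for v in V.ravel()))

def gen_kl(V, L):
    # generalised KL divergence D_KL(V|L)
    return float(np.sum(V * np.log(V / L) - V + L))

rng = np.random.default_rng(0)
V = rng.integers(1, 10, size=(4, 5)).astype(float)  # count data
L1 = rng.random((4, 5)) + 0.5  # two candidate models (e.g. two WH)
L2 = rng.random((4, 5)) + 0.5
d_nll = poisson_nll(V, L1) - poisson_nll(V, L2)
d_kl = gen_kl(V, L1) - gen_kl(V, L2)
```

Here `d_nll` and `d_kl` coincide: minimising generalised KL over WH is maximum-likelihood estimation under the Poisson model.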

SLIDE 30

Outline

Generalities
  Matrix factorisation models
  Nonnegative matrix factorisation
  Majorisation-minimisation algorithms
Audio examples
  Piano toy example
  Audio restoration
  Audio bandwidth extension
  Multichannel IS-NMF

SLIDE 31

Piano toy example

Four piano notes are played (MIDI numbers: 61, 65, 68, 72).

Figure: three representations of the data.

SLIDE 32

Piano toy example

IS-NMF on power spectrogram with K = 8

(Figure: IS-NMF dictionary W, coefficients H, and reconstructed components for K = 1, ..., 8)

Pitch estimates: 65.0 68.0 61.0 72.0 (True values: 61, 65, 68, 72)

SLIDE 33

Piano toy example

KL-NMF on magnitude spectrogram with K = 8

(Figure: KL-NMF dictionary W, coefficients H, and reconstructed components for K = 1, ..., 8)

Pitch estimates: 65.2 68.2 61.0 72.2 56.2 (True values: 61, 65, 68, 72)

SLIDE 34

Audio restoration

Louis Armstrong and His Hot Five

(Figure: log-power spectrogram and original waveform of the recording)

SLIDE 35

Audio restoration

Louis Armstrong and His Hot Five

Original mono = Accompaniment (Comp. 1, 9) + Brass (Comp. 2, 3, 5-8) + Trombone (Comp. 4) + Noise (Comp. 10)

Audio examples: original mono denoised; original denoised & upmixed to stereo.

SLIDE 36

Audio bandwidth extension

(Sun and Mazumder, 2013)

(Figure: Y = full-band training samples; V = band-limited samples)

adapted from (Sun and Mazumder, 2013)

SLIDE 37

Audio bandwidth extension

(Sun and Mazumder, 2013)

AC/DC example:
◮ band-limited data (Back in Black)
◮ training data (Highway to Hell)
◮ bandwidth extended
◮ ground truth

Examples from http://statweb.stanford.edu/~dlsun/bandwidth.html, used with permission from the author.

SLIDE 38

Multichannel IS-NMF

(Ozerov and Févotte, 2010)

(Figure: sources S, modelled by NMF as WH, pass through mixing system A with added noise in each channel to give the multichannel mixture X. Multichannel NMF problem: estimate W, H and A from X.)

◮ Best scores on the underdetermined speech and music separation task at the Signal Separation Evaluation Campaign (SiSEC) 2008.
◮ IEEE Signal Processing Society 2014 Best Paper Award.

SLIDE 39

User-guided multichannel IS-NMF

(Ozerov, Févotte, Blouet, and Durrieu, 2011)

◮ the decomposition is guided by the operator: source activation time-codes are input to the separation system.
◮ forced zeros are set in H when a source is silent.

SLIDE 40

References I

M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52(1):155–173, Sep. 2007.

J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379, 2012.

A. Cichocki, R. Zdunek, and S. Amari. Csiszár's divergences for non-negative matrix factorization: Family of new algorithms. In Proc. International Conference on Independent Component Analysis and Blind Signal Separation (ICA), pages 32–39, Charleston SC, USA, Mar. 2006.

A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi. Non-negative matrix factorization with α-divergence. Pattern Recognition Letters, 29(9):1433–1440, July 2008.

M. Daube-Witherspoon and G. Muehllehner. An iterative image space reconstruction algorithm suitable for volume ECT. IEEE Transactions on Medical Imaging, 5(5):61–66, 1986. doi: 10.1109/TMI.1986.4307748.

A. R. De Pierro. On the relation between the ISRA and the EM algorithm for positron emission tomography. IEEE Trans. Medical Imaging, 12(2):328–333, 1993. doi: 10.1109/42.232263.

I. S. Dhillon and S. Sra. Generalized nonnegative matrix approximations with Bregman divergences. In Advances in Neural Information Processing Systems (NIPS), 2005.

C. Févotte and J. Idier. Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9):2421–2456, Sep. 2011. doi: 10.1162/NECO_a_00168. URL http://www.unice.fr/cfevotte/publications/journals/neco11.pdf.

SLIDE 41

References II

C. Févotte, N. Bertin, and J.-L. Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis. Neural Computation, 21(3):793–830, Mar. 2009. doi: 10.1162/neco.2008.04-08-771. URL http://www.unice.fr/cfevotte/publications/journals/neco09_is-nmf.pdf.

L. Finesso and P. Spreij. Nonnegative matrix factorization and I-divergence alternating minimization. Linear Algebra and its Applications, 416:270–287, 2006.

T. Hofmann. Probabilistic latent semantic indexing. In Proc. 22nd International Conference on Research and Development in Information Retrieval (SIGIR), 1999. URL http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf.

D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrix factorization. Nature, 401:788–791, 1999.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562, 2001.

L. B. Lucy. An iterative technique for the rectification of observed distributions. Astronomical Journal, 79:745–754, 1974. doi: 10.1086/111605.

A. Ozerov and C. Févotte. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech and Language Processing, 18(3):550–563, Mar. 2010. doi: 10.1109/TASL.2009.2031510. URL http://www.unice.fr/cfevotte/publications/journals/ieee_asl_multinmf.pdf.

A. Ozerov, C. Févotte, R. Blouet, and J.-L. Durrieu. Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011. URL http://www.unice.fr/cfevotte/publications/proceedings/icassp11d.pdf.

SLIDE 42

References III

P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.

W. H. Richardson. Bayesian-based iterative method of image restoration. Journal of the Optical Society of America, 62:55–59, 1972.

P. Smaragdis. About this non-negative business. WASPAA keynote slides, 2013. URL http://web.engr.illinois.edu/~paris/pubs/smaragdis-waspaa2013keynote.pdf.

P. Smaragdis and J. C. Brown. Non-negative matrix factorization for polyphonic music transcription. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'03), Oct. 2003.

D. L. Sun and R. Mazumder. Non-negative matrix completion for bandwidth extension: A convex optimization approach. In Proc. IEEE Workshop on Machine Learning and Signal Processing (MLSP), 2013.

Z. Yang and E. Oja. Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Transactions on Neural Networks, 22:1878–1891, Dec. 2011. doi: 10.1109/TNN.2011.2170094.
