Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling

Jamie Haddock
CAMSAP 2019, December 16, 2019
Computational and Applied Mathematics, UCLA

joint with Mengdi Gao, Denali Molitor, Deanna Needell, Eli Sadovnik, Tyler Will, Runyu Zhang

Nonnegative Matrix Factorization (NMF)

[Diagram: $X \approx A \cdot S$, with $X$ of size $N \times M$, $A$ of size $N \times k$, $S$ of size $k \times M$; in topic modeling, $N$: words, $M$: documents, $k$: topics.]

$$\min_{A \in \mathbb{R}^{N \times k}_{\geq 0},\; S \in \mathbb{R}^{k \times M}_{\geq 0}} \|X - AS\|_F^2$$

Problem Setup:
⊲ $X \in \mathbb{R}^{N \times M}_{\geq 0}$: data matrix
⊲ $A \in \mathbb{R}^{N \times k}_{\geq 0}$: features matrix
⊲ $S \in \mathbb{R}^{k \times M}_{\geq 0}$: coefficients matrix
⊲ $k$: user-chosen parameter

Problem Challenges:
⊲ nonconvex in $A$ and $S$, NP-hard [Vavasis '08]
⊲ interpretability of the factors depends on $k$
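For concreteness, a minimal sketch of computing such a factorization with scikit-learn's off-the-shelf NMF solver (an illustration only, not the method of this talk; the data here is random):

    import numpy as np
    from sklearn.decomposition import NMF

    X = np.abs(np.random.rand(100, 50))       # nonnegative data matrix, N=100, M=50
    model = NMF(n_components=10, init='nndsvda', max_iter=500)   # k = 10 topics
    A = model.fit_transform(X)                # features matrix, N x k
    S = model.components_                     # coefficients matrix, k x M
    err = np.linalg.norm(X - A @ S, 'fro')    # Frobenius reconstruction error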

NMF

Applications:
⊲ low-rank approximation
⊲ clustering
⊲ topic modeling
⊲ feature extraction

Methods:
⊲ multiplicative updates (sketched below)
⊲ alternating nonnegative least squares
⊲ many others
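As a sketch of the multiplicative updates just listed, here are the classical Lee-Seung rules [3] in NumPy; the function name, iteration count, and eps safeguard are our choices:

    import numpy as np

    def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
        """Lee-Seung multiplicative updates for min ||X - AS||_F^2 s.t. A, S >= 0."""
        rng = np.random.default_rng(seed)
        N, M = X.shape
        A, S = rng.random((N, k)), rng.random((k, M))
        for _ in range(n_iter):
            # Each factor is multiplied by a nonnegative ratio, so nonnegativity
            # is preserved automatically and the objective is nonincreasing.
            S *= (A.T @ X) / (A.T @ A @ S + eps)
            A *= (X @ S.T) / (A @ S @ S.T + eps)
        return A, S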

(Semi)supervised NMF

Goal: Incorporate known label information into the problem.

[Diagram: label matrix $Y$ of size $P \times M$; $P$: classes, $M$: documents.]

$$\min_{A \in \mathbb{R}^{N \times k}_{\geq 0},\; S \in \mathbb{R}^{k \times M}_{\geq 0},\; B \in \mathbb{R}^{P \times k}_{\geq 0}} \|W \odot (X - AS)\|_F^2 + \lambda \|L \odot (Y - BS)\|_F^2$$

Problem Setup:
⊲ $Y \in \{0, 1\}^{P \times M}$: label matrix
⊲ $P$: number of classes
⊲ $W \in \{0, 1\}^{N \times M}$: data indicator
⊲ $L \in \{0, 1\}^{P \times M}$: label indicator
⊲ $\lambda$: user-defined hyperparameter

Problem Advantages:
⊲ use of label information
⊲ the multiplicative updates method extends to SSNMF [4] (a masked variant is sketched below)
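A hedged sketch of such masked multiplicative updates: splitting each gradient of the objective above into its positive and negative parts yields Lee-Seung-style ratios. These may differ in detail from the update rules of [4]; the function name and eps safeguard are ours.

    import numpy as np

    def ssnmf_mu_step(X, Y, W, L, A, S, B, lam, eps=1e-10):
        """One multiplicative-updates pass for
        min ||W * (X - AS)||_F^2 + lam ||L * (Y - BS)||_F^2   (* = entrywise)."""
        A *= ((W * X) @ S.T) / ((W * (A @ S)) @ S.T + eps)
        B *= ((L * Y) @ S.T) / ((L * (B @ S)) @ S.T + eps)
        S *= (A.T @ (W * X) + lam * (B.T @ (L * Y))) / (
              A.T @ (W * (A @ S)) + lam * (B.T @ (L * (B @ S))) + eps)
        return A, S, B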

Hierarchical NMF

Goal: Discover hierarchical topic structure within $X$.

[Diagram: $X \approx A^{(0)} S^{(0)} \approx A^{(0)} A^{(1)} S^{(1)}$, with $X$ of size $N \times M$, $A^{(0)}$ of size $N \times k^{(0)}$, $S^{(0)}$ of size $k^{(0)} \times M$, $A^{(1)}$ of size $k^{(0)} \times k^{(1)}$, $S^{(1)}$ of size $k^{(1)} \times M$.]

$$X \approx A^{(0)} S^{(0)}, \qquad X \approx A^{(0)} A^{(1)} S^{(1)}, \qquad \ldots, \qquad X \approx A^{(0)} A^{(1)} \cdots A^{(L)} S^{(L)}$$

Problem Setup:
⊲ $k^{(0)}, k^{(1)}, \ldots, k^{(L)}$: user-defined parameters
⊲ $k^{(\ell)}$: supertopics collecting the $k^{(\ell-1)}$ subtopics

Problem Challenges:
⊲ the $\{k^{(i)}\}$ must be chosen
⊲ error propagates through the layers (see the sketch below)
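To make the sequential construction and its error propagation concrete, a small sketch reusing nmf_multiplicative from above; each layer factors only the previous layer's coefficient matrix, never $X$ itself, so approximation error compounds:

    def hierarchical_nmf(X, ranks, n_iter=200):
        """Sequential hNMF: X ~ A0 S0, then S0 ~ A1 S1, and so on.
        ranks = [k0, k1, ..., kL]; returns [A0, ..., AL] and the final S."""
        As, S = [], X
        for k in ranks:
            A, S = nmf_multiplicative(S, k, n_iter=n_iter)
            As.append(A)   # later layers never revisit X, so errors accumulate
        return As, S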


Deep NMF

Goal: Exploit similarities between neural networks and hierarchical NMF.

⊲ [Flenner, Hunter '18]
  • introduces a nonlinear pooling operator after each layer
  • introduces a multiplicative updates method meant to backpropagate
⊲ [Trigeorgis, Bousmalis, Zafeiriou, Schuller '16]
  • relaxes some of the nonnegativity constraints in hNMF
⊲ [Le Roux, Hershey, Weninger '15]
  • introduces an NMF backpropagation algorithm with "unfolding" (no hierarchy)
⊲ [Sun, Nasrabadi, Tran '17]
  • a similar method lacking nonnegativity constraints

Our method: Neural NMF

Goal: Develop a true backpropagation algorithm for the hNMF model.

⊲ Regard the $A$ matrices as independent variables; determine the $S$ matrices from the $A$ matrices.
⊲ Define $q(X, A) := \operatorname{argmin}_{S \geq 0} \|X - AS\|_F^2$ (a solver sketch follows below).

[Diagram: $X \to S^{(0)} = q(X, A^{(0)}) \to S^{(1)} = q(S^{(0)}, A^{(1)}) \to \cdots \to S^{(\mathcal{L})} \to$ classification layers $\to$ outputs.]

⊲ Pin the values of $S$ to those of $A$ by recursively setting $S^{(\ell)} := q(S^{(\ell-1)}, A^{(\ell)})$.
⊲ Can we compute derivatives and backpropagate?
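Since $\|X - AS\|_F^2$ separates over the columns of $S$, $q$ can be evaluated column by column with any nonnegative least squares solver; a minimal sketch using scipy.optimize.nnls (our choice of solver, not necessarily the talk's):

    import numpy as np
    from scipy.optimize import nnls

    def q(X, A):
        """q(X, A) = argmin_{S >= 0} ||X - AS||_F^2, one NNLS problem per column."""
        return np.column_stack([nnls(A, X[:, j])[0] for j in range(X.shape[1])])

Forward propagation is then $S^{(0)} = q(X, A^{(0)})$, $S^{(1)} = q(S^{(0)}, A^{(1)})$, and so on.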

Neural NMF Backpropagation

⊲ Differentiate the $q$ function and apply the chain rule.
⊲ Flexible to the choice of cost function (e.g., supervision).
⊲ Backpropagate and update all $A$ matrices simultaneously via GD or SGD.
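One standard route to the derivative of $q$ (our reading of how such argmin maps are typically differentiated; the talk's exact derivation may differ): for each column $x$ of the input, let $\Lambda = \{i : s_i > 0\}$ be the support of the NNLS solution $s$. On that support the nonnegativity constraint is inactive, so the KKT stationarity condition reduces to restricted normal equations with the closed form

$$s_\Lambda = (A_\Lambda^\top A_\Lambda)^{-1} A_\Lambda^\top x,$$

where $A_\Lambda$ keeps the columns of $A$ indexed by $\Lambda$; differentiating this expression with respect to $A$ and $x$ (with the support held fixed) supplies the Jacobians that the chain rule needs.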

Neural NMF

Method 1: Neural NMF
Require: data matrix $X \in \mathbb{R}^{N \times M}$, number of layers $L$, step size $\gamma$, cost function $C$, initial matrices $A^{(i)}$ for $i = 0, \ldots, L$

procedure ForwardPropagation($A^{(0)}, \ldots, A^{(L)}$)
    for $i := 0 \ldots L$ do
        $S^{(i)} \leftarrow q(S^{(i-1)}, A^{(i)})$ ⊲ with $S^{(-1)} := X$

ForwardPropagation($A^{(0)}, \ldots, A^{(L)}$)
while not converged do
    for $i := 0 \ldots L$ do
        $A^{(i)} \leftarrow A^{(i)} - \gamma \, \partial C / \partial A^{(i)}$ ⊲ gradient descent
        $A^{(i)} \leftarrow [A^{(i)}]_+$ ⊲ project onto the nonnegative orthant
    ForwardPropagation($A^{(0)}, \ldots, A^{(L)}$)
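An end-to-end sketch of Method 1 in PyTorch, with one substitution: rather than differentiating the exact NNLS map $q$ as the talk proposes, $q$ is approximated here by a few unrolled projected-gradient steps so that autograd can backpropagate through it. All names (q_unrolled, neural_nmf, ranks) are ours.

    import torch

    def q_unrolled(X, A, n_steps=30):
        """Differentiable surrogate for q(X, A): unrolled projected gradient
        descent on (1/2)||X - AS||_F^2 with a safe 1/L step size."""
        S = torch.clamp(A.T @ X, min=0.0)          # cheap nonnegative initialization
        step = 1.0 / (torch.linalg.matrix_norm(A, ord=2) ** 2 + 1e-8)
        for _ in range(n_steps):
            S = torch.clamp(S - step * (A.T @ (A @ S - X)), min=0.0)
        return S

    def neural_nmf(X, ranks, lr=1e-2, n_epochs=500):
        """Unsupervised Neural NMF sketch: the A matrices are the free variables;
        the S matrices are pinned by forward propagation."""
        dims = [X.shape[0]] + list(ranks)
        As = [torch.rand(dims[i], dims[i + 1], requires_grad=True)
              for i in range(len(ranks))]
        opt = torch.optim.SGD(As, lr=lr)
        for _ in range(n_epochs):
            S = X
            for A in As:                           # forward propagation
                S = q_unrolled(S, A)
            recon = S
            for A in reversed(As):                 # X ~ A(0) A(1) ... A(L) S(L)
                recon = A @ recon
            loss = torch.norm(X - recon) ** 2      # cost C; add label terms if desired
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                for A in As:
                    A.clamp_(min=0.0)              # project onto the nonnegative orthant
        return As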

Experimental Results

[Figures: reconstructions across layers.]
⊲ unsupervised reconstruction with two-layer structure ($k^{(0)} = 9$, $k^{(1)} = 4$)
⊲ semisupervised reconstruction (40% labels) with three-layer structure ($k^{(0)} = 9$, $k^{(1)} = 4$, $k^{(2)} = 2$)

Note that although the reconstruction error increases as the layers increase (since the final rank decreases), the topic structure can still be resolved from the intermediate factorizations.

Experimental Results

Table 1: Reconstruction error / classification accuracy

                 Layers   Hier. NMF       Deep NMF        Neural NMF
    Unsuper.       1      0.053           0.031           0.029
                   2      0.399           0.414           0.310
                   3      0.860           0.838           0.492
    Semisuper.     1      0.049 / 0.933   0.031 / 0.947   0.042 / 1
                   2      0.374 / 0.926   0.394 / 0.911   0.305 / 1
                   3      0.676 / 0.930   0.733 / 0.930   0.496 / 0.990
    Supervised     1      0.052 / 0.960   0.042 / 0.962   0.042 / 1
                   2      0.311 / 0.984   0.310 / 0.984   0.307 / 1
                   3      0.495 / 1       0.494 / 1       0.498 / 1

Conclusions and Future Work

⊲ presented a novel method for multilayer NMF that incorporates the backpropagation technique from deep learning to minimize error accumulation
⊲ exhibited preliminary tests on toy datasets showing that the proposed method outperforms existing multilayer NMF algorithms
⊲ future work: compare our method and others on various datasets to find the precise regimes in which we offer improvement
⊲ future work: extend the method to hierarchical nonnegative tensor factorization


Thanks for listening!

Questions?

[1] Jennifer Flenner and Blake Hunter. A deep non-negative matrix factorization neural network. 2018. Unpublished.
[2] Jonathan Le Roux, John R. Hershey, and Felix Weninger. Deep NMF for speech separation. In Int. Conf. Acoust. Spee., pages 66-70. IEEE, 2015.
[3] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791, 1999.
[4] H. Lee, J. Yoo, and S. Choi. Semi-supervised nonnegative matrix factorization. IEEE Signal Proc. Let., 17(1):4-7, Jan 2010.
[5] Xiaoxia Sun, Nasser M. Nasrabadi, and Trac D. Tran. Supervised multilayer sparse coding networks for image classification. CoRR, abs/1701.08349, 2017.
[6] George Trigeorgis, Konstantinos Bousmalis, Stefanos Zafeiriou, and Björn W. Schuller. A deep matrix factorization method for learning attribute representations. IEEE T. Pattern Anal., 39(3):417-429, 2016.