Tensor Factorization via Matrix Factorization



slide-1
SLIDE 1

Tensor Factorization via Matrix Factorization

Volodymyr Kuleshov*, Arun Tejasvi Chaganty*, Percy Liang

Stanford University

May 11, 2015

Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 1 / 28

slide-6
SLIDE 6

Introduction: tensor factorization

An application: community detection

(Anandkumar, Ge, Hsu, and S. Kakade 2013)

[Figure: nodes a, b, c, d; a tensor written as a sum of terms.]

slide-7
SLIDE 7

Introduction: tensor factorization

Applications of tensor factorization

  • Community detection (Anandkumar, Ge, Hsu, and S. Kakade 2013)
  • Parsing (Cohen, Satta, and Collins 2013)
  • Knowledge base completion (Chang et al. 2014; Singh, Rocktäschel, and Riedel 2015)
  • Topic modelling (Anandkumar, Foster, et al. 2012)
  • Crowdsourcing (Zhang et al. 2014)
  • Mixture models (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013)
  • Bottlenecked models (Chaganty and Liang 2014)
  • ...

slide-8
SLIDE 8

Introduction: tensor factorization

What is tensor (CP) factorization?

  • Tensor analogue of matrix eigen-decomposition.

$$M = \sum_{i=1}^{k} \pi_i \, u_i \otimes u_i.$$

[Figure: M drawn as a sum of k terms.]

slide-11
SLIDE 11

Introduction: tensor factorization

What is tensor (CP) factorization?

  • Tensor analogue of matrix eigen-decomposition.

$$T = \sum_{i=1}^{k} \pi_i \, u_i \otimes u_i \otimes u_i + \epsilon R.$$

  • Goal: Given T with noise $\epsilon R$, recover the factors $u_i$.

[Figure: T drawn as a sum of k terms plus a noise term, in two cases labelled "Orthogonal" and "Nonorthogonal".]
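The model above can be sketched in NumPy as follows. This is only an illustration: the sizes `d`, `k`, the noise level `eps`, the weights `pi`, and the helper name `cp_tensor` are assumptions chosen for the example, not values from the talk.

```python
import numpy as np

def cp_tensor(weights, factors):
    """Return sum_i weights[i] * u_i (x) u_i (x) u_i, with u_i = factors[:, i]."""
    return np.einsum("i,ai,bi,ci->abc", weights, factors, factors, factors)

rng = np.random.default_rng(0)
d, k, eps = 5, 3, 0.01                    # illustrative sizes and noise level

# Orthonormal factors: the first k columns of a random orthogonal matrix.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
U = U[:, :k]
pi = np.array([3.0, 2.0, 1.0])            # weights pi_i (hypothetical values)

R = rng.standard_normal((d, d, d))        # noise tensor, not symmetrized here
clean = cp_tensor(pi, U)
T = clean + eps * R                       # T = sum_i pi_i u_i^{(x)3} + eps * R
```

With orthonormal factors, contracting the noiseless tensor with $u_1$ along all three modes returns $\pi_1$, which is a quick sanity check on the construction.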

slide-14
SLIDE 14

Introduction: tensor factorization

Existing tensor factorization algorithms

  • Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013)
      • Analog of matrix power method.
      • Sensitive to noise.
      • Restricted to orthogonal tensors.
  • Alternating least squares (Comon, Luciani, and Almeida 2009; Anandkumar, Ge, and Janzamin 2014)
      • Sensitive to initialization.

Our approach: reduce to existing fast and robust matrix algorithms.

slide-15
SLIDE 15

Orthogonal Tensor factorization

Outline

  • Introduction: tensor factorization
  • Orthogonal tensor factorization
      • Projections
  • Non-orthogonal tensor factorization
  • Related work
  • Empirical results
  • Conclusions

slide-16
SLIDE 16

Orthogonal Tensor factorization

Tensor factorization via single matrix factorization

$$T = \pi_1 u_1^{\otimes 3} + \pi_2 u_2^{\otimes 3} + \pi_3 u_3^{\otimes 3} + \epsilon R$$

slide-19
SLIDE 19

Orthogonal Tensor factorization

Tensor factorization via single matrix factorization

$$T = u_1^{\otimes 3} + u_2^{\otimes 3} + u_3^{\otimes 3}$$

$$\Downarrow$$

$$T(I, I, w) = \underbrace{(w^\top u_1)}_{\lambda_1} u_1^{\otimes 2} + \underbrace{(w^\top u_2)}_{\lambda_2} u_2^{\otimes 2} + \underbrace{(w^\top u_3)}_{\lambda_3} u_3^{\otimes 2}$$

  • Proposal: Eigen-decomposition on the projected matrix.
  • Return: recovered eigenvectors, $u_i$.
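A minimal sketch of this single-projection idea, assuming a noiseless tensor with orthonormal factors and unit weights. The helper name `project` and the random choice of `w` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # orthonormal columns u_i
T = np.einsum("ai,bi,ci->abc", U, U, U)            # T = sum_i u_i^{(x)3}, noiseless

def project(T, w):
    """T(I, I, w): contract the third mode of T with the vector w."""
    return np.einsum("abc,c->ab", T, w)

w = rng.standard_normal(d)
M = project(T, w)                                  # M = sum_i (w^T u_i) u_i u_i^T
eigvals, eigvecs = np.linalg.eigh(M)               # M is symmetric

# Each eigenvector should match one u_i up to sign, so |U^T V| is
# (numerically) a permutation matrix.
overlap = np.abs(U.T @ eigvecs)
```

The eigenvalues of `M` are the inner products $w^\top u_i$, so the factors come back in an arbitrary order and with arbitrary sign.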

slide-22
SLIDE 22

Orthogonal Tensor factorization

Sensitivity of single matrix projection

  • Problem: Eigendecomposition is very sensitive to the eigengap.

$$\text{error in factors} \propto \frac{1}{\min(\text{difference in eigenvalues})}.$$

  • Intuition: If two eigenvalues are equal, the corresponding eigenvectors are arbitrary.

[Figure: a matrix written as a sum of three terms.]
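The 1/eigengap behaviour can be seen on a small hand-built example. The sketch below is an illustration under assumed values (a 3 x 3 diagonal matrix and a perturbation of size `delta`), not an experiment from the talk: it perturbs one off-diagonal entry and measures how far the bottom eigenvector moves.

```python
import numpy as np

def eigvec_error(gap, delta=1e-3):
    """Sine of the angle between the bottom eigenvector of A and of A + E."""
    A = np.diag([1.0, 1.0 + gap, 3.0])
    E = np.zeros((3, 3))
    E[0, 1] = E[1, 0] = delta              # small symmetric perturbation
    _, V = np.linalg.eigh(A + E)           # eigenvalues in ascending order
    v = V[:, 0]                            # perturbed version of e_1 = (1, 0, 0)
    return np.linalg.norm(v[1:])           # component of v orthogonal to e_1

for gap in [1.0, 0.1, 0.01]:
    print(f"gap = {gap:5.2f}   error = {eigvec_error(gap):.4f}")
```

For this 2 x 2 block the rotation angle satisfies $\tan 2\theta = 2\delta / \text{gap}$, so the error is roughly $\delta / \text{gap}$ while $\delta$ is much smaller than the gap.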

slide-34
SLIDE 34

Orthogonal Tensor factorization

Sensitivity analysis (contd.)

(Cardoso 1994)

  • Single matrix factorization:

$$\text{error in factors} \propto \frac{1}{\min \text{diff. in eigenvalues}}.$$

  • Simultaneous matrix factorization:

$$\text{error in factors} \propto \frac{1}{\min \text{avg. diff. in eigenvalues}}.$$

Every coordinate pair needs one good projection (with a large eigengap).
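The remark that every pair only needs one good projection can be illustrated numerically. The sketch below is an assumption-laden illustration, not the quantity in Cardoso's bound: it compares the worst eigengap under a single random projection with the worst gap when each pair of factors may use its best projection out of `L`.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
d, L = 6, 20                                        # illustrative sizes
U, _ = np.linalg.qr(rng.standard_normal((d, d)))    # orthonormal factors u_i
W = rng.standard_normal((L, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)       # L random unit projections
Lam = W @ U                                         # Lam[l, i] = w_l^T u_i

pairs = list(combinations(range(d), 2))
gaps = np.array([[abs(Lam[l, i] - Lam[l, j]) for (i, j) in pairs]
                 for l in range(L)])                # shape (L, number of pairs)

single_gap = gaps[0].min()           # worst pair under the first projection alone
joint_gap = gaps.max(axis=0).min()   # worst pair when each pair uses its best projection
```

By construction `joint_gap` is never smaller than `single_gap`, and with more projections it tends to be much larger.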

slide-37
SLIDE 37

Orthogonal Tensor factorization

Reduction to simultaneous diagonalization

$$\underbrace{T(I, I, w_1)}_{M_1} = \underbrace{(w_1^\top u_1)}_{\lambda_{11}} u_1 u_1^\top + \underbrace{(w_1^\top u_2)}_{\lambda_{21}} u_2 u_2^\top + \underbrace{(w_1^\top u_3)}_{\lambda_{31}} u_3 u_3^\top$$

$$\vdots$$

$$\underbrace{T(I, I, w_\ell)}_{M_\ell} = \underbrace{(w_\ell^\top u_1)}_{\lambda_{1\ell}} u_1 u_1^\top + \underbrace{(w_\ell^\top u_2)}_{\lambda_{2\ell}} u_2 u_2^\top + \underbrace{(w_\ell^\top u_3)}_{\lambda_{3\ell}} u_3 u_3^\top$$

  • Projections share factors: can be simultaneously diagonalized.
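A quick numerical check of the shared-factor claim, assuming a noiseless orthogonal tensor with unit weights. The names `Ms`, `Lam`, and `off` are illustrative.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries of a square matrix."""
    return np.sum((A - np.diag(np.diag(A))) ** 2)

rng = np.random.default_rng(3)
d, L = 4, 5
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # orthonormal factors u_i
T = np.einsum("ai,bi,ci->abc", U, U, U)            # T = sum_i u_i^{(x)3}
W = rng.standard_normal((L, d))                    # rows are w_1, ..., w_L

Ms = np.einsum("abc,lc->lab", T, W)                # Ms[l] = T(I, I, w_l)
Lam = W @ U                                        # Lam[l, i] = w_l^T u_i

total_off = sum(off(U.T @ M @ U) for M in Ms)      # ~0: U diagonalizes every M_l
```

Because all the projected matrices share the eigenvectors $u_i$, they also commute with one another.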

slide-42
SLIDE 42

Orthogonal Tensor factorization

Simultaneous diagonalization algorithm

  • Algorithm: Simultaneously diagonalize projected matrices.

$$U = \arg\min_{U : U U^\top = I} \sum_{\ell=1}^{L} \operatorname{off}(U^\top M_\ell U), \qquad \operatorname{off}(A) = \sum_{i \neq j} A_{ij}^2.$$

  • Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
      • Used widely in the ICA community.
      • Empirically appears to be generically globally convergent.
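The objective above can be minimized with Jacobi-style plane rotations. The following is a simplified sketch for real symmetric matrices in the spirit of the Jacobi angles method, not the authors' implementation: for each index pair it picks the rotation angle in closed form from the 2 x 2 matrix built from all `L` matrices. The function name `joint_diagonalize` and the parameters `sweeps` and `tol` are assumptions.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries of a square matrix."""
    return np.sum((A - np.diag(np.diag(A))) ** 2)

def joint_diagonalize(Ms, sweeps=50, tol=1e-12):
    """Jacobi-style orthogonal joint diagonalization of symmetric matrices.

    Ms has shape (L, d, d). Returns an orthogonal V such that every
    V.T @ Ms[l] @ V is as close to diagonal as the sweeps could make it.
    """
    A = np.array(Ms, dtype=float)
    d = A.shape[1]
    V = np.eye(d)
    for _ in range(sweeps):
        rotated = False
        for p in range(d - 1):
            for q in range(p + 1, d):
                # One 2-vector per matrix: h_l = (a_pp - a_qq, 2 a_pq).
                h = np.stack([A[:, p, p] - A[:, q, q], 2.0 * A[:, p, q]])
                G = h @ h.T
                # (cos 2t, sin 2t) is the principal eigenvector of G.
                theta = 0.25 * np.arctan2(2.0 * G[0, 1], G[0, 0] - G[1, 1])
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) < tol:
                    continue
                rotated = True
                J = np.eye(d)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = -s, s
                A = J.T @ A @ J            # rotation applied to the whole stack
                V = V @ J
        if not rotated:
            break
    return V

rng = np.random.default_rng(4)
d, L = 5, 8
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
Lam = rng.standard_normal((L, d))
Ms = np.einsum("ai,li,bi->lab", U, Lam, U)   # Ms[l] = U diag(Lam[l]) U^T
V = joint_diagonalize(Ms)
residual = sum(off(V.T @ M @ V) for M in Ms)
```

On exactly jointly diagonalizable inputs such as these, the columns of `V` should match the columns of `U` up to sign and order. Building the full rotation matrix `J` at every step keeps the sketch short but is wasteful; a practical version would update only rows and columns `p` and `q`.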

slide-45
SLIDE 45

Orthogonal Tensor factorization Projections

Oracle and random projections

  • Hypothetically: "oracle" projections along the factors are good.
  • Practically: use projections along random directions.
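The two kinds of projection can be sketched side by side. This assumes a noiseless orthogonal tensor with unit weights; the helper `random_unit_vectors` and the number of random projections are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # true factors (unknown in practice)
T = np.einsum("ai,bi,ci->abc", U, U, U)

def random_unit_vectors(L, d, rng):
    """Draw L directions uniformly from the unit sphere in R^d."""
    W = rng.standard_normal((L, d))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

W_oracle = U.T                                     # row l is the true factor u_l
W_random = random_unit_vectors(6, d, rng)

M_oracle = np.einsum("abc,lc->lab", T, W_oracle)   # M_oracle[l] = T(I, I, u_l)
M_random = np.einsum("abc,lc->lab", T, W_random)   # M_random[l] = T(I, I, w_l)
```

In this setting an oracle projection along $u_\ell$ isolates a single term, $T(I, I, u_\ell) = u_\ell u_\ell^\top$, while a random projection mixes all the terms with weights $w^\top u_i$.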

slide-49
SLIDE 49

Orthogonal Tensor factorization Projections

Results: Orthogonal tensor decomposition

$$T = \sum_{i=1}^{k} \pi_i u_i^{\otimes 3} + \epsilon R.$$

Theorem (Random projections)

Pick L = Ω(k log k) projections randomly from the unit sphere. Then, with high probability,

$$\text{error in factors} \le O\left( \left( \underbrace{\frac{\|\pi\|_1 \pi_{\max}}{\pi_{\min}^2}}_{\text{oracle error}} + \underbrace{\sqrt{\frac{d}{L}}}_{\text{conc. term}} \right) \epsilon \right).$$

slide-50
SLIDE 50

Orthogonal Tensor factorization Projections

Empirical: Random vs. oracle projections

[Plot, orthogonal case: error (0.000 to 0.010) versus number of projections (10 to 60).]

Figure: Comparing random vs. oracle projections (d = k = 10, ε = 0.05)

slide-53
SLIDE 53

Non-orthogonal tensor factorization

Non-orthogonal simultaneous diagonalization

$$\underbrace{T(I, I, w_1)}_{M_1} = \underbrace{(w_1^\top u_1)}_{\lambda_{11}} u_1 u_1^\top + \underbrace{(w_1^\top u_2)}_{\lambda_{21}} u_2 u_2^\top + \underbrace{(w_1^\top u_3)}_{\lambda_{31}} u_3 u_3^\top$$

$$\vdots$$

$$\underbrace{T(I, I, w_\ell)}_{M_\ell} = \underbrace{(w_\ell^\top u_1)}_{\lambda_{1\ell}} u_1 u_1^\top + \underbrace{(w_\ell^\top u_2)}_{\lambda_{2\ell}} u_2 u_2^\top + \underbrace{(w_\ell^\top u_3)}_{\lambda_{3\ell}} u_3 u_3^\top$$

  • No unique non-orthogonal factorization for a single matrix.
  • ≥ 2 matrices have a unique non-orthogonal factorization.
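Both bullets can be checked on a small example. The sketch below is an illustration under assumptions not stated on the slide: a hand-picked invertible `U`, hypothetical eigenvalues `lam1` and `lam2` with distinct ratios, and the standard observation that $M_1 M_2^{-1} = U \,\mathrm{diag}(\lambda_1 / \lambda_2)\, U^{-1}$.

```python
import numpy as np

U = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])
U /= np.linalg.norm(U, axis=0)            # unit-norm but non-orthogonal columns
lam1 = np.array([1.0, 2.0, 3.0])          # hypothetical eigenvalues
lam2 = np.array([1.0, -1.0, 0.5])
M1 = U @ np.diag(lam1) @ U.T
M2 = U @ np.diag(lam2) @ U.T

# One matrix: eigh gives a different, orthogonal factorization of the same M1.
evals, Q = np.linalg.eigh(M1)

# Two matrices: the eigenvectors of M1 M2^{-1} are the columns of U,
# up to sign and order, because the ratios lam1 / lam2 are distinct.
ratios, vecs = np.linalg.eig(M1 @ np.linalg.inv(M2))
vecs = np.real(vecs)
vecs /= np.linalg.norm(vecs, axis=0)
overlap = np.abs(U.T @ vecs)              # an entry equal to 1 marks a matched column
```

This two-matrix trick is sensitive to the conditioning of `M2` and to how close the ratios are, so it is shown here only to make the uniqueness statement concrete.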

slide-57
SLIDE 57

Non-orthogonal tensor factorization

Non-orthogonal simultaneous diagonalization

  • Algorithm: Simultaneously diagonalize projected matrices.

$$U = \arg\min_{U} \sum_{\ell=1}^{L} \operatorname{off}(U^{-1} M_\ell U^{-\top}), \qquad \operatorname{off}(A) = \sum_{i \neq j} A_{ij}^2.$$

  • U is not constrained to be orthogonal.
  • Optimize using the QRJ1D algorithm (Afsari 2006).
      • Only guaranteed to have local convergence.
      • More stable than ALS in practice.
  • Sensitivity analysis due to Afsari 2008.
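The non-orthogonal objective is easy to evaluate even though minimizing it is harder. The sketch below only computes the objective and does not implement the optimizer named on the slide; `nojd_objective`, `U_true`, and the eigenvalues in `Lam` are hypothetical names and values.

```python
import numpy as np

def off(A):
    """Sum of squared off-diagonal entries of a square matrix."""
    return np.sum((A - np.diag(np.diag(A))) ** 2)

def nojd_objective(U, Ms):
    """Sum over l of off(U^{-1} M_l U^{-T}); U need not be orthogonal."""
    U_inv = np.linalg.inv(U)
    return sum(off(U_inv @ M @ U_inv.T) for M in Ms)

U_true = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.5, 0.0, 1.0]])
Lam = np.array([[1.0, 2.0, 3.0],
                [1.0, -1.0, 0.5]])
Ms = [U_true @ np.diag(lam) @ U_true.T for lam in Lam]

at_truth = nojd_objective(U_true, Ms)          # numerically zero
at_identity = nojd_objective(np.eye(3), Ms)    # strictly positive
```

The objective vanishes at the true factor matrix because $U^{-1} M_\ell U^{-\top}$ is then exactly the diagonal matrix of eigenvalues.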

slide-60
SLIDE 60

Non-orthogonal tensor factorization

Results: Non-orthogonal tensor decomposition

Theorem (Random projections)

Pick L = Ω(k log k) projections randomly from the unit sphere. Then, with high probability,

$$\text{error in factors} \le O\left( \underbrace{\frac{\|U^{-\top}\|_2^2}{1 - \mu^2}}_{\text{non-ortho. cost}} \; \underbrace{\frac{\|\pi\|_1 \pi_{\max}}{\pi_{\min}^2} \left( 1 + \sqrt{\frac{d}{L}} \right)}_{\text{ortho. cost}} \; \epsilon \right),$$

where $U = [u_1 | \dots | u_k]$, $\mu = \max u_i^\top u_j$, and $\frac{\|U^{-\top}\|_2^2}{1 - \mu^2}$ measures non-orthogonality.
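The non-orthogonality factor can be computed directly for a square factor matrix. This is a sketch under two assumptions that the slide does not spell out: that the maximum defining $\mu$ runs over pairs $i \neq j$ and is taken in absolute value, and that $U$ is square and invertible with unit-norm columns.

```python
import numpy as np

def non_orthogonality(U):
    """||U^{-T}||_2^2 / (1 - mu^2) for a square U with unit-norm columns."""
    gram = U.T @ U
    # Assumed reading of mu: largest |u_i^T u_j| over distinct columns i != j.
    mu = np.max(np.abs(gram - np.diag(np.diag(gram))))
    spec = np.linalg.norm(np.linalg.inv(U).T, 2)   # spectral norm of U^{-T}
    return spec ** 2 / (1.0 - mu ** 2)

U_orth = np.eye(3)
U_skew = np.array([[1.0, 0.6],
                   [0.0, 0.8]])           # unit-norm columns with inner product 0.6

cost_orth = non_orthogonality(U_orth)
cost_skew = non_orthogonality(U_skew)
```

For orthonormal factors the measure equals 1, and it grows as the columns of `U` become more correlated.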

slide-64
SLIDE 64

Related work

Notes and Related work

  • Orthogonal tensor methods can factorize non-orthogonal tensors using a whitening transformation (Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013).
      • Whitening is itself a major source of errors (Souloumiac 2009).
  • Simultaneous diagonalization for tensors was proposed by Lathauwer 2006.
      • Relies on computing the SVD of a $d^4 \times k^2$ matrix.
  • Simultaneous diagonalization of multiple projections is mentioned in Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013.
      • No analysis presented.
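For readers unfamiliar with whitening, here is a minimal sketch of the usual construction. It assumes access to a second-order matrix `S` built from the same factors and positive weights `pi`, which is not described on the slide, and the values used are hypothetical.

```python
import numpy as np

U = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])
U /= np.linalg.norm(U, axis=0)            # non-orthogonal unit-norm factors
pi = np.array([3.0, 2.0, 1.0])            # positive weights (hypothetical)

S = U @ np.diag(pi) @ U.T                 # assumed second-order matrix sum_i pi_i u_i u_i^T
evals, evecs = np.linalg.eigh(S)
Wh = evecs @ np.diag(evals ** -0.5)       # whitening matrix: Wh^T S Wh = I

V = Wh.T @ U @ np.diag(np.sqrt(pi))       # whitened factors sqrt(pi_i) Wh^T u_i
```

The whitened factors in `V` are orthonormal, so an orthogonal method can be applied to the whitened tensor. The step divides by the square roots of the eigenvalues of `S`, which suggests why an estimate of `S` with small eigenvalues can amplify noise.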

slide-66
SLIDE 66

Empirical results

Community detection

(Anandkumar, Ge, Hsu, and S. Kakade 2013)

[Figure: nodes a, b, c, d; a tensor written as a sum of terms.]

slide-69
SLIDE 69

Empirical results

Community detection

[Plot: axes labelled Recall and Error; methods compared: TPM and OJD.]

slide-70
SLIDE 70

Empirical results

Crowdsourcing

(Zhang et al. 2014)

[Figure: a, b, c with Y/N labels.]

slide-74
SLIDE 74

Empirical results

Crowdsourcing

(Zhang et al. 2014)

[Bar chart: accuracy (80 to 100) on web, rte, birds, and dogs for TPM, ALS, OJD, NOJD, and MV+EM.]

slide-79
SLIDE 79

Conclusions

Conclusions

  • Reduce tensor problems to matrix ones with random projections.
  • Empirically, competitive with the state of the art, with support for non-orthogonal, asymmetric tensors of arbitrary order.
  • Open question: is the Jacobi angles algorithm for orthogonal simultaneous diagonalization generically globally convergent?
  • Github: https://github.com/kuleshov/tensor-factorization
  • Codalab: https://www.codalab.org/worksheets/
  • Thanks! Questions?