SLIDE 1

Spectral Experts for Estimating Mixtures of Linear Regressions

Arun Tejasvi Chaganty, Percy Liang

Stanford University

January 28, 2016


SLIDE 2

Introduction

Latent Variable Models

Generative Models

◮ Gaussian Mixture Models
◮ Hidden Markov Models
◮ Latent Dirichlet Allocation
◮ PCFGs
◮ ...

Discriminative Models

◮ Mixture of Experts
◮ Latent CRFs
◮ Discriminative LDA
◮ ...
◮ Easy to include features and tend to be more accurate.

[Figure: graphical models: generative, h → x; discriminative, (x, h) → y]

SLIDE 3

Introduction

Parameter Estimation is Hard

[Figure: a non-convex negative log-likelihood −log p_θ(x) plotted over θ, with the global optimum θ_MLE and two local optima θ_EM marked]

◮ The log-likelihood function is non-convex.
◮ The MLE is consistent but intractable.
◮ Local methods (EM, gradient descent, etc.) are tractable but inconsistent.
◮ Can we build an efficient and consistent estimator?

SLIDE 4

Introduction

Related Work

◮ Method of Moments [Pearson, 1894]
◮ Observable operators
  ◮ Control Theory [Ljung, 1987]
  ◮ Observable operator models [Jaeger, 2000; Littman/Sutton/Singh, 2004]
  ◮ Hidden Markov models [Hsu/Kakade/Zhang, 2009]
  ◮ Low-treewidth graphs [Parikh et al., 2012]
  ◮ Weighted finite state automata [Balle & Mohri, 2012]
◮ Parameter Estimation
  ◮ Mixture of Gaussians [Kalai/Moitra/Valiant, 2010]
  ◮ Mixture models, HMMs [Anandkumar/Hsu/Kakade, 2012]
  ◮ Latent Dirichlet Allocation [Anandkumar/Hsu/Kakade, 2012]
  ◮ Stochastic block models [Anandkumar/Ge/Hsu/Kakade, 2012]
  ◮ Linear Bayesian networks [Anandkumar/Hsu/Javanmard/Kakade, 2012]

SLIDE 5

Introduction

Outline

◮ Introduction
◮ Tensor Factorization for a Generative Model
◮ Tensor Factorization for a Discriminative Model
◮ Experimental Insights
◮ Conclusions

SLIDE 6

Tensor Factorization for a Generative Model

Aside: Tensor Operations

◮ Tensor product: x⊗3 = x ⊗ x ⊗ x, with entries (x⊗3)_ijk = x_i x_j x_k.
◮ Inner product: ⟨A, B⟩ = Σ_ijk A_ijk B_ijk = ⟨vec A, vec B⟩.

[Figure: pictorial examples of a rank-1 tensor as a product of three vectors, and of a tensor inner product evaluating to 0.5]
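To make the notation concrete, here is a minimal numpy sketch of the two operations above (numpy is our choice of illustration language; the slide uses pictures):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Tensor product: (x⊗3)_ijk = x_i x_j x_k, a 3 × 3 × 3 array.
x3 = np.einsum('i,j,k->ijk', x, x, x)

# Inner product: <A, B> = Σ_ijk A_ijk B_ijk, which equals <vec A, vec B>.
A = x3
B = np.random.rand(3, 3, 3)
inner = np.sum(A * B)
assert np.isclose(inner, A.ravel() @ B.ravel())   # same value on vectorized copies
```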

SLIDE 7

Tensor Factorization for a Generative Model

Example: Gaussian Mixture Model [Anandkumar/Hsu/Kakade, 2012]

◮ Generative process:

  h ∼ Mult([π_1, π_2, ..., π_k])
  x ∼ N(β_h, σ²I).

◮ Moments:

  E[x | h] = β_h
  E[x] = Σ_h π_h β_h
  E[x⊗2] = Σ_h π_h (β_h β_hᵀ) + σ²I = Σ_h π_h β_h⊗2 + σ²I
  E[x⊗3] = Σ_h π_h β_h⊗3 + bias.

[Figure: graphical model h → x with samples x_1, x_2; E[x⊗2] is a d × d matrix and E[x⊗3] a d × d × d tensor]
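The moment identities are easy to verify by Monte Carlo; the particular π, β_h, and σ below are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, sigma, n = 3, 4, 0.1, 200_000
pi = np.array([0.5, 0.3, 0.2])
beta = rng.normal(size=(k, d))                 # rows are the component means β_h

h = rng.choice(k, size=n, p=pi)                # h ~ Mult(π)
x = beta[h] + sigma * rng.normal(size=(n, d))  # x ~ N(β_h, σ²I)

m1_hat = x.mean(axis=0)                        # ≈ E[x] = Σ_h π_h β_h
m2_hat = x.T @ x / n                           # ≈ E[x⊗2]
m2 = np.einsum('h,hi,hj->ij', pi, beta, beta) + sigma**2 * np.eye(d)
print(np.abs(m1_hat - pi @ beta).max(), np.abs(m2_hat - m2).max())  # both small
```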

SLIDE 8

Tensor Factorization for a Generative Model

Solution: Tensor Factorization [Anandkumar/Ge/Hsu et al., 2012]

◮ E[x⊗3] = Σ_{h=1}^k π_h β_h⊗3.
◮ If the β_h are orthogonal, they are eigenvectors: E[x⊗3](β_h, β_h) = π_h β_h.
◮ In general, whiten E[x⊗3] first.

[Figure: E[x⊗3] decomposed as a sum of k rank-1 tensors]
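One concrete way to extract these eigenvectors is the tensor power method of [Anandkumar/Ge/Hsu et al., 2012]. Below is a sketch for the already-orthogonal case (in general, one whitens first); the restart and iteration counts are arbitrary choices of ours:

```python
import numpy as np

def tensor_power(T, n_restarts=10, n_iters=100):
    """Top 'eigenpair' (λ, v) of a symmetric 3-tensor: T(I, v, v) = λ v."""
    rng = np.random.default_rng(0)
    best = (-np.inf, None)
    for _ in range(n_restarts):
        u = rng.normal(size=T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('ijk,j,k->i', T, u, u)    # u ← T(I, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # λ = T(u, u, u)
        if lam > best[0]:
            best = (lam, u)
    return best

# With orthonormal β_h, each β_h is an eigenvector with eigenvalue π_h.
pi = np.array([0.6, 0.4])
beta = np.eye(3)[:2]                                # β_1 = e_1, β_2 = e_2
T = np.einsum('h,hi,hj,hk->ijk', pi, beta, beta, beta)
lam1, v1 = tensor_power(T)
lam2, v2 = tensor_power(T - lam1 * np.einsum('i,j,k->ijk', v1, v1, v1))  # deflate
print(lam1, lam2)                                   # ≈ 0.6, 0.4
```

Deflating (subtracting λ v⊗3) and repeating pulls out the remaining components one by one.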

SLIDE 9

Tensor Factorization for a Generative Model

[Figure: transition slide contrasting generative models, h → x, with discriminative models, (x, h) → y]

SLIDE 10

Tensor Factorization for a Discriminative Model

Mixture of Linear Regressions

◮ Given x:
  ◮ h ∼ Mult([π_1, π_2, ..., π_k]).
  ◮ y = β_hᵀ x + ε.

[Figure: graphical model (x, h) → y; scatter plot of (x, y) showing the k linear components, as in the sampling sketch below]
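Sampling from this model takes only a few lines; the π, B, and noise scale below are placeholders, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, n = 3, 4, 100_000
pi = np.array([0.4, 0.35, 0.25])
B = rng.normal(size=(d, k))                # columns are β_1, ..., β_k

x = rng.normal(size=(n, d))                # any x distribution; h ⟂ x here
h = rng.choice(k, size=n, p=pi)            # h ~ Mult(π)
y = np.einsum('nd,dn->n', x, B[:, h]) + 0.1 * rng.normal(size=n)   # y = β_hᵀx + ε
```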

SLIDE 11

Tensor Factorization for a Discriminative Model

[Figure: from samples (x, y), can we recover the mixture weights (π_1, π_2, ..., π_k) and the regressors B = [β_1 β_2 ... β_k]?]

SLIDE 12

Tensor Factorization for a Discriminative Model

Finding Tensor Structure

y = ⟨β_h, x⟩ + ε        (h is random)
  = ⟨E[β_h], x⟩ + ⟨β_h − E[β_h], x⟩ + ε

The first term is a linear measurement of E[β_h] = Σ_h π_h β_h; the remaining terms are zero-mean noise.

SLIDE 13

Tensor Factorization for a Discriminative Model

Finding Tensor Structure

y  = ⟨E[β_h], x⟩ + (β_h − E[β_h])ᵀx + ε        (linear measurement + noise)

y² = (⟨β_h, x⟩ + ε)² = ⟨M2, x⊗2⟩ + bias₂ + noise₂,   where M2 := E[β_h⊗2]

y³ = ⟨M3, x⊗3⟩ + bias₃ + noise₃,   where M3 := E[β_h⊗3]
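The y² identity says that regressing y² on the d² entries of x⊗2, with an intercept absorbing the constant bias₂ term (E[ε²] when ε is independent of x and h), estimates M2. A self-contained sketch using plain least squares, ignoring the low-rank structure for now (that comes on slide 16); the model parameters are placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, n = 3, 4, 200_000
pi = np.array([0.4, 0.35, 0.25])
B = rng.normal(size=(d, k))                        # columns β_h (placeholders)
x = rng.normal(size=(n, d))
h = rng.choice(k, size=n, p=pi)
y = np.einsum('nd,dn->n', x, B[:, h]) + 0.1 * rng.normal(size=n)

# Features: the entries of x⊗2; the constant column absorbs bias₂.
Phi = np.column_stack([np.einsum('ni,nj->nij', x, x).reshape(n, -1), np.ones(n)])
w, *_ = np.linalg.lstsq(Phi, y**2, rcond=None)
M2_hat = w[:-1].reshape(d, d)
M2 = (B * pi) @ B.T                                # Σ_h π_h β_h β_hᵀ
print(np.abs(M2_hat - M2).max())                   # → 0 as n grows
```

The same recipe with y³ and the entries of x⊗3 estimates M3.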

SLIDE 14

Tensor Factorization for a Discriminative Model

Recovering Parameters

◮ M3 := E[β_h⊗3] = Σ_{h=1}^k π_h β_h⊗3
◮ Apply tensor factorization!

[Figure: M3 decomposed as a sum of k rank-1 tensors]

SLIDE 15

Tensor Factorization for a Discriminative Model

Overview: Spectral Experts

[Diagram: pipeline. From data (x, y) ∈ D, regress y² on x⊗2 and y³ on x⊗3 to obtain M2 and M3; tensor factorization then yields π, B]

Assumptions:
◮ Ê[vec(x⊗2)⊗2] ≻ 0 and Ê[vec(x⊗3)⊗2] ≻ 0.
◮ π ≻ 0 and rank(B) = k ≤ d.
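Given M2 and M3, the factorization step runs exactly as on slide 8: whiten with M2, run the tensor power method, and undo the whitening. A schematic sketch under the slide's assumptions (the function names and the exact-moment test below are ours):

```python
import numpy as np

def whiten(M2, k):
    """W (d × k) with Wᵀ M2 W = I_k, for PSD M2 = B diag(π) Bᵀ of rank k."""
    vals, vecs = np.linalg.eigh(M2)
    return vecs[:, -k:] / np.sqrt(vals[-k:])         # top-k eigenpairs

def tensor_power(T, n_restarts=20, n_iters=200):
    """Top eigenpair (λ, v) of a symmetric 3-tensor via power iteration."""
    rng = np.random.default_rng(0)
    best = (-np.inf, None)
    for _ in range(n_restarts):
        u = rng.normal(size=T.shape[0]); u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('ijk,j,k->i', T, u, u); u /= np.linalg.norm(u)
        lam = np.einsum('ijk,i,j,k->', T, u, u, u)
        if lam > best[0]: best = (lam, u)
    return best

def recover(M2, M3, k):
    """π, B from M2 = Σ_h π_h β_h⊗2 and M3 = Σ_h π_h β_h⊗3 (exact moments)."""
    W = whiten(M2, k)
    T = np.einsum('ijk,ia,jb,kc->abc', M3, W, W, W)  # = Σ_h π_h^{-1/2} v_h⊗3
    unwhiten = np.linalg.pinv(W.T)
    pis, betas = [], []
    for _ in range(k):
        lam, v = tensor_power(T)                     # λ_h = π_h^{-1/2}, v_h orthonormal
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)   # deflate
        pis.append(1.0 / lam**2)
        betas.append(lam * (unwhiten @ v))           # β_h = λ_h (Wᵀ)⁺ v_h
    return np.array(pis), np.column_stack(betas)

# Exact-moment check with made-up parameters (recovery is up to permutation):
rng = np.random.default_rng(2)
k, d = 2, 4
pi = np.array([0.7, 0.3])
B = rng.normal(size=(d, k))
M2 = (B * pi) @ B.T
M3 = np.einsum('h,ih,jh,kh->ijk', pi, B, B, B)
print(recover(M2, M3, k))
```

With estimated moments from the regression sketch above, the same routine gives approximate parameters.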

SLIDE 16

Tensor Factorization for a Discriminative Model

Exploiting Low-Rank Structure [Fazel, 2002; Tomioka et al., 2010]

M̂2 = argmin_M Σ_{(x,y)∈D} (y² − ⟨M, x⊗2⟩ − bias₂)² + ‖M‖_*,   where ‖M‖_* = Σ_i σ_i(M)

M̂3 = argmin_M Σ_{(x,y)∈D} (y³ − ⟨M, x⊗3⟩ − bias₃)² + ‖M‖_*

[Figure: the estimate is a sum of k rank-1 terms]
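Because the proximal operator of the nuclear norm is soft-thresholding of singular values, the M̂2 problem can be attacked with proximal gradient descent. A dependency-free sketch for the matrix case; the step size, regularization weight λ, and intercept handling are our choices, and for M̂3 one would work with an unfolding of the tensor:

```python
import numpy as np

def nuclear_prox(M, tau):
    """Prox of τ‖·‖_*: soft-threshold the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def low_rank_regression(x, y2, lam=1e-2, n_iters=2000):
    """min_M (1/n) Σ_n (y²_n − <M, x_n⊗2> − b)² + λ‖M‖_*  (b = intercept)."""
    n, d = x.shape
    M, b = np.zeros((d, d)), 0.0
    step = 0.5 / np.linalg.norm(x, axis=1).max() ** 4   # crude 1/L estimate
    for _ in range(n_iters):
        r = np.einsum('ni,ij,nj->n', x, M, x) + b - y2  # residuals
        grad = (2.0 / n) * np.einsum('n,ni,nj->ij', r, x, x)
        M = nuclear_prox(M - step * grad, step * lam)   # gradient step + prox
        b -= 0.5 * r.mean()                             # damped intercept update
    return M, b
```

This is slow but self-contained; any off-the-shelf convex solver for nuclear-norm regularized least squares would do the same job.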

SLIDE 17

Tensor Factorization for a Discriminative Model

Sample Complexity [Negahban/Wainwright, 2009; Tomioka et al., 2011; Anandkumar/Ge/Hsu, 2012]

[Diagram: the same pipeline, with the regression step labeled "low-rank regression"]

◮ Low-rank regression: O(k ‖x‖¹² ‖β‖⁶ E[ε²]⁶) samples.
◮ Tensor factorization: O(k π_max² / σ_k(M2)⁵) samples.

SLIDE 18

Experimental Insights

Experimental Insights

y = β_hᵀ x + ε,   with x = (1, t, t⁴, t⁷),   k = 3, d = 4, n = 10⁵

[Figure: data (t, y) with the fits recovered by EM, Spectral, and Spectral+EM; bar chart of parameter error across runs for the three methods]
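A sketch of this synthetic setup: the slide fixes the feature map and k, d, n, while the true β_h, the mixing weights, and the noise scale below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)
k, d, n = 3, 4, 100_000
t = rng.uniform(-1.0, 1.0, size=n)
x = np.stack([np.ones(n), t, t**4, t**7], axis=1)   # x = (1, t, t⁴, t⁷)
B = rng.normal(size=(d, k))                         # placeholder β_h
h = rng.choice(k, size=n)                           # uniform mixing (placeholder)
y = np.einsum('nd,dn->n', x, B[:, h]) + 0.1 * rng.normal(size=n)
```

The resulting (x, y) pairs can be fed to the regression and factorization sketches above.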

SLIDE 19

Experimental Insights

[Figure: parameter error of EM, Spectral, and Spectral+EM across four settings: d = 4, k = 2; d = 5, k = 2; d = 5, k = 3; d = 6, k = 2]

SLIDE 20

Experimental Insights

On Initialization (Cartoon)

[Figure: the non-convex objective −log p_θ(x); EM from a random start lands at a local optimum θ̂_EM, the spectral estimate θ̂_spec lands near the global optimum, and EM initialized at θ̂_spec reaches θ̂_spec+EM]

SLIDE 21

Conclusions

Conclusions

◮ A consistent estimator for the mixture of linear regressions.
◮ Key Idea: Expose tensor factorization structure through regression.
◮ Theory: Polynomial sample and computational complexity.
◮ Experiments: Method-of-moments estimates can be a good initialization for EM.
◮ Future Work: How can we handle other discriminative models?
  ◮ Dependencies between h and x (mixture of experts).
  ◮ Non-linear link functions (hidden-variable logistic regression).

SLIDE 22

Conclusions

Thank you!