Spectral Experts for Estimating Mixtures of Linear Regressions
Arun Tejasvi Chaganty Percy Liang
Stanford University
January 28, 2016
Chaganty, Liang (Stanford University) Spectral Experts January 28, 2016 1 / 22
Introduction
◮
  ◮ Gaussian Mixture Models
  ◮ Hidden Markov Models
  ◮ Latent Dirichlet Allocation
  ◮ PCFGs
  ◮ . . .
◮
  ◮ Mixture of Experts
  ◮ Latent CRFs
  ◮ Discriminative LDA
  ◮ . . .
◮ Easy to include features and
Introduction
◮ Log-likelihood function is non-convex.
◮ MLE is consistent but intractable.
◮ Local methods (EM, gradient descent, etc.) are tractable but inconsistent.
◮ Can we build an efficient and consistent estimator?
Introduction
◮ Method of Moments [Pearson, 1894]
◮ Observable operators
  ◮ Control Theory [Ljung, 1987]
  ◮ Observable operator models [Jaeger, 2000; Littman/Sutton/Singh, 2004]
  ◮ Hidden Markov models [Hsu/Kakade/Zhang, 2009]
  ◮ Low-treewidth graphs [Parikh et al., 2012]
  ◮ Weighted finite state automata [Balle & Mohri, 2012]
◮ Parameter Estimation
  ◮ Mixture of Gaussians [Kalai/Moitra/Valiant, 2010]
  ◮ Mixture models, HMMs [Anandkumar/Hsu/Kakade, 2012]
  ◮ Latent Dirichlet Allocation [Anandkumar/Hsu/Kakade, 2012]
  ◮ Stochastic block models [Anandkumar/Ge/Hsu/Kakade, 2012]
  ◮ Linear Bayesian networks [Anandkumar/Hsu/Javanmard/Kakade, 2012]
Tensor Factorization for a Generative Model
◮ $(x^{\otimes 3})_{ijk} = x_i x_j x_k$
◮ Inner product
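The two tensor operations named on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the helper names `tensor_power3` and `inner` are hypothetical, and the inner product is assumed to be the usual sum of elementwise products.

```python
import numpy as np

def tensor_power3(x):
    """Return the third tensor power of x: entry (i, j, k) is x_i * x_j * x_k."""
    return np.einsum("i,j,k->ijk", x, x, x)

def inner(A, B):
    """Generalized inner product (assumed definition): sum of elementwise products."""
    return float(np.sum(A * B))

x = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])
T = tensor_power3(x)

# Entry (0, 1, 2) is x_0 * x_1 * x_2.
print(T[0, 1, 2])
# The inner product of two third powers collapses to a cubed dot product:
# <b^{(x)3}, x^{(x)3}> = (b . x)^3.
print(inner(tensor_power3(b), T), (b @ x) ** 3)
```

The identity in the last line is what lets a power of a scalar response be read as a linear function of a tensor.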
Tensor Factorization for a Generative Model
anandkumar12moments
◮ Generative process:
◮ Moments:
h ) + σ2
h
[Figure: axes x1, x2]
Tensor Factorization for a Generative Model
AnandkumarGeHsu2012
◮ $\mathbb{E}[x^{\otimes 3}] = \sum_{h=1}^{k} \pi_h \beta_h^{\otimes 3}$.
◮ If $\beta_h$ are orthogonal, they are
◮ In general, whiten $\mathbb{E}[x^{\otimes 3}]$ first.
[Figure: axes x1, x2]
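A minimal sketch of the whiten-then-factorize idea, assuming exact moments $M_2 = \sum_h \pi_h \beta_h^{\otimes 2}$ and $M_3 = \sum_h \pi_h \beta_h^{\otimes 3}$ are available and that the $\beta_h$ are linearly independent. The function names (`whiten`, `tensor_power_method`, `recover_parameters`) and the restart and iteration counts are illustrative assumptions, not taken from the slides. The factorization step is plain tensor power iteration with deflation.

```python
import numpy as np

def whiten(M2, k):
    """Return W with W.T @ M2 @ W = I_k, and B that undoes the whitening."""
    s, U = np.linalg.eigh(M2)          # eigenvalues in ascending order
    s, U = s[-k:], U[:, -k:]           # keep the top-k eigenpairs
    return U / np.sqrt(s), U * np.sqrt(s)

def tensor_power_method(T, k, n_restarts=10, n_iters=100, seed=0):
    """Extract k (eigenvalue, eigenvector) pairs from an orthogonal symmetric tensor."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    lams, vecs = [], []
    for _ in range(k):
        best_lam, best_u = -np.inf, None
        for _ in range(n_restarts):
            u = rng.standard_normal(T.shape[0])
            u /= np.linalg.norm(u)
            for _ in range(n_iters):
                u = np.einsum("ijk,j,k->i", T, u, u)   # u <- T(I, u, u)
                u /= np.linalg.norm(u)
            lam = float(np.einsum("ijk,i,j,k->", T, u, u, u))
            if lam > best_lam:
                best_lam, best_u = lam, u
        lams.append(best_lam)
        vecs.append(best_u)
        # Deflate: remove the component that was just found.
        T -= best_lam * np.einsum("i,j,k->ijk", best_u, best_u, best_u)
    return np.array(lams), np.array(vecs)

def recover_parameters(M2, M3, k):
    """Recover (pi, betas) from exact second and third moments."""
    W, B = whiten(M2, k)
    T = np.einsum("abc,ai,bj,ck->ijk", M3, W, W, W)    # whitened tensor
    lam, V = tensor_power_method(T, k)
    pi = 1.0 / lam**2                                   # lam_h = pi_h^(-1/2)
    betas = lam[:, None] * (V @ B.T)                    # row h is beta_h
    return pi, betas

pi_true = np.array([0.4, 0.6])
betas_true = np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]])  # not orthogonal
M2 = np.einsum("h,hi,hj->ij", pi_true, betas_true, betas_true)
M3 = np.einsum("h,hi,hj,hk->ijk", pi_true, betas_true, betas_true, betas_true)
pi_hat, betas_hat = recover_parameters(M2, M3, k=2)
print(pi_hat, betas_hat)
```

Whitening makes the components orthonormal, which is the setting where power iteration is known to converge to individual components. The components come back in arbitrary order.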
Tensor Factorization for a Discriminative Model
◮ Given $x$
  ◮ $h \sim \mathrm{Mult}([\pi_1, \pi_2, \cdots, \pi_k])$.
  ◮ $y = \beta_h^\top x + \epsilon$.
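The generative story for $y$ given $x$ can be simulated directly. The sketch below assumes standard Gaussian covariates and Gaussian noise, neither of which is specified on the slide; the function name and its parameters are hypothetical.

```python
import numpy as np

def sample_mixture_of_regressions(n, pi, betas, noise_std=0.1, seed=0):
    """Draw n pairs (x, y) from a mixture of linear regressions.

    pi    : (k,) mixing proportions
    betas : (k, d) regression coefficients, one row per component
    """
    rng = np.random.default_rng(seed)
    k, d = betas.shape
    X = rng.standard_normal((n, d))            # assumption: x ~ N(0, I)
    h = rng.choice(k, size=n, p=pi)            # h ~ Mult(pi), independent of x
    eps = noise_std * rng.standard_normal(n)   # assumption: Gaussian noise
    y = np.sum(betas[h] * X, axis=1) + eps     # y = beta_h^T x + eps
    return X, y, h

X, y, h = sample_mixture_of_regressions(
    5, np.array([0.4, 0.6]), np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]])
)
print(X.shape, y.shape, h)
```

The returned `h` is only for checking; an estimator sees `X` and `y` alone.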
Tensor Factorization for a Discriminative Model
$\sum_h \pi_h \beta_h$.
Tensor Factorization for a Discriminative Model
◮ $y$ = linear measurement + noise
◮ $y^2$: linear measurement $\langle \mathbb{E}[\beta_h^{\otimes 2}], x^{\otimes 2} \rangle$, with $M_2 = \mathbb{E}[\beta_h^{\otimes 2}]$
◮ $y^3$: linear measurement of $M_3 = \mathbb{E}[\beta_h^{\otimes 3}]$
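One way to see why powers of $y$ act as noisy linear measurements of the moment tensors is to expand them under the model $y = \beta_h^\top x + \epsilon$. The derivation below is a sketch under stated assumptions that the slides do not spell out: $h$ is independent of $x$, and $\epsilon$ is zero-mean and independent of both. $M_1 = \mathbb{E}[\beta_h]$ is notation introduced here by analogy with $M_2$ and $M_3$.

```latex
\begin{align*}
\mathbb{E}[y \mid x]
  &= \langle M_1, x \rangle,
  & M_1 &= \mathbb{E}[\beta_h] = \textstyle\sum_h \pi_h \beta_h, \\
\mathbb{E}[y^2 \mid x]
  &= \langle M_2, x^{\otimes 2} \rangle + \mathbb{E}[\epsilon^2],
  & M_2 &= \mathbb{E}[\beta_h^{\otimes 2}], \\
\mathbb{E}[y^3 \mid x]
  &= \langle M_3, x^{\otimes 3} \rangle
     + 3\,\mathbb{E}[\epsilon^2]\,\langle M_1, x \rangle
     + \mathbb{E}[\epsilon^3],
  & M_3 &= \mathbb{E}[\beta_h^{\otimes 3}].
\end{align*}
```

Each line uses $(\beta_h^\top x)^p = \langle \beta_h^{\otimes p}, x^{\otimes p} \rangle$ and drops the cross terms that contain a single factor of $\epsilon$. The remaining terms in $\mathbb{E}[\epsilon^2]$ and $\mathbb{E}[\epsilon^3]$ are bias terms, and the gap between $y^p$ and its conditional mean plays the role of noise.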
Tensor Factorization for a Discriminative Model
◮ $M_3 \stackrel{\text{def}}{=} \mathbb{E}[\beta_h^{\otimes 3}] = \sum_{h=1}^{k} \pi_h \beta_h^{\otimes 3}$
◮ Apply tensor factorization!
Tensor Factorization for a Discriminative Model
◮ $\{(x^{\otimes 2}, y^2)\}_{(x,y) \in D}$
◮ $\{(x^{\otimes 3}, y^3)\}_{(x,y) \in D}$
◮ tensor factorization
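The regression step on the pairs $(x^{\otimes 2}, y^2)$ can be sketched with ordinary least squares. This is a simplified stand-in under stated assumptions, not the estimator from the talk: it uses unregularized least squares, adds an intercept column to absorb the noise second moment, and draws synthetic Gaussian data. The function name `estimate_M2` is hypothetical.

```python
import numpy as np

def estimate_M2(X, y):
    """Regress y^2 on vec(x x^T) plus an intercept; return (M2_hat, intercept)."""
    n, d = X.shape
    feats = np.einsum("ni,nj->nij", X, X).reshape(n, d * d)   # rows are vec(x x^T)
    A = np.hstack([feats, np.ones((n, 1))])                   # intercept column
    # lstsq returns the minimum-norm solution, which handles the duplicated
    # (i, j) / (j, i) features that make A rank-deficient.
    coef, *_ = np.linalg.lstsq(A, y**2, rcond=None)
    M2 = coef[:-1].reshape(d, d)
    return 0.5 * (M2 + M2.T), float(coef[-1])

# Synthetic data from a two-component mixture (all values are illustrative).
rng = np.random.default_rng(0)
pi = np.array([0.4, 0.6])
betas = np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]])
n = 100_000
X = rng.standard_normal((n, 3))
h = rng.choice(2, size=n, p=pi)
y = np.sum(betas[h] * X, axis=1) + 0.1 * rng.standard_normal(n)

M2_hat, intercept_hat = estimate_M2(X, y)
M2_true = np.einsum("h,hi,hj->ij", pi, betas, betas)
print(np.linalg.norm(M2_hat - M2_true))
```

The same pattern extends to $(x^{\otimes 3}, y^3)$ with third-order features; the estimates then feed the tensor factorization step.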
Tensor Factorization for a Discriminative Model
fazel2002matrix tomioka2010estimation
◮ Regression over $M$: squared loss $\|\cdot\|_2^2$ plus nuclear norm $\|M\|_*$
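A squared loss with a nuclear-norm penalty can be minimized by proximal gradient descent, where the proximal step soft-thresholds singular values. The sketch below assumes the objective $\frac{1}{2n}\sum_i (t_i - \langle M, x_i^{\otimes 2}\rangle)^2 + \lambda \|M\|_*$; the exact objective, the solver, and the value of $\lambda$ used in the talk are not recoverable from the slides, so every name and constant here is an assumption.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of tau * ||M||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_regression(X, target, lam=1e-3, n_iters=500):
    """Proximal gradient for (1/2n) * sum_i (target_i - <M, x_i x_i^T>)^2 + lam * ||M||_*."""
    n, d = X.shape
    A = np.einsum("ni,nj->nij", X, X).reshape(n, d * d)   # rows are vec(x x^T)
    step = n / np.linalg.norm(A, 2) ** 2                   # 1 / Lipschitz constant
    M = np.zeros((d, d))
    for _ in range(n_iters):
        resid = A @ M.ravel() - target
        grad = (A.T @ resid).reshape(d, d) / n
        M = svt(M - step * grad, step * lam)
    return M

# Noise-free check on a rank-2 target (illustrative values).
rng = np.random.default_rng(0)
b1, b2 = rng.standard_normal(4), rng.standard_normal(4)
M_true = 0.4 * np.outer(b1, b1) + 0.6 * np.outer(b2, b2)
X = rng.standard_normal((500, 4))
target = np.einsum("ni,ij,nj->n", X, M_true, X)
M_hat = nuclear_norm_regression(X, target)
print(np.linalg.norm(M_hat - M_true))
```

The penalty biases the estimate toward low rank, which matches the structure of $M_2$: it has rank at most $k$.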
Tensor Factorization for a Discriminative Model
◮ $\{(x^{\otimes 2}, y^2)\}_{(x,y) \in D}$, $\{(x^{\otimes 3}, y^3)\}_{(x,y) \in D}$ [NegahbanWainwright2009; Tomioka2011]: $k \, \|x\|^{12} \, \|\beta\|^{6} \, \mathbb{E}[\epsilon^2]^{6}$
◮ tensor factorization [AnandkumarGeHsu2012]: $\dfrac{k \, \pi_{\max}^2}{\sigma_k(M_2)^5}$
Experimental Insights
[Figure: y plotted against t, with fits from EM, Spectral, and Spectral+EM; histogram of Parameter Error (0.0% to 100.0%) for EM, Spectral, and Spectral+EM]
Experimental Insights
[Figure: four panels comparing Parameter Error (0.0 to 3.0) for EM, Spectral, and Spectral + EM]
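EM appears in this deck as a local method and as a baseline in the experiments, alone and started from the spectral estimate. A minimal sketch of EM for a mixture of linear regressions is given below, with the starting point passed in explicitly so that any initial estimate (random or spectral) can be used. It assumes Gaussian noise with a known, fixed variance `sigma2`; the function name, the small ridge term, and the iteration count are illustrative choices, not details from the talk.

```python
import numpy as np

def em_mixture_regressions(X, y, betas_init, pi_init, sigma2=0.01, n_iters=50):
    """EM for y = beta_h^T x + eps with eps ~ N(0, sigma2); returns (betas, pi)."""
    betas = np.array(betas_init, dtype=float)
    pi = np.array(pi_init, dtype=float)
    d = X.shape[1]
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point.
        resid = y[:, None] - X @ betas.T                   # shape (n, k)
        log_r = np.log(pi) - 0.5 * resid**2 / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)          # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component, then mixing weights.
        for h in range(betas.shape[0]):
            Xw = X * r[:, h:h + 1]
            betas[h] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
        pi = r.mean(axis=0)
    return betas, pi

# Illustrative run, started near the true parameters.
rng = np.random.default_rng(1)
betas_true = np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]])
X = rng.standard_normal((2000, 3))
h = rng.choice(2, size=2000, p=[0.4, 0.6])
y = np.sum(betas_true[h] * X, axis=1) + 0.1 * rng.standard_normal(2000)
betas_em, pi_em = em_mixture_regressions(X, y, betas_true + 0.2, [0.5, 0.5])
print(betas_em, pi_em)
```

Because the likelihood is non-convex, the result depends on the starting point.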
Conclusions
◮ Consistent estimator for the mixture of linear regressions
◮ Key Idea: Expose tensor factorization structure through regression.
◮ Theory: Polynomial sample and computational complexity.
◮ Experiments: Method of moment estimates can be a good
◮ Future Work: How can we handle other discriminative models?
  ◮ Dependencies between h and x (mixture of experts).
  ◮ Non-linear link functions (hidden variable logistic regression).