Learning Overcomplete Latent Variable Models through Tensor Methods

SLIDE 1

Learning Overcomplete Latent Variable Models through Tensor Methods

Anima Anandkumar (UC Irvine)

Joint work with Majid Janzamin (UC Irvine) and Rong Ge (Microsoft Research)

SLIDE 2

Latent Variable Probabilistic Models

Latent (hidden) variable h ∈ R^k, observed variable x ∈ R^d.

SLIDE 3

Latent Variable Probabilistic Models

Latent (hidden) variable h ∈ R^k, observed variable x ∈ R^d.

Multiview linear mixture models
  • Categorical hidden variable h.
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

SLIDE 4

Latent Variable Probabilistic Models

Latent (hidden) variable h ∈ R^k, observed variable x ∈ R^d.

Multiview linear mixture models
  • Categorical hidden variable h.
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

Gaussian mixture
  • Categorical hidden variable h.
  • x|h ∼ N(µ_h, Σ_h).

SLIDE 5

Latent Variable Probabilistic Models

Latent (hidden) variable h ∈ R^k, observed variable x ∈ R^d.

Multiview linear mixture models
  • Categorical hidden variable h.
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

Gaussian mixture
  • Categorical hidden variable h.
  • x|h ∼ N(µ_h, Σ_h).

ICA, Sparse Coding, HMM, Topic modeling, ...

SLIDE 6

Latent Variable Probabilistic Models

Latent (hidden) variable h ∈ R^k, observed variable x ∈ R^d.

Multiview linear mixture models
  • Categorical hidden variable h.
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

Gaussian mixture
  • Categorical hidden variable h.
  • x|h ∼ N(µ_h, Σ_h).

ICA, Sparse Coding, HMM, Topic modeling, ...

Efficient learning of the parameters a_h, µ_h, ... ?

SLIDE 7

Method-of-Moments (Spectral methods)

Multi-variate observed moments

M1 := E[x], M2 := E[x ⊗ x], M3 := E[x ⊗ x ⊗ x].

SLIDE 8

Method-of-Moments (Spectral methods)

Multi-variate observed moments

M1 := E[x], M2 := E[x ⊗ x], M3 := E[x ⊗ x ⊗ x].

Matrix

E[x ⊗ x] ∈ R^{d×d} is a second-order tensor: E[x ⊗ x]_{i1,i2} = E[x_{i1} x_{i2}]. In matrix notation, E[x ⊗ x] = E[xx⊤].

SLIDE 9

Method-of-Moments (Spectral methods)

Multi-variate observed moments

M1 := E[x], M2 := E[x ⊗ x], M3 := E[x ⊗ x ⊗ x].

Matrix

E[x ⊗ x] ∈ R^{d×d} is a second-order tensor: E[x ⊗ x]_{i1,i2} = E[x_{i1} x_{i2}]. In matrix notation, E[x ⊗ x] = E[xx⊤].

Tensor

E[x ⊗ x ⊗ x] ∈ R^{d×d×d} is a third-order tensor: E[x ⊗ x ⊗ x]_{i1,i2,i3} = E[x_{i1} x_{i2} x_{i3}].

SLIDE 10

Method-of-Moments (Spectral methods)

Multi-variate observed moments

M1 := E[x], M2 := E[x ⊗ x], M3 := E[x ⊗ x ⊗ x].

Matrix

E[x ⊗ x] ∈ R^{d×d} is a second-order tensor: E[x ⊗ x]_{i1,i2} = E[x_{i1} x_{i2}]. In matrix notation, E[x ⊗ x] = E[xx⊤].

Tensor

E[x ⊗ x ⊗ x] ∈ R^{d×d×d} is a third-order tensor: E[x ⊗ x ⊗ x]_{i1,i2,i3} = E[x_{i1} x_{i2} x_{i3}].

Information in moments for learning LVMs?
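A minimal numpy sketch (an illustration, not part of the slides) of how the empirical versions of these moment tensors can be formed from data; the sample matrix X, the sample size n, and the dimension d below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.standard_normal((n, d))                # placeholder i.i.d. sample, one observation per row

M1 = X.mean(axis=0)                            # estimate of E[x]
M2 = np.einsum('ni,nj->ij', X, X) / n          # estimate of E[x ⊗ x], a d×d matrix
M3 = np.einsum('ni,nj,nl->ijl', X, X, X) / n   # estimate of E[x ⊗ x ⊗ x], a d×d×d tensor

# Entrywise, M2[i1, i2] ≈ E[x_{i1} x_{i2}] and M3[i1, i2, i3] ≈ E[x_{i1} x_{i2} x_{i3}].
```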

SLIDE 11

Multiview Mixture Model

[k] := {1, . . . , k}.

Multiview linear mixture models
  • Categorical hidden variable h ∈ [k], with w_j := Pr[h = j].
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

SLIDE 12

Multiview Mixture Model

[k] := {1, . . . , k}.

Multiview linear mixture models
  • Categorical hidden variable h ∈ [k], with w_j := Pr[h = j].
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

E[x1 ⊗ x2] (= E[x1 x2⊤]) = E_h[ E[x1 ⊗ x2 | h] ] = E_h[a_h ⊗ b_h] = Σ_{j∈[k]} w_j a_j ⊗ b_j.

SLIDE 13

Multiview Mixture Model

[k] := {1, . . . , k}.

Multiview linear mixture models
  • Categorical hidden variable h ∈ [k], with w_j := Pr[h = j].
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

E[x1 ⊗ x2] = Σ_{j∈[k]} w_j a_j ⊗ b_j.

SLIDE 14

Multiview Mixture Model

[k] := {1, . . . , k}.

Multiview linear mixture models
  • Categorical hidden variable h ∈ [k], with w_j := Pr[h = j].
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

E[x1 ⊗ x2] = Σ_{j∈[k]} w_j a_j ⊗ b_j,   E[x1 ⊗ x2 ⊗ x3] = Σ_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j.

SLIDE 15

Multiview Mixture Model

[k] := {1, . . . , k}.

Multiview linear mixture models
  • Categorical hidden variable h ∈ [k], with w_j := Pr[h = j].
  • Views: conditionally independent given h.
  • Linear model: E[x1|h] = a_h, E[x2|h] = b_h, E[x3|h] = c_h.

[Diagram: hidden variable h with conditionally independent views x1, x2, x3, ...]

E[x1 ⊗ x2] = Σ_{j∈[k]} w_j a_j ⊗ b_j,   E[x1 ⊗ x2 ⊗ x3] = Σ_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j.

Tensor (matrix) factorization for learning LVMs.
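As an aside (not on the slides), a small simulation can confirm the factorization E[x1 ⊗ x2 ⊗ x3] = Σ_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j for the multiview mixture; the mixing weights, component matrices, noise level, and sample size below are made-up choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 6, 3, 200_000
w = np.array([0.5, 0.3, 0.2])                               # w_j = Pr[h = j]
A, B, C = (rng.standard_normal((d, k)) for _ in range(3))   # columns are a_j, b_j, c_j

h = rng.choice(k, size=n, p=w)                              # hidden label per sample
noise = lambda: 0.1 * rng.standard_normal((n, d))           # independent zero-mean noise per view
x1, x2, x3 = A[:, h].T + noise(), B[:, h].T + noise(), C[:, h].T + noise()

T_hat = np.einsum('ni,nj,nl->ijl', x1, x2, x3) / n          # empirical E[x1 ⊗ x2 ⊗ x3]
T_exact = np.einsum('j,ij,kj,lj->ikl', w, A, B, C)          # Σ_j w_j a_j ⊗ b_j ⊗ c_j

print(np.linalg.norm(T_hat - T_exact) / np.linalg.norm(T_exact))   # relative error → 0 as n grows
```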

SLIDE 16

Tensor Rank and Tensor Decomposition

Rank-1 tensor: T = w · a ⊗ b ⊗ c ⇔ T(i, j, l) = w · a(i) · b(j) · c(l).

SLIDE 17

Tensor Rank and Tensor Decomposition

Rank-1 tensor: T = w · a ⊗ b ⊗ c ⇔ T(i, j, l) = w · a(i) · b(j) · c(l).

CANDECOMP/PARAFAC (CP) Decomposition

T = Σ_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j ∈ R^{d×d×d},   a_j, b_j, c_j ∈ S^{d−1}.

[Figure: tensor T drawn as the sum w1 · a1 ⊗ b1 ⊗ c1 + w2 · a2 ⊗ b2 ⊗ c2 + . . .]

SLIDE 18

Tensor Rank and Tensor Decomposition

Rank-1 tensor: T = w · a ⊗ b ⊗ c ⇔ T(i, j, l) = w · a(i) · b(j) · c(l).

CANDECOMP/PARAFAC (CP) Decomposition

T = Σ_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j ∈ R^{d×d×d},   a_j, b_j, c_j ∈ S^{d−1}.

[Figure: tensor T drawn as the sum w1 · a1 ⊗ b1 ⊗ c1 + w2 · a2 ⊗ b2 ⊗ c2 + . . .]

k: tensor rank, d: ambient dimension. k ≤ d: undercomplete; k > d: overcomplete.

SLIDE 19

Tensor Rank and Tensor Decomposition

Rank-1 tensor: T = w · a ⊗ b ⊗ c ⇔ T(i, j, l) = w · a(i) · b(j) · c(l).

CANDECOMP/PARAFAC (CP) Decomposition

T = Σ_{j∈[k]} w_j a_j ⊗ b_j ⊗ c_j ∈ R^{d×d×d},   a_j, b_j, c_j ∈ S^{d−1}.

[Figure: tensor T drawn as the sum w1 · a1 ⊗ b1 ⊗ c1 + w2 · a2 ⊗ b2 ⊗ c2 + . . .]

k: tensor rank, d: ambient dimension. k ≤ d: undercomplete; k > d: overcomplete.

This talk: guarantees for overcomplete tensor decomposition.
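A tiny sketch of the notation (illustrative only): a rank-1 tensor built from outer products matches the entrywise definition above, and a CP tensor is a sum of k such terms, possibly with k > d.

```python
import numpy as np

rng = np.random.default_rng(2)
d, w = 4, 2.0
a, b, c = (v / np.linalg.norm(v) for v in rng.standard_normal((3, d)))   # unit vectors in S^{d-1}

T = w * np.einsum('i,j,l->ijl', a, b, c)                 # rank-1 tensor w · a ⊗ b ⊗ c
assert np.isclose(T[1, 2, 3], w * a[1] * b[2] * c[3])    # T(i, j, l) = w · a(i) · b(j) · c(l)

# A rank-k CP tensor is the sum of k such rank-1 terms; k may exceed d (overcomplete).
```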

SLIDE 20

Challenges in Tensor Decomposition

Symmetric tensor T ∈ R^{d×d×d}: T = Σ_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i.

Challenges in tensors

Decomposition may not always exist for general tensors. Finding the decomposition is NP-hard in general.

SLIDE 21

Challenges in Tensor Decomposition

Symmetric tensor T ∈ R^{d×d×d}: T = Σ_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i.

Challenges in tensors

Decomposition may not always exist for general tensors. Finding the decomposition is NP-hard in general.

Tractable case: orthogonal tensor decomposition (⟨v_i, v_j⟩ = 0 for i ≠ j)

Algorithm: tensor power method: v → T(I, v, v) / ‖T(I, v, v)‖.

  • {vi}’s are the only robust fixed points.
SLIDE 22

Challenges in Tensor Decomposition

Symmetric tensor T ∈ R^{d×d×d}: T = Σ_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i.

Challenges in tensors

Decomposition may not always exist for general tensors. Finding the decomposition is NP-hard in general.

Tractable case: orthogonal tensor decomposition (⟨v_i, v_j⟩ = 0 for i ≠ j)

Algorithm: tensor power method: v → T(I, v, v) / ‖T(I, v, v)‖.

  • {vi}’s are the only robust fixed points.
  • All other eigenvectors are saddle points.
SLIDE 23

Challenges in Tensor Decomposition

Symmetric tensor T ∈ R^{d×d×d}: T = Σ_{i∈[k]} λ_i v_i ⊗ v_i ⊗ v_i.

Challenges in tensors

Decomposition may not always exist for general tensors. Finding the decomposition is NP-hard in general.

Tractable case: orthogonal tensor decomposition (⟨v_i, v_j⟩ = 0 for i ≠ j)

Algorithm: tensor power method: v → T(I, v, v) / ‖T(I, v, v)‖.

  • {vi}’s are the only robust fixed points.
  • All other eigenvectors are saddle points.

For an orthogonal tensor, no spurious local optima!
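A hedged numpy sketch of the tensor power method on an orthogonally decomposable tensor, with deflation to recover all components; the problem sizes, iteration count, and random starts are choices of the sketch rather than anything prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 8, 5
lam = rng.uniform(1, 2, size=k)                            # positive weights λ_i
V, _ = np.linalg.qr(rng.standard_normal((d, k)))           # orthonormal components v_i
T = np.einsum('i,ai,bi,ci->abc', lam, V, V, V)             # T = Σ_i λ_i v_i ⊗ v_i ⊗ v_i

def tensor_power_method(T, n_iter=100):
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        Tv = np.einsum('abc,b,c->a', T, v, v)              # T(I, v, v)
        v = Tv / np.linalg.norm(Tv)                        # v ← T(I, v, v) / ||T(I, v, v)||
    return np.einsum('abc,a,b,c->', T, v, v, v), v         # eigenvalue T(v, v, v) and eigenvector

# Deflation: extract one robust fixed point at a time and subtract its rank-1 term.
for _ in range(k):
    lam_hat, v = tensor_power_method(T)
    print(round(lam_hat, 3))                               # ≈ one of the λ_i (order depends on the starts)
    T = T - lam_hat * np.einsum('a,b,c->abc', v, v, v)
```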

SLIDE 24

Beyond Orthogonal Tensor Decomposition

Limitations

Not ALL tensors have orthogonal decomposition (unlike matrices).

SLIDE 25

Beyond Orthogonal Tensor Decomposition

Limitations

Not ALL tensors have orthogonal decomposition (unlike matrices).

Undercomplete tensors (k ≤ d) with full rank components

Non-orthogonal decomposition T1 = Σ_i w_i a_i ⊗ a_i ⊗ a_i.

Whitening matrix W; multilinear transform: T2 = T1(W, W, W).

[Figure: whitening W maps the components a1, a2, a3 of tensor T1 to orthogonal components v1, v2, v3 of tensor T2]

SLIDE 26

Beyond Orthogonal Tensor Decomposition

Limitations

Not ALL tensors have orthogonal decomposition (unlike matrices).

Undercomplete tensors (k ≤ d) with full rank components

Non-orthogonal decomposition T1 = Σ_i w_i a_i ⊗ a_i ⊗ a_i.

Whitening matrix W; multilinear transform: T2 = T1(W, W, W).

[Figure: whitening W maps the components a1, a2, a3 of tensor T1 to orthogonal components v1, v2, v3 of tensor T2]

This talk: guarantees for overcomplete tensor decomposition
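A sketch of the whitening reduction for the undercomplete case, under the assumption that the exact second and third moments M2 = Σ_i w_i a_i ⊗ a_i and T1 = Σ_i w_i a_i ⊗ a_i ⊗ a_i are available; W is built from the top-k eigenpairs of M2 so that W⊤ M2 W = I, which makes the transformed components orthonormal:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 8, 4                                                # undercomplete: k ≤ d
w = rng.uniform(0.5, 1.5, size=k)
A = rng.standard_normal((d, k))                            # full-column-rank components a_i

M2 = np.einsum('i,ai,bi->ab', w, A, A)                     # Σ_i w_i a_i ⊗ a_i
T1 = np.einsum('i,ai,bi,ci->abc', w, A, A, A)              # Σ_i w_i a_i ⊗ a_i ⊗ a_i

# Whitening matrix W ∈ R^{d×k} with W⊤ M2 W = I, from the top-k eigenpairs of M2.
eigval, eigvec = np.linalg.eigh(M2)
W = eigvec[:, -k:] / np.sqrt(eigval[-k:])

# Multilinear transform T2 = T1(W, W, W); the vectors v_i = √w_i · W⊤ a_i are orthonormal,
# so T2 = Σ_i (1/√w_i) v_i ⊗ v_i ⊗ v_i is an orthogonal decomposition.
T2 = np.einsum('abc,ax,by,cz->xyz', T1, W, W, W)
V = np.sqrt(w) * (W.T @ A)
print(np.allclose(V.T @ V, np.eye(k)))                     # True
```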

SLIDE 27

Outline

1. Introduction
2. Overcomplete tensor decomposition
3. Sample Complexity Analysis
4. Conclusion

SLIDE 28

Our Setup

So far

General tensor decomposition: NP-hard. Orthogonal tensors: too limiting. Tractable cases? Covers overcomplete tensors?

SLIDE 29

Our Setup

So far

General tensor decomposition: NP-hard. Orthogonal tensors: too limiting. Tractable cases? Covers overcomplete tensors?

Our framework: Incoherent Components

|⟨a_i, a_j⟩| = O(1/√d) for i ≠ j. Similarly for the b's and c's.

Can handle overcomplete tensors. Satisfied by random vectors. Guaranteed recovery for alternating minimization?
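A quick numerical sanity check (an illustration, not from the slides) that independent random unit vectors are incoherent in this sense, even when k > d; the dimensions below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 500, 1500                                   # overcomplete: k = 3d
A = rng.standard_normal((d, k))
A /= np.linalg.norm(A, axis=0)                     # columns drawn uniformly from S^{d-1}

G = A.T @ A                                        # pairwise inner products <a_i, a_j>
off_diag = np.abs(G - np.diag(np.diag(G)))
print(off_diag.max(), 1 / np.sqrt(d))              # max |<a_i, a_j>| is a small (log-factor) multiple of 1/√d
```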

SLIDE 30

Alternating minimization

min_{a,b,c ∈ S^{d−1}, w ∈ R} ‖T − w · a ⊗ b ⊗ c‖_F.

Rank-1 ALS iteration (power iteration)

Initialization: a^(0), b^(0), c^(0). Update in step t: fix a^(t), b^(t) and set c^(t+1) ∝ T(a^(t), b^(t), I). After (approx.) convergence, restart.

SLIDE 31

Alternating minimization

min_{a,b,c ∈ S^{d−1}, w ∈ R} ‖T − w · a ⊗ b ⊗ c‖_F.

Rank-1 ALS iteration (power iteration)

Initialization: a^(0), b^(0), c^(0). Update in step t: fix a^(t), b^(t) and set c^(t+1) ∝ T(a^(t), b^(t), I). After (approx.) convergence, restart.

Simple update: trivially parallelizable and hence scalable. Computation is linear in the dimension, the rank, and the number of runs.

SLIDE 32

Alternating minimization

min_{a,b,c ∈ S^{d−1}, w ∈ R} ‖T − w · a ⊗ b ⊗ c‖_F.

Rank-1 ALS iteration (power iteration)

Initialization: a^(0), b^(0), c^(0). Update in step t: fix a^(t), b^(t) and set c^(t+1) ∝ T(a^(t), b^(t), I). After (approx.) convergence, restart.

Simple update: trivially parallelizable and hence scalable. Computation is linear in the dimension, the rank, and the number of runs.

Rank-1 ALS iteration ≡ asymmetric power iteration.
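A hedged sketch of one run of the rank-1 ALS / asymmetric power iteration on a noiseless overcomplete tensor. The initialization here is simply random (the SVD-based initialization and restart scheme come later in the talk), so a single run is not guaranteed to land on a component; the sizes and iteration count are choices of the sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
d, k = 50, 80                                              # overcomplete, with k well below d^1.5
w = rng.uniform(1, 2, size=k)
A, B, C = [rng.standard_normal((d, k)) for _ in range(3)]
for M in (A, B, C):
    M /= np.linalg.norm(M, axis=0)                         # incoherent unit-norm components
T = np.einsum('j,ij,kj,lj->ikl', w, A, B, C)

def rank1_als(T, n_iter=50):
    a, b, c = [rng.standard_normal(T.shape[0]) for _ in range(3)]
    for _ in range(n_iter):
        a = np.einsum('ijl,j,l->i', T, b, c); a /= np.linalg.norm(a)   # a ∝ T(I, b, c)
        b = np.einsum('ijl,i,l->j', T, a, c); b /= np.linalg.norm(b)   # b ∝ T(a, I, c)
        c = np.einsum('ijl,i,j->l', T, a, b); c /= np.linalg.norm(c)   # c ∝ T(a, b, I)
    return np.einsum('ijl,i,j,l->', T, a, b, c), a, b, c               # weight estimate and directions

w_hat, a, b, c = rank1_als(T)
j = np.argmax(np.abs(A.T @ a))                             # best-matching true component
print(abs(A[:, j] @ a), abs(B[:, j] @ b), abs(C[:, j] @ c))            # close to 1 when the run succeeds
```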

SLIDE 33

Main Result: Local Convergence

Initialization: max{‖a1 − â^(0)‖, ‖b1 − b̂^(0)‖} ≤ ε0, with ε0 < some constant. Noise: T̂ := T + E, with ‖E‖ ≤ 1/polylog(d). Rank: k = o(d^1.5).

SLIDE 34

Main Result: Local Convergence

Initialization: max{‖a1 − â^(0)‖, ‖b1 − b̂^(0)‖} ≤ ε0, with ε0 < some constant. Noise: T̂ := T + E, with ‖E‖ ≤ 1/polylog(d). Rank: k = o(d^1.5).

Theorem (Local Convergence)[AGJ2014]

After N = O(log(1/‖E‖)) steps of alternating rank-1 updates, ‖a1 − â^(N)‖ = O(‖E‖).

SLIDE 35

Main Result: Local Convergence

Initialization: max{‖a1 − â^(0)‖, ‖b1 − b̂^(0)‖} ≤ ε0, with ε0 < some constant. Noise: T̂ := T + E, with ‖E‖ ≤ 1/polylog(d). Rank: k = o(d^1.5).

Theorem (Local Convergence)[AGJ2014]

After N = O(log(1/‖E‖)) steps of alternating rank-1 updates, ‖a1 − â^(N)‖ = O(‖E‖).

  • Linear convergence, up to the approximation error.
  • Guarantees for overcomplete tensors: k = o(d^1.5), and k = o(d^{p/2}) for pth-order tensors.
  • Requires a good initialization. What about global convergence?

SLIDE 36

Global Convergence k = O(d)

SVD Initialization

Find the top singular vectors of T(I, I, θ) for θ ∼ N(0, I). Use them for initialization. L trials.

SLIDE 37

Global Convergence k = O(d)

SVD Initialization

Find the top singular vectors of T(I, I, θ) for θ ∼ N(0, I). Use them for initialization. L trials.

Assumptions

  • Number of initializations: L ≥ k^{Ω((k/d)^2)}.
  • Tensor rank: k = O(d).
  • No. of iterations: N = Θ(log(1/E)). Recall E: the recovery error.
SLIDE 38

Global Convergence k = O(d)

SVD Initialization

Find the top singular vectors of T(I, I, θ) for θ ∼ N(0, I). Use them for initialization. L trials.

Assumptions

  • Number of initializations: L ≥ k^{Ω((k/d)^2)}.
  • Tensor rank: k = O(d).
  • No. of iterations: N = Θ(log(1/E)). Recall E: the recovery error.

Theorem (Global Convergence) [AGJ2014]: ‖a1 − â^(N)‖ ≤ O(ε_R).
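A sketch of the SVD initialization step: contract the tensor with a random Gaussian vector θ, take the top singular-vector pair of the resulting d×d matrix as a candidate (a^(0), b^(0)), and repeat for L trials. The toy tensor built here and the trial count are illustrative assumptions; the subsequent selection among candidates (running the rank-1 updates from each and keeping the best fit) is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(7)
d, k = 30, 40
w = rng.uniform(1, 2, size=k)
A, B, C = [rng.standard_normal((d, k)) for _ in range(3)]
for M in (A, B, C):
    M /= np.linalg.norm(M, axis=0)
T = np.einsum('j,ij,kj,lj->ikl', w, A, B, C)               # toy overcomplete tensor

def svd_init(T, L=200):
    """L candidate initializations from random contractions T(I, I, θ), θ ~ N(0, I)."""
    cands = []
    for _ in range(L):
        theta = rng.standard_normal(T.shape[2])
        M = np.einsum('ijl,l->ij', T, theta)               # T(I, I, θ) is a d×d matrix
        U, s, Vt = np.linalg.svd(M)
        cands.append((U[:, 0], Vt[0]))                     # top singular-vector pair -> (a^(0), b^(0))
    return cands

# Each candidate would seed the alternating rank-1 updates; here we just check how well
# the best candidate correlates with some true component.
best = max(np.abs(A.T @ a0).max() for a0, _ in svd_init(T))
print(best)                                                # noticeably above a random direction's ~1/√d
```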

SLIDE 39

Outline

1. Introduction
2. Overcomplete tensor decomposition
3. Sample Complexity Analysis
4. Conclusion

SLIDE 40

High-level Intuition for Sample Bounds

Multi-view model: x1 = Ah + z1, where z1 is noise.

Exact moment: T = Σ_i w_i a_i ⊗ b_i ⊗ c_i.

Sample moment: T̂ = (1/n) Σ_i x1^i ⊗ x2^i ⊗ x3^i.

Naïve idea: ‖T̂ − T‖ ≤ ‖mat(T̂) − mat(T)‖, apply the matrix Bernstein inequality.

SLIDE 41

High-level Intuition for Sample Bounds

Multi-view model: x1 = Ah + z1, where z1 is noise.

Exact moment: T = Σ_i w_i a_i ⊗ b_i ⊗ c_i.

Sample moment: T̂ = (1/n) Σ_i x1^i ⊗ x2^i ⊗ x3^i.

Naïve idea: ‖T̂ − T‖ ≤ ‖mat(T̂) − mat(T)‖, apply the matrix Bernstein inequality.

Our idea: careful ε-net covering for ‖T̂ − T‖.
  • T̂ − T has many terms, e.g., the all-noise term (1/n) Σ_i z1^i ⊗ z2^i ⊗ z3^i and signal-noise cross terms.
  • Need to bound (1/n) Σ_i ⟨z1^i, u⟩ ⟨z2^i, v⟩ ⟨z3^i, w⟩ for all u, v, w ∈ S^{d−1}.
  • Classify the inner products into buckets and bound them separately.

Tight sample bounds for a range of latent variable models.
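An illustrative Monte Carlo (a toy, not the proof) of one term this analysis controls: for a fixed triple of unit vectors u, v, w, the all-noise average (1/n) Σ_i ⟨z1^i, u⟩⟨z2^i, v⟩⟨z3^i, w⟩ concentrates at the 1/√n rate; the hard part sketched on the slide is making such bounds hold uniformly over all u, v, w on the sphere via the ε-net.

```python
import numpy as np

rng = np.random.default_rng(8)
d = 100
u, v, w = (x / np.linalg.norm(x) for x in rng.standard_normal((3, d)))   # fixed unit vectors

for n in (1_000, 10_000, 100_000):
    z1, z2, z3 = (rng.standard_normal((n, d)) for _ in range(3))         # i.i.d. Gaussian noise per view
    term = np.mean((z1 @ u) * (z2 @ v) * (z3 @ w))     # (1/n) Σ_i <z1_i, u><z2_i, v><z3_i, w>
    print(n, abs(term), 1 / np.sqrt(n))                # |term| is on the order of 1/√n
```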

SLIDE 42

Unsupervised Learning of Gaussian Mixtures

  • No. of mixture components: k = C · d.
  • No. of unlabeled samples: n = Ω̃(k · d).
  • Computational complexity: Õ(k^{C^2}).

Our result: achieved error with n unlabeled samples

max_j ‖â_j − a_j‖ = Õ(√(k/n)).   Linear convergence.

Error: same as before, for the semi-supervised setting. Computational complexity: polynomial when k = Θ(d).

SLIDE 43

Semi-supervised Learning of Gaussian Mixtures

n unlabeled samples; m_j: labeled samples for component j.

  • No. of mixture components: k = o(d^1.5).
  • No. of labeled samples: m_j = Ω̃(1).
  • No. of unlabeled samples: n = Ω̃(k).

Our result: achieved error with n unlabeled samples

max_j ‖â_j − a_j‖ = Õ(√(k/n)).   Linear convergence.

Can handle (polynomially) overcomplete mixtures. Extremely small number of labeled samples: polylog(d). Sample complexity is tight: need Ω̃(k) samples!

SLIDE 44

Outline

1. Introduction
2. Overcomplete tensor decomposition
3. Sample Complexity Analysis
4. Conclusion

SLIDE 45

Conclusion

Learning overcomplete latent variable models.

⋆ Method-of-moments. ⋆ Tensor power iteration.

Robustness to noise. Sample complexity bounds for a range of LVMs.

⋆ Unsupervised setting. ⋆ Semi-supervised setting.

SLIDE 46

Conclusion

Learning overcomplete latent variable models.

⋆ Method-of-moments. ⋆ Tensor power iteration.

Robustness to noise. Sample complexity bounds for a range of LVMs.

⋆ Unsupervised setting. ⋆ Semi-supervised setting.

Latest result: improved initialization for tensors with Gaussian components.

SLIDE 47

Conclusion

Learning overcomplete latent variable models.

⋆ Method-of-moments. ⋆ Tensor power iteration.

Robustness to noise. Sample complexity bounds for a range of LVMs.

⋆ Unsupervised setting. ⋆ Semi-supervised setting.

Latest result: improved initialization for tensors with Gaussian components.

Thank you!