
Latent Variable Models

CS3750 Xiaoting Li


Outline

  • Latent Variable Models
  • Expectation Maximization Algorithm (EM)
  • Factor Analysis
  • Probabilistic Principal Component Analysis
    • Model Formulation
    • Maximum Likelihood for PPCA
    • EM for PPCA
    • Examples
  • Sensible Principal Component Analysis
    • Model Formulation
    • EM for SPCA
  • References




Latent Variable Models: Motivation

  • Gaussian mixture models
  • A single Gaussian is not a good fit to data
  • But two different Gaussians may do
  • True class of each point is unobservable


Latent Variable Models

A latent variable model is a probability distribution p over two sets of variables s, x: p(x, s; θ), where the x variables are observed at learning time in a dataset D and the s variables are never observed


Latent Variable Models

  • The goal of a latent variable model is to express the distribution p(x) of the variables x_1, …, x_d in terms of a smaller number of latent variables s = (s_1, …, s_q), where q < d

[Graphical model: latent nodes s_1, s_2, s_3 with directed edges to observed nodes x_1, …, x_4]

Latent variable: s, q dimensions, q < d. Observed variable: x, d dimensions.


Expectation-Maximization (EM) Algorithm

  • The EM algorithm is a hugely important and widely used algorithm for learning directed latent-variable graphical models
  • The key idea of the method: compute the parameter estimates iteratively by performing the following two steps:
  • 1. Expectation step. For all hidden and missing variables (and their possible value assignments), calculate their expectations under the current set of parameters Θ'
  • 2. Maximization step. Compute the new estimates of Θ by considering the expectations of the different value completions

  • Stop when no improvement possible
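As a concrete illustration of the two steps above, here is a minimal EM sketch for the two-component Gaussian mixture from the motivation slide; the data, initialization, and iteration count are illustrative assumptions, not part of the slides.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Initial parameter guess Theta' = (mixing weights, means, variances)
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: expected ("soft") assignment of each point to each component
        dens = np.stack([w[k] / np.sqrt(2 * np.pi * var[k])
                         * np.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: new estimates of Theta from the expected value completions
        nk = resp.sum(axis=1)
        w = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        var = (resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk
    return w, mu, var

# Hypothetical data: two overlapping clusters whose true labels are unobserved
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])
print(em_gmm_1d(x))
```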


Factor Analysis

  • Assumptions:
  • Underlying latent variable has a Gaussian distribution
  • s ~ N(0, I): independent, Gaussian with unit variance
  • Linear relationship between latent and observed variables
  • Diagonal Gaussian noise in the data dimensions
  • ε ~ N(0, Ψ): Gaussian noise


Factor Analysis

  • A common latent variable model where the relationship is linear:

  x = Ws + μ + ε

  • d-dimensional observation vector x
  • q-dimensional vector of latent variables s
  • d × q matrix W relates the two sets of variables, q < d
  • μ permits the model to have a non-zero mean
  • s ~ N(0, I): independent, Gaussian with unit variance
  • ε ~ N(0, Ψ): Gaussian noise
  • Then x ~ N(μ, WWᵀ + Ψ)
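A minimal sketch (with hypothetical dimensions and parameters) of the generative process x = Ws + μ + ε above; the empirical covariance of the samples should approach WWᵀ + Ψ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, n = 5, 2, 100_000                       # hypothetical dimensions and sample size
W = rng.normal(size=(d, q))                   # d x q weight (factor loading) matrix
mu = rng.normal(size=d)                       # location parameter
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))  # diagonal noise covariance

s = rng.normal(size=(n, q))                                 # s ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(d), Psi, size=n)     # eps ~ N(0, Psi)
x = s @ W.T + mu + eps                                      # x = W s + mu + eps

# Empirical covariance of x should be close to W W^T + Psi
print(np.allclose(np.cov(x, rowvar=False), W @ W.T + Psi, atol=0.05))
```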


Factor Analysis

[Graphical model: latent nodes s_1, s_2, s_3 with directed edges to observed nodes x_1, …, x_4]

Latent variable: s, q dimensions. Observed variable: x, d dimensions.
s ~ N(0, I)
Mapping: Ws (weight matrix W), μ (location parameter)
ε ~ N(0, Ψ): Gaussian noise
x = Ws + μ + ε
x ~ N(μ, WWᵀ + Ψ)

Parameters of interest: W (weight matrix), Ψ (variance of noise), μ


Factor Analysis: Optimization

  • Use EM to solve for the parameters
  • E-step:
  • compute the posterior p(s|x)
  • M-step:
  • take derivatives of the expected complete log-likelihood with respect to the parameters


Principal Component Analysis

  • General motivation is to transform the data into some reduced-dimensionality representation
  • Linear transformation of a d-dimensional input x to a q-dimensional vector s, such that q < d, under which the retained variance is maximal
  • Limitations:
  • Absence of an associated probabilistic model for the observed data
  • Computationally intensive for the covariance matrix
  • Does not deal properly with missing data


Probabilistic PCA

  • Motivations:
  • The corresponding likelihood measure would permit comparison with other density-estimation techniques and would facilitate statistical testing
  • Provides a natural framework for thinking about hypothesis testing
  • Offers the potential to extend the scope of conventional PCA
  • Can be utilized as a constrained Gaussian density model
  • Constrained covariance
  • Allows us to deal with missing values in the data set
  • Can be used to model class-conditional densities, and hence can be applied to classification problems


Generative View of PPCA

  • Generative view of the PPCA model for a 2-d data space and a 1-d latent space

[Figure: samples of the 1-d latent variable s mapped through W into the 2-d data space]

PPCA

  • Assumptions:
  • Underlying q-dimensional latent variable s has a Gaussian distribution
  • Linear relationship between the q-dimensional latent s and the d-dimensional observed variables x
  • Isotropic Gaussian noise in the observed dimensions
  • Noise variances constrained to be equal


PPCA

  • A special case of factor analysis, with the noise variances constrained to be equal:
  • ε ~ N(0, σ²I)
  • The conditional probability distribution over x-space given s:
  • x|s ~ N(Ws + μ, σ²I)
  • Latent variables:
  • s ~ N(0, I)
  • The distribution of the observed data x is obtained by integrating out the latent variables:
  • x ~ N(μ, C)
  • E[x] = E[μ + Ws + ε] = μ + W E[s] + E[ε] = μ + W·0 + 0 = μ
  • C = WWᵀ + σ²I (the observation covariance model)
  • C = Cov[x] = E[(μ + Ws + ε − μ)(μ + Ws + ε − μ)ᵀ] = E[(Ws + ε)(Ws + ε)ᵀ] = WWᵀ + σ²I
  • The maximum-likelihood estimator for μ is given by the mean of the data; S is the sample covariance matrix of the observations {x_n}
  • Estimates for W and σ² can be obtained in two ways:
  • Closed form
  • EM algorithm


PPCA

[Graphical model: latent nodes s_1, s_2, s_3 with directed edges to observed nodes x_1, …, x_4]

Latent variable: s, q dimensions. Observed variable: x, d dimensions.
s ~ N(0, I)
Mapping: Ws (weight matrix W), μ (location parameter)
Random error (noise): ε ~ N(0, σ²I)
x = Ws + μ + ε
x ~ N(μ, WWᵀ + σ²I)

Parameters of interest: W (weight matrix), σ² (variance of noise), μ


Factor Analysis vs. PPCA

  • PPCA
  • x ~ N(μ, WWᵀ + σ²I)
  • Isotropic error
  • Factor Analysis
  • x ~ N(μ, WWᵀ + Ψ)
  • The error covariance is a diagonal matrix
  • FA doesn't change if you scale the variables
  • FA looks for directions of large correlation in the data
  • FA doesn't chase large-noise features that are uncorrelated with other features
  • FA changes if you rotate the data
  • Can't interpret multiple factors as being unique


Maximum Likelihood for PPCA

  • The log-likelihood of the observed data under this model is given by

  ℒ = Σ_{n=1}^{N} ln p(x_n) = −(Nd/2) ln(2π) − (N/2) ln|C| − (N/2) tr(C⁻¹S)

  • where S is the sample covariance matrix of the observations {x_n}:

  S = (1/N) Σ_{n=1}^{N} (x_n − μ)(x_n − μ)ᵀ

  • C = WWᵀ + σ²I
  • The log-likelihood is maximized when the columns of W span the principal subspace of the data
  • Fit the parameters (W, μ, σ²) by maximum likelihood: make the constrained model covariance as close as possible to the observed covariance
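A small sketch of evaluating this log-likelihood for given parameters (W, μ, σ²); the function name and data shapes are assumptions for illustration.

```python
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma2):
    """L = -N/2 * (d*ln(2*pi) + ln|C| + tr(C^-1 S)) with C = W W^T + sigma^2 I; X is N x d."""
    N, d = X.shape
    C = W @ W.T + sigma2 * np.eye(d)
    Xc = X - mu
    S = Xc.T @ Xc / N                       # sample covariance of the observations
    _, logdet = np.linalg.slogdet(C)
    return -N / 2 * (d * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(C, S)))
```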


Maximum Likelihood for PPCA

  • Consider the derivative of the log-likelihood with respect to W:

  ∂ℒ/∂W = N(C⁻¹SC⁻¹W − C⁻¹W)

  • Maximizing with respect to W gives

  W_ML = U_q(Λ_q − σ²I)^{1/2}R

  • where
  • the q column vectors in U_q are eigenvectors of S, with corresponding eigenvalues in the diagonal matrix Λ_q
  • R is an arbitrary q × q orthogonal rotation matrix
  • For W = W_ML, the maximum-likelihood estimator for σ² is given by

  σ²_ML = (1/(d − q)) Σ_{j=q+1}^{d} λ_j

  • the average variance associated with the discarded dimensions
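A minimal sketch of this closed-form ML fit: eigendecompose the sample covariance S, keep the top q eigenvectors, and form W_ML and σ²_ML as above (R is taken to be the identity); names and shapes are illustrative.

```python
import numpy as np

def ppca_closed_form(X, q):
    """Closed-form ML fit of PPCA; X is an N x d data matrix."""
    N, d = X.shape
    mu = X.mean(axis=0)                            # ML estimate of the mean
    S = np.cov(X, rowvar=False, bias=True)         # sample covariance S (1/N normalisation)
    eigvals, eigvecs = np.linalg.eigh(S)           # ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    sigma2 = eigvals[q:].mean()                    # average variance of the discarded dimensions
    W = eigvecs[:, :q] @ np.diag(np.sqrt(eigvals[:q] - sigma2))  # W_ML with R = I
    return W, mu, sigma2
```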

Maximum Likelihood for PPCA

  • Consider the derivative with respect to W:

  ∂ℒ/∂W = N(C⁻¹SC⁻¹W − C⁻¹W)

  • At the stationary points, SC⁻¹W = W, assuming that C⁻¹ exists
  • Three possible classes of solutions:
  • W = 0: a minimum of the log-likelihood
  • C = S:
  • the covariance model is exact
  • WWᵀ = S − σ²I has a known solution W = U(Λ − σ²I)^{1/2}R, where U is a square matrix whose columns are the eigenvectors of S, Λ is the corresponding diagonal matrix of eigenvalues, and R is an arbitrary orthogonal matrix
  • SC⁻¹W = W, with W ≠ 0 and C ≠ S


Maximum Likelihood for PPCA

  • Consider the derivative with respect to W:

  ∂ℒ/∂W = N(C⁻¹SC⁻¹W − C⁻¹W)

  • At the stationary points, SC⁻¹W = W, assuming that C⁻¹ exists
  • Case: SC⁻¹W = W, with W ≠ 0 and C ≠ S
  • Express the parameter matrix W in terms of its singular value decomposition (SVD):
  • W = ULVᵀ, where U: d × q orthonormal vectors, L: q × q matrix of singular values, V: q × q orthogonal matrix
  • C⁻¹W = W(σ²I + WᵀW)⁻¹ = UL(σ²I + L²)⁻¹Vᵀ
  • At the stationary points:
  • SUL(σ²I + L²)⁻¹Vᵀ = ULVᵀ
  • SUL = U(σ²I + L²)L


Maximum Likelihood for PPCA

  • The column vectors u_j of U are eigenvectors of S, with eigenvalues λ_j such that σ² + l_j² = λ_j:
  • Su_j = (σ² + l_j²)u_j
  • l_j = (λ_j − σ²)^{1/2}
  • Substituting back into the SVD: W = U_q(Λ_q − σ²I)^{1/2}R
  • U_q: d × q matrix whose q columns are eigenvectors u_j of S
  • Λ_q: q × q diagonal matrix with elements λ_1, …, λ_q (the eigenvalues corresponding to the u_j), or σ² (equivalent to l_j = 0)
  • R: arbitrary orthogonal matrix, equivalent to a rotation in the principal subspace (or a re-parametrization)


EM for PPCA

  • Goal: estimate the model parameters W and σ², based on the observed dataset
  • Rather than solving directly, we can apply EM
  • EM can be scaled to very large high-dimensional datasets
  • Consider the latent variables s_n to be 'missing' data
  • Need the complete-data log-likelihood:

  ℒ_C = Σ_{n=1}^{N} ln p(x_n, s_n)

  • Since x|s ~ N(Ws + μ, σ²I) and s ~ N(0, I), we have

  p(x_n, s_n) = (2πσ²)^{−d/2} exp(−‖x_n − Ws_n − μ‖² / (2σ²)) · (2π)^{−q/2} exp(−‖s_n‖² / 2)

EM for PPCA

  • E-step:
  • Compute the expectation of the complete-data log-likelihood with respect to the posterior of the latent variables
  • Take the expectation of ℒ_C with respect to the distributions p(s_n | x_n, W, σ²):

  ⟨ℒ_C⟩ = −Σ_{n=1}^{N} { (d/2) ln σ² + (1/2) tr(⟨s_n s_nᵀ⟩) + (1/(2σ²)) (x_n − μ)ᵀ(x_n − μ) − (1/σ²) ⟨s_n⟩ᵀWᵀ(x_n − μ) + (1/(2σ²)) tr(WᵀW⟨s_n s_nᵀ⟩) }


PPCA Examples

  • Missing data
  • A natural approach to the estimation of the principal axes in cases where some, or indeed all, of the data vectors exhibit one or more missing (at random) values
  • Fig. 1(a): projection of 38 examples from the 18-dimensional Tobamovirus data (Ripley 1996) using standard PCA
  • Fig. 1(b): an equivalent PPCA projection obtained by using an EM algorithm
  • Missing data simulated by randomly removing each value in the data set with probability 20%


PPCA Examples

  • Mixtures of probabilistic principal component analysis models
  • Combining multiple PCA models, notably for image compression
  • Fig. 2: three PCA projections of the virus data obtained from a three-component mixture model, optimized using an EM algorithm
  • Effectively implements simultaneous automated clustering and visualization of the data


PPCA Examples

  • Controlling the degrees of freedom
  • Applied as a covariance model of the data
  • Permits control of the model complexity through the choice of q
  • The covariance model in PPCA comprises dq + 1 − q(q − 1)/2 free parameters
  • Table 1: estimated prediction error for various Gaussian models fitted to the Tobamovirus data
  • PPCA with q = 2 gives the lowest error


Sensible Principal Component Analysis (SPCA)

  • SPCA
  • x = Ws + ε
  • x ~ N(0, WWᵀ + σ²I)
  • Similar to PCA; the differences are:
  • The noise covariance matrix is required to be a multiple σ²I of the identity matrix, but we do not take the limit σ² → 0
  • During EM iterations, data can be directly generated from the SPCA model, and the likelihood estimated on a test data set
  • The likelihood is much lower for data far away from the training set, even if they are near the principal subspace


EM for SPCA

  • SPCA
  • x ~ N(0, WWᵀ + σ²I)
  • E-step:
  • β = Wᵀ(WWᵀ + σ²I)⁻¹
  • ⟨s_n | x_n⟩ = β(X − μ)
  • Σ_s = nI − nβW + ⟨s_n | x_n⟩⟨s_n | x_n⟩ᵀ
  • The log-likelihood is written in terms of the weight matrix W, the centered observed data matrix X − μ, the noise covariance σ²I, and the conditional latent mean ⟨s_n | x_n⟩

EM for SPCA

  • SPCA
  • x ~ N(0, WWᵀ + σ²I)
  • M-step:
  • W_new = (X − μ)⟨s_n | x_n⟩ᵀ Σ_s⁻¹
  • σ²_new = trace[XXᵀ − W_new⟨s_n | x_n⟩(X − μ)ᵀ] / (nd)
  • Obtained by differentiating the log-likelihood with respect to W and σ² and setting the derivatives to zero
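A minimal sketch of these E- and M-step updates in data-matrix form (one column per example), assuming the data are centred inside the function; the initialization and iteration count are arbitrary choices, not prescribed by the slides.

```python
import numpy as np

def em_spca(X, q, n_iter=100, seed=0):
    """EM for SPCA; X is a d x n data matrix (one column per example)."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                                   # centred data (X - mu)
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, q))                   # arbitrary initialization
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: beta = W^T (W W^T + sigma^2 I)^-1, <S> = beta (X - mu)
        beta = np.linalg.solve(W @ W.T + sigma2 * np.eye(d), W).T
        Es = beta @ Xc                            # q x n matrix of conditional latent means
        Sigma_s = n * np.eye(q) - n * beta @ W + Es @ Es.T
        # M-step: W_new and sigma^2_new
        W = Xc @ Es.T @ np.linalg.inv(Sigma_s)
        sigma2 = np.trace(Xc @ Xc.T - W @ Es @ Xc.T) / (n * d)
    return W, mu, sigma2
```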


EM for SPCA

  • Since σ²I is diagonal, the inversion in the E-step can be performed efficiently using the matrix inversion lemma:

  (WWᵀ + σ²I)⁻¹ = I/σ² − W(I + WᵀW/σ²)⁻¹Wᵀ/(σ²)²

  • Since we only take the trace of the matrix in the M-step, we do not need to compute the full sample covariance XXᵀ, but instead can compute only the variance along each coordinate:

  σ²_new = trace[XXᵀ − W_new⟨s_n | x_n⟩(X − μ)ᵀ] / (nd)

  • This shows that learning for SPCA has a complexity limited by O(dnq), and not worse
  • Methods that explicitly compute the sample covariance matrix have complexity O(nd²)
  • The EM algorithm does not require computation of the sample covariance matrix: O(dnq)
  • A huge advantage when q << d (the number of principal components is much smaller than the original number of variables)
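A quick numerical check of the inversion identity above, with hypothetical sizes; the lemma form only requires a q × q inverse rather than a d × d one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, sigma2 = 50, 3, 0.5                 # hypothetical sizes
W = rng.normal(size=(d, q))

direct = np.linalg.inv(W @ W.T + sigma2 * np.eye(d))    # d x d inversion
lemma = (np.eye(d) / sigma2
         - W @ np.linalg.inv(np.eye(q) + W.T @ W / sigma2) @ W.T / sigma2 ** 2)  # only a q x q inverse
print(np.allclose(direct, lemma))         # True
```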


Software

  • Matlab
  • https://www.mathworks.com/help/stats/ppca.html
  • R Programming
  • https://www.rdocumentation.org/packages/pcaMethods/versions/1.64.0/topics/ppca


References

  • Roweis, Sam T. "EM algorithms for PCA and SPCA." Advances in Neural Information Processing Systems. 1998.
  • Tipping, Michael E., and Christopher M. Bishop. "Probabilistic principal component analysis." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.3 (1999): 611-622.
  • Bishop, Christopher M. "Latent variable models." Learning in Graphical Models. Springer, Dordrecht, 1998. 371-403.
  • https://www.cs.toronto.edu/~hinton/csc2515/notes/lec7middle.pdf
  • https://www.cs.toronto.edu/~rsalakhu/STA4273_2015/notes/Lecture8_2015.pdf
  • https://www.cs.ubc.ca/~schmidtm/Courses/540-W16/L12.pdf
  • https://www.seas.upenn.edu/~cis520/lectures/PCA_PLS_CCA.pdf
  • http://people.cs.pitt.edu/~milos/courses/cs3750/lectures/class8.pdf
  • http://people.cs.pitt.edu/~milos/courses/cs2750-Spring2019/Lectures/Class19.pdf
  • https://ermongroup.github.io/cs228-notes/learning/latent/
  • https://people.cs.pitt.edu/~milos/courses/cs3750-Fall2007/lectures/class17.pdf
  • https://people.cs.pitt.edu/~milos/courses/cs3750-Fall2014/lectures/class13.pdf
  • https://liorpachter.wordpress.com/tag/ppca/
