Latent Variable Models (CS3750, Xiaoting Li)
  1. Outline
     • Latent Variable Models
     • Expectation Maximization Algorithm (EM)
     • Factor Analysis
     • Probabilistic Principal Component Analysis
       • Model Formulation
       • Maximum Likelihood for PPCA
       • EM for PPCA
       • Examples
     • Sensible Principal Component Analysis
       • Model Formulation
       • EM for SPCA
     • References

  2. Latent Variable Models: Motivation
     (two figure-only slides; the images are not recoverable from the extraction)

  3. Latent Variable Models: Motivation
     • Gaussian mixture models
       • A single Gaussian is not a good fit to the data
       • But two different Gaussians may do
       • The true class of each point is unobservable
     Latent Variable Models
     • A latent variable model is a probability distribution p(s, x; θ) over two sets of variables s, x, where the x variables are observed at learning time in a dataset D and the s variables are never observed

  4. Latent Variable Models
     • The goal of a latent variable model is to express the distribution p(x) of the variables x = (x_1, ..., x_d) in terms of a smaller number of latent variables s = (s_1, ..., s_q), where q < d
     • Latent variable: s, q-dimensional; observed variable: x, d-dimensional
     Expectation-Maximization (EM) Algorithm
     • EM is a hugely important and widely used algorithm for learning directed latent-variable graphical models
     • The key idea: compute the parameter estimates iteratively by alternating two steps
       1. Expectation step: for all hidden and missing variables (and their possible value assignments), calculate their expectations under the current parameter set Θ'
       2. Maximization step: compute new estimates of Θ from the expectations of the different value completions
     • Stop when no further improvement is possible (a concrete sketch follows below)
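To make the two steps concrete, here is a minimal EM sketch for the motivating example above, a two-component 1-D Gaussian mixture. This is an illustrative sketch, not code from the slides; all names and the initialization scheme are assumptions.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a 2-component 1-D Gaussian mixture (illustrative sketch)."""
    # Initialize Theta = (pi, mu_1, mu_2, var_1, var_2); quartiles are an arbitrary choice
    pi, mu = 0.5, np.percentile(x, [25, 75])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: expected responsibility of component 1 for each point, under current Theta
        p1 = pi * np.exp(-(x - mu[0])**2 / (2 * var[0])) / np.sqrt(2 * np.pi * var[0])
        p2 = (1 - pi) * np.exp(-(x - mu[1])**2 / (2 * var[1])) / np.sqrt(2 * np.pi * var[1])
        r = p1 / (p1 + p2)
        # M-step: re-estimate Theta from the expected (soft) value completions
        pi = r.mean()
        mu = np.array([np.sum(r * x) / r.sum(), np.sum((1 - r) * x) / (1 - r).sum()])
        var = np.array([np.sum(r * (x - mu[0])**2) / r.sum(),
                        np.sum((1 - r) * (x - mu[1])**2) / (1 - r).sum()])
    return pi, mu, var
```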

  5. Factor Analysis
     • Assumptions:
       • The underlying latent variable has a Gaussian distribution: s ~ N(0, I), independent, with unit variance
       • Linear relationship between latent and observed variables
       • Diagonal Gaussian noise in the data dimensions: ε ~ N(0, Ψ)
     Factor Analysis
     • A common latent variable model in which the relationship is linear: x = Ws + μ + ε
       • d-dimensional observation vector x
       • q-dimensional vector of latent variables s
       • d × q matrix W relates the two sets of variables, q < d
       • μ permits the model to have a non-zero mean
       • s ~ N(0, I), independent, Gaussian with unit variance
       • ε ~ N(0, Ψ), Gaussian noise
     • Then x ~ N(μ, WW^T + Ψ) (a sampling sketch follows below)
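As a concrete reading of x = Ws + μ + ε, this sketch samples from the factor analysis model and checks the implied marginal covariance WW^T + Ψ. The dimensions and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, n = 4, 2, 10000                          # observed dim, latent dim, samples (q < d)
W = rng.normal(size=(d, q))                    # d x q weight (factor loading) matrix
mu = rng.normal(size=d)                        # location parameter
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))   # diagonal noise covariance

s = rng.normal(size=(n, q))                                 # s ~ N(0, I)
eps = rng.multivariate_normal(np.zeros(d), Psi, size=n)     # eps ~ N(0, Psi)
x = s @ W.T + mu + eps                                      # x = Ws + mu + eps

# The sample covariance should approach W W^T + Psi as n grows
print(np.abs(np.cov(x.T) - (W @ W.T + Psi)).max())  # small deviation for large n
```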

  6. Factor Analysis
     • Latent variable: s, q-dimensional, s ~ N(0, I); observed variable: x, d-dimensional
     • Remapping: x = Ws (weight matrix W) + μ (location parameter) + ε, with ε ~ N(0, Ψ) Gaussian noise
     • x ~ N(μ, WW^T + Ψ)
     • Parameters of interest: W (weight matrix), Ψ (variance of the noise), μ
     Factor Analysis: Optimization
     • Use EM to solve for the parameters
       • E-step: compute the posterior p(s|x) (closed form sketched below)
       • M-step: take derivatives of the expected complete log-likelihood with respect to the parameters
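The E-step posterior has a closed form by standard Gaussian conditioning: p(s|x) = N(m, G) with G = (I + W^T Ψ⁻¹ W)⁻¹ and m = G W^T Ψ⁻¹ (x − μ). The slides do not spell this out, so the following is a sketch under that standard result.

```python
import numpy as np

def fa_posterior(x, W, mu, Psi_diag):
    """E-step posterior p(s|x) = N(m, G) for factor analysis.
    Shapes: x (d,), W (d, q), mu (d,), Psi_diag (d,) holding diag(Psi)."""
    q = W.shape[1]
    WtPinv = W.T / Psi_diag                       # W^T Psi^{-1}, exploiting diagonal Psi
    G = np.linalg.inv(np.eye(q) + WtPinv @ W)     # posterior covariance
    m = G @ WtPinv @ (x - mu)                     # posterior mean
    return m, G
```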

  7. Principal Component Analysis
     • General motivation: transform the data into a reduced-dimensionality representation
     • A linear transformation of a d-dimensional input x to a q-dimensional vector s (q < d) under which the retained variance is maximal
     • Limitations:
       • No associated probabilistic model for the observed data
       • Computationally intensive for the covariance matrix
       • Does not deal properly with missing data
     Probabilistic PCA
     • Motivations:
       • The corresponding likelihood measure permits comparison with other density-estimation techniques and facilitates statistical testing
       • Provides a natural framework for thinking about hypothesis testing
       • Offers the potential to extend the scope of conventional PCA
       • Can be utilized as a constrained Gaussian density model (constrained covariance)
       • Allows us to deal with missing values in the data set
       • Can be used to model class-conditional densities, and hence can be applied to classification problems

  8. Generative View of PPCA
     • Generative view of PPCA for a 2-d data space and a 1-d latent space (figure; image not recoverable from the extraction)
     PPCA
     • Assumptions:
       • The underlying q-dimensional latent variable s has a Gaussian distribution
       • Linear relationship between the q-dimensional latent s and the d-dimensional observed x
       • Isotropic Gaussian noise in the observed dimensions: the noise variances are constrained to be equal

  9. PPCA
     • A special case of factor analysis with the noise variances constrained to be equal: ε ~ N(0, σ²I)
     • The conditional distribution over x-space given s: x|s ~ N(Ws + μ, σ²I)
     • Latent variables: s ~ N(0, I)
     • The observed data x is obtained by integrating out the latent variables: x ~ N(μ, C)
       • E[x] = E[μ + Ws + ε] = μ + W E[s] + E[ε] = μ + W·0 + 0 = μ
       • C = Cov[x] = E[(μ + Ws + ε − μ)(μ + Ws + ε − μ)^T] = E[(Ws + ε)(Ws + ε)^T] = WW^T + σ²I (the observation covariance model)
     • The maximum likelihood estimator for μ is given by the mean of the data; S is the sample covariance matrix of the observations {x_n}
     • Estimates for W and σ² can be solved in two ways: closed form, or the EM algorithm (the E-step posterior is sketched below)
     PPCA
     • Latent variable: s, q-dimensional, s ~ N(0, I); observed variable: x, d-dimensional
     • Remapping: x = Ws (weight matrix W) + μ (location parameter) + random error (noise) ε ~ N(0, σ²I)
     • x ~ N(μ, WW^T + σ²I)
     • Parameters of interest: W (weight matrix), σ² (variance of the noise), μ
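For the EM route, the PPCA posterior needed in the E-step is the standard result p(s|x) = N(M⁻¹ W^T (x − μ), σ² M⁻¹) with M = W^T W + σ²I (a q × q matrix). The slides do not give it explicitly, so this is a sketch under that result.

```python
import numpy as np

def ppca_posterior(x, W, mu, sigma2):
    """Posterior p(s|x) = N(M^{-1} W^T (x - mu), sigma2 * M^{-1}),
    where M = W^T W + sigma2 * I. Shapes: x (d,), W (d, q), mu (d,)."""
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)   # q x q, cheap to invert since q < d
    Minv = np.linalg.inv(M)
    mean = Minv @ W.T @ (x - mu)       # posterior mean of s given x
    cov = sigma2 * Minv                # posterior covariance
    return mean, cov
```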

  10. Factor Analysis vs. PPCA
     • PPCA: x ~ N(μ, WW^T + σ²I), isotropic error
     • Factor Analysis: x ~ N(μ, WW^T + Ψ), where the error covariance Ψ is a diagonal matrix
       • FA doesn't change if you scale the variables
       • FA looks for directions of large correlation in the data
       • FA doesn't chase large-noise features that are uncorrelated with the other features
       • FA changes if you rotate the data
       • Multiple factors can't be interpreted as being unique
     Maximum Likelihood for PPCA
     • The log-likelihood of the observed data under this model is
       ℒ = Σ_{n=1}^{N} ln p(x_n) = −(N/2) [ d ln(2π) + ln|C| + tr(C⁻¹S) ]
     • where S is the sample covariance matrix of the observations {x_n}:
       S = (1/N) Σ_{n=1}^{N} (x_n − μ)(x_n − μ)^T
     • and C = WW^T + σ²I
     • The log-likelihood is maximized when the columns of W span the principal subspace of the data
     • Fit the parameters (W, μ, σ²) by maximum likelihood: make the constrained model covariance as close as possible to the observed covariance (an evaluation sketch follows below)
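A direct transcription of the log-likelihood above into code, useful for checking any fitted (W, μ, σ²); the function name and data layout are assumptions.

```python
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma2):
    """L = -(N/2) * (d*ln(2*pi) + ln|C| + tr(C^{-1} S)) for data X of shape (N, d)."""
    N, d = X.shape
    C = W @ W.T + sigma2 * np.eye(d)          # model covariance
    Xc = X - mu
    S = Xc.T @ Xc / N                         # sample covariance of the observations
    _, logdetC = np.linalg.slogdet(C)         # numerically stable ln|C|
    return -N / 2 * (d * np.log(2 * np.pi) + logdetC + np.trace(np.linalg.solve(C, S)))
```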

  11. Maximum Likelihood for PPCA
     • Consider the derivative of the log-likelihood with respect to W:
       ∂ℒ/∂W = N(C⁻¹SC⁻¹W − C⁻¹W)
     • Maximizing with respect to W gives
       W_ML = U_q (Λ_q − σ²I)^{1/2} R
     • where
       • the q column vectors in U_q are eigenvectors of S, with corresponding eigenvalues in the diagonal matrix Λ_q
       • R is an arbitrary q × q orthogonal rotation matrix
     • For W = W_ML, the maximum-likelihood estimator for σ² is
       σ²_ML = (1 / (d − q)) Σ_{j=q+1}^{d} λ_j
     • i.e., the average variance associated with the discarded dimensions
     Maximum Likelihood for PPCA
     • Again consider the derivative ∂ℒ/∂W = N(C⁻¹SC⁻¹W − C⁻¹W)
     • At the stationary points, SC⁻¹W = W, assuming that C⁻¹ exists
     • Three possible classes of solutions:
       • W = 0: a minimum of the log-likelihood
       • C = S: the covariance model is exact; WW^T = S − σ²I has a known solution W = U(Λ − σ²I)^{1/2}R, where U is a square matrix whose columns are the eigenvectors of S, Λ is the corresponding diagonal matrix of eigenvalues, and R is an arbitrary orthogonal matrix
       • SC⁻¹W = W, with W ≠ 0 and C ≠ S (a closed-form fitting sketch follows below)
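Putting the closed-form solution together: a sketch that computes W_ML and σ²_ML from the eigendecomposition of S, taking the arbitrary rotation R = I. The function name is an assumption, and the sketch assumes the top q eigenvalues exceed σ²_ML.

```python
import numpy as np

def ppca_ml_closed_form(X, q):
    """Closed-form ML fit: sigma2 = average of the d - q discarded eigenvalues,
    W = U_q (Lambda_q - sigma2 * I)^{1/2}, with R = I. X has shape (N, d)."""
    N, d = X.shape
    mu = X.mean(axis=0)                      # ML estimate of mu is the data mean
    S = (X - mu).T @ (X - mu) / N            # sample covariance
    lam, U = np.linalg.eigh(S)               # eigh returns ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]           # reorder to descending
    sigma2 = lam[q:].mean()                  # average variance of discarded dimensions
    W = U[:, :q] @ np.diag(np.sqrt(lam[:q] - sigma2))  # assumes lam[:q] > sigma2
    return W, mu, sigma2
```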
