Probabilistic Graphical Models

10-708

Factor Analysis and State Space Models

Eric Xing

Lecture 15, Nov 2, 2005. Reading: MJ Chap. 13, 14, 15

A road map to more complex dynamic models

[Figure: a roadmap of latent-variable models, organized by discrete vs. continuous latent and observed variables. Single time slice: mixture model (e.g., mixture of multinomials), mixture model (e.g., mixture of Gaussians), factor analysis. Sequential versions: HMM (for discrete sequential data, e.g., text), HMM (for continuous sequential data, e.g., speech signal), state space model. With multiple hidden chains: factorial HMM, switching SSM.]


Review: A primer to multivariate Gaussian

Multivariate Gaussian density:

    p(x | µ, Σ) = (2π)^{-n/2} |Σ|^{-1/2} exp{ -(1/2) (x − µ)^T Σ^{-1} (x − µ) }

A joint Gaussian over a partitioned vector:

    p(x_1, x_2 | µ, Σ) = N( [x_1; x_2] ; [µ_1; µ_2], [Σ_11  Σ_12; Σ_21  Σ_22] )

How do we write down p(x_1), p(x_1|x_2) or p(x_2|x_1) using the block elements in µ and Σ?

  • Formulas to remember:

    Conditional:  p(x_1 | x_2) = N(x_1; m_{1|2}, V_{1|2}), where
                  m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (x_2 − µ_2)
                  V_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21

    Marginal:     p(x_2) = N(x_2; m_2, V_2), where  m_2 = µ_2,  V_2 = Σ_22
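As a quick sanity check on these block formulas, here is a minimal NumPy sketch (not from the original slides; the function and variable names are illustrative) that conditions a joint Gaussian on one block:

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of p(x1 | x2) for a joint Gaussian N(mu, Sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    gain = S12 @ np.linalg.inv(S22)          # Sigma_12 Sigma_22^{-1}
    m = mu1 + gain @ (x2 - mu2)              # conditional mean
    V = S11 - gain @ S12.T                   # conditional covariance
    return m, V

# Example: 2-D joint, condition the first coordinate on the second.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
m, V = condition_gaussian(mu, Sigma, np.array([0]), np.array([1]), np.array([2.0]))
```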

Review: The matrix inverse lemma

Consider a block-partitioned matrix:

    M = [ E  F
          G  H ]

First we diagonalize M:

    [ I  −F H^{-1} ] [ E  F ] [ I          0 ]   [ E − F H^{-1} G   0 ]
    [ 0   I        ] [ G  H ] [ −H^{-1} G  I ] = [ 0                H ]

  • Schur complement:  M/H = E − F H^{-1} G

Then we invert, using the formula (XYZ)^{-1} = Z^{-1} Y^{-1} X^{-1}:

    M^{-1} = [ I          0 ] [ (M/H)^{-1}  0      ] [ I  −F H^{-1} ]
             [ −H^{-1} G  I ] [ 0           H^{-1} ] [ 0   I        ]

           = [ (M/H)^{-1}               −(M/H)^{-1} F H^{-1}                       ]
             [ −H^{-1} G (M/H)^{-1}      H^{-1} + H^{-1} G (M/H)^{-1} F H^{-1}     ]

Writing the same inverse in terms of the other Schur complement, M/E = H − G E^{-1} F, and equating the corresponding blocks gives the matrix inverse lemma:

    (E − F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (H − G E^{-1} F)^{-1} G E^{-1}
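A small numerical check of the lemma (illustrative only; the block sizes and values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((3, 3)) + 4 * np.eye(3)   # keep the blocks well conditioned
F = rng.standard_normal((3, 2))
G = rng.standard_normal((2, 3))
H = rng.standard_normal((2, 2)) + 4 * np.eye(2)

Ei, Hi = np.linalg.inv(E), np.linalg.inv(H)
lhs = np.linalg.inv(E - F @ Hi @ G)
rhs = Ei + Ei @ F @ np.linalg.inv(H - G @ Ei @ F) @ G @ Ei
assert np.allclose(lhs, rhs)   # matrix inverse lemma holds numerically
```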


Review: Some matrix algebra

Trace and derivatives

  • tr[A] ≡ Σ_i a_ii
  • Cyclical permutations:  tr[ABC] = tr[CAB] = tr[BCA]
  • Derivatives:

        ∂ tr[BA] / ∂A = B^T
        ∂ tr[x^T A x] / ∂A = ∂ tr[A x x^T] / ∂A = x x^T

Determinants and derivatives

  • ∂ log|A| / ∂A = A^{-T}

Factor analysis

An unsupervised linear regression model:

    p(x) = N(x; 0, I)
    p(y | x) = N(y; µ + Λx, Ψ)

where Λ is called a factor loading matrix, and Ψ is diagonal.

Geometric interpretation

  • To generate data, first generate a point within the manifold, then add
    noise. The coordinates of the point are the components of the latent variable.
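A minimal NumPy sketch of this generative process (the dimensions and parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, N = 5, 2, 1000                      # observed dim, latent dim, sample size
mu = np.zeros(p)
Lam = rng.standard_normal((p, k))         # factor loading matrix Lambda
Psi = np.diag(rng.uniform(0.1, 0.5, p))   # diagonal noise covariance Psi

X = rng.standard_normal((N, k))                          # x ~ N(0, I)
noise = rng.multivariate_normal(np.zeros(p), Psi, size=N)
Y = mu + X @ Lam.T + noise                                # y | x ~ N(mu + Lambda x, Psi)

# Empirical covariance of y should approach Lambda Lambda^T + Psi.
print(np.round(np.cov(Y.T) - (Lam @ Lam.T + Psi), 2))
```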


Marginal data distribution

A marginal Gaussian (e.g., p(x)) times a conditional Gaussian (e.g., p(y|x)) is
a joint Gaussian.

Any marginal (e.g., p(y)) of a joint Gaussian (e.g., p(x,y)) is also a Gaussian.

  • Since the marginal is Gaussian, we can determine it by just computing
    its mean and variance. (Assume the noise W ~ N(0, Ψ) is uncorrelated with the data.)

    E[Y]   = E[µ + ΛX + W] = µ + Λ E[X] + E[W] = µ
    Var[Y] = E[(Y − µ)(Y − µ)^T]
           = E[(ΛX + W)(ΛX + W)^T]
           = Λ E[X X^T] Λ^T + E[W W^T]
           = ΛΛ^T + Ψ

FA = Constrained-Covariance Gaussian

Marginal density for factor analysis (y is p-dim, x is k-dim):

    p(y | θ) = N(y; µ, ΛΛ^T + Ψ)

So the effective covariance is the low-rank outer product of two long skinny
matrices plus a diagonal matrix.

In other words, factor analysis is just a constrained Gaussian model. (If Ψ
were not diagonal then we could model any Gaussian and it would be pointless.)


FA joint distribution

Model:

    p(x) = N(x; 0, I),   p(y | x) = N(y; µ + Λx, Ψ)

Covariance between x and y (assume the noise is uncorrelated with the data or
the latent variables):

    Cov[X, Y] = E[(X − 0)(Y − µ)^T] = E[X (µ + ΛX + W − µ)^T]
              = E[X X^T Λ^T + X W^T] = Λ^T

Hence the joint distribution of x and y:

    p( [x; y] ) = N( [x; y] ; [0; µ], [ I    Λ^T
                                        Λ    ΛΛ^T + Ψ ] )

Inference in Factor Analysis

Apply the Gaussian conditioning formulas to the joint distribution we derived
above, where

    Σ_11 = I,   Σ_12 = Σ_21^T = Λ^T,   Σ_22 = ΛΛ^T + Ψ

We can now derive the posterior of the latent variable x given observation y,
p(x | y) = N(x; m_{1|2}, V_{1|2}), where

    m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (y − µ_2) = Λ^T (ΛΛ^T + Ψ)^{-1} (y − µ)
    V_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21    = I − Λ^T (ΛΛ^T + Ψ)^{-1} Λ

Applying the matrix inversion lemma

    (E − F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (H − G E^{-1} F)^{-1} G E^{-1}

we get

    V_{1|2} = (I + Λ^T Ψ^{-1} Λ)^{-1}
    m_{1|2} = V_{1|2} Λ^T Ψ^{-1} (y − µ)

  • Here we only need to invert a matrix of size |x|×|x|, instead of |y|×|y|.
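A minimal sketch of this posterior computation (the function name is illustrative; it reuses Lam, Psi, mu from the sampling sketch above):

```python
import numpy as np

def fa_posterior(y, mu, Lam, Psi):
    """Posterior p(x | y) = N(m, V) for the factor analysis model."""
    k = Lam.shape[1]
    Psi_inv = np.linalg.inv(Psi)                            # Psi is diagonal, so this is cheap
    V = np.linalg.inv(np.eye(k) + Lam.T @ Psi_inv @ Lam)    # only a k x k inverse
    m = V @ Lam.T @ Psi_inv @ (y - mu)
    return m, V
```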


Geometric interpretation: inference is linear projection

The posterior is:

    p(x | y) = N(x; m_{1|2}, V_{1|2})

    V_{1|2} = (I + Λ^T Ψ^{-1} Λ)^{-1}
    m_{1|2} = V_{1|2} Λ^T Ψ^{-1} (y − µ)

The posterior covariance does not depend on the observed data y!

Computing the posterior mean is just a linear operation.

EM for Factor Analysis

Incomplete-data log likelihood function (marginal density of y):

    ℓ(θ; D) = −(N/2) log|ΛΛ^T + Ψ| − (1/2) Σ_n (y_n − µ)^T (ΛΛ^T + Ψ)^{-1} (y_n − µ)
            = −(N/2) log|ΛΛ^T + Ψ| − (N/2) tr[(ΛΛ^T + Ψ)^{-1} S],

    where  S = (1/N) Σ_n (y_n − µ)(y_n − µ)^T

  • Estimating µ is trivial:  µ̂_ML = (1/N) Σ_n y_n
  • Parameters Λ and Ψ are coupled nonlinearly in the log likelihood.

Complete log likelihood:

    ℓ_c(θ; D) = Σ_n log p(x_n, y_n) = Σ_n log p(x_n) + Σ_n log p(y_n | x_n)
              = −(N/2) log|I| − (1/2) Σ_n x_n^T x_n
                − (N/2) log|Ψ| − (1/2) Σ_n (y_n − Λx_n)^T Ψ^{-1} (y_n − Λx_n)
              = −(1/2) Σ_n x_n^T x_n − (N/2) log|Ψ| − (N/2) tr[S Ψ^{-1}],

    where  S = (1/N) Σ_n (y_n − Λx_n)(y_n − Λx_n)^T


E-step for Factor Analysis

Compute  ⟨ℓ_c(θ; D)⟩_{p(x|y)}:

    ⟨ℓ_c(θ; D)⟩ = −(1/2) Σ_n ⟨x_n^T x_n⟩ − (N/2) log|Ψ| − (N/2) tr[⟨S⟩ Ψ^{-1}],

    N⟨S⟩ = Σ_n ( y_n y_n^T − y_n ⟨X_n⟩^T Λ^T − Λ ⟨X_n⟩ y_n^T + Λ ⟨X_n X_n^T⟩ Λ^T )

so the E-step needs the expected sufficient statistics

    ⟨X_n⟩       = E[X_n | y_n]
    ⟨X_n X_n^T⟩ = Var[X_n | y_n] + E[X_n | y_n] E[X_n | y_n]^T

  • Recall that we have derived:

    V_{1|2} = (I + Λ^T Ψ^{-1} Λ)^{-1},   m_{1|2} = V_{1|2} Λ^T Ψ^{-1} (y − µ)

    ⇒  ⟨X_n⟩ = m_{x_n|y_n} = V_{1|2} Λ^T Ψ^{-1} (y_n − µ)
       and  ⟨X_n X_n^T⟩ = V_{1|2} + m_{x_n|y_n} m_{x_n|y_n}^T

M-step for Factor Analysis

Take the derivatives of the expected complete log likelihood wrt. the
parameters, using the trace and determinant derivative rules:

    ∂⟨ℓ_c⟩/∂Ψ^{-1} = ∂/∂Ψ^{-1} [ −(1/2) Σ_n ⟨x_n^T x_n⟩ + (N/2) log|Ψ^{-1}| − (N/2) tr(⟨S⟩ Ψ^{-1}) ]
                   = (N/2) Ψ − (N/2) ⟨S⟩

    ⇒  Ψ^{t+1} = ⟨S⟩

    ∂⟨ℓ_c⟩/∂Λ = ∂/∂Λ [ −(N/2) tr(⟨S⟩ Ψ^{-1}) ]
              = −(1/2) Ψ^{-1} ∂/∂Λ Σ_n ( y_n y_n^T − y_n ⟨X_n⟩^T Λ^T − Λ ⟨X_n⟩ y_n^T + Λ ⟨X_n X_n^T⟩ Λ^T )
              = Ψ^{-1} Σ_n y_n ⟨X_n⟩^T − Ψ^{-1} Λ Σ_n ⟨X_n X_n^T⟩

    ⇒  Λ^{t+1} = ( Σ_n y_n ⟨X_n⟩^T ) ( Σ_n ⟨X_n X_n^T⟩ )^{-1}
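Putting the E- and M-steps together, here is a compact, unoptimized sketch of one EM pass, assuming data Y of shape (N, p), µ already set to the data mean, and the same parameter names as the earlier sketches:

```python
import numpy as np

def fa_em_step(Y, mu, Lam, Psi):
    """One EM iteration for factor analysis; returns updated (Lam, Psi)."""
    N, p = Y.shape
    k = Lam.shape[1]
    Psi_inv = np.linalg.inv(Psi)
    V = np.linalg.inv(np.eye(k) + Lam.T @ Psi_inv @ Lam)   # shared posterior covariance
    M = (Y - mu) @ Psi_inv @ Lam @ V                        # row n is <X_n>^T

    # Expected sufficient statistics.
    Exx = N * V + M.T @ M                                   # sum_n <X_n X_n^T>
    Eyx = (Y - mu).T @ M                                    # sum_n (y_n - mu) <X_n>^T

    Lam_new = Eyx @ np.linalg.inv(Exx)
    S = ((Y - mu).T @ (Y - mu) - Lam_new @ Eyx.T) / N       # <S>, using the new Lambda
    Psi_new = np.diag(np.diag(S))                           # keep Psi diagonal
    return Lam_new, Psi_new
```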


Model Invariance and Identifiability

There is degeneracy in the FA model. Since Λ only appears as the outer product
ΛΛ^T, the model is invariant to rotations and axis flips of the latent space.

We can replace Λ with ΛQ for any orthonormal matrix Q and the model remains
the same: (ΛQ)(ΛQ)^T = Λ(QQ^T)Λ^T = ΛΛ^T.

This means that there is no “one best” setting of the parameters: an infinite
number of parameter settings all achieve the same ML score!

Such models are called un-identifiable, since two people both fitting ML
parameters to the identical data will not be guaranteed to identify the same
parameters.

Independent Components Analysis (ICA)

  • ICA is similar to FA, except it assumes the latent source has a
    non-Gaussian density.
  • Hence ICA can extract higher-order moments (not just second order).
  • It is commonly used to solve blind source separation (the cocktail party
    problem).
  • Independent Factor Analysis (IFA) is an approximation to ICA where
    we model the source using a mixture of Gaussians.

[Figure: graphical models for FA, mixture of FA, and IFA.]


A road map to more complex dynamic models

[Figure: the same roadmap of models as above, repeated here to mark the
transition from factor analysis to the sequential models: HMMs, state space
models, factorial HMMs, and switching SSMs.]

State space models (SSM)

A sequential FA, or a continuous-state HMM:

    x_t = A x_{t-1} + G w_t,    w_t ~ N(0, Q)
    y_t = C x_t + v_t,          v_t ~ N(0, R)
    x_1 ~ N(0, Σ)

This is a linear dynamic system (LDS).

In general,

    x_t = f(x_{t-1}) + G w_t
    y_t = g(x_t) + v_t

where f is an (arbitrary) dynamic model, and g is an (arbitrary) observation
model.
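A minimal sketch of simulating such a linear dynamic system (the function name, dimensions and noise draws are illustrative, not part of the slides):

```python
import numpy as np

def simulate_lds(A, C, G, Q, R, Sigma0, T, rng):
    """Draw a trajectory (x_1..x_T, y_1..y_T) from a linear-Gaussian SSM."""
    nx, ny = A.shape[0], C.shape[0]
    xs, ys = np.zeros((T, nx)), np.zeros((T, ny))
    x = rng.multivariate_normal(np.zeros(nx), Sigma0)        # x_1 ~ N(0, Sigma)
    for t in range(T):
        if t > 0:
            w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
            x = A @ x + G @ w                                # x_t = A x_{t-1} + G w_t
        v = rng.multivariate_normal(np.zeros(ny), R)
        xs[t], ys[t] = x, C @ x + v                          # y_t = C x_t + v_t
    return xs, ys
```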

The inference problem

Filtering: given y_1, …, y_t, estimate x_t.

  • The Kalman filter is a way to perform exact online inference (sequential
    Bayesian updating) in an LDS. It is the Gaussian analog of the forward
    algorithm for HMMs:

        α_t^i = p(X_t = i | y_{1:t}) ∝ p(y_t | X_t = i) Σ_j p(X_t = i | X_{t-1} = j) α_{t-1}^j

Smoothing: given y_1, …, y_T, estimate x_t (t < T).

  • The Rauch-Tung-Striebel smoother is a way to perform exact offline
    inference in an LDS. It is the Gaussian analog of the forwards-backwards
    algorithm:

        p(X_t = i | y_{1:T}) ∝ α_t^i β_t^i

Online vs offline inference


LDS for 2D tracking

Dynamics: new position = old position + ∆ × velocity + noise
(constant velocity model, Gaussian noise):

    [ x1_t  ]   [ 1 0 ∆ 0 ] [ x1_{t-1}  ]
    [ x2_t  ] = [ 0 1 0 ∆ ] [ x2_{t-1}  ]  + noise
    [ ẋ1_t  ]   [ 0 0 1 0 ] [ ẋ1_{t-1}  ]
    [ ẋ2_t  ]   [ 0 0 0 1 ] [ ẋ2_{t-1}  ]

Observation: project out the first two components (we observe the Cartesian
position of the object, so the observation is linear):

    [ y1_t ]   [ 1 0 0 0 ] [ x1_t  ]
    [ y2_t ] = [ 0 1 0 0 ] [ x2_t  ]  + noise
                           [ ẋ1_t  ]
                           [ ẋ2_t  ]

[Figure: 2D tracking.]
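In code, the corresponding system matrices might look like this (a sketch; ∆ and the noise levels are placeholder values, not from the slides):

```python
import numpy as np

dt = 1.0                                # time step Delta
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]])           # constant-velocity dynamics
C = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])            # observe position only
Q = 0.01 * np.eye(4)                    # process noise covariance (taking G = I)
R = 0.1 * np.eye(2)                     # observation noise covariance
```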


Kalman filtering in the brain?

Kalman filtering derivation

Since all the CPDs are linear Gaussian, the system defines a large
multivariate Gaussian.

  • Hence all marginals are Gaussian.
  • Hence we can represent the belief state p(X_t | y_{1:t}) as a Gaussian with mean

        x̂_{t|t} ≡ E[X_t | y_1, …, y_t]

    and covariance

        P_{t|t} ≡ E[(X_t − x̂_{t|t})(X_t − x̂_{t|t})^T | y_1, …, y_t]

  • It is common to work with the inverse covariance (precision) matrix P_{t|t}^{-1};
    this is called information form.

Kalman filtering is a recursive procedure to update the belief state:

  • Predict step: compute p(X_{t+1} | y_{1:t}) from the prior belief p(X_t | y_{1:t})
    and the dynamical model p(X_{t+1} | X_t) --- the time update.
  • Update step: compute the new belief p(X_{t+1} | y_{1:t+1}) from the prediction
    p(X_{t+1} | y_{1:t}), the observation y_{t+1} and the observation model
    p(y_{t+1} | X_{t+1}) --- the measurement update.


Predict step

Dynamical model:  x_{t+1} = A x_t + G w_t,  w_t ~ N(0, Q)

  • One-step-ahead prediction of the state:

        x̂_{t+1|t} = E[X_{t+1} | y_1, …, y_t] = A x̂_{t|t}

        P_{t+1|t} = E[(X_{t+1} − x̂_{t+1|t})(X_{t+1} − x̂_{t+1|t})^T | y_1, …, y_t]
                  = A P_{t|t} A^T + G Q G^T

Observation model:  y_t = C x_t + v_t,  v_t ~ N(0, R)

  • One-step-ahead prediction of the observation:

        ŷ_{t+1|t} = E[Y_{t+1} | y_1, …, y_t] = C E[X_{t+1} | y_1, …, y_t] = C x̂_{t+1|t}

        E[(Y_{t+1} − ŷ_{t+1|t})(Y_{t+1} − ŷ_{t+1|t})^T | y_1, …, y_t] = C P_{t+1|t} C^T + R

        E[(Y_{t+1} − ŷ_{t+1|t})(X_{t+1} − x̂_{t+1|t})^T | y_1, …, y_t] = C P_{t+1|t}

Update step

Summarizing the results from the previous slide, we have
p(X_{t+1}, Y_{t+1} | y_{1:t}) ~ N(m_{t+1}, V_{t+1}), where

    m_{t+1} = [ x̂_{t+1|t}
                C x̂_{t+1|t} ],

    V_{t+1} = [ P_{t+1|t}        P_{t+1|t} C^T
                C P_{t+1|t}      C P_{t+1|t} C^T + R ]

Remember the formulas for conditional Gaussian distributions:

    p(x_1, x_2 | µ, Σ) = N( [x_1; x_2] ; [µ_1; µ_2], [Σ_11  Σ_12; Σ_21  Σ_22] )

    p(x_1 | x_2) = N(x_1; m_{1|2}, V_{1|2}),   m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (x_2 − µ_2),
                                               V_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21
    p(x_2) = N(x_2; m_2, V_2),                 m_2 = µ_2,   V_2 = Σ_22
slide-14
SLIDE 14

14

Kalman Filter

Measurement updates:

  • where Kt+1 is the Kalman gain matrix

Time updates: Kt can be pre-computed (since it is independent of the data).

t t t t | |

ˆ ˆ x x A =

+1 T

GQG A AP P + =

+ t t t t | | 1

) x ˆ C (y ˆ ˆ

t | 1 t 1 t | | + + + + + +

− + =

1 1 1 1 t t t t t

K x x

t t t t t t | | | 1 1 1 1 + + + +

= KCP

  • P

P

  • 1

T T

R C CP C P K ) (

| |

+ =

+ + + t t t t t 1 1 1
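A minimal NumPy sketch of one filter step under these equations (the function name and the G = I simplification are mine, not the slides'):

```python
import numpy as np

def kalman_step(x, P, y, A, C, Q, R):
    """One Kalman filter step: time update followed by measurement update."""
    # Time update (predict).
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q                       # taking G = I
    # Measurement update (correct).
    S = C @ P_pred @ C.T + R                       # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)            # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new
```

This could be run on the 2D tracking matrices sketched earlier, e.g. `x, P = kalman_step(x, P, y, A, C, Q, R)` inside a loop over the observations.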

Example of KF in 1D

Consider noisy observations of a 1D particle doing a random walk:

    x_t = x_{t-1} + w,   w ~ N(0, σ_x)
    z_t = x_t + v,       v ~ N(0, σ_z)

(i.e., A = C = G = 1, Q = σ_x, R = σ_z). The KF equations become:

    x̂_{t+1|t} = A x̂_{t|t} = x̂_{t|t}

    P_{t+1|t} = A P_{t|t} A^T + G Q G^T = σ_{t|t} + σ_x

    K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} = (σ_{t|t} + σ_x) / (σ_{t|t} + σ_x + σ_z)

    x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (z_{t+1} − C x̂_{t+1|t})
                = [ (σ_{t|t} + σ_x) z_{t+1} + σ_z x̂_{t|t} ] / (σ_{t|t} + σ_x + σ_z)

    P_{t+1|t+1} = P_{t+1|t} − K_{t+1} C P_{t+1|t}
                = (σ_{t|t} + σ_x) σ_z / (σ_{t|t} + σ_x + σ_z)
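As a usage sketch, these scalar recursions take only a few lines of Python (the function and variable names are mine):

```python
import numpy as np

def kf_1d(zs, sigma_x, sigma_z, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the random-walk-plus-noise model above."""
    x, p = x0, p0
    means = []
    for z in zs:
        p_pred = p + sigma_x                        # predicted variance
        k = p_pred / (p_pred + sigma_z)             # Kalman gain
        x = x + k * (z - x)                         # innovation-weighted update
        p = p_pred * sigma_z / (p_pred + sigma_z)   # posterior variance
        means.append(x)
    return np.array(means)
```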


KF intuition

The KF update of the mean is

    x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (z_{t+1} − C x̂_{t+1|t})
                = [ (σ_{t|t} + σ_x) z_{t+1} + σ_z x̂_{t|t} ] / (σ_{t|t} + σ_x + σ_z)

  • the term (z_{t+1} − C x̂_{t+1|t}) is called the innovation

The new belief is a convex combination of the prediction and the observation,
weighted by the Kalman gain matrix:

    K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}

If the observation is unreliable, σ_z (i.e., R) is large, so K_{t+1} is small,
and we pay more attention to the prediction.

If the old prior is unreliable (large σ_{t|t}) or the process is very
unpredictable (large σ_x), we pay more attention to the observation.

KF, RLS and LMS

The KF update of the mean is

    x̂_{t+1|t+1} = A x̂_{t|t} + K_{t+1} (y_{t+1} − C x̂_{t+1|t})

Consider the special case where the hidden state is a constant, x_t = θ, but
the “observation matrix” C is a time-varying vector, C_t = x_t^T.

  • Hence the observation model at each time slice,

        y_t = x_t^T θ + v_t,

    is a linear regression.

We can estimate θ recursively using the Kalman filter:

    θ̂_{t+1} = θ̂_t + P_{t+1} R^{-1} x_{t+1} (y_{t+1} − x_{t+1}^T θ̂_t)

This is called the recursive least squares (RLS) algorithm.

We can approximate P_{t+1} R^{-1} by a scalar constant η_t:

    θ̂_{t+1} = θ̂_t + η_t x_{t+1} (y_{t+1} − x_{t+1}^T θ̂_t)

This is called the least mean squares (LMS) algorithm.

We can adapt η_t online using stochastic approximation theory.
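A minimal sketch of the LMS special case (constant step size; the function name, eta value and loop structure are illustrative):

```python
import numpy as np

def lms(X, y, eta=0.01, n_epochs=1):
    """Least mean squares: online estimation of theta in y_t = x_t^T theta + noise."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_t, y_t in zip(X, y):
            theta = theta + eta * x_t * (y_t - x_t @ theta)   # innovation-driven update
    return theta
```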


Complexity of one KF step

Let X_t ∈ R^{N_x} and Y_t ∈ R^{N_y}.

Computing  P_{t+1|t} = A P_{t|t} A^T + G Q G^T  takes O(N_x^2) time, assuming
dense P and dense A.

Computing  K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}  takes O(N_y^3) time.

So the overall time is, in general, O(max{N_x^2, N_y^3}).

Rauch-Tung-Striebel smoother

  • General structure: KF results + the difference between the "smoothed" and
    predicted results at the next step:

        x̂_{t|T} = x̂_{t|t} + L_t (x̂_{t+1|T} − x̂_{t+1|t})
        P_{t|T} = P_{t|t} + L_t (P_{t+1|T} − P_{t+1|t}) L_t^T
        L_t = P_{t|t} A^T P_{t+1|t}^{-1}

  • Backward computation: pretend to know things at t+1 --- such conditioning
    makes things simple, and we can remove this condition at the end.

The difficulty: we need x̂_{t|T} ≡ E[X_t | y_1, …, y_T], which conditions on
observations after time t.

The trick: iterate the expectations,

    E[X | Z] = E[ E[X | Y, Z] | Z ]
    Var[X | Z] = Var[ E[X | Y, Z] | Z ] + E[ Var[X | Y, Z] | Z ]

so that

    x̂_{t|T} = E[X_t | y_1, …, y_T]
            = E[ E[X_t | X_{t+1}, y_1, …, y_T] | y_1, …, y_T ]
            = E[ E[X_t | X_{t+1}, y_1, …, y_t] | y_1, …, y_T ]     (Hw!)

and the same for P_{t|T}.


RTS derivation

Following the results from the previous slide, we need to derive
p(X_{t+1}, X_t | y_{1:t}) ~ N(m, V), where

    m = [ x̂_{t+1|t}
          x̂_{t|t}   ],

    V = [ P_{t+1|t}      A P_{t|t}
          P_{t|t} A^T    P_{t|t}   ]

  • all the quantities here are available after a forward KF pass.

Remember the formulas for conditional Gaussian distributions:

    p(x_1, x_2 | µ, Σ) = N( [x_1; x_2] ; [µ_1; µ_2], [Σ_11  Σ_12; Σ_21  Σ_22] )

    p(x_1 | x_2) = N(x_1; m_{1|2}, V_{1|2}),   m_{1|2} = µ_1 + Σ_12 Σ_22^{-1} (x_2 − µ_2),
                                               V_{1|2} = Σ_11 − Σ_12 Σ_22^{-1} Σ_21
    p(x_2) = N(x_2; m_2, V_2),                 m_2 = µ_2,   V_2 = Σ_22

The RTS smoother: conditioning gives

    E[X_t | X_{t+1}, y_1, …, y_t] = x̂_{t|t} + L_t (X_{t+1} − x̂_{t+1|t}),   L_t = P_{t|t} A^T P_{t+1|t}^{-1}

and averaging over p(X_{t+1} | y_{1:T}) gives

    x̂_{t|T} = E[ E[X_t | X_{t+1}, y_1, …, y_t] | y_1, …, y_T ]
            = x̂_{t|t} + L_t (x̂_{t+1|T} − x̂_{t+1|t})

    P_{t|T} ≡ Var[X_t | y_{1:T}]
            = E[ Var[X_t | X_{t+1}, y_{1:t}] | y_{1:T} ] + Var[ E[X_t | X_{t+1}, y_{1:t}] | y_{1:T} ]
            = P_{t|t} + L_t (P_{t+1|T} − P_{t+1|t}) L_t^T
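A minimal sketch of one backward smoothing step, consuming the filtered and predicted quantities from a forward pass (the function and argument names are illustrative):

```python
import numpy as np

def rts_step(x_filt, P_filt, x_pred, P_pred, x_smooth_next, P_smooth_next, A):
    """One Rauch-Tung-Striebel backward step.

    x_filt, P_filt:   x_{t|t}, P_{t|t}      (filtered)
    x_pred, P_pred:   x_{t+1|t}, P_{t+1|t}  (predicted)
    x_smooth_next, P_smooth_next: x_{t+1|T}, P_{t+1|T} (already smoothed)"""
    L = P_filt @ A.T @ np.linalg.inv(P_pred)                 # smoother gain L_t
    x_smooth = x_filt + L @ (x_smooth_next - x_pred)
    P_smooth = P_filt + L @ (P_smooth_next - P_pred) @ L.T
    return x_smooth, P_smooth
```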

Learning SSMs

Complete log likelihood:

    ℓ_c(θ; D) = Σ_n log p(x_n, y_n)
              = Σ_n log p(x_{n,1}) + Σ_n Σ_t log p(x_{n,t} | x_{n,t-1}) + Σ_n Σ_t log p(y_{n,t} | x_{n,t})
              = f_1(X_1 X_1^T; Σ) + f_2({X_t X_t^T, X_t X_{t-1}^T : ∀t}; A, G, Q)
                + f_3({X_t, X_t X_t^T : ∀t}; C, R)

EM

  • E-step: compute ⟨ℓ_c(θ; D)⟩, i.e., the expected sufficient statistics
    ⟨X_t⟩, ⟨X_t X_t^T⟩ and ⟨X_t X_{t-1}^T⟩ given y_{1:T}. These quantities can be
    inferred via the KF and RTS passes, e.g.,

        ⟨X_t X_t^T⟩ ≡ Var(X_t | y_{1:T}) + E[X_t | y_{1:T}] E[X_t | y_{1:T}]^T
                    = P_{t|T} + x̂_{t|T} x̂_{t|T}^T

  • M-step: MLE using ⟨ℓ_c(θ; D)⟩, c.f. the M-step in factor analysis.


Nonlinear systems

  • In robotics and other problems, the motion model and the observation
    model are often nonlinear:

        x_t = f(x_{t-1}) + w_t,    y_t = g(x_t) + v_t

  • An optimal closed-form solution to the filtering problem is no longer
    possible.
  • The nonlinear functions f and g are sometimes represented by neural
    networks (multi-layer perceptrons or radial basis function networks).
  • The parameters of f and g may be learned offline using EM, where we do
    gradient descent (back propagation) in the M step, c.f. learning an
    MRF/CRF with hidden nodes.
  • Or we may learn the parameters online by adding them to the state space:
    x_t' = (x_t, θ). This makes the problem even more nonlinear.

Extended Kalman Filter (EKF)

The basic idea of the EKF is to linearize f and g using a first-order Taylor
expansion, and then apply the standard KF.

  • i.e., we approximate a stationary nonlinear system with a non-stationary
    linear system:

        x_t ≈ f(x̂_{t-1|t-1}) + Â_{x̂_{t-1|t-1}} (x_{t-1} − x̂_{t-1|t-1}) + w_t
        y_t ≈ g(x̂_{t|t-1}) + Ĉ_{x̂_{t|t-1}} (x_t − x̂_{t|t-1}) + v_t

    where

        x̂_{t|t-1} = f(x̂_{t-1|t-1}),    Â_x̂ ≡ ∂f/∂x |_{x=x̂},    Ĉ_x̂ ≡ ∂g/∂x |_{x=x̂}

The noise covariances (Q and R) are not changed, i.e., the additional error
due to linearization is not modeled.
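A minimal EKF step in this spirit (a sketch; f, g and their Jacobians jac_f, jac_g are assumed to be supplied by the user, and the function name is mine):

```python
import numpy as np

def ekf_step(x, P, y, f, jac_f, g, jac_g, Q, R):
    """One extended Kalman filter step: linearize around the current estimate,
    then apply the standard KF predict/update equations."""
    # Predict, linearizing f around x_{t-1|t-1}.
    A = jac_f(x)
    x_pred = f(x)
    P_pred = A @ P @ A.T + Q
    # Update, linearizing g around x_{t|t-1}.
    C = jac_g(x_pred)
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - g(x_pred))
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new
```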