Factor Analysis & Exact Inference for Gaussian Networks


  1. Factor analysis & Exact inference for Gaussian networks
     Probabilistic Graphical Models
     Sharif University of Technology, Spring 2016
     Soleymani

  2. Multivariate Gaussian distribution
     $$\mathcal{N}(\boldsymbol{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\}$$
      The natural, canonical, or information parameterization of a Gaussian distribution arises from the quadratic form:
     $$\mathcal{N}(\boldsymbol{x}\mid\boldsymbol{h},\boldsymbol{J}) \propto \exp\left\{-\frac{1}{2}\boldsymbol{x}^T\boldsymbol{J}\boldsymbol{x} + \boldsymbol{h}^T\boldsymbol{x}\right\}, \qquad \boldsymbol{\Lambda} = \boldsymbol{J} = \boldsymbol{\Sigma}^{-1}, \quad \boldsymbol{h} = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
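     Converting between the two parameterizations is mechanical. A minimal NumPy sketch (the numbers are made up for illustration) going from $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ to $(\boldsymbol{h}, \boldsymbol{J})$ and back:

```python
import numpy as np

# Moment parameterization (mu, Sigma); illustrative values only.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Canonical/information parameterization: J = Sigma^{-1}, h = Sigma^{-1} mu
J = np.linalg.inv(Sigma)
h = J @ mu

# Converting back recovers the moment parameters
assert np.allclose(np.linalg.inv(J), Sigma)
assert np.allclose(np.linalg.solve(J, h), mu)
```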

  3. Joint Gaussian distribution: block elements
      If we partition the vector $\boldsymbol{x}$ into $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$:
     $$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}, \qquad \boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}^T, \quad \boldsymbol{\Sigma}_{11} \text{ and } \boldsymbol{\Sigma}_{22} \text{ are symmetric}$$
     $$P\begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{pmatrix} = \mathcal{N}\left(\begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{pmatrix} \,\middle|\, \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}\right)$$

  4. Marginal and conditional of Gaussian
     For a multivariate Gaussian distribution, all marginal and conditional distributions are also Gaussian.
     [Figure (Bishop): contours of the joint $P(x_1, x_2)$, with the conditional $P(x_1 \mid x_2 = 0.7)$ and the marginal $P(x_1)$ plotted over $x_1$]

  5. Matrix inverse lemma
     $$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{pmatrix}, \qquad M = (A - BD^{-1}C)^{-1}$$

  6. Precision matrix
      In many situations it is convenient to work with $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$, known as the precision matrix:
     $$\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{12} \\ \boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22} \end{pmatrix}, \qquad \boldsymbol{\Lambda}_{21} = \boldsymbol{\Lambda}_{12}^T, \quad \boldsymbol{\Lambda}_{11} \text{ and } \boldsymbol{\Lambda}_{22} \text{ are symmetric}$$
      Relation between the inverse of a partitioned matrix and the inverses of its partitions (using the matrix inverse lemma):
     $$\boldsymbol{\Lambda}_{11} = (\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}, \qquad \boldsymbol{\Lambda}_{12} = -(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$$
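     As a numeric sanity check of these relations (a sketch with a randomly generated SPD covariance, not from the slides), the Schur-complement formulas for $\boldsymbol{\Lambda}_{11}$ and $\boldsymbol{\Lambda}_{12}$ can be compared against a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive definite covariance and partition it.
G = rng.standard_normal((5, 5))
Sigma = G @ G.T + 5 * np.eye(5)
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

Lam = np.linalg.inv(Sigma)  # full precision matrix

# Schur-complement formulas for the precision blocks
M = np.linalg.inv(S11 - S12 @ np.linalg.inv(S22) @ S21)
assert np.allclose(Lam[:2, :2], M)                              # Lambda_11
assert np.allclose(Lam[:2, 2:], -M @ S12 @ np.linalg.inv(S22))  # Lambda_12
```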

  7. Marginal and conditional distributions based on block elements of $\boldsymbol{\Lambda}$
      Conditional:
     $$P(\boldsymbol{x}_1 \mid \boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2})$$
     $$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 - \boldsymbol{\Lambda}_{11}^{-1}\boldsymbol{\Lambda}_{12}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2) \quad \text{(a linear-Gaussian model)}, \qquad \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Lambda}_{11}^{-1}$$
      Marginal:
     $$P(\boldsymbol{x}_1) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1), \qquad \boldsymbol{\Sigma}_1 = (\boldsymbol{\Lambda}_{11} - \boldsymbol{\Lambda}_{12}\boldsymbol{\Lambda}_{22}^{-1}\boldsymbol{\Lambda}_{21})^{-1}$$

  8. Marginal and conditional distributions based on block elements of $\boldsymbol{\Sigma}$
      Conditional distributions:
     $$P(\boldsymbol{x}_1 \mid \boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2})$$
     $$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2), \qquad \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$$
      Marginal distributions based on block elements of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:
     $$P(\boldsymbol{x}_1) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}), \qquad P(\boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_2 \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$$
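     These block formulas translate directly into code. A sketch of conditioning and marginalizing a small joint Gaussian; the partition indices and all numbers are invented for illustration:

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

i1, i2 = [0], [1, 2]          # partition x into x1 and x2
x2 = np.array([0.5, -0.2])    # observed value of x2

S11 = Sigma[np.ix_(i1, i1)]; S12 = Sigma[np.ix_(i1, i2)]
S21 = Sigma[np.ix_(i2, i1)]; S22 = Sigma[np.ix_(i2, i2)]

# Conditional p(x1 | x2): mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2)
mu_cond = mu[i1] + S12 @ np.linalg.solve(S22, x2 - mu[i2])
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)

# Marginal p(x1) just reads off the corresponding blocks
mu_marg, Sigma_marg = mu[i1], S11
print(mu_cond, Sigma_cond, mu_marg, Sigma_marg, sep="\n")
```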

  9. Factor analysis
      Gaussian latent variable $\boldsymbol{Z}$ ($L$-dimensional), a continuous latent variable
      Observed variable $\boldsymbol{X}$ ($D$-dimensional)
     [Graphical model: $\boldsymbol{Z} \rightarrow \boldsymbol{X}$, with parameters $\boldsymbol{\mu}, \boldsymbol{A}, \boldsymbol{\Psi}$]
     $$P(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{0}, \boldsymbol{I}), \qquad P(\boldsymbol{x} \mid \boldsymbol{z}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z}, \boldsymbol{\Psi})$$
      $\boldsymbol{z} \in \mathbb{R}^L$, $\boldsymbol{x} \in \mathbb{R}^D$
      $\boldsymbol{A}$: factor loading $D \times L$ matrix
      $\boldsymbol{\Psi}$: diagonal covariance matrix

  10. Marginal distribution
      $$\boldsymbol{x} = \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}, \qquad P(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z}\mid\boldsymbol{0},\boldsymbol{I}), \quad P(\boldsymbol{x}\mid\boldsymbol{z}) = \mathcal{N}(\boldsymbol{x}\mid\boldsymbol{\mu}+\boldsymbol{A}\boldsymbol{z},\boldsymbol{\Psi}), \quad \boldsymbol{w} \sim \mathcal{N}(\boldsymbol{0},\boldsymbol{\Psi}), \ \boldsymbol{w} \text{ independent of } \boldsymbol{z}$$
      The product of Gaussian distributions is Gaussian, as is the marginal of a Gaussian; thus $P(\boldsymbol{x}) = \int P(\boldsymbol{z})\,P(\boldsymbol{x}\mid\boldsymbol{z})\,d\boldsymbol{z}$ is Gaussian:
      $$\boldsymbol{\mu}_x = E[\boldsymbol{x}] = E[\boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}] = \boldsymbol{\mu} + \boldsymbol{A}E[\boldsymbol{z}] = \boldsymbol{\mu}$$
      $$\boldsymbol{\Sigma}_{xx} = E[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^T] = E[(\boldsymbol{A}\boldsymbol{z}+\boldsymbol{w})(\boldsymbol{A}\boldsymbol{z}+\boldsymbol{w})^T] = \boldsymbol{A}E[\boldsymbol{z}\boldsymbol{z}^T]\boldsymbol{A}^T + \boldsymbol{\Psi} = \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}$$
      $$\Rightarrow P(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi})$$
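      A quick simulation (with made-up parameters, not from the slides) of the generative process $\boldsymbol{x} = \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}$ confirms that the sample covariance approaches $\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}$:

```python
import numpy as np

rng = np.random.default_rng(1)

D, L, N = 4, 2, 200_000
A = rng.standard_normal((D, L))          # factor loading matrix
mu = rng.standard_normal(D)
Psi = np.diag(rng.uniform(0.1, 1.0, D))  # diagonal noise covariance

z = rng.standard_normal((N, L))                 # z ~ N(0, I)
w = rng.standard_normal((N, D)) @ np.sqrt(Psi)  # w ~ N(0, Psi), Psi diagonal
x = mu + z @ A.T + w

# Largest deviation from A A^T + Psi shrinks as N grows
print(np.max(np.abs(np.cov(x.T) - (A @ A.T + Psi))))
```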

  11. Joint Gaussian distribution
      $$\boldsymbol{\Sigma}_{zx} = \mathrm{Cov}(\boldsymbol{z},\boldsymbol{x}) = E[\boldsymbol{z}(\boldsymbol{A}\boldsymbol{z}+\boldsymbol{w})^T] = \boldsymbol{A}^T, \qquad \boldsymbol{\Sigma}_{xx} = \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}, \qquad \boldsymbol{\Sigma}_{zz} = \boldsymbol{I}$$
      $$\Rightarrow \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{I} & \boldsymbol{A}^T \\ \boldsymbol{A} & \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi} \end{pmatrix}, \qquad E\begin{pmatrix} \boldsymbol{z} \\ \boldsymbol{x} \end{pmatrix} = \begin{pmatrix} \boldsymbol{0} \\ \boldsymbol{\mu} \end{pmatrix}$$
      $$\Rightarrow P\begin{pmatrix} \boldsymbol{z} \\ \boldsymbol{x} \end{pmatrix} = \mathcal{N}\left(\begin{pmatrix} \boldsymbol{z} \\ \boldsymbol{x} \end{pmatrix} \,\middle|\, \begin{pmatrix} \boldsymbol{0} \\ \boldsymbol{\mu} \end{pmatrix}, \begin{pmatrix} \boldsymbol{I} & \boldsymbol{A}^T \\ \boldsymbol{A} & \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi} \end{pmatrix}\right)$$

  12. Conditional distributions
      Applying $\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\boldsymbol{x}_2-\boldsymbol{\mu}_2)$ and $\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ to the joint above:
      $$P(\boldsymbol{z}\mid\boldsymbol{x}) = \mathcal{N}(\boldsymbol{z}\mid\boldsymbol{\mu}_{z|x}, \boldsymbol{\Sigma}_{z|x})$$
      $$\boldsymbol{\mu}_{z|x} = \boldsymbol{A}^T(\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi})^{-1}(\boldsymbol{x}-\boldsymbol{\mu}), \qquad \boldsymbol{\Sigma}_{z|x} = \boldsymbol{I} - \boldsymbol{A}^T(\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi})^{-1}\boldsymbol{A}$$
       Here a $D \times D$ matrix must be inverted. If $L < D$, it is preferable to use the matrix inverse lemma $(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}$ to obtain:
      $$\boldsymbol{\mu}_{z|x} = (\boldsymbol{I} + \boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\boldsymbol{A})^{-1}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}(\boldsymbol{x}-\boldsymbol{\mu}) = \boldsymbol{\Sigma}_{z|x}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}(\boldsymbol{x}-\boldsymbol{\mu}), \qquad \boldsymbol{\Sigma}_{z|x} = (\boldsymbol{I} + \boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\boldsymbol{A})^{-1}$$
      The posterior covariance does not depend on the observed data $\boldsymbol{x}$, and computing the posterior mean is a linear operation in $\boldsymbol{x}$.
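      The two posterior formulas are algebraically equivalent but invert matrices of different sizes ($D \times D$ versus $L \times L$). A sketch checking the equivalence on random, invented parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

D, L = 6, 2
A = rng.standard_normal((D, L))
mu = np.zeros(D)
Psi = np.diag(rng.uniform(0.5, 2.0, D))
x = rng.standard_normal(D)

# Form 1: invert the D x D matrix A A^T + Psi
C = A @ A.T + Psi
mu_post1 = A.T @ np.linalg.solve(C, x - mu)
Sig_post1 = np.eye(L) - A.T @ np.linalg.solve(C, A)

# Form 2: invert only the L x L matrix I + A^T Psi^{-1} A
Psi_inv = np.diag(1.0 / np.diag(Psi))
Sig_post2 = np.linalg.inv(np.eye(L) + A.T @ Psi_inv @ A)
mu_post2 = Sig_post2 @ A.T @ Psi_inv @ (x - mu)

assert np.allclose(mu_post1, mu_post2) and np.allclose(Sig_post1, Sig_post2)
```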

  13. Geometric illustration
      [Figure (Jordan): data space with axes $x_1, x_2, x_3$; the prior $P(\boldsymbol{z})$ defines a low-dimensional manifold in data space, and $P(\boldsymbol{z}\mid\boldsymbol{x})$ is the posterior over points on it]
      To generate data, first generate a point within the manifold, then add noise.

  14. Factor analysis: dimensionality reduction
       FA is just a constrained Gaussian model: if $\boldsymbol{\Psi}$ were not diagonal, we could model any Gaussian.
       FA is a low-rank parameterization of a multivariate Gaussian.
       Since $P(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi})$, FA approximates the covariance matrix of the visible vector using the low-rank decomposition $\boldsymbol{A}\boldsymbol{A}^T$ plus the diagonal matrix $\boldsymbol{\Psi}$.
       $\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}$ is the outer product of two low-rank matrices plus a diagonal matrix (i.e., $O(LD)$ parameters instead of $O(D^2)$).
       Given $\{\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(N)}\}$ (observations of high-dimensional data), by learning from incomplete data we find an $\boldsymbol{A}$ for transforming the data to a lower-dimensional space.

  15. Incomplete likelihood
      $$\ell(\boldsymbol{\theta};\mathcal{D}) = -\frac{N}{2}\log\left|\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi}\right| - \frac{1}{2}\sum_{n=1}^{N}(\boldsymbol{x}^{(n)}-\boldsymbol{\mu})^T(\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi})^{-1}(\boldsymbol{x}^{(n)}-\boldsymbol{\mu})$$
      $$= -\frac{N}{2}\log\left|\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi}\right| - \frac{1}{2}\operatorname{tr}\!\left[(\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi})^{-1}\boldsymbol{S}\right], \qquad \boldsymbol{S} = \sum_{n=1}^{N}(\boldsymbol{x}^{(n)}-\boldsymbol{\mu})(\boldsymbol{x}^{(n)}-\boldsymbol{\mu})^T$$
      $$\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N}\boldsymbol{x}^{(n)}$$
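      A direct transcription of the scatter-matrix form into code (the function name and interface are invented, and the constant $-\frac{ND}{2}\log 2\pi$ is dropped as on the slide):

```python
import numpy as np

def fa_loglik(X, mu, A, Psi):
    """Incomplete-data log-likelihood of FA, up to an additive constant.
    A sketch: X is N x D, A is D x L, Psi is the D x D diagonal covariance."""
    N = X.shape[0]
    C = A @ A.T + Psi               # marginal covariance A A^T + Psi
    S = (X - mu).T @ (X - mu)       # scatter matrix
    _, logdet = np.linalg.slogdet(C)
    return -N / 2 * logdet - 0.5 * np.trace(np.linalg.solve(C, S))
```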

  16. E-step: expected sufficient statistics
      $$E_{P(\mathcal{H}\mid\mathcal{D},\boldsymbol{\theta})}\!\left[\log P(\mathcal{D},\mathcal{H}\mid\boldsymbol{\theta})\right] = \sum_{n=1}^{N} E_{P(\boldsymbol{z}^{(n)}\mid\boldsymbol{x}^{(n)},\boldsymbol{\theta})}\!\left[\log P(\boldsymbol{z}^{(n)}\mid\boldsymbol{\theta}) + \log P(\boldsymbol{x}^{(n)}\mid\boldsymbol{z}^{(n)},\boldsymbol{\theta})\right]$$
      $$= -\frac{N}{2}\log|\boldsymbol{\Psi}| - \frac{1}{2}\sum_{n=1}^{N}\operatorname{tr} E\!\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right] - \frac{1}{2}\sum_{n=1}^{N}\operatorname{tr}\!\left\{\boldsymbol{\Psi}^{-1} E\!\left[(\boldsymbol{x}^{(n)}-\boldsymbol{A}\boldsymbol{z}^{(n)})(\boldsymbol{x}^{(n)}-\boldsymbol{A}\boldsymbol{z}^{(n)})^T\right]\right\} + c$$
      $$E\!\left[(\boldsymbol{x}^{(n)}-\boldsymbol{A}\boldsymbol{z}^{(n)})(\boldsymbol{x}^{(n)}-\boldsymbol{A}\boldsymbol{z}^{(n)})^T\right] = \boldsymbol{x}^{(n)}\boldsymbol{x}^{(n)T} - \boldsymbol{A}E[\boldsymbol{z}^{(n)}]\boldsymbol{x}^{(n)T} - \boldsymbol{x}^{(n)}E[\boldsymbol{z}^{(n)}]^T\boldsymbol{A}^T + \boldsymbol{A}E\!\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right]\boldsymbol{A}^T$$
       Expected sufficient statistics:
      $$E_{P(\boldsymbol{z}^{(n)}\mid\boldsymbol{x}^{(n)},\boldsymbol{\theta})}\!\left[\boldsymbol{z}^{(n)}\right] = \boldsymbol{\mu}_{z|x^{(n)}} = \boldsymbol{\Sigma}_{z|x}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}(\boldsymbol{x}^{(n)}-\boldsymbol{\mu})$$
      $$E_{P(\boldsymbol{z}^{(n)}\mid\boldsymbol{x}^{(n)},\boldsymbol{\theta})}\!\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right] = \boldsymbol{\Sigma}_{z|x} + \boldsymbol{\mu}_{z|x^{(n)}}\boldsymbol{\mu}_{z|x^{(n)}}^T, \qquad \boldsymbol{\Sigma}_{z|x} = (\boldsymbol{I}+\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\boldsymbol{A})^{-1}$$
      (These statistics are exactly what the EM sketch after the next slide computes.)

  17. M-Step
      $$\boldsymbol{A}^{t+1} = \left(\sum_{n=1}^{N}\boldsymbol{x}^{(n)}E\!\left[\boldsymbol{z}^{(n)}\right]^T\right)\left(\sum_{n=1}^{N}E\!\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right]\right)^{-1}$$
      $$\boldsymbol{\Psi}^{t+1} = \frac{1}{N}\operatorname{diag}\!\left\{\sum_{n=1}^{N}E\!\left[(\boldsymbol{x}^{(n)}-\boldsymbol{A}^{t+1}\boldsymbol{z}^{(n)})(\boldsymbol{x}^{(n)}-\boldsymbol{A}^{t+1}\boldsymbol{z}^{(n)})^T\right]\right\} = \frac{1}{N}\operatorname{diag}\!\left\{\sum_{n=1}^{N}\boldsymbol{x}^{(n)}\boldsymbol{x}^{(n)T} - \boldsymbol{A}^{t+1}E\!\left[\boldsymbol{z}^{(n)}\right]\boldsymbol{x}^{(n)T}\right\}$$
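      Putting the E- and M-steps together gives the full EM iteration. A minimal sketch: the function name fa_em, the initialization, and the fixed iteration count are arbitrary choices, not from the slides:

```python
import numpy as np

def fa_em(X, L, n_iter=200, seed=0):
    """EM for factor analysis on data X (N x D) with L latent factors."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)                    # mu_ML, fixed across iterations
    Xc = X - mu
    A = rng.standard_normal((D, L)) * 0.1  # arbitrary initialization
    psi = Xc.var(axis=0)                   # diagonal of Psi

    for _ in range(n_iter):
        # E-step: Sigma_{z|x} = (I + A^T Psi^{-1} A)^{-1}, shared by all n
        Sig_post = np.linalg.inv(np.eye(L) + (A.T / psi) @ A)
        Ez = Xc @ (A / psi[:, None]) @ Sig_post   # row n holds E[z^(n)]
        Ezz_sum = N * Sig_post + Ez.T @ Ez        # sum_n E[z^(n) z^(n)^T]

        # M-step: A^{t+1} = (sum_n x E[z]^T)(sum_n E[z z^T])^{-1}
        A = (Xc.T @ Ez) @ np.linalg.inv(Ezz_sum)
        # Psi^{t+1} = diag{sum_n x x^T - A^{t+1} E[z] x^T} / N
        psi = (np.sum(Xc**2, axis=0)
               - np.sum(A * (Xc.T @ Ez), axis=1)) / N
    return mu, A, psi
```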

  18. Unidentifiability
       $\boldsymbol{A}$ only appears as the outer product $\boldsymbol{A}\boldsymbol{A}^T$, thus the model is invariant to rotations and axis flips of the latent space.
       $\boldsymbol{A}$ can be replaced with $\boldsymbol{A}\boldsymbol{R}$ for any orthonormal matrix $\boldsymbol{R}$, and the model, which contains $\boldsymbol{A}$ only through $\boldsymbol{A}\boldsymbol{A}^T$, remains the same.
       Thus, FA is an unidentifiable model.
       The likelihood objective on a data set has no unique maximum: an infinite number of parameter settings achieve the maximum score.
       Different runs are therefore not guaranteed to identify the same parameters.
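       A quick numeric illustration of this invariance; $\boldsymbol{R}$ here is a random orthonormal matrix obtained from a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(3)

D, L = 5, 3
A = rng.standard_normal((D, L))
Psi = np.diag(rng.uniform(0.1, 1.0, D))

Q, _ = np.linalg.qr(rng.standard_normal((L, L)))  # random orthonormal R
A_rot = A @ Q                                     # rotated loadings

# The marginal covariance, hence the model, is unchanged
assert np.allclose(A @ A.T + Psi, A_rot @ A_rot.T + Psi)
```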

  19. Probabilistic PCA & FA
       Data is a linear function of low-dimensional latent coordinates, plus Gaussian noise.
      $$P(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z}\mid\boldsymbol{0},\boldsymbol{I}), \qquad P(\boldsymbol{x}\mid\boldsymbol{z}) = \mathcal{N}(\boldsymbol{x}\mid\boldsymbol{A}\boldsymbol{z}+\boldsymbol{\mu},\boldsymbol{\Psi}), \qquad P(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x}\mid\boldsymbol{\mu},\boldsymbol{A}\boldsymbol{A}^T+\boldsymbol{\Psi})$$
       Factor analysis: $\boldsymbol{\Psi}$ is a general diagonal matrix.
       Probabilistic PCA: $\boldsymbol{\Psi} = \sigma^2\boldsymbol{I}$.
      [Figure (Bishop): the PPCA generative model mapping latent coordinates into data space]

  20. PPCA
      The posterior mean is not an orthogonal projection, since it is shrunk somewhat towards the prior mean. [Murphy]
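      This shrinkage is easy to see numerically. Assuming the PPCA posterior mean $E[\boldsymbol{z}\mid\boldsymbol{x}] = (\boldsymbol{I} + \boldsymbol{A}^T\boldsymbol{A}/\sigma^2)^{-1}\boldsymbol{A}^T(\boldsymbol{x}-\boldsymbol{\mu})/\sigma^2$ (the FA posterior of slide 12 with $\boldsymbol{\Psi} = \sigma^2\boldsymbol{I}$), its norm decays toward the prior mean $\boldsymbol{0}$ as the noise variance grows:

```python
import numpy as np

rng = np.random.default_rng(4)

D, L = 5, 2
A = rng.standard_normal((D, L))
x = rng.standard_normal(D)          # mu taken to be 0 here

for s2 in [0.01, 0.1, 1.0, 10.0]:
    # E[z|x] = (I + A^T A / s2)^{-1} A^T x / s2
    Ez = np.linalg.solve(np.eye(L) + A.T @ A / s2, A.T @ x / s2)
    print(s2, np.linalg.norm(Ez))   # norm shrinks as s2 increases
```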

  21. Exact inference for Gaussian networks

  22. Multivariate Gaussian distribution
      $$P(\boldsymbol{X}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}\exp\left\{-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\}$$
      $$P(\boldsymbol{X}) \propto \exp\left\{-\frac{1}{2}\boldsymbol{x}^T\boldsymbol{J}\boldsymbol{x} + (\boldsymbol{J}\boldsymbol{\mu})^T\boldsymbol{x}\right\}, \qquad \boldsymbol{J} = \boldsymbol{\Sigma}^{-1}$$
      $P$ is normalizable (i.e., the normalization constant is finite) and defines a legal Gaussian distribution if and only if $\boldsymbol{J}$ is positive definite.
       Directed model: linear-Gaussian model
       Undirected model: Gaussian MRF
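      One practical way to test the positive-definiteness condition is a Cholesky factorization, which succeeds exactly when $\boldsymbol{J} \succ 0$; a tiny sketch with an invented helper name:

```python
import numpy as np

def is_valid_gaussian(J):
    """True iff the symmetric information matrix J is positive definite,
    i.e. exp{-x^T J x / 2 + h^T x} is normalizable."""
    try:
        np.linalg.cholesky(J)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_valid_gaussian(np.array([[2.0, 0.5], [0.5, 1.0]])))  # True
print(is_valid_gaussian(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False: indefinite
```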

  23. Linear-Gaussian model
       Linear-Gaussian model for CPDs:
      $$P(X_i \mid \mathrm{Pa}(X_i)) = \mathcal{N}\!\left(X_i \,\middle|\, \sum_{X_j \in \mathrm{Pa}(X_i)} w_{ij}X_j + b_i,\ v_i\right)$$
       The joint distribution is Gaussian:
      $$\ln P(X_1,\ldots,X_D) = -\sum_{i=1}^{D}\frac{1}{2v_i}\left(X_i - \sum_{X_j\in\mathrm{Pa}(X_i)} w_{ij}X_j - b_i\right)^2 + C$$
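       For a network in topological order, the CPDs stack as $\boldsymbol{X} = \boldsymbol{W}\boldsymbol{X} + \boldsymbol{b} + \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, \mathrm{diag}(\boldsymbol{v}))$, so $\boldsymbol{\mu} = (\boldsymbol{I}-\boldsymbol{W})^{-1}\boldsymbol{b}$ and $\boldsymbol{\Sigma} = (\boldsymbol{I}-\boldsymbol{W})^{-1}\mathrm{diag}(\boldsymbol{v})(\boldsymbol{I}-\boldsymbol{W})^{-T}$; this closed form follows from the model but is not shown on the slide. A sketch on a made-up three-node chain $X_1 \rightarrow X_2 \rightarrow X_3$:

```python
import numpy as np

# Row i of W holds the weights w_ij on the parents of X_i; strictly lower
# triangular because the nodes are in topological order.
W = np.array([[0.0,  0.0, 0.0],
              [2.0,  0.0, 0.0],   # X2 = 2 X1 + b2 + noise
              [0.0, -1.0, 0.0]])  # X3 =  -X2 + b3 + noise
b = np.array([1.0, 0.0, 0.5])
v = np.array([1.0, 0.5, 0.2])     # CPD variances v_i

# X = W X + b + eps  =>  X = (I - W)^{-1} (b + eps)
Ainv = np.linalg.inv(np.eye(3) - W)
mu = Ainv @ b
Sigma = Ainv @ np.diag(v) @ Ainv.T
print(mu, Sigma, sep="\n")
```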
