Factor analysis & Exact inference for Gaussian networks

SLIDE 1

Factor analysis & Exact inference for Gaussian networks

Probabilistic Graphical Models, Sharif University of Technology, Spring 2017, Soleymani

SLIDE 2

Multivariate Gaussian distribution

$$\mathcal{N}(\boldsymbol{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\}$$

 The natural, canonical, or information parameterization of a Gaussian distribution arises from the quadratic form:

$$\mathcal{N}(\boldsymbol{x}\mid\boldsymbol{h},\boldsymbol{J}) \propto \exp\left\{-\frac{1}{2}\boldsymbol{x}^T\boldsymbol{J}\boldsymbol{x} + \boldsymbol{h}^T\boldsymbol{x}\right\}, \qquad \boldsymbol{\Lambda} = \boldsymbol{J} = \boldsymbol{\Sigma}^{-1}, \quad \boldsymbol{h} = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
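Since the two parameterizations carry the same information, converting between them is a one-line exercise. A minimal sketch in Python with NumPy (the example values are invented for illustration):

```python
import numpy as np

# Hypothetical moment parameters (mu, Sigma) for a 2-D Gaussian.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Canonical (information) parameters: J = Sigma^{-1}, h = Sigma^{-1} mu.
J = np.linalg.inv(Sigma)
h = J @ mu

# Converting back recovers the moment parameterization.
assert np.allclose(np.linalg.inv(J), Sigma)
assert np.allclose(np.linalg.solve(J, h), mu)
```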

SLIDE 3

Joint Gaussian distribution: block elements

 If we partition the vector $\boldsymbol{x}$ into $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$:

$$\boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}, \qquad p(\boldsymbol{x}_1, \boldsymbol{x}_2) = \mathcal{N}\left(\begin{bmatrix}\boldsymbol{x}_1\\ \boldsymbol{x}_2\end{bmatrix} \,\middle|\, \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}\right)$$

$\boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}^T$; $\boldsymbol{\Sigma}_{11}$ and $\boldsymbol{\Sigma}_{22}$ are symmetric.

SLIDE 4

Marginal and conditional of Gaussian

For a multivariate Gaussian distribution, all marginal and conditional distributions are also Gaussian. [Figure: contours of the joint $p(x_1, x_2)$ together with the marginal $p(x_1)$ and the conditional $p(x_1 \mid x_2 = 0.7)$; from Bishop]

SLIDE 5

Matrix inverse lemma

$$\begin{bmatrix}A & B\\ C & D\end{bmatrix}^{-1} = \begin{bmatrix}M & -MBD^{-1}\\ -D^{-1}CM & \;D^{-1} + D^{-1}CMBD^{-1}\end{bmatrix}, \qquad M = \left(A - BD^{-1}C\right)^{-1}$$

SLIDE 6

Precision matrix

 In many situations, it is convenient to work with $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$, known as the precision matrix:

$$\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1} = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}^{-1}, \qquad \boldsymbol{\Lambda} = \begin{bmatrix}\boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{12}\\ \boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22}\end{bmatrix}$$

Relation between the inverse of a partitioned matrix and the inverses of its partitions (using the matrix inverse lemma):

$$\boldsymbol{\Lambda}_{11} = \left(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\right)^{-1}, \qquad \boldsymbol{\Lambda}_{12} = -\left(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\right)^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$$

$\boldsymbol{\Lambda}_{21} = \boldsymbol{\Lambda}_{12}^T$; $\boldsymbol{\Lambda}_{11}$ and $\boldsymbol{\Lambda}_{22}$ are symmetric.

SLIDE 7

Marginal and conditional distributions based on block elements of $\boldsymbol{\Lambda}$

 Conditional:

$$p(\boldsymbol{x}_1 \mid \boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2}), \qquad \boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 - \boldsymbol{\Lambda}_{11}^{-1}\boldsymbol{\Lambda}_{12}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2), \qquad \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Lambda}_{11}^{-1}$$

 Marginal:

$$p(\boldsymbol{x}_1) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1), \qquad \boldsymbol{\Sigma}_1 = \left(\boldsymbol{\Lambda}_{11} - \boldsymbol{\Lambda}_{12}\boldsymbol{\Lambda}_{22}^{-1}\boldsymbol{\Lambda}_{21}\right)^{-1}$$

SLIDE 8

Marginal and conditional distributions based on block elements of $\boldsymbol{\Sigma}$

 Conditional distributions:

$$p(\boldsymbol{x}_1 \mid \boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2}), \qquad \boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2), \qquad \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$$

 Marginal distributions based on block elements of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$p(\boldsymbol{x}_1) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}), \qquad p(\boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_2 \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$$

SLIDE 9

Factor analysis

 Gaussian latent variable $\boldsymbol{z}$ ($L$-dimensional)
 Continuous latent variable; can be used for dimensionality reduction
 Observed variable $\boldsymbol{x}$ ($D$-dimensional, $L < D$)

$$p(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{0}, \boldsymbol{I}), \qquad p(\boldsymbol{x} \mid \boldsymbol{z}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z}, \boldsymbol{\Psi})$$

[Graph: $\boldsymbol{z} \rightarrow \boldsymbol{x}$]

$\boldsymbol{A}$: factor loading matrix ($D \times L$); $\boldsymbol{\Psi}$: diagonal covariance matrix; $\boldsymbol{z} \in \mathbb{R}^L$, $\boldsymbol{x} \in \mathbb{R}^D$; parameters $\boldsymbol{\mu}, \boldsymbol{A}, \boldsymbol{\Psi}$.
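To make the generative story concrete, the sketch below (NumPy; all dimensions and parameter values are invented) samples from the model and checks that the sample covariance approaches $\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}$, the marginal covariance derived on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters (illustrative values only).
L, D, N = 2, 5, 10000
A   = rng.standard_normal((D, L))        # factor loading matrix (D x L)
mu  = rng.standard_normal(D)             # mean of the observed variable
Psi = np.diag(rng.uniform(0.1, 0.5, D))  # diagonal noise covariance

# Generative process: z ~ N(0, I), then x | z ~ N(mu + A z, Psi).
Z = rng.standard_normal((N, L))
X = mu + Z @ A.T + rng.multivariate_normal(np.zeros(D), Psi, size=N)

# Empirical covariance should approach A A^T + Psi.
print(np.round(np.cov(X.T) - (A @ A.T + Psi), 2))
```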

SLIDE 10

Marginal distribution

$$\boldsymbol{x} = \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}, \qquad \boldsymbol{w} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Psi}), \qquad \boldsymbol{w} \text{ independent of } \boldsymbol{z}$$

The product of Gaussian distributions is Gaussian, as is the marginal of a Gaussian; thus $p(\boldsymbol{x}) = \int p(\boldsymbol{z})\,p(\boldsymbol{x} \mid \boldsymbol{z})\,d\boldsymbol{z}$ is Gaussian.

$$\boldsymbol{\mu}_{\boldsymbol{x}} = E[\boldsymbol{x}] = E[\boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}] = \boldsymbol{\mu} + \boldsymbol{A}E[\boldsymbol{z}] = \boldsymbol{\mu}$$
$$\boldsymbol{\Sigma}_{\boldsymbol{x}\boldsymbol{x}} = E\left[(\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^T\right] = E\left[(\boldsymbol{A}\boldsymbol{z} + \boldsymbol{w})(\boldsymbol{A}\boldsymbol{z} + \boldsymbol{w})^T\right] = \boldsymbol{A}E[\boldsymbol{z}\boldsymbol{z}^T]\boldsymbol{A}^T + \boldsymbol{\Psi} = \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}$$
$$\Rightarrow\; p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi})$$

SLIDE 11

Joint Gaussian distribution

$$\boldsymbol{\Sigma}_{\boldsymbol{z}\boldsymbol{x}} = \mathrm{Cov}(\boldsymbol{z}, \boldsymbol{x}) = E\left[\boldsymbol{z}(\boldsymbol{A}\boldsymbol{z} + \boldsymbol{w})^T\right] = \boldsymbol{A}^T, \qquad \boldsymbol{\Sigma}_{\boldsymbol{x}\boldsymbol{x}} = \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}, \qquad \boldsymbol{\Sigma}_{\boldsymbol{z}\boldsymbol{z}} = \boldsymbol{I}$$

$$E\begin{bmatrix}\boldsymbol{z}\\ \boldsymbol{x}\end{bmatrix} = \begin{bmatrix}\boldsymbol{0}\\ \boldsymbol{\mu}\end{bmatrix} \;\Rightarrow\; p\left(\begin{bmatrix}\boldsymbol{z}\\ \boldsymbol{x}\end{bmatrix}\right) = \mathcal{N}\left(\begin{bmatrix}\boldsymbol{z}\\ \boldsymbol{x}\end{bmatrix} \,\middle|\, \begin{bmatrix}\boldsymbol{0}\\ \boldsymbol{\mu}\end{bmatrix}, \begin{bmatrix}\boldsymbol{I} & \boldsymbol{A}^T\\ \boldsymbol{A} & \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\end{bmatrix}\right)$$

SLIDE 12

Conditional distributions

Applying the conditioning formulas of slide 8 to the joint distribution of slide 11:

$$p(\boldsymbol{z} \mid \boldsymbol{x}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}}, \boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}}), \qquad \boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}} = \boldsymbol{A}^T\left(\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\right)^{-1}(\boldsymbol{x} - \boldsymbol{\mu}), \qquad \boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}} = \boldsymbol{I} - \boldsymbol{A}^T\left(\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\right)^{-1}\boldsymbol{A}$$

 This requires inverting a $D \times D$ matrix. Since $L < D$, it is preferable to use:

$$\boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}} = \left(\boldsymbol{I} + \boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\boldsymbol{A}\right)^{-1}, \qquad \boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}} = \left(\boldsymbol{I} + \boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\boldsymbol{A}\right)^{-1}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}(\boldsymbol{x} - \boldsymbol{\mu}) = \boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}(\boldsymbol{x} - \boldsymbol{\mu})$$

which follows from the matrix inversion identity $(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}$.

The posterior covariance does not depend on the observed data $\boldsymbol{x}$, and computing the posterior mean is a linear operation in $\boldsymbol{x}$.
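The equivalence of the two routes is easy to confirm numerically. A sketch (NumPy, with arbitrary invented parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
L, D = 2, 6
A   = rng.standard_normal((D, L))
mu  = rng.standard_normal(D)
Psi = np.diag(rng.uniform(0.2, 1.0, D))
x   = rng.standard_normal(D)

# Route 1: invert the D x D matrix A A^T + Psi.
G = np.linalg.inv(A @ A.T + Psi)
m1 = A.T @ G @ (x - mu)
S1 = np.eye(L) - A.T @ G @ A

# Route 2: invert only an L x L matrix (cheaper when L < D).
S2 = np.linalg.inv(np.eye(L) + A.T @ np.linalg.inv(Psi) @ A)
m2 = S2 @ A.T @ np.linalg.inv(Psi) @ (x - mu)

assert np.allclose(m1, m2) and np.allclose(S1, S2)
```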

SLIDE 13

Geometric illustration

To generate data, first generate a point within the (latent) manifold, then add noise. [Figure: prior $p(\boldsymbol{z})$ and posterior $p(\boldsymbol{z} \mid \boldsymbol{x})$ illustrated on the latent manifold in $(x_1, x_2, x_3)$ space; from Jordan]

SLIDE 14

FA example

 Data is a linear function of low-dimensional latent coordinates, plus Gaussian noise:

$$p(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{0}, \boldsymbol{I}), \qquad p(\boldsymbol{x} \mid \boldsymbol{z}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{A}\boldsymbol{z} + \boldsymbol{\mu}, \boldsymbol{\Psi}), \qquad p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi})$$

[Figure: from Bishop]

SLIDE 15

Factor analysis: dimensionality reduction

 FA is just a constrained Gaussian model
 If $\boldsymbol{\Psi}$ were not diagonal, then we could model any Gaussian
 FA is a low-rank parameterization of a multivariate Gaussian
 Since $p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi})$, FA approximates the covariance matrix of the visible vector using the low-rank decomposition $\boldsymbol{A}\boldsymbol{A}^T$ and the diagonal matrix $\boldsymbol{\Psi}$
 $\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}$ is the outer product of a low-rank matrix plus a diagonal matrix (i.e., $O(LD)$ parameters instead of $O(D^2)$)
 Given $\{\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(N)}\}$ (observations of high-dimensional data), by learning from incomplete data we find $\boldsymbol{A}$ for transforming data to a lower-dimensional space

SLIDE 16

Incomplete likelihood

$$\ell(\boldsymbol{\theta}; \mathcal{D}) = -\frac{N}{2}\log\left|\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\right| - \frac{1}{2}\sum_{n=1}^{N}\left(\boldsymbol{x}^{(n)} - \boldsymbol{\mu}\right)^T\left(\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\right)^{-1}\left(\boldsymbol{x}^{(n)} - \boldsymbol{\mu}\right)$$

$$= -\frac{N}{2}\log\left|\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\right| - \frac{1}{2}\mathrm{tr}\left[\left(\boldsymbol{A}\boldsymbol{A}^T + \boldsymbol{\Psi}\right)^{-1}\boldsymbol{S}\right], \qquad \boldsymbol{S} = \sum_{n=1}^{N}\left(\boldsymbol{x}^{(n)} - \boldsymbol{\mu}\right)\left(\boldsymbol{x}^{(n)} - \boldsymbol{\mu}\right)^T$$

$$\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N}\boldsymbol{x}^{(n)}$$

SLIDE 17

E-step: expected sufficient statistics

$$E_{p(\mathcal{H}\mid\mathcal{D},\boldsymbol{\theta}^t)}\left[\log p(\mathcal{D}, \mathcal{H} \mid \boldsymbol{\theta})\right] = \sum_{n=1}^{N} E_{p(\boldsymbol{z}^{(n)}\mid\boldsymbol{x}^{(n)},\boldsymbol{\theta}^t)}\left[\log p(\boldsymbol{z}^{(n)} \mid \boldsymbol{\theta}) + \log p(\boldsymbol{x}^{(n)} \mid \boldsymbol{z}^{(n)}, \boldsymbol{\theta})\right]$$

 Expected sufficient statistics:

$$E\left[\log p(\mathcal{D}, \mathcal{H} \mid \boldsymbol{\theta})\right] = -\frac{N}{2}\log|\boldsymbol{\Psi}| - \frac{1}{2}\sum_{n=1}^{N}\mathrm{tr}\left(E\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right]\right) - \frac{1}{2}\sum_{n=1}^{N}\mathrm{tr}\left(E\left[\left(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)}\right)\left(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)}\right)^T\right]\boldsymbol{\Psi}^{-1}\right) + c$$

$$E\left[\left(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)}\right)\left(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)}\right)^T\right] = \boldsymbol{x}^{(n)}\boldsymbol{x}^{(n)T} - \boldsymbol{A}E\left[\boldsymbol{z}^{(n)}\right]\boldsymbol{x}^{(n)T} - \boldsymbol{x}^{(n)}E\left[\boldsymbol{z}^{(n)}\right]^T\boldsymbol{A}^T + \boldsymbol{A}E\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right]\boldsymbol{A}^T$$

$$E_{p(\boldsymbol{z}^{(n)}\mid\boldsymbol{x}^{(n)},\boldsymbol{\theta}^t)}\left[\boldsymbol{z}^{(n)}\right] = \boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}^{(n)}} = \boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\left(\boldsymbol{x}^{(n)} - \boldsymbol{\mu}\right)$$
$$E_{p(\boldsymbol{z}^{(n)}\mid\boldsymbol{x}^{(n)},\boldsymbol{\theta}^t)}\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right] = \boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}} + \boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}^{(n)}}\boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}^{(n)}}^T$$

where $\boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}} = \left(\boldsymbol{I} + \boldsymbol{A}^T\boldsymbol{\Psi}^{-1}\boldsymbol{A}\right)^{-1}$ and $\boldsymbol{\mu}_{\boldsymbol{z}|\boldsymbol{x}} = \boldsymbol{\Sigma}_{\boldsymbol{z}|\boldsymbol{x}}\boldsymbol{A}^T\boldsymbol{\Psi}^{-1}(\boldsymbol{x} - \boldsymbol{\mu})$.

SLIDE 18

M-Step

$$\boldsymbol{A}^{t+1} = \left(\sum_{n=1}^{N}\boldsymbol{x}^{(n)}E\left[\boldsymbol{z}^{(n)}\right]^T\right)\left(\sum_{n=1}^{N}E\left[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\right]\right)^{-1}$$

$$\boldsymbol{\Psi}^{t+1} = \frac{1}{N}\,\mathrm{diag}\left(\sum_{n=1}^{N}E\left[\left(\boldsymbol{x}^{(n)} - \boldsymbol{A}^{t+1}\boldsymbol{z}^{(n)}\right)\left(\boldsymbol{x}^{(n)} - \boldsymbol{A}^{t+1}\boldsymbol{z}^{(n)}\right)^T\right]\right) = \frac{1}{N}\,\mathrm{diag}\left(\sum_{n=1}^{N}\boldsymbol{x}^{(n)}\boldsymbol{x}^{(n)T} - \boldsymbol{A}^{t+1}\sum_{n=1}^{N}E\left[\boldsymbol{z}^{(n)}\right]\boldsymbol{x}^{(n)T}\right)$$

SLIDE 19

Unidentifiability

 $\boldsymbol{A}$ only appears as the product $\boldsymbol{A}\boldsymbol{A}^T$; thus the model is invariant to rotations and axis flips of the latent space.
 $\boldsymbol{A}$ can be replaced with $\boldsymbol{A}\boldsymbol{R}$ for any orthonormal matrix $\boldsymbol{R}$, and the model, which depends on $\boldsymbol{A}$ only through $\boldsymbol{A}\boldsymbol{A}^T$, remains the same.
 Thus, FA is an unidentifiable model.
 The likelihood objective on a data set will not have a unique maximum (an infinite number of parameter settings attain the maximum score), so learning is not guaranteed to identify the same parameters. (See the sketch below.)
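A two-line numerical check of this invariance (NumPy; the random loading matrix and the QR-based orthonormal $\boldsymbol{R}$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))                    # an arbitrary loading matrix
R, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # a random orthonormal matrix

# Replacing A with A R leaves A A^T, and hence the likelihood, unchanged.
assert np.allclose((A @ R) @ (A @ R).T, A @ A.T)
```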

SLIDE 20

Probabilistic PCA (PPCA)

 Factor analysis: $\boldsymbol{\Psi}$ is a general diagonal matrix
 Probabilistic PCA: $\boldsymbol{\Psi} = \sigma^2\boldsymbol{I}$ and $\boldsymbol{A}$ is orthogonal

The posterior mean is not an orthogonal projection, since it is shrunk somewhat towards the prior mean. [Figure: comparison with PCA; from Murphy]

SLIDE 21

Exact inference for Gaussian networks

SLIDE 22

Multivariate Gaussian distribution

$$p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}\exp\left\{-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x} - \boldsymbol{\mu})\right\}, \qquad p(\boldsymbol{x}) \propto \exp\left\{-\frac{1}{2}\boldsymbol{x}^T\boldsymbol{J}\boldsymbol{x} + (\boldsymbol{J}\boldsymbol{\mu})^T\boldsymbol{x}\right\}, \qquad \boldsymbol{J} = \boldsymbol{\Sigma}^{-1}$$

 Directed model: linear Gaussian model
 Undirected model: Gaussian MRF

$p$ is normalizable (i.e., the normalization constant is finite) and defines a legal Gaussian distribution if and only if $\boldsymbol{J}$ is positive definite.

SLIDE 23

Linear-Gaussian model

 Linear-Gaussian model for CPDs:

$$p\left(X_j \mid \mathrm{Pa}(X_j)\right) = \mathcal{N}\left(X_j \,\middle|\, \sum_{X_k \in \mathrm{Pa}(X_j)} w_{jk}X_k + b_j,\; v_j\right)$$

 The joint distribution is Gaussian:

$$\ln p(X_1, \ldots, X_D) = -\sum_{j=1}^{D}\frac{1}{2v_j}\left(X_j - \sum_{X_k \in \mathrm{Pa}(X_j)} w_{jk}X_k - b_j\right)^2 + C$$

SLIDE 24

From linear-Gaussian model to joint multivariate distribution

 We can find the parameters of the multivariate Gaussian from the linear-Gaussian model.
 Mean and covariances (the $X_j$ are in topological order):

$$E[X_j] = \sum_{X_k \in \mathrm{Pa}(X_j)} w_{jk}E[X_k] + b_j$$

$$\mathrm{Cov}(X_j, X_k) = \sum_{X_l \in \mathrm{Pa}(X_k)} w_{kl}\,\mathrm{Cov}(X_j, X_l) + \delta_{jk}\,v_k$$

SLIDE 25

Multivariate Gaussian: directed model example

 Linear Gaussian:

$$p(X_1) = \mathcal{N}(X_1 \mid b_1, v_1), \qquad p(X_2 \mid X_1) = \mathcal{N}(X_2 \mid wX_1 + b_2, v_2)$$

with $b_1 = 2$, $v_1 = 0.5$, $w = 1$, $b_2 = 0.5$, $v_2 = 0.2$. [Figure: network $X_1 \rightarrow X_2$, the CPD $p(X_2 \mid X_1)$, and the joint $p(X_1, X_2)$]

SLIDE 26

Gaussian Bayesian networks

 We define a Gaussian Bayesian network to be a Bayesian network all of whose variables are continuous, and where all of the CPDs are linear Gaussians.
 For Gaussian networks, the joint distribution has a compact representation
 its size is quadratic in the number of variables
 Transformations from the network to the joint and back have a fairly simple and efficiently computable closed form

SLIDE 27

Independencies in multivariate Gaussian

 $\left(\boldsymbol{\Sigma}^{-1}\right)_{jk} = 0 \;\Leftrightarrow\; X_j \perp X_k \mid \mathcal{X} - \{X_j, X_k\}$
 If $X_1, \ldots, X_D$ have a joint normal distribution $p(\mathcal{X}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\left(\boldsymbol{\Sigma}^{-1}\right)_{jk} = 0$ if and only if $p \models X_j \perp X_k \mid \mathcal{X} - \{X_j, X_k\}$

 $\boldsymbol{\Sigma}_{jk} = 0 \;\Leftrightarrow\; X_j \perp X_k$
 If $X_1, \ldots, X_D$ have a joint normal distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\boldsymbol{\Sigma}_{jk} = 0$ if and only if $X_j \perp X_k$

SLIDE 28

Sparsity in covariance matrix

[Figure: two example networks, one over $X_1, \ldots, X_4$ and one over $X_1, X_2, X_3$; in the latter, $\boldsymbol{\Sigma}_{13} = \boldsymbol{\Sigma}_{31} = 0$ while $\left(\boldsymbol{\Sigma}^{-1}\right)_{13} \neq 0$.]

If the parameterization is not degenerate, $\boldsymbol{\Sigma}$ would be dense (i.e., $\forall j,k\;\; \boldsymbol{\Sigma}_{jk} \neq 0$).

SLIDE 29

Multivariate Gaussian: undirected model

 A Gaussian distribution can be represented by a fully connected graph with pairwise (edge) potentials
 Gaussian MRF
 The overall energy has the form

$$E(\boldsymbol{x}) = \frac{1}{2}\sum_{j,k}(x_j - \mu_j)\left(\boldsymbol{\Sigma}^{-1}\right)_{jk}(x_k - \mu_k)$$

which decomposes into pairwise and singleton (log-)potentials:

$$\psi_{jk}(x_j, x_k) = -\left(\boldsymbol{\Sigma}^{-1}\right)_{jk}x_j x_k, \quad j < k, \qquad \psi_j(x_j) = -\frac{1}{2}\left(\boldsymbol{\Sigma}^{-1}\right)_{jj}x_j^2 + x_j\sum_{k}\left(\boldsymbol{\Sigma}^{-1}\right)_{jk}\mu_k$$

SLIDE 30

Multivariate Gaussian: undirected model

 $\left(\boldsymbol{\Sigma}^{-1}\right)_{jk} = 0 \;\Leftrightarrow\; X_j \perp X_k \mid \mathcal{X} - \{X_j, X_k\}$
 If $X_1, \ldots, X_D$ have a joint normal distribution $p(\mathcal{X}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\left(\boldsymbol{\Sigma}^{-1}\right)_{jk} = 0$ if and only if $p \models X_j \perp X_k \mid \mathcal{X} - \{X_j, X_k\}$
 We can view the information matrix as directly defining a minimal I-map Markov network for the distribution
 whereby nonzero entries correspond to edges in the network

SLIDE 31

Sparsity in precision matrix $\boldsymbol{\Sigma}^{-1}$

[Figure: a sparse precision matrix $\boldsymbol{\Sigma}^{-1}$ (nonzero entries marked $*$) and the corresponding Markov network over nodes $A$, $B$, $C$, $D$ whose edges match the nonzero off-diagonal entries.]

SLIDE 32

BP for continuous variables

[Figure: a tree over $x_1, \ldots, x_4$ with messages $m_{32}(x_2)$, $m_{42}(x_2)$, and $m_{21}(x_1)$ being passed toward $x_1$.]

SLIDE 33

Belief propagation: integral-product

[Figure: node $j$ collects messages $m_{lj}(x_j)$ and $m_{mj}(x_j)$ from its neighbors and sends $m_{jk}(x_k)$ to node $k$.]

 Messages:

$$m_{jk}(x_k) = \int_{x_j} \phi(x_j)\,\phi(x_j, x_k)\prod_{l \in \mathcal{N}(j)\setminus k} m_{lj}(x_j)\,dx_j$$

 Marginal probability function:

$$p(x_j) \propto \phi(x_j)\prod_{l \in \mathcal{N}(j)} m_{lj}(x_j)$$

SLIDE 34

BP for continuous variables

 Is there a finitely parameterized, closed form for the message and marginal functions?
 Is there an analytic formula for the message integral, phrased as an update of these parameters?

SLIDE 35

Canonical form properties

 The product of two canonical forms is in canonical form
 The division of two canonical forms is also in canonical form
 The marginalization of a canonical form onto a subset of its variables results in a canonical form (when the block of the precision matrix over the marginalized variables is positive definite)
 Instantiating a subset of variables results in a canonical form
slide-36
SLIDE 36

Canonical forms

[Table: the canonical form operations and their parameter updates]

SLIDE 37

Messages and marginals for Gaussian networks

 We use the canonical form as a finitely parameterized, closed form for the message and marginal functions.
 The message integral is phrased in the canonical form, and its parameters can be found from the parameters involved in the integral.
 The inference algorithm will be correct since we can show that it executes a marginalization step only on canonical forms for which this operation is well defined.

SLIDE 38

Gaussian Markov network: factors

 The graph topology can be specified by the structure of the matrix $\boldsymbol{J}$, i.e. the edge set $\{j, k\}$ includes all non-zero entries of $\boldsymbol{J}$ for which $j > k$:

$$\phi_{jk}(x_j, x_k) = \exp\left(-\frac{1}{2}x_j J_{jk} x_k\right), \qquad \phi_j(x_j) = \exp\left(-\frac{1}{2}J_{jj}x_j^2 + h_{jj}x_j\right)$$

$$\phi_j(x_j) \propto \mathcal{N}\left(\mu_{jj}, J_{jj}^{-1}\right), \qquad h_{jj} = J_{jj}\,\mu_{jj}$$

This is one form of parameterizing the factors (it is not uniquely defined).

SLIDE 39

Gaussian Markov network: messages

 If we assume a Gaussian MRF:
 Messages and marginal functions are all Gaussian
 Updates will be in terms of updating the parameters $J$ and $h$

 $\phi_j(x_j)\displaystyle\prod_{l \in \mathcal{N}(j)\setminus k} m_{lj}(x_j) \propto \mathcal{N}\left(\mu_{j\setminus k}, J_{j\setminus k}^{-1}\right)$

$$J_{j\setminus k} = J_{jj} + \sum_{l \in \mathcal{N}(j)\setminus k} J_{l\to j}, \qquad h_{j\setminus k} = h_{jj} + \sum_{l \in \mathcal{N}(j)\setminus k} h_{l\to j}$$

 $m_{jk}(x_k) = \displaystyle\int \phi_{jk}(x_j, x_k)\,\phi_j(x_j)\prod_{l \in \mathcal{N}(j)\setminus k} m_{lj}(x_j)\,dx_j \propto \mathcal{N}\left(\mu_{j\to k}, J_{j\to k}^{-1}\right)$

$$J_{j\to k} = -J_{jk}\,J_{j\setminus k}^{-1}\,J_{jk}, \qquad h_{j\to k} = -J_{jk}\,J_{j\setminus k}^{-1}\,h_{j\setminus k}$$

SLIDE 40

Messages for the Gaussian networks

 Messages in the canonical form:

$$m_{jk}(x_k) = \exp\left(-\frac{1}{2}J_{j\to k}\,x_k^2 + h_{j\to k}\,x_k\right)$$

$$J_{j\setminus k} = J_{jj} + \sum_{l \in \mathcal{N}(j)\setminus k} J_{l\to j}, \qquad h_{j\setminus k} = h_{jj} + \sum_{l \in \mathcal{N}(j)\setminus k} h_{l\to j}$$

$$J_{j\to k} = -J_{jk}\,J_{j\setminus k}^{-1}\,J_{jk}, \qquad h_{j\to k} = -J_{jk}\,J_{j\setminus k}^{-1}\,h_{j\setminus k}$$

SLIDE 41

Marginal distributions

$$p(x_j) \propto \phi(x_j)\prod_{l \in \mathcal{N}(j)} m_{lj}(x_j) \;\Rightarrow\; p(x_j) \propto \mathcal{N}\left(\mu_j, J_j^{-1}\right)$$

$$J_j = J_{jj} + \sum_{l \in \mathcal{N}(j)} J_{l\to j}, \qquad h_j = h_{jj} + \sum_{l \in \mathcal{N}(j)} h_{l\to j}, \qquad \mu_j = J_j^{-1}\,h_j$$

SLIDE 42

Exact inference for Gaussian networks

 All exact inference algorithms can be adapted to Gaussian networks.
 Only the representation of factors and the implementation of the basic factor operations are different.
 Inference in Gaussian networks is computationally linear in the number of cliques, and at most cubic in the size of the largest clique.
 When the Gaussian has sufficiently low dimension, a naive approach to inference may be sufficient.
 When we have a high-dimensional Gaussian distribution and the network has low tree-width, the message-passing algorithms can provide considerable savings.

SLIDE 43

References

 M. I. Jordan, "An Introduction to Probabilistic Graphical Models", Chapter 14.
 D. Koller and N. Friedman, "Probabilistic Graphical Models: Principles and Techniques", Chapters 7 and 14.2.