Factor analysis & Exact inference for Gaussian networks
Probabilistic Graphical Models
Sharif University of Technology
Spring 2017
Soleymani
Multivariate Gaussian distribution

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right\}$$

- The natural, canonical, or information parameterization of a Gaussian distribution arises from the quadratic form:
$$\mathcal{N}(x \mid \eta, \Lambda) \propto \exp\left\{ -\frac{1}{2} x^T \Lambda x + \eta^T x \right\}$$
$$\Lambda = \Sigma^{-1}, \qquad \eta = \Sigma^{-1} \mu$$
Joint Gaussian distribution: block elements

- If we partition the vector $x$ into $x_1$ and $x_2$:
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$
$$p(x_1, x_2 \mid \mu, \Sigma) = \mathcal{N}\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)$$
- $\Sigma_{21} = \Sigma_{12}^T$; $\Sigma_{11}$ and $\Sigma_{22}$ are symmetric.
Marginal and conditional of Gaussian

[Figure: contours of a joint density $p(y_1, y_2)$ sliced at $y_2 = 0.7$, with the marginal $p(y_1)$ and the conditional $p(y_1 \mid y_2 = 0.7)$. [Bishop]]

For a multivariate Gaussian distribution, all marginal and conditional distributions are also Gaussian.
Matrix inverse lemma

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{bmatrix}, \qquad M = \left( A - BD^{-1}C \right)^{-1}$$
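As a quick numerical sanity check of the block-inversion formula, here is a minimal NumPy sketch (the matrix and the 2+3 partition are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive-definite 5x5 matrix, partitioned into 2+3 blocks.
X = rng.standard_normal((5, 5))
S = X @ X.T + 5 * np.eye(5)
A, B = S[:2, :2], S[:2, 2:]
C, D = S[2:, :2], S[2:, 2:]

# Blockwise inverse via the matrix inverse lemma.
Dinv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ Dinv @ C)  # inverse of the Schur complement of D
S_inv = np.block([[M, -M @ B @ Dinv],
                  [-Dinv @ C @ M, Dinv + Dinv @ C @ M @ B @ Dinv]])

assert np.allclose(S_inv, np.linalg.inv(S))
```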
Precision matrix

- In many situations it is convenient to work with $\Lambda = \Sigma^{-1}$, known as the precision matrix:
$$\Lambda = \Sigma^{-1} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix}$$
- Relation between the inverse of a partitioned matrix and the inverses of its partitions (using the matrix inverse lemma):
$$\Lambda_{11} = \left( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)^{-1}$$
$$\Lambda_{12} = -\left( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)^{-1} \Sigma_{12} \Sigma_{22}^{-1}$$
- $\Lambda_{21} = \Lambda_{12}^T$; $\Lambda_{11}$ and $\Lambda_{22}$ are symmetric.
Marginal and conditional distributions based on block elements of $\Lambda$

- Conditional:
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid \mu_{1|2}, \Sigma_{1|2})$$
$$\mu_{1|2} = \mu_1 - \Lambda_{11}^{-1} \Lambda_{12} (x_2 - \mu_2), \qquad \Sigma_{1|2} = \Lambda_{11}^{-1}$$
- Marginal:
$$p(x_1) = \mathcal{N}(x_1 \mid \mu_1, \Sigma_1), \qquad \Sigma_1 = \left( \Lambda_{11} - \Lambda_{12} \Lambda_{22}^{-1} \Lambda_{21} \right)^{-1}$$
where $\Lambda = \Sigma^{-1}$ is partitioned into blocks $\Lambda_{11}, \Lambda_{12}, \Lambda_{21}, \Lambda_{22}$ as above.
Marginal and conditional distributions based on block elements of $\Sigma$

- Conditional distributions (a code sketch follows):
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid \mu_{1|2}, \Sigma_{1|2})$$
$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$$
- Marginal distributions based on block elements of $\mu$ and $\Sigma$:
$$p(x_1) = \mathcal{N}(x_1 \mid \mu_1, \Sigma_{11}), \qquad p(x_2) = \mathcal{N}(x_2 \mid \mu_2, \Sigma_{22})$$
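These conditioning formulas translate directly into code. A minimal NumPy sketch (the function name and interface are illustrative, not from the slides):

```python
import numpy as np

def gaussian_conditional(mu, Sigma, idx1, idx2, x2):
    """Parameters of p(x1 | x2) for a joint Gaussian N(mu, Sigma).

    idx1, idx2: integer index arrays selecting the two blocks;
    x2: the observed value of the second block."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    S22_inv = np.linalg.inv(S22)
    mu_cond = mu1 + S12 @ S22_inv @ (x2 - mu2)   # mu_1|2
    Sigma_cond = S11 - S12 @ S22_inv @ S12.T     # Sigma_1|2
    return mu_cond, Sigma_cond
```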
Factor analysis

- Gaussian latent variable $z$ ($K$-dimensional):
  - a continuous latent variable
  - can be used for dimensionality reduction
- Observed variable $x$ ($D$-dimensional, with $K < D$):
$$p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid \mu + Wz, \Psi)$$
- $W$: the $D \times K$ factor loading matrix; $\Psi$: a diagonal covariance matrix
- $z \in \mathbb{R}^K$, $x \in \mathbb{R}^D$; parameters: $\mu$, $W$, $\Psi$

[Figure: the graphical model $z \to x$.]
Marginal distribution

$$x = \mu + Wz + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \Psi)$$

The product of Gaussian distributions is Gaussian, as is the marginal of a Gaussian, thus $p(x) = \int p(z) \, p(x \mid z) \, dz$ is Gaussian:

$$\mu_x = E[x] = E[\mu + Wz + \epsilon] = \mu + W E[z] = \mu$$
$$\Sigma_x = E\left[ (x - \mu)(x - \mu)^T \right] = E\left[ (Wz + \epsilon)(Wz + \epsilon)^T \right] = W E[zz^T] W^T + \Psi = WW^T + \Psi$$
$$\Rightarrow \quad p(x) = \mathcal{N}(x \mid \mu, WW^T + \Psi)$$

($\epsilon$ is independent of $z$, with $p(z) = \mathcal{N}(z \mid 0, I)$ and $p(x \mid z) = \mathcal{N}(x \mid \mu + Wz, \Psi)$.)
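The marginal covariance $WW^T + \Psi$ can be checked by simulation. A minimal sketch of the generative process (all sizes and parameter values are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 5, 2, 200_000                      # illustrative sizes

mu = rng.standard_normal(D)
W = rng.standard_normal((D, K))              # factor loadings
psi = rng.uniform(0.1, 0.5, size=D)          # diagonal of Psi

# Generative process: x = mu + W z + eps, z ~ N(0, I), eps ~ N(0, Psi).
z = rng.standard_normal((N, K))
eps = rng.standard_normal((N, D)) * np.sqrt(psi)
x = mu + z @ W.T + eps

# The empirical covariance approaches W W^T + Psi as N grows.
emp_cov = np.cov(x, rowvar=False)
print(np.max(np.abs(emp_cov - (W @ W.T + np.diag(psi)))))
```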
Joint Gaussian distribution

$$\Sigma_{xz} = \mathrm{Cov}(x, z) = E\left[ (x - \mu) z^T \right] = E\left[ (Wz + \epsilon) z^T \right] = W$$
$$\Sigma_{xx} = WW^T + \Psi, \qquad \Sigma_{zz} = I$$
$$\Rightarrow \quad p\left( \begin{bmatrix} z \\ x \end{bmatrix} \right) = \mathcal{N}\left( \begin{bmatrix} z \\ x \end{bmatrix} \,\middle|\, \begin{bmatrix} 0 \\ \mu \end{bmatrix}, \begin{bmatrix} I & W^T \\ W & WW^T + \Psi \end{bmatrix} \right)$$
Conditional distributions

$$p(z \mid x) = \mathcal{N}(z \mid \mu_{z|x}, \Sigma_{z|x})$$
$$\mu_{z|x} = W^T \left( WW^T + \Psi \right)^{-1} (x - \mu)$$
$$\Sigma_{z|x} = I - W^T \left( WW^T + \Psi \right)^{-1} W$$
- As written, a $D \times D$ matrix must be inverted. Since $K < D$, it is preferable to use:
$$\mu_{z|x} = \left( I + W^T \Psi^{-1} W \right)^{-1} W^T \Psi^{-1} (x - \mu) = \Sigma_{z|x} W^T \Psi^{-1} (x - \mu)$$
$$\Sigma_{z|x} = \left( I + W^T \Psi^{-1} W \right)^{-1}$$
which follows from the matrix inversion lemma $\left( A - BD^{-1}C \right)^{-1} = A^{-1} + A^{-1}B\left( D - CA^{-1}B \right)^{-1}CA^{-1}$.
- Note that the posterior covariance does not depend on the observed data $x$, and computing the posterior mean is a linear operation in $x$.
- These are instances of the general formulas $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, applied to the joint covariance $\Sigma = \begin{bmatrix} I & W^T \\ W & WW^T + \Psi \end{bmatrix}$.
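A minimal NumPy sketch of the posterior computation using the cheaper $K \times K$ inversion (function name and interface are illustrative):

```python
import numpy as np

def fa_posterior(x, mu, W, psi):
    """Posterior p(z|x) = N(mu_z, Sigma_z) for factor analysis.

    psi: diagonal of Psi; only a K x K matrix is inverted."""
    K = W.shape[1]
    WtPinv = W.T / psi                               # W^T Psi^{-1} (Psi diagonal)
    Sigma_z = np.linalg.inv(np.eye(K) + WtPinv @ W)  # independent of x
    mu_z = Sigma_z @ (WtPinv @ (x - mu))             # linear in x
    return mu_z, Sigma_z
```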
Geometric illustration

[Figure: a low-dimensional manifold embedded in $(y_1, y_2, y_3)$ space, with $p(z)$ and $p(y \mid z)$. To generate data, first generate a point within the manifold, then add noise. [Jordan]]
FA example

- Data is a linear function of the low-dimensional latent coordinates, plus Gaussian noise:
$$p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid Wz + \mu, \Psi), \qquad p(x) = \mathcal{N}(x \mid \mu, WW^T + \Psi)$$

[Figure: samples of the latent $z$, the induced distribution in data space, and the resulting marginal $p(x)$. [Bishop]]
Factor analysis: dimensionality reduction

- FA is just a constrained Gaussian model:
  - if $\Psi$ were not diagonal, then we could model any Gaussian.
- FA is a low-rank parameterization of a multivariate Gaussian:
  - since $p(x) = \mathcal{N}(x \mid \mu, WW^T + \Psi)$, FA approximates the covariance matrix of the visible vector using the low-rank decomposition $WW^T$ and the diagonal matrix $\Psi$;
  - $WW^T + \Psi$ is the outer product of two low-rank matrices plus a diagonal matrix (i.e., $O(KD)$ parameters instead of $O(D^2)$).
- Given $\{x^{(1)}, \ldots, x^{(N)}\}$ (observations of the high-dimensional data), by learning from incomplete data we find a $W$ for transforming the data to a lower-dimensional space.
Incomplete likelihood

$$\ell(\theta; \mathcal{D}) = -\frac{N}{2} \log\left| WW^T + \Psi \right| - \frac{1}{2} \sum_{n=1}^{N} \left( x^{(n)} - \mu \right)^T \left( WW^T + \Psi \right)^{-1} \left( x^{(n)} - \mu \right)$$
$$= -\frac{N}{2} \log\left| WW^T + \Psi \right| - \frac{1}{2} \mathrm{tr}\left[ \left( WW^T + \Psi \right)^{-1} S \right], \qquad S = \sum_{n=1}^{N} \left( x^{(n)} - \mu \right)\left( x^{(n)} - \mu \right)^T$$
$$\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}$$
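A minimal sketch of this log-likelihood (up to the additive constant $-\frac{ND}{2}\log 2\pi$, which the formula above also omits; names are illustrative):

```python
import numpy as np

def fa_loglik(X, mu, W, psi):
    """Incomplete-data log-likelihood of FA, up to an additive constant.

    X: (N, D) data matrix; psi: diagonal of Psi."""
    N = X.shape[0]
    C = W @ W.T + np.diag(psi)               # model covariance W W^T + Psi
    _, logdet = np.linalg.slogdet(C)
    Xc = X - mu
    S = Xc.T @ Xc                            # scatter matrix
    return -0.5 * N * logdet - 0.5 * np.trace(np.linalg.solve(C, S))
```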
E-step: expected sufficient statistics

$$E_{p(h \mid x, \theta^t)}\left[ \log p(x, h \mid \theta) \right] = \sum_{n=1}^{N} E_{p(z^{(n)} \mid x^{(n)}, \theta^t)}\left[ \log p(z^{(n)} \mid \theta) + \log p(x^{(n)} \mid z^{(n)}, \theta) \right]$$

- Expected sufficient statistics:
$$E\left[ \log p(x, h \mid \theta) \right] = -\frac{N}{2} \log|\Psi| - \frac{1}{2} \sum_{n=1}^{N} \mathrm{tr}\left( E\left[ z^{(n)} z^{(n)T} \right] \right) - \frac{1}{2} \sum_{n=1}^{N} \mathrm{tr}\left( E\left[ \left( x^{(n)} - Wz^{(n)} \right)\left( x^{(n)} - Wz^{(n)} \right)^T \right] \Psi^{-1} \right) + c$$
$$E\left[ \left( x^{(n)} - Wz^{(n)} \right)\left( x^{(n)} - Wz^{(n)} \right)^T \right] = x^{(n)} x^{(n)T} - W E\left[ z^{(n)} \right] x^{(n)T} - x^{(n)} E\left[ z^{(n)} \right]^T W^T + W E\left[ z^{(n)} z^{(n)T} \right] W^T$$
$$E_{p(z^{(n)} \mid x^{(n)}, \theta^t)}\left[ z^{(n)} \right] = \mu_{z|x^{(n)}} = \Sigma_{z|x} W^T \Psi^{-1} \left( x^{(n)} - \mu \right)$$
$$E_{p(z^{(n)} \mid x^{(n)}, \theta^t)}\left[ z^{(n)} z^{(n)T} \right] = \Sigma_{z|x} + \mu_{z|x^{(n)}} \mu_{z|x^{(n)}}^T$$
where $\Sigma_{z|x} = \left( I + W^T \Psi^{-1} W \right)^{-1}$ and $\mu_{z|x} = \Sigma_{z|x} W^T \Psi^{-1} (x - \mu)$.
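A minimal sketch of the E-step, computing the two expected sufficient statistics for all data points at once (names are illustrative; X is assumed centered, so $\mu = 0$):

```python
import numpy as np

def fa_e_step(X, W, psi):
    """E-step: E[z^(n)] and E[z^(n) z^(n)^T] under p(z | x, theta^t).

    X: (N, D) centered data; psi: diagonal of Psi.
    Returns Ez of shape (N, K) and Ezz of shape (N, K, K)."""
    K = W.shape[1]
    WtPinv = W.T / psi                               # W^T Psi^{-1}
    Sigma_z = np.linalg.inv(np.eye(K) + WtPinv @ W)  # shared posterior covariance
    Ez = X @ (Sigma_z @ WtPinv).T                    # row n is mu_{z|x^(n)}
    Ezz = Sigma_z[None, :, :] + Ez[:, :, None] * Ez[:, None, :]
    return Ez, Ezz
```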
M-Step

$$W^{t+1} = \left( \sum_{n=1}^{N} x^{(n)} E\left[ z^{(n)} \right]^T \right) \left( \sum_{n=1}^{N} E\left[ z^{(n)} z^{(n)T} \right] \right)^{-1}$$
$$\Psi^{t+1} = \frac{1}{N} \mathrm{diag}\left\{ \sum_{n=1}^{N} E\left[ \left( x^{(n)} - W^{t+1} z^{(n)} \right)\left( x^{(n)} - W^{t+1} z^{(n)} \right)^T \right] \right\} = \frac{1}{N} \mathrm{diag}\left\{ \sum_{n=1}^{N} x^{(n)} x^{(n)T} - W^{t+1} \sum_{n=1}^{N} E\left[ z^{(n)} \right] x^{(n)T} \right\}$$

(here $x^{(n)}$ denotes the centered observation, i.e., $\mu_{ML}$ has been subtracted).
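A minimal sketch of the M-step, together with a small EM loop combining it with the `fa_e_step` sketch above (all names are illustrative):

```python
import numpy as np

def fa_m_step(X, Ez, Ezz):
    """M-step: update W and the diagonal of Psi from expected statistics.

    X: (N, D) centered data."""
    N = X.shape[0]
    XtEz = X.T @ Ez                                  # sum_n x^(n) E[z^(n)]^T
    W_new = XtEz @ np.linalg.inv(Ezz.sum(axis=0))
    # diag(sum_n x x^T) and diag(W sum_n E[z] x^T), taken elementwise:
    psi_new = ((X * X).sum(axis=0) - (W_new * XtEz).sum(axis=1)) / N
    return W_new, psi_new

# A few EM iterations on centered data X (W, psi initialized arbitrarily):
# for _ in range(100):
#     Ez, Ezz = fa_e_step(X, W, psi)
#     W, psi = fa_m_step(X, Ez, Ezz)
```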
Unidentifiability

- $W$ only appears as the outer product $WW^T$, thus the model is invariant to rotations and axis flips of the latent space.
- $W$ can be replaced with $WR$ for any orthonormal matrix $R$, and the model, which depends on $W$ only through $WW^T$, remains the same.
- Thus, FA is an unidentifiable model:
  - the likelihood objective function on a data set will not have a unique maximum (an infinite number of parameter settings attain the maximum score);
  - learning is therefore not guaranteed to recover the same parameters each time.
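A one-line numerical check of this invariance (the sizes and the QR-based construction of a random orthonormal $R$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))                  # D = 5, K = 2

# A random orthonormal K x K matrix via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((2, 2)))

# W and WR induce exactly the same marginal covariance W W^T + Psi,
# since (WR)(WR)^T = W R R^T W^T = W W^T.
assert np.allclose(W @ W.T, (W @ R) @ (W @ R).T)
```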
Probabilistic PCA (PPCA)

- Factor analysis: $\Psi$ is a general diagonal matrix.
- Probabilistic PCA: $\Psi = \sigma^2 I$ and $W$ is orthogonal.
- The posterior mean is not an orthogonal projection, since it is shrunk somewhat towards the prior mean. [Murphy]

[Figure: comparison with PCA. [Murphy]]
Exact inference for Gaussian networks
Multivariate Gaussian distribution

$$p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$
$$p(x) \propto \exp\left\{ -\frac{1}{2} x^T \Lambda x + (\Lambda\mu)^T x \right\}, \qquad \Lambda = \Sigma^{-1}$$

- Directed model: linear-Gaussian model
- Undirected model: Gaussian MRF
- $p$ is normalizable (i.e., the normalization constant is finite) and defines a legal Gaussian distribution if and only if $\Lambda$ is positive definite.
Linear-Gaussian model

- Linear-Gaussian model for CPDs:
$$p(x_i \mid x_{\pi_i}) = \mathcal{N}\left( x_i \,\middle|\, \sum_{j \in \pi_i} w_{ij} x_j + b_i, \; v_i \right)$$
where $\pi_i$ denotes the parents of $X_i$.
- The joint distribution is Gaussian:
$$\ln p(x_1, \ldots, x_D) = -\sum_{i=1}^{D} \frac{1}{2v_i} \left( x_i - \sum_{j \in \pi_i} w_{ij} x_j - b_i \right)^2 + c$$
From linear-Gaussian model to joint multivariate distribution

- We can find the parameters of the multivariate Gaussian from the linear-Gaussian model.
- Mean and covariances (the $X_i$ are in topological order):
$$E[x_i] = \sum_{j \in \pi_i} w_{ij} E[x_j] + b_i$$
$$\mathrm{Cov}(x_j, x_i) = \sum_{k \in \pi_i} w_{ik} \, \mathrm{Cov}(x_j, x_k) + \delta_{ij} v_i$$
Multivariate Gaussian: directed model example

- Linear Gaussian:
$$p(x_1) = \mathcal{N}(x_1 \mid \mu_1, v_1), \qquad p(x_2 \mid x_1) = \mathcal{N}(x_2 \mid w x_1 + b, v_2)$$
with $\mu_1 = 2$, $v_1 = 0.5$, $w = 1$, $b = 0.5$, $v_2 = 0.2$.

[Figure: the CPD $p(x_2 \mid x_1)$ and the joint density $p(x_1, x_2)$.]
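The mean and covariance recursions on the previous slide can be implemented directly; a minimal sketch (function name is illustrative), checked on the two-node example above:

```python
import numpy as np

def linear_gaussian_to_joint(W, b, v):
    """Joint (mu, Sigma) of a linear-Gaussian Bayesian network.

    Nodes are in topological order: W[i, j] = w_ij (nonzero only for
    parents j < i), b[i] = offset, v[i] = conditional variance."""
    D = len(b)
    mu = np.zeros(D)
    Sigma = np.zeros((D, D))
    for i in range(D):
        mu[i] = W[i, :i] @ mu[:i] + b[i]
        for j in range(i):                        # Cov(x_j, x_i), j < i
            Sigma[j, i] = W[i, :i] @ Sigma[j, :i]
            Sigma[i, j] = Sigma[j, i]
        Sigma[i, i] = W[i, :i] @ Sigma[i, :i] + v[i]
    return mu, Sigma

# Two-node example: mu = (2, 2.5), Sigma = [[0.5, 0.5], [0.5, 0.7]].
mu, Sigma = linear_gaussian_to_joint(np.array([[0., 0.], [1., 0.]]),
                                     np.array([2., 0.5]),
                                     np.array([0.5, 0.2]))
```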
Gaussian Bayesian networks

- We define a Gaussian Bayesian network to be a Bayesian network all of whose variables are continuous, and where all of the CPDs are linear Gaussians.
- For Gaussian networks, the joint distribution has a compact representation:
  - the number of parameters is quadratic in the number of variables.
- Transformations from the network to the joint and back have a fairly simple and efficiently computable closed form.
Independencies in multivariate Gaussian

- $\Sigma^{-1}_{ij} = 0 \;\Leftrightarrow\; X_i \perp X_j \mid \mathcal{X} - \{X_i, X_j\}$
  - If $X_1, \ldots, X_D$ have a joint normal distribution $p(x) = \mathcal{N}(\mu, \Sigma)$, then $\Sigma^{-1}_{ij} = 0$ if and only if $p \models X_i \perp X_j \mid \mathcal{X} - \{X_i, X_j\}$.
- $\Sigma_{ij} = 0 \;\Leftrightarrow\; X_i \perp X_j$
  - If $X_1, \ldots, X_D$ have a joint normal distribution $\mathcal{N}(\mu, \Sigma)$, then $\Sigma_{ij} = 0$ if and only if $X_i \perp X_j$.
Sparsity in covariance matrix

[Figure: small example networks over $X_1, \ldots, X_4$; in one of them $\Sigma_{13} = \Sigma_{31} = 0$ while $(\Sigma^{-1})_{13} \neq 0$.]

If the parametrization is not degenerate, $\Sigma$ would be dense (i.e., $\forall i, j: \Sigma_{ij} \neq 0$).
Multivariate Gaussian: undirected model

- A Gaussian distribution can be represented by a fully connected graph with pairwise (edge) potentials: a Gaussian MRF.
- The overall energy has the form:
$$E(y) = \frac{1}{2} \sum_{i,j} (y_i - \mu_i) \, \Sigma^{-1}_{ij} \, (y_j - \mu_j)$$
with (log-)potentials
$$\psi_{ij}(y_i, y_j) = -\Sigma^{-1}_{ij} y_i y_j, \quad i < j, \qquad \psi_i(y_i) = -\frac{1}{2} \Sigma^{-1}_{ii} y_i^2 + y_i \sum_j \Sigma^{-1}_{ij} \mu_j$$
Multivariate Gaussian: undirected model

- $\Sigma^{-1}_{ij} = 0 \;\Leftrightarrow\; X_i \perp X_j \mid \mathcal{X} - \{X_i, X_j\}$
  - If $X_1, \ldots, X_D$ have a joint normal distribution $p(x) = \mathcal{N}(\mu, \Sigma)$, then $\Sigma^{-1}_{ij} = 0$ if and only if $p \models X_i \perp X_j \mid \mathcal{X} - \{X_i, X_j\}$.
- We can view the information matrix as directly defining a minimal I-map Markov network for the distribution,
  - whereby nonzero entries correspond to edges in the network.
Sparsity in precision matrix $\Sigma^{-1}$

[Figure: a small Markov network over four variables and the sparsity pattern of its precision matrix $\Sigma^{-1}$; zero entries correspond exactly to missing edges. A numeric illustration follows.]
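A minimal numeric illustration of this sparsity, assuming a chain $X_1 \to X_2 \to X_3$ with unit noise at every node: the covariance is dense, but the precision matrix is tridiagonal because $X_1 \perp X_3 \mid X_2$.

```python
import numpy as np

# Chain: x1 ~ N(0,1), x2 = x1 + e2, x3 = x2 + e3, with unit-variance noise.
# The resulting covariance is dense:
Sigma = np.array([[1., 1., 1.],
                  [1., 2., 2.],
                  [1., 2., 3.]])

# ...but the precision matrix is tridiagonal, with (Sigma^{-1})_{13} = 0:
Lambda = np.linalg.inv(Sigma)
print(np.round(Lambda, 6))
# [[ 2. -1.  0.]
#  [-1.  2. -1.]
#  [ 0. -1.  1.]]
```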
BP for continuous variables

[Figure: a small tree over $y_1, \ldots, y_4$ with messages $m_{32}(y_2)$, $m_{42}(y_2)$, and $m_{21}(y_1)$.]
Belief propagation: integral-product

[Figure: messages $m_{kj}(y_j)$ arriving at node $j$ and the outgoing message $m_{ji}(y_i)$.]

- Messages:
$$m_{ji}(y_i) = \int_{y_j} \psi_j(y_j) \, \psi_{ij}(y_i, y_j) \prod_{k \in \mathcal{N}(j) \setminus i} m_{kj}(y_j) \, dy_j$$
- Marginal probability function:
$$p(y_i) \propto \psi_i(y_i) \prod_{j \in \mathcal{N}(i)} m_{ji}(y_i)$$
BP for continuous variables

- Is there a finitely parameterized, closed form for the message and marginal functions?
- Is there an analytic formula for the message integral, phrased as an update of these parameters?
Canonical form properties

- The product of two canonical forms is in canonical form.
- The division of two canonical forms is also in canonical form.
- The marginalization of a canonical form onto a subset of its variables $Y$ results in a canonical form (when the block of $K$ over the variables being integrated out is positive definite).
- Instantiating a subset of variables results in a canonical form.
Canonical forms

$$C(X; K, h, g) = \exp\left\{ -\frac{1}{2} X^T K X + h^T X + g \right\}$$

A Gaussian $\mathcal{N}(\mu, \Sigma)$ corresponds to the canonical form with $K = \Sigma^{-1}$ and $h = \Sigma^{-1}\mu$; a general canonical form will be a Gaussian only when $K$ is positive definite.
Messages and marginals for Gaussian networks

- We use the canonical form as a finitely parameterized, closed form for the message and marginal functions.
- The message integral is phrased in the canonical form, and its parameters can be found from the parameters involved in the integral.
- The inference algorithm will be correct since we can show that it executes a marginalization step only on canonical forms for which this operation is well defined.
Gaussian Markov network: factors

- The graph topology can be specified by the structure of the matrix $K = \Sigma^{-1}$; i.e., the edge set $\{i, j\}$ includes all non-zero entries of $K$ for which $i > j$:
$$\psi_{ij}(y_i, y_j) = \exp\left\{ -\frac{1}{2} y_i K_{ij} y_j \right\}$$
$$\psi_i(y_i) = \exp\left\{ -\frac{1}{2} K_{ii} y_i^2 + h_{ii} y_i \right\} \propto \mathcal{N}\left( \mu_{ii}, K_{ii}^{-1} \right), \qquad h_{ii} = K_{ii} \mu_{ii}$$
- This is one form of parametrizing the factors (it is not uniquely defined).
Gaussian Markov network: messages

- If we assume a Gaussian MRF:
  - messages and marginal functions are all Gaussian;
  - updates will be in terms of updating the parameters $K$ and $h$.
- $\psi_i(y_i) \prod_{k \in \mathcal{N}(i) \setminus j} m_{ki}(y_i) \propto \mathcal{N}\left( \mu_{i \setminus j}, K_{i \setminus j}^{-1} \right)$
  - $K_{i \setminus j} = K_{ii} + \sum_{k \in \mathcal{N}(i) \setminus j} K_{k \to i}$
  - $h_{i \setminus j} = h_{ii} + \sum_{k \in \mathcal{N}(i) \setminus j} h_{k \to i}$
- $m_{ij}(y_j) = \int \psi_{i,j}(y_i, y_j) \, \psi_i(y_i) \prod_{k \in \mathcal{N}(i) \setminus j} m_{ki}(y_i) \, dy_i \propto \mathcal{N}\left( \mu_{i \to j}, K_{i \to j}^{-1} \right)$
  - $K_{i \to j} = -K_{ij} K_{i \setminus j}^{-1} K_{ij}$
  - $h_{i \to j} = -K_{ij} K_{i \setminus j}^{-1} h_{i \setminus j}$
Messages for the Gaussian networks

- Messages in the canonical form:
$$m_{ij}(y_j) = \exp\left\{ -\frac{1}{2} K_{i \to j} y_j^2 + h_{i \to j} y_j \right\}$$
$$K_{i \setminus j} = K_{ii} + \sum_{k \in \mathcal{N}(i) \setminus j} K_{k \to i}, \qquad h_{i \setminus j} = h_{ii} + \sum_{k \in \mathcal{N}(i) \setminus j} h_{k \to i}$$
$$K_{i \to j} = -K_{ij} K_{i \setminus j}^{-1} K_{ij}, \qquad h_{i \to j} = -K_{ij} K_{i \setminus j}^{-1} h_{i \setminus j}$$
Marginal distributions

$$p(y_i) \propto \psi_i(y_i) \prod_{j \in \mathcal{N}(i)} m_{ji}(y_i) \propto \mathcal{N}\left( \mu_i, K_i^{-1} \right)$$
$$K_i = K_{ii} + \sum_{j \in \mathcal{N}(i)} K_{j \to i}, \qquad h_i = h_{ii} + \sum_{j \in \mathcal{N}(i)} h_{j \to i}, \qquad \mu_i = K_i^{-1} h_i$$
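For a chain-structured Gaussian MRF these scalar updates reduce to one forward and one backward pass; a minimal sketch (function name and schedule are illustrative), whose marginals match the exact ones obtained by inverting $K$:

```python
import numpy as np

def gaussian_bp_chain(K, mu):
    """Scalar Gaussian BP on a chain x_1 - x_2 - ... - x_n.

    K: (tridiagonal) precision matrix Sigma^{-1}; mu: mean vector.
    Returns the marginal means and variances."""
    n = len(mu)
    h = K @ mu                                       # canonical linear term
    Kf = np.zeros(n); hf = np.zeros(n)               # forward messages i -> i+1
    Kb = np.zeros(n); hb = np.zeros(n)               # backward messages i -> i-1
    for i in range(n - 1):                           # forward pass
        Ka = K[i, i] + (Kf[i - 1] if i > 0 else 0.0)
        ha = h[i] + (hf[i - 1] if i > 0 else 0.0)
        Kf[i] = -K[i, i + 1] ** 2 / Ka               # K_{i -> i+1}
        hf[i] = -K[i, i + 1] * ha / Ka               # h_{i -> i+1}
    for i in range(n - 1, 0, -1):                    # backward pass
        Ka = K[i, i] + (Kb[i + 1] if i < n - 1 else 0.0)
        ha = h[i] + (hb[i + 1] if i < n - 1 else 0.0)
        Kb[i] = -K[i - 1, i] ** 2 / Ka               # K_{i -> i-1}
        hb[i] = -K[i - 1, i] * ha / Ka               # h_{i -> i-1}
    K_marg = K.diagonal().copy(); h_marg = h.copy()
    K_marg[1:] += Kf[:-1]; h_marg[1:] += hf[:-1]     # add incoming messages
    K_marg[:-1] += Kb[1:]; h_marg[:-1] += hb[1:]
    return h_marg / K_marg, 1.0 / K_marg             # means, variances

# Check on the tridiagonal chain precision from the sparsity example:
K = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
mu = np.array([0., 1., 2.])
means, variances = gaussian_bp_chain(K, mu)          # means = mu,
assert np.allclose(variances, np.diag(np.linalg.inv(K)))  # variances = (1, 2, 3)
```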
Exact inference for Gaussian networks

- All exact inference algorithms can be adapted to Gaussian networks:
  - only the representation of factors and the implementation of the basic factor operations are different.
- Inference in Gaussian networks is computationally linear in the number of cliques, and at most cubic in the size of the largest clique.
- When the Gaussian has sufficiently low dimension, the naive approach to inference (forming the joint and marginalizing directly) may be sufficient.
- When we have a high-dimensional Gaussian distribution and the network has low tree-width, the message-passing algorithms can provide considerable savings.
References

- Bishop, C. M., Pattern Recognition and Machine Learning, Springer, 2006.
- Jordan, M. I., An Introduction to Probabilistic Graphical Models (draft).
- Murphy, K. P., Machine Learning: A Probabilistic Perspective, MIT Press, 2012.