CSci 8980: Advanced Topics in Graphical Models
Expectation Propagation
Instructor: Arindam Banerjee
October 26, 2007

Outline: Posterior Estimation, Assumed Density Filtering, Expectation Propagation, Experiments
Posterior Estimation

Consider a Bayesian model:
  Latent variable u with prior P^{(0)}(u)
  Observable data D, such as {x_1, ..., x_m}

Quantities of interest:
  Posterior over the latent variable, P^{(0)}(u | D)
  Likelihood of the observations, P(D)

For conjugate priors, the posterior stays in the same family as the prior; in general, however, it can be intractable. The question: what is the best approximation within the (prior) family?
Posterior Estimation (Contd.)

The likelihood function often factorizes:

  P(D | u) = \prod_{i=1}^{n} t_i(u)

The true posterior may be intractable:

  P(u | D) \propto P^{(0)}(u) \prod_{i=1}^{n} t_i(u)

The normalizer Z is the same as the data likelihood, i.e.,

  Z = \int_u P^{(0)}(u) \prod_{i=1}^{n} t_i(u) \, du = \int_u P(D | u) P^{(0)}(u) \, du = P(D)

The two problems (posterior and likelihood estimation) are therefore closely related.
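The identity Z = P(D) is easy to verify in a conjugate case. Below is a minimal sketch assuming a Beta-Bernoulli model (the model choice, prior parameters, and data are illustrative, not from the slides):

```python
import math

# Assumed conjugate example: Beta(a, b) prior on u, Bernoulli factors
# t_i(u) = u^{x_i} (1 - u)^{1 - x_i}. The normalizer of the posterior
# equals the marginal likelihood P(D), here available in closed form.

def beta_fn(a, b):
    # Beta function B(a, b), via log-gamma for numerical stability
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def evidence(data, a=1.0, b=1.0):
    """Z = integral of P(0)(u) * prod_i t_i(u) du = P(D)."""
    k, n = sum(data), len(data)
    return beta_fn(a + k, b + n - k) / beta_fn(a, b)

data = [1, 0, 1, 1]
Z = evidence(data)                              # closed form: 0.05
# Cross-check by brute-force integration of P(0)(u) * prod_i t_i(u)
# (uniform prior, k = 3 ones out of n = 4 observations)
num = sum((i / 10000) ** 3 * (1 - i / 10000) for i in range(1, 10000)) / 10000
print(Z, num)
```

The closed-form evidence and the numerical integral of the unnormalized posterior agree, illustrating that the normalizer and the data likelihood are the same quantity.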
Approximating the Posterior

Assume the prior P^{(0)}(u) belongs to an exponential family F:

  P^{(0)}(u) = \exp(\langle \theta_0, s(u) \rangle - \psi(\theta_0))

Let Q(u) \in F be the best approximation to P(u | D). The goal is to tractably compute Q(u) when P(u | D) is hard to compute.

  Approach 1: assumed density filtering (online Bayesian learning)
  Approach 2: expectation propagation
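To make the exponential-family notation concrete, here is a small check (an assumed example, not from the slides) that the 1-D Gaussian fits the form exp(⟨θ, s(u)⟩ − ψ(θ)) with sufficient statistics s(u) = (u, u²):

```python
import math

# The 1-D Gaussian N(mu, var) in exponential-family form:
# natural parameters theta = (mu/var, -1/(2 var)), log-partition psi(theta).

def gaussian_pdf(u, mu, var):
    return math.exp(-(u - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def expfam_pdf(u, mu, var):
    theta1 = mu / var                 # natural parameters
    theta2 = -1.0 / (2 * var)
    # log-partition psi as a function of the (mean) parameters
    psi = mu * mu / (2 * var) + 0.5 * math.log(2 * math.pi * var)
    return math.exp(theta1 * u + theta2 * u * u - psi)

print(gaussian_pdf(0.3, 1.0, 2.0), expfam_pdf(0.3, 1.0, 2.0))  # identical
```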
Assumed Density Filtering

Start with the initial guess Q(u) = P^{(0)}(u). Recall that

  P(u | D) \propto P^{(0)}(u) \prod_{i=1}^{n} t_i(u)

At each step, update Q to incorporate one factor t_i(u):

  Compute the true Bayesian update
    \hat{P}(u) = \frac{t_i(u) Q(u)}{\int_z t_i(z) Q(z) \, dz}

  Find Q_{new} \in F such that
    Q_{new}(u) = \operatorname{argmin}_{\tilde{Q} \in F} KL(\hat{P}(u) \,\|\, \tilde{Q}(u))

This is a maximum likelihood estimate with \hat{P} playing the role of the true distribution.
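The ADF loop above can be sketched for a concrete case. This is a minimal, assumed 1-D instance of the clutter problem from the Experiments section (the data values, w = 0.2, and the grid bounds are illustrative choices, not from the slides); the moments of P̂ are matched numerically on a grid rather than in closed form:

```python
import math

# Minimal ADF sketch for a 1-D clutter model: prior N(0, 100), factors
#   t_i(u) = (1 - w) N(x_i; u, 1) + w N(x_i; 0, 100).
# Moments of P_hat(u) ∝ t_i(u) Q(u) are matched numerically on a grid.

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def adf(data, w=0.2, m0=0.0, v0=100.0, lo=-20.0, hi=20.0, n=4001):
    m, v = m0, v0                              # Q(u) = N(u; m, v)
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    du = (hi - lo) / (n - 1)
    for x in data:
        # Unnormalized Bayesian update P_hat(u) ∝ t_i(u) Q(u)
        p = [((1 - w) * norm_pdf(x, u, 1.0) + w * norm_pdf(x, 0.0, 100.0))
             * norm_pdf(u, m, v) for u in us]
        Z = sum(p) * du
        # Moment matching: project P_hat back onto the Gaussian family
        m = sum(u * pi for u, pi in zip(us, p)) * du / Z
        v = sum(u * u * pi for u, pi in zip(us, p)) * du / Z - m * m
    return m, v

m, v = adf([0.9, 1.1, 1.3, 5.0])               # 5.0 behaves like clutter
print(m, v)
```

After one pass, Q's mean sits near the inlier cluster while the outlier is largely explained by the clutter component.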
Assumed Density Filtering (Contd.)

To obtain Q_{new} it is sufficient to do moment matching:

  \mu_{new} = E_{\hat{P}}[s(u)]

For each factor t_i(u):
  Compute the moments of \hat{P}(u) \propto t_i(u) Q(u)
  Pick Q_{new} \in F with these mean parameters
ADF: An Alternative Viewpoint

For a single factor t_i(u):
  The true posterior:        \hat{P}(u) \propto t_i(u) Q(u)
  The approximate posterior: Q_{new}(u) \propto \tilde{t}_i(u) Q(u)
  The implied term:          \tilde{t}_i(u) \propto Q_{new}(u) / Q(u)

In general, after a pass through all factors,

  Q(u) \propto P^{(0)}(u) \prod_{i=1}^{n} \tilde{t}_i(u)

Algorithm: set \tilde{t}_i = 1 for all i. For each factor t_i(u):
  Compute Q_{new} with \mu_{new} = E_{\hat{P}}[s(u)]
  Set \tilde{t}_i^{new}(u) \propto Q_{new}(u) / Q(u)
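The ratio Q_new(u)/Q(u) is itself Gaussian-shaped, which is what makes storing the terms t̃_i tractable. A small sketch (with hypothetical numbers) of dividing two Gaussians in natural parameters:

```python
# Dividing two Gaussians, as needed for t~_i(u) ∝ Q_new(u)/Q(u).
# In natural parameters the ratio is a subtraction, so the result is an
# (unnormalized) Gaussian-shaped term, possibly with negative precision.

def divide_gauss(m1, v1, m2, v2):
    """Return (mean, variance) of N(m1, v1)/N(m2, v2) up to a constant."""
    prec = 1 / v1 - 1 / v2            # precisions subtract
    h = m1 / v1 - m2 / v2             # precision-weighted means subtract
    return h / prec, 1 / prec

m, v = divide_gauss(1.0, 0.5, 0.0, 2.0)
print(m, v)                           # 4/3, 2/3
```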
Issues with ADF

ADF makes one pass through the data:
  Q is updated once for each factor
  Equivalently, each \tilde{t}_i(u) is updated once

Consequences:
  Early approximations \tilde{t}_i(u) may be poor
  There is no way of going back and fixing an earlier approximation
  The result depends on the order in which the data is processed

In principle, \tilde{t}_i can be updated multiple times. EP effectively extends ADF by allowing multiple passes.
Expectation Propagation

Initialize the term approximations \tilde{t}_i(u) and compute the posterior

  Q(u) = \frac{\prod_{i=1}^{n} \tilde{t}_i(u)}{\int_z \prod_{i=1}^{n} \tilde{t}_i(z) \, dz}

Until all \tilde{t}_i converge:
  Choose a \tilde{t}_i(u) to refine
  Remove \tilde{t}_i(u) from Q(u) to get the 'old' (cavity) posterior Q_{\setminus i}(u) \propto Q(u) / \tilde{t}_i(u)
  Construct \hat{P}(u) \propto t_i(u) Q_{\setminus i}(u)
  Get Q_{new} with \mu_{new} = E_{\hat{P}}[s(u)] and normalizer Z_i
  Update \tilde{t}_i(u) = Z_i Q_{new}(u) / Q_{\setminus i}(u)

Finally, estimate the data likelihood as

  P(D) \approx \int_z \prod_{i=1}^{n} \tilde{t}_i(z) \, dz
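The loop above can be sketched end to end. This is a minimal EP sketch for the same assumed 1-D clutter instance used earlier (data values, w, iteration count, and grid bounds are illustrative); site terms t̃_i are Gaussians stored in natural parameters, the prior is kept as an exact factor, and moment matching is done numerically on a grid:

```python
import math

# EP for a 1-D clutter model: prior N(0, 100), factors
#   t_i(u) = (1 - w) N(x_i; u, 1) + w N(x_i; 0, 100).
# Each site t~_i is a Gaussian in natural parameters (h_i, prec_i);
# t~_i = 1 corresponds to (0, 0).

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ep_clutter(data, w=0.2, prior_var=100.0, iters=8, lo=-20.0, hi=20.0, n=2001):
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    du = (hi - lo) / (n - 1)
    sites = [[0.0, 0.0] for _ in data]         # natural params of each t~_i
    h0, p0 = 0.0, 1.0 / prior_var              # natural params of the prior
    for _ in range(iters):
        for i, x in enumerate(data):
            h = h0 + sum(s[0] for s in sites)  # current posterior Q
            p = p0 + sum(s[1] for s in sites)
            hc, pc = h - sites[i][0], p - sites[i][1]   # cavity: remove t~_i
            if pc <= 0:
                continue                       # skip an invalid cavity
            mc, vc = hc / pc, 1.0 / pc
            # Moments of P_hat(u) ∝ t_i(u) Q_cavity(u), computed on the grid
            q = [((1 - w) * norm_pdf(x, u, 1.0) + w * norm_pdf(x, 0.0, 100.0))
                 * norm_pdf(u, mc, vc) for u in us]
            Z = sum(q) * du
            mnew = sum(u * qi for u, qi in zip(us, q)) * du / Z
            vnew = sum(u * u * qi for u, qi in zip(us, q)) * du / Z - mnew ** 2
            # Site update: t~_i ∝ Q_new / Q_cavity (natural params subtract)
            sites[i][0] = mnew / vnew - hc
            sites[i][1] = 1.0 / vnew - pc
    h = h0 + sum(s[0] for s in sites)
    p = p0 + sum(s[1] for s in sites)
    return h / p, 1.0 / p                      # posterior mean and variance

m, v = ep_clutter([0.9, 1.1, 1.3, 5.0])
print(m, v)
```

Unlike ADF, every site is revisited on each sweep, so early approximations get corrected and the result no longer depends on the processing order.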
Experiments

The clutter problem:

  p(u) = N(u; 0, 100 I)
  p(x | u) = (1 - w) N(x; u, I) + w N(x; 0, 100 I)

For a set of observations D = {x_1, ..., x_n},

  p(u, D) = p(u) \prod_{j=1}^{n} p(x_j | u)

Evaluation:
  Evidence/likelihood:  p(D) = \int_u p(u, D) \, du
  Posterior mean:       E[u | D] = \int_u u \, p(u | D) \, du
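Both evaluation quantities can be computed exactly (by brute force) in a small 1-D instance, which is the ground truth the approximate methods would be measured against. A sketch, assuming the same illustrative data as above:

```python
import math

# Exact p(D) and E[u | D] for a 1-D clutter instance via brute-force
# numerical integration of p(u, D) = p(u) * prod_j p(x_j | u).

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def exact_clutter(data, w=0.2, lo=-40.0, hi=40.0, n=8001):
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    du = (hi - lo) / (n - 1)
    joint = []
    for u in us:
        f = norm_pdf(u, 0.0, 100.0)            # prior p(u)
        for x in data:
            f *= (1 - w) * norm_pdf(x, u, 1.0) + w * norm_pdf(x, 0.0, 100.0)
        joint.append(f)
    evidence = sum(joint) * du                 # p(D)
    mean = sum(u * f for u, f in zip(us, joint)) * du / evidence
    return evidence, mean

pD, mu = exact_clutter([0.9, 1.1, 1.3, 5.0])
print(pD, mu)
```

This grid-based ground truth is only feasible because u is low-dimensional here; in higher dimensions one must rely on the ADF/EP approximations themselves.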
Results: Likelihood P(D)

[Figure: comparison of likelihood estimates on the clutter problem]

Results: Posterior Mean E[u|D]

[Figure: comparison of posterior-mean estimates on the clutter problem]