CSci 8980: Advanced Topics in Graphical Models
Expectation Propagation

Instructor: Arindam Banerjee
October 26, 2007

Outline: Posterior Estimation · Assumed Density Filtering · Expectation Propagation · Experiments

Posterior Estimation

Consider a Bayesian model:
- Latent variable u with prior P^(0)(u)
- Observable data D, such as {x_1, ..., x_m}

Quantities of interest:
- Posterior over the latent variable, P^(0)(u|D)
- Likelihood of the observations, P(D)

For conjugate priors, the posterior is in the same family. In general, it can be intractable. What is the best approximation within the (prior) family?

Posterior Estimation (Contd.)

The likelihood function often factorizes:
  P(D|u) = ∏_{i=1}^n t_i(u)

The true posterior may be intractable:
  P(u|D) ∝ P^(0)(u) ∏_{i=1}^n t_i(u)

The normalizer Z is the same as the data likelihood, i.e.,
  Z = ∫_u P^(0)(u) ∏_{i=1}^n t_i(u) du = ∫_u P(D|u) P^(0)(u) du = P(D)

The two problems are closely related.
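The identity Z = P(D) can be sanity-checked numerically. A minimal sketch with an assumed conjugate toy model (Gaussian prior N(0, 1), Gaussian factors t_i(u) = N(x_i; u, 1), made-up data), comparing grid integration of the normalizer against the chain-rule likelihood:

```python
import numpy as np

def normal(x, mean, var):
    return np.exp(-(x - mean)**2 / (2*var)) / np.sqrt(2*np.pi*var)

u = np.linspace(-15, 15, 6001)
du = u[1] - u[0]
data = [0.5, 1.2, -0.3]              # made-up observations

# Z by direct grid integration of P^(0)(u) * prod_i t_i(u)
prior = normal(u, 0.0, 1.0)
lik = np.prod([normal(x, u, 1.0) for x in data], axis=0)
Z = (prior * lik).sum() * du

# P(D) via sequential conjugate updates: each x_i has marginal N(x_i; m, v+1)
m, v, pD = 0.0, 1.0, 1.0
for x in data:
    pD *= normal(x, m, v + 1.0)      # predictive density under current posterior
    v_new = 1.0 / (1.0/v + 1.0)      # standard Gaussian posterior update
    m = v_new * (m/v + x)
    v = v_new
print(Z, pD)
```

The two quantities agree to grid precision, which is exactly the Z = P(D) relation above.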

Approximating the Posterior

Assume the prior P^(0)(u) belongs to an exponential family F:
  P^(0)(u) = exp(⟨θ_0, s(u)⟩ − ψ(θ_0))

Let Q(u) ∈ F be the best approximation to P(u|D). The goal is to tractably compute Q(u) when P(u|D) is hard to compute.

- Approach 1: Assumed density filtering (online Bayesian learning)
- Approach 2: Expectation propagation
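A useful exponential-family fact for what follows: the gradient of the log-partition ψ gives the mean parameters E[s(u)], the quantities that moment matching works with. A small numerical check for the 1-D Gaussian, where s(u) = (u, u²) and θ = (m/v, −1/(2v)); the example mean and variance are made up:

```python
import numpy as np

def psi(th1, th2):
    # log-partition of a 1-D Gaussian with s(u) = (u, u^2),
    # natural parameters theta = (m/v, -1/(2v))
    return -th1**2 / (4.0*th2) - 0.5*np.log(-2.0*th2)

m, v = 1.5, 0.4                       # assumed example mean / variance
t1, t2 = m/v, -1.0/(2.0*v)

eps = 1e-5                            # central finite differences for grad(psi)
g1 = (psi(t1+eps, t2) - psi(t1-eps, t2)) / (2*eps)
g2 = (psi(t1, t2+eps) - psi(t1, t2-eps)) / (2*eps)
print(g1, g2)   # the mean parameters E[u] = m and E[u^2] = m^2 + v
```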

Assumed Density Filtering

Start with an initial guess Q(u) = P^(0)(u). Recall that
  P(u|D) ∝ P^(0)(u) ∏_{i=1}^n t_i(u)

At each step, update Q to incorporate one factor t_i(u):
- Compute the true Bayesian update
    P̂(u) = t_i(u) Q(u) / ∫_z t_i(z) Q(z) dz
- Find Q^new ∈ F such that
    Q^new(u) = argmin_{Q̃ ∈ F} KL(P̂(u) ‖ Q̃(u))

This is a maximum-likelihood estimate with P̂ as the true distribution.
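The two steps above can be sketched numerically for a 1-D version of the clutter model used later in the experiments. This is an illustrative sketch, not the lecture's implementation: grid integration stands in for closed-form moment computations, and the data, clutter weight, and grid are made up.

```python
import numpy as np

def normal(x, mean, var):
    return np.exp(-(x - mean)**2 / (2*var)) / np.sqrt(2*np.pi*var)

# Clutter-style factor: a signal term plus a u-independent clutter term
w = 0.2
def t(x, uu):
    return (1-w)*normal(x, uu, 1.0) + w*normal(x, 0.0, 100.0)

u = np.linspace(-40, 40, 8001)       # grid for numerical moment matching
du = u[1] - u[0]

m, v = 0.0, 100.0                    # Q(u) starts at the prior N(0, 100)
data = [1.8, 2.2, 2.0, 1.9, 2.1]     # made-up observations near u = 2
for x in data:
    phat = t(x, u) * normal(u, m, v)     # exact one-factor Bayesian update
    phat /= phat.sum() * du
    m = (u * phat).sum() * du            # KL projection onto Gaussians:
    v = ((u - m)**2 * phat).sum() * du   # match E[u] and Var[u]
print(m, v)
```

For a Gaussian family, minimizing KL(P̂ ‖ Q̃) reduces to matching the mean and variance of P̂, which is what the two grid sums compute.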

Assumed Density Filtering (Contd.)

To obtain Q^new it is sufficient to do moment matching:
  μ^new = E_P̂[s(u)]

For each factor t_i(u):
- Compute the moments of P̂(u) ∝ t_i(u) Q(u)
- Pick Q^new ∈ F with these mean parameters

ADF: An Alternative Viewpoint

For a single factor t_i(u):
- The true posterior: P̂(u) ∝ t_i(u) Q(u)
- The approximate posterior: Q^new(u) ∝ t̃_i(u) Q(u)
- The implied factor: t̃_i(u) ∝ Q^new(u) / Q(u)

In general, after a pass through all factors,
  Q(u) ∝ P^(0)(u) ∏_{i=1}^n t̃_i(u)

Algorithm: set t̃_i = 1 for all i. Then, for each factor t_i(u):
- Compute Q^new with μ^new = E_P̂[s(u)]
- Set t̃_i^new(u) ∝ Q^new(u) / Q(u)
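For an exponential-family Q, the ratio t̃_i(u) ∝ Q^new(u)/Q(u) is itself an (unnormalized) exponential-family term whose natural parameters are simply the difference of those of Q^new and Q. A sketch for 1-D Gaussians, with made-up example values:

```python
import numpy as np

def nat(m, v):
    # 1-D Gaussian natural parameters (m/v, -1/(2v))
    return np.array([m/v, -1.0/(2.0*v)])

def from_nat(th):
    v = -1.0 / (2.0*th[1])
    return th[0]*v, v

q_old = nat(0.0, 100.0)   # Q before the update (assumed example values)
q_new = nat(1.5, 0.9)     # Q after moment matching
site = q_new - q_old      # t~_i(u) ∝ Q^new(u)/Q(u): a parameter difference
# dividing Q^new by the site recovers the old Q exactly
m, v = from_nat(q_new - site)
print(m, v)
```

Note the site need not be a proper distribution (its "precision" entry can be negative); it is only an unnormalized correction term.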

Issues with ADF

ADF makes one pass through the data:
- Q is updated once for each factor
- Equivalently, each t̃_i(u) is updated once

Issues:
- Earlier approximations t̃_i(u) may be poor
- There is no way of going back and fixing the earlier approximations
- The result depends on the order in which the data is processed

In principle, t̃_i can be updated multiple times. EP effectively extends ADF by allowing multiple passes.
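The order dependence is easy to exhibit: running the same single-pass ADF sketch on the same made-up data in two different orders gives two different posterior means. (Heavy clutter weight and an outlier are chosen here to exaggerate the effect; all numbers are illustrative.)

```python
import numpy as np

def normal(x, mean, var):
    return np.exp(-(x - mean)**2 / (2*var)) / np.sqrt(2*np.pi*var)

w = 0.5                              # heavy clutter to exaggerate the effect
def t(x, uu):
    return (1-w)*normal(x, uu, 1.0) + w*normal(x, 0.0, 100.0)

u = np.linspace(-40, 40, 8001)
du = u[1] - u[0]

def adf_mean(data):
    m, v = 0.0, 100.0                # single ADF pass, grid moment matching
    for x in data:
        phat = t(x, u) * normal(u, m, v)
        phat /= phat.sum() * du
        m = (u * phat).sum() * du
        v = ((u - m)**2 * phat).sum() * du
    return m

data = [4.0, 4.2, -6.0, 3.8]         # made-up data with one outlier
m_fwd = adf_mean(data)
m_rev = adf_mean(data[::-1])
print(m_fwd, m_rev)                  # a single ADF pass is order-dependent
```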

Expectation Propagation

Initialize the term approximations t̃_i(u). Compute the posterior
  Q(u) = ∏_{i=1}^n t̃_i(u) / ∫_z ∏_{i=1}^n t̃_i(z) dz

Until all t̃_i converge:
- Choose a t̃_i(u) to refine
- Remove t̃_i(u) from Q(u) to get the 'old' posterior Q_{\i}(u) ∝ Q(u) / t̃_i(u)
- Construct P̂(u) ∝ t_i(u) Q_{\i}(u)
- Get Q^new with μ^new = E_P̂[s(u)] and normalizer Z_i
- Update t̃_i(u) = Z_i Q^new(u) / Q_{\i}(u)

Estimate the data likelihood as
  P(D) ≈ ∫_z ∏_{i=1}^n t̃_i(z) dz
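The loop above can be sketched for a 1-D clutter-style model with Gaussian sites. This is an illustrative sketch under several assumptions: the prior is kept as a separate fixed factor rather than folded into the t̃ product, grid integration replaces the closed-form moment computations, the scale factors Z_i are dropped (only Q's shape is tracked, not the likelihood estimate), and the data and parameters are made up.

```python
import numpy as np

def normal(x, mean, var):
    return np.exp(-(x - mean)**2 / (2*var)) / np.sqrt(2*np.pi*var)

w = 0.2
def t(x, uu):                        # clutter-style factor, as in the experiments
    return (1-w)*normal(x, uu, 1.0) + w*normal(x, 0.0, 100.0)

u = np.linspace(-40, 40, 8001)
du = u[1] - u[0]

data = [1.8, 2.2, 2.0, 1.9, 2.1]     # made-up observations near u = 2
prior = np.array([0.0, -1.0/(2.0*100.0)])   # N(0, 100) in natural parameters
sites = [np.zeros(2) for _ in data]  # t~_i initialized to 1
q = prior.copy()                     # Q = prior * prod_i t~_i

for sweep in range(50):
    q_before = q.copy()
    for i, x in enumerate(data):
        cav = q - sites[i]           # remove site i: the 'old' posterior Q_\i
        if cav[1] >= 0:              # skip improper cavities
            continue
        vc = -1.0/(2.0*cav[1]); mc = cav[0]*vc
        phat = t(x, u) * normal(u, mc, vc)   # tilted distribution t_i * Q_\i
        phat /= phat.sum() * du
        m = (u * phat).sum() * du            # moment matching
        v = ((u - m)**2 * phat).sum() * du
        q = np.array([m/v, -1.0/(2.0*v)])    # Q^new
        sites[i] = q - cav                   # t~_i ∝ Q^new / Q_\i
    if np.max(np.abs(q - q_before)) < 1e-10:
        break

vq = -1.0/(2.0*q[1]); mq = q[0]*vq
print(mq, vq)
```

Unlike the single ADF pass, each site here is revisited every sweep, so early approximations get corrected as Q improves.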

Experiments

The clutter problem:
  p(u) = N(u; 0, 100 I)
  p(x|u) = (1 − w) N(x; u, I) + w N(x; 0, 100 I)

For a set of observations D = {x_1, ..., x_n}:
  p(u, D) = p(u) ∏_{j=1}^n p(x_j|u)

Evaluation:
- Evidence/likelihood: p(D) = ∫_u p(u, D) du
- Posterior mean: E[u|D] = ∫_u u p(u|D) du
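In low dimensions, both evaluation quantities can be computed to high accuracy by brute-force grid integration, giving a ground truth to compare the approximations against. A 1-D sketch with made-up data:

```python
import numpy as np

def normal(x, mean, var):
    return np.exp(-(x - mean)**2 / (2*var)) / np.sqrt(2*np.pi*var)

w = 0.2
u = np.linspace(-40, 40, 8001)
du = u[1] - u[0]
data = [1.8, 2.2, 2.0, 1.9, 2.1]     # made-up observations

joint = normal(u, 0.0, 100.0)        # p(u) = N(u; 0, 100)
for x in data:                       # p(u, D) = p(u) * prod_j p(x_j | u)
    joint *= (1-w)*normal(x, u, 1.0) + w*normal(x, 0.0, 100.0)

pD = joint.sum() * du                # evidence p(D)
post = joint / pD                    # p(u | D) on the grid
mean = (u * post).sum() * du         # posterior mean E[u | D]
print(pD, mean)
```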
Results: Likelihood P(D)

Results: Posterior Mean E[u|D]

Results: Complex Posterior