CSci 8980: Advanced Topics in Graphical Models
Expectation Propagation
Instructor: Arindam Banerjee
October 26, 2007

Outline: Posterior Estimation, Assumed Density Filtering, Expectation Propagation, Experiments
Posterior Estimation

Consider a Bayesian model:
  Latent variable u with prior P^{(0)}(u)
  Observable data D, such as {x_1, ..., x_m}

Quantities of interest:
  Posterior over the latent variable, P^{(0)}(u | D)
  Likelihood of the observations, P(D)

For conjugate priors, the posterior stays in the same family as the prior; in general, however, it can be intractable. The question: what is the best approximation within the (prior) family?
Posterior Estimation (Contd.)

The likelihood function often factorizes:

  P(D | u) = \prod_{i=1}^{n} t_i(u)

The true posterior may be intractable:

  P(u | D) \propto P^{(0)}(u) \prod_{i=1}^{n} t_i(u)

The normalizer Z is the same as the data likelihood, i.e.,

  Z = \int_u P^{(0)}(u) \prod_{i=1}^{n} t_i(u) \, du = \int_u P(D | u) P^{(0)}(u) \, du = P(D)

The two problems (posterior and likelihood estimation) are therefore closely related.
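The identity Z = P(D) is easy to verify in a conjugate case. Below is a minimal sketch assuming a Beta-Bernoulli model (the model choice, prior parameters, and data are illustrative, not from the slides):

```python
import math

# Assumed conjugate example: Beta(a, b) prior on u, Bernoulli factors
# t_i(u) = u^{x_i} (1 - u)^{1 - x_i}. The normalizer of the posterior
# equals the marginal likelihood P(D), here available in closed form.

def beta_fn(a, b):
    # Beta function B(a, b), via log-gamma for numerical stability
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def evidence(data, a=1.0, b=1.0):
    """Z = integral of P(0)(u) * prod_i t_i(u) du = P(D)."""
    k, n = sum(data), len(data)
    return beta_fn(a + k, b + n - k) / beta_fn(a, b)

data = [1, 0, 1, 1]
Z = evidence(data)                              # closed form: 0.05
# Cross-check by brute-force integration of P(0)(u) * prod_i t_i(u)
# (uniform prior, k = 3 ones out of n = 4 observations)
num = sum((i / 10000) ** 3 * (1 - i / 10000) for i in range(1, 10000)) / 10000
print(Z, num)
```

The closed-form evidence and the numerical integral of the unnormalized posterior agree, illustrating that the normalizer and the data likelihood are the same quantity.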
Approximating the Posterior

Assume the prior P^{(0)}(u) belongs to an exponential family F:

  P^{(0)}(u) = \exp(\langle \theta_0, s(u) \rangle - \psi(\theta_0))

Let Q(u) \in F be the best approximation to P(u | D). The goal is to tractably compute Q(u) when P(u | D) is hard to compute.

  Approach 1: assumed density filtering (online Bayesian learning)
  Approach 2: expectation propagation
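To make the exponential-family notation concrete, here is a small check (an assumed example, not from the slides) that the 1-D Gaussian fits the form exp(⟨θ, s(u)⟩ − ψ(θ)) with sufficient statistics s(u) = (u, u²):

```python
import math

# The 1-D Gaussian N(mu, var) in exponential-family form:
# natural parameters theta = (mu/var, -1/(2 var)), log-partition psi(theta).

def gaussian_pdf(u, mu, var):
    return math.exp(-(u - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def expfam_pdf(u, mu, var):
    theta1 = mu / var                 # natural parameters
    theta2 = -1.0 / (2 * var)
    # log-partition psi as a function of the (mean) parameters
    psi = mu * mu / (2 * var) + 0.5 * math.log(2 * math.pi * var)
    return math.exp(theta1 * u + theta2 * u * u - psi)

print(gaussian_pdf(0.3, 1.0, 2.0), expfam_pdf(0.3, 1.0, 2.0))  # identical
```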
Assumed Density Filtering

Start with the initial guess Q(u) = P^{(0)}(u). Recall that

  P(u | D) \propto P^{(0)}(u) \prod_{i=1}^{n} t_i(u)

At each step, update Q to incorporate one factor t_i(u):

  Compute the true Bayesian update
    \hat{P}(u) = \frac{t_i(u) Q(u)}{\int_z t_i(z) Q(z) \, dz}

  Find Q_{new} \in F such that
    Q_{new}(u) = \operatorname{argmin}_{\tilde{Q} \in F} KL(\hat{P}(u) \,\|\, \tilde{Q}(u))

This is a maximum likelihood estimate with \hat{P} playing the role of the true distribution.
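The ADF loop above can be sketched for a concrete case. This is a minimal, assumed 1-D instance of the clutter problem from the Experiments section (the data values, w = 0.2, and the grid bounds are illustrative choices, not from the slides); the moments of P̂ are matched numerically on a grid rather than in closed form:

```python
import math

# Minimal ADF sketch for a 1-D clutter model: prior N(0, 100), factors
#   t_i(u) = (1 - w) N(x_i; u, 1) + w N(x_i; 0, 100).
# Moments of P_hat(u) ∝ t_i(u) Q(u) are matched numerically on a grid.

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def adf(data, w=0.2, m0=0.0, v0=100.0, lo=-20.0, hi=20.0, n=4001):
    m, v = m0, v0                              # Q(u) = N(u; m, v)
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    du = (hi - lo) / (n - 1)
    for x in data:
        # Unnormalized Bayesian update P_hat(u) ∝ t_i(u) Q(u)
        p = [((1 - w) * norm_pdf(x, u, 1.0) + w * norm_pdf(x, 0.0, 100.0))
             * norm_pdf(u, m, v) for u in us]
        Z = sum(p) * du
        # Moment matching: project P_hat back onto the Gaussian family
        m = sum(u * pi for u, pi in zip(us, p)) * du / Z
        v = sum(u * u * pi for u, pi in zip(us, p)) * du / Z - m * m
    return m, v

m, v = adf([0.9, 1.1, 1.3, 5.0])               # 5.0 behaves like clutter
print(m, v)
```

After one pass, Q's mean sits near the inlier cluster while the outlier is largely explained by the clutter component.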
Assumed Density Filtering (Contd.)

To obtain Q_{new} it is sufficient to do moment matching:

  \mu_{new} = E_{\hat{P}}[s(u)]

For each factor t_i(u):
  Compute the moments of \hat{P}(u) \propto t_i(u) Q(u)
  Pick Q_{new} \in F with these mean parameters
ADF: An Alternative Viewpoint

For a single factor t_i(u):
  The true posterior:        \hat{P}(u) \propto t_i(u) Q(u)
  The approximate posterior: Q_{new}(u) \propto \tilde{t}_i(u) Q(u)
  The implied term:          \tilde{t}_i(u) \propto Q_{new}(u) / Q(u)

In general, after a pass through all factors,

  Q(u) \propto P^{(0)}(u) \prod_{i=1}^{n} \tilde{t}_i(u)

Algorithm: set \tilde{t}_i = 1 for all i. For each factor t_i(u):
  Compute Q_{new} with \mu_{new} = E_{\hat{P}}[s(u)]
  Set \tilde{t}_i^{new}(u) \propto Q_{new}(u) / Q(u)
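The ratio Q_new(u)/Q(u) is itself Gaussian-shaped, which is what makes storing the terms t̃_i tractable. A small sketch (with hypothetical numbers) of dividing two Gaussians in natural parameters:

```python
# Dividing two Gaussians, as needed for t~_i(u) ∝ Q_new(u)/Q(u).
# In natural parameters the ratio is a subtraction, so the result is an
# (unnormalized) Gaussian-shaped term, possibly with negative precision.

def divide_gauss(m1, v1, m2, v2):
    """Return (mean, variance) of N(m1, v1)/N(m2, v2) up to a constant."""
    prec = 1 / v1 - 1 / v2            # precisions subtract
    h = m1 / v1 - m2 / v2             # precision-weighted means subtract
    return h / prec, 1 / prec

m, v = divide_gauss(1.0, 0.5, 0.0, 2.0)
print(m, v)                           # 4/3, 2/3
```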
Issues with ADF

ADF makes one pass through the data:
  Q is updated once for each factor
  Equivalently, each \tilde{t}_i(u) is updated once

Consequences:
  Early approximations \tilde{t}_i(u) may be poor
  There is no way of going back and fixing an earlier approximation
  The result depends on the order in which the data is processed

In principle, \tilde{t}_i can be updated multiple times. EP effectively extends ADF by allowing multiple passes.
Expectation Propagation

Initialize the term approximations \tilde{t}_i(u) and compute the posterior

  Q(u) = \frac{\prod_{i=1}^{n} \tilde{t}_i(u)}{\int_z \prod_{i=1}^{n} \tilde{t}_i(z) \, dz}

Until all \tilde{t}_i converge:
  Choose a \tilde{t}_i(u) to refine
  Remove \tilde{t}_i(u) from Q(u) to get the 'old' (cavity) posterior Q_{\setminus i}(u) \propto Q(u) / \tilde{t}_i(u)
  Construct \hat{P}(u) \propto t_i(u) Q_{\setminus i}(u)
  Get Q_{new} with \mu_{new} = E_{\hat{P}}[s(u)] and normalizer Z_i
  Update \tilde{t}_i(u) = Z_i Q_{new}(u) / Q_{\setminus i}(u)

Finally, estimate the data likelihood as

  P(D) \approx \int_z \prod_{i=1}^{n} \tilde{t}_i(z) \, dz
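The loop above can be sketched end to end. This is a minimal EP sketch for the same assumed 1-D clutter instance used earlier (data values, w, iteration count, and grid bounds are illustrative); site terms t̃_i are Gaussians stored in natural parameters, the prior is kept as an exact factor, and moment matching is done numerically on a grid:

```python
import math

# EP for a 1-D clutter model: prior N(0, 100), factors
#   t_i(u) = (1 - w) N(x_i; u, 1) + w N(x_i; 0, 100).
# Each site t~_i is a Gaussian in natural parameters (h_i, prec_i);
# t~_i = 1 corresponds to (0, 0).

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ep_clutter(data, w=0.2, prior_var=100.0, iters=8, lo=-20.0, hi=20.0, n=2001):
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    du = (hi - lo) / (n - 1)
    sites = [[0.0, 0.0] for _ in data]         # natural params of each t~_i
    h0, p0 = 0.0, 1.0 / prior_var              # natural params of the prior
    for _ in range(iters):
        for i, x in enumerate(data):
            h = h0 + sum(s[0] for s in sites)  # current posterior Q
            p = p0 + sum(s[1] for s in sites)
            hc, pc = h - sites[i][0], p - sites[i][1]   # cavity: remove t~_i
            if pc <= 0:
                continue                       # skip an invalid cavity
            mc, vc = hc / pc, 1.0 / pc
            # Moments of P_hat(u) ∝ t_i(u) Q_cavity(u), computed on the grid
            q = [((1 - w) * norm_pdf(x, u, 1.0) + w * norm_pdf(x, 0.0, 100.0))
                 * norm_pdf(u, mc, vc) for u in us]
            Z = sum(q) * du
            mnew = sum(u * qi for u, qi in zip(us, q)) * du / Z
            vnew = sum(u * u * qi for u, qi in zip(us, q)) * du / Z - mnew ** 2
            # Site update: t~_i ∝ Q_new / Q_cavity (natural params subtract)
            sites[i][0] = mnew / vnew - hc
            sites[i][1] = 1.0 / vnew - pc
    h = h0 + sum(s[0] for s in sites)
    p = p0 + sum(s[1] for s in sites)
    return h / p, 1.0 / p                      # posterior mean and variance

m, v = ep_clutter([0.9, 1.1, 1.3, 5.0])
print(m, v)
```

Unlike ADF, every site is revisited on each sweep, so early approximations get corrected and the result no longer depends on the processing order.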
Experiments

The clutter problem:

  p(u) = N(u; 0, 100 I)
  p(x | u) = (1 - w) N(x; u, I) + w N(x; 0, 100 I)

For a set of observations D = {x_1, ..., x_n},

  p(u, D) = p(u) \prod_{j=1}^{n} p(x_j | u)

Evaluation:
  Evidence/likelihood:  p(D) = \int_u p(u, D) \, du
  Posterior mean:       E[u | D] = \int_u u \, p(u | D) \, du
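Both evaluation quantities can be computed exactly (by brute force) in a small 1-D instance, which is the ground truth the approximate methods would be measured against. A sketch, assuming the same illustrative data as above:

```python
import math

# Exact p(D) and E[u | D] for a 1-D clutter instance via brute-force
# numerical integration of p(u, D) = p(u) * prod_j p(x_j | u).

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def exact_clutter(data, w=0.2, lo=-40.0, hi=40.0, n=8001):
    us = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    du = (hi - lo) / (n - 1)
    joint = []
    for u in us:
        f = norm_pdf(u, 0.0, 100.0)            # prior p(u)
        for x in data:
            f *= (1 - w) * norm_pdf(x, u, 1.0) + w * norm_pdf(x, 0.0, 100.0)
        joint.append(f)
    evidence = sum(joint) * du                 # p(D)
    mean = sum(u * f for u, f in zip(us, joint)) * du / evidence
    return evidence, mean

pD, mu = exact_clutter([0.9, 1.1, 1.3, 5.0])
print(pD, mu)
```

This grid-based ground truth is only feasible because u is low-dimensional here; in higher dimensions one must rely on the ADF/EP approximations themselves.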
Results: Likelihood P(D)

[Figure: comparison of likelihood estimates on the clutter problem]

Results: Posterior Mean E[u|D]

[Figure: comparison of posterior-mean estimates on the clutter problem]