

SLIDE 1

Approximate Inference

Henrik I. Christensen

Robotics & Intelligent Machines @ GT, Georgia Institute of Technology, Atlanta, GA 30332-0280, hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. Variational Inference
3. Variational Mixture of Gaussians
4. Exponential Family
5. Expectation Propagation
6. Summary


SLIDE 3

Introduction

We are often required to estimate a (conditional) posterior of the form p(Z|X). The solution might be intractable:

1. There might not be a closed-form solution
2. The integration over X or a parameter space \theta might be computationally challenging
3. The set of possible outcomes might be significant/exponential

Two strategies:

1. Deterministic approximation methods
2. Stochastic sampling (Monte Carlo techniques)

Today we will talk about the deterministic techniques.



SLIDE 5

Variational Inference

In general we have a Bayesian model as seen earlier, i.e.

\ln p(X) = \ln p(X, Z) - \ln p(Z|X)

We can rewrite this as

\ln p(X) = \mathcal{L}(q) + \mathrm{KL}(q \| p)

where

\mathcal{L}(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ

\mathrm{KL}(q \| p) = - \int q(Z) \ln \frac{p(Z|X)}{q(Z)} \, dZ

So \mathcal{L}(q) is a lower bound on \ln p(X) built from the joint distribution, and \mathrm{KL}(q \| p) is the Kullback-Leibler divergence of q(Z) from p(Z|X).
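As a sanity check of the decomposition, a minimal numeric sketch (assuming a small discrete latent Z and arbitrary, made-up values for p and q) verifies that \mathcal{L}(q) + \mathrm{KL}(q \| p) = \ln p(X) for any normalized q:

```python
import numpy as np

# Toy joint p(x0, Z) over a discrete latent Z with 3 states
# (values are arbitrary, chosen only to illustrate the identity).
p_xz = np.array([0.10, 0.25, 0.05])       # p(x0, z) for z = 0, 1, 2
p_x = p_xz.sum()                          # p(x0), the evidence
p_z_given_x = p_xz / p_x                  # exact posterior p(z|x0)

q = np.array([0.5, 0.3, 0.2])             # any normalized q(z)

L = np.sum(q * np.log(p_xz / q))          # lower bound L(q)
KL = -np.sum(q * np.log(p_z_given_x / q)) # KL(q || p(z|x0))

print(np.log(p_x), L + KL)                # identical up to float error
```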


SLIDE 6

Factorized Distributions

Assume for now that we can factorize Z into disjoint groups, so that

q(Z) = \prod_{i=1}^{M} q_i(Z_i)

In physics a similar model has been adopted, termed mean field theory. We can then optimize \mathcal{L}(q) through component-wise optimization:

\mathcal{L}(q) = \int \prod_i q_i \Big( \ln p(X, Z) - \sum_j \ln q_j \Big) \, dZ = \int q_j \ln \tilde{p}(X, Z_j) \, dZ_j - \int q_j \ln q_j \, dZ_j + \text{const}

where

\ln \tilde{p}(X, Z_j) = \mathbb{E}_{i \neq j}[\ln p(X, Z)] + \text{const} = \int \ln p(X, Z) \prod_{i \neq j} q_i \, dZ_i + \text{const}


SLIDE 7

Factorized Distributions

The optimal solution is now

\ln q_j^*(Z_j) = \mathbb{E}_{i \neq j}[\ln p(X, Z)] + \text{const}

i.e., the solution where each factor q_j maximizes \mathcal{L}(q) with all the other factors held fixed.
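A minimal coordinate-ascent sketch of this update, assuming a fully discrete toy model with two latent variables so the expectations are exact sums (the joint distribution here is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint ln p(x0, z1, z2) over two discrete latents (4 states each).
ln_p = np.log(rng.dirichlet(np.ones(16)).reshape(4, 4))

# Mean-field factors q1(z1), q2(z2), initialized uniformly.
q1 = np.full(4, 0.25)
q2 = np.full(4, 0.25)

for _ in range(50):
    # ln q1*(z1) = E_{q2}[ln p(x, z1, z2)] + const
    ln_q1 = ln_p @ q2
    q1 = np.exp(ln_q1 - ln_q1.max()); q1 /= q1.sum()
    # ln q2*(z2) = E_{q1}[ln p(x, z1, z2)] + const
    ln_q2 = q1 @ ln_p
    q2 = np.exp(ln_q2 - ln_q2.max()); q2 /= q2.sum()

print(q1, q2)  # converged mean-field approximation to p(z1, z2 | x)
```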



SLIDE 9

Variational Mixture of Gaussians

We encounter mixtures of Gaussians all the time; examples are multi-wall modelling, ambiguous localization, etc. We have:

- a set of observed data X
- a set of latent variables Z that describe the mixture


SLIDE 10

Mixture of Gaussians - Modelling

We can model the mixture as

p(Z|\pi) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}

We can also derive the observed conditional

p(X|Z, \mu, \Lambda) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mathcal{N}(x_n | \mu_k, \Lambda_k^{-1})^{z_{nk}}

We will for now assume that the mixing coefficients are modelled by a Dirichlet prior

p(\pi) = \mathrm{Dir}(\pi|\alpha_0) = C(\alpha_0) \prod_{k=1}^{K} \pi_k^{\alpha_0 - 1}


SLIDE 11

Mixture of Gaussians - Modelling

The component parameters can be modelled by a Gaussian-Wishart prior

p(\mu, \Lambda) = p(\mu|\Lambda) \, p(\Lambda) = \prod_{k=1}^{K} \mathcal{N}(\mu_k | m_0, (\beta_0 \Lambda_k)^{-1}) \, \mathcal{W}(\Lambda_k | W_0, \nu_0)

i.e., a total model over the observations x_n, the latents z_n, and the parameters \pi, \mu, \Lambda.

[Figure: graphical model - plate over n = 1, ..., N containing x_n and z_n, with parameter nodes \pi, \mu, \Lambda]


SLIDE 12

Mixtures of Gaussians - Variational

The full model can be written as

p(X, Z, \pi, \mu, \Lambda) = p(X|Z, \mu, \Lambda) \, p(Z|\pi) \, p(\pi) \, p(\mu|\Lambda) \, p(\Lambda)

Only X is observed. We can now consider selecting a distribution

q(Z, \pi, \mu, \Lambda) = q(Z) \, q(\pi, \mu, \Lambda)

which is clearly an independence assumption. We can use the general result of component-wise optimization:

\ln q^*(Z) = \mathbb{E}_{\pi,\mu,\Lambda}[\ln p(X, Z, \pi, \mu, \Lambda)] + \text{const}

Decomposition gives us

\ln q^*(Z) = \mathbb{E}_\pi[\ln p(Z|\pi)] + \mathbb{E}_{\mu,\Lambda}[\ln p(X|Z, \mu, \Lambda)] + \text{const} = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \ln \rho_{nk} + \text{const}


SLIDE 13

Mixtures of Gaussians - Variational

We can further derive

\ln \rho_{nk} = \mathbb{E}[\ln \pi_k] + \tfrac{1}{2} \mathbb{E}[\ln |\Lambda_k|] - \tfrac{D}{2} \ln 2\pi - \tfrac{1}{2} \mathbb{E}_{\mu_k,\Lambda_k}[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)] + \text{const}

Taking the exponential we have

q^*(Z) \propto \prod_{n=1}^{N} \prod_{k=1}^{K} \rho_{nk}^{z_{nk}}

Normalizing, we arrive at

q^*(Z) = \prod_{n=1}^{N} \prod_{k=1}^{K} r_{nk}^{z_{nk}}, \qquad r_{nk} = \frac{\rho_{nk}}{\sum_j \rho_{nj}}
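A small sketch of this normalization step, assuming \ln \rho_{nk} has already been assembled into an N x K NumPy array (the log-sum-exp shift guards against underflow):

```python
import numpy as np

def responsibilities(ln_rho):
    """Normalize ln(rho_nk) row-wise: r_nk = rho_nk / sum_j rho_nj."""
    ln_rho = ln_rho - ln_rho.max(axis=1, keepdims=True)  # stabilize
    rho = np.exp(ln_rho)
    return rho / rho.sum(axis=1, keepdims=True)

# Example: N=3 points, K=2 components, arbitrary log-weights.
r = responsibilities(np.array([[0.1, -2.0], [-1.0, -1.0], [-3.0, 0.5]]))
print(r.sum(axis=1))  # each row sums to 1
```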


SLIDE 14

Mixtures of Gaussians - Variational

Just as we saw for EM we can define

N_k = \sum_{n=1}^{N} r_{nk}, \qquad \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} x_n, \qquad S_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T
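These statistics translate directly into a few lines of code; a sketch, assuming `X` is an N x D data matrix and `r` is the N x K responsibility matrix from the previous slide:

```python
import numpy as np

def mog_statistics(X, r):
    """Weighted counts, means and covariances (one per component)."""
    Nk = r.sum(axis=0)                          # (K,)  N_k
    xbar = (r.T @ X) / Nk[:, None]              # (K,D) weighted means
    S = []
    for k in range(r.shape[1]):
        d = X - xbar[k]                         # centered data
        S.append((r[:, k, None] * d).T @ d / Nk[k])
    return Nk, xbar, np.stack(S)                # S: (K,D,D)
```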


SLIDE 15

Mixtures of Gaussians - Parameters/Mixture

Let us now consider q(\pi, \mu, \Lambda), for which we arrive at

\ln q^*(\pi, \mu, \Lambda) = \ln p(\pi) + \sum_{k=1}^{K} \ln p(\mu_k, \Lambda_k) + \mathbb{E}_Z[\ln p(Z|\pi)] + \sum_{k=1}^{K} \sum_{n=1}^{N} \mathbb{E}[z_{nk}] \ln \mathcal{N}(x_n | \mu_k, \Lambda_k^{-1}) + \text{const}

We can partition the problem into

q(\pi, \mu, \Lambda) = q(\pi) \prod_{k=1}^{K} q(\mu_k, \Lambda_k)

We can derive

\ln q^*(\pi) = (\alpha_0 - 1) \sum_{k=1}^{K} \ln \pi_k + \sum_{k=1}^{K} \sum_{n=1}^{N} r_{nk} \ln \pi_k + \text{const}

so that q^*(\pi) = \mathrm{Dir}(\pi|\alpha) with \alpha_k = \alpha_0 + N_k.


SLIDE 16

Mixtures of Gaussians - Parameters/Mixture

We can then derive

q^*(\mu_k, \Lambda_k) = \mathcal{N}(\mu_k | m_k, (\beta_k \Lambda_k)^{-1}) \, \mathcal{W}(\Lambda_k | W_k, \nu_k)

where

\beta_k = \beta_0 + N_k

m_k = \frac{1}{\beta_k} (\beta_0 m_0 + N_k \bar{x}_k)

W_k^{-1} = W_0^{-1} + N_k S_k + \frac{\beta_0 N_k}{\beta_0 + N_k} (\bar{x}_k - m_0)(\bar{x}_k - m_0)^T

\nu_k = \nu_0 + N_k
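A sketch of this update step under the same NumPy assumptions, taking the statistics `Nk`, `xbar`, `S` from the previous slide's code and hypothetical hyperparameters `beta0`, `m0`, `W0`, `nu0`:

```python
import numpy as np

def update_gaussian_wishart(Nk, xbar, S, beta0, m0, W0, nu0):
    """Variational update for the Gaussian-Wishart factors q(mu_k, Lambda_k)."""
    beta = beta0 + Nk                                        # (K,)
    m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]    # (K,D)
    W_inv = np.empty_like(S)
    for k in range(len(Nk)):
        d = (xbar[k] - m0)[:, None]                          # (D,1)
        W_inv[k] = (np.linalg.inv(W0) + Nk[k] * S[k]
                    + beta0 * Nk[k] / (beta0 + Nk[k]) * (d @ d.T))
    nu = nu0 + Nk
    return beta, m, np.linalg.inv(W_inv), nu                 # batched inverse
```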


SLIDE 17

Mixtures of Gaussians - Parameters

We can now evaluate the required expectations:

\mathbb{E}_{\mu_k,\Lambda_k}[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)] = D \beta_k^{-1} + \nu_k (x_n - m_k)^T W_k (x_n - m_k)

\ln \tilde{\Lambda}_k \equiv \mathbb{E}[\ln |\Lambda_k|] = \sum_{i=1}^{D} \psi\left(\frac{\nu_k + 1 - i}{2}\right) + D \ln 2 + \ln |W_k|

\ln \tilde{\pi}_k \equiv \mathbb{E}[\ln \pi_k] = \psi(\alpha_k) - \psi(\hat{\alpha}), \qquad \hat{\alpha} = \sum_k \alpha_k

Here \psi(\cdot) = \frac{d}{da} \ln \Gamma(a) is the digamma function. The first two results follow from the Gaussian-Wishart distribution, the last from the Dirichlet.
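These expectations are one-liners with SciPy's digamma; a sketch, assuming `alpha`, `nu`, `W` are the arrays computed on the previous slides:

```python
import numpy as np
from scipy.special import digamma

def expected_log_params(alpha, nu, W):
    """E[ln pi_k] and E[ln |Lambda_k|] for the Dirichlet / Wishart factors."""
    ln_pi_tilde = digamma(alpha) - digamma(alpha.sum())
    D = W.shape[-1]
    i = np.arange(1, D + 1)
    ln_lambda_tilde = (digamma((nu[:, None] + 1 - i) / 2).sum(axis=1)
                       + D * np.log(2) + np.log(np.linalg.det(W)))
    return ln_pi_tilde, ln_lambda_tilde
```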


SLIDE 18

Mixtures of Gaussians - Parameters

We can finally find the responsibilities, using the expectations from the previous slide:

r_{nk} \propto \tilde{\pi}_k \, \tilde{\Lambda}_k^{1/2} \exp\left( -\frac{D}{2\beta_k} - \frac{\nu_k}{2} (x_n - m_k)^T W_k (x_n - m_k) \right)

i.e., the same form as the EM responsibilities \pi_k |\Lambda_k|^{1/2} \exp\{-\tfrac{1}{2}(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)\}, with the parameters replaced by their variational expectations. The optimization is stepwise (a full-loop sketch follows the list):

1. Estimate \mu, \Lambda and then r_{nk}
2. Estimate \pi and Z
3. Check for convergence; return to 1 if not converged
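Putting the pieces together, a skeleton of the full variational loop; this is a sketch under the same NumPy assumptions, reusing the hypothetical helpers from the earlier slides (`mog_statistics`, `update_gaussian_wishart`, `expected_log_params`, `responsibilities`), not a production implementation:

```python
import numpy as np

def vb_gmm(X, K, n_iter=100, alpha0=1.0, beta0=1.0, nu0=None):
    """Variational Bayes for a Gaussian mixture (sketch)."""
    N, D = X.shape
    nu0 = nu0 or D
    m0, W0 = X.mean(axis=0), np.eye(D)
    r = np.random.default_rng(0).dirichlet(np.ones(K), size=N)  # init
    for _ in range(n_iter):
        Nk, xbar, S = mog_statistics(X, r)                      # slide 14
        alpha = alpha0 + Nk                                     # slide 15
        beta, m, W, nu = update_gaussian_wishart(               # slide 16
            Nk, xbar, S, beta0, m0, W0, nu0)
        ln_pi, ln_lam = expected_log_params(alpha, nu, W)       # slide 17
        # Assemble ln rho_nk, then normalize (slide 13); the constant
        # -D/2 ln 2pi cancels in the normalization.
        quad = np.array([np.einsum('nd,de,ne->n', X - m[k], W[k], X - m[k])
                         for k in range(K)]).T                  # (N,K)
        ln_rho = ln_pi + 0.5 * ln_lam - 0.5 * (D / beta + nu * quad)
        r = responsibilities(ln_rho)
    return r, (alpha, beta, m, W, nu)
```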


SLIDE 19

Mixture of Gaussians - Example

[Figure: variational mixture of Gaussians fit to sample data, shown after 15, 60, and 120 iterations]


SLIDE 20

MoG - Variational Lower Bound

We can evaluate the lower bound to monitor the quality of the fit:

\mathcal{L} = \mathbb{E}[\ln p(X|Z, \mu, \Lambda)] + \mathbb{E}[\ln p(Z|\pi)] + \mathbb{E}[\ln p(\pi)] + \mathbb{E}[\ln p(\mu, \Lambda)] - \mathbb{E}[\ln q(Z)] - \mathbb{E}[\ln q(\pi)] - \mathbb{E}[\ln q(\mu, \Lambda)]

where, for example,

\mathbb{E}[\ln p(X|Z, \mu, \Lambda)] = \frac{1}{2} \sum_k N_k \left\{ \ln \tilde{\Lambda}_k - D \beta_k^{-1} - \nu_k \, \mathrm{Tr}(S_k W_k) - \nu_k (\bar{x}_k - m_k)^T W_k (\bar{x}_k - m_k) - D \ln 2\pi \right\}

\mathbb{E}[\ln p(Z|\pi)] = \sum_n \sum_k r_{nk} \ln \tilde{\pi}_k

\mathbb{E}[\ln p(\pi)] = \ln C(\alpha_0) + (\alpha_0 - 1) \sum_k \ln \tilde{\pi}_k

\ldots (see book for the remaining terms)
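In practice the bound is mainly used as a convergence check; a fragment for the two discrete terms, assuming `r` and `ln_pi` from the earlier sketches (the clip is only to avoid log(0)):

```python
import numpy as np

def discrete_bound_terms(r, ln_pi):
    """E[ln p(Z|pi)] and E[ln q(Z)] for the variational MoG bound."""
    e_ln_p_z = np.sum(r * ln_pi)                            # sum r_nk E[ln pi_k]
    e_ln_q_z = np.sum(r * np.log(np.clip(r, 1e-12, None)))  # sum r_nk ln r_nk
    return e_ln_p_z, e_ln_q_z
```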



SLIDE 22

Exponential Family Distribution

Recall from the third lecture the exponential family:

p(x|\eta) = h(x) \, g(\eta) \exp\{\eta^T u(x)\}

where \eta represents the "natural parameters", g(\eta) is the normalization "factor", and u(x) is some general function of the data (the sufficient statistics).
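As a concrete instance, a small sketch writing the Bernoulli distribution in this form (the identities used are standard):

```python
import numpy as np

# Bernoulli: p(x|mu) = mu^x (1-mu)^(1-x), x in {0,1}.
# Exponential-family form: h(x) = 1, u(x) = x, eta = ln(mu/(1-mu)),
# g(eta) = 1/(1 + exp(eta)) = 1 - mu.
mu = 0.3
eta = np.log(mu / (1 - mu))                 # natural parameter
g = 1.0 / (1.0 + np.exp(eta))               # normalizer g(eta)

for x in (0, 1):
    direct = mu**x * (1 - mu)**(1 - x)
    exp_fam = g * np.exp(eta * x)           # h(x) g(eta) exp(eta^T u(x))
    print(x, direct, exp_fam)               # the two forms agree
```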


SLIDE 23

Exponential Family Distribution

The joint distribution for observed and latent variables is then

p(X, Z|\eta) = \prod_{n=1}^{N} h(x_n, z_n) \, g(\eta) \exp\{\eta^T u(x_n, z_n)\}

The conjugate prior for \eta is then

p(\eta|\nu_0, \chi_0) = f(\nu_0, \chi_0) \, g(\eta)^{\nu_0} \exp\{\nu_0 \eta^T \chi_0\}

where \nu_0 can be interpreted as a prior number of observations and \chi_0 as their sufficient statistics (moments).
Henrik I. Christensen (RIM@GT) Approximate Inference 23 / 36

SLIDE 24

Exponential Family Distribution - Variational

As before we can compute

\ln q^*(Z) = \mathbb{E}_\eta[\ln p(X, Z|\eta)] + \text{const} = \sum_n \left( \ln h(x_n, z_n) + \mathbb{E}[\eta^T] u(x_n, z_n) \right) + \text{const}

i.e., a sum of independent terms. Taking the exponential on both sides we have

q^*(z_n) = h(x_n, z_n) \, g(\mathbb{E}[\eta]) \exp\{\mathbb{E}[\eta^T] u(x_n, z_n)\}

SLIDE 25

Exponential Family Distribution - Variational

Similarly the natural parameters can be optimized through

\ln q^*(\eta) = \ln p(\eta|\nu_0, \chi_0) + \mathbb{E}_Z[\ln p(X, Z|\eta)] + \text{const}

which expands to

\ln q^*(\eta) = \nu_0 \ln g(\eta) + \nu_0 \eta^T \chi_0 + N \ln g(\eta) + \eta^T \sum_n \mathbb{E}_{z_n}[u(x_n, z_n)] + \text{const}

Using the trick of taking exponentials on both sides, we have

q^*(\eta) = f(\nu_N, \chi_N) \, g(\eta)^{\nu_N} \exp\{\nu_N \eta^T \chi_N\}

where

\nu_N = \nu_0 + N, \qquad \nu_N \chi_N = \nu_0 \chi_0 + \sum_n \mathbb{E}_{z_n}[u(x_n, z_n)]
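A tiny sketch of this conjugate update in natural-parameter space, assuming the expected sufficient statistics have already been computed (this bookkeeping step is generic, independent of the particular family):

```python
import numpy as np

def update_natural_prior(nu0, chi0, expected_u):
    """Conjugate update: nu_N = nu0 + N, nu_N chi_N = nu0 chi0 + sum_n E[u_n]."""
    N = len(expected_u)
    nu_N = nu0 + N
    chi_N = (nu0 * chi0 + np.sum(expected_u, axis=0)) / nu_N
    return nu_N, chi_N

# Example: 5 observations with a scalar sufficient statistic E[u(x_n, z_n)].
nu_N, chi_N = update_natural_prior(2.0, 0.0, np.array([0.4, 0.6, 0.5, 0.7, 0.3]))
print(nu_N, chi_N)  # 7.0 and a weighted average of prior and data statistics
```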


SLIDE 26

Exponential Family Distribution - Variational

As expected the solution is iterative, since q^*(z_n) and q^*(\eta) are coupled. In the E step, compute the sufficient statistics \mathbb{E}[u(x_n, z_n)] and use them to compute q(\eta). In the M step, use this estimate to maximize the estimate for q(z_n) and compute \mathbb{E}[\eta^T].



SLIDE 28

Expectation Propagation

Fundamentally we are trying to match distributions to the data and match up the natural parameters, i.e., find the "best" family of distributions and at the same time fit the parameters. In the end we are trying to minimize the Kullback-Leibler (KL) divergence with respect to q(z). Consider for a minute \mathrm{KL}(p \| q), where p(z) is fixed and q(z) is a member of the exponential family:

q(z) = h(z) \, g(\eta) \exp\{\eta^T u(z)\}

SLIDE 29

Expectation Propagation - Optimization

The Kullback-Leibler divergence is then

\mathrm{KL}(p \| q) = -\ln g(\eta) - \eta^T \mathbb{E}_{p(z)}[u(z)] + \text{const}

The extremum is given by

-\nabla \ln g(\eta) = \mathbb{E}_{p(z)}[u(z)]

i.e., the best estimate is obtained by matching q(z) to p(z) through setting the "natural parameters" to the sufficient statistics (moment matching), e.g. with q(z) = \mathcal{N}(z|\mu, \Sigma) as a model for the data.
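For a Gaussian q, moment matching reduces to taking the mean and covariance under p; a sketch with p represented by samples (the bimodal p here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples from some fixed p(z) -- here a bimodal mixture.
z = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])

# Moment matching: the KL(p||q)-optimal Gaussian q matches E_p[z], Var_p[z].
mu, sigma2 = z.mean(), z.var()
print(mu, sigma2)  # q(z) = N(z | mu, sigma2) stretches to cover both modes
```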


SLIDE 30

Expectation Propagation - Modelling

Consider a model with factorized probabilities

p(D, \theta) = \prod_i f_i(\theta)

where, e.g., f_n(\theta) = p(x_n|\theta), and you might have a prior f_0(\theta) = p(\theta). The posterior is then

p(\theta|D) = \frac{1}{p(D)} \prod_i f_i(\theta)

and the model evidence is given by

p(D) = \int \prod_i f_i(\theta) \, d\theta


SLIDE 31

Expectation Propagation - Computing

The estimate is then

q(\theta) = \frac{1}{Z} \prod_i \tilde{f}_i(\theta)

q(\theta) is factorized so that each term can be optimized separately. Through optimization factor by factor it is possible to generate an estimate: take one factor out, optimize it against the rest, and put it back.


SLIDE 32

Expectation Propagation - Algorithm

Initialize the factor approximations \tilde{f}_i(\theta). Initialize the posterior estimate q(\theta) \propto \prod_i \tilde{f}_i(\theta). Then iterate (a code sketch follows the list):

1. Choose a factor \tilde{f}_j(\theta) to refine
2. Remove \tilde{f}_j(\theta) from the posterior: q^{\setminus j}(\theta) = q(\theta) / \tilde{f}_j(\theta)
3. Evaluate the new posterior / sufficient statistics (moment matching on f_j(\theta) \, q^{\setminus j}(\theta))
4. Update the factor \tilde{f}_j(\theta)
5. Evaluate the approximation
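A compact sketch of this loop for the 1-D clutter problem from the next slide, using Gaussian site approximations; the moment-matching updates are the standard results for this model (see book), while the variable names and data are illustrative only:

```python
import numpy as np

def gauss(x, mean, var):
    return np.exp(-0.5 * (x - mean)**2 / var) / np.sqrt(2 * np.pi * var)

def ep_clutter(x, w=0.2, a=10.0, b=100.0, n_sweeps=20):
    """EP for p(x|th) = (1-w)N(x|th,1) + wN(x|0,a), prior N(th|0,b).
    Sketch only: no guards against negative cavity variances."""
    N = len(x)
    site_tau = np.zeros(N)          # site precisions, start "flat"
    site_nu = np.zeros(N)           # site precision-times-mean
    tau, nu = 1.0 / b, 0.0          # global q(th) in natural parameters
    for _ in range(n_sweeps):
        for n in range(N):
            # 1-2. remove site n -> cavity distribution q^\n
            tau_c = tau - site_tau[n]
            nu_c = nu - site_nu[n]
            m_c, v_c = nu_c / tau_c, 1.0 / tau_c
            # 3. moment-match against f_n(th) q^\n(th)
            Z = (1 - w) * gauss(x[n], m_c, v_c + 1) + w * gauss(x[n], 0, a)
            rho = (1 - w) * gauss(x[n], m_c, v_c + 1) / Z
            m_new = m_c + rho * v_c * (x[n] - m_c) / (v_c + 1)
            v_new = (v_c - rho * v_c**2 / (v_c + 1)
                     + rho * (1 - rho) * v_c**2 * (x[n] - m_c)**2 / (v_c + 1)**2)
            # 4. update site n and the global approximation
            tau, nu = 1.0 / v_new, m_new / v_new
            site_tau[n] = tau - tau_c
            site_nu[n] = nu - nu_c
    return nu / tau, 1.0 / tau      # posterior mean and variance of theta

# Data drawn near theta = 2 with a couple of clutter points.
print(ep_clutter(np.array([1.8, 2.2, 2.0, 1.9, -6.0, 7.5])))
```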


SLIDE 33

Expectation Propagation - Example

[Figure: the clutter problem - observed data x on the real line, generated from a Gaussian around \theta plus background clutter]

p(x|\theta) = (1 - w) \mathcal{N}(x|\theta, I) + w \mathcal{N}(x|0, aI)


SLIDE 34

Expectation Propagation - Example

[Figure: approximations to the clutter-problem posterior over \theta]



SLIDE 36

Summary

Often computation of the complete model is a challenge. There are two ways to approximate the computations:

- Deterministic approximations
- Sampling-based methods

There are many tricks for approximation; factorization is typically a first strategy, followed by iterative optimization of the factors. Next time we will talk about sampling-based methods.
