CS786 Lecture 12: May 12, 2012 - Inference as Optimization (continued)




26/06/2012 1

CS786 Lecture 12: May 12, 2012

Inference as Optimization (continued) [KF Chapter 11]

CS786 P. Poupart 2012 1

Cluster Tree Recap

  • Variable elimination:

– Induces a cluster tree
– Inference: message propagation on the cluster tree

  • Cluster tree:

– Graph is a tree (i.e., no loops)
– Node: cluster of variables
– Edge: subset of variables (a.k.a. sepset) that are common to the nodes it connects
– Satisfies the running intersection property



Cluster Tree Calibration

  • C_i: variables in the cluster at node i
    β_i: factor at node i

  • S_ij: variables in the sepset at edge (i,j)
    μ_ij: factor at edge (i,j)

  • Calibrated cluster tree

For all edges (i,j): the sepset factor is the marginal of the cluster factors

  • μ_ij(S_ij) = Σ_{C_i \ S_ij} β_i(C_i)
  • μ_ij(S_ij) = Σ_{C_j \ S_ij} β_j(C_j)


Calibration by Message Passing

  • Initialization:

– Messages: δ_{i→j} ← 1 and δ_{j→i} ← 1 ∀ edges (i,j)

– Potentials: ψ_i ← ∏ of the potentials associated with node i

  • Update messages until calibration:

δ_{i→j}(S_ij) ← Σ_{C_i \ S_ij} ψ_i ∏_{k ∈ N(i)\{j}} δ_{k→i}

  • Return:

β_i ← ψ_i ∏_{k ∈ N(i)} δ_{k→i}

μ_ij ← δ_{i→j} δ_{j→i}
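The update and return steps can be sketched on the smallest non-trivial tree: two clusters {A,B} and {B,C} joined by sepset {B}. This is a minimal illustration, assuming tabular potentials stored as NumPy arrays; the cluster layout and the numeric potential values are invented, not from the lecture.

```python
# Cluster tree calibration sketch: clusters C1 = {A,B}, C2 = {B,C}, sepset {B}.
# The 2x2 potentials are made-up positive numbers for illustration.
import numpy as np

psi1 = np.array([[0.5, 1.0],
                 [2.0, 0.5]])   # psi_1(A, B), axes (A, B)
psi2 = np.array([[1.5, 0.2],
                 [0.3, 1.0]])   # psi_2(B, C), axes (B, C)

# delta_{1->2}(B): sum out C1 \ S12 = {A}; node 1 has no neighbors besides 2
d_1to2 = psi1.sum(axis=0)
# delta_{2->1}(B): sum out C2 \ S12 = {C}
d_2to1 = psi2.sum(axis=1)

# beta_i = psi_i times the product of incoming messages;
# mu_12 = product of the two messages on the edge
beta1 = psi1 * d_2to1[np.newaxis, :]
beta2 = psi2 * d_1to2[:, np.newaxis]
mu12 = d_1to2 * d_2to1

# Calibration: both clusters yield the same marginal over the sepset {B}
assert np.allclose(beta1.sum(axis=0), mu12)
assert np.allclose(beta2.sum(axis=1), mu12)

# Normalized beta1 equals the true marginal Pr(A,B) of the joint psi1 * psi2
joint = psi1[:, :, None] * psi2[None, :, :]   # unnormalized Pr(A, B, C)
assert np.allclose(beta1 / beta1.sum(), joint.sum(axis=2) / joint.sum())
```

With only two clusters, one message in each direction calibrates the tree; on a larger tree the same updates are iterated, and one upward plus one downward pass suffices.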



Properties of Calibrated Trees

  • Normalized β_i: marginal of Pr over C_i

– i.e., β_i(C_i) / Σ_{C_i} β_i(C_i) = Pr(C_i)

  • Normalized μ_ij: marginal of Pr over S_ij

– i.e., μ_ij(S_ij) / Σ_{S_ij} μ_ij(S_ij) = Pr(S_ij)

  • The β_i's and μ_ij's can be used to simultaneously answer many marginal queries


Loopy Belief Propagation

  • Approximate inference
  • Consider a cluster graph (with loops) instead of a cluster tree:

– Scalability: clusters can be much smaller
– Approximation: a calibrated cluster graph does not necessarily yield correct marginals



Cluster Graph

  • Same as cluster tree but loops are allowed:

– Any graph structure is allowed
– Node: cluster of variables
– Edge: subset of variables (a.k.a. sepset) that are common to the nodes it connects

  • Generalized running intersection property

– Whenever variable X is in clusters C_i and C_j, there is exactly one path between C_i and C_j such that X ∈ S_e for all edges e in that path


Cluster Graph Calibration

  • Same algorithm as for cluster tree calibration
  • Disadvantages:

– Convergence is not guaranteed

  • Damping techniques may be used to encourage convergence

– When convergence is achieved, the β_i's and μ_ij's are not necessarily the correct marginals for C_i and S_ij

  • Advantages:

– The approximation is often good in practice, and inference scales linearly with the size of the graph
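Damping replaces each message with a convex blend of its old and freshly computed values rather than overwriting it. A minimal sketch, assuming messages stored as arrays; the damping factor 0.5 and the message values are invented:

```python
# Damped message update for loopy BP: blend previous and new messages.
# lam = 1.0 recovers the undamped update; smaller lam moves more cautiously.
import numpy as np

def damp(old_msg, new_msg, lam=0.5):
    # Convex combination of the old and newly computed message
    return (1.0 - lam) * np.asarray(old_msg) + lam * np.asarray(new_msg)

old = np.array([1.0, 1.0])   # message from the previous iteration
new = np.array([0.2, 1.8])   # message just computed by the update rule
print(damp(old, new))        # halfway between old and new
```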



Expectation Propagation

  • Alternative approximation for inference
  • Idea: stick with the cluster tree, but approximate the messages

  • Consequence: propagate expectations of some statistics instead of full marginals in each sepset


Example

(figure-only slide; image not captured in this extraction)


Cluster Tree with Factored Potentials

  • C_i: variables in the cluster at node i
    β_i: product of factors at node i

  • S_ij: variables in the sepset at edge (i,j)
    μ_ij: product of factors at edge (i,j)

  • Calibrated cluster tree (same as before)

For all edges (i,j):

  • μ_ij(S_ij) = Σ_{C_i \ S_ij} β_i(C_i)
  • μ_ij(S_ij) = Σ_{C_j \ S_ij} β_j(C_j)


Calibration with Factored Messages

  • Initialization:

– Messages: δ_{i→j} ← 1 and δ_{j→i} ← 1 ∀ edges (i,j)

– Potentials: ψ_i ← set of potentials associated with node i

  • Update messages until calibration:

δ_{i→j}(S_ij) ← Σ_{C_i \ S_ij} ψ_i ∏_{k ∈ N(i)\{j}} δ_{k→i}

  • Return:

β_i ← ψ_i ∏_{k ∈ N(i)} δ_{k→i}

μ_ij ← δ_{i→j} δ_{j→i}



Projection

  • Approximate a distribution by the "closest" distribution from some class of distributions.

  • Examples:

– Factorization: joint distribution → product of marginals

  • P(X_1, …, X_n) ≈ ∏_i P(X_i)

– Mixture of Gaussians → single Gaussian:

  • Σ_i w_i N(x | μ_i, σ_i²) ≈ N(x | μ, σ²)

– Mixture of Dirichlets → single Dirichlet:

  • Σ_i w_i Dir(θ | α_i) ≈ Dir(θ | α)
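For the mixture-of-Gaussians case, one standard way to pick the "closest" single Gaussian is moment matching: give the Gaussian the mixture's mean and variance. A sketch with invented mixture weights, means, and variances:

```python
# Project a two-component Gaussian mixture onto a single Gaussian by
# matching its first two moments. All parameter values are made up.
import numpy as np

w   = np.array([0.3, 0.7])    # mixture weights (sum to 1)
mu  = np.array([-1.0, 2.0])   # component means
var = np.array([0.5, 1.0])    # component variances

m = np.sum(w * mu)                      # E[x] of the mixture
v = np.sum(w * (var + mu**2)) - m**2    # Var[x] = E[x^2] - E[x]^2

print(m, v)   # mean and variance of the matched single Gaussian
```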


KL‐Divergence

  • Common distance measure for projections

  • KL-divergence (a.k.a. relative entropy) definition:

KL(p ‖ q) = Σ_x p(x) log( p(x) / q(x) )

  • Since KL(p ‖ q) ≠ KL(q ‖ p), we can also use

KL(q ‖ p) = Σ_x q(x) log( q(x) / p(x) )


Exponential Family

  • Projection by KL-divergence corresponds to matching the expectations of some statistics

  • Exponential family

θ: vector of parameters defining P_θ
f(x): vector of statistics
P_θ(x) ∝ exp( θ ⋅ f(x) )


Examples

  • Bernoulli: Pr(x) = θ^x (1 − θ)^(1−x)

f(x) = (x, 1 − x), θ' = (ln θ, ln(1 − θ))
Pr(x) = exp( θ' ⋅ f(x) ) = exp( x ln θ + (1 − x) ln(1 − θ) )

  • Gaussian: Pr(x) ∝ exp( −(x − μ)² / (2σ²) )

f(x) = (x, x²)
θ = ( μ/σ², −1/(2σ²) )
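The Bernoulli form can be verified numerically: exponentiating the natural parameters dotted with the statistics recovers the original probabilities. The test value p = 0.3 is an invented choice:

```python
# Check the Bernoulli exponential-family form: Pr(x) = exp(theta . f(x))
# with statistics f(x) = (x, 1-x) and parameters theta = (ln p, ln(1-p)).
import math

p = 0.3                                     # arbitrary test probability
theta = (math.log(p), math.log(1 - p))      # natural parameters

def pr(x):
    f = (x, 1 - x)                          # statistics vector
    return math.exp(theta[0] * f[0] + theta[1] * f[1])

assert abs(pr(1) - p) < 1e-12               # recovers Pr(x=1) = p
assert abs(pr(0) - (1 - p)) < 1e-12         # recovers Pr(x=0) = 1 - p
```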