CS786 Lecture 12: May 12, 2012
Inference as Optimization (continued) [KF Chapter 11]
P. Poupart, CS786, 2012



Cluster Tree Recap
• Variable elimination:
  – Induces a cluster tree
  – Inference: message propagation on the cluster tree
• Cluster tree:
  – The graph is a tree (i.e., no loops)
  – Node: cluster of variables
  – Edge: subset of variables (a.k.a. sepset) common to the nodes it connects
  – Satisfies the running intersection property

Cluster Tree Calibration
• C_i: variables in the cluster at node i; β_i(C_i): factor at node i
• S_ij: variables in the sepset at edge ij; μ_ij(S_ij): factor at edge ij
• Calibrated cluster tree: for all edges ij, the sepset factor is the marginal of both adjacent cluster factors:
  μ_ij(S_ij) = Σ_{C_i \ S_ij} β_i(C_i) = Σ_{C_j \ S_ij} β_j(C_j)

Calibration by Message Passing
• Initialization:
  – Messages: δ_{i→j} ← 1 and δ_{j→i} ← 1 for all edges ij
  – Potentials: π_i ← product of the potentials associated with C_i
• Update messages until calibration (N(i) denotes the neighbours of node i):
  δ_{i→j}(S_ij) ← Σ_{C_i \ S_ij} π_i(C_i) Π_{k ∈ N(i) \ {j}} δ_{k→i}(S_ki)
• Return:
  β_i(C_i) ← π_i(C_i) Π_{k ∈ N(i)} δ_{k→i}(S_ki)
  μ_ij(S_ij) ← δ_{i→j}(S_ij) δ_{j→i}(S_ij)
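The two-node case makes the update concrete. Below is a minimal NumPy sketch (all potential values are made up) for a tree with clusters C1 = {A, B} and C2 = {B, C} and sepset S12 = {B}; one message in each direction calibrates the tree:

```python
import numpy as np

# Hypothetical potentials for clusters C1 = {A, B} and C2 = {B, C}.
pi1 = np.array([[0.5, 0.8],   # pi1[a, b]
                [0.1, 0.3]])
pi2 = np.array([[0.7, 0.2],   # pi2[b, c]
                [0.4, 0.9]])

# One update in each direction suffices on a two-node tree
# (each node has no other incoming messages):
d12 = pi1.sum(axis=0)   # delta_{1->2}(b) = sum_a pi1(a, b)
d21 = pi2.sum(axis=1)   # delta_{2->1}(b) = sum_c pi2(b, c)

# Calibrated cluster factors and sepset factor.
beta1 = pi1 * d21[None, :]   # beta_1(a, b) = pi1(a, b) * delta_{2->1}(b)
beta2 = pi2 * d12[:, None]   # beta_2(b, c) = pi2(b, c) * delta_{1->2}(b)
mu12 = d12 * d21             # mu_12(b) = delta_{1->2}(b) * delta_{2->1}(b)

# Calibration check: both cluster marginals over B equal the sepset factor.
assert np.allclose(beta1.sum(axis=0), mu12)
assert np.allclose(beta2.sum(axis=1), mu12)
```

On larger trees the same update is iterated over all edges; once every message has been sent in both directions, the tree is calibrated.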

Properties of Calibrated Trees
• Normalized β_i is the marginal of C_i:
  Pr(C_i) = β_i(C_i) / Σ_{C_i} β_i(C_i)
• Normalized μ_ij is the marginal of S_ij:
  Pr(S_ij) = μ_ij(S_ij) / Σ_{S_ij} μ_ij(S_ij)
• The β_i's and μ_ij's can be used to simultaneously answer many marginal queries

Loopy Belief Propagation
• Approximate inference
• Consider a cluster graph (with loops) instead of a cluster tree:
  – Scalability: clusters can be much smaller
  – Approximation: a calibrated cluster graph does not necessarily yield correct marginals

Cluster Graph
• Same as a cluster tree, but loops are allowed:
  – Any graph structure is allowed
  – Node: cluster of variables
  – Edge: subset of variables (a.k.a. sepset) common to the nodes it connects
• Generalized running intersection property:
  – Whenever a variable X is in clusters C_i and C_j, there is exactly one path between C_i and C_j such that X ∈ S_e for every edge e on that path

Cluster Graph Calibration
• Same algorithm as for cluster tree calibration
• Disadvantages:
  – Convergence is not guaranteed
    • Damping techniques may be used to encourage convergence
  – When convergence is achieved, the β_i's and μ_ij's are not necessarily the correct marginals for C_i and S_ij
• Advantages:
  – The approximation is often good in practice, and inference scales linearly with the size of the graph
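The following sketch runs damped sum-product on a small loopy model: a pairwise 3-cycle of binary variables, which corresponds to a Bethe-style cluster graph whose clusters are the edge potentials and whose sepsets are single variables. All potential values are made up; the resulting beliefs are typically close to, but not exactly equal to, the true marginals:

```python
import itertools
import numpy as np

# Hypothetical pairwise potentials on a 3-cycle of binary variables X0, X1, X2.
psi = {
    (0, 1): np.array([[1.0, 0.5], [0.5, 2.0]]),
    (1, 2): np.array([[1.0, 0.3], [0.3, 1.5]]),
    (0, 2): np.array([[2.0, 0.7], [0.7, 1.0]]),
}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def pot(i, j):
    # Potential on edge {i, j}, indexed as [x_i, x_j].
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# Messages m[(i, j)] from variable i to variable j, initialized to 1.
m = {(i, j): np.ones(2) for i in nbrs for j in nbrs[i]}

lam = 0.5  # damping factor: new_msg = lam * old + (1 - lam) * update
for _ in range(100):
    new = {}
    for (i, j) in m:
        incoming = np.ones(2)
        for k in nbrs[i]:
            if k != j:
                incoming *= m[(k, i)]
        msg = pot(i, j).T @ incoming   # sum over x_i
        msg /= msg.sum()
        new[(i, j)] = lam * m[(i, j)] + (1 - lam) * msg
    m = new

# Approximate marginal of X0 from its incoming messages.
b0 = m[(1, 0)] * m[(2, 0)]
b0 /= b0.sum()

# Exact marginal by brute-force enumeration, for comparison.
p = np.zeros(2)
for x in itertools.product([0, 1], repeat=3):
    w = (psi[(0, 1)][x[0], x[1]] * psi[(1, 2)][x[1], x[2]]
         * psi[(0, 2)][x[0], x[2]])
    p[x[0]] += w
p /= p.sum()
# b0 approximates p, but need not equal it: the graph has a loop.
```

The damping step is exactly the convergence aid mentioned above: it replaces each message by a convex combination of its old and newly computed values, which slows oscillations at the cost of more iterations.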

Expectation Propagation
• Alternative approximation for inference
• Idea: stick with a cluster tree, but approximate the messages
• Consequence: propagate expectations of some statistics instead of full marginals across each sepset

Example
[Example figure not recoverable from the transcript]

Cluster Tree with Factored Potentials
• C_i: variables in the cluster at node i; β_i(C_i): product of factors at node i
• S_ij: variables in the sepset at edge ij; μ_ij(S_ij): product of factors at edge ij
• Calibrated cluster tree (same as before): for all edges ij,
  μ_ij(S_ij) = Σ_{C_i \ S_ij} β_i(C_i) = Σ_{C_j \ S_ij} β_j(C_j)

Calibration with Factored Messages
• Initialization:
  – Messages: δ_{i→j} ← 1 and δ_{j→i} ← 1 for all edges ij
  – Potentials: π_i ← set of potentials associated with C_i
• Update messages until calibration:
  δ_{i→j}(S_ij) ← project[ Σ_{C_i \ S_ij} π_i(C_i) Π_{k ∈ N(i) \ {j}} δ_{k→i}(S_ki) ]
• Return:
  β_i(C_i) ← π_i(C_i) Π_{k ∈ N(i)} δ_{k→i}(S_ki)
  μ_ij(S_ij) ← δ_{i→j}(S_ij) δ_{j→i}(S_ij)
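As a toy instance of the project[·] step, assume (hypothetically) a sepset over two binary variables B and C, and a factored message family that keeps only singleton marginals. The projection then just replaces the exact message by the outer product of its marginals:

```python
import numpy as np

# Hypothetical exact sepset message over (B, C) before projection.
msg = np.array([[0.30, 0.10],
                [0.15, 0.45]])
msg /= msg.sum()

# Project onto the factored family q(B, C) = q(B) q(C) by keeping
# only the marginals (the expectations of the indicator statistics).
qB = msg.sum(axis=1)          # marginal over B
qC = msg.sum(axis=0)          # marginal over C
approx = np.outer(qB, qC)     # factored approximation of the message

# The projection preserves both marginals but drops the B-C correlation.
assert np.allclose(approx.sum(axis=1), qB)
assert np.allclose(approx.sum(axis=0), qC)
```

This is the sense in which expectation propagation passes "expectations of some statistics" rather than full marginals: only the statistics the message family can represent survive the projection.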

Projection
• Approximate a distribution P by the "closest" distribution Q from some class of distributions
• Examples:
  – Factorization: joint distribution → product of marginals
    P(X) ≈ Π_i P(X_i)
  – Mixture of Gaussians → single Gaussian:
    Σ_i w_i N(x | μ_i, Σ_i) ≈ N(x | μ, Σ)
  – Mixture of Dirichlets → single Dirichlet:
    Σ_i w_i Dir(x | α_i) ≈ Dir(x | α)

KL Divergence
• Common distance measure for projections
• Definition of the KL divergence (a.k.a. relative entropy):
  KL(P || Q) = Σ_x P(x) log( P(x) / Q(x) )
• Since KL(P || Q) ≠ KL(Q || P), we can also use
  KL(Q || P) = Σ_x Q(x) log( Q(x) / P(x) )
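For the mixture-of-Gaussians example, projecting onto a single Gaussian by minimizing KL(P || Q) reduces to matching the mean and variance of the mixture (moment matching). A sketch with hypothetical mixture parameters, plus a Monte Carlo estimate of the resulting KL(P || Q):

```python
import numpy as np

# Hypothetical mixture of two 1-D Gaussians: weights, means, variances.
w = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
var = np.array([0.5, 1.5])

# argmin_Q KL(P || Q) over single Gaussians Q matches the expectations
# of the statistics (x, x^2), i.e. the mixture's mean and variance.
m = np.dot(w, mu)                 # E[x]
second = np.dot(w, var + mu**2)   # E[x^2]
v = second - m**2                 # Var[x]

# Monte Carlo estimate of KL(P || Q) under samples from P.
rng = np.random.default_rng(0)
comp = rng.choice(2, size=100_000, p=w)
x = rng.normal(mu[comp], np.sqrt(var[comp]))

def log_p(x):
    # Log-density of the mixture.
    dens = (w / np.sqrt(2 * np.pi * var)
            * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))).sum(axis=1)
    return np.log(dens)

def log_q(x, m, v):
    # Log-density of the single Gaussian N(x | m, v).
    return -0.5 * np.log(2 * np.pi * v) - (x - m) ** 2 / (2 * v)

kl = np.mean(log_p(x) - log_q(x, m, v))  # estimate of KL(P || Q)
```

Note this uses the KL(P || Q) direction; minimizing the reverse direction KL(Q || P) would generally pick a different Gaussian (one that locks onto a single mixture component rather than covering both).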

Exponential Family
• Projection by minimizing KL divergence corresponds to matching the expectations of some statistics
• Exponential family:
  θ: vector of parameters defining P
  f: vector of statistics
  P(x) ∝ exp( ⟨θ, f(x)⟩ )

Examples
• Bernoulli: Pr(X = x) = p^x (1 − p)^(1−x)
  f(x) = (x, 1 − x), θ = ( ln p, ln(1 − p) )
  Pr(X = x) = exp( ⟨θ, f(x)⟩ ) = e^{x ln p + (1−x) ln(1−p)} = p^x (1 − p)^(1−x)
• Gaussian: Pr(X = x) = N(x | μ, σ²)
  f(x) = (x, x²), θ = ( μ/σ², −1/(2σ²) )
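These two parameterizations can be checked numerically. The sketch below (with arbitrary parameter values) verifies that exp(⟨θ, f(x)⟩) reproduces the Bernoulli pmf exactly, and the Gaussian density once the x-independent normalizer is restored:

```python
import numpy as np

# Bernoulli: f(x) = (x, 1 - x), theta = (ln p, ln(1 - p)).
p = 0.3
theta = np.array([np.log(p), np.log(1 - p)])
for x in (0, 1):
    f = np.array([x, 1 - x])
    assert np.isclose(np.exp(theta @ f), p**x * (1 - p)**(1 - x))

# Gaussian: f(x) = (x, x^2), theta = (mu/sigma^2, -1/(2 sigma^2)).
# N(x | mu, sigma^2) ∝ exp(<theta, f(x)>); the proportionality constant
# exp(-mu^2 / (2 sigma^2)) / sqrt(2 pi sigma^2) does not depend on x.
mu, s2 = 1.5, 2.0
theta = np.array([mu / s2, -1.0 / (2 * s2)])
Z = np.exp(-mu**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
xs = np.linspace(-3, 3, 7)
dens = Z * np.exp(theta[0] * xs + theta[1] * xs**2)
expected = np.exp(-(xs - mu)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
assert np.allclose(dens, expected)
```

Because the density depends on x only through ⟨θ, f(x)⟩, matching the expectations E[f(x)] is exactly what pins down the KL-closest member of the family, which is why EP propagates those expectations.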
