Probabilistic & Unsupervised Learning
Belief Propagation

Maneesh Sahani
maneesh@gatsby.ucl.ac.uk
Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science
University College London
Term 1, Autumn 2014

Recall: Belief Propagation on undirected trees

Joint distribution of undirected tree:

$$p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$

Messages computed recursively:

$$M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

Marginal distributions:

$$p(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i)$$

$$p(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
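A minimal sketch of the message recursion and the single-node marginal, assuming a hypothetical 4-node tree of binary variables. The graph, the factor values and the function names are illustrative and do not come from the lecture. The BP marginals are compared with brute-force enumeration of the joint.

```python
from itertools import product

# Hypothetical tree: edges 0-1, 1-2, 1-3, binary variables, made-up factors.
K = 2
n_nodes = 4
edges = [(0, 1), (1, 2), (1, 3)]
f_node = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0], 3: [1.0, 0.5]}
f_edge = {e: [[2.0, 1.0], [0.5, 3.0]] for e in edges}  # f_edge[(i, j)][x_i][x_j]

neighbours = {i: [] for i in range(n_nodes)}
for i, j in edges:
    neighbours[i].append(j)
    neighbours[j].append(i)

def pair_factor(i, j, xi, xj):
    """f_ij(x_i, x_j), whichever way round the edge was stored."""
    if (i, j) in f_edge:
        return f_edge[(i, j)][xi][xj]
    return f_edge[(j, i)][xj][xi]

def message(j, i, cache):
    """M_{j->i}(X_i), computed recursively from the leaves inwards."""
    if (j, i) not in cache:
        msg = []
        for xi in range(K):
            total = 0.0
            for xj in range(K):
                term = pair_factor(i, j, xi, xj) * f_node[j][xj]
                for l in neighbours[j]:
                    if l != i:
                        term *= message(l, j, cache)[xj]
                total += term
            msg.append(total)
        cache[(j, i)] = msg
    return cache[(j, i)]

def bp_marginal(i):
    """p(X_i) from f_i(X_i) times all incoming messages, then normalised."""
    cache = {}
    b = list(f_node[i])
    for k in neighbours[i]:
        m = message(k, i, cache)
        b = [b[x] * m[x] for x in range(K)]
    z = sum(b)
    return [v / z for v in b]

def brute_force_marginal(i):
    """p(X_i) by summing the unnormalised joint over every configuration."""
    p = [0.0] * K
    for x in product(range(K), repeat=n_nodes):
        w = 1.0
        for n in range(n_nodes):
            w *= f_node[n][x[n]]
        for a, b in edges:
            w *= pair_factor(a, b, x[a], x[b])
        p[x[i]] += w
    z = sum(p)
    return [v / z for v in p]

for i in range(n_nodes):
    print(i, bp_marginal(i), brute_force_marginal(i))
```

On a tree the two columns should agree to floating-point precision, since the recursion terminates at the leaves and every message is exact.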

Loopy Belief Propagation

Joint distribution of undirected graph:

$$p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$

Messages computed recursively (with few guarantees of convergence):

$$M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

Marginal distributions are approximate in general:

$$p(X_i) \approx b_i(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i)$$

$$p(X_i, X_j) \approx b_{ij}(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
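A small sketch of the same recursion run iteratively on a graph with a loop. It assumes a hypothetical 4-cycle of binary variables with made-up factors, a parallel update schedule, and messages renormalised after each update. The renormalisation is a numerical convenience assumed here and not part of the slide; it does not change the beliefs. The beliefs b_i are compared with exact marginals from enumeration.

```python
from itertools import product

K = 2
N = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # hypothetical single loop
f_node = [[2.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
f_pair = [[1.5, 1.0], [1.0, 1.5]]             # same symmetric f_ij on every edge

neighbours = {i: [] for i in range(N)}
for i, j in edges:
    neighbours[i].append(j)
    neighbours[j].append(i)

def normalise(v):
    z = sum(v)
    return [x / z for x in v]

def bp_update(j, i, msgs):
    """Right-hand side of the recursion for M_{j->i}(X_i), normalised."""
    out = []
    for xi in range(K):
        total = 0.0
        for xj in range(K):
            term = f_pair[xi][xj] * f_node[j][xj]
            for l in neighbours[j]:
                if l != i:
                    term *= msgs[(l, j)][xj]
            total += term
        out.append(total)
    return normalise(out)

def loopy_bp(n_iters=50):
    # msgs[(j, i)] holds M_{j->i}(X_i); start from uniform messages.
    msgs = {(j, i): [1.0 / K] * K for i in range(N) for j in neighbours[i]}
    delta = 0.0
    for _ in range(n_iters):
        new = {(j, i): bp_update(j, i, msgs) for (j, i) in msgs}
        delta = max(abs(a - b) for key in msgs
                    for a, b in zip(new[key], msgs[key]))
        msgs = new
    beliefs = []
    for i in range(N):
        b = list(f_node[i])
        for k in neighbours[i]:
            b = [b[x] * msgs[(k, i)][x] for x in range(K)]
        beliefs.append(normalise(b))
    return beliefs, delta

def exact_marginals():
    p = [[0.0] * K for _ in range(N)]
    for x in product(range(K), repeat=N):
        w = 1.0
        for n in range(N):
            w *= f_node[n][x[n]]
        for a, b in edges:
            w *= f_pair[x[a]][x[b]]
        for n in range(N):
            p[n][x[n]] += w
    return [normalise(row) for row in p]

beliefs, delta = loopy_bp()
exact = exact_marginals()
max_err = max(abs(b - e) for bi, ei in zip(beliefs, exact)
              for b, e in zip(bi, ei))
print("last message change:", delta)
print("max |belief - exact marginal|:", max_err)
```

With these weak couplings the messages settle quickly and the beliefs land close to, but in general not exactly on, the true marginals.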

Dealing with loops

◮ Accuracy: BP posterior marginals are approximate on all non-trees, but converged approximations are frequently found to be good.
◮ Convergence: no general guarantee, but BP does converge in some cases:
    ◮ Trees.
    ◮ Graphs with a single loop.
    ◮ Distributions with sufficiently weak interactions.
    ◮ Graphs with long (and weak) loops.
    ◮ Gaussian networks: means correct, variances may also converge.
◮ Damping: common approach to encourage convergence (cf. EP):

$$M^{\text{new}}_{i \to j}(X_j) := (1 - \alpha)\, M^{\text{old}}_{i \to j}(X_j) + \alpha \sum_{X_i} f_{ij}(X_i, X_j)\, f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i)$$

◮ Grouping variables: variables can be grouped into cliques to improve accuracy.
    ◮ Region graph approximations.
    ◮ Cluster variational method.
    ◮ Junction graph.
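The damped update can be sketched as follows, assuming a hypothetical 3-node loop of binary variables, made-up factors, and messages kept normalised (an assumption made here for numerical convenience). Setting alpha = 1 recovers the undamped update. Because damping only changes the path taken and not the fixed-point condition, both settings should end at the same beliefs on this single-loop example.

```python
K = 2
N = 3
edges = [(0, 1), (1, 2), (2, 0)]                 # hypothetical single loop
f_node = [[3.0, 1.0], [1.0, 2.0], [1.0, 1.0]]    # made-up f_i(X_i)
f_pair = [[2.0, 1.0], [1.0, 2.0]]                # same symmetric f_ij on every edge

neighbours = {i: [] for i in range(N)}
for i, j in edges:
    neighbours[i].append(j)
    neighbours[j].append(i)

def normalise(v):
    z = sum(v)
    return [x / z for x in v]

def bp_target(i, j, msgs):
    """Undamped update for M_{i->j}(X_j): sum over X_i of f_ij f_i and the
    messages into i from every neighbour except j."""
    out = []
    for xj in range(K):
        total = 0.0
        for xi in range(K):
            term = f_pair[xi][xj] * f_node[i][xi]
            for k in neighbours[i]:
                if k != j:
                    term *= msgs[(k, i)][xi]
            total += term
        out.append(total)
    return normalise(out)

def run_bp(alpha, n_iters=200):
    # msgs[(i, j)] holds M_{i->j}(X_j); start from uniform messages.
    msgs = {(i, j): [1.0 / K] * K for i in range(N) for j in neighbours[i]}
    delta = 0.0
    for _ in range(n_iters):
        new = {}
        for (i, j), old in msgs.items():
            target = bp_target(i, j, msgs)
            # M_new = (1 - alpha) * M_old + alpha * (BP update)
            new[(i, j)] = [(1 - alpha) * o + alpha * t
                           for o, t in zip(old, target)]
        delta = max(abs(a - b) for key in msgs
                    for a, b in zip(new[key], msgs[key]))
        msgs = new
    beliefs = []
    for i in range(N):
        b = list(f_node[i])
        for k in neighbours[i]:
            b = [b[x] * msgs[(k, i)][x] for x in range(K)]
        beliefs.append(normalise(b))
    return beliefs, delta

undamped, d_undamped = run_bp(alpha=1.0)
damped, d_damped = run_bp(alpha=0.5)
print("undamped beliefs:", undamped)
print("damped beliefs:  ", damped)
```

The value alpha = 0.5 is an arbitrary choice for illustration; in practice it is tuned per problem.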

Different Interpretations of Loopy Belief Propagation

Loopy BP can be interpreted as a fixed point algorithm from a few different perspectives:

◮ Expectation propagation.
◮ Tree-based reparametrization.
◮ Bethe free energy.

Loopy BP as message-based Expectation Propagation

Approximate pairwise factors f_ij by product of messages:

$$f_{ij}(X_i, X_j) \approx \tilde{f}_{ij}(X_i, X_j) = M_{i \to j}(X_j)\, M_{j \to i}(X_i)$$

Thus, the full joint is approximated by a factorised distribution:

$$p(X) \approx \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} \tilde{f}_{ij}(X_i, X_j) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{j \in \mathrm{ne}(i)} M_{j \to i}(X_i) = \prod_{\text{nodes } i} b_i(X_i)$$

but with multiple factors for most X_i.

Loopy BP as message-based EP

[Figure: neighbouring nodes X_i and X_j joined by the edge (ij).]

Then the EP updates to the messages are:

◮ Deletion:

$$q^{\neg ij}(X) = f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \prod_{s \neq i,j} f_s(X_s) \prod_{t \in \mathrm{ne}(s)} M_{t \to s}(X_s)$$

The factors for s ≠ i, j do not involve X_i or X_j, so they play no role in this update. Dropping them leaves (up to a constant)

$$q^{\neg ij}(X_i, X_j) = f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

◮ Projection:

$$\{M^{\text{new}}_{i \to j}, M^{\text{new}}_{j \to i}\} = \operatorname{argmin}\ \mathrm{KL}\big[\, f_{ij}(X_i, X_j)\, q^{\neg ij}(X_i, X_j) \,\big\|\, M_{j \to i}(X_i)\, M_{i \to j}(X_j)\, q^{\neg ij}(X_i, X_j) \,\big]$$

Now, q^{¬ij}() factors ⇒ rhs factors ⇒ min is achieved by marginals of f_ij() q^{¬ij}():

$$M^{\text{new}}_{j \to i}(X_i)\, q^{\neg ij}(X_i) = \sum_{X_j} \Big( f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \Big) \underbrace{f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i)}_{q^{\neg ij}(X_i)}$$

$$\Rightarrow\quad M^{\text{new}}_{j \to i}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
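The projection step can be checked numerically on a single edge. This is a sketch under stated assumptions: the factor f_ij and the two cavity terms are random made-up tables for 3-state variables, and all names are illustrative. It confirms two things from the derivation: that the product of the marginals of f_ij q^{¬ij} gives a KL no larger than randomly drawn factorised alternatives, and that dividing the X_i marginal by the cavity q^{¬ij}(X_i) gives the ordinary BP message up to scale.

```python
import math
import random

random.seed(0)
K = 3

# Made-up inputs: f_ij[a][b] = f_ij(X_i=a, X_j=b);
# cav_i stands for q^{not ij}(X_i) = f_i * prod of messages into i except from j,
# cav_j likewise for X_j.
f_ij = [[random.uniform(0.5, 2.0) for _ in range(K)] for _ in range(K)]
cav_i = [random.uniform(0.5, 2.0) for _ in range(K)]
cav_j = [random.uniform(0.5, 2.0) for _ in range(K)]

def normalise(v):
    z = sum(v)
    return [x / z for x in v]

# Tilted distribution f_ij(X_i, X_j) q^{not ij}(X_i, X_j), normalised.
raw = [[f_ij[a][b] * cav_i[a] * cav_j[b] for b in range(K)] for a in range(K)]
z = sum(sum(row) for row in raw)
tilted = [[v / z for v in row] for row in raw]

marg_i = [sum(tilted[a]) for a in range(K)]
marg_j = [sum(tilted[a][b] for a in range(K)) for b in range(K)]

def kl_to_product(qi, qj):
    """KL[ tilted || qi(X_i) qj(X_j) ] for a factorised approximation."""
    return sum(tilted[a][b] * math.log(tilted[a][b] / (qi[a] * qj[b]))
               for a in range(K) for b in range(K))

kl_star = kl_to_product(marg_i, marg_j)

# Random factorised alternatives should never do better than the marginals.
trials = []
for _ in range(1000):
    qi = normalise([random.random() + 0.1 for _ in range(K)])
    qj = normalise([random.random() + 0.1 for _ in range(K)])
    trials.append(kl_to_product(qi, qj))

# New message = marginal / cavity, compared with the BP message formula.
m_ep = normalise([marg_i[a] / cav_i[a] for a in range(K)])
m_bp = normalise([sum(f_ij[a][b] * cav_j[b] for b in range(K))
                  for a in range(K)])

print("KL at product of marginals:", kl_star)
print("best KL among random trials:", min(trials))
print("EP message:", m_ep)
print("BP message:", m_bp)
```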


Loopy BP as tree-based reparametrisation

Tree-structured distributions can be parametrised in many ways:

$$p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j) \qquad \text{undirected tree} \quad (1)$$

$$\phantom{p(X)} = p(X_r) \prod_{i \neq r} p(X_i \mid X_{\mathrm{pa}(i)}) \qquad \text{directed (rooted) tree} \quad (2)$$

$$\phantom{p(X)} = \prod_{\text{nodes } i} p(X_i) \prod_{\text{edges } (ij)} \frac{p(X_i, X_j)}{p(X_i)\, p(X_j)} \qquad \text{pairwise marginals} \quad (3)$$

where (3) requires that $\sum_{X_j} p(X_i, X_j) = p(X_i)$.

The undirected tree representation is not unique: multiplying a factor f_ij(X_i, X_j) by g(X_i) and dividing f_i(X_i) by the same g(X_i) does not change the distribution.

BP can be seen as an iterative replacement of f_i(X_i) by the local marginal of p_ij(X_i, X_j), along with the corresponding reparametrisation of f_ij(X_i, X_j). Cf. Hugin propagation.

Converged BP on a tree finds p(X_i) and p(X_i, X_j), allowing us to transform (1) to (3).
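A quick numerical check that the three parametrisations describe the same distribution, assuming a hypothetical 4-node tree of binary variables rooted at node 0, with made-up factors. The marginals are obtained by brute-force enumeration here (not by BP), purely to verify that forms (2) and (3) reproduce the joint defined by form (1).

```python
from itertools import product

K = 2
N = 4
# Hypothetical tree; each edge is written (parent, child) with root 0.
edges = [(0, 1), (1, 2), (1, 3)]
f_node = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0], [1.0, 0.5]]
f_edge = {e: [[2.0, 1.0], [0.5, 3.0]] for e in edges}  # f_edge[(i, j)][x_i][x_j]

configs = list(product(range(K), repeat=N))

def weight(x):
    """Unnormalised form (1): product of node and edge factors."""
    w = 1.0
    for i in range(N):
        w *= f_node[i][x[i]]
    for (i, j) in edges:
        w *= f_edge[(i, j)][x[i]][x[j]]
    return w

Z = sum(weight(x) for x in configs)
joint = {x: weight(x) / Z for x in configs}

def marginal(variables):
    """Marginal table over the given tuple of variable indices."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[v] for v in variables)
        out[key] = out.get(key, 0.0) + p
    return out

p1 = {i: marginal((i,)) for i in range(N)}
p2 = {e: marginal(e) for e in edges}

def directed_form(x):
    """Form (2): p(X_r) times p(X_child | X_parent) along each edge."""
    val = p1[0][(x[0],)]
    for (pa, ch) in edges:
        val *= p2[(pa, ch)][(x[pa], x[ch])] / p1[pa][(x[pa],)]
    return val

def pairwise_form(x):
    """Form (3): node marginals times p(X_i, X_j) / (p(X_i) p(X_j))."""
    val = 1.0
    for i in range(N):
        val *= p1[i][(x[i],)]
    for (i, j) in edges:
        val *= p2[(i, j)][(x[i], x[j])] / (p1[i][(x[i],)] * p1[j][(x[j],)])
    return val

worst = max(max(abs(directed_form(x) - joint[x]),
                abs(pairwise_form(x) - joint[x])) for x in configs)
print("largest deviation from the joint:", worst)
```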

Reparametrisation on trees

[Figure: tree over X_a, ..., X_g with edges (ab), (ac), (ad), (de), (df), (dg). Edge (ab) starts as 1 · f_ab · 1; after its update the edge carries f_ab / M_{b→a} and node a carries M_{b→a}, so edge (ac) is then processed as 1 · f_ac · M_{b→a}.]

$$p(X) = \prod_{(ij)} f_{ij}(X_i, X_j)$$

⇓

$$p(X) = \prod_i p(X_i) \prod_{(ij)} \frac{p(X_i, X_j)}{p(X_i)\, p(X_j)}$$

Define $f^0_{ij} = f_{ij}$, $f^0_i = p^0_i = 1$. Iterate over edges (ij):

$$p^n(X_i, X_j) = f^{n-1}_i(X_i)\, f^{n-1}_{ij}(X_i, X_j)\, f^{n-1}_j(X_j)$$

$$f^n_i(X_i) = p^n(X_i) = \sum_{X_j} p^n(X_i, X_j) = f^{n-1}_i(X_i) \underbrace{\sum_{X_j} f^{n-1}_{ij}(X_i, X_j)\, f^{n-1}_j(X_j)}_{M_{j \to i}}$$

$$f^n_{ij}(X_i, X_j) = \frac{f^{n-1}_{ij}(X_i, X_j)}{M_{j \to i}(X_i)}$$
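The edge iteration can be sketched on the 7-node tree from the figure. The factor values, the sweep order (each edge updated in both directions, for a fixed number of sweeps) and the function names are assumptions made for illustration. The sketch checks that the product of all factors is unchanged by the updates, and that each node factor ends up proportional to the corresponding marginal.

```python
from itertools import product

K = 2
nodes = "abcdefg"
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("d", "f"), ("d", "g")]
# Made-up pairwise factors f0[(i, j)][x_i][x_j]; there are no separate node factors.
f0 = {e: [[2.0 + n, 1.0], [1.0, 1.5]] for n, e in enumerate(edges)}

def joint_weight(x, f_node, f_edge):
    """Unnormalised p(X): product of node factors and edge factors.
    x maps node name -> state."""
    w = 1.0
    for v in nodes:
        w *= f_node[v][x[v]]
    for (i, j), t in f_edge.items():
        w *= t[x[i]][x[j]]
    return w

def reparametrise(f_edge0, sweeps=5):
    f_node = {v: [1.0] * K for v in nodes}                 # f_i^0 = 1
    f_edge = {e: [row[:] for row in t] for e, t in f_edge0.items()}
    for _ in range(sweeps):
        for (i, j) in edges:
            t = f_edge[(i, j)]
            # M_{j->i}(X_i) = sum_{X_j} f_ij f_j: multiply into f_i, divide out of f_ij.
            m = [sum(t[a][b] * f_node[j][b] for b in range(K)) for a in range(K)]
            f_node[i] = [f_node[i][a] * m[a] for a in range(K)]
            t = [[t[a][b] / m[a] for b in range(K)] for a in range(K)]
            # Same step in the other direction, with M_{i->j}(X_j).
            m = [sum(t[a][b] * f_node[i][a] for a in range(K)) for b in range(K)]
            f_node[j] = [f_node[j][b] * m[b] for b in range(K)]
            f_edge[(i, j)] = [[t[a][b] / m[b] for b in range(K)] for a in range(K)]
    return f_node, f_edge

def normalise(v):
    z = sum(v)
    return [x / z for x in v]

configs = [dict(zip(nodes, xs)) for xs in product(range(K), repeat=len(nodes))]
ones = {v: [1.0] * K for v in nodes}

def exact_marginal(v):
    p = [0.0] * K
    for x in configs:
        p[x[v]] += joint_weight(x, ones, f0)
    return normalise(p)

f_node, f_edge = reparametrise(f0)
for v in nodes:
    print(v, normalise(f_node[v]), exact_marginal(v))
```

Each update multiplies f_i by M_{j→i} and divides f_ij by the same quantity, so the overall product cannot change; only the way it is split between node and edge factors does.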
