Probabilistic & Unsupervised Learning: Belief Propagation. Maneesh Sahani (maneesh@gatsby.ucl.ac.uk), Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept. of Computer Science, University College London. Term 1, Autumn 2014.

Recall: Belief Propagation on undirected trees

Joint distribution of an undirected tree:
$$p(X) = \frac{1}{Z}\prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$
Messages computed recursively:
$$M_{j\to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j)$$
Marginal distributions:
$$p(X_i) \propto f_i(X_i) \prod_{k\in \text{ne}(i)} M_{k\to i}(X_i)$$
$$p(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k\in \text{ne}(i)\setminus j} M_{k\to i}(X_i) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j)$$
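The recursions above can be sketched numerically. This is a minimal toy example (a 3-node chain with made-up binary factors, not from the lecture) checking that the message-passing marginals match brute-force enumeration, as they must on a tree:

```python
import numpy as np

# Toy undirected tree: the chain 0 - 1 - 2 with binary variables.
# Unary factors f_i and pairwise factors f_ij; all values illustrative.
f = {0: np.array([0.6, 0.4]),
     1: np.array([0.5, 0.5]),
     2: np.array([0.3, 0.7])}
fpair = {(0, 1): np.array([[1.0, 0.5], [0.5, 1.0]]),
         (1, 2): np.array([[1.0, 2.0], [2.0, 1.0]])}
neighbours = {0: [1], 1: [0, 2], 2: [1]}

def message(j, i):
    """M_{j->i}(X_i) = sum_{X_j} f_ij(X_i,X_j) f_j(X_j) prod_{l in ne(j)\\i} M_{l->j}(X_j)."""
    pair = fpair[(i, j)] if (i, j) in fpair else fpair[(j, i)].T  # indexed [X_i, X_j]
    incoming = f[j].copy()
    for l in neighbours[j]:
        if l != i:
            incoming *= message(l, j)
    return pair @ incoming  # sum over X_j

def marginal(i):
    """p(X_i) proportional to f_i(X_i) times all incoming messages."""
    p = f[i].copy()
    for k in neighbours[i]:
        p *= message(k, i)
    return p / p.sum()

# Brute-force check: enumerate all joint configurations.
joint = np.zeros((2, 2, 2))
for x in np.ndindex(2, 2, 2):
    joint[x] = (f[0][x[0]] * f[1][x[1]] * f[2][x[2]]
                * fpair[(0, 1)][x[0], x[1]] * fpair[(1, 2)][x[1], x[2]])
joint /= joint.sum()

assert np.allclose(marginal(0), joint.sum(axis=(1, 2)))
assert np.allclose(marginal(1), joint.sum(axis=(0, 2)))
```

On a tree the asserts hold exactly: BP marginals equal the true marginals.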

Loopy Belief Propagation

Joint distribution of an undirected graph:
$$p(X) = \frac{1}{Z}\prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$
Messages computed recursively (with few guarantees of convergence):
$$M_{j\to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j)$$
Marginal distributions are approximate in general:
$$p(X_i) \approx b_i(X_i) \propto f_i(X_i) \prod_{k\in \text{ne}(i)} M_{k\to i}(X_i)$$
$$p(X_i, X_j) \approx b_{ij}(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k\in \text{ne}(i)\setminus j} M_{k\to i}(X_i) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j)$$

Dealing with loops

◮ Accuracy: BP posterior marginals are approximate on all non-trees, but converged approximations are frequently found to be good.
◮ Convergence: no general guarantee, but BP does converge in some cases:
  ◮ Trees.
  ◮ Graphs with a single loop.
  ◮ Distributions with sufficiently weak interactions.
  ◮ Graphs with long (and weak) loops.
  ◮ Gaussian networks: means correct; variances may also converge.
◮ Damping: a common approach to encourage convergence (cf. EP):
$$M^{\text{new}}_{i\to j}(X_j) := (1-\alpha)\, M^{\text{old}}_{i\to j}(X_j) + \alpha \sum_{X_i} f_{ij}(X_i, X_j)\, f_i(X_i) \prod_{k\in \text{ne}(i)\setminus j} M_{k\to i}(X_i)$$
◮ Grouping variables: variables can be grouped into cliques to improve accuracy:
  ◮ Region graph approximations.
  ◮ Cluster variational method.
  ◮ Junction graph.
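The damped update can be sketched as follows. This is an illustrative implementation on a 3-cycle of binary variables (graph and factor values made up for the example; messages are normalised each step, which only changes constants):

```python
import numpy as np

# Damped loopy BP on a 3-cycle of binary variables (values illustrative).
n = 3
edges = [(0, 1), (1, 2), (0, 2)]
f = {i: np.ones(2) for i in range(n)}                          # uniform unary factors
fpair = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges} # attractive couplings
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def pairwise(i, j):
    """Return f_ij indexed as [X_i, X_j]."""
    return fpair[(i, j)] if (i, j) in fpair else fpair[(j, i)].T

# Initialise all directed messages to uniform.
M = {(j, i): np.ones(2) / 2 for j in range(n) for i in neighbours[j]}

alpha = 0.5  # weight on the freshly computed message
for sweep in range(100):
    for (j, i) in list(M):
        incoming = f[j].copy()
        for l in neighbours[j]:
            if l != i:
                incoming *= M[(l, j)]
        new = pairwise(i, j) @ incoming
        new /= new.sum()
        # Damped update: M_new = (1 - alpha) M_old + alpha * (BP update)
        M[(j, i)] = (1 - alpha) * M[(j, i)] + alpha * new

def belief(i):
    b = f[i].copy()
    for k in neighbours[i]:
        b *= M[(k, i)]
    return b / b.sum()

# With uniform unary factors and symmetric couplings, every belief is uniform.
print(belief(0))
```

On this symmetric example the fixed point is easy to verify by hand; on harder loopy graphs the beliefs are only the Bethe approximation to the true marginals.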

Different Interpretations of Loopy Belief Propagation

Loopy BP can be interpreted as a fixed-point algorithm from a few different perspectives:
◮ Expectation propagation.
◮ Tree-based reparametrisation.
◮ Bethe free energy.


Loopy BP as message-based Expectation Propagation

Approximate each pairwise factor f_ij by a product of messages:
$$f_{ij}(X_i, X_j) \approx \tilde f_{ij}(X_i, X_j) = M_{i\to j}(X_j)\, M_{j\to i}(X_i)$$
Thus the full joint is approximated by a factorised distribution:
$$p(X) \approx \frac{1}{Z}\prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} \tilde f_{ij}(X_i, X_j) = \frac{1}{Z}\prod_{\text{nodes } i} f_i(X_i) \prod_{j\in \text{ne}(i)} M_{j\to i}(X_i) = \prod_{\text{nodes } i} b_i(X_i)$$
but with multiple factors for most X_i.

Loopy BP as message-based EP

[Figure: an edge between X_j and X_i, with the messages on that edge singled out.]

The EP updates to the messages on edge (ij) are:

◮ Deletion (remove the two messages associated with edge (ij)):
$$q^{\lnot ij}(X) = f_i(X_i)\, f_j(X_j) \prod_{k\in \text{ne}(i)\setminus j} M_{k\to i}(X_i) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j) \prod_{s\neq i,j} f_s(X_s) \prod_{t\in \text{ne}(s)} M_{t\to s}(X_s)$$

◮ Projection:
$$\{M^{\text{new}}_{i\to j}, M^{\text{new}}_{j\to i}\} = \operatorname{argmin}\; \mathrm{KL}\!\left[ f_{ij}(X_i, X_j)\, q^{\lnot ij}(X) \,\middle\|\, M_{j\to i}(X_i)\, M_{i\to j}(X_j)\, q^{\lnot ij}(X) \right]$$

Now q^{¬ij}(·) factorises, so the right-hand side factorises, and the minimum is achieved by the marginals of f_ij(·) q^{¬ij}(·):
$$M^{\text{new}}_{j\to i}(X_i)\, q^{\lnot ij}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j) \Big[f_j(X_j) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j)\Big] \Big[f_i(X_i) \prod_{k\in \text{ne}(i)\setminus j} M_{k\to i}(X_i)\Big]$$
and since $q^{\lnot ij}(X_i) \propto f_i(X_i) \prod_{k\in \text{ne}(i)\setminus j} M_{k\to i}(X_i)$,
$$\Rightarrow\; M^{\text{new}}_{j\to i}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l\in \text{ne}(j)\setminus i} M_{l\to j}(X_j)$$
which is exactly the loopy BP message.
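The equivalence can be checked numerically. Below is a small sketch (a 3-node chain with random positive factors and arbitrary current messages, all illustrative): the EP route, taking the X_i-marginal of the tilted distribution f_ij q^{¬ij} and dividing by the cavity marginal q^{¬ij}(X_i), gives the same message (up to normalisation) as the BP formula:

```python
import numpy as np
rng = np.random.default_rng(0)

# Chain 0 - 1 - 2, binary variables; random positive factors and
# arbitrary current messages (all illustrative).
f = {i: rng.uniform(0.5, 1.5, 2) for i in range(3)}
fpair = {(0, 1): rng.uniform(0.5, 1.5, (2, 2)),
         (1, 2): rng.uniform(0.5, 1.5, (2, 2))}
neighbours = {0: [1], 1: [0, 2], 2: [1]}
M = {(j, i): rng.uniform(0.5, 1.5, 2)
     for j in range(3) for i in neighbours[j]}

i, j = 0, 1   # update the messages on edge (0, 1)

def q_del(x):
    """q^{~ij}(X): the factorised approximation with the edge-(i,j) messages deleted."""
    val = 1.0
    for s in range(3):
        val *= f[s][x[s]]
        for t in neighbours[s]:
            if {s, t} != {i, j}:
                val *= M[(t, s)][x[s]]
    return val

# Tilted distribution: reinstate the exact factor f_ij in place of its messages.
tilted = np.zeros((2, 2, 2))
qdel = np.zeros((2, 2, 2))
for x in np.ndindex(2, 2, 2):
    qdel[x] = q_del(x)
    tilted[x] = fpair[(i, j)][x[i], x[j]] * qdel[x]

# EP projection: new message = tilted marginal of X_i / cavity marginal of X_i.
M_ep = tilted.sum(axis=(1, 2)) / qdel.sum(axis=(1, 2))

# Standard BP message M_{j->i}(X_i) = sum_{X_j} f_ij f_j prod M_{l->j}.
M_bp = fpair[(i, j)] @ (f[j] * M[(2, j)])

assert np.allclose(M_ep / M_ep.sum(), M_bp / M_bp.sum())
```

The agreement holds for any current messages, which is the sense in which loopy BP is a special case of EP with fully factorised site approximations.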


Loopy BP as tree-based reparametrisation

Tree-structured distributions can be parametrised in many ways:
$$p(X) = \frac{1}{Z}\prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j) \qquad \text{undirected tree} \quad (1)$$
$$\phantom{p(X)} = p(X_r) \prod_{i\neq r} p(X_i \mid X_{\text{pa}(i)}) \qquad \text{directed (rooted) tree} \quad (2)$$
$$\phantom{p(X)} = \prod_{\text{nodes } i} p(X_i) \prod_{\text{edges } (ij)} \frac{p(X_i, X_j)}{p(X_i)\, p(X_j)} \qquad \text{pairwise marginals} \quad (3)$$
where (3) requires that $\sum_{X_j} p(X_i, X_j) = p(X_i)$.

The undirected tree representation is not unique: multiplying a factor f_ij(X_i, X_j) by g(X_i) and dividing f_i(X_i) by the same g(X_i) does not change the distribution.

BP can be seen as iteratively replacing f_i(X_i) by the local marginal of p_ij(X_i, X_j), along with the corresponding reparametrisation of f_ij(X_i, X_j). Cf. Hugin propagation. Converged BP on a tree finds p(X_i) and p(X_i, X_j), allowing us to transform (1) into (3).
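That parametrisations (1) and (3) describe the same distribution on a tree is easy to verify by enumeration. A small sketch (3-node chain, random positive factors, all illustrative):

```python
import numpy as np
rng = np.random.default_rng(1)

# A 3-node chain 0 - 1 - 2 with random positive factors (illustrative).
f = {i: rng.uniform(0.5, 1.5, 2) for i in range(3)}
fpair = {(0, 1): rng.uniform(0.5, 1.5, (2, 2)),
         (1, 2): rng.uniform(0.5, 1.5, (2, 2))}

# Parametrisation (1): normalised product of factors.
p = np.zeros((2, 2, 2))
for x in np.ndindex(2, 2, 2):
    p[x] = (f[0][x[0]] * f[1][x[1]] * f[2][x[2]]
            * fpair[(0, 1)][x[0], x[1]] * fpair[(1, 2)][x[1], x[2]])
p /= p.sum()

# Exact single-node and pairwise marginals.
p0, p1, p2 = p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))
p01, p12 = p.sum(axis=2), p.sum(axis=0)

# Parametrisation (3): product of node marginals times pairwise corrections.
q = np.zeros((2, 2, 2))
for x in np.ndindex(2, 2, 2):
    q[x] = (p0[x[0]] * p1[x[1]] * p2[x[2]]
            * p01[x[0], x[1]] / (p0[x[0]] * p1[x[1]])
            * p12[x[1], x[2]] / (p1[x[1]] * p2[x[2]]))

assert np.allclose(p, q)
```

The identity holds for any tree; it fails in general on loopy graphs, which is precisely why loopy BP beliefs are only approximate.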

Reparametrisation on trees

[Figure: an example tree with nodes X_a, ..., X_g and edge factors f_ab, f_ac, f_ad, f_de, f_df, f_dg; after reparametrising edge (ab), the message M_{b→a} multiplies the node term at X_a and divides the edge factor f_ab.]

$$p(X) = \prod_{(ij)} f_{ij}(X_i, X_j) \quad\Rightarrow\quad p(X) = \prod_{i} p(X_i) \prod_{(ij)} \frac{p(X_i, X_j)}{p(X_i)\, p(X_j)}$$

Define $f^0_{ij} = f_{ij}$, $f^0_i = p^0_i = 1$. Iterate over edges (ij):
$$p^n(X_i, X_j) = f^{n-1}_i(X_i)\, f^{n-1}_{ij}(X_i, X_j)\, f^{n-1}_j(X_j)$$
$$f^n_i(X_i) = p^n(X_i) = \sum_{X_j} p^n(X_i, X_j) = f^{n-1}_i(X_i)\, \underbrace{\sum_{X_j} f^{n-1}_{ij}(X_i, X_j)\, f^{n-1}_j(X_j)}_{M_{j\to i}(X_i)}$$
$$f^n_{ij}(X_i, X_j) = \frac{f^{n-1}_{ij}(X_i, X_j)}{M_{j\to i}(X_i)}$$
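The iteration above can be sketched directly. In this toy example (a 3-node chain with made-up pairwise factors, starting from f_i = 1), a few sweeps over the directed edges drive every message to 1, at which point the node terms f_i are proportional to the true marginals, as the slide claims:

```python
import numpy as np

# Reparametrisation sweeps on the chain 0 - 1 - 2 (binary variables;
# pairwise factor values illustrative). Start with f_i = 1.
fpair = {(0, 1): np.array([[1.0, 0.5], [0.5, 2.0]]),
         (1, 2): np.array([[1.0, 3.0], [2.0, 1.0]])}
fi = {i: np.ones(2) for i in range(3)}

def update(j, i):
    """One edge step: absorb M_{j->i} into f_i and divide it out of f_ij."""
    pair = fpair[(i, j)] if (i, j) in fpair else fpair[(j, i)].T  # indexed [X_i, X_j]
    M = pair @ fi[j]                    # M_{j->i}(X_i) = sum_{X_j} f_ij f_j
    fi[i] = fi[i] * M                   # f_i^n = f_i^{n-1} M_{j->i}
    new_pair = pair / M[:, None]        # f_ij^n = f_ij^{n-1} / M_{j->i}(X_i)
    if (i, j) in fpair:
        fpair[(i, j)] = new_pair
    else:
        fpair[(j, i)] = new_pair.T

for _ in range(5):                      # a few sweeps over all directed edges
    for (j, i) in [(0, 1), (1, 0), (2, 1), (1, 2)]:
        update(j, i)

# Check against brute-force marginals of the original distribution.
g = {(0, 1): np.array([[1.0, 0.5], [0.5, 2.0]]),
     (1, 2): np.array([[1.0, 3.0], [2.0, 1.0]])}
p = np.zeros((2, 2, 2))
for x in np.ndindex(2, 2, 2):
    p[x] = g[(0, 1)][x[0], x[1]] * g[(1, 2)][x[1], x[2]]
p /= p.sum()

for i, ax in [(0, (1, 2)), (1, (0, 2)), (2, (0, 1))]:
    assert np.allclose(fi[i] / fi[i].sum(), p.sum(axis=ax))
```

Note the invariance: each step multiplies f_i and divides f_ij by the same M, so the overall product always represents the same p(X); only the parametrisation changes.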
