
Probabilistic & Unsupervised Learning: Belief Propagation



1. Probabilistic & Unsupervised Learning: Belief Propagation. Maneesh Sahani (maneesh@gatsby.ucl.ac.uk), Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept. of Computer Science, University College London. Term 1, Autumn 2016.

2. Recall: Belief Propagation on undirected trees

Joint distribution of an undirected tree:
$$p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$

Messages computed recursively:
$$M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

Marginal distributions:
$$p(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i)$$
$$p(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

3. Loopy Belief Propagation

Joint distribution of an undirected graph:
$$p(X) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} f_{ij}(X_i, X_j)$$

Messages computed recursively (with few guarantees of convergence):
$$M_{j \to i}(X_i) := \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$

Marginal distributions are approximate in general:
$$p(X_i) \approx b_i(X_i) \propto f_i(X_i) \prod_{k \in \mathrm{ne}(i)} M_{k \to i}(X_i)$$
$$p(X_i, X_j) \approx b_{ij}(X_i, X_j) \propto f_{ij}(X_i, X_j)\, f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
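Not part of the slides: a minimal numerical sketch of the message recursion and beliefs above (the same update as on trees, simply iterated on a loopy graph), in Python/NumPy. The 3-node cycle, the random potentials, and the fixed sweep count are hypothetical placeholders, not anything from the lecture.

```python
import numpy as np

# Hypothetical example: a 3-node cycle (the smallest loopy graph) with binary variables.
K = 2                                                 # states per variable
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
edges = [(0, 1), (1, 2), (0, 2)]

rng = np.random.default_rng(0)
f_node = {i: rng.random(K) + 0.1 for i in neighbours}        # node factors f_i(X_i)
f_edge = {e: rng.random((K, K)) + 0.1 for e in edges}        # edge factors f_ij(X_i, X_j)

def edge_pot(i, j):
    """Return f_ij indexed as [X_i, X_j], whichever way the edge was stored."""
    return f_edge[(i, j)] if (i, j) in f_edge else f_edge[(j, i)].T

# M[(j, i)] stands for the message M_{j -> i}(X_i); initialise uniformly.
M = {(j, i): np.ones(K) / K for i in neighbours for j in neighbours[i]}

for sweep in range(50):                 # fixed sweeps; convergence is not guaranteed on loopy graphs
    for (j, i) in list(M):
        # M_{j->i}(X_i) := sum_{X_j} f_ij(X_i, X_j) f_j(X_j) prod_{l in ne(j)\i} M_{l->j}(X_j)
        incoming = f_node[j].copy()
        for l in neighbours[j]:
            if l != i:
                incoming = incoming * M[(l, j)]
        msg = edge_pot(i, j) @ incoming  # sum over X_j
        M[(j, i)] = msg / msg.sum()      # normalise for numerical stability

# Approximate marginals (beliefs): b_i(X_i) proportional to f_i(X_i) prod_{k in ne(i)} M_{k->i}(X_i)
for i in neighbours:
    b = f_node[i].copy()
    for k in neighbours[i]:
        b = b * M[(k, i)]
    print(f"b_{i} =", b / b.sum())
```

Messages are normalised after each update purely for numerical stability; this does not change the resulting beliefs.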

4. Dealing with loops

◮ Accuracy: BP posterior marginals are approximate on all non-trees because evidence is over-counted, but converged approximations are frequently found to be good (particularly in their means).
◮ Convergence: no general guarantee, but BP does converge in some cases:
  ◮ Trees.
  ◮ Graphs with a single loop.
  ◮ Distributions with sufficiently weak interactions.
  ◮ Graphs with long (and weak) loops.
  ◮ Gaussian networks: means correct, variances may also converge.
◮ Damping: a common approach to encourage convergence (cf. EP); see the sketch after this slide:
$$M^{\text{new}}_{i \to j}(X_j) := (1 - \alpha)\, M^{\text{old}}_{i \to j}(X_j) + \alpha \sum_{X_i} f_{ij}(X_i, X_j)\, f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i)$$
◮ Grouping variables: variables can be grouped into cliques to improve accuracy.
  ◮ Region graph approximations.
  ◮ Cluster variational method.
  ◮ Junction graph.
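A minimal sketch of the damped update, reusing the hypothetical names (M, msg, alpha) from the loopy-BP sketch above; alpha is a free parameter, and alpha = 1 recovers the undamped update.

```python
alpha = 0.5                                   # damping weight (hypothetical choice)
# Inside the sweep above, replace the plain assignment with a damped blend:
# M^new := (1 - alpha) * M^old + alpha * (standard BP update)
undamped = msg / msg.sum()
M[(j, i)] = (1 - alpha) * M[(j, i)] + alpha * undamped
```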

5. Different Interpretations of Loopy Belief Propagation

Loopy BP can be interpreted as a fixed-point algorithm from a few different perspectives:
◮ Expectation propagation.
◮ Tree-based reparametrization.
◮ Bethe free energy.


6. Loopy BP as message-based Expectation Propagation

Approximate each pairwise factor $f_{ij}$ by a product of messages:
$$f_{ij}(X_i, X_j) \approx \tilde{f}_{ij}(X_i, X_j) = M_{i \to j}(X_j)\, M_{j \to i}(X_i)$$
Thus, the full joint is approximated by a factorised distribution:
$$p(X) \approx \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{\text{edges } (ij)} \tilde{f}_{ij}(X_i, X_j) = \frac{1}{Z} \prod_{\text{nodes } i} f_i(X_i) \prod_{j \in \mathrm{ne}(i)} M_{j \to i}(X_i) = \prod_{\text{nodes } i} b_i(X_i)$$
but with multiple factors for most $X_i$.

7. Loopy BP as message-based EP

(Figure: a single edge $(i, j)$ of the graph, showing $X_i$ and $X_j$ with messages arriving from their other neighbours.)

Then the EP updates to the messages are:
◮ Deletion (remove the approximate factor $\tilde{f}_{ij}$ from the approximation):
$$q^{\neg ij}(X) = f_i(X_i)\, f_j(X_j) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \prod_{s \neq i,j} f_s(X_s) \prod_{t \in \mathrm{ne}(s)} M_{t \to s}(X_s)$$
◮ Projection (writing $q^{\neg ij}(X_i, X_j)$ for the corresponding marginal on $X_i, X_j$):
$$\{ M^{\text{new}}_{i \to j}, M^{\text{new}}_{j \to i} \} = \operatorname*{argmin}\; \mathrm{KL}\big[\, f_{ij}(X_i, X_j)\, q^{\neg ij}(X_i, X_j) \,\big\|\, M_{j \to i}(X_i)\, M_{i \to j}(X_j)\, q^{\neg ij}(X_i, X_j) \,\big]$$

Now, $q^{\neg ij}(\cdot)$ factorises ⇒ the right-hand side factorises ⇒ the minimum is achieved by the marginals of $f_{ij}(\cdot)\, q^{\neg ij}(\cdot)$:
$$M^{\text{new}}_{j \to i}(X_i)\, q^{\neg ij}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j) \Big[ f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j) \Big] \underbrace{\Big[ f_i(X_i) \prod_{k \in \mathrm{ne}(i) \setminus j} M_{k \to i}(X_i) \Big]}_{q^{\neg ij}(X_i)}$$
$$\Rightarrow\; M^{\text{new}}_{j \to i}(X_i) = \sum_{X_j} f_{ij}(X_i, X_j)\, f_j(X_j) \prod_{l \in \mathrm{ne}(j) \setminus i} M_{l \to j}(X_j)$$
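A brief check of the projection step (a standard property of KL divergence with factorised targets, not stated explicitly on the slides): for any joint $p(X_i, X_j)$ and factorised candidate $r_i(X_i)\, r_j(X_j)$,
$$\mathrm{KL}\big[\, p(X_i, X_j) \,\big\|\, r_i(X_i)\, r_j(X_j) \,\big] = \mathrm{KL}\big[\, p(X_i) \,\big\|\, r_i(X_i) \,\big] + \mathrm{KL}\big[\, p(X_j) \,\big\|\, r_j(X_j) \,\big] + \text{const},$$
where the constant (the mutual information under $p$) does not depend on $r_i, r_j$. The minimum is therefore attained at $r_i(X_i) = p(X_i)$ and $r_j(X_j) = p(X_j)$, which is the marginal matching used above with $p \propto f_{ij}\, q^{\neg ij}$ and $r_i \propto M^{\text{new}}_{j \to i}\, q^{\neg ij}(X_i)$, $r_j \propto M^{\text{new}}_{i \to j}\, q^{\neg ij}(X_j)$.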

8. Message-based EP

◮ Thus message-based EP in a loopy graph need not be seen as two separate approximations, one to the sites and one to the cavity (as we had in the EP lecture).
◮ Instead, we can see it as a more severe constraint on the approximate sites: not just to an ExpFam factor, but to a product of ExpFam messages.
◮ On a tree-structured graph the message-factored version of EP finds the same marginals as standard EP.
◮ Messages are calculated in exactly the same way as before (cf. NLSSM).
◮ Pairwise marginals can be found after convergence by computing $\tilde{P}(y_{i-1}, y_i)$ as required (cf. forward-backward for HMMs); see the sketch after this slide.
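Not part of the slides: a minimal sketch of reading out a pairwise belief from converged messages, reusing the hypothetical names (edge_pot, f_node, neighbours, M) from the loopy-BP sketch above; it evaluates $b_{ij}(X_i, X_j) \propto f_{ij}\, f_i\, f_j$ times the messages into $i$ and $j$ from their other neighbours, analogous to the pairwise smoothed marginals in HMM forward-backward.

```python
# Pairwise belief on a hypothetical edge (i, j) after the messages have converged.
i, j = 0, 1
b_ij = edge_pot(i, j) * np.outer(f_node[i], f_node[j])   # f_ij(X_i, X_j) f_i(X_i) f_j(X_j)
for k in neighbours[i]:
    if k != j:
        b_ij = b_ij * M[(k, i)][:, None]                 # messages into i, excluding j
for l in neighbours[j]:
    if l != i:
        b_ij = b_ij * M[(l, j)][None, :]                 # messages into j, excluding i
b_ij = b_ij / b_ij.sum()                                 # approximate p(X_i, X_j)
```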
