Graphical Models
Loopy BP and Bethe Free Energy
Siamak Ravanbakhsh Winter 2018
Learning objective

loopy belief propagation and its variational derivation: the Bethe approximation
So far...

exact inference: variable elimination, equivalent to belief propagation (BP) in a clique tree

What if exact inference is too expensive (i.e., the tree-width is large)?
continue to use BP anyway: loopy BP
why is this a good idea? we answer using the variational interpretation
sum-product BP message update ($C_i$ is a cluster/clique, $S_{i,j}$ a sepset):

$\delta_{i \to j}(S_{i,j}) = \sum_{C_i - S_{i,j}} \psi_i(C_i) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(S_{i,k})$

messages pass from the leaves towards the root, then back to the leaves

marginal (belief) for each cluster:

$p(C_i) \propto \beta_i(C_i) = \psi_i(C_i) \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(S_{i,k})$
Example: pairwise potentials $\phi_{i,j}(x_i, x_j)$, tree-width = 1

(figure: a tree over $x_1, \dots, x_6$)

what are the sepsets?
a different valid clique-tree is also possible: check the running intersection property
pairwise potentials: message update

$\delta_{i \to j}(x_j) = \sum_{x_i} \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i)$

from the leaves towards a root, then back to the leaves

marginal (belief) for each variable:

$p(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i)$

and for each edge:

$p(x_i, x_j) \propto \phi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i) \prod_{k \in \mathrm{Nb}_j - i} \delta_{k \to j}(x_j)$
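To make the update concrete, here is a minimal sketch of sum-product BP on a chain $x_1 - x_2 - x_3$ (the simplest tree); the toy potentials and all names are illustrative, not from the lecture:

```python
import numpy as np

d = 2                                   # domain size of each variable
rng = np.random.default_rng(0)
# phi[e] is the pairwise potential on edge e between x_{e+1} and x_{e+2}
phi = [rng.uniform(0.5, 2.0, size=(d, d)) for _ in range(2)]

# forward pass: messages from x1 towards x3
fwd = [np.ones(d)]                      # trivial message into x1
for e in range(2):
    m = phi[e].T @ fwd[-1]              # sum_{x_i} phi(x_i, x_j) * incoming(x_i)
    fwd.append(m / m.sum())             # normalize for numerical stability

# backward pass: messages from x3 towards x1
bwd = [np.ones(d)]                      # trivial message into x3
for e in reversed(range(2)):
    m = phi[e] @ bwd[0]                 # sum_{x_j} phi(x_i, x_j) * incoming(x_j)
    bwd.insert(0, m / m.sum())

# belief = product of all incoming messages, normalized; exact on a tree
for i in range(3):
    b = fwd[i] * bwd[i]
    print(f"p(x{i + 1}) =", b / b.sum())
```

The matrix-vector products implement the sum over $x_i$; on a tree these beliefs are the exact marginals.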
the graphical model represents $p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)$

writing it in terms of marginals:

$p(x) = \frac{\prod_{i,j \in E} p_{i,j}(x_i, x_j)}{\prod_i p_i(x_i)^{|\mathrm{Nb}_i| - 1}}$

why is this correct? the denominator is adjusting for double-counts;
substitute the marginals using BP messages to get (*)
BP as I-projection

$p(x) = \frac{1}{Z} \prod_{i,j \in E} \phi_{i,j}(x_i, x_j)$

write $q$ in terms of the marginals of interest:

$q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|\mathrm{Nb}_i| - 1}}$

$\arg\min_q D(q \| p)$: the minimization gives us the marginals $q_{i,j}, q_i$

$D(q \| p) = \sum_x q(x) (\ln q(x) - \ln p(x)) = -H(q) - E_q\big[\sum_{i,j} \ln \phi_{i,j}(x_i, x_j)\big] + \ln Z$

ignoring $\ln Z$ (it does not depend on $q$), the I-projection is equivalent to

$\arg\max_q H(q) + E_q\big[\sum_{i,j} \ln \phi_{i,j}(x_i, x_j)\big]$   (the variational free energy)

the free energy is a lower bound on $\ln Z$
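One step worth spelling out: the lower bound follows directly from non-negativity of the KL divergence, restating the identities above:

```latex
D(q \| p) = -H(q) - E_q\Big[\sum_{i,j} \ln \phi_{i,j}(x_i, x_j)\Big] + \ln Z \;\ge\; 0
\quad \Longrightarrow \quad
\ln Z \;\ge\; H(q) + E_q\Big[\sum_{i,j} \ln \phi_{i,j}(x_i, x_j)\Big]
```

with equality iff $q = p$.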
$\arg\min_q D(q \| p) \;\equiv\; \arg\max_q H(q) + E_q\big[\sum_{i,j} \ln \phi_{i,j}(x_i, x_j)\big]$

so far we did not use the decomposed form of $q$;
as written, both the entropy and the energy involve summation over exponentially many terms

substituting $q(x) = \frac{\prod_{i,j \in E} q_{i,j}(x_i, x_j)}{\prod_i q_i(x_i)^{|\mathrm{Nb}_i| - 1}}$, the entropy decomposes as

$H(q) = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i)$

(this follows from the decomposition of $q$) and the energy becomes

$E_q\big[\sum_{i,j} \ln \phi_{i,j}\big] = \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$
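As a concrete check, the decomposed objective is cheap to evaluate given pseudo-marginals. A sketch; the data structures and names are assumptions, not the course's code:

```python
import numpy as np

def bethe_objective(q_i, q_ij, phi, edges, nbrs):
    """q_i: {i: (d,) singleton marginal}, q_ij: {(i,j): (d,d) pairwise marginal},
    phi: {(i,j): (d,d) potential}, nbrs: {i: list of neighbors of i}."""
    H = lambda p: -np.sum(p * np.log(np.clip(p, 1e-12, None)))  # entropy
    entropy = sum(H(q_ij[e]) for e in edges) \
            - sum((len(nbrs[i]) - 1) * H(q_i[i]) for i in q_i)
    energy = sum(np.sum(q_ij[e] * np.log(phi[e])) for e in edges)
    return entropy + energy   # Bethe form of H(q) + E_q[sum ln phi]
```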
the marginals $q_{i,j}, q_i$ should be "valid":

$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E,\; \forall x_j$

a real distribution with these marginals should exist (the marginal polytope);
for tree graphical models, this local consistency is enough

the resulting optimization over locally consistent marginal distributions:

$\arg\max_{\{q\}} \; \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i) + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)$

subject to:
$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E,\; \forall x_j$
$q_{i,j}(x_i, x_j) \ge 0 \quad \forall i,j \in E,\; \forall x_i, x_j$
$\sum_{x_i} q_i(x_i) = 1 \quad \forall i$

the BP update is derived from the "fixed points" of the Lagrangian;
BP messages are (the exponential form of) the Lagrange multipliers
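A hedged sketch of that derivation (the multiplier notation $\lambda$, the omitted symmetric and normalization terms, and the sign conventions are assumptions; constants are absorbed into $\propto$):

```latex
% attach a multiplier \lambda_{i \to j}(x_j) to each consistency constraint:
L = \sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i)
  + \sum_{i,j \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j) \ln \phi_{i,j}(x_i, x_j)
  + \sum_{i,j \in E} \sum_{x_j} \lambda_{i \to j}(x_j)
      \Big( q_j(x_j) - \sum_{x_i} q_{i,j}(x_i, x_j) \Big) + \cdots

% stationarity in q_{i,j} gives
\frac{\partial L}{\partial q_{i,j}(x_i, x_j)} = 0
\;\Longrightarrow\;
q_{i,j}(x_i, x_j) \propto \phi_{i,j}(x_i, x_j)\,
  e^{-\lambda_{j \to i}(x_i)}\, e^{-\lambda_{i \to j}(x_j)}
```

Identifying the messages $\delta$ with exponentiated (combinations of) multipliers and substituting back into the consistency constraints recovers the sum-product update.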
We can still apply the BP update on a loopy graph:

$\delta_{i \to j}(x_j) \propto \sum_{x_i} \psi_{i,j}(x_i, x_j) \prod_{k \in \mathrm{Nb}_i - j} \delta_{k \to i}(x_i)$

"proportional to": normalize the message for numerical stability
update the messages synchronously or sequentially
it may not converge (oscillating behavior)
even when convergent, it only gives an approximation:

$\hat{p}(x_i) \propto \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(x_i)$

is not (proportional to) the exact marginal $p(x_i)$
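A minimal sketch of loopy BP on the smallest loopy pairwise model, a 3-cycle, with synchronous normalized updates; the toy potentials are illustrative. Comparing against brute-force marginals shows the converged beliefs are only approximate:

```python
import itertools
import numpy as np

d, edges = 2, [(0, 1), (1, 2), (2, 0)]
rng = np.random.default_rng(1)
phi = {e: rng.uniform(0.5, 2.0, size=(d, d)) for e in edges}
phi.update({(j, i): phi[(i, j)].T for (i, j) in edges})   # both directions

msg = {(i, j): np.ones(d) / d for (i, j) in phi}          # delta_{i -> j}
for _ in range(100):                                      # synchronous sweeps
    new = {}
    for (i, j) in msg:
        inc = np.ones(d)                                  # prod over Nb_i - j
        for (k, i2) in msg:
            if i2 == i and k != j:
                inc *= msg[(k, i)]
        m = phi[(i, j)].T @ inc                           # sum over x_i
        new[(i, j)] = m / m.sum()                         # normalize
    msg = new

# loopy-BP beliefs vs. exact marginals by brute force
for i in range(3):
    b = np.prod([msg[(k, i)] for k in range(3) if k != i], axis=0)
    p = np.zeros(d)
    for x in itertools.product(range(d), repeat=3):
        p[x[i]] += np.prod([phi[e][x[e[0]], x[e[1]]] for e in edges])
    print(f"x{i + 1}: BP", b / b.sum(), " exact", p / p.sum())
```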
Factor graphs

$p(x) = \frac{1}{Z} \prod_I \psi_I(x_I)$, where each $I \subseteq \{1, \dots, N\}$ is a subset of variables

(figure: variable nodes $x_1, \dots, x_5$ and factor nodes, e.g. $\psi_{\{1,2,3\}}$ and $\psi_{\{3,5\}}$)

variable-to-factor message:

$\delta_{i \to I}(x_i) \propto \prod_{J \mid i \in J, J \neq I} \delta_{J \to i}(x_i)$

factor-to-variable message:

$\delta_{I \to i}(x_i) \propto \sum_{x_{I - i}} \psi_I(x_I) \prod_{j \in I - i} \delta_{j \to I}(x_j)$

after convergence:

$\hat{p}(x_i) \propto \prod_{J \mid i \in J} \delta_{J \to i}(x_i)$

cost of the variable-to-factor messages (from each variable to all of its neighbors): $n\, d\, \Delta_{\max}^2$
(n: number of variables, d: domain size (2 for binary), $\Delta_{\max}$: max number of neighbors)

cost of the factor-to-variable messages: $m\, |\mathrm{Scope}_{\max}|\, d^{|\mathrm{Scope}_{\max}|}$
(m: number of factors, $|\mathrm{Scope}_{\max}|$: max number of variables in a factor)
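A sketch of these factor-graph updates on a small model in the spirit of the figure; the third factor over $x_4, x_5$, the random potentials, and all bookkeeping are assumptions added so that every variable has a neighbor:

```python
import numpy as np

d = 2
rng = np.random.default_rng(2)
# factor name -> (scope, table); 0-based indices (x1 -> 0, ..., x5 -> 4)
factors = {"A": ((0, 1, 2), rng.uniform(0.5, 2.0, (d, d, d))),  # psi_{1,2,3}
           "B": ((2, 4), rng.uniform(0.5, 2.0, (d, d))),        # psi_{3,5}
           "C": ((3, 4), rng.uniform(0.5, 2.0, (d, d)))}        # assumed extra

v2f = {(i, F): np.ones(d) / d for F, (sc, _) in factors.items() for i in sc}
f2v = {(F, i): np.ones(d) / d for F, (sc, _) in factors.items() for i in sc}

for _ in range(20):
    # factor-to-variable: multiply incoming messages, sum out scope - {i}
    for F, (scope, table) in factors.items():
        for ax, i in enumerate(scope):
            t = table.copy()
            for ax2, j in enumerate(scope):
                if j != i:                       # multiply in delta_{j -> I}
                    shape = [1] * len(scope)
                    shape[ax2] = d
                    t = t * v2f[(j, F)].reshape(shape)
            m = t.sum(axis=tuple(a for a in range(len(scope)) if a != ax))
            f2v[(F, i)] = m / m.sum()
    # variable-to-factor: product of the other factors' messages into i
    for (i, F) in v2f:
        m = np.ones(d)
        for (G, i2) in f2v:
            if i2 == i and G != F:
                m = m * f2v[(G, i2)]
        v2f[(i, F)] = m / m.sum()

# approximate marginals after convergence
for i in range(5):
    b = np.ones(d)
    for (F, i2) in f2v:
        if i2 == i:
            b = b * f2v[(F, i2)]
    print(f"p_hat(x{i + 1}) =", b / b.sum())
```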
Applications

Social network analysis: stochastic block modelling (image: https://graph-tool.skewed.de)
Machine learning: clustering, tensor factorization
Vision: inpainting & denoising, stereo matching (image: www.jianxiongxiao.com)
NLP and bioinformatics: Viterbi algorithm
Combinatorial problems: e.g., decoding low-density parity check codes (next)
Example: low-density parity check (LDPC) codes

message bits $x_1, \dots, x_n$ are sent through a noisy channel and $y_1, \dots, y_n$ are observed:

$p(y_i = 1 \mid x_i = 1) = p(y_i = 0 \mid x_i = 0) = 1 - \epsilon$

the message satisfies parity constraints, giving a joint distribution over the unobserved message:

$p(x \mid y) \propto \prod_{s,t,u} \psi(x_s, x_t, x_u) \; \prod_{i=1}^{n} \big[ (1 - \epsilon)\, I(x_i = y_i) + \epsilon\, I(x_i \neq y_i) \big]$

(image: Wainwright & Jordan)

inference problems:
- most likely joint assignment: $x^* = \arg\max_x p(x \mid y)$
- max-marginals: $x_i^* = \arg\max_{x_i} p(x_i \mid y)$
- marginals $p(x_i \mid y) \;\forall i$, calculated using loopy BP
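A toy instance of this posterior; the 6-bit code, the two parity triples, the received word, and $\epsilon = 0.1$ are assumptions. Brute force stands in for loopy BP here, just to make the inference problems concrete:

```python
import itertools
import numpy as np

eps, n = 0.1, 6
parity_triples = [(0, 1, 2), (3, 4, 5)]             # scopes of psi(x_s,x_t,x_u)
psi = lambda a, b, c: float((a ^ b ^ c) == 0)       # even-parity indicator

y = np.array([1, 0, 1, 0, 0, 1])                    # received (noisy) bits

def posterior(x):                                   # unnormalized p(x | y)
    p = np.prod([psi(x[s], x[t], x[u]) for s, t, u in parity_triples])
    p *= np.prod([(1 - eps) if x[i] == y[i] else eps for i in range(n)])
    return p

# brute force; on real LDPC codes this is exactly where loopy BP is used
xs = list(itertools.product((0, 1), repeat=n))
ps = np.array([posterior(np.array(x)) for x in xs])
x_star = xs[int(np.argmax(ps))]                     # most likely joint assignment
marg = np.array([sum(p for x, p in zip(xs, ps) if x[i] == 1) for i in range(n)])
print("x* =", x_star, "  p(x_i = 1 | y) =", marg / ps.sum())
```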
Back to the objective $\arg\max_q H(q) + E_q\big[\sum_{i,j} \ln \phi_{i,j}(x_i, x_j)\big]$ on a loopy graph:

the entropy term
$\sum_{i,j \in E} H(q_{i,j}) - \sum_i (|\mathrm{Nb}_i| - 1) H(q_i)$
is not exact anymore; it is called the Bethe approximation to the entropy,
and the objective is generally not convex anymore (multiple fixed points)

the local consistency constraints
$\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j) \quad \forall i,j \in E,\; \forall x_j$
are also inadequate: locally consistent $q_{i,j}, q_i$ may not be the marginals of any joint distribution;
i.e., the local consistency polytope $\mathcal{L}$ is an outer bound on the marginal polytope
(a pseudo-marginal vector $[q_1, \dots, q_n, q_{1,3}, \dots, q_{m,n}]$ in $\mathcal{L}$ may not match any true marginal vector $[p_1, \dots, p_n, p_{1,3}, \dots, p_{m,n}]$)

possible improvements:
- better entropy approximations (e.g., region-based, convex)
- tighter constraints (e.g., marginal consistency over larger clusters)
Cluster-graphs

a cluster-graph generalizes the clique-tree:
- clusters are not necessarily max-cliques
- running intersection property
- family-preserving property
- $S_{i,j} \subseteq C_i \cap C_j$ (instead of $=$ in a clique-tree)

similar reparametrization (again with $\subseteq$ instead of $=$ in a clique-tree):

$p(x) \propto \frac{\prod_i \hat{p}(C_i)}{\prod_{i,j} \hat{p}(S_{i,j})}$

example: a factor-graph over A, B, C, D, E, F;
its corresponding cluster-graph (yielding the same BP updates);
and an improved cluster-graph (better entropy approximation + marginal constraints)
In practice, loopy BP works well when:
- the graph is locally tree-like
- the graph is dense with weak interactions

sequential updates often work better than parallel (synchronous) updates

convergence is improved by damping (smoothing) the update:

$\delta^{(t+1)}_{i \to I}(x_i) \propto (1 - \alpha)\, \delta^{(t)}_{i \to I}(x_i) + \alpha \prod_{J \mid i \in J, J \neq I} \delta^{(t)}_{J \to i}(x_i)$

(figure: convergence on an 11 x 11 Ising grid)
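A sketch of the damped step; the array names are placeholders for the previous message $\delta^{(t)}_{i \to I}$ and the freshly computed right-hand side of the plain BP update:

```python
import numpy as np

def damped_update(old_msg, new_msg, alpha=0.5):
    """Convex combination of the previous message and the fresh BP update;
    alpha = 1 recovers the undamped update."""
    m = (1.0 - alpha) * old_msg + alpha * new_msg
    return m / m.sum()                  # keep the message normalized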
Summary

belief propagation: efficient deterministic inference
- exact in a clique-tree (= variable elimination); an application of the distributive law
- a KL-divergence minimization
- works well in (cluster) graphs with loops (large tree-width), using an approximate objective (the Bethe free energy) and approximate constraints