Approximate Inference: Mean Field Methods




  1. Title slide: School of Computer Science. Approximate Inference: Mean Field Methods. Probabilistic Graphical Models (10-708), Lecture 17, Nov 12, 2007. Eric Xing. Reading: KF, Chap. 12. [Figure: a cellular signaling network used as a running example, with nodes Receptor A (X1), Receptor B (X2), Kinase C (X3), Kinase D (X4), Kinase E (X5), TF F (X6), Gene G (X7), and Gene H (X8).] Recap of open questions from previous lectures: Kalman filters; complex models; LBP as Bethe free-energy minimization.

  2. Approximate Inference: Variational Methods. For a distribution p(X | \theta) associated with a complex graph, computing the marginal (or conditional) probability of arbitrary random variables is intractable. Variational methods formulate probabilistic inference as an optimization problem:

     f^* = \arg\max_{f \in S} F(f)

     where f is, e.g., a (tractable) probability distribution, or the solution to certain probabilistic queries.

  3. Exponential Family. Exponential representation of graphical models:

     p(X \mid \theta) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c) = \exp\Big\{ \sum_\alpha \theta_\alpha \phi_\alpha(X_{D_\alpha}) - A(\theta) \Big\}

     This family includes discrete models, Gaussian, Poisson, exponential, and many others. The quantity E(x) = -\sum_\alpha \theta_\alpha \phi_\alpha(x_{D_\alpha}) is referred to as the energy of state x, so that

     p(X \mid \theta) = \exp\{ -E(X) - A(\theta) \} = \exp\{ -E(x_H, x_E) - A(\theta) \}

     Example: the Boltzmann distribution on an atomic lattice,

     p(X) = \frac{1}{Z} \exp\Big\{ \sum_{i<j} \theta_{ij} X_i X_j + \sum_i \theta_{i0} X_i \Big\}
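To make the Boltzmann notation concrete, here is a minimal sketch (my own illustration, not from the lecture; all names are made up) that evaluates the partition function Z and the log-partition A(\theta) by brute-force enumeration on a tiny \{-1,+1\} spin system:

```python
import itertools
import numpy as np

def energy(x, theta_pair, theta_bias):
    """E(x) = -sum_{i<j} theta_ij x_i x_j - sum_i theta_i0 x_i."""
    e = -np.dot(theta_bias, x)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            e -= theta_pair[i, j] * x[i] * x[j]
    return e

n = 4                                                 # 4 binary spins in {-1, +1}
rng = np.random.default_rng(0)
theta_pair = np.triu(rng.normal(size=(n, n)), k=1)    # couplings theta_ij, i < j
theta_bias = rng.normal(size=n)                       # fields theta_i0

# Brute-force sum over all 2^n states: feasible only for tiny n,
# which is exactly why variational approximations are needed.
states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
Z = sum(np.exp(-energy(s, theta_pair, theta_bias)) for s in states)
print("Z =", Z, " A(theta) =", np.log(Z))
```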

  4. Lower bounds of exponential functions. Since exp is convex, its tangent at any point \mu gives a first-order lower bound,

     \exp(x) \ge \exp(\mu)\,(x - \mu + 1)

     and odd-order Taylor expansions around \mu give tighter bounds, e.g. at third order

     \exp(x) \ge \exp(\mu)\Big( \frac{(x-\mu)^3}{6} + \frac{(x-\mu)^2}{2} + (x-\mu) + 1 \Big)

     Lower bounding the likelihood. Representing q(X_H) by \exp\{-E'(X_H)\}, we have the Lemma: every marginal distribution q(X_H) defines a lower bound of the likelihood,

     p(x_E) \ge \int dx_H\, \exp\{-E'(x_H)\} \big( 1 - A(\theta) - E(x_E, x_H) + E'(x_H) \big)

     where x_E denotes the observed variables (evidence). The bound is upgradeable to higher order [Leisink and Kappen, 2000].
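The first-order bound is just the tangent-line inequality for the convex function exp; a quick numerical check (my own illustration, with an arbitrary choice of \mu):

```python
import numpy as np

mu = 0.5                                  # any expansion point works
x = np.linspace(-2.0, 2.0, 9)
lhs = np.exp(x)
rhs = np.exp(mu) * (x - mu + 1)           # tangent line at mu
assert np.all(lhs >= rhs - 1e-12)         # holds everywhere, tight at x == mu
```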

  5. Lower bounding the likelihood (cont.). Taking expectations under q(X_H) gives, up to a constant C,

     \log p(x_E) \ge C - \int dx_H\, q(x_H)\, E(x_E, x_H) - \int dx_H\, q(x_H) \log q(x_H) = C - \langle E \rangle_q + H_q

     where x_E denotes the observed variables (evidence), \langle E \rangle_q is the expected energy, H_q is the entropy of q, and \langle E \rangle_q - H_q is the Gibbs free energy.

     KL and variational (Gibbs) free energy. The Kullback-Leibler divergence is

     KL(q \,\|\, p) \equiv \sum_z q(z) \ln \frac{q(z)}{p(z)}

     By "Boltzmann's law" (the definition of "energy"), p(z) = \frac{1}{C} \exp[-E(z)], so

     KL(q \,\|\, p) = \sum_z q(z) E(z) + \sum_z q(z) \ln q(z) + \ln C

     The first two terms form the Gibbs free energy G(q), which is minimized exactly when q(Z) = p(Z).
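The identity KL(q || p) = G(q) + ln C is easy to verify numerically; this sketch (my own, with arbitrary made-up energies over six states) checks it exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.normal(size=6)                        # arbitrary energies over 6 states
C = np.exp(-E).sum()                          # normalizer, p(z) = exp(-E(z)) / C
p = np.exp(-E) / C

q = rng.random(6)
q /= q.sum()                                  # an arbitrary distribution q

kl = np.sum(q * np.log(q / p))
G = np.sum(q * E) + np.sum(q * np.log(q))     # expected energy minus entropy
assert np.isclose(kl, G + np.log(C))          # KL(q||p) = G(q) + ln C
```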

  6. KL and Log Likelihood. By Jensen's inequality,

     \ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)} \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} = \langle \ell_c(\theta; x, z) \rangle_q + H_q =: L(q)

     KL and the lower bound of the likelihood: for any q(z),

     \ell(\theta; x) = \log p(x \mid \theta) = \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} + \sum_z q(z) \log \frac{q(z)}{p(z \mid x, \theta)} = L(q) + KL(q \,\|\, p(z \mid x, \theta))

     Setting q(z) = p(z \mid x, \theta) closes the gap (cf. EM).

     A variational representation of probability distributions:

     q^* = \arg\max_{q \in Q} \{ -\langle E \rangle_q + H_q \} = \arg\min_{q \in Q} \{ \langle E \rangle_q - H_q \}

     where Q is the set of realizable distributions, e.g., all valid parameterizations of exponential-family distributions, or marginal polytopes [Wainwright et al., 2003]. Difficulty: H_q is intractable for general q. "Solution": approximate H_q and/or relax or tighten Q.
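The decomposition \ell(\theta; x) = L(q) + KL(q || p(z|x,\theta)) can be checked on a tiny discrete latent-variable model; this sketch (my own construction, using a random joint table) verifies it exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
joint = rng.random((4, 3))
joint /= joint.sum()                 # p(z, x) over 4 latent and 3 observed values

x = 1                                # an observed value of x
p_x = joint[:, x].sum()              # marginal p(x)
post = joint[:, x] / p_x             # posterior p(z | x)

q = rng.random(4)
q /= q.sum()                         # any q(z)
elbo = np.sum(q * np.log(joint[:, x] / q))   # L(q) = E_q[log p(x,z) - log q(z)]
kl = np.sum(q * np.log(q / post))            # KL(q || p(z|x))
assert np.isclose(np.log(p_x), elbo + kl)    # log p(x) = L(q) + KL
```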

  7. Bethe Free Energy / LBP. We do not optimize q(X) explicitly; instead we focus on a set of beliefs, e.g., b = \{ b_{i,j}(x_i, x_j) = \tau_{i,j}(x_i, x_j),\; b_i(x_i) = \tau_i(x_i) \}, and relax the optimization problem:

     approximate objective: H_q \approx H_b, giving the Bethe free energy F_{Bethe}(b) = \langle E \rangle_b - H_b;
     relaxed feasible set: the local consistency polytope M_o \supseteq M, with M_o = \{ \tau \ge 0 \mid \sum_{x_i} \tau_{i,j}(x_i, x_j) = \tau_j(x_j) \};

     b^* = \arg\min_{b \in M_o} F_{Bethe}(b)

     The loopy BP algorithm is a fixed-point iteration procedure that tries to solve for b^* (a sketch of such an iteration follows below).

     Mean field methods. Optimize q(X_H) over a space of tractable families, i.e., subgraphs of G_p over which exact computation of H_q is feasible. This tightens the optimization space:

     exact objective: H_q;
     tightened feasible set: T \subseteq Q;

     q^* = \arg\min_{q \in T} \{ \langle E \rangle_q - H_q \}
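The loopy BP fixed-point iteration mentioned above can be sketched compactly for a pairwise binary MRF (my own illustration, not the lecture's code; `loopy_bp`, the asynchronous schedule, and the random model are all assumptions):

```python
import numpy as np

def loopy_bp(theta, bias, n_iters=100, tol=1e-8):
    """theta: symmetric couplings (zero diagonal); bias: fields; spins in {-1, +1}."""
    n = len(bias)
    vals = np.array([-1.0, 1.0])
    psi_node = np.exp(np.outer(bias, vals))            # node potentials psi_i(x_i)
    msgs = {(i, j): np.ones(2) / 2                     # message m_{i->j}(x_j)
            for i in range(n) for j in range(n) if i != j and theta[i, j] != 0}
    for _ in range(n_iters):
        delta = 0.0
        for (i, j) in list(msgs):
            psi_edge = np.exp(theta[i, j] * np.outer(vals, vals))  # psi_ij(x_i, x_j)
            prod = psi_node[i].copy()        # incoming messages to i, except from j
            for (k, tgt) in msgs:
                if tgt == i and k != j:
                    prod *= msgs[(k, i)]
            new = psi_edge.T @ prod          # sum out x_i
            new /= new.sum()
            delta = max(delta, np.abs(new - msgs[(i, j)]).max())
            msgs[(i, j)] = new
        if delta < tol:                      # reached a fixed point of the updates
            break
    beliefs = psi_node.copy()                # b_i(x_i) from all incoming messages
    for (k, i) in msgs:
        beliefs[i] *= msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(6)
A = rng.normal(scale=0.3, size=(4, 4))
theta = np.triu(A, 1) + np.triu(A, 1).T
print(loopy_bp(theta, rng.normal(scale=0.5, size=4)))  # approximate beliefs b_i
```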

  8. Mean Field Approximation. Cluster-based approximation to the Gibbs free energy (Wiegerinck 2001; Xing et al. 2003, 2004): the exact Gibbs free energy G[p(X)] is intractable, so we work with a cluster-based approximation G[\{q_c(X_c)\}]. [Figure: the original graph versus example cluster decompositions.]

  9. Mean field approximation to the Gibbs free energy. Given a disjoint clustering \{C_1, \ldots, C_I\} of all variables, let

     q(X) = \prod_i q_i(X_{C_i})

     The mean-field free energy is

     G_{MF} = \sum_x \Big( \prod_i q_i(x_{C_i}) \Big) E(x) + \sum_i \sum_{x_{C_i}} q_i(x_{C_i}) \ln q_i(x_{C_i})

     e.g., for singleton clusters (naive mean field),

     G_{MF} = \sum_{i<j} \sum_{x_i, x_j} q_i(x_i)\, q_j(x_j)\, \phi_{ij}(x_i, x_j) + \sum_i \sum_{x_i} q_i(x_i)\, \phi_i(x_i) + \sum_i \sum_{x_i} q_i(x_i) \ln q_i(x_i)

     G_{MF} will never equal the exact Gibbs free energy no matter what clustering is used, but it always defines a lower bound on the likelihood (a numerical check appears after this item). We optimize each q_i(x_{C_i}) by variational calculus, and do inference within each q_i(x_{C_i}) using any tractable algorithm.

     The Generalized Mean Field theorem. Theorem: the optimal GMF approximation to a cluster marginal is isomorphic to the cluster posterior of the original distribution, given internal evidence and its generalized mean fields:

     q_i^*(X_{H,C_i}) = p(X_{H,C_i} \mid x_{E,C_i}, \langle X_{H,MB_i} \rangle_{q_{j \ne i}})

     GMF algorithm: iterate over each q_i.
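For the naive (singleton-cluster) case, the lower-bound property log Z >= -G_MF can be verified by brute force on a small model; a sketch (my own, assuming \{-1,+1\} spins and an arbitrary factorized q):

```python
import itertools
import numpy as np

n = 4
rng = np.random.default_rng(3)
W = np.triu(rng.normal(size=(n, n)), k=1)     # couplings theta_ij, i < j
b = rng.normal(size=n)                        # fields theta_i0

def E(x):
    return -(x @ W @ x + b @ x)   # E(x) = -sum theta_ij x_i x_j - sum theta_i0 x_i

states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
logZ = np.log(sum(np.exp(-E(s)) for s in states))

mu = rng.uniform(0.1, 0.9, size=n)            # factorized q: q_i(X_i = +1) = mu_i
def q_prob(x):
    return np.prod(np.where(x == 1, mu, 1 - mu))

expected_E = sum(q_prob(s) * E(s) for s in states)
entropy = -sum(q_prob(s) * np.log(q_prob(s)) for s in states)
G_MF = expected_E - entropy
assert logZ >= -G_MF                          # mean-field lower bound on log Z
```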

  10. A generalized mean field algorithm [Xing et al., UAI 2003]. [Two figure slides: the GMF algorithm and a worked example; no further text is recoverable from these slides.]

  11. Convergence theorem. Theorem: the GMF algorithm is guaranteed to converge to a local optimum, and provides a lower bound for the likelihood of evidence (or partition function) of the model.

     The naive mean field approximation. Approximate p(X) by the fully factorized q(X) = \prod_i q_i(X_i). For the Boltzmann distribution p(X) = \exp\{ \sum_{i<j} \theta_{ij} X_i X_j + \sum_i \theta_{i0} X_i \} / Z, the mean field equation is

     q_i(X_i) = p(X_i \mid \{ \langle X_j \rangle_{q_j} : j \in N_i \}) \propto \exp\Big\{ \theta_{i0} X_i + \sum_{j \in N_i} \theta_{ij} X_i \langle X_j \rangle_{q_j} \Big\}

     which mirrors the Gibbs predictive distribution

     p(X_i \mid \{ x_j : j \in N_i \}) = \exp\Big\{ \theta_{i0} X_i + \sum_{j \in N_i} \theta_{ij} X_i x_j - A_i \Big\}

     Here \langle X_j \rangle_{q_j} resembles a "message" sent from node j to node i, and \{ \langle X_j \rangle_{q_j} : j \in N_i \} forms the "mean field" applied to X_i from its neighborhood N_i.
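A minimal sketch of the resulting fixed-point iteration (my own implementation; for \{-1,+1\} spins the update reduces to \langle X_i \rangle = \tanh(\theta_{i0} + \sum_j \theta_{ij} \langle X_j \rangle)):

```python
import numpy as np

def naive_mean_field(theta, bias, n_iters=200, tol=1e-8):
    """theta: symmetric (n, n) couplings, zero diagonal; bias: (n,) fields."""
    n = len(bias)
    m = np.zeros(n)                          # current mean parameters <X_i>
    for _ in range(n_iters):
        m_old = m.copy()
        for i in range(n):                   # asynchronous coordinate updates
            m[i] = np.tanh(bias[i] + theta[i] @ m)
        if np.max(np.abs(m - m_old)) < tol:
            break
    return m

rng = np.random.default_rng(4)
A = rng.normal(scale=0.3, size=(5, 5))
theta = np.triu(A, 1) + np.triu(A, 1).T      # symmetric, zero diagonal
bias = rng.normal(scale=0.5, size=5)
print("mean-field marginals <X_i>:", naive_mean_field(theta, bias))
```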

  12. Generalized MF approximation to Ising models. The cluster marginal of a square block C_k is

     q(X_{C_k}) \propto \exp\Big\{ \sum_{i \in C_k} \theta_{i0} X_i + \sum_{i,j \in C_k} \theta_{ij} X_i X_j + \sum_{i \in C_k,\, j \in MB_{C_k}} \theta_{ij} X_i \langle X_j \rangle_{q(X_{C_{k'}})} \Big\}

     which is virtually a reparameterized Ising model of small size. [Figure: GMF with 4x4 blocks, GMF with 2x2 blocks, and BP compared on Ising grids with attractive coupling (positively weighted edges) and repulsive coupling (negatively weighted edges).]
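A rough sketch of the block-GMF update (my own construction, assuming a 4x4 grid, uniform coupling theta, and disjoint 2x2 blocks): each block is treated as the small Ising model above, solved exactly by enumerating its 2^4 configurations, with boundary terms using the current means from neighboring blocks:

```python
import itertools
import numpy as np

side, B = 4, 2                               # 4x4 Ising grid, disjoint 2x2 blocks
rng = np.random.default_rng(5)
theta = 0.4                                  # uniform coupling on grid edges
bias = rng.normal(scale=0.2, size=(side, side))
m = np.zeros((side, side))                   # current means <X_i>

def neighbors(r, c):
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < side and 0 <= cc < side:
            yield rr, cc

block_states = [np.array(s).reshape(B, B)
                for s in itertools.product([-1, 1], repeat=B * B)]

for _ in range(50):                          # GMF fixed-point sweeps
    for br in range(0, side, B):
        for bc in range(0, side, B):
            cells = [(br + r, bc + c) for r in range(B) for c in range(B)]
            inblock = set(cells)
            logw = np.zeros(len(block_states))
            for si, s in enumerate(block_states):
                for (r, c) in cells:
                    x = s[r - br, c - bc]
                    logw[si] += bias[r, c] * x
                    for rr, cc in neighbors(r, c):
                        if (rr, cc) in inblock:
                            if (rr, cc) > (r, c):           # each internal edge once
                                logw[si] += theta * x * s[rr - br, cc - bc]
                        else:                               # mean field from other blocks
                            logw[si] += theta * x * m[rr, cc]
            w = np.exp(logw - logw.max())
            w /= w.sum()                                    # exact block distribution
            for (r, c) in cells:                            # exact block marginals
                m[r, c] = sum(wi * s[r - br, c - bc]
                              for wi, s in zip(w, block_states))
print(np.round(m, 3))                        # approximate means <X_i> on the grid
```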
