Bayesian networks: approximate inference
Machine Intelligence
Thomas D. Nielsen, September 2008



  1. Bayesian networks: approximate inference. Machine Intelligence. Thomas D. Nielsen, September 2008.

  2. Motivation
     Because of the (worst-case) intractability of exact inference in Bayesian networks, try to find more efficient approximate inference techniques: instead of computing the exact posterior P(A | E = e), compute an approximation P̂(A | E = e) with P̂(A | E = e) ≈ P(A | E = e).

  3. Absolute/Relative Error
     For p, p̂ ∈ [0, 1]:
     - p̂ is an approximation for p with absolute error ≤ ε if |p − p̂| ≤ ε, i.e. p̂ ∈ [p − ε, p + ε].
     - p̂ is an approximation for p with relative error ≤ ε if |1 − p̂/p| ≤ ε, i.e. p̂ ∈ [p(1 − ε), p(1 + ε)].
     This definition is not always fully satisfactory, because it is not symmetric in p and p̂ and not invariant under the transition p → (1 − p), p̂ → (1 − p̂). Use with care!
     - When p̂₁, p̂₂ are approximations for p₁, p₂ with absolute error ≤ ε, then no error bounds follow for p̂₁/p̂₂ as an approximation for p₁/p₂.
     - When p̂₁, p̂₂ are approximations for p₁, p₂ with relative error ≤ ε, then p̂₁/p̂₂ approximates p₁/p₂ with relative error ≤ 2ε/(1 + ε).
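
A minimal numeric illustration of the first ratio claim, with hypothetical values: both estimates respect an absolute-error bound of 0.01, yet the ratio estimate is off by a factor of about three, because the true p₂ sits close to zero.

```python
# Absolute-error guarantees give no control over ratios.
# Hypothetical values chosen so that p2 is tiny relative to the error bound.
eps = 0.01
p1, p1_hat = 0.5, 0.49     # |p1 - p1_hat| = 0.01 <= eps
p2, p2_hat = 0.005, 0.015  # |p2 - p2_hat| = 0.01 <= eps, but p2 is near zero

print(p1 / p2)          # true ratio: 100.0
print(p1_hat / p2_hat)  # estimated ratio: ~32.7, wildly off
```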

  4. Randomized Methods
     Most methods for approximate inference are randomized algorithms that compute approximations P̂ from random samples of instantiations. We shall consider:
     - Forward sampling
     - Likelihood weighting
     - Gibbs sampling
     - The Metropolis-Hastings algorithm

  5. Forward Sampling
     Observation: a Bayesian network can be used as a random generator that produces full instantiations V = v according to the distribution P(V).
     Example for the network A → B with CPTs
       P(A):      t: 0.2,  f: 0.8
       P(B | A):  A = t → (t: 0.7, f: 0.3),  A = f → (t: 0.4, f: 0.6)
     - Generate random numbers r₁, r₂ uniformly from [0, 1].
     - Set A = t if r₁ ≤ 0.2 and A = f otherwise.
     - Depending on the value of A and r₂, set B to t or f.
     Generation of one random instantiation is linear in the size of the network.
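
The three steps above translate directly into code; here is a short Python sketch of the generator (the code and names are ours, not from the deck):

```python
import random

# Forward sampling for the two-node network A -> B from the slide:
# P(A=t) = 0.2, P(B=t | A=t) = 0.7, P(B=t | A=f) = 0.4.
P_A_T = 0.2
P_B_T_GIVEN_A = {"t": 0.7, "f": 0.4}

def forward_sample():
    """Draw one full instantiation (A, B), sampling in topological order."""
    a = "t" if random.random() <= P_A_T else "f"
    b = "t" if random.random() <= P_B_T_GIVEN_A[a] else "f"
    return a, b

samples = [forward_sample() for _ in range(10_000)]
# Sanity check: P(B=t) = 0.2*0.7 + 0.8*0.4 = 0.46.
print(sum(b == "t" for _, b in samples) / len(samples))
```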

  6. Sampling Algorithm
     Thus, we have a randomized algorithm S that produces possible outputs from sp(V) according to the distribution P(V). Given N samples S₁, ..., S_N, define
       P̂(A = a | E = e) := |{i ∈ 1, ..., N | E = e and A = a in Sᵢ}| / |{i ∈ 1, ..., N | E = e in Sᵢ}|
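
As a sketch, this estimator amounts to rejection counting. For the two-node network above with evidence B = t, the exact posterior is 0.14/0.46 ≈ 0.304:

```python
import random

def forward_sample():
    # Same network as above: A -> B.
    a = "t" if random.random() <= 0.2 else "f"
    b = "t" if random.random() <= (0.7 if a == "t" else 0.4) else "f"
    return a, b

N = 100_000
samples = [forward_sample() for _ in range(N)]
kept = [a for a, b in samples if b == "t"]       # samples with E = e
p_hat = sum(a == "t" for a in kept) / len(kept)  # fraction with A = a among them
print(p_hat)  # close to the exact posterior 0.14/0.46 = 0.3043...
```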

  7. Forward Sampling: Illustration
     [Figure: the samples are partitioned into three groups: samples with E ≠ e, samples with E = e and A ≠ a, and samples with E = e and A = a. The approximation for P(A = a | E = e) is the number of samples with E = e, A = a divided by the total number of samples with E = e.]

  8. Sampling from the Conditional Distribution
     Problem with forward sampling: samples with E ≠ e are useless!
     Idea: find a sampling algorithm S_c that produces outputs from sp(V) according to the distribution P(V | E = e).
     A tempting approach: fix the variables in E to e and sample only the nonevidence variables!
     Problem: only evidence from a variable's ancestors is taken into account!

  9. Likelihood Weighting
     We would like to sample from (writing pa(X)′ for the parents of X outside E and pa(X)″ for the parents in E)
       P(U, e) = ∏_{X ∈ U\E} P(X | pa(X)′, pa(X)″ = e) × ∏_{X ∈ E} P(X = e | pa(X)′, pa(X)″ = e),
     but by applying forward sampling with fixed E we actually sample from:
       sampling distribution = ∏_{X ∈ U\E} P(X | pa(X)′, pa(X)″ = e).
     Solution: instead of letting each sample count as 1, use the weight
       w(x, e) = ∏_{X ∈ E} P(X = e | pa(X)′, pa(X)″ = e).
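
A minimal generic sketch of this weighting scheme, assuming a network represented as a topologically ordered list of CPTs over true/false variables (the representation is our choice for illustration, not from the slides):

```python
import random

def weighted_sample(network, evidence):
    """One likelihood-weighted sample.

    `network` is a topologically ordered list of (name, parents, cpt), where
    cpt maps a tuple of parent values to P(name = True | parents).
    Evidence variables are fixed; all others are forward-sampled.
    """
    inst, weight = dict(evidence), 1.0
    for name, parents, cpt in network:
        p_true = cpt[tuple(inst[p] for p in parents)]
        if name in evidence:
            # Fixed variable: multiply in P(X = e | instantiated parents).
            weight *= p_true if evidence[name] else 1.0 - p_true
        else:
            inst[name] = random.random() <= p_true
    return inst, weight

# The slide's network A -> B (True/False for t/f), with evidence B = t:
net = [("A", (), {(): 0.2}), ("B", ("A",), {(True,): 0.7, (False,): 0.4})]
print(weighted_sample(net, {"B": True}))
```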

  10. Likelihood Weighting: Example
      Network A → B with P(A = t) = 0.2, P(B = t | A = t) = 0.7, P(B = t | A = f) = 0.4; assume evidence B = t.
      - Generate a random number r uniformly from [0, 1].
      - Set A = t if r ≤ 0.2 and A = f otherwise.
      - If A = t, let the sample count as w(t, t) = 0.7; otherwise as w(f, t) = 0.4.
      With N samples (a₁, ..., a_N) we get
        P̂(A = t | B = t) = Σ_{i=1}^N w(aᵢ = t, e) / Σ_{i=1}^N (w(aᵢ = t, e) + w(aᵢ = f, e))
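
A runnable sketch of this concrete estimator (the loop structure is ours); it converges to the exact posterior 0.14/0.46 ≈ 0.304:

```python
import random

# Likelihood weighting for A -> B with evidence B = t:
# sample A from its prior, weight each sample by P(B=t | A).
N = 100_000
num = den = 0.0
for _ in range(N):
    a_is_t = random.random() <= 0.2  # sample A from P(A)
    w = 0.7 if a_is_t else 0.4       # w = P(B=t | A)
    den += w
    if a_is_t:
        num += w

print(num / den)  # converges to 0.14/0.46 = 0.3043...
```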

  11. Gibbs Sampling
      For notational convenience, assume from now on that for some l: E = V_{l+1}, V_{l+2}, ..., V_n. Write W for V_1, ..., V_l.
      Principle: obtain a new sample from the previous sample by randomly changing the value of only one selected variable.
      Procedure Gibbs sampling:
        v_0 = (v_{0,1}, ..., v_{0,l}) := arbitrary instantiation of W
        i := 1
        repeat forever
          choose V_k ∈ W    # deterministic or randomized
          generate v_{i,k} randomly according to the distribution
            P(V_k | V_1 = v_{i-1,1}, ..., V_{k-1} = v_{i-1,k-1}, V_{k+1} = v_{i-1,k+1}, ..., V_l = v_{i-1,l}, E = e)
          set v_i = (v_{i-1,1}, ..., v_{i-1,k-1}, v_{i,k}, v_{i-1,k+1}, ..., v_{i-1,l})
          i := i + 1
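
A minimal runnable sketch of this procedure on a hypothetical three-variable chain A → B → C with invented CPTs and evidence C = t; nothing below is from the slides except the algorithm structure:

```python
import random

# Gibbs sampling on a hypothetical chain A -> B -> C with evidence C = True.
# All CPT numbers are invented for illustration.
P_A = 0.2                              # P(A = True)
P_B_GIVEN_A = {True: 0.7, False: 0.4}  # P(B = True | A)
P_C_GIVEN_B = {True: 0.9, False: 0.3}  # P(C = True | B)

def joint(a, b, c):
    """Full joint P(A=a, B=b, C=c) as the product of the three CPT entries."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
    return pa * pb * pc

def gibbs(n_samples, c=True):
    state = {"A": False, "B": False}     # arbitrary starting instantiation of W
    samples = []
    for _ in range(n_samples):
        var = random.choice(["A", "B"])  # randomized choice of V_k
        # All other variables are instantiated, so the conditional for V_k is
        # just the joint evaluated at each of its values, normalized.
        w = {}
        for val in (True, False):
            s = dict(state)
            s[var] = val
            w[val] = joint(s["A"], s["B"], c)
        state[var] = random.random() <= w[True] / (w[True] + w[False])
        samples.append(dict(state))
    return samples

chain = gibbs(60_000)[10_000:]                  # drop burn-in samples
print(sum(s["A"] for s in chain) / len(chain))  # exact: 0.144/0.576 = 0.25
```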

  12. Illustration
      The process of Gibbs sampling can be understood as a random walk in the space of all instantiations with E = e. Reachable in one step: instantiations that differ from the current one in the value assignment of at most one variable (assuming randomized choice of the variable V_k).

  13. Implementation of the Sampling Step
      The sampling step
        generate v_{i,k} randomly according to the distribution
          P(V_k | V_1 = v_{i-1,1}, ..., V_{k-1} = v_{i-1,k-1}, V_{k+1} = v_{i-1,k+1}, ..., V_l = v_{i-1,l}, E = e)
      requires sampling from a conditional distribution. In this special case (all but one variable are instantiated) this is easy: just compute, for each v ∈ sp(V_k), the probability
        P(V_1 = v_{i-1,1}, ..., V_{k-1} = v_{i-1,k-1}, V_k = v, V_{k+1} = v_{i-1,k+1}, ..., V_l = v_{i-1,l}, E = e)
      (linear in the network size), and choose v_{i,k} according to these probabilities (normalized).
      This can be further simplified by computing the distribution on sp(V_k) only in the Markov blanket of V_k, i.e. the subnetwork consisting of V_k, its parents, its children, and the parents of its children.
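
For the hypothetical chain above, the Markov blanket of B is {A, C}, so its Gibbs conditional needs only the factors P(B | A) and P(C | B), never P(A). A sketch (function and variable names are ours):

```python
P_B_GIVEN_A = {True: 0.7, False: 0.4}  # same invented CPTs as above
P_C_GIVEN_B = {True: 0.9, False: 0.3}

def conditional_b(a, c):
    """P(B = True | A = a, C = c) from Markov-blanket factors only:
    P(B | A, C) is proportional to P(B | A) * P(C | B)."""
    score = {}
    for b in (True, False):
        pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
        pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
        score[b] = pb * pc
    return score[True] / (score[True] + score[False])

print(conditional_b(True, True))  # 0.63 / (0.63 + 0.09) = 0.875
```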

  14. Convergence of Gibbs Sampling
      Under certain conditions, the distribution of the samples converges to the posterior distribution P(W | E = e):
        lim_{i→∞} P(v_i = v) = P(W = v | E = e)   for all v ∈ sp(W).
      Sufficient conditions are:
      - in the repeat loop of the Gibbs sampler, the variable V_k is randomly selected (with non-zero selection probability for all V_k ∈ W), and
      - the Bayesian network has no zero entries in its CPTs.
