Bayesian networks: approximate inference (Machine Intelligence), PowerPoint presentation


SLIDE 1

Bayesian networks: approximate inference

Machine Intelligence, Thomas D. Nielsen, September 2008

SLIDE 2

Approximate Inference

Motivation: Because of the (worst-case) intractability of exact inference in Bayesian networks, we look for more efficient approximate inference techniques: instead of computing the exact posterior P(A | E = e), compute an approximation P̂(A | E = e) with P̂(A | E = e) ≈ P(A | E = e).

SLIDE 3

Approximate Inference

Absolute/Relative Error. For p, p̂ ∈ [0, 1]:

  • p̂ is an approximation of p with absolute error ≤ ε if |p − p̂| ≤ ε, i.e. p̂ ∈ [p − ε, p + ε].
  • p̂ is an approximation of p with relative error ≤ ε if |1 − p̂/p| ≤ ε, i.e. p̂ ∈ [p(1 − ε), p(1 + ε)].

This definition is not always fully satisfactory, because it is not symmetric in p and p̂ and not invariant under the transition p → (1 − p), p̂ → (1 − p̂). Use with care!

When p̂1, p̂2 are approximations of p1, p2 with absolute error ≤ ε, then no error bounds follow for p̂1/p̂2 as an approximation of p1/p2. When p̂1, p̂2 are approximations of p1, p2 with relative error ≤ ε, then p̂1/p̂2 approximates p1/p2 with relative error ≤ 2ε/(1 + ε).

SLIDE 6

Approximate Inference

Randomized Methods. Most methods for approximate inference are randomized algorithms that compute approximations P̂ from random samples of instantiations. We shall consider:

  • Forward sampling
  • Likelihood weighting
  • Gibbs sampling
  • The Metropolis-Hastings algorithm

SLIDE 7

Approximate Inference

Forward Sampling. Observation: a Bayesian network can be used as a random generator that produces full instantiations V = v according to the distribution P(V).

Example: network A → B with CPTs

P(A):      P(A = t) = 0.2, P(A = f) = 0.8
P(B | A):  P(B = t | A = t) = 0.7, P(B = f | A = t) = 0.3
           P(B = t | A = f) = 0.4, P(B = f | A = f) = 0.6

  • Generate random numbers r1, r2 uniformly from [0, 1].
  • Set A = t if r1 ≤ 0.2, and A = f otherwise.
  • Depending on the value of A and r2, set B to t or f.

Generating one random instantiation is linear in the size of the network.
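As a concrete illustration, here is a minimal Python sketch of forward sampling for this two-node network (the function name and dictionary encoding are illustrative, not from the slides):

```python
import random

# CPTs for the network A -> B from the slide above.
P_A = {"t": 0.2, "f": 0.8}
P_B_given_A = {"t": {"t": 0.7, "f": 0.3},   # P(B | A = t)
               "f": {"t": 0.4, "f": 0.6}}   # P(B | A = f)

def sample_forward():
    """Draw one full instantiation (a, b), sampling in topological order."""
    r1, r2 = random.random(), random.random()
    a = "t" if r1 <= P_A["t"] else "f"
    b = "t" if r2 <= P_B_given_A[a]["t"] else "f"
    return a, b
```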

SLIDE 8

Approximate Inference

Sampling Algorithm. Thus, we have a randomized algorithm S that produces possible outputs from sp(V) according to the distribution P(V). Given samples s1, …, sN, define

P̂(A = a | E = e) := |{i ∈ {1, …, N} : E = e and A = a in s_i}| / |{i ∈ {1, …, N} : E = e in s_i}|
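Building on sample_forward() above, a hedged sketch of this estimator for P(A = t | B = t) (the sample count is illustrative):

```python
def estimate_forward(n_samples=100_000):
    """Estimate P(A = t | B = t): keep only samples consistent with B = t."""
    hits = kept = 0
    for _ in range(n_samples):
        a, b = sample_forward()
        if b == "t":             # sample agrees with the evidence
            kept += 1
            hits += (a == "t")
    return hits / kept if kept else float("nan")
```

For this network the exact posterior is P(A = t | B = t) = 0.2 · 0.7 / (0.2 · 0.7 + 0.8 · 0.4) = 0.14/0.46 ≈ 0.304, which the estimate approaches for large sample counts.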

SLIDE 9

Approximate Inference

Forward Sampling: Illustration

[Figure: the N samples are partitioned into those with E ≠ e (discarded) and those with E = e; among the latter, the fraction with A = a is the approximation of P(A = a | E = e).]

SLIDE 10

Approximate Inference

Sampling from the conditional distribution. Problem of forward sampling: samples with E ≠ e are useless! Idea: find a sampling algorithm Sc that produces outputs from sp(V) according to the distribution P(V | E = e).

A tempting approach: fix the variables in E to e and sample the non-evidence variables only!

Problem: only evidence from the ancestors is taken into account!

SLIDE 13

Approximate Inference

Likelihood weighting. We would like to sample from (where pa(X)′′ denotes the parents of X in E)

P(U, e) = ∏_{X ∈ U\E} P(X | pa(X)′, pa(X)′′ = e) × ∏_{X ∈ E} P(X = e | pa(X)′, pa(X)′′ = e),

but by applying forward sampling with fixed E we actually sample from:

Sampling distribution = ∏_{X ∈ U\E} P(X | pa(X)′, pa(X)′′ = e).

Solution: instead of letting each sample count as 1, weight it by

w(x, e) = ∏_{X ∈ E} P(X = e | pa(X)′, pa(X)′′ = e).

SLIDE 15

Approximate Inference

Likelihood weighting: example. Same network A → B as before, with P(A = t) = 0.2, P(B = t | A = t) = 0.7 and P(B = t | A = f) = 0.4.

  • Assume evidence B = t.
  • Generate a random number r uniformly from [0, 1].
  • Set A = t if r ≤ 0.2, and A = f otherwise.
  • If A = t, let the sample count as w(t, t) = 0.7; otherwise as w(f, t) = 0.4.

With N samples (a1, …, aN) we get

P̂(A = t | B = t) = Σ_{i : a_i = t} w(a_i, t) / Σ_{i=1}^{N} w(a_i, t).
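A minimal Python sketch of this weighted estimator, reusing the CPT numbers above (names and sample count are illustrative):

```python
import random

P_A_t = 0.2                           # P(A = t)
P_Bt_given_A = {"t": 0.7, "f": 0.4}   # P(B = t | A)

def estimate_likelihood_weighting(n_samples=100_000):
    """Estimate P(A = t | B = t): sample A, weight by P(B = t | A)."""
    num = den = 0.0
    for _ in range(n_samples):
        a = "t" if random.random() <= P_A_t else "f"
        w = P_Bt_given_A[a]           # weight w(a, t)
        den += w
        if a == "t":
            num += w
    return num / den
```

Unlike forward sampling, no sample is discarded; every draw contributes its weight to the estimate of 0.14/0.46 ≈ 0.304.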

SLIDE 17

Approximate Inference

Gibbs Sampling. For notational convenience assume from now on that for some l: E = V_{l+1}, V_{l+2}, …, V_n. Write W for V_1, …, V_l. Principle: obtain a new sample from the previous sample by randomly changing the value of only one selected variable.

Procedure Gibbs sampling:
  v_0 = (v_{0,1}, …, v_{0,l}) := arbitrary instantiation of W
  i := 1
  repeat forever:
    choose V_k ∈ W   # deterministic or randomized
    generate v_{i,k} randomly according to the distribution
      P(V_k | V_1 = v_{i−1,1}, …, V_{k−1} = v_{i−1,k−1}, V_{k+1} = v_{i−1,k+1}, …, V_l = v_{i−1,l}, E = e)
    set v_i := (v_{i−1,1}, …, v_{i−1,k−1}, v_{i,k}, v_{i−1,k+1}, …, v_{i−1,l})
    i := i + 1
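A compact Python sketch of this loop, with the network-specific conditional distribution abstracted behind a callback (all names are illustrative):

```python
import random

def gibbs_sample(init, conditional, n_steps):
    """Generic Gibbs sampler over the non-evidence variables W.

    init:        dict mapping each variable in W to a starting value
    conditional: conditional(var, state) -> {value: P(var = value | rest, e)}
    Yields one full instantiation of W per step.
    """
    state = dict(init)
    variables = list(state)
    for _ in range(n_steps):
        var = random.choice(variables)    # randomized choice of V_k
        dist = conditional(var, state)    # P(V_k | all other variables, E = e)
        r, acc = random.random(), 0.0
        for value, p in dist.items():     # draw from dist by inverse CDF
            acc += p
            if r <= acc:
                break
        state[var] = value                # falls back to the last value
        yield dict(state)
```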

SLIDE 18

Approximate Inference

Illustration. The process of Gibbs sampling can be understood as a random walk in the space of all instantiations with E = e. Reachable in one step: instantiations that differ from the current one in the value assignment of at most one variable (assuming randomized choice of the variable V_k).

SLIDE 19

Approximate Inference

Implementation of the Sampling Step. The sampling step, generating v_{i,k} randomly according to the distribution P(V_k | V_1 = v_{i−1,1}, …, V_{k−1} = v_{i−1,k−1}, V_{k+1} = v_{i−1,k+1}, …, V_l = v_{i−1,l}, E = e), requires sampling from a conditional distribution. In this special case (all variables but one are instantiated) this is easy: just compute, for each v ∈ sp(V_k), the probability P(V_1 = v_{i−1,1}, …, V_{k−1} = v_{i−1,k−1}, V_k = v, V_{k+1} = v_{i−1,k+1}, …, V_l = v_{i−1,l}, E = e) (linear in the network size), and choose v_{i,k} according to these probabilities (normalized). This can be simplified further by computing the distribution on sp(V_k) only in the Markov blanket of V_k, i.e. the subnetwork consisting of V_k, its parents, its children, and the parents of its children.
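As a sketch of the Markov-blanket shortcut, the conditional callback used by gibbs_sample above can be computed from local CPTs only. The values, cpt, and children encodings below are assumptions for illustration, not from the slides:

```python
def markov_blanket_conditional(var, state, values, cpt, children):
    """P(var | everything else), computed in var's Markov blanket.

    values:    list of possible values of var, i.e. sp(var)
    cpt[X]:    function (full dict of variable values) -> P(X = its value | pa(X))
    children:  dict mapping each variable to the list of its children
    The result is proportional to P(var | pa(var)) * prod_C P(C | pa(C))
    over the children C of var.
    """
    weights = {}
    for value in values:
        s = dict(state, **{var: value})
        w = cpt[var](s)                    # P(var = value | pa(var))
        for child in children[var]:
            w *= cpt[child](s)             # P(child | pa(child)); pa includes var
        weights[value] = w
    z = sum(weights.values())              # normalize over sp(var)
    return {v: w / z for v, w in weights.items()}
```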

SLIDE 20

Approximate Inference

Convergence of Gibbs Sampling. Under certain conditions, the distribution of samples converges to the posterior distribution P(W | E = e):

lim_{i→∞} P(v_i = v) = P(W = v | E = e)   for all v ∈ sp(W).

Sufficient conditions are:
  • in the repeat loop of the Gibbs sampler, the variable V_k is randomly selected (with non-zero selection probability for all V_k ∈ W), and
  • the Bayesian network has no zero entries in its CPTs.

SLIDE 21

Approximate Inference

Approximate Inference using Gibbs Sampling

  1. Start Gibbs sampling with some starting configuration v_0.
  2. Run the sampler for N steps ("burn-in").
  3. Run the sampler for M additional steps; use the relative frequency of state v in these M samples as an estimate of P(W = v | E = e) (see the sketch below).

Problems: How large must N be chosen? It is difficult to say how long the Gibbs sampler takes to converge! Even when sampling is from the stationary distribution, the samples are not independent. Result: the error cannot be bounded as a function of M using Chebyshev's inequality (or related methods).
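A usage sketch of this burn-in scheme on top of the gibbs_sample generator from the earlier slide (the N and M defaults are illustrative):

```python
from collections import Counter
from itertools import islice

def estimate_gibbs(init, conditional, burn_in=1_000, n_keep=10_000):
    """Burn in for N steps, then estimate P(W = v | E = e) by frequency."""
    samples = gibbs_sample(init, conditional, burn_in + n_keep)
    kept = islice(samples, burn_in, None)       # discard the burn-in prefix
    counts = Counter(tuple(sorted(s.items())) for s in kept)
    return {v: c / n_keep for v, c in counts.items()}
```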

SLIDE 22

Approximate Inference

Effect of dependence. P(v_N = v) being close to P(W = v | E = e) means: the probability that v_N lies in the red region is close to P(A = a | E = e). This does not guarantee that the fraction of the samples v_N, v_{N+1}, …, v_{N+M} that lie in the red region yields a good approximation of P(A = a | E = e)!

[Figure: a single random walk from v_0 through v_N to v_{N+M}; consecutive samples cluster together instead of covering the region.]

SLIDE 23

Approximate Inference

Multiple starting points. In practice, one tries to counteract these difficulties by restarting the Gibbs sampler several times (often with different starting points).

[Figure: three random walks v_0 → v_N → v_{N+M} from different starting points.]

SLIDE 24

Approximate Inference

Metropolis-Hastings Algorithm. Another way of constructing a random walk on sp(W): let {q(v, v′) | v, v′ ∈ sp(W)} be a set of transition probabilities over sp(W), i.e. q(v, ·) is a probability distribution for each v ∈ sp(W). The q(v, v′) are called proposal probabilities. Define

α(v, v′) := min{1, [P(W = v′ | E = e) q(v′, v)] / [P(W = v | E = e) q(v, v′)]}
          = min{1, [P(W = v′, E = e) q(v′, v)] / [P(W = v, E = e) q(v, v′)]}

α(v, v′) is called the acceptance probability for the transition from v to v′. (The two expressions agree because the normalization constant P(E = e) cancels in the quotient.)

SLIDE 25

Approximate Inference

Procedure Metropolis-Hastings sampling:
  v_0 = (v_{0,1}, …, v_{0,l}) := arbitrary instantiation of W
  i := 1
  repeat forever:
    sample v′ according to the distribution q(v_{i−1}, ·)
    set accept to true with probability α(v_{i−1}, v′)
    if accept: v_i := v′ else: v_i := v_{i−1}
    i := i + 1
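A generic Python sketch of this loop. The target is passed unnormalized (e.g. as P(W = v, E = e)), since α only needs quotients; all names are illustrative:

```python
import random

def metropolis_hastings(init, propose, q, target, n_steps):
    """Generic Metropolis-Hastings sampler.

    propose(v) -> v_new   draws a proposal from q(v, .)
    q(v, v2) -> float     proposal probability q(v, v2)
    target(v) -> float    unnormalized target, e.g. P(W = v, E = e)
    """
    v = init
    for _ in range(n_steps):
        v_new = propose(v)
        # Acceptance probability alpha(v, v') from the previous slide.
        ratio = (target(v_new) * q(v_new, v)) / (target(v) * q(v, v_new))
        if random.random() < min(1.0, ratio):
            v = v_new                     # accept; otherwise keep v
        yield v
```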

SLIDE 26

Approximate Inference

Convergence of Metropolis-Hastings Sampling. Under certain conditions, the distribution of samples converges to the posterior distribution P(W | E = e). A sufficient condition is: q(v, v′) > 0 for all v, v′. To obtain good performance, q should be chosen so as to obtain high acceptance probabilities, i.e. the quotients

[P(W = v′ | E = e) q(v′, v)] / [P(W = v | E = e) q(v, v′)]

should be close to 1. Optimal (but usually not feasible): q(v, v′) = P(W = v′ | E = e). In general: try to approximate the target distribution P(W | E = e) with q.

SLIDE 27

Loopy belief propagation

Loopy belief propagation:
  • a message-passing algorithm like junction tree propagation,
  • but it works directly on the Bayesian network structure (rather than on a junction tree).

SLIDE 28

Loopy belief propagation

Message passing. A node sends a message to a neighbor by multiplying the incoming messages from all other neighbors onto the potential it holds, then marginalizing the result down to the separator.

[Figure: node C holds P(C | A, B); it receives messages φ_A, φ_B from its parents A and B and messages φ_D, φ_E from its children D and E, and sends messages such as π_E(C) to child E and λ_C(A) to parent A.]

π_E(C) = φ_D Σ_{A,B} P(C | A, B) φ_A φ_B
λ_C(A) = Σ_{B,C} P(C | A, B) φ_B φ_D φ_E
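A small Python sketch of these two message computations for binary variables; the CPT entries and incoming message vectors are made-up illustrations, not from the slides:

```python
# Messages are length-2 vectors indexed by value (0 = t, 1 = f).
phi_A, phi_B = [0.2, 0.8], [0.5, 0.5]    # messages from parents A, B
phi_D, phi_E = [0.9, 0.3], [0.6, 0.7]    # messages from children D, E
# P(C | A, B) encoded as P_C[a][b][c]; entries are illustrative.
P_C = [[[0.7, 0.3], [0.4, 0.6]],
       [[0.5, 0.5], [0.1, 0.9]]]

# pi_E(C) = phi_D(C) * sum_{A,B} P(C | A, B) phi_A(A) phi_B(B)
pi_E = [phi_D[c] * sum(P_C[a][b][c] * phi_A[a] * phi_B[b]
                       for a in (0, 1) for b in (0, 1))
        for c in (0, 1)]

# lambda_C(A) = sum_{B,C} P(C | A, B) phi_B(B) phi_D(C) phi_E(C)
lambda_C = [sum(P_C[a][b][c] * phi_B[b] * phi_D[c] * phi_E[c]
                for b in (0, 1) for c in (0, 1))
            for a in (0, 1)]
```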

SLIDE 35

Loopy belief propagation

Example: error sources. A few observations: when calculating P(E), C and D are treated as being independent (in the junction tree, C and D would appear in the same separator). Evidence on a converging connection may cause the error to cycle.

SLIDE 36

Loopy belief propagation

In general, there is no guarantee of convergence, nor, in case of convergence, that the method converges to the correct distribution. However, the method converges to the correct distribution surprisingly often! If the network is singly connected, convergence is guaranteed.

SLIDE 37

Approximate Inference

Literature

R.M. Neal: Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993. http://omega.albany.edu:8008/neal.pdf

P. Dagum, M. Luby: Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence 60, 1993.
