Slides Set 9(part b): Sampling Techniques for Probabilistic and Deterministic Graphical Models

Reasoning with graphical models
Slides Set 9(part b): Sampling Techniques for Probabilistic and Deterministic Graphical Models
Rina Dechter
(Reading: Darwiche chapter 15, cutset-sampling paper posted)
slides 9b 276 2020

Overview
1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

Markov Chain
[Figure: chain of states $x^1 \to x^2 \to x^3 \to x^4$]
• A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov property):
  $P(x^{t+1} \mid x^1, x^2, \ldots, x^t) = P(x^{t+1} \mid x^t)$
• If $P(X^t \mid x^{t-1})$ does not depend on $t$ (time homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix), with $\sum_x P(X = x) = 1$

Example: Drunkard's Walk
[Figure: number line with states 1, 2, 3]
• A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability
• $D(X) = \{0, 1, 2, \ldots\}$
• Transition matrix $P(X)$: from position $n$, $P(n-1) = 0.5$ and $P(n+1) = 0.5$

Example: Weather Model
[Figure: sample trajectory rain, rain, rain, sun, rain]
• $D(X) = \{rainy, sunny\}$
• Transition matrix $P(X)$ (rows: current state, columns: next state):

            P(rainy)   P(sunny)
  rainy       0.9        0.1
  sunny       0.5        0.5
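A minimal sketch of how a trajectory of the weather chain could be simulated from the transition matrix on this slide. The helper names `step` and `simulate`, and the fixed seed, are illustrative assumptions and not part of the slides.

```python
import random

# Transition function from the slide: P[current][next]
P = {
    "rainy": {"rainy": 0.9, "sunny": 0.1},
    "sunny": {"rainy": 0.5, "sunny": 0.5},
}

def step(state, rng):
    """Sample the next state given only the current state (Markov property)."""
    options = list(P[state])
    weights = [P[state][s] for s in options]
    return rng.choices(options, weights=weights, k=1)[0]

def simulate(start, n_steps, seed=0):
    """Return the visited states x^0, ..., x^n_steps."""
    rng = random.Random(seed)  # seed is an arbitrary choice, for repeatability
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1], rng))
    return states

path = simulate("sunny", 10)
print(path)
```

Each call to `step` looks only at the current state, which is exactly the Markov property; the rest of the history is never consulted.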

Multi-Variable System
• $X = \{X_1, X_2, X_3\}$, $D(X_i)$ discrete, finite
• The state is an assignment of values to all the variables: $x^t = \{x_1^t, x_2^t, \ldots, x_n^t\}$
[Figure: each value $x_i^t$ at time $t$ is followed by $x_i^{t+1}$ at time $t+1$]

Bayesian Network System
• A Bayesian network is a representation of the joint probability distribution over 2 or more variables
• $X = \{X_1, X_2, X_3\}$, $x^t = \{x_1^t, x_2^t, x_3^t\}$
[Figure: network over $X_1, X_2, X_3$ with state $x^t$ at time $t$ followed by the state at time $t+1$]

Stationary Distribution Existence
• If the Markov chain is time-homogeneous, then the vector $\pi(X)$ is a stationary distribution (aka invariant or equilibrium distribution, aka "fixed point") if its entries sum up to 1 and satisfy:
  $\pi(x_i) = \sum_{x_j \in D(X)} \pi(x_j)\, P(x_i \mid x_j)$
• A finite state space Markov chain has a unique stationary distribution if and only if:
  – The chain is irreducible
  – All of its states are positive recurrent
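As a worked illustration, the stationarity condition can be solved by hand for the two-state weather model from the earlier example slide. The derivation below assumes that slide's transition matrix and writes $\pi_r = \pi(rainy)$, $\pi_s = \pi(sunny)$ as shorthand introduced here.

```latex
\begin{align*}
\pi_r &= 0.9\,\pi_r + 0.5\,\pi_s
  &&\Rightarrow\quad 0.1\,\pi_r = 0.5\,\pi_s
  \;\Rightarrow\; \pi_r = 5\,\pi_s \\
\pi_r + \pi_s &= 1
  &&\Rightarrow\quad 6\,\pi_s = 1 \\
\pi_s &= \tfrac{1}{6}, \qquad \pi_r = \tfrac{5}{6} \\
\text{check: } \pi_s &= 0.1\cdot\tfrac{5}{6} + 0.5\cdot\tfrac{1}{6}
  = \tfrac{0.5 + 0.5}{6} = \tfrac{1}{6}
\end{align*}
```

The second stationarity equation is redundant with the first, which is why the normalization constraint is needed to pin down a unique solution.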

Irreducible
• A state $x$ is irreducible if, under the transition rule, one has nonzero probability of moving from $x$ to any other state and then coming back in a finite number of steps
• If one state is irreducible, then all the states must be irreducible
(Liu, Ch. 12, p. 249, Def. 12.1.1)

Recurrent
• A state $x$ is recurrent if the chain returns to $x$ with probability 1
• Let $M(x)$ be the expected number of steps to return to state $x$
• State $x$ is positive recurrent if $M(x)$ is finite
The recurrent states in a finite state chain are positive recurrent.

Stationary Distribution Convergence
• Consider an infinite Markov chain:
  $P^{(n)} = P(x^n \mid x^0) = P^0 P^n$
• If the chain is both irreducible and aperiodic, then:
  $\lim_{n \to \infty} P^{(n)} = \pi$
• The initial state is not important in the limit
"The most useful feature of a 'good' Markov chain is its fast forgetfulness of its past…" (Liu, Ch. 12.1)
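The following sketch illustrates the claim that the initial state stops mattering: it pushes two different starting distributions through the weather-model transition matrix and prints where they end up. The function names and the choice of 50 steps are assumptions made for this example.

```python
# Weather-model transition matrix; rows = current state (rainy, sunny)
P = [[0.9, 0.1],
     [0.5, 0.5]]

def push_forward(dist, P):
    """One step of the chain: the row vector dist multiplied by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(dist0, P, n_steps):
    """Distribution after n_steps transitions, starting from dist0."""
    dist = list(dist0)
    for _ in range(n_steps):
        dist = push_forward(dist, P)
    return dist

from_rain = distribution_after([1.0, 0.0], P, 50)  # start surely rainy
from_sun = distribution_after([0.0, 1.0], P, 50)   # start surely sunny
print(from_rain, from_sun)
```

Both starting points approach the same vector, the stationary distribution (5/6, 1/6) of this chain.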

Aperiodic
• Define $d(i) = \gcd\{n > 0 \mid \text{it is possible to go from } i \text{ to } i \text{ in } n \text{ steps}\}$, where gcd is the greatest common divisor of the integers in the set. If $d(i) = 1$ for all $i$, then the chain is aperiodic
• Positive recurrent, aperiodic states are ergodic
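A small sketch of how $d(i)$ could be computed for a finite chain. It truncates the set of return times at `max_steps`, an assumed bound that is ample for small chains; the `flip` chain is a hypothetical example added here to show a periodic case.

```python
from math import gcd

def period(P, i, max_steps=50):
    """gcd of all n <= max_steps for which a return i -> i in n steps is possible."""
    n_states = len(P)
    reachable = {i}  # states reachable from i in exactly n steps
    d = 0
    for n in range(1, max_steps + 1):
        reachable = {k for j in reachable
                     for k in range(n_states) if P[j][k] > 0}
        if i in reachable:
            d = gcd(d, n)  # gcd(0, n) == n, so the first return initializes d
    return d

weather = [[0.9, 0.1], [0.5, 0.5]]
flip = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical chain that alternates deterministically

print(period(weather, 0), period(flip, 0))  # prints: 1 2
```

The weather chain can stay in place, so every return length is possible and the period is 1; the alternating chain can only return after an even number of steps.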

Markov Chain Monte Carlo
• How do we estimate $P(X)$, e.g., $P(X \mid e)$?
• Generate samples that form a Markov chain with stationary distribution $\pi = P(X \mid e)$
• Estimate $\pi$ from samples (observed states): visited states $x^0, \ldots, x^n$ can be viewed as "samples" from the distribution $\pi$
  $\hat{\pi}(x) = \frac{1}{T} \sum_{t=1}^{T} \delta(x, x^t)$
  $\lim_{T \to \infty} \hat{\pi}(x) = \pi(x)$
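A minimal sketch of the estimator on this slide, applied to the weather chain from the earlier example: run the chain, count visits, and divide by T. The function name, seed, and T = 100,000 are illustrative assumptions.

```python
import random
from collections import Counter

# Weather-model transition function: P[current][next]
P = {"rainy": {"rainy": 0.9, "sunny": 0.1},
     "sunny": {"rainy": 0.5, "sunny": 0.5}}

def estimate_stationary(start, T, seed=0):
    """pi_hat(x) = (1/T) * sum_t delta(x, x^t) over the visited states x^1..x^T."""
    rng = random.Random(seed)
    state = start
    counts = Counter()
    for _ in range(T):
        options = list(P[state])
        weights = [P[state][s] for s in options]
        state = rng.choices(options, weights=weights, k=1)[0]
        counts[state] += 1
    return {s: counts[s] / T for s in P}

pi_hat = estimate_stationary("sunny", 100_000)
print(pi_hat)
```

For this chain the estimate should land near the stationary distribution (5/6, 1/6), even though consecutive samples are dependent.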

MCMC Summary
• Convergence is guaranteed in the limit
• The initial state is not important, but typically we throw away the first K samples ("burn-in")
• Samples are dependent, not i.i.d.
• Convergence (mixing rate) may be slow
• The stronger the correlation between states, the slower the convergence!

Gibbs Sampling (Geman & Geman, 1984)
• The Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables
• Sample a new value for one variable at a time from the variable's conditional distribution:
  $P(X_i) = P(X_i \mid x_1^t, \ldots, x_{i-1}^t, x_{i+1}^t, \ldots, x_n^t) = P(X_i \mid x^t \setminus x_i)$
• Samples form a Markov chain with stationary distribution $P(X \mid e)$

Gibbs Sampling: Illustration
The process of Gibbs sampling can be understood as a random walk in the space of all instantiations of $X = x$ (remember the drunkard's walk): in one step we can reach instantiations that differ from the current one by the value assignment to at most one variable (assume a randomized choice of variables $X_i$).

Ordered Gibbs Sampler
Generate sample $x^{t+1}$ from $x^t$ (process all variables in some order):
  $X_1 = x_1^{t+1} \leftarrow P(X_1 \mid x_2^t, x_3^t, \ldots, x_N^t, e)$
  $X_2 = x_2^{t+1} \leftarrow P(X_2 \mid x_1^{t+1}, x_3^t, \ldots, x_N^t, e)$
  ...
  $X_N = x_N^{t+1} \leftarrow P(X_N \mid x_1^{t+1}, x_2^{t+1}, \ldots, x_{N-1}^{t+1}, e)$
In short, for $i = 1$ to $N$:
  $X_i = x_i^{t+1}$ sampled from $P(X_i \mid x^t \setminus x_i, e)$

Transition Probabilities in BN
• Given its Markov blanket (parents, children, and their parents), $X_i$ is independent of all other nodes
• Markov blanket:
  $markov(X_i) = pa_i \cup ch_i \cup \big(\bigcup_{X_j \in ch_i} pa_j\big)$
• $P(X_i \mid x^t \setminus x_i) = P(X_i \mid markov_i^t)$:
  $P(x_i \mid x^t \setminus x_i) \propto P(x_i \mid pa_i) \prod_{X_j \in ch_i} P(x_j \mid pa_j)$
• Computation is linear in the size of the Markov blanket!
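To make the Markov blanket definition concrete, here is a short sketch that computes it from a parent list. The five-node network and the function name are hypothetical, chosen only for illustration.

```python
def markov_blanket(parents, x):
    """Parents, children, and the children's other parents of node x.

    `parents` maps each node to the list of its parents (the BN structure).
    """
    children = [c for c, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])  # co-parents of each child
    blanket.discard(x)              # x itself is not part of its blanket
    return blanket

# Hypothetical network: A -> C <- B, C -> D <- E
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C", "E"], "E": []}

print(sorted(markov_blanket(parents, "C")))  # prints: ['A', 'B', 'D', 'E']
```

Note that E enters the blanket of C only because it shares the child D, which is the "and their parents" part of the definition.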

Ordered Gibbs Sampling Algorithm (Pearl, 1988)
Input: X, E=e
Output: T samples $\{x^t\}$
Fix evidence E=e, initialize $x^0$ at random
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     $x_i^{t+1} \leftarrow P(X_i \mid markov_i^t)$
4.   End For
5. End For
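The algorithm can be sketched in a few lines for binary variables. Everything concrete below is an assumption made for illustration: the three-node chain A -> B -> C, its CPT numbers, the seed, and the helper names. The conditional is computed from the node's own CPT entry times its children's CPT entries, following the Markov blanket formula.

```python
import random

# Hypothetical binary network A -> B -> C; numbers are made up for illustration.
parents = {"A": [], "B": ["A"], "C": ["B"]}
# cpt[X][parent values] = P(X = 1 | parents)
cpt = {"A": {(): 0.3},
       "B": {(0,): 0.2, (1,): 0.8},
       "C": {(0,): 0.1, (1,): 0.7}}

def prob(var, value, x):
    """P(var = value | parents of var as assigned in x)."""
    p1 = cpt[var][tuple(x[p] for p in parents[var])]
    return p1 if value == 1 else 1.0 - p1

def conditional(var, x):
    """P(var = 1 | Markov blanket): own CPT entry times the children's entries."""
    children = [c for c in parents if var in parents[c]]
    score = []
    for value in (0, 1):
        y = dict(x, **{var: value})
        s = prob(var, value, y)
        for c in children:
            s *= prob(c, y[c], y)
        score.append(s)
    return score[1] / (score[0] + score[1])

def gibbs(evidence, T, seed=0):
    rng = random.Random(seed)
    free = [v for v in parents if v not in evidence]
    x = dict(evidence)               # fix evidence E = e
    for v in free:
        x[v] = rng.randint(0, 1)     # initialize x^0 at random
    samples = []
    for _ in range(T):               # for t = 1 to T
        for v in free:               # for i = 1 to N, in a fixed order
            x[v] = 1 if rng.random() < conditional(v, x) else 0
        samples.append(dict(x))
    return samples

samples = gibbs({"C": 1}, 20_000)
print(sum(s["A"] for s in samples) / len(samples))
```

For these made-up numbers the exact posterior is P(A=1 | C=1) = 0.174 / 0.328, about 0.53, so the printed estimate should be close to that value. No burn-in is discarded in this sketch.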

Gibbs Sampling Example - BN
• $X = \{X_1, X_2, \ldots, X_9\}$, $E = \{X_9\}$
• Initialize: $X_1 = x_1^0$, $X_2 = x_2^0$, $X_3 = x_3^0$, $X_4 = x_4^0$, $X_5 = x_5^0$, $X_6 = x_6^0$, $X_7 = x_7^0$, $X_8 = x_8^0$
[Figure: Bayesian network over X1, ..., X9]

Gibbs Sampling Example - BN
• $X = \{X_1, X_2, \ldots, X_9\}$, $E = \{X_9\}$
• $x_1^1 \leftarrow P(X_1 \mid x_2^0, \ldots, x_8^0, x_9)$
• $x_2^1 \leftarrow P(X_2 \mid x_1^1, \ldots, x_8^0, x_9)$
• ...
[Figure: Bayesian network over X1, ..., X9]

Answering Queries
$P(x_i \mid e) = ?$
• Method 1: count the number of samples where $X_i = x_i$ (histogram estimator), with $\delta$ the Dirac delta function:
  $P(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} \delta(x_i, x^t)$
• Method 2: average probability (mixture estimator):
  $P(X_i = x_i) = \frac{1}{T} \sum_{t=1}^{T} P(X_i = x_i \mid markov_i^t)$
• The mixture estimator converges faster
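The two estimators can be compared side by side on a tiny model. The sketch below assumes a hypothetical joint table over two binary variables, with no evidence, so that the conditionals can be read directly off the table; the exact answer for this table is P(X=1) = 0.5.

```python
import random

# Hypothetical joint P(X, Y) over two binary variables (illustrative numbers).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def p_x1_given_y(y):
    return joint[(1, y)] / (joint[(0, y)] + joint[(1, y)])

def p_y1_given_x(x):
    return joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])

def estimate_p_x1(T, seed=0):
    """Run Gibbs for T sweeps; return (histogram, mixture) estimates of P(X=1)."""
    rng = random.Random(seed)
    x, y = 0, 0
    hist_sum, mix_sum = 0.0, 0.0
    for _ in range(T):
        p = p_x1_given_y(y)               # conditional used to sample X
        x = 1 if rng.random() < p else 0
        y = 1 if rng.random() < p_y1_given_x(x) else 0
        hist_sum += x                     # histogram: count samples with X = 1
        mix_sum += p                      # mixture: average the conditional itself
    return hist_sum / T, mix_sum / T

hist, mix = estimate_p_x1(50_000)
print(hist, mix)
```

The mixture estimator averages probabilities rather than 0/1 indicators, which is the intuition behind its faster convergence.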

Importance vs. Gibbs
• Gibbs:
  $x^t \leftarrow \hat{P}(X \mid e)$, where $\hat{P}(X \mid e) \to P(X \mid e)$ as $T \to \infty$
  $\hat{g}(X) = \frac{1}{T} \sum_{t=1}^{T} g(x^t)$
• Importance:
  $x^t \leftarrow Q(X \mid e)$, with weight $w^t = \frac{P(x^t)}{Q(x^t)}$
  $\hat{g} = \frac{1}{T} \sum_{t=1}^{T} \frac{g(x^t)\, P(x^t)}{Q(x^t)}$
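For contrast with the Gibbs code path, here is a minimal sketch of the importance-sampling estimator from this slide. The target `P`, the uniform proposal `Q`, and the function `g(x) = x` are hypothetical choices for the example; with them the exact expectation is 0.2·1 + 0.1·2 = 0.4.

```python
import random

VALUES = [0, 1, 2]
P = [0.7, 0.2, 0.1]      # hypothetical target distribution over VALUES
Q = [1/3, 1/3, 1/3]      # hypothetical proposal that is easy to sample from

def importance_estimate(g, T, seed=0):
    """(1/T) * sum_t g(x^t) * P(x^t) / Q(x^t), with x^t drawn from Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(T):
        x = rng.choices(VALUES, weights=Q, k=1)[0]  # x^t sampled from Q
        w = P[x] / Q[x]                             # importance weight w^t
        total += g(x) * w
    return total / T

est = importance_estimate(lambda x: x, 50_000)
print(est)
```

Unlike Gibbs samples, these draws are independent; the price is that each one must be reweighted, and a poor proposal inflates the variance of the weights.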

Gibbs Sampling: Convergence
• Sample from $\hat{P}(X \mid e) \to P(X \mid e)$
• Converges iff the chain is irreducible and ergodic
• Intuition: we must be able to explore all states:
  – if $X_i$ and $X_j$ are strongly correlated, $X_i = 0 \leftrightarrow X_j = 0$, then we cannot explore states with $X_i = 1$ and $X_j = 1$
• All conditions are satisfied when all probabilities are positive
• The convergence rate can be characterized by the second eigenvalue of the transition matrix

Gibbs: Speeding Convergence
Reduce dependence between samples (autocorrelation):
• Skip samples
• Randomize the variable sampling order
• Employ blocking (grouping)
• Run multiple chains
Reduce variance (covered in the next section)

Blocking Gibbs Sampler
• Sample several variables together, as a block
• Example: given three variables $X, Y, Z$ with domains of size 2, group $Y$ and $Z$ together to form a variable $W = \{Y, Z\}$ with domain size 4. Then, given sample $(x^t, y^t, z^t)$, compute the next sample:
  $x^{t+1} \leftarrow P(X \mid y^t, z^t) = P(X \mid w^t)$
  $(y^{t+1}, z^{t+1}) = w^{t+1} \leftarrow P(Y, Z \mid x^{t+1})$
+ Can improve convergence greatly when two variables are strongly correlated!
- The domain of the block variable grows exponentially with the number of variables in a block!
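The X, Y, Z example can be sketched directly. The unnormalized joint `weight` below is a hypothetical choice in which Y and Z strongly prefer to agree; for it, the exact probability that Y equals Z is 60/66, about 0.91. Helper names and the seed are also assumptions.

```python
import random
from itertools import product

def weight(x, y, z):
    """Hypothetical unnormalized joint: Y and Z strongly prefer to agree."""
    return (10.0 if y == z else 1.0) * (2.0 if x == y else 1.0)

def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    return rng.choices(list(range(len(weights))), weights=weights, k=1)[0]

def blocked_gibbs(T, seed=0):
    rng = random.Random(seed)
    x, y, z = 0, 0, 0
    block_domain = list(product((0, 1), repeat=2))  # W = {Y, Z}, domain size 4
    samples = []
    for _ in range(T):
        # x^{t+1} sampled from P(X | y^t, z^t)
        x = sample_index([weight(v, y, z) for v in (0, 1)], rng)
        # (y^{t+1}, z^{t+1}) = w^{t+1} sampled jointly from P(Y, Z | x^{t+1})
        k = sample_index([weight(x, a, b) for a, b in block_domain], rng)
        y, z = block_domain[k]
        samples.append((x, y, z))
    return samples

samples = blocked_gibbs(20_000)
print(sum(y == z for _, y, z in samples) / len(samples))
```

Because Y and Z move together, the sampler can jump from (0, 0) to (1, 1) in one step, a move that single-variable updates would have to make through a low-probability intermediate state.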
