Monte Carlo methods Draw random samples from the desired - PDF document

School of Computer Science Approximate Inference: Monte Carlo Inference Probabilistic Graphical Models (10- Probabilistic Graphical Models (10 -708) 708) Lecture 18, Nov 19, 2007 Receptor A Receptor A X 1 X 1 X 1 Receptor B Receptor B X 2 X 2 X 2 Eric Xing Eric Xing Kinase C Kinase C X 3 X 3 X 3 Kinase D Kinase D X 4 X 4 X 4 Kinase E Kinase E X 5 X 5 X 5 TF F TF F X 6 X 6 X 6 Reading: J-Chap. 1, KF-Chap. 11 Gene G Gene G X 7 X 7 X 7 X 8 X 8 X 8 Gene H Gene H 1 Monte Carlo methods � Draw random samples from the desired distribution � Yield a stochastic representation of a complex distribution marginals and other expections can be approximated using sample- � based averages 1 N ∑ f x f x = t ( ) [ ( )] ( ) E N t = 1 � Asymptotically exact and easy to apply to arbitrary models � Challenges: how to draw samples from a given dist. (not all distributions can be � trivially sampled)? how to make better use of the samples (not all sample are useful, or � eqally useful, see an example later)? how to know we've sampled enough? � Eric Xing 2 1

Example: naive sampling � Construct samples according to probabilities given in a BN. E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 E0 E0 B0 B0 A0 A0 M0 M0 J1 J1 E0 B0 A0 M0 J0 E0 B0 A0 M0 J0 E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 E1 E1 B0 B0 A1 A1 M1 M1 J1 J1 Alarm example: (Choose the right sampling sequence) E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 1) Sampling:P(B)=<0.001, 0.999> suppose it is false, B0. Same for E0. P(A|B0, E0)=<0.001, 0.999> suppose E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 it is false... 2) Frequency counting: In the samples right, E0 E0 B0 B0 A0 A0 M0 M0 J0 J0 P(J|A0)=P(J,A0)/P(A0)=<1/9, 8/9>. Eric Xing 3 Example: naive sampling � Construct samples according to probabilities given in a BN. E0 B0 A0 M0 J0 Alarm example: (Choose the right sampling E0 B0 A0 M0 J0 sequence) E0 B0 A0 M0 J1 3) what if we want to compute P(J|A1) ? E0 B0 A0 M0 J0 we have only one sample ... P(J|A1)=P(J,A1)/P(A1)=<0, 1>. E0 B0 A0 M0 J0 4) what if we want to compute P(J|B1) ? E0 B0 A0 M0 J0 No such sample available! E1 B0 A1 M1 J1 P(J|A1)=P(J,B1)/P(B1) can not be defined. E0 B0 A0 M0 J0 For a model with hundreds or more variables, E0 B0 A0 M0 J0 rare events will be very hard to garner evough samples even after a long time or sampling ... E0 B0 A0 M0 J0 Eric Xing 4 2

Monte Carlo methods (cond.) � Direct Sampling � We have seen it. � Very difficult to populate a high-dimensional state space � Rejection Sampling Create samples like direct sampling, only count samples which is � consistent with given evidences. � Likelihood weighting, ... Sample variables and calculate evidence weight. Only create the � samples which support the evidences. � Markov chain Monte Carlo (MCMC) Metropolis-Hasting � � Gibbs Eric Xing 5 Rejection sampling � Suppose we wish to sample from dist. Π ( X )= Π ' ( X )/Z. Π ( X ) is difficult to sample, but Π ' ( X ) is easy to evaluate � Sample from a simpler dist Q ( X ) � Rejection sampling � x Q X x x kQ x Π * * * * ~ ( ), accept w.p. ' ( ) / ( ) Correctness: � [ ] Π x kQ x Q x ' ( ) / ( ) ( ) p x = ( ) [ ] ∫ x kQ x Q x dx Π ' ( ) / ( ) ( ) x Π ' ( ) = = Π x ( ) ∫ x dx Π ' ( ) � Pitfall … Eric Xing 6 3

Rejection sampling Pitfall: � Using Q= N ( µ , σ q I ) to sample P= N ( µ , σ p I ) � If σ q exceeds σ p by 1%, and dimensional=1000, � The optimal acceptance rate k=( σ q / σ p ) d ≈ 1/20,000 � � Big waste of samples! � Adaptive rejection sampling Using envelope functions to define Q � Eric Xing 7 Unnormalized importance sampling � Suppose sampling from P (·) is hard. � Suppose we can sample from a "simpler" proposal distribution Q (·) instead. � If Q dominates P (i.e., Q ( x ) > 0 whenever P ( x ) > 0), we can sample from Q and reweight: ∫ X = x P x dx f ( ) f ( ) ( ) P x ( ) ∫ x Q x dx = ( ) ( ) f Q x ( ) 1 P x m ( ) ∑ x m x m Q X ≈ f ( ) where ~ ( ) M Q x m ( ) m 1 ∑ x w = m m f ( ) M m Eric Xing 8 4

Normalized importance sampling � Suppose we can only evaluate P' ( x ) = α P ( x ) (e.g. for an MRF). � We can get around the nasty normalization constant α as follows: P x P x ' ( ) ' ( ) r X r X ∫ Q x dx ∫ P x dx = ⇒ = = = α Let ( ) ( ) ( ) ' ( ) � Q x Q Q x ( ) ( ) � Now 1 P x ' ( ) X ∫ x P x dx ∫ x Q x dx = = ( ) ( ) ( ) ( ) ( ) f f f Q x P α ( ) ∫ x r x Q x dx f ( ) ( ) ( ) = ∫ r x Q x dx ( ) ( ) ∑ x m r m ( ) f m x m Q X ≈ where ~ ( ) ∑ r m m r m ∑ x m w m w m = = f ( ) whe re ∑ r m m m Eric Xing 9 Normalized vs unnormalized importance sampling Unormalized importance sampling is unbiased: � [ ] = ( ) ( ) E Q f X w X Normalized importance sampling is biased, eg for M = 1: � ⎡ 1 1 ⎤ f ( x ) w ( x ) = E Q ⎢ ⎥ 1 ⎣ ⎦ w ( x ) However, the variance of the normalized importance sampler is � usually lower in practice. Also, it is common that we can evaluate P'(x) but not P(x), e.g. � P(x|e) = P'(x, e)/P(e) for Bayes net, or P(x) = P'(x)/Z for MRF. Eric Xing 10 5

Likelhood weighting We now apply normalized importance sampling to a Bayes net. � The proposal Q is gotten from the mutilated BN where we clamp � evidence nodes, and cut their incoming arcs. Call this P M . The unnormalized posterior is P'(x) = P(x, e). � ∑ δ = m ( ) w x x So for f(X i ) = δ (X i = x i ), we get ˆ = = m i i m � P ( X x | e ) ∑ i i w = ′ m m m m where . w P ( x , e ) / P ( x ) m M Eric Xing 11 Likelhood weighting algorithm Eric Xing 12 6

Efficiency of likelihood weighting � The efficiency of importance sampling depends on how close the proposal Q is to the target P. � Suppose all the evidence is at the roots. Then Q = P(X|e), and all samples have weight 1. � Suppose all the evidence is at the leaves. Then Q is the prior, so many samples might get small weight if the evidence is unlikely. � We can use arc reversal to make some of the evidence nodes be roots instead of leaves, but the resulting network can be much more densely connected. Eric Xing 13 Weighted resampling � Problem of importance sampling: depends on how well Q matches P If P(x)f(x) is strongly varying and has a significant proportion of its mass � concentrated in a small region, r m will be dominated by a few samples * * * * * * * * * * Note that if the high-prob mass region of Q falls into the low-prob mass � region of P, the variance of can be small even if the r m P x m Q x m = ( ) / ( ) samples come from low-prob region of P and potentially erroneous . � Solution Use heavy tail Q . � P x m Q x m r m ( ) / ( ) w m = = ∑ ∑ P x l Q x l r m � Weighted resampling ( ) / ( ) l m Eric Xing 14 7

Weighted resampling Sampling importance resampling (SIR): � Draw N samples from Q : X 1 … X N 1. P x m Q x m r m Constructing weights: w 1 … w N , ( ) / ( ) w m = = 2. ∑ ∑ P x l Q x l r m ( ) / ( ) l m Sub-sample x from { X 1 … X N } w.p. ( w 1 … w N ) 3. Particular Filtering � X 1 ... X t X t+1 A special weighted resampler � Yield samples from posterior p( X t | Y 1:t ) Y 1 A A Y t Y t+1 A � Eric Xing 15 Sketch of Particle Filters � The starting point p X p Y X ( | ) ( | ) Y p X p X Y t 1 t 1 t t = = − : ( ) ( | , ) Y Y t 1 t t t 1 t − 1 ∫ p X p Y X dX : : ( | ) ( | ) Y t 1 t 1 t t t − : Thus p( X t | Y 1:t ) is represented by � ⎧ ⎫ ⎪ ⎪ m X m p X w m p Y X = ( | ) ⎨ t t ⎬ ~ ( | Y ), t t 1 t − 1 t M ⎪ : ⎪ p Y X m ∑ ⎩ ( | ) ⎭ t t m 1 = � A sequential weighted resampler Time update � ∫ p X p X X p X dX = ( | Y ) ( | ) ( | Y ) t 1 1 t t 1 t t 1 t t + + : : ∑ w m p X X = ( | ) (sample from a mixture model) t t 1 t ( | ) + p X Y + 1 t t m Measurement update � p X p Y X ( | Y ) ( | ) p X = t + 1 1 t t + 1 t + 1 : ( Y ) t + 1 1 t + 1 : ∫ p X p Y X dX ( | Y ) ( | ) t + 1 1 t t + 1 t + 1 t + 1 : ⎧ ⎫ ⎪ ⎪ m X p X w p Y X ⇒ m m = ( | ) ⎨ ~ ( | ), t + 1 t + 1 ⎬ (reweight) Y t + 1 t + 1 1 t t + 1 M ⎪ : m ⎪ p Y X ⎩ ∑ ⎭ ( | ) t + 1 t + 1 m = 1 Eric Xing 16 8

Monte Carlo methods Draw random samples from the desired - PDF document

School of Computer Science Approximate Inference: Monte Carlo Inference Probabilistic Graphical Models (10- Probabilistic Graphical Models (10 -708) 708) Lecture 18, Nov 19, 2007 Receptor A Receptor A X 1 X 1 X 1 Receptor B Receptor B X

Monte Carlo Methods Guojin Chen Christopher Cprek Chris Rambicure Monte Carlo Methods 1.

Monte Carlo Generators Monte Carlo Generators Monte Carlo Generators QCD Lecture III P .

Chapter 5: Monte Carlo Methods Monte Carlo methods are learning methods Experience

Monte Carlo Approximation of Monte Carlo Filters Adam M. Johansen et al. Collaborators Include:

BROCHURE 2019 TETRA JUICES DEL MONTE DEL MONTE 6 x 1L GOLD PINEAPPLE 6 x 1L 6 x 1L 6 x 1L

Monte Carlo Estimation 7 January 2019 OSU CSE 1 Monte Carlo Methods Class of computational

Monte Carlo Methods for physically based Volume rendering Monte Carlo Methods for physically based

Monte Carlo Methods An introduction to Monte Carlo (MC) methods How to use MC methods

Monte Carlo methods for volumetric light transport Monte Carlo methods for volumetric light

Draft Introduction to (randomized) quasi-Monte Carlo Pierre LEcuyer MCQMC Conference,

Monte Carlo Localization Ximing Yu March 24, 2009 Ximing Yu Monte Carlo Localization 1

Monte Carlo Control CMPUT 366: Intelligent Systems S&B 5.3-5.5, 5.7 Lecture Outline 1.

4. THE MONTE CARLO METHOD 4.1 I ntroduction This chapter is aimed at describing the Monte Carlo

MARKOV CHAIN MONTE CARLO METHODS MARKOV CHAIN MONTE CARLO METHODS MARKO LAINE, FMI MARKO LAINE,

Monte Carlo Methods and Area Estimates CS3220 - Summer 2008 Jonathan Kaldor Monte Carlo Methods

Monte Carlo Methods Monte Carlo Methods I, at any rate, am convinced that He does not throw dice.

SurFi: Detecting Surveillance Camera Looping Attacks with Wi-Fi Channel State Information Nitya

PrefMiner: Mining Users Preferences for Intelligent Mobile

Autonomous Mobility-on-Demand Systems: False Myths and Open Questions Prof. Dr. Emilio Frazzoli,

WINLAB Rutgers, The State University of New Jersey www.winlab.rutgers.edu Song Liu, Larry J.

Spoken Language Understanding strategies developed at the University of Avignon: For a better

Robustness Techniques for Speech Recognition Berlin Chen, 2004 References: 1. X. Huang et al.

R SciDB Julia Mert Terzihan Zhixiong Chen R 1. What is R In the 1970s, at Bell Labs,

Introduction to the R Language Data Types and Basic Operations Computing for Data Analysis 1 /

Monte Carlo methods Draw random samples from the desired - PDF document

School of Computer Science Approximate Inference: Monte Carlo Inference Probabilistic Graphical Models (10- Probabilistic Graphical Models (10 -708) 708) Lecture 18, Nov 19, 2007 Receptor A Receptor A X 1 X 1 X 1 Receptor B Receptor B X

Monte Carlo Methods Guojin Chen Christopher Cprek Chris Rambicure Monte Carlo Methods 1.

Monte Carlo Generators Monte Carlo Generators Monte Carlo Generators QCD Lecture III P .

Chapter 5: Monte Carlo Methods Monte Carlo methods are learning methods Experience

Monte Carlo Approximation of Monte Carlo Filters Adam M. Johansen et al. Collaborators Include:

BROCHURE 2019 TETRA JUICES DEL MONTE DEL MONTE 6 x 1L GOLD PINEAPPLE 6 x 1L 6 x 1L 6 x 1L

Monte Carlo Estimation 7 January 2019 OSU CSE 1 Monte Carlo Methods Class of computational

Monte Carlo Methods for physically based Volume rendering Monte Carlo Methods for physically based

Monte Carlo Methods An introduction to Monte Carlo (MC) methods How to use MC methods

Monte Carlo methods for volumetric light transport Monte Carlo methods for volumetric light

Draft Introduction to (randomized) quasi-Monte Carlo Pierre LEcuyer MCQMC Conference,

Monte Carlo Localization Ximing Yu March 24, 2009 Ximing Yu Monte Carlo Localization 1

Monte Carlo Control CMPUT 366: Intelligent Systems S&amp;B 5.3-5.5, 5.7 Lecture Outline 1.

4. THE MONTE CARLO METHOD 4.1 I ntroduction This chapter is aimed at describing the Monte Carlo

MARKOV CHAIN MONTE CARLO METHODS MARKOV CHAIN MONTE CARLO METHODS MARKO LAINE, FMI MARKO LAINE,

Monte Carlo Methods and Area Estimates CS3220 - Summer 2008 Jonathan Kaldor Monte Carlo Methods

Monte Carlo Methods Monte Carlo Methods I, at any rate, am convinced that He does not throw dice.

SurFi: Detecting Surveillance Camera Looping Attacks with Wi-Fi Channel State Information Nitya

PrefMiner: Mining Users Preferences for Intelligent Mobile

Autonomous Mobility-on-Demand Systems: False Myths and Open Questions Prof. Dr. Emilio Frazzoli,

WINLAB Rutgers, The State University of New Jersey www.winlab.rutgers.edu Song Liu, Larry J.

Spoken Language Understanding strategies developed at the University of Avignon: For a better

Robustness Techniques for Speech Recognition Berlin Chen, 2004 References: 1. X. Huang et al.

R SciDB Julia Mert Terzihan Zhixiong Chen R 1. What is R In the 1970s, at Bell Labs,

Introduction to the R Language Data Types and Basic Operations Computing for Data Analysis 1 /

Monte Carlo Control CMPUT 366: Intelligent Systems S&B 5.3-5.5, 5.7 Lecture Outline 1.