Sampling and Monte Carlo Integration
Michael Gutmann
Probabilistic Modelling and Reasoning (INFR11134)
School of Informatics, University of Edinburgh
Spring semester 2018
Recap
Learning and inference often involves intractable integrals, e.g.
◮ Marginalisation

  p(x) = ∫ p(x, y) dy

◮ Expectations

  E[g(x) | y_o] = ∫ g(x) p(x | y_o) dx

  for some function g.
◮ For unobserved variables, likelihood and gradient of the log-likelihood

  L(θ) = p(D; θ) = ∫ p(u, D; θ) du,   ∇_θ ℓ(θ) = E_{p(u|D;θ)}[∇_θ log p(u, D; θ)]
Notation: E_{p(x)} is sometimes used to indicate that the expectation is taken with respect to p(x).
Michael Gutmann Sampling and Monte Carlo Integration 2 / 41
Recap
Learning and inference often involves intractable integrals, e.g.
◮ For unnormalised models with intractable partition functions
  L(θ) = p̃(D; θ) / ∫ p̃(x; θ) dx,   ∇_θ ℓ(θ) ∝ m(D; θ) − E_{p(x;θ)}[m(x; θ)]
◮ Combined case of unnormalised models with intractable
partition functions and unobserved variables.
◮ Evaluation of intractable integrals can sometimes be avoided
by using other learning criteria (e.g. score matching).
◮ Here: methods to approximate integrals like those above using
sampling.
Program
- 1. Monte Carlo integration
- 2. Sampling
Program
- 1. Monte Carlo integration
Approximating expectations by averages
Importance sampling
- 2. Sampling
Averages with iid samples
◮ Tutorial 7: For Gaussians, the sample average is an estimate (MLE) of the mean (expectation) E[x]:

  x̄ = (1/n) Σ_{i=1}^n x_i ≈ E[x]
◮ Gaussianity is not needed: assume the x_i are iid observations of x ∼ p(x). Then

  E[x] = ∫ x p(x) dx ≈ x̄_n,   x̄_n = (1/n) Σ_{i=1}^n x_i
◮ Subscript n reminds us that we used n samples to compute
the average.
◮ Approximating integrals by means of sample averages is called
Monte Carlo integration.
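A minimal sketch of Monte Carlo integration; the choice of distribution, N(2, 3²), is illustrative and not from the slides:

```python
import random

random.seed(0)

# Draw n iid samples from p(x) = N(2, 3^2) and form the sample average,
# which approximates E[x] = 2.
n = 100_000
samples = [random.gauss(2.0, 3.0) for _ in range(n)]
x_bar = sum(samples) / n
print(x_bar)  # close to 2
```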
Averages with iid samples
◮ The sample average is unbiased:

  E[x̄_n] = (1/n) Σ_{i=1}^n E[x_i] =* (1/n) n E[x] = E[x]

  (*: the “identically distributed” assumption is used, not independence)
◮ Variability:

  V[x̄_n] = (1/n²) V[Σ_{i=1}^n x_i] =* (1/n²) Σ_{i=1}^n V[x_i] = (1/n) V[x]

  (*: the independence assumption is used)
◮ The squared error decreases as 1/n:

  V[x̄_n] = E[(x̄_n − E[x])²] = (1/n) V[x]
Averages with iid samples
◮ Weak law of large numbers:
  Pr(|x̄_n − E[x]| ≥ ε) ≤ V[x] / (n ε²)
◮ As n → ∞, the probability for the sample average to deviate
from the expected value goes to zero.
◮ We say that sample average converges in probability to the
expected value.
◮ The speed of convergence depends on the variance V[x].
◮ Different “laws of large numbers” exist that make different assumptions.
Chebyshev’s inequality
◮ Weak law of large numbers is a direct consequence of
Chebyshev’s inequality
◮ Chebyshev’s inequality: Let s be some random variable with mean E[s] and variance V[s]. Then

  Pr(|s − E[s]| ≥ ε) ≤ V[s] / ε²
◮ This means that for all random variables:
  ◮ the probability to deviate more than three standard deviations from the mean is less than 1/9 ≈ 0.11 (set ε = 3√V[s])
  ◮ the probability to deviate more than six standard deviations is less than 1/36 ≈ 0.03.
These are conservative values; for many distributions, the probabilities will be smaller.
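An empirical check of how conservative the bound is; the Exponential(1) variable (E[s] = 1, V[s] = 1) is an illustrative choice:

```python
import random

random.seed(1)

# Empirically check Chebyshev's bound Pr(|s - E[s]| >= eps) <= V[s]/eps^2
# for s ~ Exponential(1). For eps = 2 (two standard deviations), the bound
# is 0.25, while the exact probability is Pr(s >= 3) = exp(-3) ~= 0.05.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
eps = 2.0
frac = sum(abs(s - 1.0) >= eps for s in samples) / n
bound = 1.0 / eps**2
print(frac, bound)  # the empirical fraction is well below the bound
```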
Proofs (not examinable)
◮ Chebyshev’s inequality follows from Markov’s inequality.
◮ Markov’s inequality: For a random variable y ≥ 0,

  Pr(y ≥ t) ≤ E[y]/t   (t > 0)
◮ Chebyshev’s inequality is obtained by setting y = |s − E[s]|:

  Pr(|s − E[s]| ≥ t) = Pr((s − E[s])² ≥ t²) ≤ E[(s − E[s])²] / t².

  Chebyshev’s inequality follows with t = ε, and because E[(s − E[s])²] is the variance V[s] of s.
Proofs (not examinable)
Proof for Markov’s inequality: Let t be an arbitrary positive number and y a one-dimensional non-negative random variable with pdf p. We can decompose the expectation of y using t as split-point,

  E[y] = ∫_0^∞ u p(u) du = ∫_0^t u p(u) du + ∫_t^∞ u p(u) du.

Since u ≥ t in the second term, we obtain the inequality

  E[y] ≥ ∫_0^t u p(u) du + ∫_t^∞ t p(u) du.

The second term is t times the probability that y ≥ t, so that

  E[y] ≥ ∫_0^t u p(u) du + t Pr(y ≥ t) ≥ t Pr(y ≥ t),

where the second inequality holds because the first term is non-negative. This gives Markov’s inequality:

  Pr(y ≥ t) ≤ E[y]/t   (t > 0)
Averages with correlated samples
◮ When computing the variance of the sample average,

  V[x̄_n] = V[x]/n,

  we assumed that the samples are identically and independently distributed.
◮ The variance shrinks with increasing n and the average
becomes more and more concentrated around E[x].
◮ Corresponding results exist for the case of statistically
dependent samples xi. Known as “ergodic theorems”.
◮ Important for the theory of Markov chain Monte Carlo
methods but requires advanced mathematical theory.
More general expectations
◮ So far, we have considered

  E[x] = ∫ x p(x) dx ≈ (1/n) Σ_{i=1}^n x_i,   where x_i ∼ p(x)

◮ This generalises:

  E[g(x)] = ∫ g(x) p(x) dx ≈ (1/n) Σ_{i=1}^n g(x_i),   where x_i ∼ p(x)

◮ The variance of the approximation, if the x_i are iid, is (1/n) V[g(x)].
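A sketch of the generalised estimator; the choices p = N(0, 1) and g(x) = x², with true value E[g(x)] = 1, are illustrative:

```python
import random

random.seed(2)

# Monte Carlo estimate of E[g(x)] for g(x) = x^2 with x ~ N(0, 1);
# the true value is 1 (the variance of a standard normal).
n = 100_000
est = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
print(est)  # close to 1
```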
Example (Based on a slide from Amos Storkey)
E[g(x)] = ∫ g(x) N(x; 0, 1) dx ≈ (1/n) Σ_{i=1}^n g(x_i)   (x_i ∼ N(x; 0, 1))

for g(x) = x and g(x) = x².
Left: sample average as a function of n. Right: variability (0.5 quantile: solid, 0.1 and 0.9 quantiles: dashed).
Example (Based on a slide from Amos Storkey)
E[g(x)] = ∫ g(x) N(x; 0, 1) dx ≈ (1/n) Σ_{i=1}^n g(x_i)   (x_i ∼ N(x; 0, 1))

for g(x) = exp(0.6x²).
Left: sample average as a function of n. Right: variability (0.5 quantile: solid, 0.1 and 0.9 quantiles: dashed).
Example
◮ Indicators that something is wrong:
  ◮ Strong fluctuations in the sample average as n increases.
  ◮ Large, non-declining variability.
◮ Note: the integral is not finite:

  ∫ exp(0.6x²) N(x; 0, 1) dx = (1/√(2π)) ∫ exp(0.6x²) exp(−0.5x²) dx
                             = (1/√(2π)) ∫ exp(0.1x²) dx
                             = ∞

  but for any n, the sample average is finite and may be mistaken for a good approximation.
◮ Check variability when approximating the expected value by a
sample average!
Approximating general integrals
◮ If the integral does not correspond to an expectation, we can smuggle in a pdf q to rewrite it as an expected value with respect to q:

  I = ∫ g(x) dx = ∫ (g(x)/q(x)) q(x) dx = E_{q(x)}[g(x)/q(x)] ≈ (1/n) Σ_{i=1}^n g(x_i)/q(x_i),   with x_i ∼ q(x) (iid)

◮ This is the basic idea of importance sampling.
◮ q is called the importance (or proposal) distribution.
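A minimal sketch of this idea; the integrand g(x) = exp(−x²/2), whose true integral is √(2π) ≈ 2.5066, and the proposal q = N(0, 1.5²) are illustrative choices, picked so that the ratio g/q stays bounded:

```python
import math, random

random.seed(3)

# Importance-sampling estimate of I = integral of g(x) dx for
# g(x) = exp(-x^2/2); the proposal q is wider than g, so g/q is bounded.
def g(x):
    return math.exp(-0.5 * x * x)

def q_pdf(x):
    s = 1.5
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

n = 100_000
xs = [random.gauss(0.0, 1.5) for _ in range(n)]
I_hat = sum(g(x) / q_pdf(x) for x in xs) / n
print(I_hat)  # close to sqrt(2*pi) ~= 2.5066
```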
Choice of the importance distribution
◮ Call the approximation Î:

  Î = (1/n) Σ_{i=1}^n g(x_i)/q(x_i)

◮ Î is unbiased by construction:

  E[Î] = E_{q(x)}[g(x)/q(x)] = ∫ (g(x)/q(x)) q(x) dx = ∫ g(x) dx = I
◮ Variance:

  V[Î] = (1/n) V_{q(x)}[g(x)/q(x)]
       = (1/n) E_{q(x)}[(g(x)/q(x))²] − (1/n) (E_{q(x)}[g(x)/q(x)])²
       = (1/n) E_{q(x)}[(g(x)/q(x))²] − (1/n) I²

  It depends on the second moment.
Choice of the importance distribution
◮ The second moment is

  E_{q(x)}[(g(x)/q(x))²] = ∫ (g(x)/q(x))² q(x) dx = ∫ g(x)²/q(x) dx = ∫ |g(x)||g(x)|/q(x) dx
◮ Bad: q(x) is small when |g(x)| is large. This gives a large variance.
◮ Good: q(x) is large when |g(x)| is large.
◮ The optimal q equals

  q*(x) = |g(x)| / ∫ |g(x)| dx

◮ The optimal q cannot be computed, but it justifies the heuristic that q(x) should be large when |g(x)| is large, or that the ratio |g(x)|/q(x) should be approximately constant.
Proof (not examinable)
Since the variance of a random variable |x| is non-negative and can be written as V[|x|] = E[x²] − (E[|x|])², we have E[x²] ≥ (E[|x|])². The smallest possible second moment achieves equality. We now verify that for q*(x), we have

  E[(g(x)/q*(x))²] = (E[|g(x)|/q*(x)])²
Proof (not examinable)
Indeed, for the optimal q, we have

  E[(g(x)/q*(x))²] = ∫ |g(x)||g(x)|/q*(x) dx = (∫ |g(x)| dx) ∫ |g(x)|² (1/|g(x)|) dx = (∫ |g(x)| dx)²

and

  (E[|g(x)|/q*(x)])² = (∫ (|g(x)|/q*(x)) q*(x) dx)² = (∫ |g(x)| dx)²,

which concludes the proof.
Importance sampling to compute the partition function
We can use importance sampling to approximate the partition function of unnormalised models p̃(x; θ):

  Z(θ) = ∫ p̃(x; θ) dx = ∫ (p̃(x; θ)/q(x)) q(x) dx ≈ (1/n) Σ_{i=1}^n p̃(x_i; θ)/q(x_i)   (x_i ∼ q(x) iid)
Example
Approximating the log partition function of the unnormalised beta-distribution

  p̃(x; α, β) = x^{α−1}(1 − x)^{β−1},   x ∈ [0, 1]

for β fixed to β = 2. Importance distribution: uniform distribution on [0, 1]. Left: n = 10, right: n = 100.

[Figure: estimated and ground-truth log partition function as a function of α ∈ [1, 10], for n = 10 (left) and n = 100 (right).]
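This example can be sketched for a single value of α; the choice α = 3 is illustrative, for which the ground truth is Z = B(3, 2) = Γ(3)Γ(2)/Γ(5) = 1/12:

```python
import math, random

random.seed(4)

# Importance-sampling estimate of the partition function of the unnormalised
# beta density p~(x; alpha, beta) = x^(alpha-1) (1-x)^(beta-1) with a uniform
# proposal on [0, 1], as on the slide.
alpha, beta = 3.0, 2.0

def p_tilde(x):
    return x ** (alpha - 1) * (1 - x) ** (beta - 1)

n = 100_000
# q is uniform on [0, 1], so q(x) = 1 and the weights are just p_tilde(x_i).
Z_hat = sum(p_tilde(random.random()) for _ in range(n)) / n
print(Z_hat, math.log(Z_hat))  # Z_hat close to 1/12 ~= 0.0833
```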
Importance sampling to compute expectations
◮ Assume you would like to approximate Ep(x)[g(x)] by a
sample average but sampling from p(x) is difficult.
◮ We can write

  E_{p(x)}[g(x)] = ∫ g(x) p(x) dx = ∫ g(x) (p(x)/q(x)) q(x) dx = E_{q(x)}[g(x) p(x)/q(x)]
                 ≈ (1/n) Σ_{i=1}^n g(x_i) p(x_i)/q(x_i),   where x_i ∼ q(x) (iid)
◮ The w_i = p(x_i)/q(x_i) are called the importance weights.
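A sketch with illustrative choices p = N(0, 1), q = N(0, 2²) and g(x) = x², whose true expectation under p is 1:

```python
import math, random

random.seed(5)

# Importance-sampling estimate of E_p[g(x)] using a wider proposal q,
# with importance weights w_i = p(x_i)/q(x_i).
def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n = 100_000
xs = [random.gauss(0.0, 2.0) for _ in range(n)]
ws = [normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0) for x in xs]
est = sum(x * x * w for x, w in zip(xs, ws)) / n
print(est)  # close to 1
```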
Normalised importance weights
◮ We can combine the above ideas to approximate

  E_{p(x)}[g(x)] = ∫ g(x) p(x) dx

  by importance sampling even if we only know p̃(x) ∝ p(x), where

  p(x) = p̃(x) / ∫ p̃(x) dx
◮ Write

  ∫ g(x) p(x) dx = (∫ g(x) p̃(x) dx) / (∫ p̃(x) dx)
                 = (∫ g(x) (p̃(x)/q(x)) q(x) dx) / (∫ (p̃(x)/q(x)) q(x) dx)
                 = E_{q(x)}[g(x) p̃(x)/q(x)] / E_{q(x)}[p̃(x)/q(x)]
Normalised importance weights
◮ Since

  ∫ g(x) p(x) dx = E_{q(x)}[g(x) p̃(x)/q(x)] / E_{q(x)}[p̃(x)/q(x)] = E_{q(x)}[g(x) p̃(x)/q̃(x)] / E_{q(x)}[p̃(x)/q̃(x)]

  (the normalisation constant of q̃ ∝ q cancels in the ratio), we only need to know the importance distribution q(x) up to its normalisation constant.
◮ Approximate both expectations by sample averages:

  ∫ g(x) p(x) dx ≈ [(1/n) Σ_{i=1}^n g(x_i) p̃(x_i)/q̃(x_i)] / [(1/n) Σ_{i=1}^n p̃(x_i)/q̃(x_i)],   where x_i ∼ q(x) (iid)
Normalised importance weights
◮ With importance weights

  w_i = p̃(x_i)/q̃(x_i),   where x_i ∼ q(x) (iid),

  we can write

  ∫ g(x) p(x) dx ≈ (Σ_{i=1}^n g(x_i) w_i) / (Σ_{i=1}^n w_i)

◮ The same weights appear in the numerator and the denominator.
◮ The quantities w_i / Σ_{j=1}^n w_j are called normalised importance weights.
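A sketch of the self-normalised estimator; the unnormalised target p̃(x) = exp(−x²/2) (so p = N(0, 1) and E[x²] = 1) and the unnormalised proposal q̃(x) = exp(−x²/8) (for q = N(0, 2²)) are illustrative choices:

```python
import math, random

random.seed(6)

# Self-normalised importance sampling: both the target and the proposal
# density are used only up to their normalisation constants.
def p_tilde(x):
    return math.exp(-0.5 * x * x)

def q_tilde(x):
    return math.exp(-x * x / 8.0)

n = 100_000
xs = [random.gauss(0.0, 2.0) for _ in range(n)]
ws = [p_tilde(x) / q_tilde(x) for x in xs]            # unnormalised weights
est = sum(x * x * w for x, w in zip(xs, ws)) / sum(ws)
print(est)  # close to E_p[x^2] = 1
```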
Program
- 1. Monte Carlo integration
Approximating expectations by averages
Importance sampling
- 2. Sampling
Program
- 1. Monte Carlo integration
- 2. Sampling
Simple univariate sampling
Rejection sampling
Ancestral sampling
Gibbs sampling
Assumption
◮ We assume that we are able to generate iid samples from the
uniform distribution on [0, 1].
◮ How to do that: see e.g.
https://statweb.stanford.edu/~owen/mc/Ch-unifrng.pdf
(not examinable)
Sampling for univariate discrete random variables
(Based on a slide from David Barber)
◮ Consider the one-dimensional discrete distribution p(x) with x ∈ {1, 2, 3} and

  p(x) = 0.6 for x = 1,   0.1 for x = 2,   0.3 for x = 3

◮ Divide [0, 1] into the chunks [0, 0.6), [0.6, 0.7), [0.7, 1]
◮ We then draw a sample u uniformly from [0, 1]
◮ We return the label of the partition in which u fell.
◮ Example: if u = 0.53, we return the sample “1”
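The three-chunk procedure above can be sketched as:

```python
import random

random.seed(7)

# Sample from p(1) = 0.6, p(2) = 0.1, p(3) = 0.3 by partitioning [0, 1]
# into the chunks [0, 0.6), [0.6, 0.7), [0.7, 1].
def sample_discrete(probs):
    u = random.random()
    cum = 0.0
    for label, p in enumerate(probs, start=1):
        cum += p
        if u < cum:
            return label
    return len(probs)  # guard against floating-point round-off

n = 100_000
counts = {1: 0, 2: 0, 3: 0}
for _ in range(n):
    counts[sample_discrete([0.6, 0.1, 0.3])] += 1
print({k: v / n for k, v in counts.items()})  # close to 0.6 / 0.1 / 0.3
```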
Sampling for univariate continuous random variables
◮ A similar method as the one above exists for continuous
random variables.
◮ Called inverse transform sampling.
◮ Recall: the cumulative distribution function (cdf) of a random variable x with pdf p_x is

  F_x(α) = Pr(x ≤ α) = ∫_{−∞}^{α} p_x(u) du
◮ To generate n iid samples from x with cdf F_x:
  ◮ calculate the inverse F_x^{−1}
  ◮ sample n iid random variables uniformly distributed on [0, 1]: y_i ∼ U(0, 1), i = 1, . . . , n
  ◮ transform each sample by F_x^{−1}: x_i = F_x^{−1}(y_i), i = 1, . . . , n

(see Tutorial 8 for the derivation)
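A sketch of inverse transform sampling for the Exponential(λ) distribution, an illustrative choice since its cdf inverts in closed form: F(x) = 1 − exp(−λx), so F⁻¹(y) = −ln(1 − y)/λ.

```python
import math, random

random.seed(8)

# Inverse transform sampling for Exponential(lam): transform uniform
# samples y_i ~ U(0, 1) through the inverse cdf.
lam = 2.0

def F_inv(y):
    return -math.log(1.0 - y) / lam

n = 100_000
xs = [F_inv(random.random()) for _ in range(n)]
print(sum(xs) / n)  # close to the true mean 1/lam = 0.5
```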
Basic principle of rejection sampling
◮ Assume you can draw iid samples x_i ∼ q(x).
◮ For each sampled x_i, you draw a Bernoulli random variable y_i ∈ {0, 1} whose success probability depends on x_i:

  Pr(y_i = 1 | x_i) = f(x_i)

◮ You get samples (y_i, x_i) with joint distribution

  q(x) f(x)^y (1 − f(x))^{1−y}

◮ The conditional pdf of x given y = 1 is proportional to q(x) f(x).
◮ Keep or “accept” the x_i with y_i = 1, “reject” those with y_i = 0.
◮ The accepted samples follow

  x_i ∼ q(x) f(x) / ∫ q(x) f(x) dx
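A sketch of this procedure; the proposal q = U(0, 1) and acceptance probability f(x) = x(1 − x)/0.25 are illustrative choices (the maximum of x(1 − x) is 0.25, so f(x) ∈ [0, 1]), for which the accepted samples follow the Beta(2, 2) density 6x(1 − x):

```python
import random

random.seed(9)

# Basic rejection sampling: draw x_i ~ q, accept with probability f(x_i).
def f(x):
    return x * (1 - x) / 0.25

accepted = []
for _ in range(100_000):
    x = random.random()            # x_i ~ q (uniform on [0, 1])
    if random.random() < f(x):     # y_i ~ Bernoulli(f(x_i)); keep if y_i = 1
        accepted.append(x)

mean = sum(accepted) / len(accepted)
print(len(accepted), mean)  # mean close to 0.5, the Beta(2, 2) mean
```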
Sampling from the posterior by rejection sampling
◮ The conditional acceptance probability f(x) ∈ [0, 1] can be used to shape the distribution of the samples from q(x).
◮ Consider Bayesian inference: prior p(θ), likelihood L(θ).
◮ Using L(θ)/(max_θ L(θ)) as the acceptance probability f transforms the samples θ_i from the prior into samples from the posterior.
◮ Accepted parameters follow

  θ_i ∼ p(θ) L(θ) / ∫ p(θ) L(θ) dθ = p(θ | D)

◮ More likely parameter configurations are more likely to be accepted.
Sampling from the posterior by rejection sampling
◮ For discrete random variables, L(θ) = Pr(x = D; θ) ∈ [0, 1].
◮ Accepting a θ_i with probability L(θ_i) can be implemented by checking whether data simulated from the model with parameter value θ_i equals the observed data.
◮ Samples from the posterior = samples from the prior that
produce data equal to the observed one.
(see the slides “Basics of Model-Based Learning”)
Side-note (not examinable): enables Bayesian inference when the likelihood is intractable (e.g. due to unobserved variables) but sampling from the model is possible. Forms the basis of a set of methods called approximate Bayesian computation.
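This simulate-and-match scheme can be sketched with an illustrative model: m = 5 Bernoulli(θ) trials, a uniform prior on θ, and observed data summarised by k = 4 successes, for which the exact posterior is Beta(k + 1, m − k + 1):

```python
import random

random.seed(10)

# Rejection sampling from the posterior: draw theta from the prior, simulate
# data, and keep theta only when the simulated data matches the observation.
m, k_obs = 5, 4

def simulate(theta):
    return sum(random.random() < theta for _ in range(m))

posterior = []
for _ in range(60_000):
    theta = random.random()          # theta_i ~ prior U(0, 1)
    if simulate(theta) == k_obs:     # exact match with the observed data
        posterior.append(theta)

post_mean = sum(posterior) / len(posterior)
print(post_mean)  # close to the Beta(5, 2) mean (k+1)/(m+2) = 5/7 ~= 0.714
```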
Standard formulation of rejection sampling
◮ Rejection sampling is typically presented (slightly) differently.
◮ The goal is to generate samples from a target distribution p(x), known up to a normalisation constant, when we are able to sample from q(x).
◮ Since the accepted samples follow

  x_i ∼ q(x) f(x) / ∫ q(x) f(x) dx,

  choose the conditional acceptance probability f(x) ∝ p(x)/q(x).
◮ See Barber 27.1.2.
Multivariate by univariate sampling
◮ Rejection sampling is limited to low-dimensional cases (see
Barber 27.1.2)
◮ Sampling from high-dimensional multivariate distributions is
generally difficult.
◮ One way to approach the problem of multivariate sampling is
to translate it into the task of solving several lower dimensional sampling problems.
◮ We did that in ancestral sampling.
Ancestral sampling
◮ Factorisation provides a recipe for data generation / sampling
from p(x)
◮ Example:
p(x1, . . . , x5) = p(x1)p(x2)p(x3|x1, x2)p(x4|x3)p(x5|x2)
◮ We can generate samples from the joint distribution
p(x1, x2, x3, x4, x5) by sampling
- 1. x1 ∼ p(x1)
- 2. x2 ∼ p(x2)
- 3. x3 ∼ p(x3|x1, x2)
- 4. x4 ∼ p(x4|x3)
- 5. x5 ∼ p(x5|x2)
(Graph: x1 → x3 ← x2, x3 → x4, x2 → x5)
◮ A set of univariate sampling problems.
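The five-step recipe above can be sketched in code; the variables are taken to be binary and the conditional probability tables are made up for illustration, only the sampling order follows the factorisation:

```python
import random

random.seed(11)

# Ancestral sampling: sample each variable given its already-sampled parents.
def bern(p):
    return 1 if random.random() < p else 0

def sample_joint():
    x1 = bern(0.7)                        # x1 ~ p(x1)
    x2 = bern(0.4)                        # x2 ~ p(x2)
    x3 = bern(0.9 if x1 and x2 else 0.2)  # x3 ~ p(x3 | x1, x2)
    x4 = bern(0.8 if x3 else 0.1)         # x4 ~ p(x4 | x3)
    x5 = bern(0.6 if x2 else 0.3)         # x5 ~ p(x5 | x2)
    return x1, x2, x3, x4, x5

n = 100_000
freq_x1 = sum(sample_joint()[0] for _ in range(n)) / n
print(freq_x1)  # close to p(x1 = 1) = 0.7
```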
Gibbs sampling
(Based on a slide from David Barber) ◮ Gibbs sampling also reduces the problem of multivariate
sampling to the problem of univariate sampling.
◮ Goal: generate samples from p(x) = p(x_1, . . . , x_d).
◮ By the product rule,

  p(x) = p(x_i | x_1, . . . , x_{i−1}, x_{i+1}, . . . , x_d) p(x_1, . . . , x_{i−1}, x_{i+1}, . . . , x_d) = p(x_i | x_{\i}) p(x_{\i})
◮ Given a joint initial state x^1, from which we can read off the ‘parental’ state x^1_{\i},

  x^1_{\i} = (x^1_1, . . . , x^1_{i−1}, x^1_{i+1}, . . . , x^1_d),

  we can draw a sample x^2_i from p(x_i | x^1_{\i}).
◮ We assume this distribution is easy to sample from since it is univariate.
Gibbs sampling
(Based on a slide from David Barber) ◮ We call the new joint sample in which only xi has been
updated x2, x2 = (x1
1 , . . . , x1 i−1, x2 i , x1 i+1, . . . , x1 n). ◮ One then selects another variable xj to sample and, by
continuing this procedure, generates a set x1, . . . , xn of samples in which each xk+1 differs from xk in only a single component.
◮ Since p(x_i | x_{\i}) = p(x_i | MB(x_i)), we can sample from p(x_i | MB(x_i)), which is easier.
  (MB(x_i) denotes the Markov blanket of x_i; see the slides on directed and undirected graphical models.)
◮ The samples are not independent.
◮ Gibbs sampling is an example of a Markov chain Monte Carlo method (see Barber 27.3 and 27.4).
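A sketch of Gibbs sampling for a standard illustrative target (not from the slides), a zero-mean bivariate Gaussian with unit variances and correlation ρ, whose conditionals are univariate Gaussians: x1 | x2 ∼ N(ρ x2, 1 − ρ²) and x2 | x1 ∼ N(ρ x1, 1 − ρ²).

```python
import math, random

random.seed(12)

# Gibbs sampling: update one component at a time from its conditional.
rho = 0.5
sd = math.sqrt(1 - rho * rho)

x1, x2 = 0.0, 0.0                      # joint initial state
samples = []
for _ in range(50_000):
    x1 = random.gauss(rho * x2, sd)    # x1 ~ p(x1 | x2)
    x2 = random.gauss(rho * x1, sd)    # x2 ~ p(x2 | x1)
    samples.append((x1, x2))

m1 = sum(s[0] for s in samples) / len(samples)
corr = sum(s[0] * s[1] for s in samples) / len(samples)
print(m1, corr)  # mean close to 0, E[x1*x2] close to rho = 0.5
```

Note that consecutive samples are correlated, so more samples are needed than with iid draws to reach the same accuracy.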
Program recap
- 1. Monte Carlo integration
Approximating expectations by averages
Importance sampling
- 2. Sampling
Simple univariate sampling
Rejection sampling
Ancestral sampling
Gibbs sampling