

  1. Chapter 11: Sampling Methods. Lei Tang, Department of CSE, Arizona State University. Dec. 18, 2007

  2. Outline: 1. Introduction; 2. Basic Sampling Algorithms; 3. Markov Chain Monte Carlo (MCMC); 4. Gibbs Sampling; 5. Slice Sampling; 6. Hybrid Monte Carlo Algorithms; 7. Estimating the Partition Function

  3. Introduction. Exact inference is intractable for most probabilistic models of practical interest. We have already discussed deterministic approximations, including variational Bayes and expectation propagation. Here we consider approximations based on numerical sampling, also known as Monte Carlo techniques.

  4. What is Monte Carlo? Monte Carlo is a small hillside town in Monaco (near Italy) that has had a casino since 1865, much like Las Vegas. Stanislaw Marcin Ulam (a Polish mathematician) named the statistical sampling methods in honor of his uncle, a gambler who would borrow money from relatives because he "just had to go to Monte Carlo" (the name was suggested by another mathematician, Nicholas Metropolis). The magic is in rolling the dice.


  6. Common Questions. Why do we need Monte Carlo techniques? Isn't it trivial to sample from a probability distribution? Are Monte Carlo methods always slow? What can Monte Carlo methods do for me?


  10. General Idea of Sampling. Mostly, the posterior distribution is required primarily for prediction. Fundamental problem: find the expectation of some function f(z) with respect to a probability distribution p(z):

  \mathbb{E}[f] = \int f(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}

  General idea: obtain a set of samples z^{(l)} drawn independently from the distribution p(z), so we can estimate the expectation:

  \hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(\mathbf{z}^{(l)}), \qquad \mathbb{E}[\hat{f}] = \mathbb{E}[f], \qquad \operatorname{var}[\hat{f}] = \frac{1}{L}\, \mathbb{E}\big[(f - \mathbb{E}[f])^2\big]

  Note that the variance of the estimate is independent of the dimensionality of z. Usually, 20+ independent samples may be sufficient.

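  A minimal sketch of this estimator in Python. The target distribution, the choice f(z) = ||z||^2, and the sample count are illustrative assumptions, not from the slides; the point is that averaging i.i.d. samples gives an unbiased estimate whose variance shrinks like 1/L regardless of the dimensionality of z.

  ```python
  import numpy as np

  # Monte Carlo estimate of E[f] under an assumed target p(z): a D-dimensional
  # standard Gaussian, with f(z) = ||z||^2 whose true expectation is D.
  rng = np.random.default_rng(0)
  D, L = 10, 1000                      # dimensionality and number of samples

  z = rng.standard_normal((L, D))      # z^(l) ~ p(z), drawn independently
  f = np.sum(z**2, axis=1)             # f(z^(l)) for each sample

  f_hat = f.mean()                     # (1/L) * sum_l f(z^(l))
  var_hat = f.var() / L                # estimator variance scales as 1/L

  print(f"estimate = {f_hat:.3f}, true value = {D}, estimator variance = {var_hat:.3f}")
  ```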

  12. So sampling is trivial? The expectation might be dominated by regions of small probability. [Figure: f(z) and p(z) plotted against z] The samples might not be independent, so the effective sample size might be much smaller than the apparent sample size. In complicated distributions of the form p(z) = \frac{1}{Z_p} \tilde{p}(z), the normalization factor Z_p is hard to calculate directly.

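  A quick sketch of the first difficulty above, using an assumed example (a Gaussian tail probability, not taken from the slides): when the quantity of interest lives in a region of small probability, plain Monte Carlo with a modest sample size usually misses it entirely.

  ```python
  import numpy as np

  # f(z) = 1[z > 4] under a standard Gaussian; the true expectation is about
  # 3.2e-5, so 10,000 samples will typically contain zero or one hit.
  rng = np.random.default_rng(0)
  z = rng.standard_normal(10_000)
  print("naive estimate of P(Z > 4):", np.mean(z > 4))   # very often exactly 0.0
  ```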

  15. Sampling from Directed Graphical Models. No variables observed: sample from the joint distribution using ancestral sampling,

  p(\mathbf{z}) = \prod_{i} p(z_i \mid \mathrm{pa}_i)

  Make one pass through the set of variables in a topological order and sample from the conditional distribution p(z_i | pa_i). Some nodes observed: draw samples from the joint distribution and throw away samples that are not consistent with the observations. Any serious problem? The overall probability of accepting a sample from the posterior decreases rapidly as the number of observed variables increases.

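  A minimal sketch of ancestral sampling with rejection on a hypothetical three-node chain A -> B -> C of binary variables; the network structure and its conditional probability values are invented for illustration.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  def sample_joint():
      # One ancestral pass: each variable is sampled given its parent.
      a = rng.random() < 0.3                    # p(A=1) = 0.3
      b = rng.random() < (0.8 if a else 0.1)    # p(B=1 | A)
      c = rng.random() < (0.7 if b else 0.2)    # p(C=1 | B)
      return a, b, c

  # Condition on the observation C = 1 by discarding inconsistent samples.
  accepted = [s for s in (sample_joint() for _ in range(10_000)) if s[2]]
  print("acceptance rate:", len(accepted) / 10_000)
  print("estimated p(A=1 | C=1):", np.mean([a for a, b, c in accepted]))
  ```

  With only one observed node the acceptance rate is already well below one; with many observed variables it collapses toward zero, which is the serious problem noted above.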

  18. Sampling from Undirected Graphical Models. For an undirected graph,

  p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \phi_C(\mathbf{x}_C)

  where C ranges over the maximal cliques. There is no one-pass sampling strategy that will sample even from the prior distribution with no observed variables. More computationally expensive techniques must be employed, such as Gibbs sampling (covered later).
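  As a small preview of the kind of technique meant here, the following is a sketch of Gibbs sampling for an assumed pairwise undirected model, an Ising chain with p(x) proportional to exp(J * sum_i x_i x_{i+1}) and x_i in {-1, +1}; the model and parameter values are illustrative, not from the slides.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  N, J, sweeps = 20, 0.5, 5000

  x = rng.choice([-1, 1], size=N)
  samples = []
  for sweep in range(sweeps):
      for i in range(N):
          # Sum of the neighbours that exist (free boundary conditions).
          s = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < N - 1 else 0)
          p_plus = 1.0 / (1.0 + np.exp(-2.0 * J * s))   # p(x_i = +1 | neighbours)
          x[i] = 1 if rng.random() < p_plus else -1
      samples.append(x.copy())

  # Average nearest-neighbour correlation over samples after burn-in.
  corr = np.mean([np.mean(s[:-1] * s[1:]) for s in samples[1000:]])
  print("E[x_i x_{i+1}] is approximately", corr)
  ```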

  19. Sampling from a Marginal Distribution. Sample from a joint distribution. Sample from a conditional distribution (the posterior). Sample from a marginal distribution. If we already have a strategy to sample from a joint distribution p(u, v), then we can obtain samples from the marginal distribution p(u) simply by ignoring the values of v in each sample. This strategy is used in some sampling techniques.
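  A tiny sketch of that idea, using an assumed joint distribution (a correlated bivariate Gaussian, chosen only for illustration): samples of u obtained by dropping the v coordinate are exact samples from the marginal p(u).

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  cov = [[1.0, 0.8], [0.8, 1.0]]
  uv = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)
  u = uv[:, 0]                         # ignore the v column in each sample
  print("sample mean/std of u:", u.mean(), u.std())   # close to 0 and 1, the marginal
  ```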

  20. Review of Basic Probability. Probability density function (pdf). Cumulative distribution function (cdf).

  21. Probability under Transformation. If we define a mapping f from the original sample space \mathcal{X} to another sample space \mathcal{Y}:

  f : \mathcal{X} \to \mathcal{Y}, \qquad y = f(x)

  What is p(y) given p(x)?

  F(y) = P(Y \le y) = P(f(X) \le y) = \int_{\{x \in \mathcal{X} \,:\, f(x) \le y\}} p(x)\, dx


  23. For simplicity, we assume the function f is monotonic. Monotonic increasing:

  F_Y(y) = \int_{\{x \in \mathcal{X} \,:\, x \le f^{-1}(y)\}} p(x)\, dx = \int_{-\infty}^{f^{-1}(y)} p(x)\, dx = F_X(f^{-1}(y))

  Monotonic decreasing:

  F_Y(y) = \int_{\{x \in \mathcal{X} \,:\, x \ge f^{-1}(y)\}} p(x)\, dx = \int_{f^{-1}(y)}^{+\infty} p(x)\, dx = 1 - F_X(f^{-1}(y))

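  A quick numerical check of the monotonic decreasing case, under an assumed example (not from the slides): with X uniform on (0, 1) and the decreasing map y = f(x) = -ln(x), we have f^{-1}(y) = exp(-y), so F_Y(y) = 1 - F_X(exp(-y)) = 1 - exp(-y), i.e. Y is Exponential(1).

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.random(100_000)        # X ~ Uniform(0, 1)
  y = -np.log(x)                 # monotonically decreasing transformation

  # Compare the empirical CDF of Y at a few points with 1 - exp(-y).
  for t in (0.5, 1.0, 2.0):
      print(t, np.mean(y <= t), 1.0 - np.exp(-t))
  ```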

  25.

  p_Y(y) = \frac{d}{dy} F_Y(y) =
  \begin{cases}
    p_X(f^{-1}(y))\, \frac{d}{dy} f^{-1}(y) & \text{if } f \text{ is increasing} \\
    -\,p_X(f^{-1}(y))\, \frac{d}{dy} f^{-1}(y) & \text{if } f \text{ is decreasing}
  \end{cases}
  = p_X(f^{-1}(y)) \left| \frac{dx}{dy} \right|

  This can be generalized to multiple variables: y_i = f_i(x_1, x_2, \dots, x_M), for i = 1, 2, \dots, M. Then

  p(y_1, y_2, \dots, y_M) = p(x_1, \dots, x_M)\, |J|

  where J is the Jacobian matrix:

  |J| = \begin{vmatrix}
    \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_M}{\partial y_1} \\
    \vdots & \ddots & \vdots \\
    \frac{\partial x_1}{\partial y_M} & \cdots & \frac{\partial x_M}{\partial y_M}
  \end{vmatrix}
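  A sketch of the multivariate rule in action, using the Box-Muller map as an assumed illustrative transformation (it is not discussed on these slides): (x1, x2) uniform on the unit square is mapped to y1 = sqrt(-2 ln x1) cos(2 pi x2), y2 = sqrt(-2 ln x1) sin(2 pi x2), and the Jacobian of the inverse map works out so that (y1, y2) are independent standard Gaussians.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  x1, x2 = rng.random(100_000), rng.random(100_000)   # uniform on (0, 1)^2
  r = np.sqrt(-2.0 * np.log(x1))
  y1, y2 = r * np.cos(2.0 * np.pi * x2), r * np.sin(2.0 * np.pi * x2)

  # Compare empirical statistics against the standard normal they should match.
  print("mean, std of y1:", y1.mean(), y1.std())      # close to 0 and 1
  print("P(y1 <= 1):", np.mean(y1 <= 1.0))            # close to 0.841
  ```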
