SLIDE 1

Introduction to Bayesian Computation

  • Dr. Jarad Niemi

STAT 544 - Iowa State University

March 26, 2019

Jarad Niemi (STAT544@ISU) Introduction to Bayesian Computation March 26, 2019 1 / 30

SLIDE 2

Bayesian computation

Goals:

  • E_{θ|y}[h(θ)|y] = ∫ h(θ) p(θ|y) dθ
  • p(y) = ∫ p(y|θ) p(θ) dθ = E_θ[p(y|θ)]

Approaches:

  • Deterministic approximation
  • Monte Carlo approximation: theoretical justification, gridding, inverse CDF, accept-reject


SLIDE 3

Numerical integration

Deterministic methods, where

E[h(θ)|y] = ∫ h(θ) p(θ|y) dθ ≈ Σ_{s=1}^S w_s h(θ^(s)) p(θ^(s)|y),

the θ^(s) are selected points, w_s is the weight given to the point θ^(s), and the error can be bounded.

Monte Carlo (simulation) methods, where

E[h(θ)|y] = ∫ h(θ) p(θ|y) dθ ≈ (1/S) Σ_{s=1}^S w_s h(θ^(s)),

θ^(s) ~ind g(θ) (for some proposal distribution g), w_s = p(θ^(s)|y)/g(θ^(s)), and we have an SLLN and a CLT.


SLIDE 4

Deterministic methods

Example: Normal-Cauchy model

Let Y ∼ N(θ, 1) with θ ∼ Ca(0, 1). The posterior is p(θ|y) ∝ p(y|θ)p(θ) ∝ exp(−(y − θ)²/2) / (1 + θ²) = q(θ|y), which is not a known distribution. We might be interested in

  • 1. normalizing this posterior, i.e. calculating c(y) = ∫ q(θ|y) dθ,
  • 2. or calculating the posterior mean, i.e. E[θ|y] = ∫ θ p(θ|y) dθ = ∫ θ q(θ|y)/c(y) dθ.


SLIDE 5

Deterministic methods

Normal-Cauchy: marginal likelihood

y = 1 # Data
q = function(theta, y, log = FALSE) {
  out = -(y-theta)^2/2 - log(1+theta^2)
  if (log) return(out)
  return(exp(out))
}

# Find normalizing constant for q(theta|y)
w = 0.1
theta = seq(-5, 5, by = w) + y
(cy = sum(q(theta, y)*w)) # gridding based approach
[1] 1.305608
integrate(function(x) q(x, y), -Inf, Inf) # numerical integration
1.305609 with absolute error < 0.00013
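The same gridding idea can be cross-checked outside R. Below is a minimal Python sketch (the function names `q` and `c_of_y` are illustrative, not from the slides) that approximates c(y) with a midpoint Riemann sum; the Gaussian factor in q makes the tails negligible.

```python
import math

def q(theta, y):
    """Unnormalized Normal-Cauchy posterior q(theta|y)."""
    return math.exp(-(y - theta)**2 / 2) / (1 + theta**2)

def c_of_y(y, width=20.0, n=200_000):
    """Midpoint Riemann sum for c(y) over (y - width/2, y + width/2)."""
    h = width / n
    return sum(q(y - width/2 + (i + 0.5) * h, y) for i in range(n)) * h

print(round(c_of_y(1), 6))  # matches the slide's 1.305608 to several digits
```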

SLIDE 6

Deterministic methods

Normal-Cauchy: distribution

curve(q(x, y)/cy, -5, 5, n = 1001)
points(theta, rep(0, length(theta)), cex = 0.5, pch = 19)
segments(theta, 0, theta, q(theta, y)/cy)

[Figure: the normalized posterior q(x, y)/cy plotted over (−5, 5), with the grid points marked along the axis.]

SLIDE 7

Deterministic methods

Posterior expectation

E[h(θ)|y] ≈ Σ_{s=1}^S w_s h(θ^(s)) p(θ^(s)|y) = Σ_{s=1}^S w_s h(θ^(s)) q(θ^(s)|y) / c(y)

h = function(theta) theta
sum(w*h(theta)*q(theta, y)/cy)
[1] 0.5542021

SLIDE 8

Deterministic methods

Posterior expectation as a function of observed data

[Figure: posterior expectation E[θ|y] as a function of the observed data y ∈ (−5, 5), under Cauchy, normal, and improper uniform priors.]

SLIDE 9

Monte Carlo methods

Convergence review

Three main notions of convergence of a sequence of random variables X1, X2, … to a random variable X:

  • Convergence in distribution (X_n →^d X): lim_{n→∞} F_n(x) = F(x).
  • Convergence in probability (WLLN, X_n →^p X): lim_{n→∞} P(|X_n − X| ≥ ε) = 0.
  • Almost sure convergence (SLLN, X_n →^{a.s.} X): P(lim_{n→∞} X_n = X) = 1.

Implications: almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution. Here, X_n will be our approximation to an integral and X the true (constant) value of that integral, or X_n will be a standardized approximation and X will be N(0, 1).

SLIDE 10

Monte Carlo methods

Monte Carlo integration

Consider evaluating the integral E[h(θ)] = ∫_Θ h(θ) p(θ) dθ using the Monte Carlo estimate

ĥ_S = (1/S) Σ_{s=1}^S h(θ^(s)), where θ^(s) ~ind p(θ).

We know

  • SLLN: ĥ_S →^{a.s.} E[h(θ)].
  • CLT: if h² has finite expectation, then (ĥ_S − E[h(θ)]) / √(v_S/S) →^d N(0, 1), where v_S = Var[h(θ)] ≈ (1/S) Σ_{s=1}^S (h(θ^(s)) − ĥ_S)², or any other consistent estimator.

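The SLLN and CLT statements above can be made concrete with a short Python sketch. The choice h(θ) = θ² with θ ∼ N(0, 1), so E[h(θ)] = 1, is a hypothetical example for illustration, not from the slides.

```python
import math, random, statistics

random.seed(42)
S = 10_000

# theta^(s) ~ N(0, 1); h(theta) = theta^2, so E[h(theta)] = 1
h_draws = [random.gauss(0, 1)**2 for _ in range(S)]

h_hat = sum(h_draws) / S                # Monte Carlo estimate (SLLN: -> 1)
v_S = statistics.variance(h_draws)      # consistent estimate of Var[h(theta)]
half_width = 1.96 * math.sqrt(v_S / S)  # CLT-based 95% interval half-width

print(f"{h_hat:.3f} +/- {half_width:.3f}")
```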

SLIDE 11

Monte Carlo methods Definite integral

Definite integral

Suppose you are interested in evaluating I = ∫₀¹ e^{−θ²/2} dθ. Then set h(θ) = e^{−θ²/2} and p(θ) = 1, i.e. θ ∼ Unif(0, 1), and approximate by a Monte Carlo estimate via

  • 1. For s = 1, …, S,
    • a. sample θ^(s) ~ind Unif(0, 1) and
    • b. calculate h(θ^(s)).
  • 2. Calculate I ≈ (1/S) Σ_{s=1}^S h(θ^(s)).
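These two steps can be sketched in Python (a hedged translation; the slides themselves use R), with the truth computed from the standard normal CDF for comparison, since I = √(2π)(Φ(1) − Φ(0)).

```python
import math, random
from statistics import NormalDist

random.seed(1)
S = 100_000

# Step 1: sample theta^(s) ~ Unif(0,1) and evaluate h(theta) = exp(-theta^2/2)
# Step 2: average the evaluations
I_hat = sum(math.exp(-random.random()**2 / 2) for _ in range(S)) / S

# Truth via the standard normal CDF: I = sqrt(2*pi) * (Phi(1) - Phi(0))
I_true = math.sqrt(2 * math.pi) * (NormalDist().cdf(1.0) - 0.5)

print(I_hat, I_true)  # both near 0.856
```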


SLIDE 12

Monte Carlo methods Definite integral

Monte Carlo sampling randomly infills

[Figure: f(x) = e^{−x²/2} evaluated at 20 samples (left) and 200 samples (right) drawn uniformly on (0, 1); the random samples progressively fill in the interval.]


SLIDE 13

Monte Carlo methods Definite integral

Strong law of large numbers

[Figure: Monte Carlo estimate of I versus the number of samples (up to 1000); the estimate settles near the truth, illustrating the SLLN.]


SLIDE 14

Monte Carlo methods Definite integral

Central limit theorem

[Figure: Monte Carlo estimate with a 95% confidence interval versus the number of samples; the interval narrows around the truth, illustrating the CLT.]


SLIDE 15

Monte Carlo methods Infinite bounds

Infinite bounds

Suppose θ ∼ N(0, 1) and you are interested in evaluating E[θ] = ∫_{−∞}^{∞} θ (1/√(2π)) e^{−θ²/2} dθ. Then set h(θ) = θ and g(θ) = φ(θ), i.e. θ ∼ N(0, 1), and approximate by a Monte Carlo estimate via

  • 1. For s = 1, …, S,
    • a. sample θ^(s) ~ind N(0, 1) and
    • b. calculate h(θ^(s)).
  • 2. Calculate E[θ] ≈ (1/S) Σ_{s=1}^S h(θ^(s)).
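A direct Python sketch of this recipe (the truth here is E[θ] = 0):

```python
import random

random.seed(7)
S = 100_000

# Step 1: theta^(s) ~ N(0, 1); h(theta) = theta
# Step 2: average the draws
estimate = sum(random.gauss(0, 1) for _ in range(S)) / S

print(estimate)  # near 0 (standard error is 1/sqrt(S), about 0.003)
```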


SLIDE 16

Monte Carlo methods Infinite bounds

Non-uniform sampling

[Figure: h(θ) = θ plotted at n = 20 (left) and n = 100 (right) sample locations drawn from N(0, 1); the samples concentrate where the density is high.]


SLIDE 17

Monte Carlo methods Infinite bounds

Monte Carlo estimate

[Figure: Monte Carlo estimate of E[θ] with a 95% confidence interval versus S; the estimate converges to the truth, 0.]


SLIDE 18

Monte Carlo methods Gridding

Monte Carlo approximation via gridding

Rather than determining c(y) and then E[θ|y] via deterministic gridding (all w_i equal), we can use the grid as a discrete approximation to the posterior, i.e.

p(θ|y) ≈ Σ_{i=1}^N p_i δ_{θ_i}(θ), with p_i = q(θ_i|y) / Σ_{j=1}^N q(θ_j|y),

where δ_{θ_i}(θ) is the Dirac delta function, i.e. δ_{θ_i}(θ) = 0 ∀ θ ≠ θ_i and ∫ δ_{θ_i}(θ) dθ = 1.

This discrete approximation to p(θ|y) can be used to approximate the expectation E[h(θ)|y] deterministically or via simulation, i.e.

E[h(θ)|y] ≈ Σ_{i=1}^N p_i h(θ_i) or E[h(θ)|y] ≈ (1/S) Σ_{s=1}^S h(θ^(s)),

where θ^(s) ~ind Σ_{i=1}^N p_i δ_{θ_i}(θ) (sampled with replacement).
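Both uses of the discrete approximation can be sketched in Python for the Normal-Cauchy example (a hedged translation; the grid mirrors the R code that follows, and `q` is the unnormalized posterior):

```python
import math, random

random.seed(3)

def q(theta, y):
    """Unnormalized Normal-Cauchy posterior."""
    return math.exp(-(y - theta)**2 / 2) / (1 + theta**2)

y = 1
N = 1001
grid = [-5 + 10 * i / (N - 1) + y for i in range(N)]   # grid points theta_i
wts = [q(t, y) for t in grid]
p = [w / sum(wts) for w in wts]                        # normalized weights p_i

# Deterministic: E[h(theta)|y] ~ sum_i p_i h(theta_i), with h(theta) = theta
post_mean = sum(pi * ti for pi, ti in zip(p, grid))

# Simulation: resample the grid with replacement
draws = random.choices(grid, weights=p, k=10_000)
mc_mean = sum(draws) / len(draws)

print(post_mean, mc_mean)  # both near 0.554
```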


SLIDE 19

Monte Carlo methods Gridding

Example: Normal-Cauchy model

y = 1 # Data

# Small number of grid locations
theta = seq(-5, 5, length = 1e2+1) + y; p = q(theta, y)/sum(q(theta, y)); sum(p*theta)
[1] 0.5542021
mean(sample(theta, prob = p, replace = TRUE))
[1] 0.6118812

# Large number of grid locations
theta = seq(-5, 5, length = 1e6+1) + y; p = q(theta, y)/sum(q(theta, y)); sum(p*theta)
[1] 0.5542021
mean(sample(theta, 1e2, prob = p, replace = TRUE)) # But small MC sample
[1] 0.598394

# Truth
post_expectation(1)
[1] 0.5542021

SLIDE 20

Monte Carlo methods Inverse CDF method

Inverse cumulative distribution function

Definition. The cumulative distribution function (cdf) of a random variable X is defined by F_X(x) = P_X(X ≤ x) for all x.

Lemma. Let X be a random variable whose cdf is F(x), and suppose you have access to the inverse cdf of X, i.e. u = F(x) ⟹ x = F⁻¹(u). If U ∼ Unif(0, 1), then X = F⁻¹(U) is a simulation from the distribution of X.


SLIDE 21

Monte Carlo methods Inverse CDF method

Inverse CDF

[Figure: the standard normal CDF, pnorm(x), plotted for x ∈ (−3, 3).]


SLIDE 22

Monte Carlo methods Inverse CDF method

Exponential example

For example, to sample X ∼ Exp(1),

  • 1. Sample U ∼ Unif(0, 1).
  • 2. Set X = − log(1 − U), or X = − log(U).

[Figure: histogram of samples X = −log(1 − U) overlaid with the Exp(1) density on (0, 7.5).]
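The exponential recipe translates directly to Python (a hedged sketch; the sample mean should be near E[X] = 1):

```python
import math, random

random.seed(11)
S = 100_000

# Inverse CDF: if U ~ Unif(0,1), then X = -log(1 - U) ~ Exp(1)
x = [-math.log(1 - random.random()) for _ in range(S)]

print(sum(x) / S)  # sample mean, close to 1
```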


SLIDE 23

Monte Carlo methods Inverse CDF method

Sampling from a univariate truncated distribution

Suppose you wish to sample from X ∼ N(µ, σ2)I(a < X < b), i.e. a normal random variable with untruncated mean µ and variance σ2, but truncated to the interval (a, b). Suppose the untruncated cdf is F and inverse cdf is F −1.

  • 1. Calculate endpoints pa = F(a) and pb = F(b).
  • 2. Sample U ∼ Unif(pa, pb).
  • 3. Set X = F −1(U).

This just avoids having to recalculate the normalizing constant for the pdf, i.e. 1/(F(b) − F(a)).
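The three steps above can be sketched with the Python standard library's `statistics.NormalDist` providing F and F⁻¹ (the function name `rtruncnorm` is illustrative, not from the slides):

```python
import random
from statistics import NormalDist

random.seed(5)

def rtruncnorm(mu, sigma, a, b):
    """One draw from N(mu, sigma^2) truncated to (a, b), via the inverse CDF."""
    F = NormalDist(mu, sigma)
    pa, pb = F.cdf(a), F.cdf(b)   # 1. endpoints p_a = F(a), p_b = F(b)
    u = random.uniform(pa, pb)    # 2. U ~ Unif(p_a, p_b)
    return F.inv_cdf(u)           # 3. X = F^{-1}(U)

draws = [rtruncnorm(5, 9, 1, 6) for _ in range(10_000)]
print(min(draws), max(draws))  # every draw lies in the interval (1, 6)
```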


SLIDE 24

Monte Carlo methods Inverse CDF method

Truncated normal

X ∼ N(5, 9²) I(1 ≤ X ≤ 6)

[Figure: the N(5, 9²) CDF, pnorm(x, mean, sd), plotted for x ∈ (−10, 20).]


SLIDE 25

Monte Carlo methods Inverse CDF method

Truncated normal

X ∼ N(5, 9²) I(1 ≤ X ≤ 6)

[Figure: density of samples from the truncated normal; all mass lies in (1, 6).]


SLIDE 26

Monte Carlo methods Rejection sampling

Rejection sampling

Suppose you wish to obtain samples θ ∼ p(θ|y). Rejection sampling performs the following:

  • 1. Sample a proposal θ* ∼ g(θ) and U ∼ Unif(0, 1).
  • 2. Accept θ = θ* as a draw from p(θ|y) if U ≤ p(θ*|y)/[M g(θ*)]; otherwise return to step 1.

Here M satisfies M g(θ) ≥ p(θ|y) for all θ. For a given proposal distribution g(θ), the optimal M is M = sup_θ p(θ|y)/g(θ). The probability of acceptance is 1/M. The accept-reject idea is to create an envelope, M g(θ), above p(θ|y).
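A minimal Python sketch of the algorithm, using a hypothetical example not from the slides: target Beta(2, 2) with proposal Unif(0, 1), so M = sup p/g = 1.5, attained at x = 0.5.

```python
import random

random.seed(9)

def target(x):
    """Beta(2, 2) density on (0, 1): p(x) = 6x(1 - x)."""
    return 6 * x * (1 - x)

M = 1.5  # sup_x target(x)/g(x), with g the Unif(0,1) density (= 1)

def rejection_sample():
    while True:
        theta = random.random()      # 1. theta* ~ g
        u = random.random()          #    and U ~ Unif(0, 1)
        if u <= target(theta) / M:   # 2. accept if U <= p(theta*)/(M g(theta*))
            return theta

draws = [rejection_sample() for _ in range(20_000)]
print(sum(draws) / len(draws))  # near E[X] = 0.5
```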


SLIDE 27

Monte Carlo methods Rejection sampling

Rejection sampling with unnormalized density

Suppose you wish to obtain samples θ ∼ p(θ|y) ∝ q(θ|y), where only the unnormalized density q is available. Rejection sampling performs the following:

  • 1. Sample a proposal θ* ∼ g(θ) and U ∼ Unif(0, 1).
  • 2. Accept θ = θ* as a draw from p(θ|y) if U ≤ q(θ*|y)/[M† g(θ*)]; otherwise return to step 1.

Here M† satisfies M† g(θ) ≥ q(θ|y) for all θ. For a given proposal distribution g(θ), the optimal M† is M† = sup_θ q(θ|y)/g(θ). The acceptance probability is 1/M = c(y)/M†. The accept-reject idea is to create an envelope, M† g(θ), above q(θ|y).


SLIDE 28

Monte Carlo methods Rejection sampling

Example: Normal-Cauchy model

If Y ∼ N(θ, 1) and θ ∼ Ca(0, 1), then p(θ|y) ∝ e^{−(y−θ)²/2} / (1 + θ²) for θ ∈ ℝ. Choose a N(y, 1) proposal distribution, i.e. g(θ) = (1/√(2π)) e^{−(θ−y)²/2}, with

M† = sup_θ q(θ|y)/g(θ) = sup_θ [e^{−(y−θ)²/2} / (1 + θ²)] / [(1/√(2π)) e^{−(θ−y)²/2}] = sup_θ √(2π)/(1 + θ²) = √(2π).

The acceptance rate is 1/M = c(y)/M† = 1.3056085/√(2π) = 0.5208624.
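This example can be sketched in Python (a hedged translation of the slide's setup); note that the acceptance ratio q(θ|y)/[M† g(θ)] simplifies to 1/(1 + θ²).

```python
import math, random

random.seed(13)
y = 1

def q(theta):
    """Unnormalized posterior q(theta|y) for the observed y."""
    return math.exp(-(y - theta)**2 / 2) / (1 + theta**2)

def g(theta):
    """N(y, 1) proposal density."""
    return math.exp(-(theta - y)**2 / 2) / math.sqrt(2 * math.pi)

M = math.sqrt(2 * math.pi)  # M_dagger = sup_theta q(theta|y)/g(theta)

accepted, n_proposals = [], 0
while len(accepted) < 10_000:
    theta = random.gauss(y, 1)  # proposal from N(y, 1)
    n_proposals += 1
    if random.random() <= q(theta) / (M * g(theta)):  # = 1/(1 + theta^2)
        accepted.append(theta)

print(len(accepted) / n_proposals)  # near the theoretical rate 0.5209
```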


SLIDE 29

Monte Carlo methods Rejection sampling

Example: Normal-Cauchy model

[Figure: proposed draws θ* plotted against u, with the envelope M† g(θ) and points colored by accept (TRUE) or reject (FALSE).]

Observed acceptance rate was 0.52.

SLIDE 30

Monte Carlo methods Rejection sampling

Heavy-tailed proposals

Suppose our target is a standard Cauchy and our (proposed) proposal is a standard normal. Then

p(θ|y)/g(θ) = [1/(π(1 + θ²))] / [(1/√(2π)) e^{−θ²/2}] → ∞ as θ → ∞,

since e^{−a} converges to zero faster than 1/(1 + a). Thus, there is no value M such that M g(θ) ≥ p(θ|y) for all θ. TL;DR: the condition M g(θ) ≥ p(θ|y) requires the proposal to have tails at least as thick (heavy) as the target.
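A quick numerical check of the unbounded ratio (Python sketch):

```python
import math

def ratio(theta):
    """Cauchy target density over standard normal proposal density."""
    cauchy = 1 / (math.pi * (1 + theta**2))
    normal = math.exp(-theta**2 / 2) / math.sqrt(2 * math.pi)
    return cauchy / normal

for t in [1, 5, 10, 20]:
    print(t, ratio(t))  # the ratio blows up as theta grows
```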
