Part 7 Bayesian hierarchical modelling, simulation and MCMC by - PowerPoint PPT Presentation

Factorization of the joint: Example m 1 m 2 t 1 t 2 omitting the fixed values in notation: the joint distribution β 1 β 2 f ( y 1 , . . . , y n , β 1 , β 2 , τ ) a b n x i � f ( y i | β 1 , β 2 , τ ) = f ( β 1 ) f ( β 2 ) f ( τ ) τ � �� i = 1 prior � �� likelihood y i is ∝ posterior f ( β 1 , β 2 , τ | y 1 , . . . , y n ) i = 1 , . . . , n ◮ it would be really useful to get posterior estimates based on the non-normalized density f ( y 1 , . . . , y n , β 1 , β 2 , τ ) ! 261

Bayesian Networks: Inference ◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior ( ◮ next block!) 262

Bayesian Networks: Inference ◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior ( ◮ next block!) ◮ can get any distributions for any (set of) variables in the graph by conditioning and marginalizing of the joint 262

Bayesian Networks: Inference ◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior ( ◮ next block!) ◮ can get any distributions for any (set of) variables in the graph by conditioning and marginalizing of the joint ◮ for a set of M samples from the joint, { β m 1 , β m 2 , τ m } , m = 1 , . . . , M , 262

Bayesian Networks: Inference ◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior ( ◮ next block!) ◮ can get any distributions for any (set of) variables in the graph by conditioning and marginalizing of the joint ◮ for a set of M samples from the joint, { β m 1 , β m 2 , τ m } , m = 1 , . . . , M , ◮ marginalizing = use only, e.g., { β m 1 } , m = 1 , . . . , M 262

Bayesian Networks: Inference ◮ Markov Chain Monte Carlo: simulate samples from the joint ∝ posterior ( ◮ next block!) ◮ can get any distributions for any (set of) variables in the graph by conditioning and marginalizing of the joint ◮ for a set of M samples from the joint, { β m 1 , β m 2 , τ m } , m = 1 , . . . , M , ◮ marginalizing = use only, e.g., { β m 1 } , m = 1 , . . . , M ◮ conditioning = use only samples m with the right value of the conditioning parameter(s) (or redo the sampling with fixed conditioned values) 262

Bayesian Networks with Imprecise Probability ◮ use sets of conditional distributions at nodes: credal networks (see, e.g., [2, §10], [17]) 263

Bayesian Networks with Imprecise Probability ◮ use sets of conditional distributions at nodes: credal networks (see, e.g., [2, §10], [17]) ◮ specific algorithms for discrete credal networks (see, e.g., [2, §10.5.3], or [14]) 263

Bayesian Networks with Imprecise Probability ◮ use sets of conditional distributions at nodes: credal networks (see, e.g., [2, §10], [17]) ◮ specific algorithms for discrete credal networks (see, e.g., [2, §10.5.3], or [14]) ◮ conditional independence with IP gets very non-trivial (see, e.g., [2, §4] for the gory details) 263

Bayesian Networks with Imprecise Probability ◮ use sets of conditional distributions at nodes: credal networks (see, e.g., [2, §10], [17]) ◮ specific algorithms for discrete credal networks (see, e.g., [2, §10.5.3], or [14]) ◮ conditional independence with IP gets very non-trivial (see, e.g., [2, §4] for the gory details) ◮ here: do sensitivity analysis by varying prior distributions in sets: f ( β 1 ) ∈ M β 1 , . . . 263

Other Graph-Based Methods: SEM, Path Analysis Value PERV1 CUSL1 CUSA3 PERV2 Loyalty CUSL2 CUEX2 PERQ4 CUSA1 Satisfaction CUSL3 PERQ5 CUSA2 CUEX3 PERQ6 Complaints CUSCO Expectation Quality PERQ7 CUEX1 PERQ1 PERQ2 PERQ3 IMAG1 Image IMAG2 IMAG3 IMAG4 IMAG5 264

Value PERV1 CUSL1 CUSA3 PERV2 Loyalty CUSL2 CUEX2 PERQ4 CUSA1 Satisfaction CUSL3 PERQ5 CUSA2 CUEX3 PERQ6 Complaints CUSCO Expectation Quality PERQ7 CUEX1 PERQ1 PERQ2 PERQ3 IMAG1 Image IMAG2 IMAG3 IMAG4 IMAG5 Other Graph-Based Methods: SEM, Path Analysis 264

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) ◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyality is a function of customer satisfaction (structural model) 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) ◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyality is a function of customer satisfaction (structural model) ◮ estimation of factor loadings ( = regression coefficients) 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) ◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyality is a function of customer satisfaction (structural model) ◮ estimation of factor loadings ( = regression coefficients) ◮ likelihood-based ( R package lavaan ): models expectations and (co)variances, not full distributions ( → multivariate normal) 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) ◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyality is a function of customer satisfaction (structural model) ◮ estimation of factor loadings ( = regression coefficients) ◮ likelihood-based ( R package lavaan ): models expectations and (co)variances, not full distributions ( → multivariate normal) ◮ Bayesian SEM: ( R package blavaan ) 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) ◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyality is a function of customer satisfaction (structural model) ◮ estimation of factor loadings ( = regression coefficients) ◮ likelihood-based ( R package lavaan ): models expectations and (co)variances, not full distributions ( → multivariate normal) ◮ Bayesian SEM: ( R package blavaan ) ◮ partial least squares ( R package semPLS ): iterative fitting of latent variable values and regression coefficients via least squares 265

Other Graph-Based Methods: SEM, Path Analysis ◮ Structural Equation Modeling (SEM, a.k.a. path modeling) uses graphs like BNs, but is something different ◮ used to estimate latent constructs by assuming linear relationships with measurements ( measurement / outer model ) and relationships between latent constructs ( structural model ) ◮ example: customer satisfaction is measured by survey questions 1 and 2 (measurement model); brand loyality is a function of customer satisfaction (structural model) ◮ estimation of factor loadings ( = regression coefficients) ◮ likelihood-based ( R package lavaan ): models expectations and (co)variances, not full distributions ( → multivariate normal) ◮ Bayesian SEM: ( R package blavaan ) ◮ partial least squares ( R package semPLS ): iterative fitting of latent variable values and regression coefficients via least squares ◮ path analysis: special case where a measurement can be linked to only one construct 265

Bayesian hierarchical modelling, simulation and MCMC Outline Bayesian hierarchical modelling / Bayesian networks / graphical models Exercises I Simulation & MCMC Exercises II 266

Exercise 1: Factorization of a Joint x 1 x 2 x 3 Which factorization of � � f { x i } i ∈ [ 1 ,..., 7 ] does this graph encode? x 4 x 5 x 6 x 7 267

Exercise 2: Naive Bayes Classifier The naive Bayes classifier from Part 6 assumes that the joint distribution of class c and attributes a 1 , . . . , a k can be factorized as k � p ( c , a ) = p ( c ) p ( a | c ) = p ( c ) p ( a i | c ) . i = 1 Draw the corresponding DAG! (Hint: use either a plate or consider two attributes a 1 and a 2 only.) 268

Exercise 3: Naive Bayes Classifier with Dirichlet Priors We can introduce parameters for p ( c ) and p ( a i | c ) : ( n ( c )) c ∈C ∼ Multinomal ( θ c ; c ∈ C ) (36) ∀ c ∈ C : ( n ( a i , c )) a i ∈A i | c ∼ Multinomal ( θ a i | c ; a i ∈ A i ) (37) where C denotes the set of all possible class values, and A i denotes the set of all possible values of attribute i . The θ parameters can be estimated using a Dirichlet prior: ( θ c ) c ∈C ∼ Dir ( s , ( t ( c )) c ∈C ) (38) ∀ c ∈ C : ( θ a i | c ) a i ∈A i | c ∼ Dir ( s , ( t ( a i , c )) a i ∈A i ) (39) where we must have that � a i ∈A i t ( a i , c ) = t ( c ) . [Note that t ( c ) is the prior expectation of θ c and t ( a i , c ) / t ( c ) is the prior expectation of θ a i | c .] Draw the corresponding graph! 269

Exercise 4: Sensitivity Analysis m 1 m 2 t 1 t 2 β 1 β 2 In the linear regression example there are 6 hyperparameters m 1 , t 1 , m 2 , t 2 , a , b . a b x i How would you do sensitivity analysis τ over the prior in that example? What problems do you foresee? y i i = 1 , . . . , n 270

Bayesian hierarchical modelling, simulation and MCMC Outline Bayesian hierarchical modelling / Bayesian networks / graphical models Exercises I Simulation & MCMC Exercises II 271

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? ◮ simulate samples from joint / posterior: approximate . . . 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? ◮ simulate samples from joint / posterior: approximate . . . ◮ . . . posterior cdf by empirical cdf (density: kernel dens. est.) 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? ◮ simulate samples from joint / posterior: approximate . . . ◮ . . . posterior cdf by empirical cdf (density: kernel dens. est.) ◮ . . . posterior expectation by sample mean 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? ◮ simulate samples from joint / posterior: approximate . . . ◮ . . . posterior cdf by empirical cdf (density: kernel dens. est.) ◮ . . . posterior expectation by sample mean ◮ . . . any function of posterior parameters by sample equivalent 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? ◮ simulate samples from joint / posterior: approximate . . . ◮ . . . posterior cdf by empirical cdf (density: kernel dens. est.) ◮ . . . posterior expectation by sample mean ◮ . . . any function of posterior parameters by sample equivalent ◮ first: quick look at sampling from univariate distributions 272

Simulation & Markov Chain Monte Carlo: What? Why? ◮ BNs allow us to formulate complex models ◮ complex variance structures, . . . ◮ joint ∝ posterior usually intractable: how to do inference? ◮ simulate samples from joint / posterior: approximate . . . ◮ . . . posterior cdf by empirical cdf (density: kernel dens. est.) ◮ . . . posterior expectation by sample mean ◮ . . . any function of posterior parameters by sample equivalent ◮ first: quick look at sampling from univariate distributions ◮ then: MCMC for sampling from multivariate distributions 272

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) M E ( g ( X )) = 1 � � ◮ estimate E ( g ( X )) by g ( x i ) M i = 1 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) M E ( g ( X )) = 1 � � ◮ estimate E ( g ( X )) by g ( x i ) M i = 1 � � � ◮ unbiased: E E ( g ( X )) = E ( g ( X )) 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) M E ( g ( X )) = 1 � � ◮ estimate E ( g ( X )) by g ( x i ) M i = 1 � � � ◮ unbiased: E E ( g ( X )) = E ( g ( X )) � � = 1 � ◮ variance: Var E ( g ( X )) M Var ( g ( X )) for independent samples only! 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) M E ( g ( X )) = 1 � � ◮ estimate E ( g ( X )) by g ( x i ) M i = 1 � � � ◮ unbiased: E E ( g ( X )) = E ( g ( X )) � � = 1 � ◮ variance: Var E ( g ( X )) M Var ( g ( X )) for independent samples only! ◮ precision of MC estimate increases with M , independent of parameter dimension! (numeric integration: number of evaluation points increases exponentially with dimension) 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) M E ( g ( X )) = 1 � � ◮ estimate E ( g ( X )) by g ( x i ) M i = 1 � � � ◮ unbiased: E E ( g ( X )) = E ( g ( X )) � � = 1 � ◮ variance: Var E ( g ( X )) M Var ( g ( X )) for independent samples only! ◮ precision of MC estimate increases with M , independent of parameter dimension! (numeric integration: number of evaluation points increases exponentially with dimension) a.s. � − − → E ( g ( X )) (strong law of large numbers) lim E ( g ( X )) ◮ M →∞ 273

Monte Carlo Estimation: Why does it work? � ◮ want to estimate E ( g ( X )) = g ( x ) f ( x | . . . ) dx ◮ Monte Carlo sample x 1 , . . . , x M ( M samples drawn from f ( x | ... ) ) M E ( g ( X )) = 1 � � ◮ estimate E ( g ( X )) by g ( x i ) M i = 1 � � � ◮ unbiased: E E ( g ( X )) = E ( g ( X )) � � = 1 � ◮ variance: Var E ( g ( X )) M Var ( g ( X )) for independent samples only! ◮ precision of MC estimate increases with M , independent of parameter dimension! (numeric integration: number of evaluation points increases exponentially with dimension) a.s. � − − → E ( g ( X )) (strong law of large numbers) lim E ( g ( X )) ◮ M →∞ � � E ( g ( X )) a.s. � E ( g ( X )) , 1 ∼ N M Var ( g ( X )) (central limit thm) ◮ 273

Simulation & MCMC: Univariate Sampling ◮ assumption for all sampling algorithms: we can sample from the uniform U ([ 0 , 1 ]) ◮ done by pseudo-random number generator (PNRG), in R : ?RNG 274

Simulation & MCMC: Univariate Sampling ◮ assumption for all sampling algorithms: we can sample from the uniform U ([ 0 , 1 ]) ◮ done by pseudo-random number generator (PNRG), in R : ?RNG F ( a ) 1 u 0 a a 1 a 2 a 3 a 4 a 5 274

Simulation & MCMC: Univariate Sampling ◮ assumption for all sampling algorithms: we can sample from the uniform U ([ 0 , 1 ]) ◮ done by pseudo-random number generator (PNRG), in R : ?RNG F ( x ) 1 u 0 x 274

Simulation & MCMC: Univariate Sampling ◮ assumption for all sampling algorithms: we can sample from the uniform U ([ 0 , 1 ]) ◮ done by pseudo-random number generator (PNRG), in R : ?RNG ◮ does not work F ( x ) well in 1 dimensions > 1 u 0 x 274

Simulation & MCMC: Univariate Sampling ◮ assumption for all sampling algorithms: we can sample from the uniform U ([ 0 , 1 ]) ◮ done by pseudo-random number generator (PNRG), in R : ?RNG ◮ does not work F ( x ) well in 1 dimensions > 1 ◮ needs F − 1 ( · ) u 0 x 274

Simulation & MCMC: Univariate Sampling ◮ assumption for all sampling algorithms: we can sample from the uniform U ([ 0 , 1 ]) ◮ done by pseudo-random number generator (PNRG), in R : ?RNG ◮ does not work F ( x ) well in 1 dimensions > 1 ◮ needs F − 1 ( · ) u ◮ needs normalization factor ◮ rejection sampling 0 x 274

Simulation & MCMC: Rejection Sampling kq ( z ) kq ( z 0 ) p ( z ) ∝ target density ˜ q ( z ) proposal density p ( z ) � u 0 z 0 z 275

Simulation & MCMC: Rejection Sampling kq ( z ) kq ( z 0 ) p ( z ) ∝ target density ˜ q ( z ) proposal density p ( z ) � u 0 z 0 z 1. sample z ( ↔ ) from q ( z ) 275

Simulation & MCMC: Rejection Sampling kq ( z ) kq ( z 0 ) p ( z ) ∝ target density ˜ q ( z ) proposal density p ( z ) � u 0 z 0 z 1. sample z ( ↔ ) from q ( z ) 2. sample u ( � ) from U ([ 0 , kq ( z )]) 275

Simulation & MCMC: Rejection Sampling kq ( z ) kq ( z 0 ) p ( z ) ∝ target density ˜ q ( z ) proposal density p ( z ) � u 0 z 0 z 1. sample z ( ↔ ) from q ( z ) sample points uniformly from union of white and grey areas 2. sample u ( � ) from U ([ 0 , kq ( z )]) 275

Simulation & MCMC: Rejection Sampling kq ( z ) kq ( z 0 ) p ( z ) ∝ target density ˜ q ( z ) proposal density p ( z ) � u 0 z 0 z 1. sample z ( ↔ ) from q ( z ) sample points uniformly from union of white and grey areas 2. sample u ( � ) from U ([ 0 , kq ( z )]) 3. reject all points in the grey area 275

Simulation & MCMC: Rejection Sampling kq ( z ) kq ( z 0 ) p ( z ) ∝ target density ˜ q ( z ) proposal density p ( z ) � u 0 z 0 z 1. sample z ( ↔ ) from q ( z ) sample points uniformly from union of white and grey areas 2. sample u ( � ) from U ([ 0 , kq ( z )]) 3. reject all points in the grey area 4. forget about u : z distributed ∝ ˜ p ( z ) ! 275

Markov Chain Monte Carlo: General Idea ◮ need to sample from high-dimensional distributions 276

Markov Chain Monte Carlo: General Idea ◮ need to sample from high-dimensional distributions ◮ idea: produce samples by a Markov Chain: random walk over parameter space 276

Markov Chain Monte Carlo: General Idea ◮ need to sample from high-dimensional distributions ◮ idea: produce samples by a Markov Chain: random walk over parameter space ◮ random walk spends more time in high-probability regions 276

Markov Chain Monte Carlo: General Idea ◮ need to sample from high-dimensional distributions ◮ idea: produce samples by a Markov Chain: random walk over parameter space ◮ random walk spends more time in high-probability regions ◮ if in each step we move in one dimension only: need to sample from one-dimensional distribution only, can use previous algorithms for that! 276

Markov Chain Monte Carlo: General Idea ◮ need to sample from high-dimensional distributions ◮ idea: produce samples by a Markov Chain: random walk over parameter space ◮ random walk spends more time in high-probability regions ◮ if in each step we move in one dimension only: need to sample from one-dimensional distribution only, can use previous algorithms for that! ◮ but: samples are not independent! 276

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) ◮ accept step with certain probability (tailored to make chain approach the target distribution) 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) ◮ accept step with certain probability (tailored to make chain approach the target distribution) ◮ Stan uses an improved variant called Hamiltonian MH 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) ◮ accept step with certain probability (tailored to make chain approach the target distribution) ◮ Stan uses an improved variant called Hamiltonian MH ◮ Gibbs sampler: 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) ◮ accept step with certain probability (tailored to make chain approach the target distribution) ◮ Stan uses an improved variant called Hamiltonian MH ◮ Gibbs sampler: ◮ loop over parameter vector ( θ 1 , θ 2 , . . . ) 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) ◮ accept step with certain probability (tailored to make chain approach the target distribution) ◮ Stan uses an improved variant called Hamiltonian MH ◮ Gibbs sampler: ◮ loop over parameter vector ( θ 1 , θ 2 , . . . ) ◮ draw from the full conditionals f ( θ i | everything else ) ∝ joint 277

Markov Chain Monte Carlo: Algorithms ◮ Metropolis-Hastings: ◮ propose a step (draw from easy-to-sample-from proposal distribution ) ◮ accept step with certain probability (tailored to make chain approach the target distribution) ◮ Stan uses an improved variant called Hamiltonian MH ◮ Gibbs sampler: ◮ loop over parameter vector ( θ 1 , θ 2 , . . . ) ◮ draw from the full conditionals f ( θ i | everything else ) ∝ joint ◮ special case of MH where proposals are always accepted 277

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary, ◮ irreducible and ◮ aperiodic Markov chain which has the joint as its limiting (invariant) distribution 278

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary, ◮ irreducible and . . . X T − 1 X 1 X 2 X T ◮ aperiodic Markov chain which has the joint as its limiting (invariant) distribution 278

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary , f ( x 2 | x 1 ) = · · · = f ( x n | x n − 1 ) ◮ irreducible and . . . X T − 1 X 1 X 2 X T ◮ aperiodic Markov chain which has the joint as its limiting (invariant) distribution 278

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary, ◮ irreducible and 0 . 4 1 2 ◮ aperiodic 0 . 5 Markov chain which has the joint as its 0 . 2 0 . 6 0 . 7 0 . 5 limiting (invariant) distribution 0 . 8 3 4 0 . 3 278

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary, ◮ irreducible and 0 . 4 0 . 1 1 2 5 ◮ aperiodic 0 . 5 Markov chain 1 which has the joint as its 0 . 2 0 . 6 0 . 7 0 . 5 limiting (invariant) distribution 0 . 8 3 4 0 . 3 278

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary, ◮ irreducible and 0 . 4 1 2 ◮ aperiodic 0 . 5 Markov chain which has the joint as its 0 . 2 0 . 6 0 . 7 0 . 5 limiting (invariant) distribution 0 . 8 3 4 0 . 3 278

Markov Chain Monte Carlo: Why does this work? Algorithms create a ◮ stationary, 1 1 ◮ irreducible and 0 . 4 1 2 ◮ aperiodic 0 . 5 Markov chain which has the joint as its 0 . 2 0 . 6 0 . 7 0 . 5 limiting (invariant) distribution 0 . 8 3 4 0 . 3 1 1 278

Part 7 Bayesian hierarchical modelling, simulation and MCMC by - PowerPoint PPT Presentation

Wednesday 14:00-17:30 Part 7 Bayesian hierarchical modelling, simulation and MCMC by Gero Walter 252 Bayesian hierarchical modelling, simulation and MCMC Outline Bayesian hierarchical modelling / Bayesian networks / graphical models

Testing MCMC Samplers Jason M.T. Roos First European Bayesian Summit in Marketing Testing MCMC

The Modelling and Simulation Process 1. History of Modelling and Simulation 2. Modelling and

Bayesian hierarchical models Bruno Nicenboim / Shravan Vasishth 2020-03-14 1 Bayesian

An MCMC library for probabilistic programming Rob Zinkov June 13th, 2014 Rob Zinkov An MCMC

Additional notes on MCMC sampling Shravan Vasishth March 18, 2020 For more details on MCMC, some

Why Bayesian methods in Simulation? Simulation Simulation Model Inputs BAYESIAN IDEAS

Bayesian hierarchical models in Stata Nikolay Balov StataCorp LP 2016 Stata Conference Nikolay

Hierarchical Multidimensional Modelling Hierarchical Multidimensional Modelling in the Concept-

Outline Intro to RL and Bayesian Learning History of Bayesian RL Model-based Bayesian

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Bayesian Zig Zag Developing probabilistic models using grid methods and MCMC Allen Downey ACM

Bayesian method probabilities Application of Bayesian methods Demo: McRobot (P . Lewis)

Introduction to MCMC and BUGS Basic recipes, and a sample of some techniques for getting

Parallel tempering and Interacting MCMC algorithms Gersende FORT / Eric MOULINES Telecom Paris

MCMC and Variational Inference for AutoEncoders Achille Thin 1 , Alain Durmus 2 , Eric Moulines 1 1

STAT 339 Markov Chain Monte Carlo (MCMC) 7 April 2017 Some theory and intuition about MCMC

PageRank Google's PageRank algorithm. [Sergey Brin and Larry Page, 1998] Measure

Y TP YUKAWA INSTITUTE FOR THEORETICAL PHYSICS 1/21 Motivation Introduction Auxiliary Field

A Fast Estimation of SRAM Failure Rate Using Probability Collectives Fang Gong Electrical

Computational Statistical Modeling of Dynamic Socioeconomic, Geopolitical and Financial Systems NYU

Convergence of Random Processes DS GA 1002 Probability and Statistics for Data Science

Efficient space-filling and non-collapsing sequential design strategies for simulation-based

Monte Carlo Simulation for Pricing European and American Basket option Giuseppe Bruno Bank of

Simulation Simulation CHAPTER 1 INTRODUCTION TO SIMULATION 2 MODELING CHAPTER 1 INTRODUCTION