Basics of Bayesian Inference


SLIDES 1–5

Basics of Bayesian Inference

• A frequentist thinks of unknown parameters as fixed
• A Bayesian thinks of parameters as random, and thus coming from distributions (just like the data)
• A Bayesian writes down a prior distribution for θ and combines it with the likelihood for the observed data Y to obtain the posterior distribution of θ. All statistical inferences then follow from summarizing the posterior.
• This approach expands the class of candidate models and facilitates hierarchical modeling, where it is important to properly account for various sources of uncertainty (e.g., spatial vs. nonspatial heterogeneity)
• The classical (frequentist) approach to inference leads to awkward interpretations and struggles to account properly for these sources of uncertainty

Basics of Bayesian Inference – p. 1

SLIDE 6

Basics of Bayesian Inference

• In the simplest form, we start with a model/distribution for the data given unknowns (parameters), f(y|θ)
• Since the data are observed (hence known) while θ is not, we can equivalently view this as a function of θ given y, and call it the likelihood, L(θ; y)
• We write the prior distribution for θ as π(θ)
• The joint model for the data and parameters is then f(y|θ)π(θ)
• Conditioning in the opposite direction, the same joint model factors as

π(θ|y) m(y) .

• The first term is the posterior distribution of θ; the second is the marginal distribution of the data
• We see that π(θ|y) ∝ f(y|θ)π(θ), with m(y) the normalizing constant
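To make the proportionality concrete, here is a minimal R sketch (not from the slides) that builds a posterior on a grid; the Beta(2, 2) prior and the data y = 7 out of n = 10 are hypothetical choices:

## Grid approximation of p(theta | y): likelihood times prior, normalized;
## dividing by the grid sum is a numerical stand-in for dividing by m(y)
theta <- seq(0.001, 0.999, length.out = 999)  # grid over (0, 1), step 0.001
prior <- dbeta(theta, 2, 2)                   # pi(theta), hypothetical prior
like  <- dbinom(7, size = 10, prob = theta)   # f(y | theta) at y = 7
post  <- prior * like                         # unnormalized posterior
post  <- post / (sum(post) * 0.001)           # grid estimate of 1/m(y)
plot(theta, post, type = "l")                 # posterior density for theta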

Basics of Bayesian Inference – p. 2

SLIDE 7

Basics of Bayesian Inference

• More generally, we would have a prior distribution π(θ|λ), where λ is a vector of hyperparameters
• In fact, we can think of θ even more generally as the “process” of interest, with some parts known and some parts unknown. Then we can write the joint model as

f(y | process, θ) f(process | θ, λ) π(θ|λ) π(λ) ,

a hierarchical specification
• If λ is known, the posterior distribution for θ is given by

p(θ | y, λ) = p(y, θ | λ) / p(y | λ)
            = f(y|θ) π(θ|λ) / ∫ f(y|θ) π(θ|λ) dθ
            = f(y|θ) π(θ|λ) / m(y | λ) .

Basics of Bayesian Inference – p. 3

SLIDE 8

Basics of Bayesian Inference

• Since λ will not be known, a second-stage (hyperprior) distribution h(λ) will be required, so that

p(θ|y) = p(y, θ) / p(y) = ∫ f(y|θ) π(θ|λ) h(λ) dλ / ∫∫ f(y|θ) π(θ|λ) h(λ) dθ dλ .

• Alternatively, we might replace λ in p(θ | y, λ) by an estimate λ̂; this is called empirical Bayes analysis
• Comparing the posterior p(θ|y) with the prior π(θ) shows how the data have updated our beliefs; this change is referred to as Bayesian learning

Basics of Bayesian Inference – p. 4

SLIDES 9–11

Illustration of Bayes’ Theorem

• Suppose f(y|θ) = N(y|θ, σ²), with θ ∈ ℜ and σ > 0 known
• If we take π(θ|λ) = N(θ|µ, τ²), where λ = (µ, τ)′ is fixed and known, then it is easy to show that

p(θ|y) = N( θ | (σ²/(σ² + τ²)) µ + (τ²/(σ² + τ²)) y , σ²τ²/(σ² + τ²) ) .

• Note that the posterior mean E(θ|y) is a weighted average of the prior mean µ and the data value y, with weights depending on our relative uncertainty
• The posterior precision (reciprocal of the variance) is 1/σ² + 1/τ², the sum of the likelihood and prior precisions

Basics of Bayesian Inference – p. 5
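A quick R check of this update, as a sketch; the numerical values (µ = 2, τ = 1, y = 6, σ = 1) anticipate the example on the next slide:

## Normal-normal update: posterior mean is a precision-weighted average
mu <- 2; tau <- 1       # prior N(mu, tau^2)
y <- 6; sigma <- 1      # single observation from N(theta, sigma^2)
w <- tau^2 / (sigma^2 + tau^2)                           # weight on the data y
post_mean <- (sigma^2 / (sigma^2 + tau^2)) * mu + w * y  # = 4 here
post_var  <- sigma^2 * tau^2 / (sigma^2 + tau^2)         # = 0.5
1 / post_var == 1 / sigma^2 + 1 / tau^2                  # precisions add: TRUE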

SLIDES 12–14

Illustration (continued)

• As a concrete example, let µ = 2, τ = 1, ȳ = 6, and σ = 1:

[Figure: densities over θ of the prior, the posterior with n = 1, and the posterior with n = 10]

• When n = 1, prior and likelihood receive equal weight
• When n = 10, the data begin to dominate the prior
• The posterior variance goes to zero as n → ∞

Basics of Bayesian Inference – p. 6
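A hedged R sketch that reproduces a figure like the one above, assuming the standard normal-normal update for n observations with sample mean ȳ:

## Prior vs. posteriors for n = 1 and n = 10 in the normal-normal model;
## posterior precision is n/sigma^2 + 1/tau^2
mu <- 2; tau <- 1; ybar <- 6; sigma <- 1
post <- function(n) {
  prec <- n / sigma^2 + 1 / tau^2                 # posterior precision
  m <- (n / sigma^2 * ybar + mu / tau^2) / prec   # posterior mean
  c(mean = m, sd = sqrt(1 / prec))
}
theta <- seq(-2, 9, length.out = 400)
plot(theta, dnorm(theta, mu, tau), type = "l", ylim = c(0, 1.4),
     xlab = expression(theta), ylab = "density")  # prior
p1 <- post(1)
lines(theta, dnorm(theta, p1["mean"], p1["sd"]), lty = 2)
p10 <- post(10)
lines(theta, dnorm(theta, p10["mean"], p10["sd"]), lty = 3)
legend("topleft", lty = 1:3,
       legend = c("prior", "posterior, n = 1", "posterior, n = 10"))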

SLIDE 15

Notes on priors

• The prior here is conjugate: it leads to a posterior distribution for θ that is a member of the same distributional family as the prior
• Setting τ² = ∞ corresponds to an arbitrarily vague (or noninformative) prior. The posterior is then

p(θ|y) = N( θ | ȳ, σ²/n ) ,

the same as the likelihood!
• The limit of the conjugate (normal) prior here is a uniform (or “flat”) prior, and thus the posterior is the normalized likelihood
• The flat prior is improper, since ∫ p(θ) dθ = +∞. However, as long as the posterior is integrable, i.e.,

∫_Θ f(y|θ) π(θ) dθ < ∞ ,

an improper prior can be used!

Basics of Bayesian Inference – p. 7

SLIDE 16

A linear model example

• Let Y be an n × 1 data vector, X an n × p matrix of covariates, and adopt the likelihood and prior structure

Y|β ∼ Nn(Xβ, Σ) and β ∼ Np(Aα, V ) .

• Then the posterior distribution of β|Y is

β|Y ∼ N(Dd, D) , where D⁻¹ = XᵀΣ⁻¹X + V⁻¹ and d = XᵀΣ⁻¹Y + V⁻¹Aα .

• V⁻¹ = 0 delivers a “flat” prior; if additionally Σ = σ²In, we get

β|Y ∼ N( β̂ , σ²(XᵀX)⁻¹ ) , where β̂ = (XᵀX)⁻¹Xᵀy ⟺ the usual likelihood analysis!
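A minimal R sketch of this posterior computation on simulated data; the prior settings (A = I, α = 0, V = 100 I) are hypothetical:

## Conjugate linear-model posterior beta | Y ~ N(Dd, D)
set.seed(1)
n <- 50; p <- 2
X <- cbind(1, rnorm(n))                    # n x p design matrix
y <- X %*% c(1, 2) + rnorm(n)              # simulated data, Sigma = I_n
Sigma_inv <- diag(n)
A <- diag(p); alpha <- rep(0, p); V_inv <- diag(p) / 100  # vague prior
D <- solve(t(X) %*% Sigma_inv %*% X + V_inv)              # posterior covariance
d <- t(X) %*% Sigma_inv %*% y + V_inv %*% A %*% alpha     # as on the slide
drop(D %*% d)                              # posterior mean; close to (1, 2)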

Basics of Bayesian Inference – p. 8

SLIDE 17

More on priors

How do we choose priors?
• Prior robustness; sensitivity to the prior
• Informative vs. noninformative
• Dangers with improper priors; appealing, but...
• There is always some prior information
• Prior elicitation
• Priors based upon previous experiments (previous posteriors can be current priors)
• Hyperpriors?

Basics of Bayesian Inference – p. 9

SLIDE 18

More on priors

Back to conjugacy:
• Y|µ ∼ N(µ, σ²), µ ∼ N(µ0, τ²): then marginally Y ∼ Normal, and conditionally µ|y ∼ Normal
• For vectors, Y|µ ∼ N(µ, Σ), µ ∼ N(µ0, V ): then marginally Y ∼ Normal, and conditionally µ|y ∼ Normal
• For variances, with Y|µ ∼ N(µ, σ²): if σ² ∼ IG(a, b), then σ²|y ∼ IG (with updated parameters)
• Never use IG(ε, ε) for small ε: it is almost improper and, with variance components, leads to almost improper posteriors
• A similar result holds for Σ, but with inverse Wishart distributions
• Other conjugacies: Poisson with Gamma; Binomial with Beta
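A small R check of the Binomial-with-Beta conjugacy just mentioned (the numbers are hypothetical): the grid-normalized product of prior and likelihood matches the conjugate Beta posterior:

## Beta(a, b) prior with y successes in n trials gives Beta(a + y, b + n - y)
a <- 2; b <- 2; n <- 10; y <- 7
theta <- seq(0.001, 0.999, by = 0.001)
exact <- dbeta(theta, a + y, b + n - y)             # conjugate posterior
grid  <- dbeta(theta, a, b) * dbinom(y, n, theta)   # prior x likelihood
grid  <- grid / (sum(grid) * 0.001)                 # normalize on the grid
max(abs(exact - grid))                              # near zero (grid error only)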

Basics of Bayesian Inference – p. 10

SLIDE 19

Bayesian updating

• Often referred to as “crossing bridges as you come to them”
• Simplifies sequential data collection
• Simplest version: Y1, Y2 independent given θ. The joint model is

p(y2|θ) p(y1|θ) π(θ) ∝ p(y2|θ) π(θ|y1) ,

i.e., Y1 updates π(θ) to π(θ|y1) before Y2 arrives
• Works for more than two updates, for updating in blocks, and for dependent as well as independent data
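A tiny R sketch of the updating idea, using two hypothetical binomial batches with a Beta prior; updating batch by batch gives the same posterior as updating once with all the data:

## Sequential Bayesian updating: yesterday's posterior is today's prior
a <- 1; b <- 1                    # uniform prior on theta
y1 <- 3; n1 <- 5                  # first batch: 3 successes in 5 trials
y2 <- 4; n2 <- 5                  # second batch
a1 <- a + y1; b1 <- b + n1 - y1   # posterior after y1 = prior for y2
a2 <- a1 + y2; b2 <- b1 + n2 - y2 # posterior after both batches
c(a2, b2)                         # same as one update with 7 of 10 overall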

Basics of Bayesian Inference – p. 11

SLIDE 20

CIHM

• Conditionally independent hierarchical model:

Πi p(yi|θi) Πi p(θi|η) π(η)

• If η were known, the model would not be interesting: a separate model for each i
• So, take η unknown: lots of learning about η, not as much about each θi
• The model implies the θi are exchangeable; learning about the θi takes the form of shrinkage (see the sketch below)
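As a hedged illustration of shrinkage, here is an R sketch with hypothetical data, treating the hyperparameter τ² as known and estimating η crudely from the data (a simplification of the full hierarchical analysis):

## y_i | theta_i ~ N(theta_i, 1), theta_i | eta ~ N(eta, tau^2):
## posterior means pull each y_i toward the estimated eta
set.seed(2)
tau2 <- 0.5
theta <- rnorm(8, 3, sqrt(tau2))            # unit-level means
y <- rnorm(8, theta, 1)                     # one observation per unit
eta_hat <- mean(y)                          # crude estimate of eta
B <- 1 / (1 + tau2)                         # shrinkage factor toward eta
shrunk <- B * eta_hat + (1 - B) * y         # approximate E(theta_i | y)
round(rbind(y = y, shrunk = shrunk), 2)     # shrunk values are less spread out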

Basics of Bayesian Inference – p. 12

SLIDE 21

Principles of Bayesian inference

• Usual issues: point and interval estimation, hypothesis testing ⇔ model comparison, prediction
• In fact, with a posterior distribution we can do more general inference, including probability statements about the unknowns
• For parameters, we use the posterior directly
• For prediction, the posterior predictive distribution
• For model comparison, it depends upon the utility for the model: with primary interest in explanation we would use Bayes factors; with primary interest in prediction we would use cross-validation
• Model adequacy is difficult. Sometimes the marginal density ordinate m(yobs) is used, but it is hard to calibrate. So, typically, we use the usual EDA tools with cross-validation, using f(yi|y−i)

Basics of Bayesian Inference – p. 13

SLIDE 22

Bayesian estimation

• Point estimation: choose an appropriate measure of centrality: the posterior mean, median, or mode
• Interval estimation: equal-tail interval. Consider qL and qU, the α/2- and (1 − α/2)-quantiles of p(θ|y):

∫_{−∞}^{qL} p(θ|y) dθ = α/2  and  ∫_{−∞}^{qU} p(θ|y) dθ = 1 − α/2 .

• Then P(qL < θ < qU | y) = 1 − α. Thus, this interval is a (1 − α) credible set (“Bayesian CI”) for θ
• Easy to compute, but not necessarily the best interval. The highest posterior density (HPD) set is best, but hard to compute
• Direct interpretation: “The probability that θ lies in (qL, qU) is 1 − α”
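Since the equal-tail interval comes straight from the quantile function while the HPD requires a search, here is an R sketch for a unimodal posterior; it uses the Beta(8, 4) posterior from the example on the next slide:

## Equal-tail vs. HPD interval: the HPD is the shortest interval
## containing 1 - alpha posterior mass (valid for a unimodal density)
a <- 8; b <- 4; alpha <- 0.05
qbeta(c(alpha / 2, 1 - alpha / 2), a, b)       # equal-tail 95% interval
width <- function(p) diff(qbeta(c(p, p + 1 - alpha), a, b))
p_lo <- optimize(width, c(0, alpha))$minimum   # lower-tail mass minimizing width
qbeta(c(p_lo, p_lo + 1 - alpha), a, b)         # 95% HPD interval (shorter)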

Basics of Bayesian Inference – p. 14

SLIDES 23–25

Ex: Y ∼ Bin(10, θ), θ ∼ U(0, 1), yobs = 7

[Figure: posterior density of θ over (0, 1), with the posterior median (solid) and 95% equal-tail interval (dotted) marked]

• Plot the Beta(yobs + 1, n − yobs + 1) = Beta(8, 4) posterior in R/S:

> theta <- seq(from=0, to=1, length=101)
> yobs <- 7; n <- 10
> plot(theta, dbeta(theta, yobs+1, n-yobs+1), type="l")

• Add the posterior median (solid vertical line) and the 95% equal-tail Bayesian CI (dotted vertical lines):

> abline(v=qbeta(.5, yobs+1, n-yobs+1))
> abline(v=qbeta(c(.025, .975), yobs+1, n-yobs+1), lty=2)

Basics of Bayesian Inference – p. 15

SLIDES 26–32

Bayesian hypothesis testing

• The classical approach bases the accept/reject decision on the

p-value = P{T(Y) more “extreme” than T(yobs) | θ, H0} ,

where “extremeness” is in the direction of HA
• Several troubles with this approach:
  • hypotheses must be nested
  • a p-value only offers evidence against the null, and this evidence is badly distorted with a point null
  • a p-value is not the “probability that H0 is true” (but is often erroneously interpreted this way)
  • two experiments with different designs but identical likelihoods could result in different p-values, violating the Likelihood Principle
  • why reject based on an unobserved tail region?

Basics of Bayesian Inference – p. 16

SLIDE 33

Bayesian hypothesis testing (cont’d)

• Select the model with the largest posterior probability,

P(Mi|y) = p(y|Mi) p(Mi) / Σj p(y|Mj) p(Mj) ,

where

p(y|Mi) = ∫ f(y|θi, Mi) πi(θi) dθi .

• Awkward: where would the p(Mi) come from? And p(y|Mi) may be difficult to compute
• For two models, the Bayes factor is

BF = [P(M1|y)/P(M2|y)] / [P(M1)/P(M2)] = p(y|M1) / p(y|M2) .

• It can be used for any pair of models
• It reduces to the likelihood ratio if both hypotheses are simple
• It is the posterior odds relative to the prior odds
• It is not defined if πi(θi) is improper
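A small R sketch of a Bayes factor computation for two hypothetical binomial models, where the marginal likelihood under a Beta prior is available in closed form:

## y = 7 successes in n = 10 trials;
## M1: theta ~ Beta(1, 1) vs. M2: theta = 0.5 (a simple point model)
y <- 7; n <- 10
marg_M1 <- choose(n, y) * beta(1 + y, 1 + n - y) / beta(1, 1)  # p(y | M1)
marg_M2 <- dbinom(y, n, 0.5)                                   # p(y | M2)
BF <- marg_M1 / marg_M2   # about 0.78: mild evidence for the point model
BF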

Basics of Bayesian Inference – p. 17

SLIDE 34

Bayesian hypothesis testing via DIC

• A generalization of the Akaike Information Criterion (AIC) to the case of hierarchical models, based on the posterior distribution of the deviance statistic

D(θ) = −2 log f(y|θ) + 2 log h(y) ,

where f(y|θ) is the likelihood and h(y) is any standardizing function of the data alone
• Summarize the fit of a model by the posterior expectation of the deviance, D̄ = Eθ|y[D]
• Summarize the complexity of a model by the effective number of parameters,

pD = Eθ|y[D] − D(Eθ|y[θ]) = D̄ − D(θ̄) .

Basics of Bayesian Inference – p. 18

SLIDE 35

Bayesian hypothesis testing via DIC

• The Deviance Information Criterion (DIC) is then

DIC = D̄ + pD = 2D̄ − D(θ̄) ,

with smaller values indicating preferred models
• Both building blocks of DIC and pD, namely Eθ|y[D] and D(Eθ|y[θ]), are easily estimated via MCMC methods, and in fact are automatic within WinBUGS
• While pD has a scale (effective model size), DIC does not, so only differences in DIC across models matter
• DIC can be sensitive to parametrization and to “focus”: f(y|θ) is “focused on θ”, while p(y|η) = ∫ f(y|θ) p(θ|η) dθ is “focused on η”
• DIC measures comparative explanatory performance, and tends to select “bigger” models
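A hedged R sketch of the DIC computation from posterior draws, using a toy normal-mean model with draws standing in for MCMC output:

## Deviance here is -2 log f(y | theta), taking h(y) = 1
set.seed(3)
y <- c(5.2, 6.8, 5.9, 6.3, 5.6)                     # toy data, sigma = 1
draws <- rnorm(5000, mean(y), 1 / sqrt(length(y)))  # stand-in for MCMC draws
dev <- function(theta) -2 * sum(dnorm(y, theta, 1, log = TRUE))
Dbar <- mean(sapply(draws, dev))                    # posterior mean deviance
pD <- Dbar - dev(mean(draws))                       # effective no. of parameters
DIC <- Dbar + pD
c(pD = pD, DIC = DIC)                               # pD should be near 1 here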

Basics of Bayesian Inference – p. 19

SLIDE 36

Prediction

• Often the intended use for a model is prediction
• We then need the posterior predictive distribution: to predict a new Y0,

p(y0|y) = ∫ p(y0|θ) p(θ|y) dθ

• This works even when the y’s are dependent (p(y0|θ) is then replaced by p(y0|y, θ))
• Since θ is only an artificial, unobservable, theoretical object, perhaps we should only judge models in predictive space
• This suggests cross-validation: compare a hold-out sample with its associated posterior predictive distributions
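A minimal R sketch of sampling from the posterior predictive distribution, reusing the Beta(8, 4) posterior from the earlier binomial example:

## Composition sampling: draw theta from the posterior,
## then y0 from the likelihood at that theta
set.seed(4)
theta_draws <- rbeta(10000, 8, 4)                   # theta ~ p(theta | y)
y0 <- rbinom(10000, size = 10, prob = theta_draws)  # y0 ~ p(y0 | theta)
table(y0) / length(y0)                              # estimate of p(y0 | y)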

Basics of Bayesian Inference – p. 20

SLIDE 37

Posterior predictive loss criterion

• When looking at predictive performance, we again need to penalize for model complexity
• We need a loss function that rewards goodness of fit to the observed data as well as predictive performance for new or replicate data
• We introduce a balanced loss function; for the squared error loss case, we obtain

Dk = (k/(k + 1)) G + P , where G = Σl (E(Yl,new|y) − yl,obs)² and P = Σl Var(Yl,new|y)

• The posterior predictive means and variances are readily computed
• G is a goodness-of-fit term; P is a penalty
• Dk measures comparative predictive performance; small values are preferred
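A hedged R sketch of computing G, P, and Dk from posterior predictive replicates, for a hypothetical common-θ binomial model (large k approximates k → ∞):

## Posterior predictive loss criterion D_k from replicate draws Y_{l,new} | y
set.seed(5)
y_obs <- c(7, 5, 8); n <- 10                  # three hypothetical counts
th <- rbeta(10000, 1 + sum(y_obs), 1 + sum(n - y_obs))    # posterior draws
reps <- sapply(y_obs, function(yl) rbinom(10000, n, th))  # replicates per l
G <- sum((colMeans(reps) - y_obs)^2)          # goodness-of-fit term
P <- sum(apply(reps, 2, var))                 # penalty term
k <- 1e6                                      # weight k/(k+1) near 1
Dk <- k / (k + 1) * G + P
c(G = G, P = P, Dk = Dk)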

Basics of Bayesian Inference – p. 21

SLIDE 38

The Bayesian computation problem

• What is the problem? INTEGRATION
• In practical problems of interest, in hierarchical models, the dimension of the parameter set is large. But then:
  • to normalize the joint density so that we can make probability statements requires a high-dimensional integration
  • to compute any desired expectation requires a ratio of high-dimensional integrations
  • to marginalize, i.e., to find the distribution of any parameter (or function of the parameters), requires a ratio of high-dimensional integrations
• Such computational limitations long restricted the use of Bayesian inference to toy problems

Basics of Bayesian Inference – p. 22

SLIDE 39

The Bayesian computation problem (cont.)

• Sampling-based methods (that is, sampling from the high-dimensional posterior distribution), in conjunction with the wide availability of inexpensive, high-speed computing, have solved the computation problem
• Now the tables are turned: through hierarchical modeling in a Bayesian framework, models that are inaccessible in a classical framework can be handled
• This is particularly the case for spatial data models, due to concerns with asymptotics and with getting the uncertainty “right”
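As a minimal illustration of the sampling-based idea: once we can draw from the posterior, expectations and probability statements reduce to sample summaries, with no explicit integration. A sketch using the Beta(8, 4) posterior from the earlier example:

## Monte Carlo summaries replace high-dimensional integrals
draws <- rbeta(10000, 8, 4)         # posterior draws (MCMC output in general)
mean(draws)                         # E(theta | y) without integrating
quantile(draws, c(.025, .975))      # 95% equal-tail credible interval
mean(draws > 0.5)                   # P(theta > 0.5 | y)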

Basics of Bayesian Inference – p. 23