SLIDE 1

Data Asymptotics

Dr. Jarad Niemi

STAT 544 - Iowa State University

February 7, 2018

Jarad Niemi (STAT544@ISU) Data Asymptotics February 7, 2018 1 / 18

SLIDE 2

Normal approximation to the posterior

Suppose p(θ|y) is unimodal and roughly symmetric. A Taylor series expansion of the logarithm of the posterior around the posterior mode θ̂ gives

\log p(\theta|y) = \log p(\hat{\theta}|y) - \frac{1}{2}(\theta - \hat{\theta})^\top \left[-\frac{d^2}{d\theta^2} \log p(\theta|y)\right]_{\theta=\hat{\theta}} (\theta - \hat{\theta}) + \cdots

where the linear term in the expansion is zero because the derivative of the log-posterior density is zero at its mode. Discarding the higher-order terms, this expansion provides a normal approximation to the posterior:

p(\theta|y) \stackrel{d}{\approx} N\left(\hat{\theta}, J(\hat{\theta})^{-1}\right)

where J(θ̂) is the sum of the prior and observed information, i.e.

J(\hat{\theta}) = -\frac{d^2}{d\theta^2} \log p(\theta)\Big|_{\theta=\hat{\theta}} - \frac{d^2}{d\theta^2} \log p(y|\theta)\Big|_{\theta=\hat{\theta}}.

SLIDE 3

Normal approximation to the posterior: Example

Binomial probability

Let y ∼ Bin(n, θ) and θ ∼ Be(a, b); then θ|y ∼ Be(a + y, b + n − y) and the posterior mode is

\hat{\theta} = \frac{y'}{n'} = \frac{a + y - 1}{a + b + n - 2}.

Thus

J(\hat{\theta}) = \frac{n'}{\hat{\theta}(1 - \hat{\theta})}

and therefore

p(\theta|y) \stackrel{d}{\approx} N\left(\hat{\theta}, \frac{\hat{\theta}(1 - \hat{\theta})}{n'}\right).
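The form of J(θ̂) above follows from differentiating the log Be(a + y, b + n − y) density twice; with y′ = a + y − 1 and n′ = a + b + n − 2 as above, the intermediate algebra is:

```latex
\log p(\theta|y) = \text{const} + y'\log\theta + (n'-y')\log(1-\theta), \qquad
-\frac{d^2}{d\theta^2}\log p(\theta|y) = \frac{y'}{\theta^2} + \frac{n'-y'}{(1-\theta)^2}.
```

Evaluating at θ̂ = y′/n′, so that y′ = n′θ̂ and n′ − y′ = n′(1 − θ̂),

```latex
J(\hat{\theta}) = \frac{n'\hat{\theta}}{\hat{\theta}^2} + \frac{n'(1-\hat{\theta})}{(1-\hat{\theta})^2}
= \frac{n'}{\hat{\theta}} + \frac{n'}{1-\hat{\theta}} = \frac{n'}{\hat{\theta}(1-\hat{\theta})}.
```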

SLIDE 4

Normal approximation to the posterior: Example

Binomial probability

a = b = 1
n = 10
y = 3
par(mar=c(5,4,0.5,0)+.1)
curve(dbeta(x, a+y, b+n-y), lwd=2, xlab=expression(theta),
      ylab=expression(paste("p(", theta, "|y)")))
# Normal approximation at the posterior mode
yp = a + y - 1
np = a + b + n - 2
theta_hat = yp/np
curve(dnorm(x, theta_hat, sqrt(theta_hat*(1-theta_hat)/np)),
      add=TRUE, col="red", lwd=2)
legend("topright", c("True posterior", "Normal approximation"),
       col=c("black", "red"), lwd=2)

SLIDE 5

Normal approximation to the posterior: Example

Binomial probability

[Figure: the true Be(a + y, b + n − y) posterior (black) and its normal approximation (red) plotted over θ ∈ [0, 1].]

SLIDE 6

Large-sample theory

Suppose y_i are iid from p(y|θ0) for some true value θ0.

  • Does the posterior distribution converge to θ0?
  • Does a point estimator (e.g., the posterior mode) converge to θ0?
  • What is the limiting posterior distribution?

SLIDE 7

Large-sample theory: Convergence of the posterior distribution

Suppose y_i are iid from p(y|θ0) for some true value θ0.

Theorem. If the parameter space Θ is discrete and Pr(θ = θ0) > 0, then Pr(θ = θ0|y) → 1 as n → ∞.

Theorem. If the parameter space Θ is continuous and A is a neighborhood of θ0 with Pr(θ ∈ A) > 0, then Pr(θ ∈ A|y) → 1 as n → ∞.

SLIDE 8

Large-sample theory: Convergence of the posterior distribution

library(smcUtils)
theta = seq(0.1, 0.9, by=0.1)
theta0 = 0.3
n = 1000
y = rbinom(n, 1, theta0)
p = matrix(NA, n, length(theta))
p[1,] = renormalize(dbinom(y[1], 1, theta, log=TRUE), log=TRUE)
for (i in 2:n) {
  p[i,] = renormalize(dbinom(y[i], 1, theta, log=TRUE) + log(p[i-1,]), log=TRUE)
}
plot(p[,1], ylim=c(0,1), type="l", xlab="n", ylab="Probability")
for (i in 1:length(theta)) lines(p[,i], col=i)
legend("right", legend=theta, col=1:9, lty=1)

[Figure: sequential posterior probabilities for each θ ∈ {0.1, …, 0.9}; the probability on θ0 = 0.3 approaches 1 as n grows.]

SLIDE 9

Large-sample theory: Convergence of the posterior distribution

a = b = 1
e = 0.05
p = rep(NA, n)
for (i in 1:n) {
  yy = sum(y[1:i])  # successes so far
  zz = i - yy       # failures so far
  p[i] = diff(pbeta(theta0 + c(-e, e), a+yy, b+zz))
}
plot(p, type="l", ylim=c(0,1),
     ylab="Posterior probability of neighborhood", xlab="n",
     main="Continuous parameter space")

[Figure: the posterior probability of the neighborhood (θ0 − 0.05, θ0 + 0.05) approaches 1 as n grows.]

SLIDE 10

Large-sample theory: Consistency of Bayesian point estimates

Suppose y_i are iid from p(y|θ0), where θ0 is a particular value of θ. Recall that an estimator θ̂ is consistent, i.e. θ̂ →p θ0, if for every ε > 0

\lim_{n\to\infty} P\left(|\hat{\theta} - \theta_0| < \epsilon\right) = 1.

Recall also that, under regularity conditions, θ̂_MLE →p θ0. If Bayesian estimators converge to the MLE, then they share these asymptotic properties.

SLIDE 11

Large-sample theory: Consistency of Bayesian point estimates

Binomial example

Consider y ∼ Bin(n, θ) with true value θ = θ0 and prior θ ∼ Be(a, b). Then θ|y ∼ Be(a + y, b + n − y). Recall that θ̂_MLE = y/n. The following estimators are all consistent:

  • Posterior mean: (a + y)/(a + b + n)
  • Posterior median: ≈ (a + y − 1/3)/(a + b + n − 2/3), for a + y, b + n − y > 1
  • Posterior mode: (a + y − 1)/(a + b + n − 2)

since, as n → ∞, these all converge to θ̂_MLE = y/n.
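Each of these estimators differs from y/n by a term that vanishes as n → ∞; for the posterior mean, for example, since 0 ≤ y ≤ n,

```latex
\frac{a+y}{a+b+n} - \frac{y}{n} = \frac{an - y(a+b)}{n(a+b+n)} = O\!\left(\frac{1}{n}\right),
```

and the median and mode admit the same bound.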

SLIDE 12

Large-sample theory: Consistency of Bayesian point estimates

a = b = 1
n = 1000
theta0 = 0.5
y = rbinom(n, 1, theta0)
yy = cumsum(y)
nn = 1:n
plot(0, 0, type="n", xlim=c(0,n), ylim=c(0,1),
     xlab="Number of flips", ylab="Estimates")
abline(h=theta0)
lines((a+yy)/(a+b+nn), col=2)          # posterior mean
lines((a+yy-1/3)/(a+b+nn-2/3), col=3)  # posterior median (approx.)
lines((a+yy-1)/(a+b+nn-2), col=4)      # posterior mode
legend("topright", c("Truth", "Mean", "Median", "Mode"), col=1:4, lty=1)

[Figure: the mean, median, and mode estimates all settle on the truth θ0 = 0.5 as the number of flips grows.]

SLIDE 13

Large-sample theory: Consistency of Bayesian point estimates

Normal example

Consider y_i iid from N(θ, 1) with known variance and prior θ ∼ N(c, 1). Then

\theta|y \sim N\left(\frac{1}{n+1}c + \frac{n}{n+1}\bar{y}, \frac{1}{n+1}\right).

Recall that θ̂_MLE = ȳ. Since the posterior mean converges to the MLE, the posterior mean (as well as the median and mode) is consistent.

[Figure: truth, MLE, and posterior mean plotted against n; the estimates converge to the truth.]

SLIDE 14

Asymptotic normality

Consider the Taylor series expansion of the log posterior

\log p(\theta|y) = \log p(\hat{\theta}|y) - \frac{1}{2}(\theta - \hat{\theta})^\top \left[-\frac{d^2}{d\theta^2} \log p(\theta|y)\right]_{\theta=\hat{\theta}} (\theta - \hat{\theta}) + R

where the linear term is zero because the derivative at the posterior mode θ̂ is zero and R represents all higher-order terms. With iid observations, the coefficient of the quadratic term can be written as

-\frac{d^2}{d\theta^2} \left[\log p(\theta|y)\right]_{\theta=\hat{\theta}} = -\frac{d^2}{d\theta^2} \left[\log p(\theta)\right]_{\theta=\hat{\theta}} - \sum_{i=1}^{n} \frac{d^2}{d\theta^2} \left[\log p(y_i|\theta)\right]_{\theta=\hat{\theta}}

where

E_y\left[-\frac{d^2}{d\theta^2} \left[\log p(y_i|\theta)\right]_{\theta=\hat{\theta}}\right] = I(\theta_0)

is the expected Fisher information; thus, by the LLN, the second term converges to n I(θ0).
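The LLN step can be made explicit by averaging the per-observation curvatures (assuming θ̂ → θ0, so that evaluation at θ̂ and at θ0 agree in the limit):

```latex
\frac{1}{n}\sum_{i=1}^{n} \left[-\frac{d^2}{d\theta^2}\log p(y_i|\theta)\right]_{\theta=\hat{\theta}}
\;\xrightarrow{p}\; E_y\left[-\frac{d^2}{d\theta^2}\log p(y_i|\theta)\right] = I(\theta_0),
```

so the likelihood contribution to the curvature grows like n I(θ0), eventually dominating the fixed prior contribution.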

SLIDE 15

Asymptotic normality

For large n, we have

\log p(\theta|y) \approx \log p(\hat{\theta}|y) - \frac{1}{2}(\theta - \hat{\theta})^\top \left[n I(\theta_0)\right] (\theta - \hat{\theta})

where θ̂ is the posterior mode. If θ̂ → θ0 as n → ∞, then I(θ̂) → I(θ0) and we have

p(\theta|y) \propto \exp\left(-\frac{1}{2}(\theta - \hat{\theta})^\top \left[n I(\hat{\theta})\right] (\theta - \hat{\theta})\right).

Thus, as n → ∞,

\theta|y \stackrel{d}{\to} N\left(\hat{\theta}, \frac{1}{n} I(\hat{\theta})^{-1}\right),

i.e. the posterior distribution is asymptotically normal.

SLIDE 16

Asymptotic normality

Binomial example

Suppose y ∼ Bin(n, θ) and θ ∼ Be(a, b).

[Figure: a 3×3 grid of panels, one for each combination of a = b ∈ {1, 10, 100} and n ∈ {10, 100, 1000}, each comparing the posterior density with its normal approximation.]
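The pattern in the figure can be checked numerically. The sketch below is in Python rather than the slides' R, and `tv_distance` is a helper of my own: it grid-approximates the total variation distance between the Be(a + y, b + n − y) posterior and the normal approximation from the earlier slides, for a = b = 1 and 30% observed successes.

```python
import math

def beta_logpdf(x, a, b):
    # log density of a Be(a, b) distribution at x in (0, 1)
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def normal_pdf(x, m, s):
    # density of a N(m, s^2) distribution at x
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def tv_distance(a, b, n, y, grid_size=2000):
    # grid approximation to the total variation distance between the
    # Be(a + y, b + n - y) posterior and its normal approximation
    # N(theta_hat, theta_hat * (1 - theta_hat) / n')
    nprime = a + b + n - 2
    theta_hat = (a + y - 1) / nprime
    sd = math.sqrt(theta_hat * (1 - theta_hat) / nprime)
    dx = 1 / grid_size
    return 0.5 * sum(abs(math.exp(beta_logpdf(i * dx, a + y, b + n - y))
                         - normal_pdf(i * dx, theta_hat, sd)) * dx
                     for i in range(1, grid_size))

# with a = b = 1 and 30% observed successes, the approximation tightens with n
gaps = [tv_distance(1, 1, n, int(0.3 * n)) for n in (10, 100, 1000)]
```

The entries of `gaps` shrink toward zero as n grows, mirroring the improvement across the panels above.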

SLIDE 17

Asymptotic normality: What can go wrong?

Not unique to Bayesian statistics:

  • Unidentified parameters
  • Number of parameters increasing with sample size
  • Aliasing
  • Unbounded likelihoods
  • Tails of the distribution
  • True sampling distribution is not p(y|θ)

Unique to Bayesian statistics:

  • Improper posterior
  • Prior distributions that exclude the point of convergence
  • Convergence to the edge of the parameter space (under the prior)

SLIDE 18

Asymptotic normality: What can go wrong?

True sampling distribution is not p(y|θ)

Suppose the true sampling distribution f(y) does not correspond to p(y|θ) for any value of θ. Then the posterior p(θ|y) converges to the value θ0 that minimizes the Kullback-Leibler divergence from the model p(y|θ) to the true f(y), where

KL\left(f(y)\,\middle\|\,p(y|\theta)\right) = E\left[\log \frac{f(y)}{p(y|\theta)}\right] = \int \log \frac{f(y)}{p(y|\theta)}\, f(y)\, dy.

That is, we do about the best we can, given that we have assumed the wrong sampling distribution p(y|θ).
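A toy numerical illustration of this point (my own construction, in Python, not from the slides): if data truly come from a four-point distribution f but are modeled as Poisson(θ), the KL-minimizing θ0 turns out to be E_f[y], and a grid search recovers it.

```python
import math

# hypothetical true sampling distribution f on {0, 1, 2, 3};
# its variance differs from its mean, so no Poisson matches it exactly
f = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
true_mean = sum(y * p for y, p in f.items())  # E_f[y]

def kl_to_poisson(theta):
    # KL(f || Poisson(theta)); the sum is exact since f has finite support
    return sum(p * (math.log(p)
                    - (y * math.log(theta) - theta - math.lgamma(y + 1)))
               for y, p in f.items())

# grid search for the KL-minimizing theta over (0, 5)
theta0 = min((i / 1000 for i in range(1, 5000)), key=kl_to_poisson)
```

Here `theta0` lands at E_f[y]: the best-fitting Poisson matches the mean of f, consistent with the slide's point that the posterior concentrates on the best θ0 available under the wrong model.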