Data Asymptotics



  1. Data Asymptotics. Dr. Jarad Niemi, STAT 544 - Iowa State University, February 7, 2018.

  2. Normal approximation to the posterior

  Suppose $p(\theta|y)$ is unimodal and roughly symmetric. Then a Taylor series expansion of the logarithm of the posterior around the posterior mode $\hat\theta$ is

  $$\log p(\theta|y) = \log p(\hat\theta|y) - \frac{1}{2}(\theta-\hat\theta)^\top \left[-\frac{d^2}{d\theta^2}\log p(\theta|y)\right]_{\theta=\hat\theta} (\theta-\hat\theta) + \cdots$$

  where the linear term in the expansion is zero because the derivative of the log-posterior density is zero at its mode. Discarding the higher-order terms, this expansion provides a normal approximation to the posterior, i.e.

  $$p(\theta|y) \approx N\!\left(\hat\theta, J(\hat\theta)^{-1}\right)$$

  where $J(\hat\theta)$ is the sum of the prior and observed information, i.e.

  $$J(\hat\theta) = -\frac{d^2}{d\theta^2}\log p(\theta)\Big|_{\theta=\hat\theta} - \frac{d^2}{d\theta^2}\log p(y|\theta)\Big|_{\theta=\hat\theta}.$$

  3. Normal approximation to the posterior: Binomial probability example

  Let $y \sim Bin(n, \theta)$ and $\theta \sim Be(a, b)$. Then $\theta|y \sim Be(a+y, b+n-y)$ and the posterior mode is

  $$\hat\theta = \frac{y'}{n'} = \frac{a+y-1}{a+b+n-2}.$$

  Thus

  $$J(\hat\theta) = \frac{n'}{\hat\theta(1-\hat\theta)}$$

  and therefore

  $$p(\theta|y) \approx N\!\left(\hat\theta, \frac{\hat\theta(1-\hat\theta)}{n'}\right).$$

  4. Normal approximation to the posterior: Binomial probability example

      a = b = 1
      n = 10
      y = 3
      par(mar=c(5,4,0.5,0)+.1)
      # True posterior: Be(a+y, b+n-y)
      curve(dbeta(x, a+y, b+n-y), lwd=2, xlab=expression(theta),
            ylab=expression(paste("p(", theta, "|y)")))
      # Normal approximation centered at the posterior mode
      yp = a + y - 1
      np = a + b + n - 2
      theta_hat = yp/np
      curve(dnorm(x, theta_hat, sqrt(theta_hat*(1-theta_hat)/np)),
            add=TRUE, col="red", lwd=2)
      legend("topright", c("True posterior","Normal approximation"),
             col=c("black","red"), lwd=2)

  5. Normal approximation to the posterior: Binomial probability example

  [Figure: the Be(a+y, b+n-y) posterior (black) and its normal approximation (red); x-axis theta, y-axis p(theta|y).]

  6. Large-sample theory

  Consider a model $y_i \stackrel{iid}{\sim} p(y|\theta_0)$ for some true value $\theta_0$.

  - Does the posterior distribution converge to $\theta_0$?
  - Does a point estimator (mode) converge to $\theta_0$?
  - What is the limiting posterior distribution?

  7. Large-sample theory: Convergence of the posterior distribution

  Consider a model $y_i \stackrel{iid}{\sim} p(y|\theta_0)$ for some true value $\theta_0$.

  Theorem. If the parameter space $\Theta$ is discrete and $Pr(\theta = \theta_0) > 0$, then $Pr(\theta = \theta_0 \,|\, y) \to 1$ as $n \to \infty$.

  Theorem. If the parameter space $\Theta$ is continuous and $A$ is a neighborhood around $\theta_0$ with $Pr(\theta \in A) > 0$, then $Pr(\theta \in A \,|\, y) \to 1$ as $n \to \infty$.

  8. Large-sample theory: Convergence of the posterior distribution

      library(smcUtils)
      # Discrete parameter space: nine candidate values for theta
      theta = seq(0.1, 0.9, by=0.1); theta0 = 0.3
      n = 1000
      y = rbinom(n, 1, theta0)
      # Update the posterior probabilities one observation at a time
      p = matrix(NA, n, length(theta))
      p[1,] = renormalize(dbinom(y[1], 1, theta, log=TRUE), log=TRUE)
      for (i in 2:n) {
        p[i,] = renormalize(dbinom(y[i], 1, theta, log=TRUE) + log(p[i-1,]), log=TRUE)
      }
      plot(p[,1], ylim=c(0,1), type="l", xlab="n", ylab="Probability")
      for (i in 1:length(theta)) lines(p[,i], col=i)
      legend("right", legend=theta, col=1:9, lty=1)

  [Figure: posterior probability of each candidate theta versus n; the curve for theta = 0.3 converges to 1.]

  9. Large-sample theory: Convergence of the posterior distribution

      # Continuous parameter space: posterior probability of the
      # neighborhood (theta0 - e, theta0 + e) under the Be(a+yy, b+zz) posterior
      a = b = 1
      e = 0.05
      p = rep(NA, n)
      for (i in 1:n) {
        yy = sum(y[1:i])   # successes so far
        zz = i - yy        # failures so far
        p[i] = diff(pbeta(theta0 + c(-e, e), a + yy, b + zz))
      }
      plot(p, type="l", ylim=c(0,1), ylab="Posterior probability of neighborhood",
           xlab="n", main="Continuous parameter space")

  [Figure: posterior probability of the neighborhood versus n, increasing toward 1.]

  10. Large-sample theory: Consistency of Bayesian point estimates

  Suppose $y_i \stackrel{iid}{\sim} p(y|\theta_0)$ where $\theta_0$ is a particular value for $\theta$.

  Recall that an estimator $\hat\theta$ is consistent, i.e. $\hat\theta \stackrel{p}{\to} \theta_0$, if

  $$\lim_{n\to\infty} P\left(|\hat\theta - \theta_0| < \epsilon\right) = 1 \quad \text{for all } \epsilon > 0.$$

  Recall that, under regularity conditions, $\hat\theta_{MLE} \stackrel{p}{\to} \theta_0$. If Bayesian estimators converge to the MLE, then they have the same asymptotic properties.

  11. Large-sample theory: Consistency of Bayesian point estimates, binomial example

  Consider $y \sim Bin(n, \theta)$ with true value $\theta = \theta_0$ and prior $\theta \sim Be(a, b)$. Then $\theta|y \sim Be(a+y, b+n-y)$. Recall that $\hat\theta_{MLE} = y/n$. The following estimators are all consistent:

  - Posterior mean: $\dfrac{a+y}{a+b+n}$
  - Posterior median: $\approx \dfrac{a+y-1/3}{a+b+n-2/3}$ (the usual beta median approximation, valid when both posterior parameters exceed 1)
  - Posterior mode: $\dfrac{a+y-1}{a+b+n-2}$

  since, as $n \to \infty$, these all converge to $\hat\theta_{MLE} = y/n$. A quick numerical check of the median approximation follows.
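  A minimal sketch checking the beta median approximation numerically. Reusing the values a = b = 1, n = 10, y = 3 from the earlier example is an assumption here; the slide does not attach data to this formula.

      # Check the beta median approximation against the exact quantile
      a = b = 1; n = 10; y = 3              # assumed values, from slide 4
      qbeta(0.5, a + y, b + n - y)          # exact posterior median, about 0.32
      (a + y - 1/3)/(a + b + n - 2/3)       # approximation, also about 0.32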

  12. Large-sample theory: Consistency of Bayesian point estimates

      a = b = 1
      n = 1000
      theta0 = 0.5
      y = rbinom(n, 1, theta0)
      yy = cumsum(y)   # running number of successes
      nn = 1:n         # running number of flips
      plot(0, 0, type="n", xlim=c(0,n), ylim=c(0,1),
           xlab="Number of flips", ylab="Estimates")
      abline(h=theta0)
      lines((a+yy)/(a+b+nn), col=2)             # posterior mean
      lines((a+yy-1/3)/(a+b+nn-2/3), col=3)     # posterior median (approx)
      lines((a+yy-1)/(a+b+nn-2), col=4)         # posterior mode
      legend("topright", c("Truth","Mean","Median","Mode"), col=1:4, lty=1)

  [Figure: all three estimators converge to the truth theta0 = 0.5 as the number of flips grows.]

  13. Large-sample theory: Consistency of Bayesian point estimates, normal example

  Consider $Y_i \stackrel{iid}{\sim} N(\theta, 1)$ with known variance and prior $\theta \sim N(c, 1)$. Then

  $$\theta|y \sim N\!\left(\frac{1}{n+1}\,c + \frac{n}{n+1}\,\bar{y},\; \frac{1}{n+1}\right).$$

  Recall that $\hat\theta_{MLE} = \bar{y}$. Since the posterior mean converges to the MLE, the posterior mean (as well as the median and mode) is consistent.

  [Figure: MLE and posterior mean versus n, both converging to the truth; a sketch reproducing it follows.]
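  A minimal sketch reproducing this plot. The true mean theta0 = 7 and the prior mean c0 = 10 are assumed values chosen for illustration; the slide does not list the values it used.

      # Consistency of the posterior mean in the normal model.
      # theta0 = 7 and c0 = 10 are assumptions, not taken from the slide.
      set.seed(1)
      n = 200
      theta0 = 7                           # assumed true mean
      c0 = 10                              # assumed prior mean (c on the slide)
      y = rnorm(n, theta0, 1)
      nn = 1:n
      ybar = cumsum(y)/nn                  # running MLE
      post_mean = (c0 + nn*ybar)/(nn + 1)  # running posterior mean
      plot(ybar, type="l", col=2, ylim=c(6,10), xlab="n", ylab="Estimates")
      lines(post_mean, col=3)
      abline(h=theta0)
      legend("topright", c("Truth","MLE","Posterior mean"), col=1:3, lty=1)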

  14. Asymptotic normality

  Consider the Taylor series expansion of the log posterior

  $$\log p(\theta|y) = \log p(\hat\theta|y) - \frac{1}{2}(\theta-\hat\theta)^\top \left[-\frac{d^2}{d\theta^2}\log p(\theta|y)\right]_{\theta=\hat\theta} (\theta-\hat\theta) + R$$

  where the linear term is zero because the derivative at the posterior mode $\hat\theta$ is zero and $R$ represents all higher-order terms. With iid observations, the coefficient of the quadratic term can be written as

  $$-\frac{d^2}{d\theta^2}\Big[\log p(\theta|y)\Big]_{\theta=\hat\theta} = -\frac{d^2}{d\theta^2}\Big[\log p(\theta)\Big]_{\theta=\hat\theta} - \sum_{i=1}^n \frac{d^2}{d\theta^2}\Big[\log p(y_i|\theta)\Big]_{\theta=\hat\theta}$$

  where

  $$E_y\!\left[-\frac{d^2}{d\theta^2}\log p(y_i|\theta)\right]_{\theta=\hat\theta} = I(\theta_0)$$

  with $I(\theta_0)$ the expected Fisher information; thus, by the LLN, the second term converges to $n\, I(\theta_0)$.

  15. Asymptotic normality

  For large $n$, we have

  $$\log p(\theta|y) \approx \log p(\hat\theta|y) - \frac{1}{2}(\theta-\hat\theta)^\top \big[n\, I(\theta_0)\big] (\theta-\hat\theta)$$

  where $\hat\theta$ is the posterior mode. If $\hat\theta \to \theta_0$ as $n \to \infty$, then $I(\hat\theta) \to I(\theta_0)$ as $n \to \infty$ and we have

  $$p(\theta|y) \propto \exp\!\left(-\frac{1}{2}(\theta-\hat\theta)^\top \big[n\, I(\hat\theta)\big] (\theta-\hat\theta)\right).$$

  Thus, as $n \to \infty$,

  $$\theta|y \stackrel{d}{\to} N\!\left(\hat\theta, \frac{1}{n} I(\hat\theta)^{-1}\right),$$

  i.e. the posterior distribution is asymptotically normal.
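  As a concrete check (not on the slide, but following directly from the definitions above), consider the Bernoulli model $y_i \stackrel{iid}{\sim} Ber(\theta)$. The expected Fisher information per observation is

  $$I(\theta) = E\!\left[-\frac{d^2}{d\theta^2}\log p(y_i|\theta)\right] = E\!\left[\frac{y_i}{\theta^2} + \frac{1-y_i}{(1-\theta)^2}\right] = \frac{1}{\theta(1-\theta)},$$

  so the limiting posterior variance is $\frac{1}{n} I(\hat\theta)^{-1} = \hat\theta(1-\hat\theta)/n$, which matches the variance $\hat\theta(1-\hat\theta)/n'$ from slide 3 once the prior's contribution to $n' = a+b+n-2$ becomes negligible.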

  16. Asymptotic normality: Binomial example

  Suppose $y \sim Bin(n, \theta)$ and $\theta \sim Be(a, b)$.

  [Figure: 3x3 grid of density panels comparing the posterior with its normal approximation; columns a = b = 1, 10, 100 and rows n = 10, 100, 1000; x-axis from 0 to 1. The approximation improves as n grows relative to the prior's weight. A sketch reproducing the grid follows.]
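  A minimal base-R sketch reproducing a grid like this one. The observed data y = n/2 successes is an assumption; the slide does not state what data were used.

      # Posterior (black) vs normal approximation (red) across a grid of
      # priors and sample sizes. y = n/2 is an assumed, not stated, dataset.
      par(mfrow=c(3,3), mar=c(2,2,2,1))
      for (n in c(10, 100, 1000)) {
        for (a in c(1, 10, 100)) {
          b = a
          y = n/2                          # assumed: half successes
          np = a + b + n - 2
          theta_hat = (a + y - 1)/np       # posterior mode
          curve(dbeta(x, a + y, b + n - y), 0, 1, lwd=2,
                main=paste0("a = b = ", a, ", n = ", n), xlab="", ylab="")
          curve(dnorm(x, theta_hat, sqrt(theta_hat*(1-theta_hat)/np)),
                add=TRUE, col="red", lwd=2)
        }
      }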

  17. Asymptotic normality: What can go wrong?

  Not unique to Bayesian statistics:
  - Unidentified parameters
  - Number of parameters increasing with sample size
  - Aliasing
  - Unbounded likelihoods
  - Tails of the distribution
  - True sampling distribution is not $p(y|\theta)$

  Unique to Bayesian statistics:
  - Improper posterior
  - Prior distributions that exclude the point of convergence
  - Convergence to the edge of the parameter space (prior)
