

  1. Parameter estimation (cont.)
     Dr. Jarad Niemi
     STAT 544 - Iowa State University
     January 24, 2019

  2. Outline
     - Normal model, unknown mean: Jeffreys prior, natural conjugate prior, posterior
     - Normal model, unknown variance: Jeffreys prior, natural conjugate prior, posterior
     - JAGS/Stan: binomial model (unknown probability), normal model (unknown mean), normal model (unknown variance)

  3. Normal model, unknown mean: Jeffreys prior for $\mu$

     Theorem. If $Y_i \stackrel{iid}{\sim} N(\mu, s^2)$ ($s^2$ known), the Jeffreys prior for $\mu$ is $p(\mu) \propto 1$.

     Proof. Since the normal distribution with unknown mean is an exponential family, use Casella & Berger Lemma 7.3.11:
     $$
     \begin{aligned}
     -E_y\!\left[\frac{\partial^2}{\partial \mu^2} \log p(y|\mu)\right]
       &= -E_y\!\left[\frac{\partial^2}{\partial \mu^2}\left(-n\log(2\pi s^2)/2 - \frac{1}{2s^2}\sum_{i=1}^n (y_i-\mu)^2\right)\right] \\
       &= -E_y\!\left[\frac{\partial^2}{\partial \mu^2}\left(-n\log(2\pi s^2)/2 - \frac{1}{2s^2}\left(\sum_{i=1}^n y_i^2 - 2\mu n\bar{y} + n\mu^2\right)\right)\right] \\
       &= -E_y\!\left[\frac{\partial}{\partial \mu}\left(-\frac{1}{2s^2}\left(-2n\bar{y} + 2n\mu\right)\right)\right] \\
       &= -E_y\!\left[-\frac{1}{2s^2}(2n)\right] = n/s^2.
     \end{aligned}
     $$
     So $p(\mu) \propto \sqrt{|I(\mu)|} = \sqrt{n/s^2} \propto 1$, i.e. the Jeffreys prior for $\mu$ is $p(\mu) \propto 1$.
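     Because the information $n/s^2$ does not depend on $\mu$ or the data, a flat prior falls out. A minimal numerical sanity check (my addition, with made-up values for $n$, $s^2$, and $\mu$):

         # Sketch: finite-difference check that the observed information for mu
         # equals n/s^2 for any data set and any mu (so Jeffreys prior is flat).
         set.seed(1)
         n <- 5; s2 <- 2
         y <- rnorm(n, 0, sqrt(s2))
         loglik <- function(mu) sum(dnorm(y, mu, sqrt(s2), log = TRUE))
         h <- 1e-4; mu0 <- 0.3   # any value of mu gives the same answer
         info <- -(loglik(mu0 + h) - 2 * loglik(mu0) + loglik(mu0 - h)) / h^2
         c(numerical = info, analytic = n / s2)   # should agree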

  4. Normal model, unknown mean: posterior propriety

     Since $\int_{-\infty}^{\infty} 1 \, d\mu$ is not finite, we need to check posterior propriety.

     Theorem. For $n > 0$, the posterior for a normal mean (known variance) using the Jeffreys prior is proper.

     Proof. The posterior is
     $$
     \begin{aligned}
     p(\mu|y) &\propto p(y|\mu)\,p(\mu) \\
       &\propto \exp\!\left(-\frac{1}{2s^2}\sum_{i=1}^n (y_i-\mu)^2\right) \times 1 \\
       &\propto \exp\!\left(-\frac{1}{2s^2}\left(-2\mu n\bar{y} + n\mu^2\right)\right) \\
       &= \exp\!\left(-\frac{1}{2s^2/n}\left(\mu^2 - 2\mu\bar{y}\right)\right).
     \end{aligned}
     $$
     This is the kernel of a normal distribution with mean $\bar{y}$ and variance $s^2/n$, which is proper if $n > 0$.
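     Propriety can also be seen numerically: the unnormalized kernel integrates to a finite number. A sketch (my addition; any simulated data works):

         # Sketch: the flat-prior posterior kernel has a finite integral over mu,
         # matching the claim that the posterior is proper for n > 0.
         set.seed(1)
         n <- 3; s2 <- 1
         y <- rnorm(n)
         kernel <- function(mu)
           sapply(mu, function(m) exp(-sum((y - m)^2) / (2 * s2)))
         integrate(kernel, -Inf, Inf)   # finite => proper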

  5. Normal model, unknown mean: natural conjugate prior

     Let $Y_i \stackrel{iid}{\sim} N(\mu, s^2)$ with $s^2$ known. The likelihood is
     $$
     L(\mu) = \exp\!\left(-\frac{1}{2s^2/n}\left(\mu^2 - 2\mu\bar{y}\right)\right)
            \propto \exp\!\left(-\frac{1}{2}\left(\frac{n}{s^2}\mu^2 - 2\mu\frac{n}{s^2}\bar{y}\right)\right).
     $$
     This is the kernel of a normal distribution, so the natural conjugate prior is $\mu \sim N(m, C)$. Then
     $$
     \begin{aligned}
     p(\mu|y) &\propto p(y|\mu)\,p(\mu) = L(\mu)\,p(\mu) \\
       &= \exp\!\left(-\frac{1}{2}\left(\frac{n}{s^2}\mu^2 - 2\mu\frac{n}{s^2}\bar{y}\right)\right)
          \exp\!\left(-\frac{1}{2}\left(\frac{1}{C}\mu^2 - 2\mu\frac{1}{C}m\right)\right) \\
       &= \exp\!\left(-\frac{1}{2}\left(\left(\frac{1}{C}+\frac{n}{s^2}\right)\mu^2
          - 2\mu\left(\frac{1}{C}m + \frac{n}{s^2}\bar{y}\right)\right)\right) \\
       &= \exp\!\left(-\frac{1}{2\left(\frac{1}{C}+\frac{n}{s^2}\right)^{-1}}
          \left(\mu^2 - 2\mu\,\frac{\frac{1}{C}m + \frac{n}{s^2}\bar{y}}{\frac{1}{C}+\frac{n}{s^2}}\right)\right).
     \end{aligned}
     $$
     This is the kernel of a $N(m', C')$ where
     $$
     C' = \left(C^{-1} + n/s^2\right)^{-1}, \qquad
     m' = C'\left(C^{-1}m + n/s^2\,\bar{y}\right).
     $$
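     The update $(m, C) \mapsto (m', C')$ is two lines of code. A sketch (my addition; the helper name is hypothetical):

         # Sketch: conjugate normal-mean update, returning the posterior mean
         # and variance (m', C') for prior N(m, C) and known data variance s2.
         normal_mean_update <- function(m, C, y, s2) {
           n  <- length(y)
           Cp <- 1 / (1/C + n/s2)            # C' = (1/C + n/s2)^(-1)
           mp <- Cp * (m/C + n*mean(y)/s2)   # m' = C'(m/C + n*ybar/s2)
           c(mp = mp, Cp = Cp)
         }
         normal_mean_update(m = 0, C = 1, y = c(1.27, 0.37, 1.87), s2 = 1)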

  6. Normal model, unknown mean: posterior comments

     Let $P = 1/C$, $P' = 1/C'$, and $Q = 1/s^2$ be the relevant precisions (inverse variances). Then:

     - The posterior precision is the sum of the prior and observation precisions:
       $$P' = P + \sum_{i=1}^n Q = P + nQ.$$
     - The posterior mean is a precision-weighted average of the prior and data:
       $$m' = \frac{1}{P'}\left[Pm + nQ\bar{y}\right] = \frac{P}{P'}m + \frac{nQ}{P'}\bar{y} = \frac{P}{P'}m + \sum_{i=1}^n \frac{Q}{P'}\,y_i.$$
     - The Jeffreys prior/posterior are the limits of the conjugate prior/posterior as $C \to \infty$ (illustrated numerically below):
       $$\lim_{C\to\infty} N(m, C) \;\to\; \propto 1, \qquad \lim_{C\to\infty} N(m', C') \;\overset{d}{\to}\; N(\bar{y}, s^2/n).$$
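     A numerical illustration of that limit (my addition): let the prior variance $C$ grow and watch $(m', C')$ approach $(\bar{y}, s^2/n)$.

         # Sketch: as prior variance C grows, the conjugate posterior (m', C')
         # approaches the Jeffreys-prior posterior N(ybar, s2/n).
         y <- c(1.27, 0.37, 1.87); s2 <- 1; n <- length(y)
         for (C in c(1, 100, 1e6)) {
           Cp <- 1 / (1/C + n/s2)
           mp <- Cp * (0/C + n*mean(y)/s2)
           cat(sprintf("C = %g: m' = %.4f, C' = %.4f\n", C, mp, Cp))
         }
         c(ybar = mean(y), s2_over_n = s2/n)   # the limiting values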

  7. Normal model, unknown mean: example

     Consider $Y_i \stackrel{ind}{\sim} N(\mu, 1)$ and $\mu \sim N(0, 1)$.

         # Prior
         m = 0
         C = 1; P = 1/C

         # Data
         mu = 1
         s2 = 1; Q = 1/s2
         n = 3
         set.seed(6); (y = rnorm(n, mu, sqrt(1/Q)))

         [1] 1.2696060 0.3700146 1.8686598

         # Posterior
         nQ = n*Q
         Pp = P + nQ
         mp = (P*m + nQ*mean(y))/Pp

  8. Normal model, unknown mean: normal prior

     [Figure: prior, likelihood, and posterior densities for the example above (x-axis $\mu$ from -2 to 4, y-axis density).]
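     A sketch that reproduces a figure like this one (my addition, recomputing the slide-7 quantities so it is self-contained):

         # Sketch: prior, normalized likelihood, and posterior densities for mu.
         m <- 0; C <- 1                          # prior N(m, C)
         y <- c(1.2696060, 0.3700146, 1.8686598)
         n <- length(y); s2 <- 1
         Cp <- 1 / (1/C + n/s2)                  # posterior variance C'
         mp <- Cp * (m/C + n*mean(y)/s2)         # posterior mean m'
         curve(dnorm(x, mp, sqrt(Cp)), -2, 4,
               xlab = expression(mu), ylab = "Density", lty = 1)    # posterior
         curve(dnorm(x, m, sqrt(C)), add = TRUE, lty = 2)           # prior
         curve(dnorm(x, mean(y), sqrt(s2/n)), add = TRUE, lty = 3)  # likelihood
         legend("topright", c("Posterior", "Prior", "Likelihood"), lty = 1:3)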

  9. Normal model, unknown variance: Jeffreys prior for $\sigma^2$

     Theorem. If $Y_i \stackrel{iid}{\sim} N(m, \sigma^2)$ ($m$ known), the Jeffreys prior for $\sigma^2$ is $p(\sigma^2) \propto 1/\sigma^2$.

     Proof. Since the normal distribution with unknown variance is an exponential family, use Casella & Berger Lemma 7.3.11:
     $$
     \begin{aligned}
     -E_y\!\left[\frac{\partial^2}{\partial(\sigma^2)^2}\log p(y|\sigma^2)\right]
       &= -E_y\!\left[\frac{\partial^2}{\partial(\sigma^2)^2}\left(-n\log(2\pi\sigma^2)/2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-m)^2\right)\right] \\
       &= -E_y\!\left[\frac{\partial}{\partial(\sigma^2)}\left(-\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^n (y_i-m)^2}{2(\sigma^2)^2}\right)\right] \\
       &= -E_y\!\left[\frac{n}{2(\sigma^2)^2} - \frac{\sum_{i=1}^n (y_i-m)^2}{(\sigma^2)^3}\right] \\
       &= -\frac{n}{2(\sigma^2)^2} + \frac{n\sigma^2}{(\sigma^2)^3} = \frac{n}{2}(\sigma^2)^{-2}.
     \end{aligned}
     $$
     So $p(\sigma^2) \propto \sqrt{|I(\sigma^2)|} \propto 1/\sigma^2$, i.e. the Jeffreys prior is $p(\sigma^2) \propto 1/\sigma^2$.
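     As with the mean, the information calculation can be sanity-checked by simulation. A sketch (my addition, with arbitrary made-up values):

         # Sketch: the Fisher information for sigma^2 is n/(2 sigma^4); average
         # the observed information over many simulated data sets to verify.
         set.seed(2)
         n <- 4; m <- 1; sigma2 <- 2
         obs_info <- replicate(1e4, {
           y <- rnorm(n, m, sqrt(sigma2))
           -n / (2 * sigma2^2) + sum((y - m)^2) / sigma2^3
         })
         c(monte_carlo = mean(obs_info), analytic = n / (2 * sigma2^2))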

  10. Normal model, unknown variance: posterior propriety

      Since $\int_0^\infty 1/\sigma^2 \, d\sigma^2$ is not finite, we need to check posterior propriety.

      Theorem. For $n > 0$ and at least one $y_i \ne m$, the posterior for a normal variance (known mean) using the Jeffreys prior is proper.

      Proof. The posterior is
      $$
      \begin{aligned}
      p(\sigma^2|y) &\propto p(y|\sigma^2)\,p(\sigma^2)
        = (2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right)(\sigma^2)^{-1} \\
        &\propto (\sigma^2)^{-n/2-1}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right).
      \end{aligned}
      $$
      This is the kernel of an inverse gamma distribution with shape $n/2$ and scale $\sum_{i=1}^n [y_i-m]^2/2$, which is proper so long as $n > 0$ and at least one $y_i \ne m$.

  11. Normal model, unknown variance: natural conjugate prior

      Let $Y_i \stackrel{iid}{\sim} N(m, \sigma^2)$ with $m$ known. The likelihood is
      $$
      L(\sigma^2) \propto (\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right).
      $$
      This is the kernel of an inverse gamma distribution, so the natural conjugate prior is $\sigma^2 \sim IG(a, b)$. Then
      $$
      \begin{aligned}
      p(\sigma^2|y) &\propto p(y|\sigma^2)\,p(\sigma^2)
        = (\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i-m]^2\right)(\sigma^2)^{-a-1}\exp(-b/\sigma^2) \\
        &= (\sigma^2)^{-(a+n/2)-1}\exp\!\left(-\frac{1}{\sigma^2}\left(b + \sum_{i=1}^n [y_i-m]^2/2\right)\right).
      \end{aligned}
      $$
      This is the kernel of an inverse gamma distribution with shape $a + n/2$ and scale $b + \sum_{i=1}^n [y_i-m]^2/2$.
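      The corresponding update for the variance is equally short. A sketch (my addition; the helper name is hypothetical):

          # Sketch: conjugate inverse-gamma update for a normal variance with
          # known mean m: prior IG(a, b) -> posterior IG(a + n/2, b + SS/2).
          normal_var_update <- function(a, b, y, m) {
            c(ap = a + length(y)/2, bp = b + sum((y - m)^2)/2)
          }
          normal_var_update(a = 1, b = 1,
                            y = c(1.2696060, 0.3700146, 1.8686598), m = 1)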

  12. Normal model, unknown variance: example

      Suppose $Y_i \stackrel{ind}{\sim} N(1, \sigma^2)$ and $\sigma^2 \sim IG(1, 1)$, reusing the data y simulated on slide 7.

          # Prior
          a = b = 1

          # Data
          m = 1
          n = length(y)
          y

          [1] 1.2696060 0.3700146 1.8686598

          # Posterior
          ap = a + n/2
          (bp = b + sum((y - m)^2)/2)

          [1] 1.612069
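      Base R has no inverse gamma sampler, but posterior draws are just reciprocals of gamma draws. A sketch (my addition):

          # Sketch: if sigma2 ~ IG(ap, bp) then 1/sigma2 ~ Gamma(ap, rate = bp),
          # so Monte Carlo posterior summaries come from 1/rgamma().
          set.seed(3)
          ap <- 2.5; bp <- 1.612069
          sigma2_draws <- 1 / rgamma(1e5, shape = ap, rate = bp)
          c(post_mean = mean(sigma2_draws),   # analytic: bp/(ap-1) = 1.075
            quantile(sigma2_draws, c(.025, .5, .975)))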

  13. Normal model, unknown variance: inverse gamma prior

      [Figure: prior, likelihood, and posterior densities for the example above (x-axis $\sigma^2$ from 0 to 3, y-axis density).]
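      A sketch reproducing a figure like this one (my addition; the inverse gamma density is coded by hand since base R lacks one, and the likelihood is normalized in $\sigma^2$ as $IG(n/2 - 1, SS/2)$):

          # Sketch: prior IG(1,1) and posterior IG(2.5, 1.612069) densities
          # for the slide-12 example, plus the normalized likelihood.
          dinvgamma <- function(x, a, b) b^a / gamma(a) * x^(-a - 1) * exp(-b / x)
          curve(dinvgamma(x, 2.5, 1.612069), 0.01, 3,
                xlab = expression(sigma^2), ylab = "Density", lty = 1)  # posterior
          curve(dinvgamma(x, 1, 1), add = TRUE, lty = 2)                # prior
          curve(dinvgamma(x, 0.5, 0.612069), add = TRUE, lty = 3)       # likelihood
          legend("topright", c("Posterior", "Prior", "Likelihood"), lty = 1:3)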

  14. Summary

      Suppose $Y_i \sim N(\mu, \sigma^2)$.

      $\mu$ unknown ($\sigma^2$ known):
      - Jeffreys prior: $p(\mu) \propto 1$ (think of this as $N(0, \infty)$)
      - Natural conjugate prior: $N(m, C)$
      - Posterior: $N(m', C')$ with $C' = [1/C + n\sigma^{-2}]^{-1}$ and $m' = C'[m/C + n\sigma^{-2}\bar{y}]$

      $\sigma^2$ unknown ($\mu$ known):
      - Jeffreys prior: $p(\sigma^2) \propto 1/\sigma^2$ (think of this as $IG(0, 0)$)
      - Natural conjugate prior: $IG(a, b)$
      - Posterior: $IG\!\left(a + n/2,\; b + \sum_{i=1}^n (y_i - \mu)^2/2\right)$

  15. JAGS

      Just Another Gibbs Sampler (JAGS) "is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS."

      We will use JAGS through its R interface rjags. The basic workflow when using rjags is:
      1. Define the model and priors in a string.
      2. Assign the data.
      3. Run JAGS, i.e. simulate from the posterior.
      4. Summarize as necessary, e.g. mean, median, credible intervals, etc.
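      The outline pairs JAGS with Stan. Purely as an illustration (my addition, not from the slides), the binomial example on the next slide could be written for the rstan package as:

          # Sketch: same Beta-Binomial model via rstan (assumes rstan installed).
          library(rstan)
          stan_code <- "
          data {
            int<lower=0> n;
            int<lower=0, upper=n> y;
            real<lower=0> a;
            real<lower=0> b;
          }
          parameters { real<lower=0, upper=1> theta; }
          model {
            theta ~ beta(a, b);
            y ~ binomial(n, theta);  // note: size first, unlike JAGS's dbin
          }
          "
          fit <- stan(model_code = stan_code,
                      data = list(n = 10, y = 3, a = 1, b = 1),
                      iter = 2000, chains = 1)
          print(fit)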

  16. JAGS: binomial model

      Let $Y \sim Bin(n, \theta)$ and $\theta \sim Be(1, 1)$, and we observe $y = 3$ successes out of $n = 10$ attempts.

          library(rjags)

          model = "
          model {
            y ~ dbin(theta, n)   # notice p then n (opposite of R's dbinom)
            theta ~ dbeta(a, b)
          }
          "
          dat = list(n = 10, y = 3, a = 1, b = 1)
          m = jags.model(textConnection(model), dat)

          Compiling model graph
             Resolving undeclared variables
             Allocating nodes
          Graph information:
             Observed stochastic nodes: 1
             Unobserved stochastic nodes: 1
             Total graph size: 5
          Initializing model

          r = coda.samples(m, "theta", n.iter = 1000)

  17. JAGS: binomial model

          summary(r)

          Iterations = 1001:2000
          Thinning interval = 1
          Number of chains = 1
          Sample size per chain = 1000

          1. Empirical mean and standard deviation for each variable,
             plus standard error of the mean:

                  Mean       SD Naive SE Time-series SE
              0.324050 0.122776 0.003883       0.004139

          2. Quantiles for each variable:

              2.5%    25%    50%    75%  97.5%
            0.1110 0.2337 0.3103 0.4091 0.5760
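      With a $Be(1,1)$ prior the posterior is available in closed form as $Be(1+y, 1+n-y) = Be(4, 8)$, so the MCMC output can be checked exactly. A sketch (my addition):

          # Sketch: exact conjugate answer for comparison with the JAGS output.
          a <- 1; b <- 1; y <- 3; n <- 10
          ap <- a + y; bp <- b + n - y                  # Be(4, 8)
          c(mean = ap / (ap + bp),                      # 0.333 vs MCMC 0.324
            qbeta(c(.025, .25, .5, .75, .975), ap, bp)) # vs MCMC quantiles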
