Introduction to Bayesian Statistics
Lecture 3: Single Parameter (II)
Rung-Ching Tsai
Department of Mathematics National Taiwan Normal University
March 11, 2015
Conjugate Prior Distributions
Definition of Conjugacy: If F is a class of sampling distributions p(y|θ), and P is a class of prior distributions for θ, then the class P is conjugate for F if p(θ|y) ∈ P for all p(·|θ) ∈ F and p(θ) ∈ P.
Example: if y|θ ∼ binomial(n, θ) and θ ∼ Beta(α, β), then θ|y ∼ Beta(α + y, β + n − y).
Definition: The class F is an exponential family if all its members have the form $p(y_i|\theta) = f(y_i)\, g(\theta)\, e^{\phi(\theta)^T u(y_i)}$, where $\phi(\theta)$ is the “natural parameter” of the family F.
Exercise: Show that binomial(n, θ) is an exponential family with natural parameter logit(θ), and that the conjugate priors on θ are Beta distributions.
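The factorization asked for in the exercise can be checked numerically. The sketch below is only an illustration, not a proof: it assumes the split f(y) = C(n, y), g(θ) = (1 − θ)^n, φ(θ) = logit(θ), u(y) = y, and the function names are hypothetical.

```python
import math

def binomial_pmf(y, n, theta):
    """Binomial(n, theta) probability mass at y, written the usual way."""
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)

def binomial_expfam(y, n, theta):
    """The same pmf written as f(y) * g(theta) * exp(phi(theta) * u(y))."""
    f = math.comb(n, y)                  # f(y): depends on the data only
    g = (1 - theta) ** n                 # g(theta): depends on theta only
    phi = math.log(theta / (1 - theta))  # natural parameter logit(theta)
    u = y                                # u(y): the statistic multiplying phi
    return f * g * math.exp(phi * u)

# Illustrative values of n and theta (assumptions, not from the lecture).
n, theta = 10, 0.3
max_gap = max(
    abs(binomial_pmf(y, n, theta) - binomial_expfam(y, n, theta))
    for y in range(n + 1)
)
print(max_gap)  # should be at floating-point rounding level
```

The two functions agree for every y, which is what the exponential-family form with natural parameter logit(θ) requires.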
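For iid observations $y = (y_1, \dots, y_n)$ from an exponential family, the likelihood is
$p(y|\theta) = \left[\prod_{i=1}^{n} f(y_i)\right] g(\theta)^{n} \exp\left(\phi(\theta)^T \sum_{i=1}^{n} u(y_i)\right) \propto g(\theta)^{n} \exp\left(\phi(\theta)^T t(y)\right)$,
where $t(y) = \sum_{i=1}^{n} u(y_i)$ is a sufficient statistic for θ.
Prior: $p(\theta) \propto g(\theta)^{\eta} \exp\left(\phi(\theta)^T \nu\right)$
Posterior: $p(\theta|y) \propto g(\theta)^{\eta+n} \exp\left(\phi(\theta)^T (\nu + t(y))\right)$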
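As a concrete instance of this conjugate update, the worked equations below take iid Bernoulli(θ) observations. The symbol ν for the hyperparameter that multiplies φ(θ) in the prior is an assumed notation used here for illustration; η plays the role of a prior sample size.

```latex
\begin{align*}
p(y_i|\theta) &= \theta^{y_i}(1-\theta)^{1-y_i}
               = (1-\theta)\exp\big(y_i\,\mathrm{logit}(\theta)\big), \\
&\text{so } f(y_i)=1,\quad g(\theta)=1-\theta,\quad \phi(\theta)=\mathrm{logit}(\theta),
  \quad u(y_i)=y_i,\quad t(y)=\textstyle\sum_{i=1}^{n} y_i. \\
p(\theta) &\propto (1-\theta)^{\eta}\exp\big(\nu\,\mathrm{logit}(\theta)\big)
           = \theta^{\nu}(1-\theta)^{\eta-\nu}
  \;\Rightarrow\; \theta \sim \mathrm{Beta}(\nu+1,\ \eta-\nu+1), \\
p(\theta|y) &\propto (1-\theta)^{\eta+n}\exp\big((\nu+t(y))\,\mathrm{logit}(\theta)\big)
             = \theta^{\nu+t(y)}(1-\theta)^{\eta+n-\nu-t(y)} \\
  &\;\Rightarrow\; \theta|y \sim \mathrm{Beta}\big(\nu+t(y)+1,\ \eta+n-\nu-t(y)+1\big).
\end{align*}
```

Writing α = ν + 1 and β = η − ν + 1, the posterior is Beta(α + t(y), β + n − t(y)), the same form as the Beta update for binomial data.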
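Single observation: y ∼ normal(θ, σ²), σ² known; use a Bayesian approach to estimate θ.
Prior: $p(\theta) \propto \exp\left(-\frac{1}{2\tau_0^2}(\theta - \mu_0)^2\right)$, i.e., $\theta \sim \text{normal}(\mu_0, \tau_0^2)$
Likelihood: $p(y|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(y - \theta)^2\right)$
Posterior: $p(\theta|y) \propto p(\theta)\,p(y|\theta) \propto \exp\left(-\frac{1}{2}\left[\frac{(y - \theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau_0^2}\right]\right) \propto \exp\left(-\frac{1}{2\tau_1^2}(\theta - \mu_1)^2\right)$
that is, $\theta|y \sim \text{normal}(\mu_1, \tau_1^2)$, where
$\mu_1 = \dfrac{\frac{1}{\tau_0^2}\mu_0 + \frac{1}{\sigma^2}y}{\frac{1}{\tau_0^2} + \frac{1}{\sigma^2}}$ and $\dfrac{1}{\tau_1^2} = \dfrac{1}{\tau_0^2} + \dfrac{1}{\sigma^2}$.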
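The last proportionality in the posterior comes from completing the square in θ. A short sketch of that step, with every term free of θ absorbed into a constant:

```latex
\begin{align*}
\frac{(y-\theta)^2}{\sigma^2} + \frac{(\theta-\mu_0)^2}{\tau_0^2}
 &= \left(\frac{1}{\tau_0^2} + \frac{1}{\sigma^2}\right)\theta^2
    - 2\left(\frac{\mu_0}{\tau_0^2} + \frac{y}{\sigma^2}\right)\theta + \text{const} \\
 &= \frac{1}{\tau_1^2}\,(\theta - \mu_1)^2 + \text{const},
\end{align*}
\qquad\text{with}\quad
\frac{1}{\tau_1^2} = \frac{1}{\tau_0^2} + \frac{1}{\sigma^2},
\quad
\mu_1 = \tau_1^2\left(\frac{\mu_0}{\tau_0^2} + \frac{y}{\sigma^2}\right).
```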
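$\theta \sim \text{normal}(\mu_0, \tau_0^2)$, $y \sim \text{normal}(\theta, \sigma^2)$ ⇒ $\theta|y \sim \text{normal}(\mu_1, \tau_1^2)$
Posterior precision: $\dfrac{1}{\tau_1^2} = \dfrac{1}{\tau_0^2} + \dfrac{1}{\sigma^2}$, i.e., the posterior precision equals the prior precision plus the data precision.
Posterior mean: $\mu_1 = \dfrac{\frac{1}{\tau_0^2}\mu_0 + \frac{1}{\sigma^2}y}{\frac{1}{\tau_0^2} + \frac{1}{\sigma^2}}$, i.e., the posterior mean is a weighted average of the prior mean and the observed value y, with weights proportional to the precisions.
Prior mean adjusted toward the observed y: $\mu_1 = \mu_0 + (y - \mu_0)\dfrac{\tau_0^2}{\sigma^2 + \tau_0^2}$.
Data shrunk toward the prior mean: $\mu_1 = y - (y - \mu_0)\dfrac{\sigma^2}{\sigma^2 + \tau_0^2}$.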
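The precision-weighted update and its two shrinkage forms can be sketched in a few lines. The function name and the numbers below are illustrative assumptions, not part of the lecture.

```python
def normal_posterior_single(y, sigma2, mu0, tau0_2):
    """Posterior mean and variance of theta for one y ~ normal(theta, sigma2)
    with prior theta ~ normal(mu0, tau0_2) and sigma2 known."""
    prior_precision = 1.0 / tau0_2
    data_precision = 1.0 / sigma2
    post_precision = prior_precision + data_precision  # precisions add
    mu1 = (prior_precision * mu0 + data_precision * y) / post_precision
    return mu1, 1.0 / post_precision

# Illustrative numbers only (assumed for this sketch).
y, sigma2, mu0, tau0_2 = 12.0, 4.0, 10.0, 1.0
mu1, tau1_2 = normal_posterior_single(y, sigma2, mu0, tau0_2)

# Two equivalent ways of writing the same posterior mean.
mu1_from_prior = mu0 + (y - mu0) * tau0_2 / (sigma2 + tau0_2)  # prior mean moved toward y
mu1_from_data = y - (y - mu0) * sigma2 / (sigma2 + tau0_2)     # y shrunk toward prior mean

print(mu1, mu1_from_prior, mu1_from_data, tau1_2)
```

Because the prior here is more precise than the single observation, the posterior mean stays closer to the prior mean than to y.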
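Posterior predictive distribution $p(\tilde{y}|y)$
$p(\tilde{y}|y) = \int p(\tilde{y}|\theta)\,p(\theta|y)\,d\theta \propto \int \exp\left(-\frac{1}{2\sigma^2}(\tilde{y} - \theta)^2\right) \exp\left(-\frac{1}{2\tau_1^2}(\theta - \mu_1)^2\right) d\theta$
$\tilde{y}|y \sim \text{normal}(?, ?)$
Using $E(\tilde{y}|\theta) = \theta$ and $\text{var}(\tilde{y}|\theta) = \sigma^2$:
$E(\tilde{y}|y) = E(E(\tilde{y}|\theta, y)|y) = E(\theta|y) = \mu_1$
$\text{var}(\tilde{y}|y) = E(\text{var}(\tilde{y}|\theta, y)|y) + \text{var}(E(\tilde{y}|\theta, y)|y) = E(\sigma^2|y) + \text{var}(\theta|y) = \sigma^2 + \tau_1^2$
Hence $\tilde{y}|y \sim \text{normal}(\mu_1, \sigma^2 + \tau_1^2)$.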
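The mean and variance of the predictive distribution can also be checked by simulation: draw θ from its posterior, then draw a new observation given θ. This is a minimal sketch with assumed illustrative values for µ1, τ1² and σ²; the function name is hypothetical.

```python
import random

def simulate_predictive(mu1, tau1_2, sigma2, draws, seed=0):
    """Draw from p(y_new | y) by composition: theta | y first, then y_new | theta."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        theta = rng.gauss(mu1, tau1_2 ** 0.5)        # theta ~ normal(mu1, tau1_2)
        out.append(rng.gauss(theta, sigma2 ** 0.5))  # y_new ~ normal(theta, sigma2)
    return out

# Illustrative values (assumed, not from the lecture).
mu1, tau1_2, sigma2 = 10.4, 0.8, 4.0
sample = simulate_predictive(mu1, tau1_2, sigma2, draws=200_000)

sim_mean = sum(sample) / len(sample)
sim_var = sum((x - sim_mean) ** 2 for x in sample) / len(sample)
print(sim_mean, sim_var)  # compare with mu1 and sigma2 + tau1_2
```

The simulated variance should be close to σ² + τ1², reflecting both sampling noise in the new observation and remaining uncertainty about θ.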
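Multiple observations: $y_1, \dots, y_n \overset{iid}{\sim} \text{normal}(\theta, \sigma^2)$, σ² known; use a Bayesian approach to estimate θ.
Prior: $p(\theta) \propto \exp\left(-\frac{1}{2\tau_0^2}(\theta - \mu_0)^2\right)$
Likelihood: $p(y|\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(y_i - \theta)^2\right)$
Posterior: $p(\theta|y) \propto p(\theta)\,p(y|\theta) \propto \exp\left(-\frac{1}{2}\left[\frac{\sum_{i=1}^{n}(y_i - \theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau_0^2}\right]\right) \propto \exp\left(-\frac{1}{2\tau_n^2}(\theta - \mu_n)^2\right)$
that is, $\theta|y \sim \text{normal}(\mu_n, \tau_n^2)$, where
$\mu_n = \dfrac{\frac{1}{\tau_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}$ and $\dfrac{1}{\tau_n^2} = \dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}$.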
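The data enter the posterior only through the sample mean. A short sketch of the step that shows this, splitting the sum of squares around ȳ:

```latex
\begin{align*}
\sum_{i=1}^{n}(y_i-\theta)^2
 &= \sum_{i=1}^{n}(y_i-\bar{y})^2 + n(\bar{y}-\theta)^2
 \qquad\text{(the cross term vanishes because } \textstyle\sum_i (y_i-\bar{y}) = 0\text{)}, \\
p(\theta|y) &\propto
 \exp\left(-\frac{1}{2}\left[\frac{n(\bar{y}-\theta)^2}{\sigma^2}
 + \frac{(\theta-\mu_0)^2}{\tau_0^2}\right]\right).
\end{align*}
```

This is the single-observation problem with y replaced by ȳ and σ² replaced by σ²/n, which gives the stated formulas for the posterior mean and precision.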
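$y_1, \dots, y_n \overset{iid}{\sim} \text{normal}(\theta, \sigma^2)$, σ² known, $\theta \sim \text{normal}(\mu_0, \tau_0^2)$ ⇒ $\theta|y \sim \text{normal}(\mu_n, \tau_n^2)$
Posterior precision: $\dfrac{1}{\tau_n^2} = \dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}$; posterior mean: $\mu_n = \dfrac{\frac{1}{\tau_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}$
If n is large, the posterior distribution is largely determined by σ² and the sample value $\bar{y}$.
As $\tau_0 \to \infty$ with n fixed, or as $n \to \infty$ with $\tau_0^2$ fixed, we have $p(\theta|y) \approx \text{normal}(\theta|\bar{y}, \sigma^2/n)$.
Compare with classical statistics, where the sampling distribution $\bar{y}|\theta, \sigma^2 \sim \text{normal}(\theta, \sigma^2/n)$ leads to the use of $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$ as a 95% confidence interval for θ.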
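The limiting behaviour can be seen numerically: as the prior variance grows, the central 95% posterior interval approaches ȳ ± 1.96 σ/√n. The sketch below uses assumed illustrative numbers and hypothetical function names.

```python
import math

def normal_posterior(ybar, n, sigma2, mu0, tau0_2):
    """Posterior mean and variance of theta given n iid normal(theta, sigma2) draws."""
    post_precision = 1.0 / tau0_2 + n / sigma2
    mu_n = (mu0 / tau0_2 + n * ybar / sigma2) / post_precision
    return mu_n, 1.0 / post_precision

def central_95(mean, var):
    """Central 95% interval of a normal(mean, var) distribution."""
    half_width = 1.96 * math.sqrt(var)
    return mean - half_width, mean + half_width

# Illustrative data summary and prior mean (assumed for this sketch).
ybar, n, sigma2, mu0 = 5.0, 25, 4.0, 0.0
classical = central_95(ybar, sigma2 / n)  # ybar +/- 1.96 * sigma / sqrt(n)

intervals = {}
for tau0_2 in (1.0, 100.0, 1e6):  # increasingly diffuse priors
    mu_n, tau_n2 = normal_posterior(ybar, n, sigma2, mu0, tau0_2)
    intervals[tau0_2] = central_95(mu_n, tau_n2)
    print(tau0_2, intervals[tau0_2])
print("classical", classical)
```

The endpoints coincide numerically in the diffuse-prior limit, although the posterior interval and the confidence interval are interpreted differently.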
Exercise: A random sample of n students is drawn from a large population, and their weights are measured. The average weight of the n sampled students is ȳ = 150 pounds. Assume the weights in the population are normally distributed with unknown mean θ and known standard deviation 20 pounds. Suppose your prior distribution for θ is normal with mean 180 and standard deviation 40.
(a) Give your posterior distribution for θ.
(b) A new student is sampled at random from the same population and has a weight of ỹ pounds. Give a posterior predictive distribution for ỹ.
(c) For n = 10, give a 95% posterior interval for θ and a 95% posterior predictive interval for ỹ.
(d) Do the same for n = 100.
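One way to work parts (c) and (d) numerically is sketched below, plugging the exercise's numbers into the normal-normal formulas. The function names are hypothetical, and 1.96 is used as the normal 97.5% quantile.

```python
import math

def posterior_and_predictive(ybar, n, sigma, mu0, tau0):
    """Return (posterior mean, posterior variance, predictive variance)
    for the normal model with known sigma and a normal(mu0, tau0^2) prior."""
    post_precision = 1.0 / tau0**2 + n / sigma**2
    mu_n = (mu0 / tau0**2 + n * ybar / sigma**2) / post_precision
    tau_n2 = 1.0 / post_precision
    return mu_n, tau_n2, sigma**2 + tau_n2  # predictive adds sampling variance

def interval_95(mean, var):
    half_width = 1.96 * math.sqrt(var)
    return (mean - half_width, mean + half_width)

results = {}
for n in (10, 100):
    mu_n, tau_n2, pred_var = posterior_and_predictive(
        ybar=150.0, n=n, sigma=20.0, mu0=180.0, tau0=40.0
    )
    results[n] = {
        "theta": interval_95(mu_n, tau_n2),    # 95% posterior interval for theta
        "y_new": interval_95(mu_n, pred_var),  # 95% predictive interval for a new weight
    }
    print(n, results[n])
```

Going from n = 10 to n = 100 narrows the interval for θ considerably, while the predictive interval stays wide because its variance is dominated by σ² = 400.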
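Normal data with known mean and unknown variance: $y_1, \dots, y_n \overset{iid}{\sim} \text{normal}(\theta, \sigma^2)$, θ known; use a Bayesian approach to estimate σ².
Likelihood: $p(y|\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(y_i - \theta)^2\right) \propto \sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \theta)^2\right) = (\sigma^2)^{-n/2} \exp\left(-\frac{n}{2\sigma^2}v\right)$, where $v = \frac{1}{n}\sum_{i=1}^{n}(y_i - \theta)^2$
Conjugate prior: $p(\sigma^2) \propto (\sigma^2)^{-(\alpha+1)} e^{-\beta/\sigma^2}$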
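$y_1, \dots, y_n \overset{iid}{\sim} \text{normal}(\theta, \sigma^2)$, θ known; estimate σ².
Likelihood: $p(y|\sigma^2) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{n}{2\sigma^2}v\right)$
Conjugate prior: $p(\sigma^2) \propto (\sigma^2)^{-(\alpha+1)} e^{-\beta/\sigma^2}$, i.e., $\sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$, with $\alpha = \nu_0/2$ and $\beta = \nu_0\sigma_0^2/2$
Scaled inverse-$\chi^2$ distribution with scale $\sigma_0^2$ and $\nu_0$ degrees of freedom: $\dfrac{\sigma_0^2\nu_0}{X} \sim \chi^2_{\nu_0}$, i.e., $X \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$
Posterior: $p(\sigma^2|y) \propto p(\sigma^2)\,p(y|\sigma^2) \propto \left(\dfrac{\sigma_0^2}{\sigma^2}\right)^{\nu_0/2+1} \exp\left(-\dfrac{\sigma_0^2\nu_0}{2\sigma^2}\right) \cdot (\sigma^2)^{-n/2} \exp\left(-\dfrac{n}{2}\dfrac{v}{\sigma^2}\right) \propto (\sigma^2)^{-((n+\nu_0)/2+1)} \exp\left(-\dfrac{1}{2\sigma^2}(\nu_0\sigma_0^2 + nv)\right)$
that is, $\sigma^2|y \sim \text{Inv-}\chi^2\left(\nu_0 + n,\ \dfrac{\nu_0\sigma_0^2 + nv}{\nu_0 + n}\right)$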
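The update and the definition of the scaled inverse-χ² distribution translate directly into a sampler: draw X from a χ² distribution with ν degrees of freedom and return ν s²/X. This is a minimal sketch; the data, prior settings and function names are made up for illustration.

```python
import random

def inv_chi2_posterior(y, theta, nu0, sigma0_2):
    """Posterior (degrees of freedom, scale) for sigma^2 when theta is known."""
    n = len(y)
    v = sum((yi - theta) ** 2 for yi in y) / n  # mean squared deviation from theta
    nu_n = nu0 + n
    sigma_n2 = (nu0 * sigma0_2 + n * v) / nu_n
    return nu_n, sigma_n2

def draw_scaled_inv_chi2(nu, s2, rng):
    """One draw from Inv-chi^2(nu, s2) via nu * s2 / X with X ~ chi^2_nu."""
    x = rng.gammavariate(nu / 2.0, 2.0)  # chi^2_nu is Gamma(shape=nu/2, scale=2)
    return nu * s2 / x

# Made-up data and prior settings (assumptions for this sketch).
y = [9.1, 10.4, 11.2, 8.7, 10.9]
theta = 10.0
nu_n, sigma_n2 = inv_chi2_posterior(y, theta, nu0=2.0, sigma0_2=1.0)

rng = random.Random(1)
draws = [draw_scaled_inv_chi2(nu_n, sigma_n2, rng) for _ in range(50_000)]
sim_mean = sum(draws) / len(draws)
print(nu_n, sigma_n2, sim_mean)  # theoretical mean is nu_n * sigma_n2 / (nu_n - 2)
```

The posterior scale is a weighted average of the prior scale and v, with weights ν0 and n, so ν0 can be read as a prior sample size.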
Year   Fatal accidents   Passenger deaths   Death rate
1976   24                734                0.19
1977   25                516                0.12
1978   31                754                0.15
1979   31                877                0.16
1980   22                814                0.14
1981   21                362                0.06
1982   26                764                0.13
1983   20                809                0.13
1984   16                223                0.03
1985   22                1066               0.15
(a) Assume that the numbers of fatal accidents in each year are independent with a Poisson(θ) distribution. Set a prior distribution for θ and determine the posterior distribution based on the data from 1976 through 1985. Under this model, give a 95% predictive interval for the number of fatal accidents in 1986. You can use a normal approximation to the gamma and Poisson distributions, or compute the interval by simulation.
(b) Repeat (a), replacing ‘fatal accidents’ with ‘passenger deaths’.
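A simulation approach to part (a) might look like the sketch below. It assumes a Gamma(α0, β0) prior with α0 = β0 = 0, the noninformative limit, which is only one possible choice since the exercise leaves the prior open. The function names are hypothetical, and the Poisson sampler is a simple textbook method that is adequate for rates near 25.

```python
import math
import random

# Fatal accidents per year, 1976-1985, from the table.
accidents = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]

def gamma_posterior(counts, alpha0, beta0):
    """Gamma(alpha0, beta0) prior + iid Poisson counts -> Gamma posterior (shape, rate)."""
    return alpha0 + sum(counts), beta0 + len(counts)

def draw_poisson(lam, rng):
    """Knuth's multiplication method; fine for moderate lam, slow for large lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def predictive_interval(alpha, beta, draws=20_000, seed=0):
    """95% predictive interval for next year's count, by simulation."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        theta = rng.gammavariate(alpha, 1.0 / beta)  # gammavariate takes (shape, scale)
        sims.append(draw_poisson(theta, rng))
    sims.sort()
    return sims[int(0.025 * draws)], sims[int(0.975 * draws) - 1]

# Assumed prior: alpha0 = beta0 = 0 (improper, noninformative limit).
alpha_n, beta_n = gamma_posterior(accidents, alpha0=0.0, beta0=0.0)
lo, hi = predictive_interval(alpha_n, beta_n)
print(alpha_n, beta_n, lo, hi)
```

With this prior the posterior is Gamma(238, 10), so the posterior mean of θ is 23.8 accidents per year. A different prior would shift the interval slightly.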
(a) Suppose y|θ is exponentially distributed with rate θ, and the marginal (prior) distribution of θ is Gamma(α, β). Suppose we observe that y ≥ 100, but do not observe the exact value of y. What is the posterior distribution, p(θ|y ≥ 100), as a function of α and β? Write down the posterior mean and variance of θ. (b) In the above problem, suppose that we are now told that y is exactly