Introduction to Bayesian Statistics
Lecture 9: Hierarchical Models
Rung-Ching Tsai
Department of Mathematics National Taiwan Normal University
May 6, 2015
Example Data: weekly weights of 30 young rats (Gelfand, Hills, Racine-Poon, & Smith, 1990).
Day      8    15    22    29    36
Rat 1   151   199   246   283   320
Rat 2   145   199   249   293   354
· · ·
Rat 30  153   200   244   286   324

Model: Y_ij = α + β x_j + ε_ij, where Y_ij is the weight of rat i on day x_j and ε_ij ∼ Normal(0, σ²).
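As a quick check of the within-rat linear model, the line for a single rat can be fit by ordinary least squares; a minimal sketch in Python, using the Rat 1 row of the table above:

```python
import numpy as np

# Ordinary least-squares fit of Y_1j = alpha + beta * x_j for Rat 1,
# using the days and weights from the table above.
days = np.array([8.0, 15.0, 22.0, 29.0, 36.0])
weights = np.array([151.0, 199.0, 246.0, 283.0, 320.0])

beta, alpha = np.polyfit(days, weights, deg=1)  # returns (slope, intercept)
print(f"alpha = {alpha:.1f}, beta = {beta:.2f}")  # alpha ≈ 107.2, beta ≈ 6.03
```

The hierarchical model developed in the later slides shares information across the 30 such per-rat lines instead of estimating each one separately.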
Example Data: numbers of failures and lengths of operation time for 10 power plant pumps (George, Makov, & Smith, 1993).

Pump       1     2     3     4     5     6     7     8     9     10
time      94.5  15.7  62.9  126   5.24  31.4  1.05  1.05  2.1   10.5
failures   5     1     5     14    3     19    1     1     4     22

Model: X_i ∼ Poisson(λ t_i), where X_i is the number of power failures of pump i, λ is the failure rate, and t_i is the length of operation time of pump i (in 1000s of hours).

Question: is it reasonable to assume a common failure rate λ for all the pumps in this model?
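To see why a single failure rate is questionable, compare the pooled maximum-likelihood estimate of λ with per-pump estimates; a minimal sketch, using only the data above:

```python
import numpy as np

# Pooled vs. per-pump failure-rate estimates for the pump data above
# (operation times in 1000s of hours).
t = np.array([94.5, 15.7, 62.9, 126.0, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5])
x = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])

pooled = x.sum() / t.sum()   # MLE of lambda under a common rate
per_pump = x / t             # separate MLE for each pump
print(f"pooled rate: {pooled:.3f}")
print("per-pump rates:", np.round(per_pump, 2))
```

The per-pump estimates range from roughly 0.05 to over 2 failures per 1000 hours, which motivates the doubt about a common λ raised on the next slide.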
Assuming a common failure rate for all the power plant pumps may not be suitable. On the other hand, fitting a separate, unrelated model to each rat or each pump is likely to overfit the data. Some information about the parameters of one rat or one pump can be obtained from the data on the others.
A natural compromise is to assume that the unit-specific parameters are samples from a common population distribution. A model in which the distribution of the observed outcomes is conditional on parameters which themselves have a probability specification is known as a hierarchical or multilevel model. The parameters of the population distribution are called hyperparameters. In the rat example, we estimate the population distribution of the (α_i, β_i) rather than each (α_i, β_i) separately.
The joint prior factors as p(θ, φ) = p(φ) p(θ|φ), where φ denotes the hyperparameter. The hyperprior distribution p(φ) at the highest level is often chosen to be non-informative. The joint posterior of the parameter θ and the hyperparameter φ is

p(θ, φ|y) ∝ p(θ, φ) p(y|θ, φ) = p(θ, φ) p(y|θ) ∝ p(φ) p(θ|φ) p(y|θ),

since the data distribution p(y|θ, φ) = p(y|θ) depends on φ only through θ.
Computation with a hierarchical model proceeds in three steps:
1. Write the joint posterior density: p(θ, φ|y) ∝ p(φ) p(θ|φ) p(y|θ).
2. Determine the conditional posterior density of θ given the hyperparameter φ: p(θ|φ, y).
3. Obtain the marginal posterior distribution of φ, either by integration, p(φ|y) = ∫ p(θ, φ|y) dθ, or by the algebraic identity p(φ|y) = p(θ, φ|y) / p(θ|φ, y).
To simulate from the joint posterior distribution of θ and φ, p(θ, φ|y): first draw φ from its marginal posterior p(φ|y), then draw θ from its conditional posterior p(θ|φ, y). If needed, draw predictive data ỹ from the posterior predictive distribution given the drawn θ.
Example: tumor incidence in groups of rats. Observed tumor counts out of group sizes, y_j/n_j:

0/20  0/20  0/20  0/20  0/20  0/20  0/20  0/19  0/19  0/19  0/19  0/18  0/18  0/17
1/20  1/20  1/20  1/20  1/19  1/19  1/18  1/18  2/25  2/24  2/23  2/20  2/20  2/20
2/20  2/20  2/20  1/10  5/49  2/19  5/46  3/27  2/17  7/49  7/47  3/20  3/20  2/13
9/48  10/50 4/20  4/20  4/20  4/20  4/20  4/20  4/20  10/48 4/19  4/19  4/19  5/22
11/46 12/49 5/20  5/20  6/23  5/19  6/22  6/20  6/20  6/20  16/52 15/47 15/46 9/24
Model: y_j|θ_j ∼ Binomial(n_j, θ_j), with the tumor probabilities θ_j drawn from a common Beta(α, β) population distribution; α and β are the hyperparameters. The joint posterior distribution of θ = (θ_1, …, θ_J) and the hyperparameters α and β is

p(θ, α, β|y) ∝ p(α, β) p(θ|α, β) p(y|θ, α, β)
             ∝ p(α, β) ∏_{j=1}^J [Γ(α+β) / (Γ(α)Γ(β))] θ_j^{α−1} (1−θ_j)^{β−1} ∏_{j=1}^J θ_j^{y_j} (1−θ_j)^{n_j−y_j}.
From the joint posterior

p(θ, α, β|y) ∝ p(α, β) ∏_{j=1}^J [Γ(α+β) / (Γ(α)Γ(β))] θ_j^{α−1} (1−θ_j)^{β−1} ∏_{j=1}^J θ_j^{y_j} (1−θ_j)^{n_j−y_j},

the conditional posterior of θ given the hyperparameters is a product of independent Beta densities,

p(θ|α, β, y) = ∏_{j=1}^J [Γ(α+β+n_j) / (Γ(α+y_j)Γ(β+n_j−y_j))] θ_j^{α+y_j−1} (1−θ_j)^{β+n_j−y_j−1},

and the marginal posterior of the hyperparameters is

p(α, β|y) = p(θ, α, β|y) / p(θ|α, β, y) ∝ p(α, β) ∏_{j=1}^J [Γ(α+β) / (Γ(α)Γ(β))] [Γ(α+y_j)Γ(β+n_j−y_j) / Γ(α+β+n_j)].
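The marginal posterior above is easy to evaluate numerically on the log scale; a minimal sketch in Python using the standard library's `lgamma`, with the (α+β)^{−5/2} hyperprior discussed on the next slide and a small illustrative subset of the tumor data (not the full data set):

```python
from math import lgamma, log

# Unnormalized log marginal posterior log p(alpha, beta | y) for the
# beta-binomial hierarchical model, with hyperprior
# p(alpha, beta) proportional to (alpha + beta)^(-5/2).
def log_marginal_post(alpha, beta, y, n):
    lp = -2.5 * log(alpha + beta)  # log hyperprior, up to a constant
    for yj, nj in zip(y, n):
        lp += (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
               + lgamma(alpha + yj) + lgamma(beta + nj - yj)
               - lgamma(alpha + beta + nj))
    return lp

# Illustrative subset of the y_j / n_j values from the table above.
y = [0, 0, 1, 1, 2, 2, 5, 9]
n = [20, 19, 20, 19, 25, 20, 49, 48]
print(log_marginal_post(2.0, 14.0, y, n))
```

Working on the log scale avoids the overflow that the ratios of Gamma functions would cause for large n_j.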
Choice of hyperprior for (α, β):
- A uniform prior on (log(α/β), log(α+β)): NO GOOD, because it leads to an improper posterior.
- A uniform prior on (α/(α+β), α+β): NO GOOD, because the posterior density is not integrable in the limit.
- A uniform prior on (α/(α+β), (α+β)^{−1/2}) ⇔ p(α, β) ∝ (α+β)^{−5/2} ⇔ p(log(α/β), log(α+β)) ∝ αβ(α+β)^{−5/2}: OK, because it leads to a proper posterior.
The marginal posterior of the hyperparameters is computed on the transformed scale (log(α/β), log(α+β)): evaluate the unnormalized density over a grid of values, then normalize by summing the function over the grid and setting the total probability in the grid to 1. Posterior summaries of (α, β) follow from the grid approximation of p(log(α/β), log(α+β)|y). For example, E(α|y) is estimated by

E(α|y) ≈ Σ_{log(α/β), log(α+β)} α p(log(α/β), log(α+β)|y).
To simulate draws of (θ, α, β):
1. Draw (log(α/β), log(α+β)) from their posterior distribution using the discrete-grid sampling procedure.
2. Transform the draw of (log(α/β), log(α+β)) back to the scale of (α, β) to yield a draw of the hyperparameters from their marginal posterior distribution.
3. Draw each θ_j from its conditional posterior distribution, θ_j|α, β, y ∼ Beta(α + y_j, β + n_j − y_j).
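These three steps can be sketched in Python. The grid limits and the data subset below are illustrative assumptions, not the values of the original analysis; the hyperprior is the p(α, β) ∝ (α+β)^{−5/2} choice above, with the αβ Jacobian for working on the (log(α/β), log(α+β)) scale.

```python
import numpy as np
from math import lgamma

# Discrete-grid sampling for the beta-binomial hierarchical model,
# on a small illustrative subset of the tumor data.
y = np.array([0, 0, 1, 1, 2, 2, 5, 9])
n = np.array([20, 19, 20, 19, 25, 20, 49, 48])

def log_post(a, b):
    # log p(alpha, beta | y) with p(alpha, beta) ∝ (alpha+beta)^(-5/2),
    # plus the Jacobian alpha*beta for the (log(a/b), log(a+b)) scale.
    lp = -2.5 * np.log(a + b) + np.log(a) + np.log(b)
    for yj, nj in zip(y, n):
        lp += (lgamma(a + b) - lgamma(a) - lgamma(b)
               + lgamma(a + yj) + lgamma(b + nj - yj) - lgamma(a + b + nj))
    return lp

rng = np.random.default_rng(0)
u = np.linspace(-2.5, -0.5, 60)   # grid for log(alpha/beta)  (assumed limits)
v = np.linspace(1.0, 5.0, 60)     # grid for log(alpha+beta)  (assumed limits)
U, V = np.meshgrid(u, v, indexing="ij")
A = np.exp(V) / (1.0 + np.exp(-U))   # alpha = (alpha+beta) * logistic(u)
B = np.exp(V) - A                    # beta

logp = np.vectorize(log_post)(A, B)
p = np.exp(logp - logp.max())
p /= p.sum()                         # total grid probability set to 1

# Steps 1-2: draw a grid cell, giving (alpha, beta) from the marginal posterior
idx = rng.choice(p.size, p=p.ravel())
alpha, beta = A.ravel()[idx], B.ravel()[idx]

# Step 3: draw each theta_j from Beta(alpha + y_j, beta + n_j - y_j)
theta = rng.beta(alpha + y, beta + n - y)
print(alpha, beta, theta.round(3))
```

Repeating the three sampling lines gives as many posterior draws as desired.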
The posterior means of the θ_j are shrunk from the sample proportions y_j/n_j towards the population distribution, whose mean is α/(α+β); shrinkage is stronger for groups with smaller n_j. The analysis also yields posterior variances, and averaging over the posterior draws of (α, β) gives a fully Bayesian analysis, reflecting posterior uncertainty in the hyperparameters.
Normal hierarchical model with known variance:

y_ij|θ_j ∼ Normal(θ_j, σ²), i = 1, · · · , n_j, j = 1, 2, · · · , J, with σ² known,
θ_j|µ, τ ∼ Normal(µ, τ²), where µ and τ are the hyperparameters. That is,

p(θ_1, · · · , θ_J|µ, τ) = ∏_{j=1}^J N(θ_j|µ, τ²),

and, marginally,

p(θ_1, · · · , θ_J) = ∫ ∏_{j=1}^J [N(θ_j|µ, τ²)] p(µ, τ) d(µ, τ).
Hyperprior: p(µ, τ) = p(µ|τ) p(τ) ∝ p(τ), taking a uniform prior density for µ given τ. The joint posterior of θ and the hyperparameters µ and τ is

p(θ, µ, τ|y) ∝ p(µ, τ) p(θ|µ, τ) p(y|θ) ∝ p(µ, τ) ∏_{j=1}^J N(θ_j|µ, τ²) ∏_{j=1}^J N(ȳ.j|θ_j, σ²/n_j),

where ȳ.j is the sample mean of group j, a sufficient statistic with ȳ.j|θ_j ∼ Normal(θ_j, σ²/n_j).
The conditional posterior distributions of the θ_j given the hyperparameters are independent, with θ_j|µ, τ, y ∼ Normal(θ̂_j, V_j), where

θ̂_j = [(n_j/σ²) ȳ.j + (1/τ²) µ] / [n_j/σ² + 1/τ²]   and   V_j = 1 / [n_j/σ² + 1/τ²].
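A minimal numerical sketch of these precision-weighted formulas (the group sizes, means, and hyperparameter values below are made up for illustration):

```python
import numpy as np

# Conditional posterior mean and variance of each theta_j,
# combining the group mean and the population mean by precision.
sigma2 = 4.0                       # known data variance (illustrative)
nj = np.array([5, 10, 20])         # group sizes (illustrative)
ybar = np.array([1.0, 2.0, 3.0])   # group sample means (illustrative)
mu, tau2 = 2.0, 1.0                # hyperparameter values (illustrative)

prec = nj / sigma2 + 1.0 / tau2    # total precision
V = 1.0 / prec
theta_hat = (nj / sigma2 * ybar + mu / tau2) / prec
print(theta_hat)  # each theta_hat_j lies between ybar_j and mu
print(V)
```

Note that groups with larger n_j are shrunk less towards µ: here the n_j = 20 group keeps almost its own mean, while the n_j = 5 group moves well towards µ = 2.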
Marginal posterior of the hyperparameters: p(µ, τ|y) ∝ p(µ, τ) p(y|µ, τ). Marginally, ȳ.j|µ, τ ∼ Normal(µ, σ²/n_j + τ²). Therefore,

p(µ, τ|y) ∝ p(µ, τ) ∏_{j=1}^J N(ȳ.j|µ, σ²/n_j + τ²).
Factor the marginal posterior as p(µ, τ|y) = p(µ|τ, y) p(τ|y), so that p(µ|τ, y) = p(µ, τ|y) / p(τ|y). Therefore, µ|τ, y ∼ Normal(µ̂, V_µ), where

µ̂ = [Σ_{j=1}^J (σ²/n_j + τ²)^{−1} ȳ.j] / [Σ_{j=1}^J (σ²/n_j + τ²)^{−1}]   and   V_µ^{−1} = Σ_{j=1}^J (σ²/n_j + τ²)^{−1}.
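A minimal sketch of µ̂ and V_µ in code (the data values below are illustrative assumptions):

```python
import numpy as np

# Conditional posterior mean and variance of mu given tau:
# each ybar_j estimates mu with precision 1 / (sigma2/n_j + tau^2).
sigma2 = 4.0                       # known data variance (illustrative)
nj = np.array([5, 10, 20])         # group sizes (illustrative)
ybar = np.array([1.0, 2.0, 3.0])   # group sample means (illustrative)
tau2 = 1.0                         # a fixed value of tau^2 (illustrative)

w = 1.0 / (sigma2 / nj + tau2)     # precision of each ybar_j for mu
mu_hat = np.sum(w * ybar) / np.sum(w)
V_mu = 1.0 / np.sum(w)
print(mu_hat, V_mu)
```

µ̂ is simply the precision-weighted average of the group means; groups with large n_j and small τ² contributions get the most weight.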
The marginal posterior of τ follows from the identity

p(τ|y) = p(µ, τ|y) / p(µ|τ, y) ∝ p(τ) ∏_{j=1}^J N(ȳ.j|µ, σ²/n_j + τ²) / N(µ|µ̂, V_µ),

which holds for any value of µ; substituting µ = µ̂ simplifies the denominator:

p(τ|y) ∝ p(τ) ∏_{j=1}^J N(ȳ.j|µ̂, σ²/n_j + τ²) / N(µ̂|µ̂, V_µ)
       ∝ p(τ) V_µ^{1/2} ∏_{j=1}^J (σ²/n_j + τ²)^{−1/2} exp(−(ȳ.j − µ̂)² / (2(σ²/n_j + τ²))).
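Putting the pieces together, the posterior of the normal hierarchical model can be simulated by evaluating p(τ|y) on a grid, drawing τ, then µ|τ, y, then each θ_j. A minimal sketch, assuming a uniform prior p(τ) ∝ 1 and illustrative data values:

```python
import numpy as np

# One joint posterior draw (tau, mu, theta) for the normal hierarchical model,
# using the grid formula for p(tau|y) above with p(tau) ∝ 1.
rng = np.random.default_rng(1)
sigma2 = 4.0                       # known data variance (illustrative)
nj = np.array([5, 10, 20])         # group sizes (illustrative)
ybar = np.array([1.0, 2.0, 3.0])   # group sample means (illustrative)

def draw_once(tau_grid):
    # Evaluate log p(tau|y) on the grid, up to a constant.
    logp = np.empty_like(tau_grid)
    for k, tau in enumerate(tau_grid):
        w = 1.0 / (sigma2 / nj + tau**2)
        mu_hat = np.sum(w * ybar) / np.sum(w)
        V_mu = 1.0 / np.sum(w)
        logp[k] = (0.5 * np.log(V_mu)
                   + np.sum(-0.5 * np.log(sigma2 / nj + tau**2)
                            - 0.5 * (ybar - mu_hat) ** 2 * w))
    p = np.exp(logp - logp.max())
    p /= p.sum()
    tau = rng.choice(tau_grid, p=p)               # draw tau from p(tau|y)
    w = 1.0 / (sigma2 / nj + tau**2)
    mu = rng.normal(np.sum(w * ybar) / np.sum(w), # draw mu | tau, y
                    np.sqrt(1.0 / np.sum(w)))
    prec = nj / sigma2 + 1.0 / tau**2             # draw theta_j | mu, tau, y
    theta = rng.normal((nj / sigma2 * ybar + mu / tau**2) / prec,
                       np.sqrt(1.0 / prec))
    return tau, mu, theta

tau, mu, theta = draw_once(np.linspace(0.01, 5.0, 200))
print(tau, mu, theta.round(2))
```

Calling `draw_once` repeatedly yields independent draws from the joint posterior, each of which can also be pushed through the posterior predictive distribution if new data are to be simulated.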