Introduction to Bayesian Statistics
Lecture 7: Multiparameter models (III)
Rung-Ching Tsai
Department of Mathematics National Taiwan Normal University
April 15, 2015
Multiparameter model: the multinomial model

y = (y_1, ..., y_J), with ∑_{j=1}^J y_j = n; use the Bayesian approach to estimate θ = (θ_1, ..., θ_J), i.e.,

p(y|θ) ∝ ∏_{j=1}^J θ_j^{y_j}

The conjugate prior is Dirichlet(α_1, ..., α_J) for θ:

p(θ|α) ∝ ∏_{j=1}^J θ_j^{α_j − 1}, with ∑_{j=1}^J θ_j = 1,

where the Dirichlet distribution is a multivariate generalization of the beta distribution. The posterior is

p(θ|y) ∝ p(θ)p(y|θ) ∝ ∏_{j=1}^J θ_j^{α_j + y_j − 1}, i.e., θ|y ∼ Dirichlet(α_1 + y_1, ..., α_J + y_J)
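The Dirichlet-multinomial update above amounts to adding the observed counts to the prior hyperparameters. A minimal numerical sketch, with hypothetical counts and a uniform prior, using NumPy:

```python
import numpy as np

# Hypothetical multinomial counts y and Dirichlet hyperparameters alpha
y = np.array([727, 583, 137])        # e.g., survey counts summing to n
alpha = np.array([1.0, 1.0, 1.0])    # uniform Dirichlet prior

# Conjugacy: theta | y ~ Dirichlet(alpha + y)
alpha_post = alpha + y

# Posterior mean of theta_j is (alpha_j + y_j) / sum_k (alpha_k + y_k)
theta_mean = alpha_post / alpha_post.sum()

# Monte Carlo draws from the posterior; each row is a probability vector
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha_post, size=10_000)
```

Posterior functionals without closed forms (e.g., Pr(θ_1 > θ_2 | y)) can then be estimated directly from `draws`.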
y_1, ..., y_n iid ∼ MVN(µ, Σ), Σ known; use the Bayesian approach to estimate µ.

Conjugate prior µ ∼ MVN(µ_0, Λ_0):

p(µ) ∝ |Λ_0|^{−1/2} exp(−(1/2)(µ − µ_0)^T Λ_0^{−1}(µ − µ_0))

Likelihood:

p(y_1, ..., y_n|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) ∑_{i=1}^n (y_i − µ)^T Σ^{−1}(y_i − µ))
= |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S_0)), where S_0 = ∑_{i=1}^n (y_i − µ)(y_i − µ)^T
y_1, ..., y_n iid ∼ MVN(µ, Σ), Σ known; use the Bayesian approach to estimate µ.

Posterior:

p(µ|y_1, ..., y_n, Σ) ∝ p(µ)p(y_1, ..., y_n|µ)
∝ |Σ|^{−n/2} exp(−(1/2)[(µ − µ_0)^T Λ_0^{−1}(µ − µ_0) + ∑_{i=1}^n (y_i − µ)^T Σ^{−1}(y_i − µ)])
∝ exp(−(1/2)(µ − µ_n)^T Λ_n^{−1}(µ − µ_n))

where µ_n = (Λ_0^{−1} + nΣ^{−1})^{−1}(Λ_0^{−1} µ_0 + nΣ^{−1} ȳ) and Λ_n^{−1} = Λ_0^{−1} + nΣ^{−1}
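The posterior (µ_n, Λ_n) is a precision-weighted combination of prior and data. A sketch of the update with hypothetical hyperparameters (µ_0, Λ_0, Σ are all made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: d = 2, known data covariance Sigma, MVN(mu0, Lambda0) prior
d = 2
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])   # known
mu0 = np.zeros(d)                            # prior mean
Lambda0 = 4.0 * np.eye(d)                    # prior covariance

# Simulated data
n = 50
y = rng.multivariate_normal([1.0, -1.0], Sigma, size=n)
ybar = y.mean(axis=0)

# Posterior precision and mean:
#   Lambda_n^{-1} = Lambda0^{-1} + n Sigma^{-1}
#   mu_n = Lambda_n (Lambda0^{-1} mu0 + n Sigma^{-1} ybar)
prec_n = np.linalg.inv(Lambda0) + n * np.linalg.inv(Sigma)
Lambda_n = np.linalg.inv(prec_n)
mu_n = Lambda_n @ (np.linalg.inv(Lambda0) @ mu0 + n * np.linalg.inv(Sigma) @ ybar)
```

With n = 50 observations against a weak prior, µ_n sits much closer to ȳ than to µ_0, as the precision weighting predicts.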
µ_n = (Λ_0^{−1} + nΣ^{−1})^{−1}(Λ_0^{−1} µ_0 + nΣ^{−1} ȳ) and Λ_n^{−1} = Λ_0^{−1} + nΣ^{−1}

Partition µ = (µ^{(1)}, µ^{(2)}) and, conformably, µ_n = (µ_n^{(1)}, µ_n^{(2)}) and

Λ_n = [ Λ_n^{(11)}  Λ_n^{(12)}
        Λ_n^{(21)}  Λ_n^{(22)} ]

Marginal: p(µ^{(1)}|y_1, ..., y_n, Σ) ∼ MVN(µ_n^{(1)}, Λ_n^{(11)})

Conditional: p(µ^{(1)}|µ^{(2)}, y_1, ..., y_n, Σ) ∼ MVN(µ_n^{(1)} + β_{1|2}(µ^{(2)} − µ_n^{(2)}), Λ_{1|2})

where β_{1|2} = Λ_n^{(12)}(Λ_n^{(22)})^{−1}, and Λ_{1|2} = Λ_n^{(11)} − Λ_n^{(12)}(Λ_n^{(22)})^{−1}Λ_n^{(21)}.
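The marginal and conditional formulas above are block operations on µ_n and Λ_n. A minimal sketch with a hypothetical 3-dimensional posterior, taking block 1 as the first coordinate:

```python
import numpy as np

# Hypothetical posterior MVN(mu_n, Lambda_n); block 1 = first k coords, block 2 = rest
mu_n = np.array([0.5, -0.2, 1.1])
Lambda_n = np.array([[1.0, 0.4, 0.2],
                     [0.4, 1.5, 0.3],
                     [0.2, 0.3, 2.0]])
k = 1

L11 = Lambda_n[:k, :k]
L12 = Lambda_n[:k, k:]
L21 = Lambda_n[k:, :k]
L22 = Lambda_n[k:, k:]

# Regression coefficients and conditional covariance (Schur complement):
#   beta_{1|2}   = Lambda^{(12)} (Lambda^{(22)})^{-1}
#   Lambda_{1|2} = Lambda^{(11)} - Lambda^{(12)} (Lambda^{(22)})^{-1} Lambda^{(21)}
beta_12 = L12 @ np.linalg.inv(L22)
Lambda_12 = L11 - L12 @ np.linalg.inv(L22) @ L21

# Conditional mean of mu^{(1)} given a hypothetical value of mu^{(2)}
mu2 = np.array([0.0, 1.0])
cond_mean = mu_n[:k] + beta_12 @ (mu2 - mu_n[k:])
```

Note that Λ_{1|2} is never larger than Λ^{(11)}: conditioning on µ^{(2)} can only reduce the uncertainty about µ^{(1)}.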
µ_n = (Λ_0^{−1} + nΣ^{−1})^{−1}(Λ_0^{−1} µ_0 + nΣ^{−1} ȳ) and Λ_n^{−1} = Λ_0^{−1} + nΣ^{−1}

Posterior predictive distribution for a new observation ỹ ∼ MVN(µ, Σ), Σ known:

p(ỹ, µ|y_1, ..., y_n) = N(ỹ|µ, Σ) N(µ|µ_n, Λ_n)

is the exponential of a quadratic form in (ỹ, µ); hence ỹ|y ∼ N(µ_n, Σ + Λ_n), where

E(ỹ|y) = E(E(ỹ|µ, y)|y) = E(µ|y) = µ_n
var(ỹ|y) = E(var(ỹ|µ, y)|y) + var(E(ỹ|µ, y)|y) = E(Σ|y) + var(µ|y) = Σ + Λ_n
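The decomposition var(ỹ|y) = Σ + Λ_n can be checked by simulating the predictive distribution by composition: draw µ from its posterior, then ỹ given µ. A sketch with hypothetical values of µ_n, Λ_n, Σ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior N(mu_n, Lambda_n) and known Sigma
mu_n = np.array([1.0, -1.0])
Lambda_n = np.array([[0.05, 0.01], [0.01, 0.04]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

# Predictive y~ | y ~ N(mu_n, Sigma + Lambda_n), simulated by composition:
# first mu | y, then y~ | mu
m = 200_000
mus = rng.multivariate_normal(mu_n, Lambda_n, size=m)
ytilde = mus + rng.multivariate_normal(np.zeros(2), Sigma, size=m)

emp_mean = ytilde.mean(axis=0)
emp_cov = np.cov(ytilde, rowvar=False)
```

The empirical mean and covariance of `ytilde` match µ_n and Σ + Λ_n up to Monte Carlo error.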
y_1, ..., y_n iid ∼ MVN(µ, Σ), Σ known; use the Bayesian approach to estimate µ under a noninformative uniform prior, p(µ) ∝ constant.

Likelihood:

p(y_1, ..., y_n|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) ∑_{i=1}^n (y_i − µ)^T Σ^{−1}(y_i − µ))
= |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S_0)), where S_0 = ∑_{i=1}^n (y_i − µ)(y_i − µ)^T

Posterior:

p(µ|y_1, ..., y_n, Σ) ∝ p(µ)p(y_1, ..., y_n|µ, Σ) ∝ p(y_1, ..., y_n|µ, Σ), i.e., µ|Σ, y_1, ..., y_n ∼ MVN(ȳ, Σ/n).
y_1, ..., y_n iid ∼ MVN(µ, Σ), both µ and Σ unknown; use the Bayesian approach to estimate (µ, Σ).

Conjugate prior:

Σ ∼ Inv-Wishart_{ν_0}(Λ_0^{−1})
µ|Σ ∼ MVN(µ_0, Σ/κ_0)

i.e., the joint prior density is

p(µ, Σ) ∝ |Σ|^{−((ν_0+d)/2+1)} exp(−(1/2) tr(Λ_0 Σ^{−1}) − (κ_0/2)(µ − µ_0)^T Σ^{−1}(µ − µ_0))

We label this the N-Inv-Wishart(µ_0, Λ_0/κ_0; ν_0, Λ_0) prior.

Likelihood:

p(y_1, ..., y_n|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S_0)), where S_0 = ∑_{i=1}^n (y_i − µ)(y_i − µ)^T
y_1, ..., y_n iid ∼ MVN(µ, Σ). The joint posterior is

p(µ, Σ|y_1, ..., y_n) ∝ p(µ, Σ)p(y_1, ..., y_n|µ, Σ)
∝ |Σ|^{−((ν_0+d)/2+1)} exp(−(1/2) tr(Λ_0 Σ^{−1}) − (κ_0/2)(µ − µ_0)^T Σ^{−1}(µ − µ_0)) × |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S_0))
= N-Inv-Wishart(µ_n, Λ_n/κ_n; ν_n, Λ_n),   (1)

where

µ_n = κ_0/(κ_0 + n) µ_0 + n/(κ_0 + n) ȳ
κ_n = κ_0 + n,  ν_n = ν_0 + n
Λ_n = Λ_0 + S + κ_0 n/(κ_0 + n) (ȳ − µ_0)(ȳ − µ_0)^T, with S = ∑_{i=1}^n (y_i − ȳ)(y_i − ȳ)^T
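The update (1) is a set of four hyperparameter recursions. A sketch with hypothetical prior hyperparameters and simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical N-Inv-Wishart(mu0, Lambda0/kappa0; nu0, Lambda0) prior, d = 2
d = 2
mu0 = np.zeros(d)
kappa0 = 1.0
nu0 = d + 2          # must exceed d - 1 for a proper Inv-Wishart
Lambda0 = np.eye(d)

# Simulated data
n = 40
y = rng.multivariate_normal([2.0, 0.5], [[1.0, 0.2], [0.2, 0.5]], size=n)
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)    # sum of squares about the sample mean

# Posterior hyperparameters from (1)
mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
kappa_n = kappa0 + n
nu_n = nu0 + n
Lambda_n = Lambda0 + S + (kappa0 * n / (kappa0 + n)) * np.outer(ybar - mu0, ybar - mu0)
```

Λ_n accumulates the prior scale Λ_0, the within-sample scatter S, and a term for the discrepancy between ȳ and µ_0.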
Conditional posterior of µ: from the joint posterior density (1) with Σ held constant,

µ|Σ, y_1, ..., y_n ∼ MVN(µ_n, Σ/κ_n)
Marginal posterior of Σ: integrating p(µ, Σ|y_1, ..., y_n) over µ gives

Σ|y_1, ..., y_n ∼ Inv-Wishart_{ν_n}(Λ_n^{−1})

where Λ_n = Λ_0 + S + κ_0 n/(κ_0 + n)(ȳ − µ_0)(ȳ − µ_0)^T with S = ∑_{i=1}^n (y_i − ȳ)(y_i − ȳ)^T.
Marginal posterior of µ in the multivariate case:

µ|y_1, ..., y_n ∼ t_{ν_n−d+1}(µ_n, Λ_n/(κ_n(ν_n − d + 1)))

where µ_n = κ_0/(κ_0 + n) µ_0 + n/(κ_0 + n) ȳ and Λ_n = Λ_0 + S + κ_0 n/(κ_0 + n)(ȳ − µ_0)(ȳ − µ_0)^T with S = ∑_{i=1}^n (y_i − ȳ)(y_i − ȳ)^T.

To sample from the joint posterior, draw

Σ|y_1, ..., y_n ∼ Inv-Wishart_{ν_n}(Λ_n^{−1}),

then µ|Σ, y_1, ..., y_n ∼ MVN(µ_n, Σ/κ_n).
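The two-step draw (Σ first, then µ given Σ) can be sketched with SciPy's `invwishart`, whose `scale` argument is the matrix Λ_n of the BDA notation Inv-Wishart_{ν_n}(Λ_n^{−1}). The hyperparameter values below are hypothetical:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(4)

# Hypothetical posterior hyperparameters, d = 2
d = 2
mu_n = np.array([1.5, -0.5])
kappa_n = 41.0
nu_n = 44
Lambda_n = np.array([[40.0, 8.0], [8.0, 25.0]])

# Step 1: Sigma | y ~ Inv-Wishart_{nu_n}(Lambda_n^{-1})
Sigma_draw = invwishart.rvs(df=nu_n, scale=Lambda_n, random_state=rng)

# Step 2: mu | Sigma, y ~ MVN(mu_n, Sigma / kappa_n)
mu_draw = rng.multivariate_normal(mu_n, Sigma_draw / kappa_n)
```

Repeating the two steps yields iid draws (µ, Σ) from the joint posterior, from which any posterior summary can be estimated.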
y_1, ..., y_n iid ∼ MVN(µ, Σ), both µ and Σ unknown; noninformative prior

p(µ, Σ) ∝ |Σ|^{−(d+1)/2},

which is the limit of the conjugate prior density as κ_0 → 0, ν_0 → −1, |Λ_0| → 0.

The posterior is then

Σ|y_1, ..., y_n ∼ Inv-Wishart_{n−1}(S^{−1}),  µ|Σ, y_1, ..., y_n ∼ MVN(ȳ, Σ/n),

and marginally

µ|y_1, ..., y_n ∼ t_{n−d}(ȳ, S/(n(n − d))).
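Under the noninformative prior the hyperparameters reduce to sample quantities, so posterior simulation needs only ȳ and S. A sketch with simulated data (the true mean and covariance below are made up):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(5)

# Simulated data; prior is p(mu, Sigma) ∝ |Sigma|^{-(d+1)/2}
d = 2
n = 30
y = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.4], [0.4, 2.0]], size=n)
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)

# Sigma | y ~ Inv-Wishart_{n-1}(S^{-1}) (scipy scale = S), then mu | Sigma, y ~ MVN(ybar, Sigma/n)
Sigma_draw = invwishart.rvs(df=n - 1, scale=S, random_state=rng)
mu_draw = rng.multivariate_normal(ybar, Sigma_draw / n)

# Scale matrix of the marginal t_{n-d} distribution for mu
t_scale = S / (n * (n - d))
```

This is the limiting case of the conjugate sampler on the previous slide with κ_n = n, ν_n = n − 1, µ_n = ȳ, Λ_n = S.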