

SLIDE 1

Introduction to Bayesian Statistics

Lecture 7: Multiparameter models (III)

Rung-Ching Tsai

Department of Mathematics National Taiwan Normal University

April 15, 2015

SLIDE 2

Multiparameter model: the multinomial model

  • y = (y1, · · · , yJ) ∼ Multinomial(n; θ1, · · · , θJ) with ∑_{j=1}^{J} yj = n; use the Bayesian approach to estimate θ = (θ1, · · · , θJ).
  • Likelihood:
    p(y|θ) ∝ ∏_{j=1}^{J} θj^{yj}
  • Prior of θ: choose the conjugate prior, a Dirichlet distribution Dirichlet(α1, · · · , αJ), for θ:
    p(θ|α) ∝ ∏_{j=1}^{J} θj^{αj − 1} with ∑_{j=1}^{J} θj = 1,
    where the Dirichlet is a multivariate generalization of the beta distribution.
  • Posterior of θ:
    p(θ|y) ∝ p(θ)p(y|θ) ∝ ∏_{j=1}^{J} θj^{αj + yj − 1},
    i.e., θ|y ∼ Dirichlet(α1 + y1, · · · , αJ + yJ)
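The conjugate update above is just elementwise addition of counts to the prior parameters. A minimal sketch with NumPy/SciPy; the counts and prior values below are hypothetical, chosen only for illustration:

```python
import numpy as np
from scipy.stats import dirichlet

# Hypothetical observed multinomial counts y (J = 3 categories, n = 100)
# and a uniform Dirichlet prior alpha = (1, 1, 1).
y = np.array([52, 31, 17])
alpha = np.array([1.0, 1.0, 1.0])

# Conjugate update: theta | y ~ Dirichlet(alpha + y).
alpha_post = alpha + y

# Posterior mean of theta_j is (alpha_j + y_j) / sum_k (alpha_k + y_k).
post_mean = alpha_post / alpha_post.sum()

# Draw posterior samples of theta; each draw lies on the simplex (rows sum to 1).
samples = dirichlet.rvs(alpha_post, size=1000, random_state=0)
```

Because the posterior is available in closed form, no MCMC is needed here; sampling is only for summarizing functions of θ.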

SLIDE 3

Multiparameter model: the multivariate normal model

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), Σ known, use the Bayesian approach to estimate µ.
  • choose a conjugate prior for µ, µ ∼ MVN(µ0, Λ0):
    p(µ) ∝ |Λ0|^{−1/2} exp(−(1/2)(µ − µ0)ᵀ Λ0^{−1} (µ − µ0))
  • likelihood of µ:
    p(y1, · · · , yn|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) ∑_{i=1}^{n} (yi − µ)ᵀ Σ^{−1} (yi − µ)) = |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0)),
    where S0 = ∑_{i=1}^{n} (yi − µ)(yi − µ)ᵀ

SLIDE 4

Multiparameter model: multivariate normal, Σ known

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), Σ known, use the Bayesian approach to estimate µ.
  • find the posterior distribution of µ:
    p(µ|y1, · · · , yn, Σ) ∝ p(µ)p(y1, · · · , yn|µ, Σ)
    ∝ exp(−(1/2)[(µ − µ0)ᵀ Λ0^{−1} (µ − µ0) + ∑_{i=1}^{n} (yi − µ)ᵀ Σ^{−1} (yi − µ)])
    ∝ exp(−(1/2)(µ − µn)ᵀ Λn^{−1} (µ − µn))
  • that is, µ|y1, · · · , yn, Σ ∼ MVN(µn, Λn), where
    µn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} µ0 + nΣ^{−1} ȳ) and Λn^{−1} = Λ0^{−1} + nΣ^{−1}
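The precision-weighted update for µn and Λn translates directly into a few lines of NumPy. A sketch; the data, prior, and known Σ below are hypothetical values chosen only to exercise the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known sampling covariance Sigma and hypothetical data (d = 2, n = 50).
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
y = rng.multivariate_normal([1.0, -1.0], Sigma, size=50)
n, ybar = len(y), y.mean(axis=0)

# Conjugate prior mu ~ MVN(mu0, Lambda0), here deliberately diffuse.
mu0 = np.zeros(2)
Lambda0 = 10.0 * np.eye(2)

# Posterior precision: Lambda_n^{-1} = Lambda0^{-1} + n * Sigma^{-1};
# posterior mean: mu_n = Lambda_n (Lambda0^{-1} mu0 + n * Sigma^{-1} ybar).
Lambda0_inv = np.linalg.inv(Lambda0)
Sigma_inv = np.linalg.inv(Sigma)
Lambda_n = np.linalg.inv(Lambda0_inv + n * Sigma_inv)
mu_n = Lambda_n @ (Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)
```

With a diffuse prior (large Λ0), µn sits very close to ȳ and Λn close to Σ/n, anticipating the non-informative-prior result later in the lecture.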

SLIDE 5

Multiparameter model: multivariate normal, Σ known

  • µ|y1, · · · , yn, Σ ∼ MVN(µn, Λn), where
    µn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} µ0 + nΣ^{−1} ȳ) and Λn^{−1} = Λ0^{−1} + nΣ^{−1}
  • Partition µ = (µ^(1), µ^(2)), µn = (µn^(1), µn^(2)), and
    Λn = ( Λn^(11)  Λn^(12)
           Λn^(21)  Λn^(22) ).
  • posterior marginal distribution of subvectors of µ:
    µ^(1)|y1, · · · , yn, Σ ∼ MVN(µn^(1), Λn^(11))
  • posterior conditional distribution of subvectors of µ:
    µ^(1)|µ^(2), y1, · · · , yn, Σ ∼ MVN(µn^(1) + β_{1|2}(µ^(2) − µn^(2)), Λ_{1|2}),
    where β_{1|2} = Λn^(12)(Λn^(22))^{−1} and Λ_{1|2} = Λn^(11) − Λn^(12)(Λn^(22))^{−1}Λn^(21).
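These are the standard Gaussian conditioning identities applied to Λn. A sketch with a hypothetical 3-dimensional posterior, partitioned into a 1-dimensional µ^(1) and a 2-dimensional µ^(2) (all numeric values invented for illustration):

```python
import numpy as np

# Hypothetical posterior parameters (d = 3): mu^(1) = mu[0], mu^(2) = mu[1:3].
mu_n = np.array([0.5, -0.2, 1.1])
Lambda_n = np.array([[0.40, 0.10, 0.05],
                     [0.10, 0.30, 0.08],
                     [0.05, 0.08, 0.25]])

i1, i2 = [0], [1, 2]
L11 = Lambda_n[np.ix_(i1, i1)]
L12 = Lambda_n[np.ix_(i1, i2)]
L21 = Lambda_n[np.ix_(i2, i1)]
L22 = Lambda_n[np.ix_(i2, i2)]

# beta_{1|2} = Lambda_n^(12) (Lambda_n^(22))^{-1}
beta_12 = L12 @ np.linalg.inv(L22)
# Lambda_{1|2} = Lambda_n^(11) - Lambda_n^(12) (Lambda_n^(22))^{-1} Lambda_n^(21)
Lambda_12 = L11 - beta_12 @ L21

# Conditional mean of mu^(1) given a hypothetical value of mu^(2).
mu2 = np.array([0.0, 1.0])
cond_mean = mu_n[i1] + beta_12 @ (mu2 - mu_n[i2])
```

The marginal of µ^(1) needs no computation at all: read off µn^(1) and Λn^(11) directly from the partition.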

SLIDE 6

Multiparameter model: multivariate normal, Σ known

  • µ|y1, · · · , yn, Σ ∼ MVN(µn, Λn), where
    µn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} µ0 + nΣ^{−1} ȳ) and Λn^{−1} = Λ0^{−1} + nΣ^{−1}
  • Let ỹ ∼ MVN(µ, Σ) be a new observation.
  • posterior predictive distribution of ỹ, Σ known:
    p(ỹ, µ|y1, · · · , yn) = N(ỹ|µ, Σ) N(µ|µn, Λn) is the exponential of a quadratic form in (ỹ, µ), hence ỹ|y ∼ MVN(µn, Σ + Λn), where
    E(ỹ|y) = E(E(ỹ|µ, y)|y) = E(µ|y) = µn
    var(ỹ|y) = E(var(ỹ|µ, y)|y) + var(E(ỹ|µ, y)|y) = E(Σ|y) + var(µ|y) = Σ + Λn
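A quick Monte Carlo check of ỹ|y ∼ MVN(µn, Σ + Λn): draw µ from its posterior, then ỹ given µ, and compare the empirical moments to the closed form. All parameter values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior for mu and known sampling covariance Sigma.
mu_n = np.array([1.0, -1.0])
Lambda_n = np.array([[0.05, 0.01], [0.01, 0.04]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

# Composition sampling: mu ~ MVN(mu_n, Lambda_n), then y_tilde | mu ~ MVN(mu, Sigma),
# implemented as mu plus independent MVN(0, Sigma) noise.
m = 200_000
mus = rng.multivariate_normal(mu_n, Lambda_n, size=m)
y_tilde = mus + rng.multivariate_normal(np.zeros(2), Sigma, size=m)

# Empirical moments should match MVN(mu_n, Sigma + Lambda_n).
emp_mean = y_tilde.mean(axis=0)
emp_cov = np.cov(y_tilde.T)
```

The check mirrors the iterated-expectation derivation on the slide: the predictive mean is µn, and the predictive covariance adds the sampling covariance Σ to the posterior covariance Λn.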

SLIDE 7

Multiparameter model: multivariate normal, Σ known

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), Σ known, use the Bayesian approach to estimate µ.
  • prior for µ: choose a non-informative prior, p(µ) ∝ 1
  • likelihood of µ:
    p(y1, · · · , yn|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) ∑_{i=1}^{n} (yi − µ)ᵀ Σ^{−1} (yi − µ)) = |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0)),
    where S0 = ∑_{i=1}^{n} (yi − µ)(yi − µ)ᵀ
  • posterior for µ:
    p(µ|y1, · · · , yn, Σ) ∝ p(µ)p(y1, · · · , yn|µ, Σ) ∝ p(y1, · · · , yn|µ, Σ),
    i.e., µ|Σ, y1, · · · , yn ∼ MVN(ȳ, Σ/n).

SLIDE 8

Multivariate normal model, Σ unknown

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), both µ and Σ unknown; use the Bayesian approach to estimate (µ, Σ).
  • take a conjugate prior for (µ, Σ): p(µ, Σ) = p(Σ)p(µ|Σ) with
    Σ ∼ Inv-Wishart_{ν0}(Λ0^{−1})
    µ|Σ ∼ MVN(µ0, Σ/κ0)
    i.e., the joint prior density is
    p(µ, Σ) ∝ |Σ|^{−((ν0+d)/2+1)} exp(−(1/2) tr(Λ0 Σ^{−1}) − (κ0/2)(µ − µ0)ᵀ Σ^{−1} (µ − µ0)).
    We label this the N-Inverse-Wishart(µ0, Λ0/κ0; ν0, Λ0).
  • likelihood:
    p(y1, · · · , yn|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0)),
    where S0 = ∑_{i=1}^{n} (yi − µ)(yi − µ)ᵀ

SLIDE 9

Joint posterior distribution, p(µ, Σ|y1, · · · , yn)

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ)
  • prior of (µ, Σ): (µ, Σ) ∼ N-Inverse-Wishart(µ0, Λ0/κ0; ν0, Λ0)
  • the joint posterior distribution of (µ, Σ):
    p(µ, Σ|y1, · · · , yn) ∝ p(µ, Σ)p(y1, · · · , yn|µ, Σ)
    ∝ |Σ|^{−((ν0+d)/2+1)} exp(−(1/2) tr(Λ0 Σ^{−1}) − (κ0/2)(µ − µ0)ᵀ Σ^{−1} (µ − µ0)) × |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0))
    = N-Inv-Wishart(µn, Λn/κn; νn, Λn), (1)
    where
  • µn = (κ0/(κ0 + n)) µ0 + (n/(κ0 + n)) ȳ
  • κn = κ0 + n
  • νn = ν0 + n
  • Λn = Λ0 + S + (κ0 n/(κ0 + n))(ȳ − µ0)(ȳ − µ0)ᵀ with S = ∑_{i=1}^{n} (yi − ȳ)(yi − ȳ)ᵀ

SLIDE 10

Conditional posterior distribution, p(µ|Σ, y1, · · · , yn)

  • p(µ, Σ|y1, · · · , yn) = p(µ|Σ, y1, · · · , yn)p(Σ|y1, · · · , yn)
  • the conditional posterior density of µ given Σ is proportional to the joint posterior density (1) with Σ held constant:
    µ|Σ, y1, · · · , yn ∼ MVN(µn, Σ/κn)

SLIDE 11

Marginal posterior distribution, p(Σ|y1, · · · , yn)

  • p(µ, Σ|y1, · · · , yn) = p(µ|Σ, y1, · · · , yn)p(Σ|y1, · · · , yn)
  • p(Σ|y1, · · · , yn) requires averaging the joint distribution p(µ, Σ|y1, · · · , yn) over µ; as a result, we have
    Σ|y1, · · · , yn ∼ Inv-Wishart_{νn}(Λn^{−1}),
    where Λn = Λ0 + S + (κ0 n/(κ0 + n))(ȳ − µ0)(ȳ − µ0)ᵀ with S = ∑_{i=1}^{n} (yi − ȳ)(yi − ȳ)ᵀ

SLIDE 12

Marginal posterior distribution of µ, p(µ|y1, · · · , yn)

  • Estimand of interest: µ
  • To obtain the marginal posterior distribution of µ:
  • our result from the univariate normal generalizes to the multivariate case:
    µ|y1, · · · , yn ∼ t_{νn−d+1}(µn, Λn/(κn(νn − d + 1))),
    where
  • µn = (κ0/(κ0 + n)) µ0 + (n/(κ0 + n)) ȳ
  • κn = κ0 + n, νn = ν0 + n
  • Λn = Λ0 + S + (κ0 n/(κ0 + n))(ȳ − µ0)(ȳ − µ0)ᵀ with S = ∑_{i=1}^{n} (yi − ȳ)(yi − ȳ)ᵀ
  • By simulation:
  • first draw Σ from p(Σ|y1, · · · , yn) with Σ|y1, · · · , yn ∼ Inv-Wishart_{νn}(Λn^{−1}),
  • then draw µ from p(µ|Σ, y1, · · · , yn) with µ|Σ, y1, · · · , yn ∼ MVN(µn, Σ/κn).
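The two-step simulation scheme can be sketched with SciPy's invwishart; its `scale` argument is the matrix Λn, which corresponds to Inv-Wishart_{νn}(Λn^{−1}) in the notation of these slides. Data and hyperparameters below are hypothetical:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

# Hypothetical data: d = 2, n = 40.
y = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.3], [0.3, 2.0]], size=40)
n, d = y.shape
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)

# Hypothetical N-Inverse-Wishart hyperparameters.
mu0, kappa0, nu0, Lambda0 = np.zeros(d), 1.0, d + 2, np.eye(d)

# Posterior updates from the joint posterior (1).
kappa_n = kappa0 + n
nu_n = nu0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
dev = (ybar - mu0).reshape(-1, 1)
Lambda_n = Lambda0 + S + (kappa0 * n / kappa_n) * (dev @ dev.T)

# Composition sampling: Sigma | y ~ Inv-Wishart_{nu_n}(Lambda_n^{-1})
# (SciPy scale = Lambda_n), then mu | Sigma, y ~ MVN(mu_n, Sigma / kappa_n).
mu_draws = np.empty((1000, d))
for t in range(1000):
    Sigma = invwishart.rvs(df=nu_n, scale=Lambda_n, random_state=rng)
    mu_draws[t] = rng.multivariate_normal(mu_n, Sigma / kappa_n)
```

The empirical distribution of `mu_draws` approximates the marginal t_{νn−d+1}(µn, Λn/(κn(νn − d + 1))) stated above, without ever evaluating a multivariate t density.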

SLIDE 13

the multivariate normal model: Non-informative prior

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), both µ and Σ unknown; use the Bayesian approach to estimate (µ, Σ).
  • a common non-informative prior is the Jeffreys prior density:
    p(µ, Σ) ∝ |Σ|^{−(d+1)/2},
    which is the limit of the conjugate prior density as κ0 → 0, ν0 → −1, |Λ0| → 0.
  • the marginal and conditional posterior densities can then be written as
    Σ|y1, · · · , yn ∼ Inv-Wishart_{n−1}(S),
    µ|Σ, y1, · · · , yn ∼ MVN(ȳ, Σ/n).
  • marginal posterior of µ:
    µ|y1, · · · , yn ∼ t_{n−d}(ȳ, S/(n(n − d))).