SLIDE 1

Hierarchical Models

Applied Bayesian Statistics

Dr. Earvin Balderama
Department of Mathematics & Statistics
Loyola University Chicago

November 9, 2017

Last edited November 9, 2017 by <ebalderama@luc.edu>

SLIDE 2

Layers of hierarchy

The hierarchical modeling framework is popular in the Bayesian literature because MCMC is conducive to hierarchical models. Bayesian models can be written with the following basic layers of hierarchy:

1. Data layer: $[Y \mid \theta, \alpha]$ is the likelihood for the observed data $Y$.
2. Process layer: $[\theta \mid \alpha]$ is the model for the parameters $\theta$ that define the latent data-generating process.
3. Prior layer: $[\alpha]$ is the prior for the hyperparameters $\alpha$.
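Multiplying the three layers gives the joint posterior up to a normalizing constant, which is the distribution an MCMC sampler targets. In the bracket notation of this slide:

```latex
[\theta, \alpha \mid Y] \;\propto\; [Y \mid \theta, \alpha]\,[\theta \mid \alpha]\,[\alpha]
```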

SLIDE 3

Hierarchical models and MCMC

Example: One-way random effects model

$$Y_{ij} \sim \text{Normal}(\theta_i, \sigma^2) \quad \text{and} \quad \theta_i \sim \text{Normal}(\mu, \tau^2),$$

where $Y_{ij}$ is the $j$th replicate for unit $i$ and $\alpha = (\mu, \sigma^2, \tau^2)$ has an uninformative prior. This hierarchy can be written using a directed acyclic graph (DAG), also called a Bayesian network or belief network.
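To make the two layers concrete, here is a minimal simulation sketch of the one-way random effects model. The sample sizes and parameter values (`m`, `n_rep`, `mu`, `tau2`, `sigma2`) are hypothetical and chosen only for illustration; they do not come from the slides.

```r
set.seed(1)

# Hypothetical settings, chosen only for illustration
m      <- 8    # number of units i
n_rep  <- 5    # replicates j per unit
mu     <- 10   # overall mean
tau2   <- 4    # between-unit variance
sigma2 <- 1    # within-unit variance

# Process layer: theta_i ~ Normal(mu, tau2)
theta <- rnorm(m, mean = mu, sd = sqrt(tau2))

# Data layer: Y_ij ~ Normal(theta_i, sigma2)
# The matrix fills by column, so row i holds the replicates for unit i
Y <- matrix(rnorm(m * n_rep, mean = rep(theta, times = n_rep), sd = sqrt(sigma2)),
            nrow = m, ncol = n_rep)

# Unit sample means track the latent theta_i
round(cbind(theta = theta, ybar = rowMeans(Y)), 2)
```

Reading the code top to bottom follows the arrows of the DAG: hyperparameters first, then the latent $\theta_i$, then the data.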

SLIDE 4

Hierarchical models and MCMC

MCMC is efficient even if the number of parameters or levels of hierarchy is large. You only need to consider connected nodes when updating each parameter, e.g.,

1. $[\theta_i \mid \cdot]$
2. $[\mu \mid \cdot]$
3. $[\sigma^2 \mid \cdot]$
4. $[\tau^2 \mid \cdot]$

Each of these updates is a draw from a standard one-dimensional normal or inverse gamma distribution.
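The four updates above can be sketched as a Gibbs sampler for the one-way random effects model. This is an illustrative sketch, not the course's own code: the data are simulated, and the "uninformative" prior is assumed here to be flat on $\mu$ with InverseGamma(0.1, 0.1) priors on $\sigma^2$ and $\tau^2$.

```r
set.seed(42)

# --- Simulated data (hypothetical values, for illustration only) ---
m <- 20; n_rep <- 5
theta_true <- rnorm(m, mean = 10, sd = 2)
Y <- matrix(rnorm(m * n_rep, mean = rep(theta_true, times = n_rep), sd = 1), nrow = m)
ybar <- rowMeans(Y)
N <- m * n_rep

# --- Priors (assumed): flat on mu, InverseGamma(a, b) on sigma2 and tau2 ---
a <- 0.1; b <- 0.1

# --- Storage and initial values ---
n_iter <- 5000
keep <- matrix(NA_real_, n_iter, 3, dimnames = list(NULL, c("mu", "sigma2", "tau2")))
theta <- ybar; mu <- mean(ybar); sigma2 <- 1; tau2 <- 1

for (s in seq_len(n_iter)) {
  # [theta_i | .]: normal, precision-weighted average of ybar_i and mu
  prec  <- n_rep / sigma2 + 1 / tau2
  theta <- rnorm(m, mean = (n_rep * ybar / sigma2 + mu / tau2) / prec, sd = sqrt(1 / prec))

  # [mu | .]: normal, depends only on theta and tau2
  mu <- rnorm(1, mean = mean(theta), sd = sqrt(tau2 / m))

  # [sigma2 | .]: inverse gamma, depends only on Y and theta
  sse    <- sum((Y - theta)^2)  # theta recycles down columns, so row i uses theta_i
  sigma2 <- 1 / rgamma(1, shape = a + N / 2, rate = b + sse / 2)

  # [tau2 | .]: inverse gamma, depends only on theta and mu
  tau2 <- 1 / rgamma(1, shape = a + m / 2, rate = b + sum((theta - mu)^2) / 2)

  keep[s, ] <- c(mu, sigma2, tau2)
}

# Posterior means after discarding the first 1000 draws as burn-in
post_mean <- colMeans(keep[-(1:1000), ])
round(post_mean, 2)
```

Note that no update touches more than its neighbors in the DAG: $\mu$ never sees $Y$ directly, and $\sigma^2$ never sees $\mu$ or $\tau^2$.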

SLIDE 5

Two-way random effects model

Example: Ozone measurements

    ozone <- read.csv("http://math.luc.edu/~ebalderama/bayes_resources/data/ozone.csv")

For the Ozone measurement data, we fit the model

$$Y_{ij} \sim \text{Normal}(\mu + \alpha_i + \gamma_j, \sigma^2),$$

where $\mu$ is the overall mean, $\alpha_i$ is the random effect for spatial location $i$, and $\gamma_j$ is the random effect of day $j$.

SLIDE 6

Two-way random effects model

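The likelihood model is

$$Y_{ij} \sim \text{Normal}(\mu + \alpha_i + \gamma_j, \sigma^2).$$

Priors for the fixed effects model:

$$\alpha_i \sim \text{Normal}(0, 10^2) \quad \text{and} \quad \gamma_j \sim \text{Normal}(0, 10^2).$$

Priors for the random effects model:

$$\alpha_i \sim \text{Normal}(0, \sigma_a^2) \quad \text{and} \quad \gamma_j \sim \text{Normal}(0, \sigma_g^2).$$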
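One way to see the practical difference between the two priors is through the conditional posterior of $\alpha_i$ given everything else. By normal-normal conjugacy, its mean is a weight $w$ times the average residual $Y_{ij} - \mu - \gamma_j$ at location $i$, and $w$ depends on the prior variance. The sketch below computes that weight; the values of `J`, `sigma2`, and the random-effects variance 0.5 are hypothetical, and in the actual random effects model $\sigma_a^2$ is unknown and estimated from the data.

```r
# Conditional posterior mean of alpha_i is w * rbar_i, where rbar_i is the mean of
# Y_ij - mu - gamma_j over the J days at location i, and
#   w = (J / sigma2) / (J / sigma2 + 1 / prior_var)
shrink_weight <- function(J, sigma2, prior_var) {
  (J / sigma2) / (J / sigma2 + 1 / prior_var)
}

J      <- 5   # hypothetical number of days per location
sigma2 <- 4   # hypothetical error variance

# Fixed-effects prior Normal(0, 10^2): almost no shrinkage toward zero
w_fixed <- shrink_weight(J, sigma2, prior_var = 10^2)

# Random-effects prior Normal(0, sigma_a^2) with sigma_a^2 = 0.5 (assumed value)
w_random <- shrink_weight(J, sigma2, prior_var = 0.5)

round(c(fixed = w_fixed, random = w_random), 3)
```

A weight near 1 leaves each location's estimate essentially at its own residual mean, while a smaller weight pulls the estimates toward zero, which is how the random effects prior borrows strength across locations.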

SLIDE 7

Random slopes model

Example: Jaw bone height data

    load(url("http://math.luc.edu/~ebalderama/bayes_resources/data/jaw.RData"))

Let $Y_{ij}$ be the bone height for child $i$ at age $X_j$. We may specify a different regression for each child to capture variability over the population of children:

$$Y_{ij} \sim \text{Normal}(\gamma_{0i} + X_j \gamma_{1i}, \sigma^2),$$

where $\gamma_i = (\gamma_{0i}, \gamma_{1i})^T$ controls the growth curve for child $i$. These separate regressions are tied together in the prior,

$$\gamma_i \sim \text{Normal}(\theta, \Sigma),$$

which borrows strength across children. This is called a linear mixed model: $\gamma_i$ are random effects specific to one child and $\theta$ are the fixed effects common to all children.
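The random slopes model can be sketched by simulating from it. Everything numeric below (`n_child`, the ages in `X`, `theta`, `Sigma`, `sigma2`) is a hypothetical choice for illustration and is not taken from the jaw data. The per-child least-squares fits at the end ignore the shared prior, so they show what the separate regressions look like before any borrowing of strength.

```r
set.seed(7)

# Hypothetical values, chosen only to resemble a growth-curve setting
n_child <- 20
X       <- c(8.0, 8.5, 9.0, 9.5)             # ages X_j
theta   <- c(33, 2)                          # population intercept and slope (assumed)
Sigma   <- matrix(c(4, -0.3, -0.3, 0.1), 2)  # covariance of (gamma_0i, gamma_1i) (assumed)
sigma2  <- 0.25                              # residual variance (assumed)

# Draw gamma_i ~ Normal(theta, Sigma) using the Cholesky factor of Sigma
R     <- chol(Sigma)                                 # upper triangular, t(R) %*% R = Sigma
Z     <- matrix(rnorm(2 * n_child), nrow = n_child)  # n_child x 2 standard normals
gamma <- sweep(Z %*% R, 2, theta, "+")               # row i is (gamma_0i, gamma_1i)

# Data layer: Y_ij ~ Normal(gamma_0i + X_j * gamma_1i, sigma2)
mean_Y <- gamma[, 1] + outer(gamma[, 2], X)          # n_child x length(X)
Y      <- mean_Y + matrix(rnorm(length(mean_Y), sd = sqrt(sigma2)), nrow = n_child)

# Separate least-squares fit per child, ignoring the shared prior
ols <- t(apply(Y, 1, function(y) coef(lm(y ~ X))))
colMeans(ols)  # rough estimate of theta
cov(ols)       # rough estimate of Sigma, inflated by sampling noise
```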

SLIDE 8

Bone height data

[Figure: bone height plotted against age. x-axis: Age (8.0 to 9.5); y-axis: Bone height (46 to 54).]

SLIDE 9

Prior for covariance matrix

The random-effects covariance matrix is

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix},$$

where

$\sigma_1^2$ is the variance of the intercepts across children,
$\sigma_2^2$ is the variance of the slopes across children,
$\sigma_{12}$ is the covariance between the intercepts and slopes.

Several ways to specify the prior:

1. $\sigma_1^2, \sigma_2^2 \sim \text{InverseGamma}$ and $\rho = \dfrac{\sigma_{12}}{\sigma_1 \sigma_2} \sim \text{Uniform}(-1, 1)$,
2. Inverse Wishart, which works better in higher dimensions.
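The first prior can be sketched as a single draw of $(\sigma_1^2, \sigma_2^2, \rho)$ that is then assembled into $\Sigma$. The helper `make_Sigma` and the InverseGamma hyperparameters `a` and `b` are hypothetical; the slides do not specify them.

```r
# Build the 2 x 2 covariance matrix from standard deviations and a correlation
make_Sigma <- function(sigma1, sigma2, rho) {
  s12 <- rho * sigma1 * sigma2
  matrix(c(sigma1^2, s12, s12, sigma2^2), nrow = 2)
}

set.seed(3)

# One draw from the first prior; the InverseGamma(a, b) hyperparameters are assumed
a <- 2; b <- 1
sigma1_sq <- 1 / rgamma(1, shape = a, rate = b)
sigma2_sq <- 1 / rgamma(1, shape = a, rate = b)
rho       <- runif(1, -1, 1)

Sigma <- make_Sigma(sqrt(sigma1_sq), sqrt(sigma2_sq), rho)
Sigma

# The determinant is sigma1^2 * sigma2^2 * (1 - rho^2), positive whenever |rho| < 1
det(Sigma) > 0
```

In two dimensions any $|\rho| < 1$ yields a valid covariance matrix. With three or more random effects, independent uniform priors on the pairwise correlations no longer guarantee positive definiteness, which is one reason the Inverse Wishart is preferred there.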

SLIDE 10

Inverse Wishart distribution

The Inverse Wishart distribution is the conjugate prior for a $p \times p$ covariance matrix of a multivariate normal distribution. It reduces to the (univariate) inverse gamma distribution if $p = 1$.

$\Sigma \sim \text{InverseWishart}(\nu, S)$ implies $\Sigma^{-1} \sim \text{Wishart}(\nu, S^{-1})$, where the hyperparameter $\nu > p - 1$ is the degrees of freedom and $S$ is a $p \times p$ positive-definite scale matrix.

The Inverse Wishart PDF is

$$f(\Sigma) \propto |\Sigma|^{-(\nu + p + 1)/2} \exp\left\{-\frac{1}{2}\,\text{trace}\left(S \Sigma^{-1}\right)\right\}$$

and the mean, which exists for $\nu > p + 1$, is

$$E(\Sigma) = \frac{1}{\nu - p - 1} S.$$
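The relationship between the Wishart and Inverse Wishart gives a simple way to sample $\Sigma$ in base R: draw a precision matrix with `rWishart` and invert it. The sketch below checks the stated mean by Monte Carlo; the values of `nu` and `S` are hypothetical.

```r
set.seed(11)

p  <- 2
nu <- 10                            # degrees of freedom (assumed value)
S  <- matrix(c(2, 0.5, 0.5, 1), p)  # scale matrix (assumed value)

# Sigma ~ InverseWishart(nu, S)  <=>  Sigma^{-1} ~ Wishart(nu, S^{-1})
W     <- rWishart(20000, df = nu, Sigma = solve(S))  # p x p x 20000 array of precision draws
draws <- apply(W, 3, solve)                          # each column is one vectorized Sigma draw

mc_mean <- matrix(rowMeans(draws), p)  # Monte Carlo estimate of E(Sigma)
theory  <- S / (nu - p - 1)            # mean formula from the slide

round(mc_mean, 3)
round(theory, 3)
```

The two printed matrices should agree up to Monte Carlo error.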

SLIDE 11

Full conditional distributions

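The hierarchical model is then:

$$\begin{aligned}
Y_{ij} &\sim \text{Normal}(\gamma_{0i} + X_j \gamma_{1i}, \sigma^2) \\
\gamma_i &\sim \text{Normal}(\theta, \Sigma) \\
f(\theta) &\propto 1 \\
\sigma^2 &\sim \text{InverseGamma}(a, b) \\
\Sigma &\sim \text{InverseWishart}(\nu, S)
\end{aligned}$$

The full conditionals are all conjugate!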
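For reference, the conjugate full conditionals can be written out as follows. This is a sketch under assumptions the slide does not state: there are $n$ children, each measured at the same $m$ ages; $\mathbf{X}$ is the $m \times 2$ design matrix with rows $(1, X_j)$; $\mathbf{Y}_i$ is the vector of measurements for child $i$; and InverseGamma$(a, b)$ uses the shape and scale parameterization.

```latex
\begin{aligned}
\gamma_i \mid \cdot &\sim \text{Normal}\!\left(V\left(\tfrac{1}{\sigma^2}\mathbf{X}^\top \mathbf{Y}_i + \Sigma^{-1}\theta\right),\; V\right),
  \qquad V = \left(\tfrac{1}{\sigma^2}\mathbf{X}^\top \mathbf{X} + \Sigma^{-1}\right)^{-1} \\
\theta \mid \cdot &\sim \text{Normal}\!\left(\bar{\gamma},\; \tfrac{1}{n}\Sigma\right),
  \qquad \bar{\gamma} = \tfrac{1}{n}\sum_{i=1}^{n} \gamma_i \\
\sigma^2 \mid \cdot &\sim \text{InverseGamma}\!\left(a + \tfrac{nm}{2},\;
  b + \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(Y_{ij} - \gamma_{0i} - X_j\gamma_{1i}\right)^2\right) \\
\Sigma \mid \cdot &\sim \text{InverseWishart}\!\left(\nu + n,\;
  S + \sum_{i=1}^{n} (\gamma_i - \theta)(\gamma_i - \theta)^\top\right)
\end{aligned}
```

Each line is a standard distribution, so a Gibbs sampler can cycle through them with direct draws and no tuning.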
