Hierarchical Models
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics Loyola University Chicago
November 9, 2017
Hierarchical Models Last edited November 9, 2017 by <ebalderama@luc.edu>
The hierarchical modeling framework is popular in the Bayesian literature because MCMC is well suited to fitting hierarchical models. Bayesian models can be written with the following basic layers of hierarchy:
1. Data layer: [Y | θ, α] is the likelihood for the observed data Y.
2. Process layer: [θ | α] is the model for the parameters θ that define the latent data generating process.
3. Prior layer: [α] gives the priors for the hyperparameters.
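Multiplying the three layers gives the joint distribution, and the posterior is proportional to that product. Written with densities p(·) in place of the bracket notation (the underbrace labels are only annotations):

```latex
p(\theta, \alpha \mid Y)
  \;\propto\;
  \underbrace{p(Y \mid \theta, \alpha)}_{\text{data layer}}
  \;\underbrace{p(\theta \mid \alpha)}_{\text{process layer}}
  \;\underbrace{p(\alpha)}_{\text{prior layer}}
```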
Example: One-way random effects model
Yij ∼ Normal(θi, σ²) and θi ∼ Normal(µ, τ²)
where Yij is the jth replicate for unit i and α = (µ, σ², τ²) has an uninformative prior. This hierarchy can be written using a directed acyclic graph (DAG), also called a Bayesian network or belief network.
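One way to read the hierarchy is top down, as a recipe for generating data. The sketch below, in Python with only the standard library, assumes the usual one-way form Yij ∼ Normal(θi, σ²) with θi ∼ Normal(µ, τ²). The function name, the parameter values, and the seed are all illustrative choices, not part of the slides.

```python
import random


def simulate_one_way(n_units, n_reps, mu, sigma2, tau2, seed=0):
    """Forward-simulate the one-way random effects model.

    Process layer: theta_i ~ Normal(mu, tau2)
    Data layer:    Y_ij    ~ Normal(theta_i, sigma2)
    Variances are passed in, so standard deviations are their square roots.
    """
    rng = random.Random(seed)
    theta = [rng.gauss(mu, tau2 ** 0.5) for _ in range(n_units)]
    Y = [[rng.gauss(t, sigma2 ** 0.5) for _ in range(n_reps)] for t in theta]
    return theta, Y


# Illustrative values only: 5 units, 4 replicates each.
theta, Y = simulate_one_way(n_units=5, n_reps=4, mu=10.0, sigma2=1.0, tau2=4.0)
for i, row in enumerate(Y):
    print(f"unit {i}: theta = {theta[i]:.2f}, Y = {[round(y, 2) for y in row]}")
```

The hyperparameters α = (µ, σ², τ²) are held fixed here; in the full Bayesian model they would be drawn from the prior layer first.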
MCMC is efficient even if the number of parameters or levels of hierarchy is large. You only need to consider connected nodes when updating each parameter, e.g.,
1. [θi | ·]
2. [µ | ·]
3. [σ² | ·]
4. [τ² | ·]
Each of these updates is a draw from a standard one-dimensional normal or inverse gamma distribution.
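The four updates can be sketched as a small Gibbs sampler. This is a minimal illustration in standard-library Python, not the course's own code. It assumes the one-way model Yij ∼ Normal(θi, σ²), θi ∼ Normal(µ, τ²), a flat prior on µ, InverseGamma(a, b) priors (shape a, scale b) on σ² and τ², and balanced data. The values a = b = 0.1, the simulated data, and all function names are hypothetical choices.

```python
import random


def inv_gamma(rng, shape, scale):
    """InverseGamma(shape, scale) draw: reciprocal of a Gamma(shape, rate=scale) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def gibbs_one_way(Y, n_iter=2000, a=0.1, b=0.1, seed=1):
    """Gibbs sampler for the one-way random effects model (balanced data)."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    ybar = [sum(row) / m for row in Y]
    theta, mu, sigma2, tau2 = ybar[:], sum(ybar) / n, 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # 1. theta_i | . is normal: precision-weighted mix of unit mean and mu
        prec = m / sigma2 + 1.0 / tau2
        theta = [rng.gauss((m * yb / sigma2 + mu / tau2) / prec, prec ** -0.5)
                 for yb in ybar]
        # 2. mu | . is normal around the average theta (flat prior on mu)
        mu = rng.gauss(sum(theta) / n, (tau2 / n) ** 0.5)
        # 3. sigma2 | . is inverse gamma, using within-unit squared errors
        sse = sum((y - theta[i]) ** 2 for i in range(n) for y in Y[i])
        sigma2 = inv_gamma(rng, a + n * m / 2.0, b + sse / 2.0)
        # 4. tau2 | . is inverse gamma, using the spread of theta around mu
        sst = sum((t - mu) ** 2 for t in theta)
        tau2 = inv_gamma(rng, a + n / 2.0, b + sst / 2.0)
        draws.append((mu, sigma2, tau2))
    return draws


# Simulated data for illustration: 20 units, 5 replicates, mu=10, sigma2=1, tau2=4.
data_rng = random.Random(42)
true_theta = [data_rng.gauss(10.0, 2.0) for _ in range(20)]
Y = [[data_rng.gauss(t, 1.0) for _ in range(5)] for t in true_theta]
grand_mean = sum(sum(row) for row in Y) / (20 * 5)

draws = gibbs_one_way(Y)
kept = draws[500:]  # discard the first 500 iterations as burn-in
mu_hat = sum(d[0] for d in kept) / len(kept)
sigma2_hat = sum(d[1] for d in kept) / len(kept)
tau2_hat = sum(d[2] for d in kept) / len(kept)
print(f"posterior means: mu = {mu_hat:.2f}, sigma2 = {sigma2_hat:.2f}, tau2 = {tau2_hat:.2f}")
```

Each step touches only the quantities connected to that parameter in the DAG: the θi update uses unit i's data plus (µ, σ², τ²), while the µ and τ² updates use only the θi and never look at Y directly.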
Example: Ozone measurements
bayes_resources/data/ozone.csv")
For the ozone measurement data, we fit the model
Yij ∼ Normal(µ + αi + γj, σ²)
where µ is the overall mean, αi is the random effect for spatial location i, and γj is the random effect for day j.
The likelihood model is Yij ∼ Normal(µ + αi + γj, σ²).
Priors for the fixed effects model: αi ∼ Normal(…) and γj ∼ Normal(…)
Priors for the random effects model: αi ∼ Normal(0, σ_a²) and γj ∼ Normal(0, σ_g²)
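To see what the random effects prior does, it helps to write out one full conditional. The derivation below is a sketch under stated assumptions: the likelihood is Yij ∼ Normal(µ + αi + γj, σ²), the prior is αi ∼ Normal(0, σ_a²), and every location is observed on all J days (a balanced design, which the slides do not state).

```latex
% Residual for location i on day j after removing the mean and day effect
r_{ij} = Y_{ij} - \mu - \gamma_j, \qquad \bar r_i = \frac{1}{J} \sum_{j=1}^{J} r_{ij}

% Full conditional for the location effect
\alpha_i \mid \cdot \;\sim\; \text{Normal}\!\left(
    w \, \bar r_i ,\;
    \frac{1}{J/\sigma^2 + 1/\sigma_a^2}
\right),
\qquad
w = \frac{J/\sigma^2}{J/\sigma^2 + 1/\sigma_a^2} \in (0, 1)
```

The weight w shrinks each location's average residual toward zero. As σ_a² grows, w approaches 1 and the update behaves like a weakly regularized fixed effect; when σ_a² is learned from the data, the amount of shrinkage is estimated rather than set in advance.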
Example: Jaw bone height data
load(url("http://math.luc.edu/~ebalderama/bayes_resources/data/jaw.RData"))
Let Yij be the bone height for child i at age Xj. We may specify a different regression for each child to capture variability over the population of children:
Yij ∼ Normal(γ0i + γ1i Xj, σ²)
where γi = (γ0i, γ1i)ᵀ controls the growth curve for child i. These separate regressions are tied together in the prior, γi ∼ Normal(θ, Σ), which borrows strength across children. This is called a linear mixed model: the γi are random effects specific to one child, and θ holds the fixed effects common to all children.
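As a point of comparison, the "different regression for each child" idea can be sketched without any pooling: one least squares line per child. This is not the hierarchical fit itself, only the separate regressions that the prior γi ∼ Normal(θ, Σ) would tie together. The sketch is standard-library Python; the ages and heights are invented for illustration and are not the jaw data, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical ages and heights, made up for this example.
ages = [8.0, 8.5, 9.0, 9.5]
heights = {
    "child_1": [47.8, 48.8, 49.0, 49.7],
    "child_2": [46.4, 47.3, 47.7, 48.4],
    "child_3": [50.1, 51.0, 52.2, 52.9],
}

# One (intercept, slope) pair per child: a no-pooling analogue of gamma_i.
fits = {child: fit_line(ages, y) for child, y in heights.items()}
for child, (intercept, slope) in fits.items():
    print(f"{child}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

With only four observations per child, each separate fit is noisy. The hierarchical prior addresses this by pulling each child's coefficients toward the population mean θ.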
[Figure: bone height versus age. Horizontal axis: Age (8.0 to 9.5). Vertical axis: Bone height (46 to 54).]
The random-effects covariance matrix is
Σ = [ σ₁²  σ₁₂ ]
    [ σ₁₂  σ₂² ]
where
σ₁² is the variance of the intercepts across children,
σ₂² is the variance of the slopes across children,
σ₁₂ is the covariance between the intercepts and slopes.
Several ways to specify the prior:
1. σ₁², σ₂² ∼ InverseGamma and ρ = σ₁₂/(σ₁σ₂) ∼ Uniform(−1, 1),
2. Inverse Wishart, which works better in higher dimensions.
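The first option works because any pair of positive standard deviations and a correlation strictly between −1 and 1 gives a valid 2 × 2 covariance matrix. A small standard-library Python sketch of that mapping follows; the function names and the numeric values are illustrative only.

```python
def build_covariance(sigma1, sigma2, rho):
    """Assemble the 2x2 covariance matrix from two SDs and a correlation."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    cov12 = rho * sigma1 * sigma2  # sigma_12 = rho * sigma_1 * sigma_2
    return [[sigma1 ** 2, cov12], [cov12, sigma2 ** 2]]


def det_2x2(M):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


Sigma = build_covariance(sigma1=2.0, sigma2=0.5, rho=-0.3)
# Positive diagonal and positive determinant together imply positive definiteness.
# Algebraically the determinant equals sigma1^2 * sigma2^2 * (1 - rho^2).
print(Sigma, det_2x2(Sigma))
```

In higher dimensions the correlations can no longer be chosen independently of one another, which is one reason the Inverse Wishart prior is the more convenient choice there.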
The Inverse Wishart distribution is the conjugate prior for the p × p covariance matrix of a multivariate normal distribution. It reduces to the (univariate) inverse gamma distribution if p = 1.
Σ ∼ InverseWishart(ν, S) implies Σ⁻¹ ∼ Wishart(ν, S⁻¹), where the hyperparameter ν > p − 1 is the degrees of freedom and S is a p × p positive-definite scale matrix.
The Inverse Wishart PDF is
f(Σ) ∝ |Σ|^(−(ν+p+1)/2) exp{−(1/2) trace(S Σ⁻¹)}
and the mean is
E(Σ) = S / (ν − p − 1), which exists for ν > p + 1.
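The reduction to the inverse gamma can be checked directly. The sketch below takes the density as |Σ|^(−(ν+p+1)/2) exp{−(1/2) trace(S Σ⁻¹)}, sets p = 1, and writes the scalars as Σ = σ² and S = s (symbols introduced here for the example). The inverse gamma is taken in its shape and scale parameterization.

```latex
% p = 1: \Sigma = \sigma^2 and S = s are scalars
f(\sigma^2)
  \;\propto\; (\sigma^2)^{-(\nu + 2)/2} \exp\!\left\{ -\frac{s}{2\sigma^2} \right\}
  \;=\; (\sigma^2)^{-\nu/2 - 1} \exp\!\left\{ -\frac{s/2}{\sigma^2} \right\}

% This is the kernel of an inverse gamma with shape \nu/2 and scale s/2
\sigma^2 \sim \text{InverseGamma}\!\left( \frac{\nu}{2}, \frac{s}{2} \right),
\qquad
E(\sigma^2) = \frac{s/2}{\nu/2 - 1} = \frac{s}{\nu - 2}
```

The mean agrees with S / (ν − p − 1) evaluated at p = 1.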
The hierarchical model is then:
Yij ∼ Normal(γ0i + γ1i Xj, σ²)
γi ∼ Normal(θ, Σ)
f(θ) ∝ 1
σ² ∼ InverseGamma(a, b)
Σ ∼ InverseWishart(ν, S)
The full conditionals are all conjugate!
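For reference, the conjugate full conditionals can be written out as follows. This is a sketch under assumptions not stated on the slide: there are n children each measured at the same m ages, X is the m × 2 design matrix with rows (1, Xj), Yi is the vector of observations for child i with mean Xγi, and InverseGamma(a, b) uses shape a and scale b.

```latex
\begin{align*}
\gamma_i \mid \cdot &\sim \text{Normal}\!\left(
    V \left( \tfrac{1}{\sigma^2} X^{\top} Y_i + \Sigma^{-1} \theta \right),\; V
  \right),
  \qquad V = \left( \tfrac{1}{\sigma^2} X^{\top} X + \Sigma^{-1} \right)^{-1} \\
\theta \mid \cdot &\sim \text{Normal}\!\left( \bar\gamma,\; \tfrac{1}{n} \Sigma \right),
  \qquad \bar\gamma = \tfrac{1}{n} \sum_{i=1}^{n} \gamma_i \\
\sigma^2 \mid \cdot &\sim \text{InverseGamma}\!\left(
    a + \tfrac{nm}{2},\;
    b + \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m}
      \left( Y_{ij} - \gamma_{0i} - \gamma_{1i} X_j \right)^2
  \right) \\
\Sigma \mid \cdot &\sim \text{InverseWishart}\!\left(
    \nu + n,\;
    S + \sum_{i=1}^{n} (\gamma_i - \theta)(\gamma_i - \theta)^{\top}
  \right)
\end{align*}
```

A Gibbs sampler cycles through these four draws. None of them requires a Metropolis step, which is the practical payoff of the conjugate prior choices.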