Hierarchical models (cont.) Dr. Jarad Niemi, STAT 544, Iowa State University



SLIDE 1

Hierarchical models (cont.)

  • Dr. Jarad Niemi

STAT 544 - Iowa State University

February 21, 2019

Jarad Niemi (STAT544@ISU) Hierarchical models (cont.) February 21, 2019 1 / 21

SLIDE 2

Outline

Theoretical justification for hierarchical models

  • Exchangeability
  • de Finetti’s theorem
  • Application to hierarchical models

Normal hierarchical model

  • Posterior
  • Simulation study
  • Shrinkage

SLIDE 3

Theoretical justification for hierarchical models Exchangeability

Exchangeability

Definition: The set $Y_1, Y_2, \ldots, Y_n$ is exchangeable if the joint probability $p(y_1, \ldots, y_n)$ is invariant to permutation of the indices. That is, for any permutation $\pi$,

$$p(y_1, \ldots, y_n) = p(y_{\pi_1}, \ldots, y_{\pi_n}).$$

An exchangeable but not iid example: Consider an urn with one red ball and one blue ball, each with probability 1/2 of being drawn first. Draw twice without replacement from the urn. Let $Y_i = 1$ if the $i$th ball is red and $Y_i = 0$ otherwise. Since $P(Y_1 = 1, Y_2 = 0) = P(Y_1 = 0, Y_2 = 1) = 1/2$, $Y_1$ and $Y_2$ are exchangeable. But $0 = P(Y_2 = 1|Y_1 = 1) \ne P(Y_2 = 1) = 1/2$, and thus $Y_1$ and $Y_2$ are not independent.
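The urn example can be checked numerically. The following R sketch stores the joint probability table of (Y1, Y2) and verifies both claims; the object names (`joint`, `is_exchangeable`, `p_y2`, `p_y2_given_y1`) are illustrative and do not come from the slides.

```r
# Joint probabilities P(Y1 = a, Y2 = b) for two draws without replacement
# from an urn holding one red ball (coded 1) and one blue ball (coded 0).
joint <- matrix(c(0,   1/2,
                  1/2, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y1 = c("0", "1"), Y2 = c("0", "1")))

# Exchangeable: swapping the roles of Y1 and Y2 leaves the table unchanged.
is_exchangeable <- all(joint == t(joint))

# Not independent: P(Y2 = 1 | Y1 = 1) differs from the marginal P(Y2 = 1).
p_y2          <- unname(colSums(joint)["1"])
p_y2_given_y1 <- joint["1", "1"] / sum(joint["1", ])
```

With only two variables, permutation invariance reduces to symmetry of the joint table, which is why a transpose comparison suffices here.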

SLIDE 4

Theoretical justification for hierarchical models Exchangeability

Exchangeability

Theorem: All independent and identically distributed random variables are exchangeable.

Proof: Let $y_i \stackrel{iid}{\sim} p(y)$, then

$$p(y_1, \ldots, y_n) = \prod_{i=1}^n p(y_i) = \prod_{i=1}^n p(y_{\pi_i}) = p(y_{\pi_1}, \ldots, y_{\pi_n}).$$

Definition: The sequence $Y_1, Y_2, \ldots$ is infinitely exchangeable if, for any $n$, $Y_1, Y_2, \ldots, Y_n$ are exchangeable.

SLIDE 5

Theoretical justification for hierarchical models de Finetti’s theorem

de Finetti’s theorem

Theorem: A sequence of random variables $(y_1, y_2, \ldots)$ is infinitely exchangeable iff, for all $n$,

$$p(y_1, y_2, \ldots, y_n) = \int \prod_{i=1}^n p(y_i|\theta)\, P(d\theta)$$

for some measure $P$ on $\theta$. If the distribution on $\theta$ has a density, we can replace $P(d\theta)$ with $p(\theta)\,d\theta$.

This means that there must exist a parameter $\theta$, a likelihood $p(y|\theta)$ such that $y_i \stackrel{ind}{\sim} p(y|\theta)$, and a distribution $P$ on $\theta$.
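As a concrete instance of the mixture representation, the R sketch below assumes binary data with a Bernoulli likelihood and a Unif(0, 1) distribution on theta. These choices are illustrative assumptions, not part of the theorem, and `joint_prob` is a hypothetical helper. It integrates the product of likelihood terms over theta and shows that the joint probability does not depend on the order of the observations, even though the observations are not independent.

```r
# Joint probability of a binary vector y under y_i | theta ~ Bernoulli(theta)
# and theta ~ Unif(0, 1), computed by numerical integration over theta.
joint_prob <- function(y) {
  integrand <- function(theta) {
    sapply(theta, function(th) prod(th^y * (1 - th)^(1 - y)))
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

p_101 <- joint_prob(c(1, 0, 1))
p_011 <- joint_prob(c(0, 1, 1))   # same values in a permuted order
p_11  <- joint_prob(c(1, 1))      # P(y1 = 1, y2 = 1)
p_1   <- joint_prob(1)            # P(y1 = 1)
```

Here `p_101` and `p_011` agree (exchangeability), while `p_11` differs from `p_1^2` (marginal dependence induced by integrating out theta).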

SLIDE 6

Theoretical justification for hierarchical models Hierarchical models

Application to hierarchical models

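Assume $(y_1, y_2, \ldots)$ are infinitely exchangeable. Then, by de Finetti’s theorem applied to the $(y_1, \ldots, y_n)$ that you actually observed, there exist a parameter $\theta$, a distribution $p(y|\theta)$ such that $y_i \stackrel{ind}{\sim} p(y|\theta)$, and a distribution $P$ on $\theta$.

Assume $\theta = (\theta_1, \theta_2, \ldots)$ with the $\theta_i$ infinitely exchangeable. By de Finetti’s theorem applied to $(\theta_1, \ldots, \theta_n)$, there exist a parameter $\phi$, a distribution $p(\theta|\phi)$ such that $\theta_i \stackrel{ind}{\sim} p(\theta|\phi)$, and a distribution $P$ on $\phi$.

Finally, assume $\phi$ is a single parameter that is not decomposed further, with $\phi \sim p(\phi)$.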

SLIDE 7

Theoretical justification for hierarchical models Covariate information

Exchangeability with covariates

Suppose we observe a response $y_i$ and covariates $x_i$ for each unit $i$. Now we assume $(y_1, y_2, \ldots)$ are infinitely exchangeable given $x$. Then, by de Finetti’s theorem applied to $(y_1, \ldots, y_n)$, there exist a parameter $\theta$, a distribution $p(y|\theta, x)$ such that $y_i \stackrel{ind}{\sim} p(y|\theta, x_i)$, and a distribution $P$ on $\theta$ given $x$.

Assume $\theta = (\theta_1, \theta_2, \ldots)$ with the $\theta_i$ infinitely exchangeable given $x$. By de Finetti’s theorem applied to $(\theta_1, \ldots, \theta_n)$, there exist a parameter $\phi$, a distribution $p(\theta|\phi, x)$ such that $\theta_i \stackrel{ind}{\sim} p(\theta|\phi, x_i)$, and a distribution $P$ on $\phi$ given $x$.

Finally, assume $\phi$ is a single parameter that is not decomposed further, with $\phi \sim p(\phi|x)$.

SLIDE 8

Summary

Summary

Hierarchical model:

$$y_i \stackrel{ind}{\sim} p(y|\theta_i), \qquad \theta_i \stackrel{ind}{\sim} p(\theta|\phi), \qquad \phi \sim p(\phi)$$

Hierarchical linear model:

$$y_i \stackrel{ind}{\sim} p(y|\theta_i, x_i), \qquad \theta_i \stackrel{ind}{\sim} p(\theta|\phi, x_i), \qquad \phi \sim p(\phi|x)$$

Although hierarchical models are typically written using the conditional independence notation above, the assumptions underlying the model are exchangeability and functional forms for the priors.
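The three-level structure can be read as a recipe for generating data from the top down. The R sketch below is a minimal illustration under assumed functional forms that the slide does not specify: phi = (mu, tau) with a normal prior on mu and a half-Cauchy prior on tau, normal unit-level parameters, and normal observations with unit variance. All object names are hypothetical.

```r
set.seed(544)
n_units <- 8

# Top level: hyperparameter phi = (mu, tau); these priors are assumptions.
mu  <- rnorm(1, mean = 0, sd = 10)
tau <- abs(rcauchy(1))                  # half-Cauchy draw

# Middle level: unit-specific parameters, independent given phi.
theta <- rnorm(n_units, mean = mu, sd = tau)

# Bottom level: one observation per unit, independent given theta_i.
y <- rnorm(n_units, mean = theta, sd = 1)
```

Changing any of the three distributions changes the functional forms only; the conditional independence structure stays the same.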

SLIDE 9

Normal hierarchical models

Normal hierarchical models

Suppose we have the following model

$$y_{ij} \stackrel{ind}{\sim} N(\theta_i, \sigma^2), \qquad \theta_i \stackrel{iid}{\sim} N(\mu, \tau^2)$$

with $j = 1, \ldots, n_i$, $i = 1, \ldots, I$, and $n = \sum_{i=1}^I n_i$. This is a normal hierarchical model.

Make the following assumptions for computational reasons:

  • Let $\sigma^2 = s^2$ be known.
  • Assume $p(\mu, \tau) \propto p(\mu|\tau)p(\tau) \propto p(\tau)$, i.e. assume an improper uniform prior on $\mu$.

SLIDE 10

Normal hierarchical models Posterior

Posterior distribution

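The posterior is

$$p(\theta, \mu, \tau|y) \propto p(y|\theta)\,p(\theta|\mu, \tau)\,p(\mu|\tau)\,p(\tau),$$

but the decomposition

$$p(\theta, \mu, \tau|y) = p(\theta|\mu, \tau, y)\,p(\mu|\tau, y)\,p(\tau|y),$$

where

$$\begin{aligned}
p(\theta|\mu, \tau, y) &\propto p(y|\theta)\,p(\theta|\mu, \tau) \\
p(\mu|\tau, y) &\propto \int p(y|\theta)\,p(\theta|\mu, \tau)\,d\theta \; p(\mu|\tau) \\
p(\tau|y) &\propto \int\!\!\int p(y|\theta)\,p(\theta|\mu, \tau)\,p(\mu|\tau)\,d\theta\,d\mu \; p(\tau),
\end{aligned}$$

will aid computation via

  • 1. $\tau^{(k)} \sim p(\tau|y)$
  • 2. $\mu^{(k)} \sim p(\mu|\tau^{(k)}, y)$
  • 3. $\theta_i^{(k)} \sim p(\theta_i|\mu^{(k)}, \tau^{(k)}, y)$ for $i = 1, \ldots, I$.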

SLIDE 11

Normal hierarchical models Posterior

Posterior distributions

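The necessary conditional and marginal posteriors are presented in Section 5.4 of BDA. Let

$$\bar{y}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} \qquad \text{and} \qquad s_i^2 = s^2/n_i.$$

Then

$$p(\tau|y) \propto p(\tau)\, V_\mu^{1/2} \prod_{i=1}^I (s_i^2 + \tau^2)^{-1/2} \exp\left( -\frac{(\bar{y}_{i\cdot} - \hat{\mu})^2}{2(s_i^2 + \tau^2)} \right)$$

$$\mu|\tau, y \sim N(\hat{\mu}, V_\mu), \qquad \theta_i|\mu, \tau, y \sim N(\hat{\theta}_i, V_i)$$

where

$$V_\mu^{-1} = \sum_{i=1}^I \frac{1}{s_i^2 + \tau^2}, \qquad \hat{\mu} = V_\mu \sum_{i=1}^I \frac{\bar{y}_{i\cdot}}{s_i^2 + \tau^2}$$

$$V_i^{-1} = \frac{1}{s_i^2} + \frac{1}{\tau^2}, \qquad \hat{\theta}_i = V_i \left( \frac{\bar{y}_{i\cdot}}{s_i^2} + \frac{\mu}{\tau^2} \right)$$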
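These formulas translate fairly directly into R. The sketch below is one possible transcription, not code from the course: the group means in `ybar` are made-up values, and the function names (`V_mu`, `mu_hat`, `log_post_tau`, `draw_mu_theta`) are hypothetical. It assumes tau > 0 and a Ca+(0, 1) prior on tau, and it works on the log scale for numerical stability.

```r
# Hypothetical sufficient statistics, for illustration only.
ybar <- c(-0.3, 0.1, 0.4, 0.0, -0.2)    # group sample means
s2_i <- rep(1 / 9, length(ybar))        # s^2 / n_i with s = 1 and n_i = 9

# V_mu and mu_hat as functions of tau.
V_mu   <- function(tau) 1 / sum(1 / (s2_i + tau^2))
mu_hat <- function(tau) V_mu(tau) * sum(ybar / (s2_i + tau^2))

# Unnormalized log p(tau | y); the half-Cauchy constant is dropped.
log_post_tau <- function(tau) {
  v <- s2_i + tau^2
  dcauchy(tau, location = 0, scale = 1, log = TRUE) +
    0.5 * log(V_mu(tau)) +
    sum(-0.5 * log(v) - (ybar - mu_hat(tau))^2 / (2 * v))
}

# Draw mu | tau, y and then theta_i | mu, tau, y for a given tau > 0.
draw_mu_theta <- function(tau) {
  mu        <- rnorm(1, mean = mu_hat(tau), sd = sqrt(V_mu(tau)))
  V_i       <- 1 / (1 / s2_i + 1 / tau^2)
  theta_hat <- V_i * (ybar / s2_i + mu / tau^2)
  theta     <- rnorm(length(ybar), mean = theta_hat, sd = sqrt(V_i))
  list(mu = mu, theta = theta)
}

set.seed(1)
draw <- draw_mu_theta(tau = 0.5)
```

A full posterior sample would first draw tau from p(tau | y), for instance by evaluating `exp(log_post_tau(tau))` on a grid, and then call `draw_mu_theta` once per sampled tau.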

SLIDE 12

Normal hierarchical models Simulation study

Simulation study

Common to both simulation scenarios:

  • $I = 10$
  • $n_i = 9$ for all $i$
  • $s = 1$, thus $s_i = 1/3$ for all $i$

Scenarios:

  • 1. Common mean: $\theta_i = 0$ for all $i$
  • 2. Group-specific means: $\theta_i = i - (I/2 + .5)$

Use $\tau \sim Ca^+(0, 1)$.

SLIDE 13

Normal hierarchical models Simulation study

Simulation study

J = 10                                  # number of groups (I on the previous slide)
n_per_group = 9                         # observations per group
n = rep(n_per_group, J)
sigma = 1                               # known data standard deviation
N = sum(n)                              # total number of observations
group = rep(1:J, each = n_per_group)    # group label for each observation
set.seed(1)
df = rbind(
  # All means are the same
  data.frame(group = factor(group), simulation = "common_mean",
             y = rnorm(N)),
  # Each group has its own mean
  data.frame(group = factor(group), simulation = "group_specific_mean",
             y = rnorm(N, group - (J/2 + .5))))

SLIDE 14

Normal hierarchical models Simulation study

[Figure: simulated observations y plotted by group (1 to 10), with one panel per simulation: common_mean and group_specific_mean.]

SLIDE 15

Normal hierarchical models Simulation study

Summary statistics

    simulation           group  n   mean    sd
 1  common_mean              1  9   0.18  0.81
 2  common_mean              2  9   0.09  1.11
 3  common_mean              3  9   0.18  0.91
 4  common_mean              4  9  -0.19  0.89
 5  common_mean              5  9   0.17  0.62
 6  common_mean              6  9   0.02  0.70
 7  common_mean              7  9   0.61  1.14
 8  common_mean              8  9   0.14  1.19
 9  common_mean              9  9  -0.31  0.60
10  common_mean             10  9   0.20  0.81
11  group_specific_mean      1  9  -4.32  1.10
12  group_specific_mean      2  9  -3.40  0.88
13  group_specific_mean      3  9  -2.41  0.89
14  group_specific_mean      4  9  -1.38  0.60
15  group_specific_mean      5  9  -0.76  0.61
16  group_specific_mean      6  9  -0.16  0.95
17  group_specific_mean      7  9   1.21  1.12
18  group_specific_mean      8  9   2.23  1.15
19  group_specific_mean      9  9   3.97  1.26
20  group_specific_mean     10  9   5.08  0.77

SLIDE 16

Normal hierarchical models Sampling on a grid

Sampling on a grid

Consider sampling from an arbitrary unnormalized density $f(\tau) \propto p(\tau|y)$ using the following approach:

  • 1. Construct a step-function approximation to this density:
      • a. Determine an interval $[L, U]$ such that $f(\tau)$ is small outside this interval.
      • b. Set an interval half-width $h$ to generate a grid of $M$ points $(x_1, \ldots, x_M)$ in this interval, i.e. $x_1 = L + h$ and $x_m = x_{m-1} + 2h$ for all $1 < m \le M$.
      • c. Evaluate the density on this grid, i.e. $f(x_m)$.
      • d. Normalize the interval weights, i.e. $w_m = f(x_m) / \sum_{i=1}^M f(x_i)$ (to construct a normalized density, divide each $w_m$ by $2h$).
  • 2. Sample from this approximation:
      • a. Sample an interval $m$ with probability $w_m$.
      • b. Sample uniformly within this interval, i.e. $\tau \sim Unif(x_m - h, x_m + h)$.
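The two steps above can be sketched in a few lines of R. The function `sample_grid` below is a hypothetical helper, not code from the course. It takes the number of grid points M as input and derives the half-width h from it, which is one of several reasonable conventions. The target used to exercise it, an unnormalized Gamma(2, 1) density, is chosen only because its mean (2) is known.

```r
# Sample from an unnormalized density f via a step-function approximation.
sample_grid <- function(f, L, U, M, n_draws) {
  h <- (U - L) / (2 * M)                 # interval half-width
  x <- L + h + 2 * h * (0:(M - 1))       # midpoints: x_1 = L + h, x_m = x_{m-1} + 2h
  w <- sapply(x, f)                      # evaluate the density on the grid
  w <- w / sum(w)                        # normalized interval weights
  m <- sample.int(M, size = n_draws, replace = TRUE, prob = w)  # pick intervals
  runif(n_draws, min = x[m] - h, max = x[m] + h)                # uniform within
}

f <- function(tau) tau * exp(-tau)       # unnormalized Gamma(2, 1) density
set.seed(1)
draws <- sample_grid(f, L = 0, U = 20, M = 1000, n_draws = 10000)
```

The approximation error depends on both the interval and the grid spacing: mass outside [L, U] is lost entirely, and a coarse grid flattens sharp features within each interval.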

SLIDE 17

Normal hierarchical models Sampling on a grid

Approximation to p(τ|y) when τ ∼ Ca+(0, 1)

[Figure: step-function approximation to p(τ|y) on a grid from 0 to 10, with one panel per simulation (common_mean, group_specific_mean); x-axis: x − h, y-axis: p.]

SLIDE 18

Normal hierarchical models Sampling on a grid

Hyperparameters: group-to-group mean variability

Recall $\theta_i \stackrel{ind}{\sim} N(\mu, \tau^2)$:

[Figure: posterior densities of the hyperparameters mu and tau for the common_mean and group_specific_mean simulations; x-axis: value, y-axis: density.]

SLIDE 19

Normal hierarchical models Sampling on a grid

Group-specific means

Recall $\theta_i \stackrel{ind}{\sim} N(\mu, \tau^2)$:

[Figure: posterior densities of theta.1 through theta.10 under the hierarchical and independent models, for the common_mean and group_specific_mean simulations; x-axis: value, y-axis: density, color: variable.]

SLIDE 20

Normal hierarchical models Summary

Extensions

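Unknown data variance:

  • $y_{ij} \sim N(\theta_i, \sigma^2)$, $\theta_i \sim N(\mu, \tau^2)$, or
  • $y_{ij} \sim N(\theta_i, \sigma^2)$, $\theta_i \sim N(\mu, \sigma^2\tau^2)$

Alternative distributions:

  • Heavy-tailed: $y_{ij} \sim N(\theta_i, \sigma^2)$, $\theta_i \sim t_\nu(\mu, \tau^2)$
  • Peak at zero: $y_{ij} \sim N(\theta_i, \sigma^2)$, $\theta_i \sim Laplace(\mu, \tau^2)$
  • Point mass at zero: $y_{ij} \sim N(\theta_i, \sigma^2)$, $\theta_i \sim \pi\delta_0 + (1 - \pi)N(\mu, \tau^2)$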
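The point-mass notation may be easier to read as a sampling recipe: with probability pi the parameter is exactly zero, and otherwise it is drawn from the normal component. The R sketch below draws from such a prior; `r_point_mass` and its argument names are hypothetical, and the parameter values are chosen only for illustration.

```r
# Draw n values from pi0 * (point mass at 0) + (1 - pi0) * N(mu, tau^2).
r_point_mass <- function(n, pi0, mu, tau) {
  is_zero <- rbinom(n, size = 1, prob = pi0) == 1   # which draws are exact zeros
  ifelse(is_zero, 0, rnorm(n, mean = mu, sd = tau)) # zeros or normal draws
}

set.seed(1)
theta <- r_point_mass(1000, pi0 = 0.5, mu = 3, tau = 1)
prop_zero <- mean(theta == 0)   # should be close to pi0
```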

SLIDE 21

Normal hierarchical models Summary

Summary

Hierarchical models

  • allow the data to inform us about similarities across groups
  • provide data-driven shrinkage toward a grand mean
      • lots of shrinkage when means are similar
      • little shrinkage when means are different

Computation used the decomposition

$$p(\theta, \mu, \tau|y) = p(\theta|\mu, \tau, y)\,p(\mu|\tau, y)\,p(\tau|y),$$

which allowed for simulation from $\tau$, then $\mu$, and then $\theta$ to obtain samples from the posterior.
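The data-driven shrinkage described in this summary can be made explicit by rewriting the conditional posterior mean of a group mean as a weighted average of the group sample mean and mu, with weight B_i = s_i^2 / (s_i^2 + tau^2) on mu. This follows algebraically from the expressions for V_i and the conditional mean given earlier. The R sketch below uses a hypothetical helper `shrink` to show how the weight changes with tau when s_i^2 = 1/9, as in the simulation study.

```r
# Conditional posterior mean of theta_i as a weighted average of the
# group sample mean ybar_i and the grand mean mu.
shrink <- function(ybar_i, mu, s2_i, tau) {
  B <- s2_i / (s2_i + tau^2)     # weight on the grand mean mu
  (1 - B) * ybar_i + B * mu
}

large_tau <- shrink(ybar_i = 2, mu = 0, s2_i = 1 / 9, tau = 1)    # B = 0.1
small_tau <- shrink(ybar_i = 2, mu = 0, s2_i = 1 / 9, tau = 0.1)  # B is about 0.92
```

When the data suggest similar group means, the posterior for tau concentrates near zero, B approaches 1, and estimates are pulled strongly toward mu; when group means clearly differ, tau is large and the estimates stay close to the group sample means.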
