SLIDE 1

Introduction to General and Generalized Linear Models

Mixed effects models - Part III

Henrik Madsen and Poul Thyregod
Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Kgs. Lyngby

January 2011

SLIDE 2

This lecture

- Bayesian interpretations
- Posterior distributions for multivariate normal distributions
- Random effects for multivariate measurements

SLIDE 3

Bayesian interpretations


In settings where $f_X(x)$ expresses a so-called "subjective probability distribution" (possibly degenerate), the expression

$$f_{X \mid Y=y}(x) = \frac{f_{Y \mid X=x}(y)\, f_X(x)}{\int f_{Y \mid X=x}(y)\, f_X(x)\, \mathrm{d}x}$$

for the conditional distribution of $X$ given $Y = y$ is termed Bayes' theorem. In such settings, the distribution $f_X(\cdot)$ of $X$ is called the prior distribution, and the conditional distribution with density function $f_{X \mid Y=y}(x)$ is called the posterior distribution after observation of $Y = y$.
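As a concrete numerical sketch (not from the slides), the posterior density can be evaluated on a grid, with the denominator integral approximated by a Riemann sum. The normal prior and likelihood below, and all names, are illustrative assumptions in Python:

```python
import numpy as np

# Hypothetical normal/normal example: prior X ~ N(0, 1), likelihood Y | X=x ~ N(x, 0.5^2).
x = np.linspace(-5.0, 5.0, 2001)                   # grid over the state x
dx = x[1] - x[0]
prior = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # f_X(x)

def posterior(y, sigma=0.5):
    """f_{X|Y=y}(x) via Bayes' theorem; the normalizing integral is a Riemann sum."""
    lik = np.exp(-0.5 * ((y - x) / sigma) ** 2)    # f_{Y|X=x}(y), constant factors cancel
    joint = lik * prior
    return joint / (joint.sum() * dx)              # divide by the integral of f_{Y|X=x} f_X

post = posterior(y=1.0)
print((x * post).sum() * dx)                       # posterior mean; the conjugate answer is 0.8
```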

SLIDE 4

Bayesian interpretations


Bayes' theorem is useful in connection with hierarchical models, where the variable $X$ denotes a non-observable state (or parameter) associated with the individual experimental object, and $Y$ denotes the observed quantities. In such situations one can often describe the conditional distribution of $Y$ for a given state ($X = x$), and one has observations from the marginal distribution of $Y$. In general it is not possible to observe the states ($x$), and therefore the distribution $f_X(x)$ is not observed directly. This situation arises in many contexts, such as hidden Markov models (HMMs) or state space models, where inference about the state ($X$) can be obtained using the so-called Kalman filter.

SLIDE 5

Bayesian interpretations

A Bayesian formulation

We will discuss the use of Bayes' theorem in situations where the "prior distribution" $f_X(x)$ has a frequency interpretation. The one-way random effects model may be formulated in a Bayesian framework. We may identify the $\mathrm{N}(\mu, \sigma_u^2)$ distribution of $\mu_i = \mu + U_i$ as the prior distribution. The statistical model for the data is such that for given $\mu_i$, the $Y_{ij}$'s are independent and $\mathrm{N}(\mu_i, \sigma^2)$-distributed. In a Bayesian framework, the conditional distribution of $\mu_i$ given $Y_i = y_i$ is termed the posterior distribution for $\mu_i$.

SLIDE 6

Bayesian interpretations

A Bayesian formulation

Theorem (The posterior distribution of $\mu_i$): Consider the one-way model with random effects

$$Y_{ij} \mid \mu_i \sim \mathrm{N}(\mu_i, \sigma^2), \qquad \mu_i \sim \mathrm{N}(\mu, \sigma_u^2)$$

where $\mu$, $\sigma^2$ and $\sigma_u^2$ are known. The posterior distribution of $\mu_i$ after observation of $y_{i1}, y_{i2}, \ldots, y_{in_i}$ is a normal distribution with mean and variance

$$\mathrm{E}[\mu_i \mid Y_i = y_i] = \frac{\mu/\sigma_u^2 + n_i \bar{y}_i/\sigma^2}{1/\sigma_u^2 + n_i/\sigma^2} = w\mu + (1 - w)\bar{y}_i$$

$$\mathrm{Var}[\mu_i \mid Y_i = y_i] = \frac{1}{1/\sigma_u^2 + n_i/\sigma^2}$$

where

$$w = \frac{1/\sigma_u^2}{n_i/\sigma^2 + 1/\sigma_u^2} = \frac{1}{1 + n_i\gamma} \qquad \text{with } \gamma = \sigma_u^2/\sigma^2.$$
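A minimal sketch of the theorem in Python (the function name, data and parameter values are our own illustrative choices):

```python
import numpy as np

def posterior_mu_i(y_i, mu, sigma2, sigma2_u):
    """Posterior mean and variance of mu_i in the one-way random effects model."""
    n_i = len(y_i)
    ybar = np.mean(y_i)
    gamma = sigma2_u / sigma2               # signal/noise ratio
    w = 1.0 / (1.0 + n_i * gamma)           # weight on the prior mean
    mean = w * mu + (1 - w) * ybar
    var = 1.0 / (1.0 / sigma2_u + n_i / sigma2)
    return mean, var

# Example: prior N(10, 4), observation variance sigma2 = 1, five observations.
m, v = posterior_mu_i([11.2, 10.8, 11.5, 10.9, 11.1], mu=10.0, sigma2=1.0, sigma2_u=4.0)
print(m, v)   # mean pulled slightly from ybar = 11.1 toward 10; variance < 1/5
```

With five observations and $\gamma = 4$, the weight on the prior is only $w = 1/21$, so the posterior mean stays close to the group average.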

SLIDE 7

Bayesian interpretations

A Bayesian formulation

We observe that the posterior mean is a weighted average of the prior mean $\mu$ and the sample result $\bar{y}_i$, with the corresponding precisions (reciprocal variances) as weights. Note that the weights only depend on the signal/noise ratio $\gamma = \sigma_u^2/\sigma^2$, and not on the numerical values of $\sigma^2$ and $\sigma_u^2$; therefore we may express the posterior mean as

$$\mathrm{E}[\mu_i \mid Y_i = y_i] = \frac{\mu/\gamma + n_i\bar{y}_i}{1/\gamma + n_i}$$

The expression for the posterior variance simplifies if we instead consider the precision, i.e. the reciprocal variance:

$$\frac{1}{\sigma^2_{\mathrm{post}}} = \frac{1}{\sigma_u^2} + \frac{n_i}{\sigma^2}$$
SLIDE 8

Bayesian interpretations

A Bayesian formulation

We have that the precision in the posterior distribution is the sum of the precision in the prior distribution and the sampling precision. In terms of the signal/noise ratio $\gamma$, with $\gamma_{\mathrm{prior}} = \sigma_u^2/\sigma^2$ and $\gamma_{\mathrm{post}} = \sigma^2_{\mathrm{post}}/\sigma^2$, we have

$$\frac{1}{\gamma_{\mathrm{post}}} = \frac{1}{\gamma_{\mathrm{prior}}} + n_i$$

and

$$\mu_{\mathrm{post}} = w\mu_{\mathrm{prior}} + (1 - w)\bar{y}_i \qquad \text{with } w = \frac{1}{1 + n_i\gamma_{\mathrm{prior}}}$$

in analogy with the BLUP estimate.
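A quick numerical check of this precision bookkeeping, with arbitrary illustrative values:

```python
# Arbitrary values: sigma2 = 1, sigma2_u = 4, n_i = 5 observations in the group.
sigma2, sigma2_u, n_i = 1.0, 4.0, 5
gamma_prior = sigma2_u / sigma2
sigma2_post = 1.0 / (1.0 / sigma2_u + n_i / sigma2)   # posterior variance
gamma_post = sigma2_post / sigma2
print(1.0 / gamma_post, 1.0 / gamma_prior + n_i)      # both 5.25
```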

SLIDE 9

Bayesian interpretations

Estimation under squared error loss

The squared error loss function measures the discrepancy between a set of estimates $d_i(y)$ and the true parameter values $\mu_i$, $i = 1, \ldots, k$, and is defined by

$$L(\mu, d(y)) = \sum_{i=1}^{k} \big(d_i(y) - \mu_i\big)^2.$$

Averaging over the distribution of $Y$ for a given value of $\mu$, we obtain the risk of using the estimator $d(Y)$ when the true parameter is $\mu$:

$$R(\mu, d(\cdot)) = \frac{1}{k}\, \mathrm{E}_{Y \mid \mu}\left[\sum_{i=1}^{k} \big(d_i(Y) - \mu_i\big)^2\right].$$

SLIDE 10

Bayesian interpretations

Estimation under squared error loss

Theorem (Risk of the ML-estimator in the one-way model): Let $d^{\mathrm{ML}}(Y)$ denote the maximum likelihood estimator for $\mu$ in the one-way model with fixed effects and $\mu$ arbitrary,

$$d_i^{\mathrm{ML}}(Y) = \bar{Y}_i = \frac{1}{n}\sum_{j=1}^{n} Y_{ij}.$$

The risk of this estimator is

$$R(\mu, d^{\mathrm{ML}}) = \frac{\sigma^2}{n}$$

regardless of the value of $\mu$.
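A Monte Carlo sketch of this result (the simulation setup and all names are our own): simulate the fixed effects model repeatedly and compare the average loss of the group averages with $\sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, sigma2 = 8, 5, 1.0
mu = rng.normal(size=k)                        # arbitrary fixed group means

reps = 20000
Y = rng.normal(mu, np.sqrt(sigma2), size=(reps, n, k))   # Y[r, j, i]
d_ml = Y.mean(axis=1)                          # group averages, shape (reps, k)
risk = np.mean(np.mean((d_ml - mu) ** 2, axis=1))
print(risk, sigma2 / n)                        # both close to 0.2
```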

SLIDE 11

Bayesian interpretations

Estimation under squared error loss

Bayes risk for the ML-estimator: Introducing the further assumption that $\mu_i$ may be considered as a random variable with the (prior) distribution $\mathrm{N}(\mu, \sigma_u^2)$ from the model above, we may determine the Bayes risk of $d^{\mathrm{ML}}(\cdot)$ under this distribution as

$$r((\mu, \gamma), d^{\mathrm{ML}}) = \mathrm{E}_{\mu}\big(R(\mu, d^{\mathrm{ML}})\big).$$

Clearly, as $R(\mu, d^{\mathrm{ML}})$ does not depend on $\mu$, the Bayes risk is

$$r((\mu, \gamma), d^{\mathrm{ML}}) = \frac{\sigma^2}{n}.$$

SLIDE 12

Bayesian interpretations

Estimation under squared error loss

The Bayes estimator $d^{\mathrm{B}}(Y)$ is the estimator that minimizes the Bayes risk,

$$d_i^{\mathrm{B}}(Y) = \mathrm{E}[\mu_i \mid Y_i].$$

It may be shown that the Bayes risk of this estimator is the posterior variance,

$$r((\mu, \gamma), d^{\mathrm{B}}) = \frac{1}{1/\sigma_u^2 + n/\sigma^2} = \frac{\sigma^2/n}{1 + 1/(n\gamma)}.$$

The Bayes risk of the Bayes estimator is less than that of the maximum likelihood estimator.
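A sketch comparing the formula with simulation, now drawing each $\mu_i$ from the prior (the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, mu0, sigma2, sigma2_u = 8, 5, 0.0, 1.0, 0.5
gamma = sigma2_u / sigma2
w = 1.0 / (1.0 + n * gamma)

reps = 20000
mu = rng.normal(mu0, np.sqrt(sigma2_u), size=(reps, k))      # mu_i drawn from the prior
Y = rng.normal(mu[:, None, :], np.sqrt(sigma2), size=(reps, n, k))
d_b = w * mu0 + (1 - w) * Y.mean(axis=1)                     # Bayes estimator
print(np.mean((d_b - mu) ** 2))                              # empirical Bayes risk
print((sigma2 / n) / (1 + 1 / (n * gamma)))                  # theoretical value, ~0.143
print(sigma2 / n)                                            # ML risk, 0.2, is larger
```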

SLIDE 13

Bayesian interpretations

The empirical Bayes approach

When the parameters $(\mu, \gamma)$ in the prior distribution are unknown, one may utilize the whole set of observations $Y$ for estimating $\mu$, $\gamma$ and $\sigma^2$. We have

$$\hat{\mu} = \bar{Y}_{..} = \frac{1}{k}\sum_{i=1}^{k} \bar{Y}_{i.}, \qquad \hat{\sigma}^2 = \frac{\mathrm{SSE}}{k(n-1)}$$

with $\mathrm{SSE} \sim \sigma^2\chi^2(k(n-1))$ and $\mathrm{SSB} \sim \sigma^2(1+n\gamma)\chi^2(k-1)$. As SSE and SSB are independent with

$$\mathrm{E}\left[\frac{k-3}{\mathrm{SSB}}\right] = \frac{1}{\sigma^2(1+n\gamma)}$$

we find that

$$\mathrm{E}\left[\frac{\hat{\sigma}^2}{\mathrm{SSB}/(k-3)}\right] = \frac{1}{1+n\gamma} = w.$$

SLIDE 14

Bayesian interpretations

The empirical Bayes approach

Looking instead at the estimator

$$\hat{\sigma}^2 = \frac{\mathrm{SSE}}{k(n-1) + 2}$$

and utilizing that

$$\hat{w} = \frac{\hat{\sigma}^2}{\mathrm{SSB}/(k-3)}$$

we observe that $\hat{w}$ may be expressed by the usual F-test statistic as

$$\hat{w} = \frac{k-3}{k-1}\, \frac{k(n-1)}{k(n-1)+2}\, \frac{1}{F}.$$

Substituting $\mu$ and $w$ by the estimates $\hat{\mu}$ and $\hat{w}$ in the posterior mean

$$d_i^{\mathrm{B}}(Y) = \mathrm{E}[\mu_i \mid \bar{Y}_{i.}] = w\mu + (1-w)\bar{Y}_{i.}$$

we obtain the estimator

$$d_i^{\mathrm{EB}}(Y) = \hat{w}\hat{\mu} + (1-\hat{w})\bar{Y}_{i.}.$$

This estimator is called an empirical Bayes estimator.
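A direct transcription of these formulas into Python for a balanced layout (the function name and the simulated data are our own; $k > 3$ is required):

```python
import numpy as np

def empirical_bayes(Y):
    """Empirical Bayes estimates of the group means for a (k, n) data matrix Y."""
    k, n = Y.shape
    ybar_i = Y.mean(axis=1)                        # group averages
    mu_hat = ybar_i.mean()                         # overall average
    sse = ((Y - ybar_i[:, None]) ** 2).sum()
    ssb = n * ((ybar_i - mu_hat) ** 2).sum()
    sigma2_hat = sse / (k * (n - 1) + 2)           # note the +2 in the denominator
    w_hat = sigma2_hat / (ssb / (k - 3))
    return w_hat * mu_hat + (1 - w_hat) * ybar_i   # shrunken group means

rng = np.random.default_rng(2)
Y = rng.normal(rng.normal(0.0, 1.0, size=(10, 1)), 1.0, size=(10, 20))
print(empirical_bayes(Y))
```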

SLIDE 15

Bayesian interpretations

The empirical Bayes approach

Theorem (Bayes risk of the empirical Bayes estimator): Under certain assumptions we have that

$$r((\mu, \gamma), d^{\mathrm{EB}}) = \frac{1}{n}\left(1 - \frac{2(k-3)}{\{k(n-1)+2\}(1+n\gamma)}\right)$$

When $k > 3$, the prior risk for the empirical Bayes estimator $d^{\mathrm{EB}}$ is smaller than for the maximum likelihood estimator $d^{\mathrm{ML}}$. The smaller the value of the signal/noise ratio $\gamma$, the larger the difference in risk between the two estimators.
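A Monte Carlo sketch of the comparison (illustrative values; we only check the qualitative claim that the empirical Bayes risk falls below the ML risk $\sigma^2/n$):

```python
import numpy as np

rng = np.random.default_rng(6)
k, n, mu0, sigma2, gamma = 10, 5, 0.0, 1.0, 0.5

reps = 40000
mu = rng.normal(mu0, np.sqrt(gamma * sigma2), size=(reps, k))
Y = rng.normal(mu[:, None, :], np.sqrt(sigma2), size=(reps, n, k))
ybar = Y.mean(axis=1)                                     # (reps, k)
mu_hat = ybar.mean(axis=1, keepdims=True)
sse = ((Y - ybar[:, None, :]) ** 2).sum(axis=(1, 2))
ssb = n * ((ybar - mu_hat) ** 2).sum(axis=1)
w_hat = (sse / (k * (n - 1) + 2)) / (ssb / (k - 3))
d_eb = w_hat[:, None] * mu_hat + (1 - w_hat[:, None]) * ybar
print("EB risk:", np.mean((d_eb - mu) ** 2))              # noticeably below ...
print("ML risk:", np.mean((ybar - mu) ** 2))              # ... sigma2/n = 0.2
```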

SLIDE 16

Posterior distributions for multivariate normal distributions


Theorem (Posterior distribution for multivariate normal distributions): Let $Y \mid \mu \sim \mathrm{N}_p(\mu, \Sigma)$ and let $\mu \sim \mathrm{N}_p(m, \Sigma_0)$, where $\Sigma$ and $\Sigma_0$ are of full rank $p$. Then the posterior distribution of $\mu$ after observation of $Y = y$ is given by

$$\mu \mid Y = y \sim \mathrm{N}_p\big(Wm + (I - W)y,\ (I - W)\Sigma\big)$$

with

$$W = \Sigma(\Sigma_0 + \Sigma)^{-1} \qquad \text{and} \qquad I - W = \Sigma_0(\Sigma_0 + \Sigma)^{-1}.$$
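A minimal sketch of the theorem in Python (the names and example matrices are ours):

```python
import numpy as np

def mvn_posterior(y, m, Sigma0, Sigma):
    """Posterior of mu for Y | mu ~ N_p(mu, Sigma) with prior mu ~ N_p(m, Sigma0)."""
    W = Sigma @ np.linalg.inv(Sigma0 + Sigma)     # weight matrix on the prior mean
    I = np.eye(len(m))
    return W @ m + (I - W) @ y, (I - W) @ Sigma   # posterior mean and covariance

Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma0 = 2.0 * np.eye(2)
mean, cov = mvn_posterior(y=np.array([1.0, -0.5]), m=np.zeros(2), Sigma0=Sigma0, Sigma=Sigma)
print(mean)
print(cov)
```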

SLIDE 17

Posterior distributions for multivariate normal distributions


If we let $\Psi = \Sigma_0\Sigma^{-1}$ denote the generalized ratio between the variation between groups and the variation within groups, in analogy with the signal-to-noise ratio, then we can express the weight matrices $W$ and $I - W$ as

$$W = (I + \Psi)^{-1} \qquad \text{and} \qquad I - W = (I + \Psi)^{-1}\Psi.$$

SLIDE 18

Posterior distributions for multivariate normal distributions


Theorem (Posterior distribution in regression model): Let $Y$ denote an $n \times 1$ vector of observations, and let $X$ denote an $n \times p$ matrix of known coefficients. Assume that $Y \mid \beta \sim \mathrm{N}_n(X\beta, \sigma^2 V)$ and that the prior distribution of $\beta$ is $\beta \sim \mathrm{N}_p(\beta_0, \sigma^2\Lambda)$, where $\Lambda$ is of full rank.

SLIDE 19

Posterior distributions for multivariate normal distributions


Theorem (Posterior distribution in regression model, continued): Then the posterior distribution of $\beta$ after observation of $Y = y$ is given by

$$\beta \mid Y = y \sim \mathrm{N}_p(\beta_1, \sigma^2\Lambda_1)$$

with

$$\beta_1 = W\beta_0 + W\Lambda X^{T}V^{-1}y$$

where $W = (I + \Gamma)^{-1}$, $\Gamma = \Lambda X^{T}V^{-1}X$, and $\Lambda_1 = (I + \Gamma)^{-1}\Lambda = W\Lambda$.

SLIDE 20

Posterior distributions for multivariate normal distributions


The posterior mean expressed as a weighted average: If $X$ is of full rank, then $X^{T}V^{-1}X$ may be inverted, and we find the posterior mean

$$\beta_1 = W\beta_0 + (I - W)\hat{\beta}$$

where $\hat{\beta}$ denotes the usual least squares estimate for $\beta$,

$$\hat{\beta} = (X^{T}V^{-1}X)^{-1}X^{T}V^{-1}y.$$
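A numerical sketch verifying that the theorem's expression for $\beta_1$ agrees with this weighted-average form (the simulated data and the choices $V = I$, $\Lambda = I$ are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

V = np.eye(n)                                     # error covariance scale (iid here)
Lam = np.eye(p)                                   # prior covariance scale Lambda
beta0 = np.zeros(p)                               # prior mean

Vinv = np.linalg.inv(V)
Gamma = Lam @ X.T @ Vinv @ X
W = np.linalg.inv(np.eye(p) + Gamma)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # least squares estimate

beta1_direct = W @ beta0 + W @ Lam @ X.T @ Vinv @ y          # theorem's expression
beta1_weighted = W @ beta0 + (np.eye(p) - W) @ beta_hat      # weighted-average form
print(np.allclose(beta1_direct, beta1_weighted))             # True
```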

SLIDE 21

Random effects for multivariate measurements


Let us consider the situation where the individual observations are $p$-dimensional vectors,

$$X_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i$$

where $\mu$, $\alpha_i$ and $\epsilon_{ij}$ denote $p$-dimensional vectors, and where the $\epsilon_{ij}$ are mutually independent and normally distributed, $\epsilon_{ij} \sim \mathrm{N}_p(0, \Sigma)$, with $\Sigma$ denoting the $p \times p$ covariance matrix. For simplicity we assume that $\Sigma$ has full rank. For the fixed effects model we further assume

$$\sum_{i=1}^{k} n_i\alpha_i = 0.$$

Given these assumptions we find

$$Z_i = \sum_j X_{ij} \sim \mathrm{N}_p\big(n_i(\mu + \alpha_i),\ n_i\Sigma\big).$$

SLIDE 22

Random effects for multivariate measurements


Let us introduce the notation

$$X_{i+} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad X_{++} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij} = \frac{\sum_{i=1}^{k} n_i X_{i+}}{\sum_{i=1}^{k} n_i}$$

for the group averages and the total average, respectively.

SLIDE 23

Random effects for multivariate measurements


The variation within groups (SSE), the variation between groups (SSB), and the total variation (SST) are described by

$$\mathrm{SSE} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - X_{i+})(X_{ij} - X_{i+})^{T}$$

$$\mathrm{SSB} = \sum_{i=1}^{k} n_i(X_{i+} - X_{++})(X_{i+} - X_{++})^{T}$$

$$\mathrm{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - X_{++})(X_{ij} - X_{++})^{T}$$

with $\mathrm{SST} = \mathrm{SSE} + \mathrm{SSB}$.
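A small numerical check of the decomposition on simulated data (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n_i = [5, 7, 6, 8]                                           # unbalanced group sizes
groups = [rng.normal(rng.normal(size=3), 1.0, size=(n, 3)) for n in n_i]

x_ip = np.array([g.mean(axis=0) for g in groups])            # group averages X_{i+}
x_pp = sum(n * m for n, m in zip(n_i, x_ip)) / sum(n_i)      # total average X_{++}

SSE = sum((g - m).T @ (g - m) for g, m in zip(groups, x_ip))
SSB = sum(n * np.outer(m - x_pp, m - x_pp) for n, m in zip(n_i, x_ip))
SST = sum((g - x_pp).T @ (g - x_pp) for g in groups)
print(np.allclose(SST, SSE + SSB))                           # True
```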

SLIDE 24

Random effects for multivariate measurements

Random effects model

The random effects model for the $p$-dimensional observations is

$$X_{ij} = \mu + u_i + \epsilon_{ij}, \qquad i = 1, \ldots, k;\ j = 1, 2, \ldots, n_i$$

where the $u_i$ are now independent, $u_i \sim \mathrm{N}_p(0, \Sigma_0)$, the $\epsilon_{ij}$ are independent, $\epsilon_{ij} \sim \mathrm{N}_p(0, \Sigma)$, and $u$ and $\epsilon$ are mutually independent.

SLIDE 25

Random effects for multivariate measurements

Random effects model

Theorem (The marginal distribution in the case of multivariate $p$-dimensional observations): Consider the model above. Then the marginal distribution of $Z_i = \sum_j X_{ij}$ is the

$$\mathrm{N}_p(n_i\mu,\ n_i\Sigma + n_i^2\Sigma_0)$$

distribution, and the marginal distribution of $X_{i+}$ is

$$\mathrm{N}_p\Big(\mu,\ \frac{1}{n_i}\Sigma + \Sigma_0\Big).$$

Finally, we have that SSE follows a Wishart distribution, $\mathrm{SSE} \sim \mathrm{Wis}_p(N - k, \Sigma)$, and SSE is independent of $X_{i+}$, $i = 1, 2, \ldots, k$.

SLIDE 26

Random effects for multivariate measurements

Random effects model

Definition (Generalized signal to noise ratio): Let us introduce the generalized signal to noise ratio as the $p \times p$ matrix $\Gamma$ representing the ratio between the variation between groups and the variation within groups,

$$\Gamma = \Sigma_0\Sigma^{-1}.$$

SLIDE 27

Random effects for multivariate measurements

Random effects model

Theorem (Moment estimates for the multivariate random effects model): Given the assumptions above, we find the moment estimates for $\mu$, $\Sigma$ and $\Sigma_0$ as

$$\tilde{\mu} = x_{++}, \qquad \tilde{\Sigma} = \frac{1}{N-k}\,\mathrm{SSE}, \qquad \tilde{\Sigma}_0 = \frac{1}{n_0}\left(\frac{\mathrm{SSB}}{k-1} - \tilde{\Sigma}\right).$$
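A sketch of these estimates in Python. The slides do not define $n_0$ here; the value used below is the standard moment-method choice (our assumption), which reduces to the common group size $n$ in the balanced case:

```python
import numpy as np

def moment_estimates(groups):
    """Moment estimates (mu~, Sigma~, Sigma0~) from a list of (n_i, p) arrays."""
    n_i = np.array([len(g) for g in groups])
    k, N = len(groups), n_i.sum()
    x_ip = np.array([g.mean(axis=0) for g in groups])     # group averages x_{i+}
    x_pp = n_i @ x_ip / N                                 # total average x_{++}
    SSE = sum((g - m).T @ (g - m) for g, m in zip(groups, x_ip))
    SSB = sum(n * np.outer(m - x_pp, m - x_pp) for n, m in zip(n_i, x_ip))
    n0 = (N - (n_i ** 2).sum() / N) / (k - 1)             # assumed definition of n0
    Sigma_t = SSE / (N - k)
    Sigma0_t = (SSB / (k - 1) - Sigma_t) / n0
    return x_pp, Sigma_t, Sigma0_t

rng = np.random.default_rng(7)
groups = [rng.normal(rng.normal(size=3), 1.0, size=(n, 3)) for n in (5, 7, 6, 8)]
print(moment_estimates(groups))
```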

SLIDE 28

Random effects for multivariate measurements

Random effects model

Theorem (MLE for the multivariate random effects model): Still under the assumptions above, we find the maximum likelihood estimates (MLEs) for $\mu$, $\Sigma$ and $\Sigma_0$ by maximizing the log-likelihood

$$\ell(\mu, \Sigma, \Sigma_0;\ x_{1+}, \ldots, x_{k+}) = -\frac{N-k}{2}\log\det(\Sigma) - \frac{1}{2}\mathrm{tr}\big(\mathrm{SSE}\,\Sigma^{-1}\big) - \frac{1}{2}\sum_{i=1}^{k}\left[\log\det\Big(\frac{\Sigma}{n_i} + \Sigma_0\Big) + (x_{i+} - \mu)^{T}\Big(\frac{\Sigma}{n_i} + \Sigma_0\Big)^{-1}(x_{i+} - \mu)\right]$$

with respect to $\mu \in \mathbb{R}^p$ and $\Sigma$ and $\Sigma_0$ in the space of non-negative definite $p \times p$ matrices. Since no explicit solution exists, the maximum likelihood estimates must be found using numerical procedures.
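A sketch of such a numerical procedure (entirely our own construction: simulated data, a Cholesky parameterization to keep $\Sigma$ and $\Sigma_0$ non-negative definite, and Nelder-Mead as the optimizer):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
p = 2
n_i = np.array([6, 8, 7, 9, 6])
k, N = len(n_i), n_i.sum()

# Simulate data from the random effects model with Sigma = I, Sigma0 = I.
groups = [rng.normal(rng.multivariate_normal(np.zeros(p), np.eye(p)), 1.0, size=(n, p))
          for n in n_i]
x_ip = np.array([g.mean(axis=0) for g in groups])                # group averages x_{i+}
SSE = sum((g - m).T @ (g - m) for g, m in zip(groups, x_ip))
idx = np.tril_indices(p)

def unpack(theta):
    """Map the parameter vector to (mu, Sigma, Sigma0) via Cholesky factors."""
    mu = theta[:p]
    L1, L2 = np.zeros((p, p)), np.zeros((p, p))
    m = p + len(idx[0])
    L1[idx], L2[idx] = theta[p:m], theta[m:]
    jitter = 1e-9 * np.eye(p)                    # keeps the matrices invertible
    return mu, L1 @ L1.T + jitter, L2 @ L2.T + jitter

def negloglik(theta):
    """Negative of the log-likelihood on the slide (additive constants dropped)."""
    mu, Sigma, Sigma0 = unpack(theta)
    ll = (-(N - k) / 2 * np.linalg.slogdet(Sigma)[1]
          - 0.5 * np.trace(SSE @ np.linalg.inv(Sigma)))
    for ni, xi in zip(n_i, x_ip):
        C = Sigma / ni + Sigma0
        r = xi - mu
        ll -= 0.5 * (np.linalg.slogdet(C)[1] + r @ np.linalg.solve(C, r))
    return -ll

theta0 = np.concatenate([x_ip.mean(axis=0), np.eye(p)[idx], np.eye(p)[idx]])
fit = minimize(negloglik, theta0, method="Nelder-Mead", options={"maxiter": 20000})
mu_ml, Sigma_ml, Sigma0_ml = unpack(fit.x)
print(mu_ml, Sigma_ml, Sigma0_ml, sep="\n")
```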
