ST 762 Nonlinear Statistical Models for Univariate and Multivariate Response

Multivariate Responses

In the general mean-variance specification $E(Y_j \mid x_j) = f(x_j, \beta)$, $\operatorname{var}(Y_j \mid x_j) = \sigma^2 g(\beta, \theta, x_j)^2$, we have assumed that the responses $Y_1, Y_2, \ldots, Y_n$ are conditionally independent given $x_1, x_2, \ldots, x_n$. In many situations this assumption fails, for example when observations come in clusters, such as pups born to the same mother rat, or when repeated measurements on each experimental unit are serially correlated.

Recall Example 1.7: Developmental toxicology studies

Developmental toxicology studies in rodents are used in testing and regulating potentially toxic substances that may pose a danger to developing fetuses. A total of $m$ pregnant rats are exposed to different doses of a toxic agent, and mother rat $i$ gives birth to $n_i$ pups. The response $Y_{i,j}$, $j = 1, 2, \ldots, n_i$, is the birthweight of the $j$th pup of mother $i$, and the objective is to characterize the effect of dose on birthweight across the population of all exposed mothers and their pups.

Recall Example 1.8: Pharmacokinetics of theophylline

Subject $i$ receives an oral dose $D_i$ of theophylline. The response $Y_{i,j}$ is the subject's drug concentration at time $t_{i,j}$ after administration, $j = 1, 2, \ldots, n_i$. For a given subject, a pharmacokinetic model may explain the time variation in the response. A broader objective is to understand pharmacokinetic behavior in the entire population of subjects.

Why worry?

If we ignore dependence, parameter estimates are generally inefficient; standard errors are generally wrong, so inferences (confidence intervals, hypothesis tests) do not have their nominal properties (coverage probability, size); and the statistical framework may be inappropriate for the scientific objectives.

Inefficiency may not matter much, but invalid inference always does, and relevance to the science is paramount.
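
To see the standard-error point concretely, here is a minimal Python/NumPy sketch. It assumes, hypothetically, equal-sized clusters with compound-symmetric correlation (as one might posit for pups within a litter), compares the exact variance of the overall sample mean with the variance computed as if all observations were independent, and checks the exact value by simulation. The cluster sizes, correlation, seed, and function names are illustrative choices, not values from these notes.

```python
import numpy as np

def var_grand_mean(m, n, sigma2, rho):
    """Exact variance of the overall mean of m independent clusters of n
    observations each, with common variance sigma2 and common
    within-cluster correlation rho (compound symmetry)."""
    return sigma2 * (1 + (n - 1) * rho) / (m * n)

def naive_var_grand_mean(m, n, sigma2):
    """The same variance computed as if all m * n observations were independent."""
    return sigma2 / (m * n)

# Hypothetical setting: 30 litters of 8 pups, within-litter correlation 0.5.
m, n, sigma2, rho = 30, 8, 1.0, 0.5
cov = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
chol = np.linalg.cholesky(cov)

rng = np.random.default_rng(762)
reps = 5000
# Y[r, i, :] is cluster i's response vector in replicate r, distributed N(0, cov).
Y = rng.standard_normal((reps, m, n)) @ chol.T
empirical = Y.mean(axis=(1, 2)).var(ddof=1)

exact = var_grand_mean(m, n, sigma2, rho)
naive = naive_var_grand_mean(m, n, sigma2)
print(f"exact {exact:.5f}, naive {naive:.5f}, simulated {empirical:.5f}")
```

With these settings the exact variance is $1 + (n - 1)\rho = 4.5$ times the naive one (a ratio often called the design effect), so naive standard errors would be too small by a factor of about 2.1.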

We limit discussion to situations in which groups of observations may unambiguously be assumed independent of one another: there are $m$ response vectors $Y_i$, $i = 1, 2, \ldots, m$, and subject $i$ contributes $n_i$ observations,
$$Y_i = (Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i})^T.$$

Covariates

Within-individual covariates describe the conditions under which $Y_{i,j}$ was observed; they would be needed even if inference were restricted to individual $i$. For example, $t_{i,j}$ = time of the $j$th observation on individual $i$.

Among-individual covariates take the same value for all observations on individual $i$; for example, the treatment assigned to the individual, or individual characteristics such as gender.

Covariate notation

Within-individual covariate vector $z_{i,j}$, stacked as $z_i = (z_{i,1}^T, z_{i,2}^T, \ldots, z_{i,n_i}^T)^T$. Among-individual covariate vector $a_i$. Combined: $x_i = (z_i^T, a_i^T)^T$.
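
Data often arrive in "long" format, one row per observation. The following sketch uses made-up records and a hypothetical helper `stack_by_subject` to group such rows into the per-subject vectors $Y_i$ and $z_i$ (here $z_{i,j} = t_{i,j}$) and the among-individual covariate $a_i$, checking that $a_i$ is constant within each subject. It assumes each subject's rows are already ordered by time.

```python
from collections import defaultdict

# Made-up long-format records: (subject i, time t_ij, response y_ij, treatment a_i).
records = [
    (1, 0.5, 4.0, "A"), (1, 1.0, 6.1, "A"), (1, 2.0, 5.2, "A"),
    (2, 0.5, 3.1, "B"), (2, 2.0, 4.4, "B"),
]

def stack_by_subject(records):
    """Group long-format rows into per-subject Y_i, z_i and a_i."""
    Y, z, a = defaultdict(list), defaultdict(list), {}
    for subj, t, y, trt in records:
        Y[subj].append(y)          # response vector Y_i
        z[subj].append(t)          # within-individual covariates z_ij = t_ij
        # The among-individual covariate must take one value per subject.
        if a.setdefault(subj, trt) != trt:
            raise ValueError(f"a_i varies within subject {subj}")
    return dict(Y), dict(z), a

Y, z, a = stack_by_subject(records)
n = {i: len(Y[i]) for i in Y}      # n_i may differ across subjects
```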

Sources of Dependence

Dependence simply means that
$$f_i(Y_i \mid x_i) \neq \prod_{j=1}^{n_i} f_{i,j}(Y_{i,j} \mid x_i).$$
This is very general, and hence difficult to specify. It is helpful to distinguish “individual-level” sources from “population-level” sources.

Individual-Level Sources of Dependence

For example, suppose we model repeated measurements of a subject's blood pressure using a within-individual linear regression:
$$Y_{i,j} = \beta_{0,i} + \beta_{1,i} t_{i,j} + e_{i,j},$$
where $t_{i,j}$ is the time of the $j$th measurement on the $i$th subject. The linear trend $\beta_{0,i} + \beta_{1,i} t$ represents the mean response for that subject.

The subject's actual blood pressure at time $t$ is $\beta_{0,i} + \beta_{1,i} t + e_{P,i}(t)$, where $e_{P,i}(t)$ is random variation about that mean response, perhaps a stationary stochastic process. If the measurement error $e_{M,i,j}$ is non-negligible, then $e_{i,j} = e_{P,i,j} + e_{M,i,j}$, where $e_{P,i,j} = e_{P,i}(t_{i,j})$.

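Then, if the measurement errors are uncorrelated with each other and with the process $e_{P,i}$,
$$\operatorname{var}(Y_{i,j}) = \operatorname{var}(e_{P,i,j}) + \operatorname{var}(e_{M,i,j})$$
and, for $j' \neq j$,
$$\operatorname{cov}(Y_{i,j}, Y_{i,j'}) = \operatorname{cov}(e_{P,i,j}, e_{P,i,j'}).$$
We would need to specify a model for these variances and covariances in order to make inferences about $\beta_{0,i}$ and $\beta_{1,i}$.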
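
As one hypothetical specification, the sketch below assumes $e_{P,i}$ is stationary with variance $\sigma_P^2$ and exponential correlation $\exp(-|t - t'|/\phi)$, and that the measurement errors have common variance $\sigma_M^2$. It builds $\operatorname{var}(Y_i)$ for one subject. The function name `bp_covariance`, the correlation form, and the numbers are illustrative assumptions.

```python
import numpy as np

def bp_covariance(times, sigma2_P, phi, sigma2_M):
    """var(Y_i) when e_ij = e_P,i(t_ij) + e_M,ij.

    Assumes (hypothetically) e_P,i is stationary with variance sigma2_P and
    correlation exp(-|t - t'| / phi), and the measurement errors are
    uncorrelated, with variance sigma2_M, and uncorrelated with the process.
    """
    t = np.asarray(times, dtype=float)
    lag = np.abs(t[:, None] - t[None, :])
    Gamma = np.exp(-lag / phi)                 # correlation of the biological process
    # Measurement error contributes only to the diagonal.
    return sigma2_P * Gamma + sigma2_M * np.eye(len(t))

V = bp_covariance([0.0, 1.0, 3.0], sigma2_P=4.0, phi=2.0, sigma2_M=1.0)
```

Because measurement error adds only to the diagonal, the correlation between two readings, $\sigma_P^2 \exp(-|t - t'|/\phi)/(\sigma_P^2 + \sigma_M^2)$, is attenuated relative to the correlation of the underlying process.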

This conceptual representation says that the response vector for a single individual consists of intermittent observations on a stochastic process whose realizations fluctuate to some extent about a smooth inherent trend, possibly with additional measurement error. Here the frame of inference is the individual subject.

Population-Level Sources of Dependence

If the subjects are themselves a random sample from some population, then the “parameters” $\beta_i = (\beta_{0,i}, \beta_{1,i})^T$ associated with the $i$th subject are a random sample from the corresponding population of parameter vectors. We are interested in the mean and dispersion of this population, which describe the average across subjects and the variation among subjects. Here the frame of inference is the population of subjects.

Subject-Specific Modeling

Example: Theophylline concentration-time profiles

Pharmacokinetics suggests the subject-specific model
$$E(Y_{i,j} \mid z_{i,j}, \beta_i) = \frac{D_i \beta_{3,i}}{\beta_{2,i}\beta_{3,i} - \beta_{1,i}} \left\{ \exp\left(-\frac{\beta_{1,i}}{\beta_{2,i}} t_{i,j}\right) - \exp(-\beta_{3,i} t_{i,j}) \right\},$$
where $z_{i,j}$ contains $D_i$, the dose for the $i$th subject, and $t_{i,j}$, the time of the $j$th measurement on the $i$th subject ($D_i$ is the same for all measurements, but is included in $z_{i,j}$ rather than $a_i$ for convenience); the vector $\beta_i = (\beta_{1,i}, \beta_{2,i}, \beta_{3,i})^T$ consists of parameters specific to subject $i$.
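
A direct transcription of this mean function may help. The sketch assumes, following the parameter names $\beta_{Cl}$, $\beta_V$, $\beta_{ka}$ used for the theophylline population model, that $\beta_{1,i}$, $\beta_{2,i}$, $\beta_{3,i}$ play the roles of clearance, volume, and absorption rate. The function name `theoph_mean` and the parameter values are hypothetical.

```python
import numpy as np

def theoph_mean(dose, t, beta):
    """Subject-specific mean f(z_ij, beta_i) of the one-compartment model.

    dose: D_i; t: sampling time(s) t_ij; beta: (beta_1i, beta_2i, beta_3i).
    """
    b1, b2, b3 = beta
    denom = b2 * b3 - b1
    if np.isclose(denom, 0.0):
        # The formula is 0/0 when beta_2 * beta_3 == beta_1.
        raise ValueError("model undefined when beta2 * beta3 == beta1")
    t = np.asarray(t, dtype=float)
    return dose * b3 / denom * (np.exp(-b1 / b2 * t) - np.exp(-b3 * t))

# Hypothetical values, read as (clearance, volume, absorption rate).
beta_i = (0.04, 0.5, 1.5)
times = np.array([0.0, 1.0, 24.0])
conc = theoph_mean(4.0, times, beta_i)
```

Under that reading, $k_e = \beta_{1,i}/\beta_{2,i}$ is the elimination rate constant, and the mean can equivalently be written $D_i k_a \{e^{-k_e t} - e^{-k_a t}\}/\{V(k_a - k_e)\}$ with $k_a = \beta_{3,i}$ and $V = \beta_{2,i}$.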

This may be written $E(Y_{i,j} \mid z_{i,j}, \beta_i) = f(z_{i,j}, \beta_i)$. Because $\beta_i$ is associated with the randomly selected $i$th subject, it is a random variable, and the model for that subject is conditional on the value of $\beta_i$. We also need some assumptions about how $\beta_i$ varies from subject to subject. We might assume $\beta_i \sim N(\beta, D)$, or equivalently $\beta_i = \beta + b_i$, where $b_i \sim N(0, D)$. Here $\beta$ is the mean parameter vector across the population of subjects, and $D$ is the covariance matrix describing variation among subjects.

If the subjects were in two groups, e.g. smokers vs. non-smokers, the mean might depend on the group to which the subject belongs:
$$\beta_i = \beta^{(0)} + \delta_i \beta^{(1)} + b_i = \begin{pmatrix} I & \delta_i I \end{pmatrix} \begin{pmatrix} \beta^{(0)} \\ \beta^{(1)} \end{pmatrix} + b_i,$$
where $\delta_i$ is an indicator variable for smokers. More generally, $\beta_i = A_i \beta + b_i$, where $A_i$ is a subject-specific design matrix that is a function of the among-individual covariate vector $a_i$, and $\beta$ is the vector of all relevant parameters.

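For instance, for the theophylline data,
$$\beta_i = \begin{pmatrix} \beta_{1,i} \\ \beta_{2,i} \\ \beta_{3,i} \end{pmatrix} = \begin{pmatrix} 1 & w_i & c_i & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & w_i & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & w_i \end{pmatrix} \begin{pmatrix} \beta_{Cl,0} \\ \beta_{Cl,w} \\ \beta_{Cl,c} \\ \beta_{V,0} \\ \beta_{V,w} \\ \beta_{ka,0} \\ \beta_{ka,w} \end{pmatrix} + b_i.$$
Here $w_i$ is the $i$th subject's body weight and $c_i$ is the $i$th subject's creatinine clearance (a measure of kidney function), both of which are components of $a_i$. The 7 $\beta$s have pharmacokinetic interpretations.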
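
The design matrix $A_i$ can be built directly from $w_i$ and $c_i$. This minimal sketch (hypothetical function names; the ordering of $\beta$ follows the display above) returns $A_i$ and $\beta_i = A_i \beta + b_i$.

```python
import numpy as np

def theoph_design(w, c):
    """A_i: clearance depends on weight w_i and creatinine clearance c_i;
    volume and absorption rate depend on weight only."""
    return np.array([
        [1.0, w, c, 0.0, 0.0, 0.0, 0.0],     # beta_1,i (clearance)
        [0.0, 0.0, 0.0, 1.0, w, 0.0, 0.0],   # beta_2,i (volume)
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, w],   # beta_3,i (absorption rate)
    ])

def individual_params(w, c, beta, b):
    """Stage 2 population model: beta_i = A_i beta + b_i."""
    return theoph_design(w, c) @ np.asarray(beta, dtype=float) + np.asarray(b, dtype=float)

A = theoph_design(70.0, 90.0)
beta_i = individual_params(70.0, 90.0, np.arange(1.0, 8.0), np.zeros(3))
```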

Summary of two-stage modeling

Stage 1: Individual model. $E(Y_{i,j} \mid z_{i,j}, \beta_i) = f(z_{i,j}, \beta_i)$.

Stage 2: Population model. $\beta_i = A_i \beta + b_i$.
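
The two stages can be read as a recipe for generating data. The sketch below simulates subjects under stated assumptions: $A_i = I$ (no covariates), normal random effects with a hypothetical diagonal $D$, and independent within-subject errors with variance $\sigma^2 f^2$ (that is, $g = f$). All numerical values and names are illustrative, and every subject shares the same sampling times for simplicity.

```python
import numpy as np

def f(dose, t, beta):
    """Stage 1 mean: the one-compartment model, beta = (beta_1, beta_2, beta_3)."""
    b1, b2, b3 = beta
    return dose * b3 / (b2 * b3 - b1) * (np.exp(-b1 / b2 * t) - np.exp(-b3 * t))

def simulate_two_stage(doses, times, beta, D, sigma, rng):
    """Draw one data set from the two-stage model with A_i = I.

    Stage 2: beta_i = beta + b_i, b_i ~ N(0, D), i.i.d. over subjects.
    Stage 1: Y_ij = f(z_ij, beta_i) + e_ij, where (an assumption) the e_ij
    are independent with variance sigma^2 f^2, i.e. g = f (constant CV).
    """
    chol = np.linalg.cholesky(D)
    t = np.asarray(times, dtype=float)
    betas, responses = [], []
    for dose in doses:
        beta_i = beta + chol @ rng.standard_normal(len(beta))   # Stage 2
        mu = f(dose, t, beta_i)                                   # Stage 1 mean
        responses.append(mu + sigma * mu * rng.standard_normal(len(t)))
        betas.append(beta_i)
    return np.array(betas), responses

rng = np.random.default_rng(762)
beta = np.array([0.04, 0.5, 1.5])      # hypothetical population means
D = np.diag((0.1 * beta) ** 2)         # hypothetical 10% between-subject SD
betas, Y = simulate_two_stage([4.0] * 200, [0.25, 1, 2, 4, 8, 12, 24], beta, D, 0.1, rng)
```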

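Within-individual variation

We need to specify $\operatorname{var}(Y_i \mid z_i, \beta_i)$. Following the earlier discussion,
$$\operatorname{var}(Y_{i,j} \mid z_{i,j}, \beta_i) = \operatorname{var}(e_{P,i,j} \mid z_{i,j}, \beta_i) + \operatorname{var}(e_{M,i,j} \mid z_{i,j}, \beta_i) = \sigma_P^2 + \sigma^2 g(\beta_i, \theta, z_{i,j})^2,$$
where $\sigma_P^2$ represents natural variation in the true concentration, and $\sigma^2 g(\beta_i, \theta, z_{i,j})^2$ represents measurement error.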

In the context of pharmacokinetics, it is often thought that measurement error due to the assay is the predominant source of variation, whereas biological fluctuations about the trajectory are very small by comparison. This would lead to the familiar variance model $\operatorname{var}(Y_{i,j} \mid z_{i,j}, \beta_i) = \sigma^2 g(\beta_i, \theta, z_{i,j})^2$.

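We also need to specify the correlations in $\operatorname{var}(Y_i \mid z_i, \beta_i)$. We might assume that the biological variations have some correlation matrix $\Gamma_i(\alpha, z_i)$ and that the measurement errors are uncorrelated. Then
$$\operatorname{var}(Y_i \mid z_i, \beta_i) = \sigma_P^2 \Gamma_i(\alpha, z_i) + \sigma^2 W_i(\beta_i, \theta, z_i)^{-1},$$
where
$$W_i(\beta_i, \theta, z_i) = \operatorname{diag}\left\{ \frac{1}{g(\beta_i, \theta, z_{i,1})^2}, \frac{1}{g(\beta_i, \theta, z_{i,2})^2}, \ldots, \frac{1}{g(\beta_i, \theta, z_{i,n_i})^2} \right\}.$$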
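
The matrix form can be assembled mechanically once $g$ and $\Gamma_i$ are chosen. The sketch below takes the hypothetical power-of-mean choice $g(\beta_i, \theta, z_{i,j}) = f(z_{i,j}, \beta_i)^\theta$ and an AR(1)-type $\Gamma_i$; both choices, the function name `within_cov`, and the numbers are assumptions for illustration.

```python
import numpy as np

def within_cov(mu, Gamma, sigma2_P, sigma2, theta):
    """var(Y_i | z_i, beta_i) = sigma_P^2 Gamma_i + sigma^2 W_i^{-1}.

    mu is the subject's mean vector f_i(z_i, beta_i). As a hypothetical
    choice, g(beta_i, theta, z_ij) = f(z_ij, beta_i)^theta.
    """
    mu = np.asarray(mu, dtype=float)
    g = mu ** theta
    W = np.diag(1.0 / g ** 2)       # W_i = diag{1 / g(...)^2}
    # W is diagonal, so its inverse is simply diag(g^2); inv() mirrors the formula.
    return sigma2_P * np.asarray(Gamma, dtype=float) + sigma2 * np.linalg.inv(W)

mu = [2.0, 5.0, 3.0]
lags = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
Gamma = 0.6 ** lags                 # hypothetical AR(1)-type Gamma_i
V = within_cov(mu, Gamma, sigma2_P=0.05, sigma2=0.01, theta=1.0)
```

Setting `sigma2_P = 0` recovers the diagonal model $\sigma^2 g(\beta_i, \theta, z_{i,j})^2$ that arises when assay error dominates, as in the pharmacokinetic setting.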

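A general subject-specific model

Stage 1: Individual model. $E(Y_i \mid z_i, \beta_i) = f_i(z_i, \beta_i)$. With $\beta_i = A_i \beta + b_i$, and $a_i$ the among-subject covariates on which $A_i$ depends, we may also write
$$E(Y_i \mid z_i, a_i, b_i) = E(Y_i \mid x_i, b_i) = f_i(x_i, \beta, b_i).$$
For the variance,
$$\operatorname{var}(Y_i \mid z_i, a_i, b_i) = R_i(\beta_i, \gamma, z_i) = R_i(\beta, \gamma, x_i, b_i),$$
where $\gamma$ collects the within-individual variance and correlation parameters (such as $\sigma_P^2$, $\sigma^2$, $\theta$, $\alpha$).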

Stage 2: Population model. $\beta_i = A_i \beta + b_i$, where $b_1, b_2, \ldots, b_m$, called the random effects, are usually assumed to be i.i.d. and independent of $x_i$, with $E(b_i) = 0$ and $\operatorname{var}(b_i) = D$.

Remarks

Observations in the same data vector $Y_i$ share the same random effect $b_i$. Thus the model naturally accounts for the population-level phenomenon that observations on the same individual tend to be “more alike” than observations on different individuals. The model is ideally suited to the scientific objectives stated for pharmacokinetics: $\beta$ and $D$ characterize the mean and variation in the population of parameters, which are of direct scientific interest. Such models are known as nonlinear mixed effects models, “mixed effects” recognizing the presence of both fixed parameters ($\beta$ and $D$) and random effects ($b_i$) in the model.
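
To see how a shared random effect induces correlation, consider the linear blood-pressure model with $\beta_i = \beta + b_i$, so that $Y_i = X_i \beta + X_i b_i + e_i$, where $X_i$ has rows $(1, t_{i,j})$. Assuming $\operatorname{var}(b_i) = D$ and that the $e_{i,j}$ are i.i.d. with variance $\sigma^2$ and independent of $b_i$, $\operatorname{var}(Y_i \mid x_i) = X_i D X_i^T + \sigma^2 I$. The sketch below computes this matrix; the function names and values are hypothetical.

```python
import numpy as np

def marginal_cov_linear(times, D, sigma2):
    """var(Y_i | x_i) for Y_ij = beta_0i + beta_1i t_ij + e_ij, beta_i = beta + b_i.

    Assumes var(b_i) = D and e_ij i.i.d. with variance sigma2, independent of b_i.
    """
    t = np.asarray(times, dtype=float)
    X = np.column_stack([np.ones_like(t), t])     # X_i with rows (1, t_ij)
    return X @ np.asarray(D, dtype=float) @ X.T + sigma2 * np.eye(len(t))

def to_corr(V):
    """Convert a covariance matrix to a correlation matrix."""
    s = np.sqrt(np.diag(V))
    return V / np.outer(s, s)

times = [0.0, 1.0, 2.0]
V_int = marginal_cov_linear(times, [[4.0, 0.0], [0.0, 0.0]], 1.0)    # random intercept only
V_slope = marginal_cov_linear(times, [[4.0, 0.0], [0.0, 1.0]], 1.0)  # intercept and slope
R_int = to_corr(V_int)
```

A random intercept alone yields the same correlation, $D_{11}/(D_{11} + \sigma^2) = 0.8$ here, between any two observations on a subject (compound symmetry); adding a random slope makes the variance change with time.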

Remarks on moments

The subject-specific model tells us about $E(Y_i \mid x_i, b_i)$ and $\operatorname{var}(Y_i \mid x_i, b_i)$. But what about $E(Y_i \mid x_i)$ and $\operatorname{var}(Y_i \mid x_i)$? The two-stage model implies that
$$E(Y_i \mid x_i) = E\{E(Y_i \mid x_i, b_i) \mid x_i\} = E\{f_i(x_i, \beta, b_i) \mid x_i\}.$$
If $f_i(\cdot)$ is linear in $b_i$, this is just $f_i(x_i, \beta, 0)$, because $b_i$ is independent of $x_i$ with $E(b_i) = 0$. That is, for a (sub)population of subjects all with the same covariates $x_i$, the mean response across the population satisfies the same model as the mean response for a single subject, evaluated at the population-average parameter values.

When $f_i$ is nonlinear in $b_i$, $E(Y_i \mid x_i)$ generally cannot be written in closed form. Similarly, $\operatorname{var}(Y_i \mid x_i)$ is complicated.

Alternative approach: write down a model for the marginal conditional mean and covariance matrix, $E(Y_i \mid x_i)$ and $\operatorname{var}(Y_i \mid x_i)$, directly.
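
A small numerical check makes the contrast concrete. Assuming a hypothetical random-intercept model $f(x, \beta, b) = h(\beta_0 + \beta_1 t + b)$ with $b \sim N(0, \tau^2)$, the sketch averages over $b$ by Gauss-Hermite quadrature (NumPy's `hermegauss`, whose weight function is $e^{-x^2/2}$). With $h$ the identity, the marginal mean equals $f(x, \beta, 0)$; with $h$ the inverse logit, it does not. Parameter values and helper names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def average_over_b(h, tau, deg=40):
    """E{h(b)} for b ~ N(0, tau^2), by Gauss-Hermite quadrature."""
    x, w = hermegauss(deg)          # nodes/weights for weight exp(-x^2 / 2)
    return np.sum(w * h(tau * x)) / np.sqrt(2.0 * np.pi)

# Hypothetical random-intercept model with linear predictor beta0 + beta1 t + b.
beta0, beta1, tau, t = -1.0, 0.5, 1.5, 4.0
eta = beta0 + beta1 * t

linear_at_zero = eta                                        # f(x, beta, 0), linear f
linear_marginal = average_over_b(lambda b: eta + b, tau)    # E{f(x, beta, b)}
logit_at_zero = expit(eta)                                  # f(x, beta, 0), logistic f
logit_marginal = average_over_b(lambda b: expit(eta + b), tau)
```

In this example the marginal probability is smaller than $\operatorname{expit}(1)$, that is, closer to 1/2, so a model fitted directly to $E(Y_i \mid x_i)$ would not share the subject-specific coefficients.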

Population-averaged (marginal) modeling

Example: Six Cities study. $Y_{i,j}$ = indicator of wheezing in the $i$th child at the $j$th observation time, $t_{i,j}$. Within-individual covariates: $\delta_{i,j}$ = indicator of the mother's smoking status at time $t_{i,j}$. Among-individual covariates: location, gender, ...

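We could use a logistic regression model:
$$E(Y_{i,j} \mid x_i) = \frac{\exp(\beta_0 + \beta_1 t_{i,j} + \beta_2 \delta_{i,j} + a_i^T \beta_3)}{1 + \exp(\beta_0 + \beta_1 t_{i,j} + \beta_2 \delta_{i,j} + a_i^T \beta_3)}.$$
Since $Y_{i,j}$ is Bernoulli, $\operatorname{var}(Y_{i,j} \mid x_i) = E(Y_{i,j} \mid x_i)\{1 - E(Y_{i,j} \mid x_i)\}$, so we need only specify the correlations.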
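
Because the Bernoulli variance is determined by the mean, computing $E(Y_{i,j} \mid x_i)$ also gives $\operatorname{var}(Y_{i,j} \mid x_i)$. This sketch uses hypothetical coefficient values, a hypothetical two-component $a_i$, and hypothetical function names; it is not fitted to the Six Cities data.

```python
import numpy as np

def wheeze_prob(t, smoke, a, beta0, beta1, beta2, beta3):
    """Marginal mean E(Y_ij | x_i) under the logistic model."""
    eta = (beta0 + beta1 * np.asarray(t, dtype=float)
           + beta2 * np.asarray(smoke, dtype=float) + np.dot(a, beta3))
    return 1.0 / (1.0 + np.exp(-eta))

def bernoulli_var(mu):
    """var(Y_ij | x_i) = mu (1 - mu): fixed by the mean, no extra parameter."""
    return mu * (1.0 - mu)

# Hypothetical child: ages 7 to 10, mother smoking at ages 8 and 9, a_i = (1, 0).
ages, smoke, a_i = [7, 8, 9, 10], [0, 1, 1, 0], np.array([1.0, 0.0])
mu = wheeze_prob(ages, smoke, a_i, beta0=-1.0, beta1=-0.1, beta2=np.log(2.0),
                 beta3=np.array([0.2, -0.3]))
v = bernoulli_var(mu)
```

Here $\exp(\beta_2)$ is the odds ratio for wheezing associated with maternal smoking at a fixed time and fixed $a_i$.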

The objective is to understand the “typical (average) response vector” as a function of covariates, i.e., the dependence of $E(Y_i \mid x_i)$ on $x_i$. In the above example, it is the probability of wheezing at each age as a function of covariates that is of scientific interest. This application is an example of a cohort study, in which a group of individuals is followed and information is recorded on each, with the objective of understanding behavior over time.

Such studies are common in epidemiology, which seeks to understand the interplay between different potential risk factors and outcomes that are important to public health.

On the basis of observed associations between risk factors and responses, epidemiologists would like to make public policy recommendations. In such circumstances individual behavior is not the focus; rather, the objective is to understand the phenomenon of interest at the population level, so that broad recommendations can be made. This differs from the situation in pharmacokinetics. Thus it is not routine to build a model for individual behavior; instead, the average behavior $E(Y_i \mid x_i)$ is modeled directly.

A general form

For the mean: $E(Y_i \mid x_i) = f_i(x_i, \beta)$ ($n_i \times 1$). Note that $f_i(\cdot)$ and $\beta$ do not have the same interpretation as in a subject-specific model.

For the variances and covariances: $\operatorname{var}(Y_i \mid x_i) = V_i(\beta, \xi, x_i)$ ($n_i \times n_i$). Models for the variances and the correlations are specified separately. Failure to take correlation into account may result in inefficient and potentially misleading inferences.
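
One common way to combine separately specified variances and correlations (used, for example, in generalized estimating equations) is $V_i = S_i^{1/2} R_i(\alpha) S_i^{1/2}$, where $S_i$ is the diagonal matrix of variances and $R_i(\alpha)$ is a correlation matrix; in this sketch $\xi$ would consist of the correlation parameter $\alpha$. The exchangeable $R_i$, the means, and the function names are illustrative assumptions.

```python
import numpy as np

def exchangeable(n, alpha):
    """Exchangeable (compound-symmetric) correlation matrix R_i(alpha)."""
    return (1 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def working_cov(variances, R):
    """V_i = S_i^{1/2} R_i S_i^{1/2}, with S_i = diag(variances)."""
    s = np.sqrt(np.asarray(variances, dtype=float))
    return s[:, None] * np.asarray(R, dtype=float) * s[None, :]

mu = np.array([0.2, 0.3, 0.4])           # hypothetical marginal means f_i(x_i, beta)
V = working_cov(mu * (1 - mu), exchangeable(3, alpha=0.3))
```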

Subject-Specific versus Population-Averaged models

Subject-specific approach: Parameters often have direct subject-matter interpretation. Interest typically focuses on the distribution of particular aspects of individual behavior, although interest may also focus on the population of responses. The approach may even be adopted solely as a mechanism for modeling correlation, because the random effects induce a correlation structure.

Population-averaged approach: Fitting and making inferences about population-averaged models are direct extensions of univariate methods.

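The two approaches may or may not lead to the same model. If $f$ is linear, say $f_i(x_i, \beta, b_i) = X_i \beta_i$ with $\beta_i = \beta + b_i$, the two give the same model:
$$E(Y_i \mid x_i) = E\{E(Y_i \mid x_i, b_i) \mid x_i\} = E\{f_i(x_i, \beta, b_i) \mid x_i\} = E(X_i \beta_i \mid x_i) = E(X_i \beta + X_i b_i \mid x_i) = X_i \beta.$$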

The interpretations are still different. From the subject-specific perspective, $\beta$ is the mean of the population of individual regression parameters $\beta_i$ that dictate individual-specific mean models; thus $\beta$ may be interpreted as the “typical parameter value.” From the population-averaged perspective, $\beta$ is the parameter producing the “typical response vector.” If $f$ is nonlinear, the two approaches give different models, and which is preferred usually depends on the application and scientific objective.

Specifying Variance-Covariance Structure

Variances may be suggested by the nature of the response, or by exploration as in the univariate case. The covariance structure is usually completed by specifying correlations, for example: unstructured; compound symmetry; m-dependent; AR(1); exponential (which reduces to AR(1) for equally spaced times); Gaussian.
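
The following sketch builds several of these correlation matrices under one common parameterization; others exist (software may, for instance, scale the range parameter $\phi$ differently). The unstructured form, with $n_i(n_i - 1)/2$ free correlations, is omitted. Function names are illustrative.

```python
import numpy as np

def compound_symmetry(n, rho):
    """Equal correlation rho between every pair of observations."""
    return (1 - rho) * np.eye(n) + rho * np.ones((n, n))

def ar1(n, rho):
    """Correlation rho^|j - k| for equally spaced observations."""
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return rho ** lag

def m_dependent(n, rhos):
    """Banded: correlation rhos[k - 1] at lag k = 1..m, zero beyond lag m."""
    R = np.eye(n)
    for k, r in enumerate(rhos, start=1):
        R += r * (np.eye(n, k=k) + np.eye(n, k=-k))
    return R

def exponential(times, phi):
    """Correlation exp(-|t - t'| / phi) for arbitrary times."""
    t = np.asarray(times, dtype=float)
    return np.exp(-np.abs(np.subtract.outer(t, t)) / phi)

def gaussian(times, phi):
    """Correlation exp(-((t - t') / phi)^2) for arbitrary times."""
    t = np.asarray(times, dtype=float)
    return np.exp(-(np.subtract.outer(t, t) / phi) ** 2)

times = np.arange(5) * 2.0               # equally spaced, 2 units apart
```

For times spaced $d$ apart, the exponential structure equals AR(1) with $\rho = \exp(-d/\phi)$. Not every choice of values in the m-dependent or unstructured forms yields a positive definite matrix.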

Remarks

For many types of response, the only requirement on the correlation matrix is that it be nonnegative definite. For others, notably binary (Bernoulli) responses, the means impose constraints on the correlations. We often ignore these constraints, and may in fact use a “working” variance-covariance structure that is impossible for the type of response.
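
For two Bernoulli responses with means $p_1$ and $p_2$, $P(Y_1 = 1, Y_2 = 1)$ must lie between $\max(0, p_1 + p_2 - 1)$ and $\min(p_1, p_2)$, which bounds their correlation. The sketch below, with a hypothetical helper name, computes the attainable range.

```python
import math

def bernoulli_corr_bounds(p1, p2):
    """Range of corr(Y1, Y2) attainable by Bernoulli(p1) and Bernoulli(p2).

    Uses the bounds max(0, p1 + p2 - 1) <= P(Y1 = 1, Y2 = 1) <= min(p1, p2).
    """
    sd = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lo = (max(0.0, p1 + p2 - 1.0) - p1 * p2) / sd
    hi = (min(p1, p2) - p1 * p2) / sd
    return lo, hi
```

For instance, with means 0.1 and 0.5 the correlation must lie between -1/3 and 1/3, so an exchangeable working correlation of 0.5 could not hold exactly for such a pair.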
