Trajectory Modeling by Shape Nicholas P. Jewell Departments of - - PowerPoint PPT Presentation



SLIDE 1

Trajectory Modeling by Shape

Nicholas P. Jewell
Departments of Statistics & School of Public Health (Biostatistics), University of California, Berkeley
Victorian Centre for Biostatistics, Murdoch Children’s Research Institute
March 6, 2014

SLIDE 2

Thanks

  • Joint work with Brianna Heggeseth (Williams College)

References

  • Heggeseth, BC and Jewell, NP. The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference: an application to longitudinal modeling. Statistics in Medicine, 2013, 32, 2790-2803.
  • Heggeseth, BC and Jewell, NP. Vertically shifted mixture models for clustering longitudinal data by shape. Submitted for publication.

SLIDE 3

“Understanding our world requires conceptualizing the similarities and differences between the entities that compose it”
(Robert Tryon and Daniel Bailey, 1970)

SLIDE 4

How does BMI change with age?

Data: National Longitudinal Survey of Youth (NLSY), 1979-2008.

SLIDE 7

Typical Longitudinal Analysis

  • Use Generalized Estimating Equations (GEE) to estimate the mean outcome, and how it changes over time, adjusting for covariates
    • regression parameter estimation is consistent despite potential covariance misspecification
    • efficiency can be gained through use of a more appropriate working correlation structure
    • robust (sandwich) standard error estimators are available
  • But, with a heterogeneous population,
    • BMI does not change much for some people as they age
    • BMI changes considerably for some people as they age
  • We don’t wish to average out these separate trajectories by modeling the mean over time
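The concern about averaging can be made concrete with a small sketch. The numbers below are invented for illustration (they are not NLSY estimates): half of a hypothetical population has a flat BMI and half has a rising BMI, and the slope of the population mean describes neither group.

```python
# Hypothetical illustration: a population-average trend can hide distinct subgroups.
# All numbers are invented for this sketch; they are not NLSY estimates.

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

ages = [20, 30, 40, 50]
stable = [22.0 for _ in ages]                    # BMI does not change with age
rising = [22.0 + 0.4 * (a - 20) for a in ages]   # BMI rises 0.4 units per year

# Mean outcome at each age when the two groups are equally common.
population_mean = [(s + r) / 2 for s, r in zip(stable, rising)]

slope_stable = ols_slope(ages, stable)
slope_rising = ols_slope(ages, rising)
slope_mean = ols_slope(ages, population_mean)
print(slope_stable, slope_rising, slope_mean)
```

The mean slope sits halfway between the two group slopes, which is the sense in which a mean model averages out the separate trajectories.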

SLIDE 8

Finite Mixture Models

  • Data for n individuals: $y_i = (y_{i1}, \ldots, y_{im_i})$, measured at times $t_i = (t_{i1}, \ldots, t_{im_i})$
  • We assume K latent trajectories in the population that are distributed with frequencies $\pi_1, \ldots, \pi_K$, where $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$.
  • The (conditional) mixture density is
    $f(y \mid t, \theta) = \pi_1 f(y \mid t, \beta_1, \Sigma_1) + \cdots + \pi_K f(y \mid t, \beta_K, \Sigma_K)$,
    where $\theta = (\pi_1, \ldots, \pi_K; \beta_1, \ldots, \beta_K; \Sigma_1, \ldots, \Sigma_K)$ and each component $f(y \mid t, \beta_k, \Sigma_k)$ is a multivariate Gaussian with mean $\mu_k$ and covariance $\Sigma_k$.
  • In most trajectory software, (conditional) independence is assumed as a working correlation structure: $\Sigma_k = \sigma_k^2 I$.

SLIDE 9

Finite Mixture Models

  • The mean vector $\mu_k$ is related to the observation times as follows:
    • Linear: $(\mu_k)_j = \beta_0 + \beta_1 t_{ij}$
    • Quadratic: $(\mu_k)_j = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2$
    • Splines in observation times

where the regression model (and coefficients) are assumed the same for each cluster, and $t_{ij}$ is the jth observation time for the ith individual, where $1 \le j \le m_i$.
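A minimal sketch of how the mixture density could be evaluated for one individual, assuming a linear mean and the independence working structure $\Sigma_k = \sigma_k^2 I$. The function names and all parameter values are hypothetical, chosen only for illustration.

```python
import math

def linear_mean(beta, times):
    """Mean vector with (mu_k)_j = beta0 + beta1 * t_ij."""
    b0, b1 = beta
    return [b0 + b1 * t for t in times]

def log_component_density(y, times, beta, sigma2):
    """Log Gaussian density with mean linear_mean(beta, times) and covariance sigma2 * I."""
    mu = linear_mean(beta, times)
    m = len(y)
    rss = sum((yj - mj) ** 2 for yj, mj in zip(y, mu))
    return -0.5 * m * math.log(2 * math.pi * sigma2) - 0.5 * rss / sigma2

def mixture_density(y, times, pis, betas, sigma2s):
    """f(y | t, theta) = sum_k pi_k * f(y | t, beta_k, sigma_k^2 I)."""
    return sum(
        pi * math.exp(log_component_density(y, times, beta, s2))
        for pi, beta, s2 in zip(pis, betas, sigma2s)
    )

# Hypothetical parameter values for K = 2 groups.
pis = [0.6, 0.4]
betas = [(20.0, 0.0), (20.0, 1.0)]   # a flat and a rising trajectory
sigma2s = [1.0, 1.0]
times = [0.0, 1.0, 2.0]
y = [20.0, 21.0, 22.0]               # lies exactly on the rising trajectory

density = mixture_density(y, times, pis, betas, sigma2s)
print(density)
```

Because each individual carries its own `times`, the same functions apply when individuals are observed at different ages or a different number of times.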

SLIDE 10

Finite Mixture Models

  • Group membership: $\pi_k = \dfrac{\exp(\gamma_k z)}{\sum_{j=1}^{K} \exp(\gamma_j z)}$
  • z is a set of the same or different covariates
  • This expands $\theta$ to include the $\gamma$s also
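The group-membership model is a multinomial logit in the covariates. A short sketch follows; the coefficient values are hypothetical, and fixing the first group's coefficients at zero is a common identifiability convention that the slide itself does not state.

```python
import math

def membership_probabilities(gammas, z):
    """pi_k = exp(gamma_k . z) / sum_j exp(gamma_j . z) for each group k."""
    scores = [sum(g * zj for g, zj in zip(gamma, z)) for gamma in gammas]
    top = max(scores)                        # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for K = 3 groups; z = (1, covariate) includes an intercept.
# The first group is the reference (all coefficients zero), an assumed convention.
gammas = [(0.0, 0.0), (0.5, 1.0), (-1.0, 2.0)]
probs = membership_probabilities(gammas, z=(1.0, 0.0))
print(probs)
```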

SLIDE 11

Estimation for Mixture Models

  • Maximum likelihood estimation for θ via the EM algorithm
  • K is pre-specified; can be chosen using the BIC
  • Parameter estimators are not consistent under covariance misspecification (White, 1982; Heggeseth and Jewell, 2013).
  • Robust (sandwich) standard error estimators are available.
  • How bad can the bias in regression estimators be? What influences its size?
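To make the estimation step concrete, here is a minimal EM sketch for a mixture of linear trajectories under the independence working structure. It is an illustration under stated assumptions, not the software used in the talk: the function name, starting values, and simulated data are all hypothetical, K is fixed in advance, and there is no convergence check or protection against a group emptying out.

```python
import math
import random

def fit_mixture_em(data, start_betas, n_iter=50):
    """EM for a K-group mixture of linear trajectories with Sigma_k = sigma_k^2 * I.

    data        : list of (times, y) pairs, one per individual
    start_betas : starting (intercept, slope) values, one pair per group
    """
    K = len(start_betas)
    betas = list(start_betas)
    pis = [1.0 / K] * K
    sigma2s = [1.0] * K
    for _ in range(n_iter):
        # E-step: posterior probability that each individual belongs to each group.
        resp = []
        for times, y in data:
            logp = []
            for k in range(K):
                b0, b1 = betas[k]
                rss = sum((yj - b0 - b1 * t) ** 2 for t, yj in zip(times, y))
                logp.append(math.log(pis[k])
                            - 0.5 * len(y) * math.log(2 * math.pi * sigma2s[k])
                            - 0.5 * rss / sigma2s[k])
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            resp.append([v / sum(w) for v in w])
        # M-step: weighted least squares and weighted residual variance per group.
        for k in range(K):
            obs = [(r[k], t, yj) for (times, y), r in zip(data, resp)
                   for t, yj in zip(times, y)]
            sw = sum(w for w, _, _ in obs)
            swt = sum(w * t for w, t, _ in obs)
            swy = sum(w * yj for w, _, yj in obs)
            swtt = sum(w * t * t for w, t, _ in obs)
            swty = sum(w * t * yj for w, t, yj in obs)
            b1 = (sw * swty - swt * swy) / (sw * swtt - swt ** 2)
            b0 = (swy - b1 * swt) / sw
            rss = sum(w * (yj - b0 - b1 * t) ** 2 for w, t, yj in obs)
            betas[k] = (b0, b1)
            sigma2s[k] = max(rss / sw, 1e-8)
            pis[k] = max(sum(r[k] for r in resp) / len(data), 1e-12)
    return pis, betas, sigma2s

# Hypothetical data: two well-separated groups (flat and rising), error sd 0.5.
random.seed(1)
times = [1.0, 2.0, 3.0, 4.0, 5.0]
true_betas = [(10.0, 0.0), (10.0, 1.0)]
data = []
for i in range(200):
    b0, b1 = true_betas[i % 2]
    data.append((times, [b0 + b1 * t + random.gauss(0.0, 0.5) for t in times]))

pis, betas, sigma2s = fit_mixture_em(data, start_betas=[(9.0, -0.5), (11.0, 1.5)])
print(pis, betas, sigma2s)
```

In this sketch the simulated errors really are independent, so the working structure is correctly specified; the bias question raised on the slide concerns data whose errors are correlated within an individual.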
SLIDE 12

Misspecified Covariance Structure: Bias and Separation of Trajectories

  • Separated components lead to little bias even when you wrongly assume independence.

[Figure: black dashed lines show true means; solid lines show estimated means.]

Reported standard errors:
$\widehat{SE}_I(\beta_{01}) = 0.02$, $\widehat{SE}_R(\beta_{01}) = 0.06$
$\widehat{SE}_I(\beta_{01}) = 0.01$, $\widehat{SE}_R(\beta_{01}) = 0.01$

SLIDE 13

Misspecified Covariance Structure: Bias and Level of Dependence

  • Components with little dependence lead to small bias even when you wrongly assume independence.

[Figure: black dashed lines show true means; solid lines show estimated means.]

Reported standard errors:
$\widehat{SE}_I(\beta_{01}) = 0.02$, $\widehat{SE}_R(\beta_{01}) = 0.06$
$\widehat{SE}_I(\beta_{01}) = 0.03$, $\widehat{SE}_R(\beta_{01}) = 0.04$

SLIDE 14

NLSY Data Analysis

Covariance makes a difference to the trajectories

  • hard to estimate bias from misspecified covariance
SLIDE 15

How Do We Group These Blocks?

SLIDE 16

Group by Color

SLIDE 17

Group by Shape

SLIDE 18

How Do We Group These Blocks?

SLIDE 19

Group by Color or Shape

SLIDE 20

How Do We Group These (Regression) Lines?

[Figure: regression lines; axes: x, y]

SLIDE 21

Group by Intercept

[Figure: regression lines; axes: x, y]

SLIDE 22

Group by Level

[Figure: regression lines; axes: x, y]

SLIDE 23

Group by Shape (Slope)

[Figure: regression lines; axes: x, y]

SLIDE 24

Simulated Data

[Figure: simulated data; x-axis: Time, y-axis: Y]

How could we group these individuals?


SLIDE 26

Real Longitudinal Data

  • Center for the Health Assessment of Mothers and Children of Salinas (CHAMACOS) Study
  • In 1999-2000, enrolled 601 pregnant women in the agricultural Salinas Valley, CA.
  • Mostly Hispanic, agricultural workers.
  • Determine if exposure to pesticides and other chemicals affects children's growth patterns (BMI, neurological measures, etc.).
  • First, focus on studying/estimating the growth patterns of children.
  • Second, determine if early life predictors are related to the patterns
  • pesticide/chemical exposure in utero
  • ODT, PDT, PDE, BPA (bisphenol A)
SLIDE 27

CHAMACOS Data

[Figure: CHAMACOS data; x-axis: Age in months, y-axis: BMI]

How could we group these individuals?

SLIDE 28

Cluster Analyses

  • Clustering is the task of assigning a set of objects into groups so that the objects in the same group are more similar to each other than to those in other groups.
  • What does it mean for objects to be more similar or more dissimilar?
  • Distance matrix
  • Why do we cluster objects?
SLIDE 29

Standard Clustering Methods

  • Partition methods
    • Partition objects into K groups so that an objective function of dissimilarities is minimized or maximized.
    • Example: K-means Algorithm
  • Model-based methods
    • Assume a model that includes a grouping structure and estimate parameters.
    • Example: Finite Mixture Models
SLIDE 30

K-means Algorithm

  • Input: Data for n individuals in vector form. For individual i, the observed data vector is $y_i = (y_{i1}, \ldots, y_{im})$.
  • Measure of Dissimilarity: Squared Euclidean distance. The dissimilarity between the 1st and 2nd individuals is
    $d(y_1, y_2) = \|y_1 - y_2\|^2 = (y_{11} - y_{21})^2 + \cdots + (y_{1m} - y_{2m})^2$

SLIDE 31

K-means Algorithm

  • Goal: Partition individuals into K sets $C = \{C_1, C_2, \ldots, C_K\}$ so as to minimize the within-cluster sum of squares
    $\sum_{k=1}^{K} \sum_{y_i \in C_k} \|y_i - \mu_k\|^2$,
    where $\mu_k$ is the mean vector of individuals in $C_k$.

(K must be known before starting K-means. There are many ways to choose K from the data that try to minimize the dissimilarity within each cluster while maximizing the dissimilarity between clusters: for example, the use of silhouettes.)
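The objective above is usually minimized by alternating between assigning each individual to the nearest center and recomputing the centers (Lloyd's algorithm). A minimal sketch follows; the four trajectories and the starting centers are hypothetical and were chosen so that two trajectories rise and two fall, at two different levels.

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def k_means(data, centers, n_iter=20):
    """Alternate nearest-center assignment and center updates for n_iter rounds."""
    centers = [list(c) for c in centers]
    labels = [0] * len(data)
    for _ in range(n_iter):
        labels = [min(range(len(centers)), key=lambda k: squared_distance(y, centers[k]))
                  for y in data]
        for k in range(len(centers)):
            members = [y for y, lab in zip(data, labels) if lab == k]
            if members:                       # keep the old center if a cluster empties
                centers[k] = mean_vector(members)
    wss = sum(squared_distance(y, centers[lab]) for y, lab in zip(data, labels))
    return labels, centers, wss

# Hypothetical trajectories measured at three common times.
data = [
    [1.0, 2.0, 3.0],      # rising, low level
    [3.0, 2.0, 1.0],      # falling, low level
    [11.0, 12.0, 13.0],   # rising, high level
    [13.0, 12.0, 11.0],   # falling, high level
]
labels, centers, wss = k_means(data, centers=[data[0], data[2]])
print(labels, centers, wss)
```

With K = 2, the rising and falling trajectories at the same level end up in the same cluster and both cluster means are flat, so the grouping reflects level rather than shape.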

SLIDE 33

Application to Simulated Data

[Figure: K-means clusters; x-axis: Time, y-axis: Y]

  • How would you describe (interpret) the group trajectories?
SLIDE 34

Finite Mixture Model Applied to CHAMACOS Data

[Figure: Mixture Model with Independence; x-axis: Age in months, y-axis: BMI]

SLIDE 36

Finite Mixture Model Applied to CHAMACOS Data

[Figure: Mixture Model with Exponential; x-axis: Age in months, y-axis: BMI]

SLIDE 37

Clustering by Shape

  • Interested in shape not just level (which appears to dominate clustering techniques)
  • Want a method that:
    • Works with irregularly sampled data
    • Includes a way to estimate the relationship between baseline risk factors and group membership
    • Groups individuals according to the outcome pattern over time ignoring the level

SLIDE 38

Clustering by Shape Options

  • Estimate slopes between neighboring observations and cluster on the “derived” observations
  • Fit splines for each individual, differentiate, and cluster on coefficients of resulting derivative
  • Use partition-based cluster methods (like PAM) but use
    (i) the Pearson correlation coefficient as a distance or dissimilarity measure,
        $d_{corr}(x, y) = 1 - \mathrm{Corr}(x, y)$,
    or (ii) the cosine-angle measure of dissimilarity,
        $d_{cos}(x, y) = 1 - \dfrac{\sum_{j=1}^{m} x_j y_j}{\sqrt{\left(\sum_{j=1}^{m} x_j^2\right)\left(\sum_{j=1}^{m} y_j^2\right)}}$
  • Vertically shift individual trajectories
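The two dissimilarity measures behave differently when a trajectory is moved up or down. A short sketch with hypothetical trajectories (the names `low`, `high`, and `falling` are illustrative only):

```python
import math

def d_corr(x, y):
    """1 - Pearson correlation; undefined if either vector is constant."""
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def d_cos(x, y):
    """1 - cosine of the angle between x and y; undefined for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

low = [1.0, 2.0, 4.0]
high = [11.0, 12.0, 14.0]     # same shape as `low`, shifted up by 10
falling = [4.0, 2.0, 1.0]

print(d_corr(low, high), d_cos(low, high), d_corr(low, falling))
```

The correlation measure treats `low` and `high` as identical because it centers each vector first, while the cosine measure still sees a difference between them. Note that the correlation measure also ignores the size of the change (it is scale invariant) and cannot be computed for a completely flat trajectory.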

SLIDE 39

Vertical Shifting

  • For each individual, calculate
    $y_i^* = y_i - m_i^{-1} \sum_{j=1}^{m_i} y_{ij}$
  • Each individual now has mean zero and so level is removed from any resulting clustering
  • Apply clustering technique to shifted data, e.g. finite mixture model
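The shift itself is one line per individual. A minimal sketch with hypothetical data, in which the second individual has the same shape as the first but a higher level:

```python
def vertical_shift(y):
    """Subtract the individual's own mean so the shifted trajectory has mean zero."""
    level = sum(y) / len(y)
    return [v - level for v in y]

low = [1.0, 2.0, 4.0, 7.0]
high = [21.0, 22.0, 24.0, 27.0]   # same shape, 20 units higher

shifted_low = vertical_shift(low)
shifted_high = vertical_shift(high)
print(shifted_low, shifted_high)
```

Because the mean is taken over each individual's own observations, the function works unchanged when individuals have different numbers of measurements.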

SLIDE 40

Correlation Models for Vertical Shifted Data

  • Without specifying group, suppose
    $y_i = \lambda_i 1_{m_i} + \mu_i + \epsilon_i, \quad \lambda_i \sim F_\lambda, \quad \epsilon_i \sim N(0, \Sigma)$,
    where $1_{m_i}$ is an $m_i$-length vector of 1s, and $\mu_{ij} = \mu_k(t_{ij})$ is the jth element of the vector of mean values for the kth group evaluated at the observation times $t_i$. Thus,
    $y_i^* = A_i y_i = \mu_i - \bar{\mu}_i + \epsilon_i - \bar{\epsilon}_i$
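The step from the model to the shifted data can be written out as follows, assuming the shift is carried out by the centering matrix $A_i = I_{m_i} - m_i^{-1} 1_{m_i} 1_{m_i}^T$ (this matrix form is implied by the vertical-shifting definition but is not stated explicitly on the slide):

```latex
\begin{aligned}
y_i^* = A_i y_i
  &= A_i\left(\lambda_i 1_{m_i} + \mu_i + \epsilon_i\right) \\
  &= \lambda_i A_i 1_{m_i} + A_i \mu_i + A_i \epsilon_i \\
  &= \left(\mu_i - \bar{\mu}_i 1_{m_i}\right) + \left(\epsilon_i - \bar{\epsilon}_i 1_{m_i}\right),
\end{aligned}
\qquad \text{since } A_i 1_{m_i} = 1_{m_i} - m_i^{-1} 1_{m_i}\left(1_{m_i}^T 1_{m_i}\right) = 0 .
```

The individual level $\lambda_i$ drops out whatever its distribution $F_\lambda$, which is why no assumption about $F_\lambda$ is needed after shifting.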

SLIDE 41

Correlation Models for Vertical Shifted Data

$\mathrm{Cov}(Y_i^* - \mu_i) = \mathrm{Cov}\left((A - I_{m_i})\mu_i + A\epsilon\right)$

Two components of the covariance

  • One induced by the averaging process
  • One induced by (random) observation times
SLIDE 42

Correlation Models for Vertical Shifted Data: Observation Times Fixed

$\mathrm{Cov}(Y^* - \mu) = A \Sigma A^T$

suppressing the individual/group indices for simplicity ($\Sigma$ is allowed to vary across clusters).

This covariance matrix is singular since $\det(A) = 0$. This naturally reflects the “loss” of one dimension.

SLIDE 43

Correlation Models for Vertical Shifted Data: Observation Times Fixed

$\mathrm{Cov}(Y^* - \mu) = A \Sigma A^T$

  • If $\Sigma = \sigma^2 I$ (conditional independence with constant variance), then the induced covariance is exchangeable with negative correlation given by $-1/(m - 1)$, and the variance decreases to $\sigma^2 \left(\frac{m - 1}{m}\right)$.
  • If the original covariance is exchangeable with constant variance and correlation ρ, then the induced covariance remains exchangeable with negative correlation and reduced variance $\sigma^2 (1 - \rho) \left(\frac{m - 1}{m}\right)$.

SLIDE 44

Correlation Models for Vertical Shifted Data: Observation Times Fixed

$\mathrm{Cov}(Y^* - \mu) = A \Sigma A^T$

If $\Sigma = \sigma^2 I$ (conditional independence with constant variance), then the induced covariance is exchangeable with negative correlation given by $-1/(m - 1)$, and the variance decreases to $\sigma^2 \left(\frac{m - 1}{m}\right)$.

This induced exchangeable correlation is the lower bound for correlation in an exchangeable matrix. Thus, if “estimated”, the (true) parameter is on the boundary of the parameter space.
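The induced covariance can be checked numerically. The sketch below assumes the centering matrix $A = I - m^{-1} 1 1^T$ and uses hypothetical values for m, σ², and ρ; the helper names are illustrative.

```python
def centering_matrix(m):
    """A = I - (1/m) * 1 1^T, the matrix that subtracts the mean of m values."""
    return [[(1.0 if r == c else 0.0) - 1.0 / m for c in range(m)] for r in range(m)]

def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q))) for c in range(len(Q[0]))]
            for r in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def exchangeable(m, sigma2, rho):
    """Covariance with constant variance sigma2 and common correlation rho."""
    return [[sigma2 * (1.0 if r == c else rho) for c in range(m)] for r in range(m)]

m, sigma2, rho = 5, 2.0, 0.3          # hypothetical values
A = centering_matrix(m)
induced = matmul(matmul(A, exchangeable(m, sigma2, rho)), transpose(A))

variance = induced[0][0]
correlation = induced[0][1] / variance
row_sums = [sum(row) for row in induced]
print(variance, correlation, row_sums)
```

The variance equals σ²(1 − ρ)(m − 1)/m, the correlation equals −1/(m − 1) regardless of ρ, and every row sums to zero, which is the singularity described on the earlier slide. Setting `rho = 0.0` gives the independence case.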

SLIDE 45

Correlation Models for Vertical Shifted Data: Observation Times Random (µ is random)

$\mathrm{Cov}(Y^* - \mu) = m^{-2} \sum_{j=1}^{m} \mathrm{Var}(t_j) \left[\mu'(E(t_j))\right]^2 \, 1 1^T + A \Sigma A^T$

Sum of two non-invertible matrices, but the positive magnitude of the first matrix may counteract the negative correlations of the second.
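One way to arrive at this expression is sketched below. It assumes that the observation times are independent of each other and of the errors, and it uses a first-order (delta method) approximation for the variance of $\mu(t_j)$; these assumptions are a reading of the slide, not something it states.

```latex
\begin{aligned}
Y^* - \mu &= (A - I)\mu + A\epsilon = -\bar{\mu}\, 1 + A\epsilon,
  \qquad \bar{\mu} = m^{-1} \sum_{j=1}^{m} \mu(t_j), \\
\operatorname{Var}(\bar{\mu})
  &\approx m^{-2} \sum_{j=1}^{m} \operatorname{Var}(t_j)\, \left[\mu'(E(t_j))\right]^2, \\
\operatorname{Cov}(Y^* - \mu)
  &\approx \operatorname{Var}(\bar{\mu})\, 1 1^T + A \Sigma A^T .
\end{aligned}
```

The first term adds the same positive amount to every entry of the covariance matrix, which is how it can offset the negative correlations in $A \Sigma A^T$. For a linear mean function the approximation is exact.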

SLIDE 46

Correlation Models for Vertical Shifted Data: Observation Times Random (µ is random)

$\mathrm{Cov}(Y^* - \mu) = m^{-2} \sum_{j=1}^{m} \mathrm{Var}(t_j) \left[\mu'(E(t_j))\right]^2 \, 1 1^T + A \Sigma A^T$

500 simulations of $Y_i = \mu_i + \epsilon_i$, $\epsilon_i \sim N(0, \Sigma_\rho)$, where the error covariance matrix $\Sigma_\rho$ is of exponential form with range ρ, and

  • $t = T + \tau$, with $T = (1, 2, \ldots, 9, 10)$ and $\tau \sim N(0, \sigma_\tau^2 I)$
  • $\mu_{ij} = \mu(t_{ij})$
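A simplified Monte Carlo check of the covariance formula is sketched below. It departs from the slide's setup in ways that should be read as assumptions: the mean is linear, the errors are independent with constant variance (rather than exponentially correlated), all parameter values are hypothetical, and more draws than 500 are used so that the comparison is stable.

```python
import random

def simulate_shifted_residual(planned_times, slope, sd_time, sd_error, rng):
    """One draw of Y* - mu for a linear mean mu(t) = slope * t with jittered times."""
    t = [T + rng.gauss(0.0, sd_time) for T in planned_times]
    mu = [slope * tj for tj in t]
    y = [mj + rng.gauss(0.0, sd_error) for mj in mu]
    y_bar = sum(y) / len(y)
    return [yj - y_bar - mj for yj, mj in zip(y, mu)]

rng = random.Random(2014)
T = list(range(1, 11))                                 # planned times 1, 2, ..., 10
m, slope, sd_time, sd_error = len(T), 1.0, 0.5, 1.0    # hypothetical values
n_sim = 20000

draws = [simulate_shifted_residual(T, slope, sd_time, sd_error, rng) for _ in range(n_sim)]
mean0 = sum(d[0] for d in draws) / n_sim
mean1 = sum(d[1] for d in draws) / n_sim
var_hat = sum((d[0] - mean0) ** 2 for d in draws) / n_sim
cov_hat = sum((d[0] - mean0) * (d[1] - mean1) for d in draws) / n_sim

# Formula with Var(t_j) = sd_time^2, mu'(t) = slope, and Sigma = sd_error^2 * I.
level_part = slope ** 2 * sd_time ** 2 / m
var_formula = level_part + sd_error ** 2 * (m - 1) / m
cov_formula = level_part - sd_error ** 2 / m
print(var_hat, var_formula, cov_hat, cov_formula)
```

The `level_part` term is the contribution of the random observation times; raising `sd_time` or `slope` makes the off-diagonal covariance less negative.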

SLIDE 47

Correlation Models for Vertical Shifted Data

SLIDE 48

Simulated Data

[Figure: simulated data; x-axis: Time, y-axis: Y]

How could we group these individuals?

SLIDE 49

Application to Simulated Data

[Figure: K-means clusters; x-axis: Time, y-axis: Y]

  • How would you describe (interpret) the group trajectories?

SLIDE 50

Vertical Shifting Applied to Simulated Data

[Figure: Vertically Shifted Mixture Model with Exponential; x-axis: Time, y-axis: Y]

SLIDE 52

500 Simulations

  • $\mu_1(t) = -1 - t$: negative slope, low level
  • $\mu_2(t) = 11 - t$: negative slope, high level
  • $\mu_3(t) = 0$: zero slope, middle level
  • $\mu_4(t) = -11 + t$: positive slope, low level
  • $\mu_5(t) = 1 + t$: positive slope, high level

Mean functions are evaluated at five equidistant points that span [1, 10], including the ends of the interval.

SLIDE 53

500 Simulations

  • $\mu_1(t) = -1 - t$: negative slope, low level
  • $\mu_2(t) = 11 - t$: negative slope, high level
  • $\mu_3(t) = 0$: zero slope, middle level
  • $\mu_4(t) = -11 + t$: positive slope, low level
  • $\mu_5(t) = 1 + t$: positive slope, high level

Two components to noise:

  • random individual level perturbation, $N(0, \sigma_\lambda^2)$
  • random measurement error across times (exchangeable correlation), $N(0, \sigma_\epsilon^2)$
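A sketch of this simulation design follows. The five mean functions and the five equidistant times come from the slides; the noise standard deviations are hypothetical (the slides do not give values), and the measurement errors are drawn independently, so that together with the individual-level perturbation they induce an exchangeable correlation. That is one reading of the slide, not a confirmed detail.

```python
import random

MEAN_FUNCTIONS = {
    "negative slope, low level": lambda t: -1.0 - t,
    "negative slope, high level": lambda t: 11.0 - t,
    "zero slope, middle level": lambda t: 0.0,
    "positive slope, low level": lambda t: -11.0 + t,
    "positive slope, high level": lambda t: 1.0 + t,
}
TIMES = [1.0, 3.25, 5.5, 7.75, 10.0]   # five equidistant points spanning [1, 10]

def simulate_individual(mean_fn, sd_level, sd_error, rng):
    """Mean function + one individual-level shift + independent measurement errors."""
    shift = rng.gauss(0.0, sd_level)
    return [mean_fn(t) + shift + rng.gauss(0.0, sd_error) for t in TIMES]

def vertical_shift(y):
    """Subtract the trajectory's own mean."""
    level = sum(y) / len(y)
    return [v - level for v in y]

rng = random.Random(0)
example = simulate_individual(MEAN_FUNCTIONS["positive slope, high level"],
                              sd_level=2.0, sd_error=0.5, rng=rng)   # hypothetical sds

# After vertical shifting, the five mean functions collapse to three shapes.
shifted_means = {name: vertical_shift([fn(t) for t in TIMES])
                 for name, fn in MEAN_FUNCTIONS.items()}
distinct_shapes = {tuple(round(v, 9) for v in vals) for vals in shifted_means.values()}
print(len(distinct_shapes))
```

The two negative-slope groups share one shifted mean and the two positive-slope groups share another, so a method that clusters the shifted data targets three shape groups rather than five level-and-shape groups.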

SLIDE 54

500 Simulations

SLIDE 55

Vertical Shifting with CHAMACOS

  • Two-part models
    • First, use standard regression models to relate baseline predictors to BMI
    • Then, use vertically shifted shape clustering (with the same or different baseline predictors for shape groups)
  • For BMI in the CHAMACOS data
    • Works with irregularly sampled data
    • Includes a way to estimate the relationship between baseline risk factors and group membership
    • Groups individuals according to the outcome pattern over time ignoring the level

SLIDE 56

Vertical Shifting with CHAMACOS

SLIDE 57

Vertical Shifting with CHAMACOS

SLIDE 58

Further Thoughts

  • Time-dependent covariates