Trajectory Modeling by Shape
Nicholas P. Jewell Departments of Statistics & School of Public Health (Biostatistics) University of California, Berkeley Victorian Centre for Biostatistics Murdoch Children’s Research Institute March 6, 2014
Gaussian mixtures on estimation and inference: an application to longitudinal modeling. Statistics in Medicine, 2013, 32, 2790-2803.
longitudinal data by shape. Submitted for publication.
National Longitudinal Survey of Youth (NLSY) from 1979 - 2008.
covariance misspecification
working correlation structure
modeling the mean over time
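$K$ groups with frequencies $\pi_1, \dots, \pi_K$, where $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$.
Within group $k$, the outcome vector is Gaussian with mean $\mu_k$ and covariance $\Sigma_k$.
Independence as a working correlation structure: $y \mid \text{group } k \sim N(\mu_k, \sigma_k^2 I)$.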
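A Gaussian mixture with an independence working covariance assigns each individual a posterior probability of belonging to each group. The sketch below shows that calculation for one outcome vector. The function name `posterior_probs`, the argument names, and the two-group example values are all hypothetical, chosen only for illustration.

```python
import numpy as np

def posterior_probs(y, pis, mus, sigma2s):
    """Posterior group probabilities for one outcome vector y under a
    K-component Gaussian mixture whose k-th covariance is sigma2s[k] * I."""
    y = np.asarray(y, dtype=float)
    m = y.size
    log_w = np.empty(len(pis))
    for k, (pi_k, mu_k, s2_k) in enumerate(zip(pis, mus, sigma2s)):
        resid = y - np.asarray(mu_k, dtype=float)
        # log of pi_k * N(y; mu_k, s2_k * I)
        log_w[k] = (np.log(pi_k)
                    - 0.5 * m * np.log(2 * np.pi * s2_k)
                    - 0.5 * (resid @ resid) / s2_k)
    log_w -= log_w.max()      # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical two-group example with m = 3 observation times.
pis = [0.5, 0.5]
mus = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]   # increasing vs decreasing mean
sigma2s = [1.0, 1.0]
probs = posterior_probs([1.1, 2.0, 2.9], pis, mus, sigma2s)
```

Here the observed vector lies close to the increasing mean, so the first group receives almost all of the posterior weight.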
where the regression model (and coefficients) are assumed the same for each cluster, and $y_{ij}$ is the jth observation for the ith individual, where $1 \le j \le m_i$.
$\pi_k(z) = \dfrac{\exp(\gamma_k z)}{\sum_{j=1}^{K} \exp(\gamma_j z)}$
Z is a set of the same or different covariates. This expands the parameter vector to include the $\gamma$s also.
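Group membership probabilities that depend on covariates through exp(γ z) terms normalised over the groups can be computed as below. This is a minimal sketch: the name `group_probs`, the coefficient values, and the convention of fixing the first group's coefficients at zero as a reference are assumptions, not taken from the slides.

```python
import numpy as np

def group_probs(z, gammas):
    """pi_k(z) = exp(gamma_k . z) / sum_j exp(gamma_j . z)."""
    z = np.asarray(z, dtype=float)
    gammas = np.asarray(gammas, dtype=float)   # shape (K, p)
    eta = gammas @ z                           # one linear predictor per group
    eta = eta - eta.max()                      # numerical stability
    w = np.exp(eta)
    return w / w.sum()

# Hypothetical coefficients for K = 3 groups; row 0 is the reference group.
gammas = [[0.0, 0.0], [0.5, -1.0], [-0.5, 1.0]]
z = [1.0, 2.0]                                 # intercept and one covariate
pi_z = group_probs(z, gammas)
```

With these values the third group has the largest linear predictor and therefore the largest membership probability.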
(White, 1982; Heggeseth and Jewell, 2013).
assume independence. Black dashed lines: true means; solid lines: estimated means.
$\widehat{SE}_I(\beta_{01}) = 0.02$, $\widehat{SE}_R(\beta_{01}) = 0.06$
$\widehat{SE}_I(\beta_{01}) = 0.01$, $\widehat{SE}_R(\beta_{01}) = 0.01$
even when you wrongly assume independence. Black dashed lines: true means; solid lines: estimated means.
$\widehat{SE}_I(\beta_{01}) = 0.02$, $\widehat{SE}_R(\beta_{01}) = 0.06$
$\widehat{SE}_I(\beta_{01}) = 0.03$, $\widehat{SE}_R(\beta_{01}) = 0.04$
Covariance makes a difference to the trajectories
[Figure: y versus x]
[Figure: individual trajectories, Y versus Time]
How could we group these individuals?
Salinas (CHAMACOS) Study
Valley, CA.
children's growth patterns (BMI, neurological measures, etc.).
patterns
[Figure: BMI versus age in months]
How could we group these individuals?
so that the objects in the same group are more similar to each
estimate parameters.
dissimilarity between the 1st and 2nd individuals is
to minimize the within-cluster sum of squares $\sum_{k=1}^{K} \sum_{i \in C_k} \lVert y_i - \mu_k \rVert^2$, where $\mu_k$ is the mean vector of individuals in $C_k$.
(K must be known before starting K-means. There are many ways to choose K from the data that try to minimize the dissimilarity within each cluster while maximizing the dissimilarity between clusters: for example, the use of silhouettes.)
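The K-means iteration described above can be sketched with NumPy alone. This is a basic Lloyd-style implementation, not necessarily the one used for the slides; the function name `kmeans`, the random initialisation, and the toy data are assumptions.

```python
import numpy as np

def kmeans(Y, K, n_iter=100, seed=0):
    """Basic K-means on the rows of Y using squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    # Initialise centers at K distinct, randomly chosen individuals.
    centers = Y[rng.choice(len(Y), size=K, replace=False)]
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean vector of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squares.
    d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wss = d2[np.arange(len(Y)), labels].sum()
    return labels, centers, wss

# Hypothetical toy data: two increasing and two decreasing trajectories.
Y = np.array([[1.0, 2.0, 3.0], [1.1, 2.1, 3.1],
              [9.0, 8.0, 7.0], [9.2, 8.1, 7.1]])
labels, centers, wss = kmeans(Y, K=2)
```

Because the distance is Euclidean on the raw outcome vectors, differences in level contribute to the dissimilarity just as much as differences in shape.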
[Figure: K-means, Y versus Time]
[Figure: Mixture Model with Independence, BMI versus age in months]
[Figure: Mixture Model with Exponential, BMI versus age in months]
clustering techniques)
risk factors and group membership
time ignoring the level
“derived” observations
resulting derivative
coefficient as a distance or dissimilarity measure
$d_{corr}(x, y) = 1 - \mathrm{Corr}(x, y)$
$d_{cos}(x, y) = 1 - \dfrac{\sum_{j=1}^{m} x_j y_j}{\sqrt{\left(\sum_{j=1}^{m} x_j^2\right)\left(\sum_{j=1}^{m} y_j^2\right)}}$
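The correlation-based and cosine-based dissimilarities can be computed directly. The sketch below uses hypothetical function names `d_corr` and `d_cos` and a made-up pair of vectors that share a shape but differ in level.

```python
import numpy as np

def d_corr(x, y):
    """1 minus the Pearson correlation between x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 1.0 - np.corrcoef(x, y)[0, 1]

def d_cos(x, y):
    """1 minus the cosine of the angle between x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 1.0 - (x @ y) / np.sqrt((x @ x) * (y @ y))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = x + 10.0            # same shape, shifted to a higher level

corr_dist = d_corr(x, y)   # essentially zero: correlation ignores the shift
cos_dist = d_cos(x, y)     # positive: cosine is not invariant to the shift
```

In this example the correlation measure treats the two vectors as identical in shape, while the cosine measure does not, because the cosine is invariant to rescaling but not to a vertical shift.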
any resulting clustering
where $\mathbf{1}_{m_i}$ is an $m_i$-length vector of 1s, and $\mu_k(t_{ij})$ is the jth element of the vector of mean values for the kth group evaluated at the observation times $t_i$. Thus,
suppressing the individual/group indices for simplicity ($\Sigma$ is allowed to vary across clusters). This covariance matrix is singular, since the elements of each vertically shifted vector sum to zero. This naturally reflects the “loss” of one dimension.
If $\Sigma = \sigma^2 I$ (conditional independence with constant variance), then the induced covariance is exchangeable with negative correlation given by $-\frac{1}{m-1}$, and the variance decreases to $\sigma^2 \left(\frac{m-1}{m}\right)$.
This induced exchangeable correlation is the lower bound for correlation in an exchangeable matrix. Thus, if “estimated”, the (true) parameter is on the boundary of the parameter space.
If $\Sigma$ is exchangeable with correlation $\rho$, then the induced covariance remains exchangeable with negative correlation $-\frac{1}{m-1}$ and reduced variance $\sigma^2 (1 - \rho) \left(\frac{m-1}{m}\right)$.
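The induced-covariance results can be checked numerically. The sketch below assumes the vertical shift subtracts each individual's own mean, which is equivalent to multiplying by the centering matrix A = I − 11ᵀ/m. The names `induced_cov` and `corr_offdiag` and the values of m, σ² and ρ are hypothetical.

```python
import numpy as np

def induced_cov(Sigma):
    """Covariance of A y when Cov(y) = Sigma and A = I - 11'/m
    (assumed form of the vertical shift: subtract the individual mean)."""
    m = Sigma.shape[0]
    A = np.eye(m) - np.ones((m, m)) / m
    return A @ Sigma @ A.T

def corr_offdiag(S):
    """Correlation between the first two elements implied by covariance S."""
    return S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

m, sigma2, rho = 5, 2.0, 0.3

# Independence with constant variance.
S_ind = induced_cov(sigma2 * np.eye(m))

# Exchangeable with correlation rho.
Sigma_exch = sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
S_exch = induced_cov(Sigma_exch)

var_ind, corr_ind = S_ind[0, 0], corr_offdiag(S_ind)
var_exch, corr_exch = S_exch[0, 0], corr_offdiag(S_exch)
# Rows of the induced covariance sum to zero, so the matrix is singular.
row_sums = S_ind @ np.ones(m)
```

Under this assumption both induced matrices have off-diagonal correlation −1/(m − 1), and their variances are σ²(m − 1)/m and σ²(1 − ρ)(m − 1)/m respectively.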
Sum of two non-invertible matrices, but the positive magnitude of the first matrix may counteract the negative correlations of the second.
$\sum_{j=1}^{m} \mathrm{Var}(t_j)\,[\mu'(E(t_j))]^2$
500 simulations of where the error covariance matrix is of exponential form with range ρ
τI)
[Figure: individual trajectories, Y versus Time]
How could we group these individuals?
[Figure: K-means, Y versus Time]
[Figure: Vertically Shifted Mixture Model with Exponential, Y versus Time]
Five groups: negative slope, low level; negative slope, high level; zero slope, middle level; positive slope, low level; positive slope, high level.
Mean functions are evaluated at five equidistant points that span [1, 10], including the ends of the interval.
Two components to noise: a random individual-level perturbation, and random measurement error across times (exchangeable correlation).
λ)
✏ )
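A simulation of this kind can be sketched as follows: five groups defined by slope and level, means evaluated at five equidistant times on [1, 10], and noise made up of an individual-level shift plus exchangeable measurement error. The slopes, levels, standard deviations, correlation, group sizes and seed below are all hypothetical placeholders, since the slides' actual values are not recoverable here.

```python
import numpy as np

rng = np.random.default_rng(2014)
times = np.linspace(1, 10, 5)   # five equidistant points, ends included

# Hypothetical (level, slope) pairs for the five groups.
groups = {
    "negative slope, low level":  (5.0, -0.5),
    "negative slope, high level": (12.0, -0.5),
    "zero slope, middle level":   (8.5, 0.0),
    "positive slope, low level":  (5.0, 0.5),
    "positive slope, high level": (12.0, 0.5),
}

def simulate(n_per_group, sd_lambda=1.0, sd_eps=0.5, rho=0.2):
    """Simulate trajectories with two noise components (assumed values)."""
    m = len(times)
    # Exchangeable covariance for the measurement error across times.
    Sigma_eps = sd_eps**2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
    Y, labels = [], []
    for g, (level, slope) in enumerate(groups.values()):
        mu = level + slope * (times - times.mean())       # group mean curve
        lam = rng.normal(0.0, sd_lambda, size=(n_per_group, 1))  # per-person shift
        eps = rng.multivariate_normal(np.zeros(m), Sigma_eps, size=n_per_group)
        Y.append(mu + lam + eps)
        labels += [g] * n_per_group
    return np.vstack(Y), np.array(labels)

Y, labels = simulate(20)
# Vertical shifting, assumed here to mean subtracting each individual's mean.
Y_shifted = Y - Y.mean(axis=1, keepdims=True)
```

After shifting, every row sums to zero, so groups that differ only in level (for example the two negative-slope groups) have the same mean shape.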
to BMI
baseline predictors for shape groups)
risk factors and group membership
time ignoring the level