Arnošt Komárek
Dept. of Probability and Mathematical Statistics
Regression modelling of misclassified correlated interval-censored data
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics
Longitudinal dental study, Flanders (Belgium), 1996 – 2001. 2 315 boys, 2 153 girls followed from 7 until 12 years old (primary
Annual dental examinations. Sixteen trained dental examiners.
Each child was, in general, examined by a different examiner each year.
Data on oral hygiene and dietary habits.
Gender (boys vs. girls). Presence of sealants. Frequency of brushing (daily / not daily). Geographical location.
CE is a progressive disease
CE status checked only at discrete occasions
Teeth in one mouth share common environment, genetical
T(i,j): event (CE) time of tooth j on subject i,
Y(i,j)(t): 0/1 CE status of tooth (i, j) at time t. x(i,j): potential risk factors, covariates to explain T(i,j)
0 = v(i,0) < v(i,1) < v(i,2) < · · · < v(i,Ki) < v(i,Ki+1) = ∞:
Y(i,j) =
Diagnosis of CE is not easy and is somewhat subjective.
Longitudinal follow-up. Event status checked at pre-specified time points.
Assumption here: visit times independent of the event time.
Occurrence of event is determined by a diagnostic test (with possi-
Frequent for many non-death events. Nevertheless, data are mostly analyzed as if both sensitivity and speci-
Follow-up is not scheduled to stop after the first positive result.
Frequent in longitudinal studies where the event is not the primary study
1. No external information on the sensitivity/specificity values.
2. Can we estimate sensitivity/specificity of the event
p(T_i): survival model for (correlated) event times
Independence among subjects (children): the likelihood is a product of N subject-specific contributions.
Event classification Y(i,j,k) for given unit (tooth j) at given time (k) is
p(Y_i | T_i) = \prod_{j=1}^{J} \prod_{k=1}^{K_i} p(Y(i,j,k) | T(i,j))
In the rest: form of p(Y(i,j,k) | T(i,j)) for given j (tooth) and k (visit time).
α: examiner’s sensitivity
η: examiner’s specificity
Different examiners have different ability to detect event (caries)
Caries is not necessarily equally easy to detect on all teeth
ξ(i,k): index (id) of the examiner who scored (all) teeth of the ith child during his/her
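α = (α_1^⊤, …, α_Q^⊤)^⊤: sensitivities α(q,j) of examiner q on tooth j
η = (η_1^⊤, …, η_Q^⊤)^⊤: specificities η(q,j) of examiner q on tooth j
p(Y(i,j,k) | T(i,j)) =
  α(ξ(i,k),j)^{Y(i,j,k)} (1 − α(ξ(i,k),j))^{1−Y(i,j,k)}, if T(i,j) ≤ v(i,k),
  (1 − η(ξ(i,k),j))^{Y(i,j,k)} η(ξ(i,k),j)^{1−Y(i,j,k)}, if T(i,j) > v(i,k).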
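A minimal sketch of the classification model for one tooth, assuming that a tooth counts as truly diseased at a visit once its event time T has passed (T ≤ visit time). The function names and the per-visit lists `sens` and `spec` (one value per visit, taken from the examiner on duty at that visit) are illustrative choices, not part of the talk.

```python
import math


def classification_prob(y, event_time, visit_time, sens, spec):
    """P(Y = y | T) for one tooth at one visit (y is 0 or 1)."""
    if event_time <= visit_time:
        # Caries truly present: a positive score has probability sens.
        return sens if y == 1 else 1.0 - sens
    # Tooth truly sound: a negative score has probability spec.
    return 1.0 - spec if y == 1 else spec


def log_lik_tooth(ys, event_time, visit_times, sens, spec):
    """Log-likelihood of all scores of one tooth, given its event time.

    sens[k] and spec[k] belong to the examiner who scored visit k.
    """
    return sum(
        math.log(classification_prob(y, event_time, v, a, e))
        for y, v, a, e in zip(ys, visit_times, sens, spec)
    )
```

For example, `log_lik_tooth([0, 1], 8.5, [8.0, 9.0], [0.8, 0.8], [0.95, 0.95])` adds the log-specificity for the first visit (tooth still sound) and the log-sensitivity for the second (caries present and detected).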
age of examiner, gender of examiner, tooth position in the mouth,
frailty Cox model, random intercept accelerated failure time (AFT) model,
log T(i,j) = x(i,j)^⊤β + b_i + ε(i,j)
β: regression coefficients.
ε(1,1), …, ε(N,J): i.i.d. with zero-mean density g_ε(·).
b_1, …, b_N: i.i.d. with density g_b(·).
b_i (common for all j) induces dependence between T(i,1), …, T(i,J),
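The random-intercept AFT structure log T(i,j) = x(i,j)^⊤β + b_i + ε(i,j) can be illustrated by simulating the event times of one subject. This is only a sketch: both g_b and g_ε are taken to be normal here for simplicity (an assumption; the talk allows a flexible g_b), and `simulate_subject` is a hypothetical helper name.

```python
import math
import random


def simulate_subject(x, beta, sigma_b, sigma_eps, rng):
    """Simulate correlated event times T(i,1), ..., T(i,J) for one subject.

    x is a list of J covariate vectors, one per unit (tooth).
    Assumption for illustration: b_i ~ N(0, sigma_b^2), eps ~ N(0, sigma_eps^2).
    """
    b = rng.gauss(0.0, sigma_b)  # shared by all units of the subject
    times = []
    for x_j in x:
        linear = sum(xv * bv for xv, bv in zip(x_j, beta))
        eps = rng.gauss(0.0, sigma_eps)  # unit-specific error
        times.append(math.exp(linear + b + eps))
    return b, times
```

Because the same `b` enters every unit of a subject, event times within one mouth are positively correlated, while different subjects remain independent.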
ε(i,j) ∼ N(0, σ_ε²),
b_i ∼ µ + τ \sum_{l=−M}^{M} w_l N(κ_l, ζ²)
[Figure: Gaussian mixture components on the knots κ_{−9}, κ_{−6}, …, κ_9]
Model parameters: σ_ε², w =
Penalized Gaussian mixture:
Regularization using penalized differences of (transformed) weights
log T(i,j) = x(i,j)^⊤β + b_i + ε(i,j)
convolution of a full parametric Normal and a semi-parametric
unknown parameters: α =
random intercept AFT model with a PGM distribution of the random intercept; unknown parameters: regression coefficients β, intercept µ, variances τ², σ_ε², mixture weights w.
L_i = \int \prod_{j=1}^{J} \Big\{ \int \prod_{k=1}^{K_i} p(Y(i,j,k) | T(i,j)) \, p(T(i,j) | b_i) \, dT(i,j) \Big\} p(b_i) \, db_i
p(T(i,j) | b_i): event-time density following from the AFT model.
Unknown parameters: β, σ_ε².
p(bi): normal mixture following from the PGM model.
Unknown parameters: w, α, τ².
Possible. All integrals in the likelihood disappear in calculations if Bayesian
AFT regression parameters:
(Inverted) variance of the AFT error terms:
Location of the random intercepts:
(Inverted) squared scale of the random intercepts:
[Figure: Gaussian mixture components on the knots κ_{−9}, κ_{−6}, …, κ_9]
Remember: b_i ∼ µ + τ \sum_{l=−M}^{M} w_l N(κ_l, ζ²), where M is relatively
The weights w must sum to one. One works primarily with the transformed weights a: w_l = exp(a_l) / \sum_{m=−M}^{M} exp(a_m).
Regularization prior for the (transformed) weights.
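The transformation from unconstrained coefficients a to mixture weights w, and the resulting density of b_i, can be sketched as follows. The knot grid, the basis standard deviation `zeta` and the function names are illustrative assumptions; only the form b_i ∼ µ + τ Σ w_l N(κ_l, ζ²) with w_l = exp(a_l) / Σ exp(a_m) is taken from the slides.

```python
import math


def pgm_weights(a):
    """Map transformed weights a to mixture weights w that sum to one."""
    top = max(a)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in a]
    total = sum(expd)
    return [v / total for v in expd]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pgm_density(b, a, knots, zeta, mu=0.0, tau=1.0):
    """Density of b = mu + tau * Z, where Z ~ sum_l w_l N(knots[l], zeta^2)."""
    z = (b - mu) / tau
    w = pgm_weights(a)
    return sum(wl * normal_pdf(z, k, zeta) for wl, k in zip(w, knots)) / tau
```

Working with a instead of w removes the sum-to-one and positivity constraints, so the coefficients can be updated freely within an MCMC scheme.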
p(a | λ) ∝ exp{ −(λ/2) \sum_{l=−M+o}^{M} (∆^o a_l)² }
∆^o: difference operator of order o
λ: smoothing hyperparameter
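A difference penalty of this kind can be sketched in a few lines. The form used here, −(λ/2) times the sum of squared o-th order differences of the transformed weights, is a common choice for penalized mixtures and is assumed for illustration; the default order 3 and the function names are likewise assumptions.

```python
def differences(a, order):
    """Apply the difference operator `order` times to the sequence a."""
    d = list(a)
    for _ in range(order):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return d


def log_penalty(a, lam, order=3):
    """Unnormalized log-prior: -(lam / 2) * sum of squared order-th differences."""
    return -0.5 * lam * sum(v * v for v in differences(a, order))
```

A larger λ pulls the transformed weights towards a polynomial of degree o − 1 in the knot index, which smooths the fitted density; a sequence that is exactly such a polynomial receives zero penalty.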
For each q (examiners) and j (unit – tooth)
Identification constraint: α(q,j) + η(q,j) > 1. Prior:
Parameters of the event-time model (β, σ_ε², PGM parameters)
Nothing new compared to the situation without misclassification, see
Augmented event times T(i,j):
Sampling from a mixture of truncated log-normals. Truncation: intervals between the visit times. Mixture weights:
Sensitivities (α’s) and specificities (η’s):
Sampling from truncated Beta distributions.
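The last two sampling steps can be sketched as follows, under stated assumptions. For the event time, the weight of each interval between visits is taken as its log-normal probability times the probability of the observed scores given that the event falls in that interval; this is a derivation from the model, not a formula quoted from the slides. For the sensitivity, a Beta(1, 1) prior truncated by the constraint α + η > 1 is assumed, and simple rejection sampling stands in for whatever sampler was actually used. One sensitivity and one specificity are used for all visits to keep the sketch short; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def interval_weights(ys, visits, mean, sd, sens, spec):
    """Posterior probabilities that T lies in each interval between visits.

    Interval k (0-based) is (v_{k-1}, v_k], with v_{-1} = 0 and v_K = infinity.
    """
    nd = NormalDist(mean, sd)
    cdf = [0.0] + [nd.cdf(math.log(v)) for v in visits] + [1.0]
    weights = []
    for k in range(len(visits) + 1):
        w = cdf[k + 1] - cdf[k]  # P(T in interval k) under the AFT model
        for m, y in enumerate(ys):
            if m < k:  # visit before the event: tooth truly sound
                w *= (1.0 - spec) if y == 1 else spec
            else:      # visit at or after the event: caries present
                w *= sens if y == 1 else (1.0 - sens)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights], cdf


def sample_event_time(ys, visits, mean, sd, sens, spec, rng):
    """Draw T from the mixture of truncated log-normals."""
    probs, cdf = interval_weights(ys, visits, mean, sd, sens, spec)
    k = rng.choices(range(len(probs)), weights=probs)[0]
    p = cdf[k] + rng.random() * (cdf[k + 1] - cdf[k])  # inverse-CDF draw
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return math.exp(NormalDist(mean, sd).inv_cdf(p))


def sample_truncated_beta(a, b, lower, rng, max_tries=100_000):
    """Rejection sampler for Beta(a, b) restricted to (lower, 1]."""
    for _ in range(max_tries):
        draw = rng.betavariate(a, b)
        if draw > lower:
            return draw
    raise RuntimeError("truncation region has negligible probability")


def update_sensitivity(n_true_pos, n_false_neg, spec, rng, prior_a=1.0, prior_b=1.0):
    """Draw a sensitivity given the augmented event times, subject to sens + spec > 1."""
    return sample_truncated_beta(prior_a + n_true_pos, prior_b + n_false_neg, 1.0 - spec, rng)
```

With perfect classification (sensitivity and specificity equal to one) the interval weights collapse onto the single interval between the last negative and the first positive score, which recovers ordinary interval censoring.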
var(ε(i,j)) = σb
Simulation results by sample size N (500, 1000, 2000) and ratio σb/σε (0.5, 1, 2, 5):
[Figure: α11; gb: bimodal two-component normal mixture]
[Figure: α11; gb: Gumbel]
[Figure: α44; gb: bimodal two-component normal mixture]
[Figure: α44; gb: Gumbel]
[Figures: β1 (two plots); gb: bimodal two-component normal mixture]
[Figures: β1 (two plots); gb: Gumbel]
[Figures: survival function S(t) against time, panels N = 500, 1000, 2000; σb/σε = 5; gb: bimodal two-component normal mixture and Gumbel (two slides)]
[Figures: survival function S(t) against time, panels N = 500, 1000, 2000; σb/σε = 0.5; gb: bimodal two-component normal mixture and Gumbel (two slides)]
Simulation results by sample size N (500, 1000, 2000) and ratio σb/σε (0.5, 1, 2, 5):
[Figure: α11; gb: bimodal two-component normal mixture]
[Figure: α11; gb: Gumbel]
[Figure: β1; gb: bimodal two-component normal mixture]
[Figure: β1; gb: Gumbel]
[Figure: survival function S(t) against time, panels N = 500, 1000, 2000; σb/σε = 5; gb: bimodal two-component normal mixture and Gumbel]
[Figure: survival function S(t) against time, panels N = 500, 1000, 2000; σb/σε = 0.5; gb: bimodal two-component normal mixture and Gumbel]
May differ in specification of the event-time and/or the misclassification
\prod_{i=1}^{N} \prod_{j=1}^{J} p_M(Y(i,j) | Y[−(i,j)])
Y[−(i,j)]: data without observation of unit (tooth) j of subject (child) i; pM(· | ·): posterior predictive distribution.
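Leave-one-out predictive quantities of this kind are often approximated from MCMC output without refitting the model. The sketch below uses the harmonic-mean (conditional predictive ordinate) estimator and sums the logs over units; whether the talk used this particular estimator is an assumption, and the function names are illustrative.

```python
import math


def log_cpo(log_liks):
    """Log of the harmonic-mean estimate of p(y | rest of the data).

    log_liks holds log p(y | theta_b) for the MCMC draws b = 1, ..., B.
    """
    neg = [-v for v in log_liks]
    top = max(neg)  # log-sum-exp for numerical stability
    lse = top + math.log(sum(math.exp(v - top) for v in neg))
    return math.log(len(log_liks)) - lse


def log_pseudo_marginal_lik(log_lik_rows):
    """Sum of log CPO values over all units; one row of draws per unit."""
    return sum(log_cpo(row) for row in log_lik_rows)
```

Two competing models can then be compared through the difference of their `log_pseudo_marginal_lik` values, with the larger value indicating better predictive performance.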
Evaluation uses the probabilities of the K_i + 1 intervals (v(i,k−1), v(i,k)] under log T(i,j) | b_i ∼ N(x(i,j)^⊤β + b_i, σ_ε²).
log T(i,j) = b_i + x(i,j)^⊤β + ε(i,j)
T(i,j): age at caries onset on tooth j (∈ {1, 2, 3, 4}) of child i.
x(i,j): gender, presence of sealants, frequency of brushing, x and y
16 examiners.
Model M1: sensitivities/specificities both examiner- and tooth-specific
Model M2: sensitivities/specificities only examiner-specific
[Figure: sensitivity by examiner, examiners 1 to 16 (posterior means and 95% HPD credible intervals)]
[Figure: specificity by examiner, examiners 1 to 16 (posterior means and 95% HPD credible intervals)]
[Figure: density g(b) of the random intercept (standardized, pointwise posterior means)]
[Figure: survival functions S(t) by age for the combinations of gender (boy/girl), sealants (seal/no seal) and brushing (more/less frequent) (pointwise posterior means)]
[Figure: hazard functions h(t) by age for the same covariate combinations (pointwise posterior means)]
Not only human examiners but also laboratory procedures usually have sen-
the event-time process (survival functions, regression parameters, . . . ); the misclassification process (sensitivities, specificities).
Only small parts of the MCMC scheme would have to be modified.
Logit model.
Useful if learning-by-doing can be expected in event classification. Likely not possible with our “joint” approach due to identifiability prob-
External (validation) data needed to estimate parameters of the mis-
GARCÍA-ZATTERA, JARA, KOMÁREK (2015+). A flexible AFT model for mis-
KOMÁREK, LESAFFRE, HILTON (2005). Accelerated failure time model for arbitrarily censored data with smoothed error distribution. Journal of Computational and Graphical Statistics, 14(3), 726–745.
KOMÁREK, LESAFFRE, LEGRAND (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26(30), 5457–5472.
KOMÁREK, LESAFFRE (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103(482), 523–533.
GARCÍA-ZATTERA, MUTSVARI, JARA, DECLERCK, LESAFFRE (2010). Correcting for misclassification for a monotone disease process with an application in dental research. Statistics in Medicine, 29(30), 3103–3117.
GARCÍA-ZATTERA, JARA, LESAFFRE, MARSHALL (2012). Modeling of multivariate monotone disease processes in the presence of misclassification. Journal of the American Statistical Association, 107(499), 976–989.