Devils, Details, and Data: Measurement Models and Analysis

SLIDE 1

Devils, Details, and Data: Measurement Models and Analysis Strategies for Novel Technology-Based Clinical Outcome Assessments

ISCTM 2018 Autumn Meeting

Robert M Bilder, UCLA

Michael E. Tennenbaum Family Professor
Psychiatry & Biobehavioral Sciences and Psychology
David Geffen School of Medicine
Semel Institute for Neuroscience and Human Behavior

SLIDE 2

SLIDE 3

New clinical outcomes assessment methods require new strategies

Changes compared to old-fashioned RCTs
Traditional RCT - primary endpoint was usually:
A test summary score…
Reflecting performance across a fixed bunch of items…
From a single test instrument…
That was administered by a trained human…
At one point in time…
With results recorded on a clinical record form and…
Then transcribed into a database for analysis…

SLIDE 4

New behavior sampling methods require new strategies

Changes compared to old-fashioned RCTs with primary endpoint include:

Dense temporal sampling
Multivariate sampling
Passive sampling
Machine sampling
More direct sampling of biological variables

SLIDE 5

Temporal sampling density

Increased density of observations (from mobile, wearable, or IoT devices)
Sampling may occur more than once per second
Consider: 8 weeks x 7 days x 24 hours x 60 minutes x 60 seconds = 4.84M measures
Analyze trajectories rather than simple changes from baseline to endpoint
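
The sampling arithmetic and the trajectory idea on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the analysis used in the talk: the function names, the simulated daily scores, and the choice of an ordinary least-squares slope as the trajectory summary are all assumptions made for the example.

```python
# Sketch under stated assumptions: helper names and example data are
# hypothetical; only the 8-week, once-per-second schedule is from the slide.


def samples_at_one_per_second(weeks: int) -> int:
    """Number of measures collected at 1 sample/second over `weeks` weeks."""
    return weeks * 7 * 24 * 60 * 60


def ols_slope(times: list[float], values: list[float]) -> float:
    """Ordinary least-squares slope of `values` regressed on `times`."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


total = samples_at_one_per_second(8)
print(f"{total:,} measures (~{total / 1e6:.2f}M)")

# Hypothetical daily summary scores over 56 days: a steady decline of
# 0.25 points/day, with an atypical reading on the final day.
days = [float(d) for d in range(56)]
scores = [50.0 - 0.25 * d for d in days]
scores[-1] += 5.0  # the endpoint happens to fall on an unusual day

change_score = scores[-1] - scores[0]                    # uses 2 of 56 values
trajectory_change = ols_slope(days, scores) * days[-1]   # uses all 56 values
print(f"baseline-to-endpoint change: {change_score:.2f}")
print(f"change implied by fitted slope: {trajectory_change:.2f}")
```

In this made-up series the endpoint-only change is pulled toward the atypical final reading, while the slope draws on every observation. Real trajectory analyses would more likely use mixed-effects or growth models than a single OLS fit.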

SLIDE 6

Pros and Cons of Laboratory Assessments

SLIDE 7

SLIDE 8

SLIDE 9

[Figure: reliability (y-axis, 0.2 to 1) by amount of sampling (1-occasion, 1-day (x5), 2-day (x10), 3-day (x15), 14-day (x70)); panels: SS, Dot, n-Back]

Are the advantages of repeated measures over time any greater than you would expect simply from having more items?
Reliability predicted from estimated c-bar is correlated with observed reliability over repeated measures (across 3 tasks x 4 time points: r = .96)

Reliability (alpha) is a function of average inter-item covariance (c-bar), average item variance, and N of items.
Panel labels: 12 items/test, 12 items/test, 2 items/test
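
The relationship stated on this slide can be written out with the standard formula for Cronbach's alpha, alpha = k * c-bar / (v-bar + (k - 1) * c-bar), where k is the number of items. The sketch below applies it to the question of whether repeated occasions help only by adding items. The numeric values of c-bar and v-bar are hypothetical, and treating items from different occasions as exchangeable (same c-bar) is exactly the "just more items" assumption being tested, not a claim about the actual data.

```python
def alpha_from_cbar(n_items: int, mean_cov: float, mean_var: float) -> float:
    """Cronbach's alpha from the average inter-item covariance (c-bar),
    the average item variance (v-bar), and the number of items."""
    return (n_items * mean_cov) / (mean_var + (n_items - 1) * mean_cov)


# Hypothetical inputs, chosen only for illustration.
C_BAR = 0.05             # average inter-item covariance
V_BAR = 1.0              # average item variance
ITEMS_PER_OCCASION = 12

schedules = [
    ("1-occasion", 1),
    ("1-day (x5)", 5),
    ("2-day (x10)", 10),
    ("3-day (x15)", 15),
    ("14-day (x70)", 70),
]
for label, occasions in schedules:
    k = ITEMS_PER_OCCASION * occasions
    predicted = alpha_from_cbar(k, C_BAR, V_BAR)
    print(f"{label:>13}: k = {k:4d}, predicted alpha = {predicted:.3f}")
```

If observed reliability across occasions tracks these predictions closely, repeated measurement is behaving like a longer test; systematic departures would point to occasion-specific variance.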

SLIDE 10

Multivariate sampling

Single mobile device yields multiple outputs in different modalities
GPS
Motion
Voice
Video: light/dark, facial affect, oxygenation
EMA
GSR
HR, HRV
Or data may be integrated across multiple devices

Smart watch or actigraphy
Skin patch sensor
Sleep respiration monitor
EEG, EKG, etc…
Methods to aggregate all these data types into composite COAs under development…

SLIDE 11

Passive sampling = direct, more objective

Less censoring and bias of data related to:
Compliance
Effort
Intent
Examinee less prepared for assessment
Measures less likely to be affected by expectancy biases
Presumably better at overcoming placebo effects

Overall, correlations were low-to-moderate with a mean of 0.37 (SD = 0.25) and a range of -0.71 to 0.98.
Source: A comparison of direct versus self-report measures for assessing physical activity in adults: a systematic review; Prince et al 2008

SLIDE 12

Machine sampling

Increased precision
Probably decreased flexibility
All flexibility must be programmed in advance (there is no “on the fly” flexibility that occurs with humans, for better or worse)
Interaction monitoring still early (e.g., interactive video monitoring of engagement during assessment)
Unclear impacts on human responders
Tech-naïve older adults vs early adopters
Consider “rod & frame” studies…

SLIDE 13

BUT – we still face the same reliability and validity concerns

Reliability
Internal consistency, construct validity
Test-retest reliability: stability, bias, effects of repeated measurement

Inter-rater, Inter-site, Inter-national reliability
At least as good as conventional measures?
Criterion validity
With respect to existing measures
With respect to clinical outcomes
At least as good as conventional measures?
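
As a small illustration of the test-retest bullet above, the sketch below separates two things that a single retest correlation can hide: stability (rank-order agreement between occasions) and bias (a systematic shift, such as a practice effect). The function names and the scores are hypothetical, and a full analysis would typically add an intraclass correlation and confidence intervals.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def retest_summary(time1: list[float], time2: list[float]) -> dict[str, float]:
    """Stability (correlation) and bias (mean retest gain) for paired scores."""
    gains = [b - a for a, b in zip(time1, time2)]
    return {
        "stability_r": pearson_r(time1, time2),
        "mean_change": sum(gains) / len(gains),
    }


# Hypothetical scores for six participants tested twice.
first = [42.0, 55.0, 61.0, 48.0, 70.0, 53.0]
second = [46.0, 57.0, 66.0, 50.0, 73.0, 59.0]
summary = retest_summary(first, second)
print(f"stability r = {summary['stability_r']:.3f}")
print(f"mean retest gain = {summary['mean_change']:.2f}")
```

A high correlation with a non-zero mean gain indicates scores that are stable in rank order but shifted by repeated measurement, which matters more as sampling becomes denser.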

SLIDE 14

Using IRT for co-calibration of tests and longitudinal assessment

Test linking
Quantify shared latent trait that both instruments measure
Typically requires at least some linking or “anchor” items
Examine differential item functioning (DIF) for anchor items
Summaries include:
Test characteristic curves: plot most likely score for each level of ability
Test information curves: plot measurement precision at each level of ability
Assumption that test characteristics are constant over time is probably wrong
Regression and change score approaches all assume linearity across scale – not true for virtually any test

SLIDE 15

SLIDE 16

From Crane et al 2008

SLIDE 17

SLIDE 18

Methods to Assure Equivalency

General measurement invariance issues, using multiple group confirmatory factor analysis (CFA)
Equal form: The number of factors and the pattern of factor-indicator relationships are identical across groups (aka configural equivalence).
Equal loadings: Factor loadings are equal across groups (aka metric equivalence).
Equal intercepts: When observed scores are regressed on each factor, the intercepts are equal across groups (aka scalar equivalence).
Equal residual variances: The residual variances of the observed scores not accounted for by the factors are equal across groups (aka uniqueness equivalence).
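
The four levels listed above can be written as successive equality constraints on a multiple-group CFA model. The notation below (tau for intercepts, Lambda for loadings, Theta for residual covariances, G groups) is conventional but is an assumption here, since the slides do not give symbols. Each level is usually tested with the previous constraints retained.

```latex
% Multiple-group CFA for groups g = 1, ..., G (notation assumed, not from the slides)
\begin{align*}
x_{ig} &= \tau_g + \Lambda_g \eta_{ig} + \varepsilon_{ig},
  \qquad \operatorname{Cov}(\varepsilon_{ig}) = \Theta_g \\
\text{Equal form (configural):}\quad
  & \text{same pattern of fixed and free elements in every } \Lambda_g \\
\text{Equal loadings (metric):}\quad
  & \Lambda_1 = \Lambda_2 = \dots = \Lambda_G \\
\text{Equal intercepts (scalar):}\quad
  & \tau_1 = \tau_2 = \dots = \tau_G \\
\text{Equal residual variances (uniqueness):}\quad
  & \Theta_1 = \Theta_2 = \dots = \Theta_G
\end{align*}
```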

SLIDE 19

Measurement Invariance Methods for Introducing New Methods into Clinical Trials

Assessment of measurement invariance typically requires:
Shared “linking” items across instruments that serve as “anchors” against which other aspects of covariance can be judged
Absent linking items, comparability can be established by studying the same people with both methods. This is the conventional criterion validity approach or assessment of “concurrent validity.”
Other strategies are possible for integrative data analysis, sometimes even without linking items and without having a shared sample:

Variable network harmonization
Covariance structure harmonization
Factor alignment

SLIDE 20

Classical psychometric and network approaches to measurement invariance

Psychometric model (latent variable: Major depression)
Symptoms: Dysphoria, Anhedonia, ↓ Appetite, ↑ Appetite, Insomnia
Assumes latent variable
Constrains correlations

Network model
Symptoms: Dysphoria, Anhedonia, ↓ Appetite, ↑ Appetite, Insomnia
No constraints on correlations
Saturated model

If networks harmonize…

… so will factor model
… so will composites

SLIDE 21

Method – Harmonize matching symptoms (bottom-up or backward-search method)

1. No initial constraint on correlations (“fully saturated” model)
2. Add constraints until fit is maximized

CFI: scale from worst (0) to best (1) possible fit; > .95
RMSEA: misfit per degree of freedom; < .05
SRMR: size of model residuals; < .05
Backwards search algorithm, minimizing loss function:
$\mathrm{LOSS} = \max\left(\mathrm{RMSEA},\ \mathrm{SRMR},\ 2 \cdot (1 - \mathrm{CFI})\right)$

3. Identify and diagnose non-harmonized symptoms

Content/wording differences
Language differences
Measurement scale/response option differences
Population differences in symptom expression
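
The loss used in the backward search can be sketched as a one-line function, reading the slide's formula as the maximum of RMSEA, SRMR, and 2 x (1 - CFI). The function name is hypothetical. The two sets of fit indices are the ones reported on the two "Depression – Matching symptoms" slides; the search procedure itself is not reproduced here because the slides do not describe it in enough detail.

```python
def harmonization_loss(cfi: float, rmsea: float, srmr: float) -> float:
    """Worst of three misfit measures; smaller is better.

    CFI is a goodness-of-fit index, so it enters as 2 * (1 - CFI) to put it
    on a comparable badness-of-fit scale.
    """
    return max(rmsea, srmr, 2.0 * (1.0 - cfi))


# Fit indices as reported on the two "Matching symptoms" slides.
reported_fits = {
    "first model": {"cfi": 0.992, "rmsea": 0.061, "srmr": 0.089},
    "second model": {"cfi": 0.999, "rmsea": 0.032, "srmr": 0.038},
}
for label, fit in reported_fits.items():
    print(f"{label}: loss = {harmonization_loss(**fit):.3f}")
```

Taking the maximum means a model is only as good as its worst index, so the search cannot trade a poor SRMR for an excellent CFI.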

SLIDE 22

Depression – Matching symptoms

Model fit: CFI=.992, RMSEA=.061, SRMR=.089

SCID N=1290 DI-PAD N=3344

Symptom Name                     SCID   DI-PAD
Dysphoria (Depression)           A52    OPCRIT37
Loss of pleasure                 A53    OPCRIT39
Weight loss/decreased appetite   A55    OPCRIT489
Weight gain/increased appetite   A56    OPCRIT501
Insomnia                         A58    OPCRIT4456
Excessive sleep                  A59    OPCRIT47
Slowed activity                  A62    OPCRIT24
Loss of energy or fatigue        A63    OPCRIT25
Inappropriate guilt              A66    OPCRIT42
Impaired Concentration           A68    OPCRIT41
Suicidal ideation                A72    OPCRIT43

SLIDE 23

Depression – Matching symptoms

Model fit: CFI=.999, RMSEA=.032, SRMR=.038

SCID N=1290 DI-PAD N=3344

Symptom Name                     SCID   DI-PAD       MADr
Dysphoria (Depression)           A52    OPCRIT37
Loss of pleasure                 A53    OPCRIT39     .256
Weight loss/decreased appetite   A55    OPCRIT489
Weight gain/increased appetite   A56    OPCRIT501
Insomnia                         A58    OPCRIT4456   .114
Excessive sleep                  A59    OPCRIT47     .161
Slowed activity                  A62    OPCRIT24
Loss of energy or fatigue        A63    OPCRIT25
Inappropriate guilt              A66    OPCRIT42
Impaired Concentration           A68    OPCRIT41
Suicidal ideation                A72    OPCRIT43

SLIDE 24

Depression – Non-matching symptoms

SCID N=1290 DI-PAD N=3344

Symptom Name                  SCID   DI-PAD
Psychomotor agitation         A61    -
Feelings of worthlessness     A65    -
Indecisiveness                A69    -
Recurrent thoughts of death   A71    -
Specific plan                 A73    -
Suicide attempts              A74    -
Altered libido                -      OPCRIT40
Diurnal variation             -      OPCRIT38

Legend: Residual variance; Residual correlation; Low residual variance

SLIDE 25

IRT-Based Harmonization

Panels: DI-PAD (Bipolar); SCID (Dutch bipolar)

SLIDE 26

SLIDE 27

SLIDE 28

With these modifications, the final alignment complexity function is given by

$$F^{*}_{GRM} = \sum_{g_1<g_2} \sum_{p \in I_1,\, p \in I_2} w_{g_1,g_2}\, f\left(\lambda_{pg_1,1} - \lambda_{pg_2,1}\right) + \sum_{g_1<g_2} \sum_{p \in I_1,\, p \in I_2} w_{g_1,g_2}\, f\left(\nu_{p0g_1,1} - \nu_{p0g_2,1}\right)$$

As described above, measurement non-invariance is only minimized for items which appear in each pair of instruments, and only the first measurement intercept is considered.

Next, alignment proceeds as in the continuous case by minimizing the graded response model (GRM) complexity function:

$$F_{GRM} = \sum_{p} \sum_{g_1<g_2} w_{g_1,g_2}\, f\left(\lambda_{pg_1,1} - \lambda_{pg_2,1}\right) + \sum_{p} \sum_{g_1<g_2} \sum_{q} w_{g_1,g_2}\, f\left(\nu_{pqg_1,1} - \nu_{pqg_2,1}\right)$$

Note the extra summation in the second term, which accounts for multiple measurement intercepts in the graded response model. After the model parameters are aligned in the factor analytic metric, the aligned IRT model parameters are given by the following transformations:

$$a_{pg_1,1} = \frac{1.7\, \lambda_{pg_1,1}\, \sqrt{\psi_{pg}}}{\sqrt{1 - \lambda_{pg_1,1}^{2}\, \psi_{pg}}}$$

$$d_{pqg_1,1} = d_{pqg_1,1} - a_{pg_1,1}\, \alpha_g$$
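
The complexity function on this slide sums, over every pair of groups, weighted losses of the differences in aligned loadings and aligned intercepts. The sketch below computes that sum for hypothetical parameter values. Two pieces are assumptions borrowed from the usual alignment formulation because the slide does not define them: the component loss f(x) = sqrt(sqrt(x^2 + eps)) and the pair weight w = sqrt(N1 * N2). The default eps, the function names, and all loadings and intercepts are illustrative; only the two sample sizes come from the deck.

```python
from itertools import combinations
from math import sqrt


def component_loss(x: float, eps: float = 0.01) -> float:
    """Assumed component loss f(x) = sqrt(sqrt(x^2 + eps))."""
    return sqrt(sqrt(x * x + eps))


def alignment_complexity(loadings, intercepts, group_sizes,
                         first_intercept_only=True, eps=0.01):
    """Sum weighted losses of loading and intercept differences over group pairs.

    loadings[g][item]   -> aligned loading of `item` in group g
    intercepts[g][item] -> list of aligned intercepts (one per threshold)
    Only items present in both groups of a pair contribute.
    """
    total = 0.0
    for g1, g2 in combinations(sorted(loadings), 2):
        weight = sqrt(group_sizes[g1] * group_sizes[g2])  # assumed weight
        shared = sorted(set(loadings[g1]) & set(loadings[g2]))
        for item in shared:
            total += weight * component_loss(
                loadings[g1][item] - loadings[g2][item], eps)
            pairs = list(zip(intercepts[g1][item], intercepts[g2][item]))
            if first_intercept_only:
                pairs = pairs[:1]  # keep only the first measurement intercept
            for nu1, nu2 in pairs:
                total += weight * component_loss(nu1 - nu2, eps)
    return total


# Sample sizes from the deck; every parameter value below is hypothetical.
sizes = {"SCID": 1290, "DI-PAD": 3344}
loadings = {
    "SCID": {"dysphoria": 0.80, "insomnia": 0.55, "agitation": 0.40},
    "DI-PAD": {"dysphoria": 0.78, "insomnia": 0.35, "libido": 0.30},
}
intercepts = {
    "SCID": {"dysphoria": [-0.50, 0.60], "insomnia": [0.10, 0.90],
             "agitation": [0.40, 1.20]},
    "DI-PAD": {"dysphoria": [-0.45, 0.70], "insomnia": [0.50, 1.40],
               "libido": [0.20, 1.00]},
}
print(alignment_complexity(loadings, intercepts, sizes))
print(alignment_complexity(loadings, intercepts, sizes,
                           first_intercept_only=False))
```

With `first_intercept_only=True` the function mirrors the modified version described on the slide (shared items, first intercept); setting it to `False` adds the extra summation over all intercepts. An actual alignment would minimize this quantity over the group factor means and variances, which this sketch does not attempt.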

SLIDE 29

~20,000 cases with schizophrenia, schizoaffective disorder, bipolar disorder, major depressive disorder and autism spectrum disorder, relatives and controls from >10 cohorts

SLIDE 30

SLIDE 31

Many thanks!

rbilder@mednet.ucla.edu

http://www.semel.ucla.edu/creativity http://healthy.ucla.edu

Consortium for Neuropsychiatric Phenomics (52 investigators); Investigators in current RDoC projects, and Whole Genome Sequencing in Psychiatric Disorders (WGSPD; Freimer et al.). Special thanks to Steve Reise, Max Mansolf, Annabel Vreeker, Catherine Sugar, Gerhard Helleman, and Ariana Anderson. Supported by NIH Grants: (CNP) UL1-DE019580, RL1MH083268, RL1MH083269, RL1DA024853, RL1MH083270, RL1LM009833, PL1MH083271, and PL1NS062410; (Cognitive Atlas) RO1NS061771; (Multilevel WM/RDoC) R01MH101478; (Modeling/RDoC) R03MH106922; (WGSPD) U01 MH105578.