Devils, Details, and Data: Measurement Models and Analysis



slide-1
SLIDE 1

Devils, Details, and Data: Measurement Models and Analysis Strategies for Novel Technology-Based Clinical Outcome Assessments

ISCTM 2018 Autumn Meeting

Robert M Bilder, UCLA

Michael E. Tennenbaum Family Professor
Psychiatry & Biobehavioral Sciences and Psychology
David Geffen School of Medicine
Semel Institute for Neuroscience and Human Behavior

slide-2
SLIDE 2
slide-3
SLIDE 3

New clinical outcomes assessment methods require new strategies

  • Changes compared to old-fashioned RCTs
  • Traditional RCT - primary endpoint was usually:
  • A test summary score…
  • Reflecting performance across a fixed bunch of items…
  • From a single test instrument…
  • That was administered by a trained human…
  • At one point in time…
  • With results recorded on a clinical record form and…
  • Then transcribed into a database for analysis…
slide-4
SLIDE 4

New behavior sampling methods require new strategies

  • Changes compared to old-fashioned RCTs with primary endpoint include:
  • Dense temporal sampling
  • Multivariate sampling
  • Passive sampling
  • Machine sampling
  • More direct sampling of biological variables

slide-5
SLIDE 5

Temporal sampling density

  • Increased density of observations (from mobile, wearable, or IoT)
  • Sampling may occur more than once per second
  • Consider: 8 weeks x 7 days x 24 hours x 60 minutes x 60 seconds = 4.84M measures
  • Analyze trajectories rather than simple changes from baseline to endpoint
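The sample-count arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes exactly one measure per second for the full 8 weeks with no gaps.

```python
# Count observations at one sample per second (assumed rate, no missing data).
SECONDS_PER_DAY = 24 * 60 * 60


def samples_at_one_hz(weeks: int) -> int:
    """Number of one-per-second samples collected over `weeks` weeks."""
    return weeks * 7 * SECONDS_PER_DAY


total = samples_at_one_hz(8)
print(f"{total:,} measures (~{total / 1e6:.2f}M)")  # 4,838,400 measures (~4.84M)
```

Any sensor sampling faster than 1 Hz, or any additional channel, multiplies this count, which is part of why trajectory models become more natural than a single baseline-to-endpoint difference.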

slide-6
SLIDE 6

Pros and Cons of Laboratory Assessments

slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9

[Figure: three panels (SS, Dot, n-Back), each with axis ticks from 0.2 to 1 across 1-occasion, 1-day (x5), 2-day (x10), 3-day (x15), and 14-day (x70) sampling. Panel annotations: 12 items/test, 12 items/test, 2 items/test.]

Are the advantages of repeated measures over time any greater than you would expect simply from having more items?

Reliability predicted from estimated c-bar is correlated with observed reliability over repeated measures (across 3 tasks x 4 time points: r = .96).

Reliability (alpha) is a function of average inter-item covariance (c-bar), average item variance, and N of items.
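The relationship stated above, alpha as a function of c-bar, average item variance, and N of items, can be sketched as follows. The formula is the standard one for Cronbach's alpha; the numeric values of c-bar and item variance are hypothetical and are not taken from the study, and treating each repeated occasion as simply adding more items with the same c-bar is the "more items" assumption the slide asks about.

```python
def cronbach_alpha(n_items: int, mean_cov: float, mean_var: float) -> float:
    """Cronbach's alpha from the average inter-item covariance (c-bar),
    the average item variance, and the number of items:
    alpha = N * c_bar / (v_bar + (N - 1) * c_bar)
    """
    return n_items * mean_cov / (mean_var + (n_items - 1) * mean_cov)


# Hypothetical values, for illustration only.
C_BAR = 0.10          # average inter-item covariance
V_BAR = 1.00          # average item variance
ITEMS_PER_TEST = 12   # items in a single administration

# Repetition multipliers follow the sampling schedule named in the figure.
schedule = [("1-occasion", 1), ("1-day", 5), ("2-day", 10),
            ("3-day", 15), ("14-day", 70)]

for label, repetitions in schedule:
    n = ITEMS_PER_TEST * repetitions
    alpha = cronbach_alpha(n, C_BAR, V_BAR)
    print(f"{label:>10}: N = {n:4d} items, predicted alpha = {alpha:.3f}")
```

If observed reliability tracks these predictions closely, the gain from repeated sampling is largely the gain expected from lengthening the test.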

slide-10
SLIDE 10

Multivariate sampling

  • Single mobile device yields multiple outputs in different modalities
  • GPS
  • Motion
  • Voice
  • Video: light/dark, facial affect, oxygenation
  • EMA
  • GSR
  • HR, HRV
  • Or data may be integrated across multiple devices
  • Smart watch or actigraphy
  • Skin patch sensor
  • Sleep respiration monitor
  • EEG, EKG, etc…
  • Methods to aggregate all these data types into composite COAs under development…

slide-11
SLIDE 11

Passive sampling = direct, more objective

  • Less censoring and bias of data related to:
  • Compliance
  • Effort
  • Intent
  • Examinee less prepared for assessment
  • Measures less likely to be affected by expectancy biases
  • Presumably better at overcoming placebo effects

Overall, correlations were low-to-moderate with a mean of 0.37 (SD = 0.25) and a range of -0.71 to 0.98.

(Prince et al 2008, A comparison of direct versus self-report measures for assessing physical activity in adults: a systematic review)

slide-12
SLIDE 12

Machine sampling

  • Increased precision
  • Probably decreased flexibility
  • All flexibility must be programmed in advance (there is no “on the fly” flexibility that occurs with humans, for better or worse)
  • Interaction monitoring still early (e.g., interactive video monitoring of engagement during assessment)

  • Unclear impacts on human responders
  • Tech naïve older adults vs early adopters
  • Consider “rod & frame” studies…
slide-13
SLIDE 13

BUT – we still face the same reliability and validity concerns

  • Reliability
  • Internal consistency, construct validity
  • Test-retest reliability: stability, bias, effects of repeated measurement

  • Inter-rater, Inter-site, Inter-national reliability
  • At least as good as conventional measures?
  • Criterion validity
  • With respect to existing measures
  • With respect to clinical outcomes
  • At least as good as conventional measures?
slide-14
SLIDE 14

Using IRT for co-calibration of tests and longitudinal assessment

  • Test linking
  • Quantify shared latent trait that both instruments measure
  • Typically requires at least some linking or “anchor” items
  • Examine differential item functioning (DIF) for anchor items
  • Summaries include:
  • Test characteristic curves: plot most likely score for each level of ability
  • Test information curves: plot measurement precision at each level of ability
  • Assumption that test characteristics are constant over time is probably wrong
  • Regression and change score approaches all assume linearity across scale – not true for virtually any test

slide-15
SLIDE 15
slide-16
SLIDE 16

From Crane et al 2008

slide-17
SLIDE 17
slide-18
SLIDE 18

Methods to Assure Equivalency

  • General measurement invariance issues, using multiple group confirmatory factor analysis (CFA)
  • Equal form: The number of factors and the pattern of factor-indicator relationships are identical across groups (aka configural equivalence).
  • Equal loadings: Factor loadings are equal across groups (aka metric equivalence).
  • Equal intercepts: When observed scores are regressed on each factor, the intercepts are equal across groups (aka scalar equivalence).
  • Equal residual variances: The residual variances of the observed scores not accounted for by the factors are equal across groups (aka uniqueness equivalence).
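The four levels above can be written as nested equality constraints on a multiple-group CFA model. The notation below is conventional and is an assumption of this sketch, not taken from the slides: person i in group g, intercepts τ, loadings Λ, factors η, and residual covariance Θ.

```latex
% Multiple-group CFA measurement model (conventional notation, assumed here)
x_{ig} = \tau_g + \Lambda_g \eta_{ig} + \varepsilon_{ig},
\qquad \operatorname{Cov}(\varepsilon_{ig}) = \Theta_g

% Nested equality constraints across groups g = 1, \dots, G
\begin{aligned}
\text{Equal form (configural):} \quad
  & \Lambda_g \text{ has the same pattern of free and fixed entries for all } g \\
\text{Equal loadings (metric):} \quad
  & \Lambda_1 = \dots = \Lambda_G \\
\text{Equal intercepts (scalar):} \quad
  & \tau_1 = \dots = \tau_G \\
\text{Equal residual variances (uniqueness):} \quad
  & \Theta_1 = \dots = \Theta_G
\end{aligned}
```

In practice each level is usually tested with the previous constraints retained, so a failure at one level indicates which kind of parameter differs between groups.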

slide-19
SLIDE 19

Measurement Invariance Methods for Introducing New Methods into Clinical Trials

  • Assessment of measurement invariance typically requires:
  • Shared “linking” items across instruments that serve as “anchors” against which other aspects of covariance can be judged
  • Absent linking items, comparability can be established by studying the same people with both methods. This is the conventional criterion validity approach or assessment of “concurrent validity.”
  • Other strategies are possible for integrative data analysis, sometimes even without linking items and without having a shared sample:

  • Variable network harmonization
  • Covariance structure harmonization
  • Factor alignment
slide-20
SLIDE 20

Classical psychometric and network approaches to measurement invariance

[Figure: two panels over the same symptoms (Dysphoria, Anhedonia, ↓ Appetite, ↑ Appetite, Insomnia). Psychometric panel: Major depression with the five symptoms. Network panel: the five symptoms.]

Psychometric model: assumes latent variable

  • Constrains correlations

Network model: no constraints on correlations

  • Saturated model

If networks harmonize…

  • … so will factor model
  • … so will composites
slide-21
SLIDE 21

Method – Harmonize matching symptoms (bottom-up or backward-search method)

1. No initial constraint on correlations (“fully saturated” model)

2. Add constraints until fit is maximized

  • CFI: scale from worst (0) to best (1) possible fit; > .95
  • RMSEA: misfit per degree of freedom; < .05
  • SRMR: size of model residuals; < .05
  • Backwards search algorithm, minimizing loss function:
  • $\mathrm{LOSS} = \max\left(\mathrm{RMSEA},\ \mathrm{SRMR},\ 2 \cdot (1 - \mathrm{CFI})\right)$

3. Identify and diagnose non-harmonized symptoms

  • Content/wording differences
  • Language differences
  • Measurement scale/response option differences
  • Population differences in symptom expression
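The loss function in step 2 can be sketched in Python, reading it as LOSS = MAX(RMSEA, SRMR, 2 ∗ (1 − CFI)). The function name is hypothetical, and the two sets of fit indices are the "Model fit" values reported for the depression matching-symptom models in this deck; the search over constraints itself requires a model-fitting routine and is not shown.

```python
def harmonization_loss(cfi: float, rmsea: float, srmr: float) -> float:
    """Worst of the three misfit indicators; CFI is converted to a misfit
    scale as 2 * (1 - CFI) so that larger always means worse."""
    return max(rmsea, srmr, 2.0 * (1.0 - cfi))


# Fit indices as reported on the two "Depression – Matching symptoms" slides.
reported_fits = {
    "first reported fit": {"cfi": 0.992, "rmsea": 0.061, "srmr": 0.089},
    "second reported fit": {"cfi": 0.999, "rmsea": 0.032, "srmr": 0.038},
}

for label, fit in reported_fits.items():
    print(f"{label}: loss = {harmonization_loss(**fit):.3f}")
```

Under this reading the loss drops from 0.089 to 0.038 between the two reported fits, and in both cases SRMR is the binding term.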
slide-22
SLIDE 22

Depression – Matching symptoms

Model fit: CFI=.992, RMSEA=.061, SRMR=.089

SCID N=1290 DI-PAD N=3344

Symptom Name                      SCID   DI-PAD
Dysphoria (Depression)            A52    OPCRIT37
Loss of pleasure                  A53    OPCRIT39
Weight loss/decreased appetite    A55    OPCRIT489
Weight gain/increased appetite    A56    OPCRIT501
Insomnia                          A58    OPCRIT4456
Excessive sleep                   A59    OPCRIT47
Slowed activity                   A62    OPCRIT24
Loss of energy or fatigue         A63    OPCRIT25
Inappropriate guilt               A66    OPCRIT42
Impaired Concentration            A68    OPCRIT41
Suicidal ideation                 A72    OPCRIT43

slide-23
SLIDE 23

Depression – Matching symptoms

Model fit: CFI=.999, RMSEA=.032, SRMR=.038

SCID N=1290 DI-PAD N=3344

Symptom Name                      SCID   DI-PAD       MADr
Dysphoria (Depression)            A52    OPCRIT37
Loss of pleasure                  A53    OPCRIT39     .256
Weight loss/decreased appetite    A55    OPCRIT489
Weight gain/increased appetite    A56    OPCRIT501
Insomnia                          A58    OPCRIT4456   .114
Excessive sleep                   A59    OPCRIT47     .161
Slowed activity                   A62    OPCRIT24
Loss of energy or fatigue         A63    OPCRIT25
Inappropriate guilt               A66    OPCRIT42
Impaired Concentration            A68    OPCRIT41
Suicidal ideation                 A72    OPCRIT43

slide-24
SLIDE 24

Depression – Non-matching symptoms

SCID N=1290 DI-PAD N=3344

Symptom Name                      SCID   DI-PAD
Psychomotor agitation             A61
Feelings of worthlessness         A65
Indecisiveness                    A69
Recurrent thoughts of death       A71
Specific plan                     A73
Suicide attempts                  A74
Altered libido                           OPCRIT40
Diurnal variation                        OPCRIT38

Legend: Residual variance; Residual correlation; Low residual variance

slide-25
SLIDE 25

IRT-Based Harmonization

[Figure panels: DI-PAD (Bipolar); SCID (Dutch bipolar)]

slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28

With these modifications, the final alignment complexity function is given by

$$F_{GRM} = \sum_{g_1<g_2} \; \sum_{p \in I_1,\, p \in I_2} w_{g_1,g_2}\, f\left(\lambda_{pg_1,1} - \lambda_{pg_2,1}\right) + \sum_{g_1<g_2} \; \sum_{p \in I_1,\, p \in I_2} w_{g_1,g_2}\, f\left(\nu_{p0g_1,1} - \nu_{p0g_2,1}\right)$$

As described above, measurement non-invariance is only minimized for items which appear in each pair of instruments, and only the first measurement intercept is considered.

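Next, alignment proceeds as in the continuous case by minimizing the graded response model (GRM) complexity function:

$$F_{GRM} = \sum_{p} \sum_{g_1<g_2} w_{g_1,g_2}\, f\left(\lambda_{pg_1,1} - \lambda_{pg_2,1}\right) + \sum_{p} \sum_{g_1<g_2} \sum_{q} w_{g_1,g_2}\, f\left(\nu_{pqg_1,1} - \nu_{pqg_2,1}\right)$$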

Note the extra summation in the second term, which accounts for multiple measurement intercepts in the graded response model.

After the model parameters are aligned in the factor analytic metric, the aligned IRT model parameters are given by the following transformations:

$$a_{pg_1,1} = \frac{1.7\,\lambda_{pg_1,1}\,\sqrt{\psi_{pg}}}{\sqrt{1 - \lambda_{pg_1,1}^{2}\,\psi_{pg}}}$$

$$d_{pqg_1,1} = d_{pqg_1,1} - a_{pg_1,1}\,\alpha_{g}$$
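The GRM complexity function on this slide can be sketched numerically. Everything below is illustrative: the slide does not define the component loss f or the weights w, so this sketch assumes forms commonly used in the factor alignment literature, f(x) = sqrt(sqrt(x² + ε)) and w = sqrt(N1 · N2). The function names, the value of ε, and all parameter values are hypothetical.

```python
import math
from itertools import combinations


def component_loss(x: float, eps: float = 0.01) -> float:
    """Assumed component loss f(x) = sqrt(sqrt(x^2 + eps))."""
    return math.sqrt(math.sqrt(x * x + eps))


def grm_complexity(loadings, intercepts, group_sizes, eps=0.01):
    """Sum weighted losses over group pairs, restricted to shared items.

    loadings[g][p]   -> aligned loading of item p in group g
    intercepts[g][p] -> list of aligned intercepts of item p in group g
    group_sizes[g]   -> sample size of group g (used for the assumed weight)
    """
    total = 0.0
    for g1, g2 in combinations(sorted(loadings), 2):
        weight = math.sqrt(group_sizes[g1] * group_sizes[g2])
        shared_items = sorted(set(loadings[g1]) & set(loadings[g2]))
        for p in shared_items:
            total += weight * component_loss(
                loadings[g1][p] - loadings[g2][p], eps)
            # Extra summation over intercepts q; zip stops at the shorter list
            # if the two instruments have different numbers of categories.
            for nu1, nu2 in zip(intercepts[g1][p], intercepts[g2][p]):
                total += weight * component_loss(nu1 - nu2, eps)
    return total


# Hypothetical aligned parameters for two instruments sharing one item.
loadings = {"A": {"sleep": 0.80, "guilt": 0.60},
            "B": {"sleep": 0.70, "libido": 0.50}}
intercepts = {"A": {"sleep": [-0.5, 0.6], "guilt": [0.1, 0.9]},
              "B": {"sleep": [-0.3, 0.8], "libido": [0.2, 1.1]}}
group_sizes = {"A": 100, "B": 400}

print(f"complexity = {grm_complexity(loadings, intercepts, group_sizes):.3f}")
```

Only the shared item contributes, which mirrors the statement that non-invariance is minimized only for items appearing in both instruments of a pair. An alignment routine would search over group factor means and variances to minimize this quantity; that optimization is not shown here.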

slide-29
SLIDE 29

~20,000 cases with schizophrenia, schizoaffective disorder, bipolar disorder, major depressive disorder and autism spectrum disorder, relatives and controls from >10 cohorts

slide-30
SLIDE 30
slide-31
SLIDE 31

Many thanks!

rbilder@mednet.ucla.edu

http://www.semel.ucla.edu/creativity
http://healthy.ucla.edu

Consortium for Neuropsychiatric Phenomics (52 investigators); Investigators in current RDoC projects, and Whole Genome Sequencing in Psychiatric Disorders (WGSPD; Freimer et al.). Special thanks to Steve Reise, Max Mansolf, Annabel Vreeker, Catherine Sugar, Gerhard Helleman, and Ariana Anderson. Supported by NIH Grants: (CNP) UL1-DE019580, RL1MH083268, RL1MH083269, RL1DA024853, RL1MH083270, RL1LM009833, PL1MH083271, and PL1NS062410; (Cognitive Atlas) RO1NS061771; (Multilevel WM/RDoC) R01MH101478; (Modeling/RDoC) R03MH106922; (WGSPD) U01 MH105578.