Theory in Practice: Modeling in Neuroimaging. How to model “big” MRI datasets



slide-1
SLIDE 1

Theory in Practice: Modeling in Neuroimaging

How to model “big” MRI datasets

slide-2
SLIDE 2

Outline of talk

  • Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
  • “Big data” complicates our ability to apply both approaches
  • Marginal Modelling is a good approach for descriptive modelling
  • Functional Random Forests is a good approach for predictive modelling
  • Other approaches can also handle big data, but are beyond the scope of this workshop
slide-3
SLIDE 3

Before even considering models, we need to know what question to ask

  • How and where may cortical thickness be associated with working memory performance?

slide-4
SLIDE 4

Before even considering models, we need to know what question to ask

  • How and where may cortical thickness be associated with working memory performance?
  • Can measures of functional brain organization predict an individual’s working memory ability?

slide-5
SLIDE 5

Each question requires a different modelling approach

  • How and where may cortical thickness be associated with working memory performance? → Descriptive modelling
  • Can measures of functional brain organization predict an individual’s working memory ability? → Predictive modelling

slide-6
SLIDE 6

Descriptive models measure what one has collected; predictive models measure what one will collect

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/


slide-7
SLIDE 7

Descriptive models explore data, predictive models confirm properties of data

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/


slide-8
SLIDE 8

Descriptive models provide insight, predictive models apply insight

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/


slide-9
SLIDE 9

Descriptive models are limited to in-sample data, predictive models require out-of-sample data

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/


slide-10
SLIDE 10

Descriptive models are assessed via theory and inference, predictive models are assessed by independent testing

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/


slide-11
SLIDE 11

Outline of talk

  • Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
  • “Big data” complicates our ability to apply both approaches
  • Marginal Modelling is a good approach for descriptive modelling
  • Functional Random Forests is a good approach for predictive modelling
  • Other approaches can also handle big data, but are beyond the scope of this workshop
slide-12
SLIDE 12

First, all health-focused imaging studies should probably be big data

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-13
SLIDE 13

Our ABCD pipeline generates anywhere from 10 to 90 thousand tests

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-14
SLIDE 14

Our ABCD pipeline generates anywhere from 10 to 90 thousand tests (some special cases are in hundreds)

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-15
SLIDE 15

We’ve collected about 10,000 cases

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

slide-16
SLIDE 16

ABCD needed a lot of coordination and data aggregation to collect over 10,000 participants

Auchter et al, 2018, https://doi.org/10.1016/j.dcn.2018.04.003

slide-17
SLIDE 17

Descriptive models must take into account this nested structure

  • Complex models may be slow to calculate when analyzing ~4500 participants
  • Permutation tests may take days or even weeks
  • Permutation tests lack exchangeability for complex questions
slide-18
SLIDE 18

Permutation testing can reveal whether differences in community structure are statistically significant

Hirschhorn, 2005, https://doi.org/10.1038/nrg1521

Figure label: depression

slide-19
SLIDE 19

Permute group assignment and calculate statistic

Hirschhorn, 2005, https://doi.org/10.1038/nrg1521

Figure labels: depression, ‘depression’, no depression, ‘no depression’

slide-20
SLIDE 20

Do so for multiple permutations and construct a distribution of the statistic for permuted groups

Hirschhorn, 2005, https://doi.org/10.1038/nrg1521

Figure labels: depression, ‘depression’, no depression, ‘no depression’

slide-21
SLIDE 21

P value is determined by the proportional rank of the observed statistic compared to the permuted distribution

Figure axis label: Frequency
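The permutation steps on the last few slides can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: the statistic (a difference in group means), the function name `permutation_p_value` and its parameters are all assumptions made for the example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    # Statistic for the observed group assignment.
    observed = abs(mean_difference(pooled))

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # permute group assignment
        if abs(mean_difference(pooled)) >= observed:
            at_least_as_extreme += 1

    # Proportional rank of the observed statistic in the permuted
    # distribution; the +1 counts the observed labelling itself.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

With two clearly separated groups the returned value is small; with identical groups every permutation is at least as extreme as the observed one, so the value is 1.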

slide-22
SLIDE 22

At Z = 2.3, false positive rates are high when not using permutation testing

slide-23
SLIDE 23

At Z = 3.1, false positive rates are generally better and in line with the true FP rate

slide-24
SLIDE 24

This all works because each individual’s data are acquired independently of everyone else’s – the data are exchangeable

slide-25
SLIDE 25

Independence gets more complicated when you have more complicated designs – but even here we can exchange every individual

Anderson and ter Braak, 2003, JSCS; 10.1080/0094965021000015558

Figure: factor “Drug use” with levels Cannabis, Alcohol, Nicotine, Stimulant

slide-26
SLIDE 26

However, if a second factor is nested, our permutations are limited to the nested pairs, restricting our permutations

Anderson and ter Braak, 2003, JSCS; 10.1080/0094965021000015558

Figure: factor “Drug use” with levels Cannabis, Alcohol, Nicotine, Stimulant; Family nested by drug use
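To make the restriction concrete, here is a small sketch under one reading of the slide: observations may only be exchanged with other observations in the same block (for example, the same family). The function names and the block labels are hypothetical, and real designs may call for a different restriction scheme.

```python
import math
import random
from collections import Counter, defaultdict


def count_permutations(blocks):
    """Return (free, restricted) permutation counts for the block labels."""
    sizes = Counter(blocks).values()
    free = math.factorial(len(blocks))
    restricted = math.prod(math.factorial(size) for size in sizes)
    return free, restricted


def permute_within_blocks(values, blocks, rng):
    """Shuffle values only among observations that share a block label."""
    positions = defaultdict(list)
    for index, block in enumerate(blocks):
        positions[block].append(index)

    permuted = list(values)
    for indices in positions.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for index, value in zip(indices, shuffled):
            permuted[index] = value
    return permuted
```

For six participants in three families of two, `count_permutations(list("AABBCC"))` gives 720 unrestricted orderings but only 8 restricted ones, which is why restricted schemes leave far fewer permutations to build a null distribution from.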

slide-27
SLIDE 27

More complex designs have even more restrictions, relative to the total number of permutations

Anderson and ter Braak, 2003, JSCS; 10.1080/0094965021000015558

Figure: factor “Drug use” with levels Cannabis, Alcohol, Nicotine, Stimulant; Hometown

slide-28
SLIDE 28

In turn, restricted permutations have reduced power when controlling for the false positive rate

Anderson and ter Braak, 2003, JSCS; 10.1080/0094965021000015558

slide-29
SLIDE 29

Predictive models must also take into account nested structure

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5736019/

slide-30
SLIDE 30

Scanner effects can be common, independent of site

Gareth Harman, 4/11/19 – ComBat, Cortical Thickness

slide-31
SLIDE 31

ComBat has also been used to correct ABCD data, in which site can be predicted from the data

Nielson, 2018, biorxiv; http://dx.doi.org/10.1101/309260

Figure: Site classification accuracy

slide-32
SLIDE 32

Cross-validation strategies can mitigate known but not unknown effects

  • Stratified validation is possible via independent stratified groups
  • Leave-one-site-out validation can help catch site effects
  • But what about effects of scanner upgrades, software maintenance, or even changes in personnel?
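Leave-one-site-out validation, mentioned in the list above, can be sketched as a simple fold generator. This is only an illustration: the function name and the site labels are hypothetical, and model fitting and scoring are left out.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) for each site."""
    for held_out in sorted(set(sites)):
        train = [i for i, site in enumerate(sites) if site != held_out]
        test = [i for i, site in enumerate(sites) if site == held_out]
        yield held_out, train, test
```

Each fold trains on every site but one and tests on the held-out site, so a model that has only learned site-specific signal should do poorly on the test fold. It can only guard against groupings that are known and recorded, which is the limitation the last bullet points at.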
slide-33
SLIDE 33

Outline of talk

  • Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
  • “Big data” complicates our ability to apply both approaches
  • Marginal Modelling is a good approach for descriptive modelling
  • Functional Random Forests is a good approach for predictive modelling
  • Other approaches can also handle big data, but are beyond the scope of this workshop
slide-34
SLIDE 34

The marginal model may be a more feasible solution for modeling ABCD populations

  • Strengths:
      • Marginal model makes few assumptions with respect to the data
      • Nested designs can be modeled or unmodeled, and left to the error term (hopefully)
      • Individual cases can be incomplete or missing for a marginal model
      • Longitudinal designs are feasible within the marginal model framework
      • Marginal model has a closed-form solution to the equation via a Sandwich Estimator (SwE)
      • It’s fast, and can be feasibly run with limited resources on lots of data
      • Use of a wild bootstrap (WB) provides an NHST framework for complex questions

slide-35
SLIDE 35

Critical limitations

  • The marginal model cannot be used to draw inferences about individuals within a population
  • It is an exploratory approach, which can be verified using subsequent confirmatory approaches
  • DEAP can help conform such analyses to best standards and practices through pre-registered reports, reproducibility, and independent validation

slide-36
SLIDE 36

Bryan Guillaume and Tom Nichols implemented an approach that uses a sandwich estimator to solve a marginal model

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference

slide-37
SLIDE 37

Marginal models are effectively linear, so we first estimate the parameters for our design matrix by regressing the imaging measure (Y) on the design (X), written here as Y/X

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta)
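Read as a formula, “Y/X = Beta” is shorthand for a regression fit rather than a literal division. Assuming ordinary least squares (an assumption here; the slides do not name the estimator), with Y the vector of imaging measures at one location and X the design matrix, the estimate would be:

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y
```

This is the same quantity a matrix “left division” of Y by X returns in many numerical environments, which is presumably where the Y/X notation comes from.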

slide-38
SLIDE 38

For our software, the design matrix is just your non-imaging data

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta)

slide-39
SLIDE 39

So for example, with the ABCD data we can input measures and test a model

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta)

Marginal model: y ~ RT

slide-40
SLIDE 40

A sandwich estimator is used to estimate covariance and determine the fixed effects parameters

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE)

slide-41
SLIDE 41

To handle nested structure, group covariance can be calculated separately (CRITICAL FOR ABCD)

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals)

slide-42
SLIDE 42

For ABCD, it is good to control for site and gender

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals)

site gender 14 2 5 2

slide-43
SLIDE 43

If needed we can perform a small sample size adjustment – this may be important if we used family as a nesting variable

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals); Perform small sample adj.

slide-44
SLIDE 44

Finally, a Wald test extracts a t-map for statistical inference

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference
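The steps walked through on the preceding slides (fit the model, sum residual information within groups, form the sandwich variance, compute a Wald statistic) can be sketched for the simplest case of one covariate plus an intercept. This is a toy illustration under stated assumptions, not the SwE toolbox or the MarginalModelCifti code: the function name is hypothetical, the fit is ordinary least squares, and no small-sample adjustment is applied.

```python
import math
from collections import defaultdict


def marginal_slope_test(x, y, groups):
    """OLS slope of y on x with a cluster-robust (sandwich) standard error.

    Returns (slope, standard_error, wald_statistic).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    centered_x = [xi - mean_x for xi in x]
    sxx = sum(cx * cx for cx in centered_x)

    # Step 1: ordinary least squares ("compute model").
    slope = sum(cx * (yi - mean_y) for cx, yi in zip(centered_x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Step 2: sum residual scores within each group, so observations that
    # share a group (site, family, ...) are allowed to be correlated.
    group_scores = defaultdict(float)
    for cx, residual, group in zip(centered_x, residuals, groups):
        group_scores[group] += cx * residual

    # Step 3: sandwich variance of the slope: bread * meat * bread.
    meat = sum(score * score for score in group_scores.values())
    standard_error = math.sqrt(meat / (sxx * sxx))

    # Step 4: Wald statistic for the null hypothesis slope == 0.
    return slope, standard_error, slope / standard_error
```

The sketch needs more than one group to be meaningful: with a single group the residual scores sum to zero and the standard error collapses. With few groups the variance tends to be underestimated, which is the motivation for the small sample adjustment mentioned earlier.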

slide-45
SLIDE 45

The statistical map looks like this

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference

slide-46
SLIDE 46

Use of a wild bootstrap enables inference similar to a permutation test – so we can control for the FWER

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference; Wild bootstrap; WB maps; Cluster detection/TFCE; Inference map
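One common way to turn a set of null maps into FWER control is the maximum-statistic approach, sketched below. Whether the software described here uses exactly this rule is an assumption; the slides mention cluster detection and TFCE, which are not shown. The function name and inputs are hypothetical.

```python
def fwer_corrected_p_values(observed_map, null_maps):
    """FWER-corrected p-value per location from the null maxima."""
    # One maximum per null (for example wild-bootstrap) map.
    null_maxima = [max(null_map) for null_map in null_maps]

    corrected = []
    for statistic in observed_map:
        exceed = sum(1 for maximum in null_maxima if maximum >= statistic)
        # The +1 counts the observed map as one member of the null set.
        corrected.append((exceed + 1) / (len(null_maxima) + 1))
    return corrected
```

Comparing every location against the distribution of map-wide maxima is what makes the resulting p-values hold across the whole map rather than one test at a time.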

slide-47
SLIDE 47

Such a test allows us to detect significant clusters

Flowchart boxes: Imaging Volume(s); Design matrix; Compute model (Y/X = Beta); Estimate FE covariance (SwE); Calculate subject/groups covariance (residuals); Perform small sample adj.; Perform Wald Test; Statistical T map for inference; Wild bootstrap; WB maps; Cluster detection/TFCE; Inference map

slide-48
SLIDE 48

Wild bootstrap

  • WB_value = fitted_value + residual_value * sample_value
  • sample_value is sampled with replacement, from a simple or complex distribution:
      • Rademacher (-1, 1) would mean we compute either:
          • WB_value = fitted_value - residual_value
          • WB_value = fitted_value + residual_value
  • However, there are LOTS of possible distributions, so the choice of distribution is important.
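The Rademacher case above can be sketched directly from the formula. This is a minimal illustration with a hypothetical function name; it draws one sign per observation and is not the ComputeMM_WB implementation.

```python
import random


def wild_bootstrap_sample(fitted, residuals, rng):
    """Return one resample: fitted_value + residual_value * sample_value."""
    resampled = []
    for fitted_value, residual_value in zip(fitted, residuals):
        sample_value = rng.choice((-1, 1))  # Rademacher draw
        resampled.append(fitted_value + residual_value * sample_value)
    return resampled
```

Refitting the model to many such resamples yields the bootstrap maps that serve as a null distribution. Swapping `rng.choice((-1, 1))` for another zero-mean, unit-variance draw changes the distribution, which is the choice the last bullet warns about.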

slide-49
SLIDE 49

We have begun to implement a standalone MarginalModelCifti package in R

An alpha version will be released at http://github.com/dcan-labs/MarginalModelCifti

slide-50
SLIDE 50

The main wrapper for MarginalModelCifti takes in imaging volumes and prepares them for analysis

Flowchart boxes: Imaging Volume(s); PrepCIFTI/Surf/Vol

slide-51
SLIDE 51

ComputeMM is applied to the prepared data; the user specifies the model using Wilkinson notation, and ComputeMM wraps the SwE and Wald test using geepack

Flowchart boxes: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM (Y ~ group + treatment); Statistical T map for inference

slide-52
SLIDE 52

ComputeMM_WB generates the WB maps used to draw inferences about the T map

Flowchart boxes: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM; ComputeMM_WB; Null Distribution; Statistical T map for inference

slide-53
SLIDE 53

In turn, a family of functions is used to parallelize ComputeMM_WB

Flowchart boxes: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM; ComputeMM_WB; Null Distribution; Statistical T map for inference. Helper functions: ApplyWB_to_data, ComputeFits, ComputeResiudals, ComputeZscores, GetSurfAreas, GetVolAreas

slide-54
SLIDE 54

Cluster detection is performed within the main wrapper, using information from both processes

Flowchart boxes: Imaging Volume(s); PrepCIFTI/Surf/Vol; ComputeMM; ComputeMM_WB; Null Distribution; Cluster detection/TFCE; Inference map; Statistical T map for inference

slide-55
SLIDE 55

The MarginalModelCifti package comprises multiple functions that can be accessed by anyone

slide-56
SLIDE 56

Functions are documented in accordance with CRAN guidelines

slide-57
SLIDE 57

Here are all the parameters for ConstructMarginalModel()

slide-58
SLIDE 58

To make things easier – we’ve made a Jupyter notebook that can be used as a reference

slide-59
SLIDE 59

Outline of talk

  • Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
  • “Big data” complicates our ability to apply both approaches
  • Marginal Modelling is a good approach for descriptive modelling
  • Functional Random Forests is a good approach for predictive modelling
  • Other approaches can also handle big data, but are beyond the scope of this workshop
slide-60
SLIDE 60

Nested structures -- people belong to multiple subtypes

Figure: Dialect preferences: soda, coke or pop? (legend: SODA, COKE, POP)

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-61
SLIDE 61

Nested structures -- people belong to multiple subtypes

Figure: U.S. 2016 presidential election voting preferences (legend: DEM, GOP)

Figure: Dialect preferences: soda, coke or pop? (legend: SODA, COKE, POP)

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-62
SLIDE 62

Nested structures -- people belong to multiple subtypes

Figure: U.S. 2016 presidential election voting preferences (legend: DEM, GOP)

Figure: Stroke mortality for Adults 35+ per 100,000 (legend: RATE)

Figure: Dialect preferences: soda, coke or pop? (legend: SODA, COKE, POP)

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-63
SLIDE 63

But what about effects of scanner upgrades, software maintenance, or even changes in personnel?

slide-64
SLIDE 64

If we want to control for unknown structure, we need to identify subtypes tied to an outcome

  • Supervised approaches can confirm known subtypes but not discover unknown subtypes tied to an outcome

slide-65
SLIDE 65

If we want to control for unknown structure, we need to identify subtypes tied to an outcome

  • Supervised approaches can confirm known subtypes but not discover unknown subtypes tied to an outcome
  • Unsupervised approaches can discover unknown subtypes, but these are not tied to any outcome

slide-66
SLIDE 66

How does the Functional Random Forest work?

Supervised component

slide-67
SLIDE 67

Ask a question: can we predict depression diagnosis?

Supervised component Unsupervised component

slide-68
SLIDE 68

Supervised component

We start with an input dataset

Input dataset

Unsupervised component

slide-69
SLIDE 69

Supervised component

We start with an input dataset

Input dataset

Unsupervised component

slide-70
SLIDE 70

Supervised component

This dataset can be a functional connectivity matrix

Input dataset

Unsupervised component

slide-71
SLIDE 71

Supervised component

This dataset can be a functional connectivity matrix – which gets reduced to either graph metrics or principal components

Input dataset

Unsupervised component

slide-72
SLIDE 72

Supervised component

Input data are modeled with a random forest, using validation/testing

Flowchart boxes: Input dataset; Random Forest (creates decision trees)

Unsupervised component

slide-73
SLIDE 73

Supervised component

Model is supervised because it attempts to predict the outcome of interest

Flowchart boxes: Input dataset; Random Forest (creates decision trees)

Unsupervised component

slide-74
SLIDE 74

Unsupervised component Supervised component

If the random forest performs well on independent test data, a similarity matrix is produced from the RFs

Flowchart boxes: Input dataset; Random Forest (creates decision trees); Similarity matrix
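If the similarity between two cases is taken to be the usual random-forest proximity, that is, the fraction of trees in which both cases end up in the same leaf (an assumption; the slides only say that a similarity matrix is produced from the RFs), it can be computed from per-tree leaf assignments as follows. The function name and input layout are hypothetical.

```python
def proximity_matrix(leaf_ids):
    """Random-forest proximity from leaf assignments.

    leaf_ids[t][i] is the leaf that tree t assigns to case i. Entry (i, j)
    of the result is the fraction of trees in which cases i and j fall in
    the same leaf.
    """
    n_trees = len(leaf_ids)
    n_cases = len(leaf_ids[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for tree in leaf_ids:
        for i in range(n_cases):
            for j in range(n_cases):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[count / n_trees for count in row] for row in counts]
```

The resulting matrix is symmetric with ones on the diagonal, so it can be treated as a weighted graph and passed to a community detection step.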


slide-75
SLIDE 75

Supervised component Unsupervised component

Subgroups are identified from this matrix via Infomap

Flowchart boxes: Input dataset; Random Forest (creates decision trees); Similarity matrix; Infomap (identifies communities)

slide-76
SLIDE 76

Supervised component Unsupervised component

Subtypes arise from the model that are tied to the outcome

Flowchart boxes: Input dataset; Random Forest (creates decision trees); Similarity matrix; Infomap (identifies communities); Subpopulations

slide-77
SLIDE 77

The FRF can be used to identify trajectories in longitudinal data

Flowchart boxes: Longitudinal dataset; Functional Data Analysis (generates individual trajectories): f(t) = a_1 φ_1(t) + ... + a_k φ_k(t)

slide-78
SLIDE 78

Combining the set of functions estimates a smooth trajectory for an individual’s symptoms

Flowchart boxes: Longitudinal dataset; Functional Data Analysis (generates individual trajectories): f(t) = a_1 φ_1(t) + ... + a_k φ_k(t)
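The trajectory formula is a weighted sum of basis functions, which is short to write out. The slides do not say which basis the FRF uses, so the polynomial basis below is purely an illustrative assumption, as are the function and variable names.

```python
def evaluate_trajectory(coefficients, basis_functions, t):
    """Return f(t) = a_1 * phi_1(t) + ... + a_k * phi_k(t)."""
    return sum(a * phi(t) for a, phi in zip(coefficients, basis_functions))


# Hypothetical basis: phi_1(t) = 1, phi_2(t) = t, phi_3(t) = t^2.
polynomial_basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]
```

For example, `evaluate_trajectory([1.0, 2.0, 0.5], polynomial_basis, 2.0)` combines the three terms 1, 4 and 2. Each individual gets their own coefficients, so the coefficients summarize that person's trajectory.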

slide-79
SLIDE 79

Combining the set of functions estimates a smooth trajectory for an individual’s symptoms

Flowchart boxes: Longitudinal dataset; Functional Data Analysis (generates individual trajectories): f(t) = a_1 φ_1(t) + ... + a_k φ_k(t)

slide-80
SLIDE 80

We can use an unsupervised approach to identify trajectories

Unsupervised

Flowchart boxes: Longitudinal dataset; Functional Data Analysis (generates individual trajectories): f(t) = a_1 φ_1(t) + ... + a_k φ_k(t); Correlation Matrix (compares trajectories); Infomap (identifies communities); Correlation-based subpopulations
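The “Correlation Matrix compares trajectories” step can be sketched as a pairwise Pearson correlation between individuals' trajectories sampled at the same time points. The choice of Pearson correlation and the function names are assumptions for illustration; the slides do not specify the correlation measure.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(var_a * var_b)


def trajectory_correlation_matrix(trajectories):
    """Correlate every individual's trajectory with every other one."""
    return [[pearson(a, b) for b in trajectories] for a in trajectories]
```

Individuals whose symptoms rise and fall together get correlations near 1, and those with opposite courses get values near -1. A flat trajectory has zero variance and would need special handling, which this sketch omits.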

slide-81
SLIDE 81

Or use a “hybrid” approach that identifies trajectory subtypes tied to an outcome of interest

Unsupervised Hybrid

Flowchart boxes: Longitudinal dataset; Functional Data Analysis (generates individual trajectories): f(t) = a_1 φ_1(t) + ... + a_k φ_k(t); Correlation Matrix (compares trajectories); Infomap (identifies communities); Correlation-based subpopulations; Parameters; Random Forest (creates decision trees); Similarity matrix; Infomap (identifies communities); Model-based subpopulations

slide-82
SLIDE 82

A manual for using the FRF exists online (https://dcan-labs.github.io/functional-random-forest/)

slide-83
SLIDE 83

A new release is available at:

slide-84
SLIDE 84

A manual for using the FRF exists online (https://dcan-labs.github.io/functional-random-forest/)

slide-85
SLIDE 85

Outline of talk

  • Theory recap: modelling approaches can be reduced to two types: predictive and descriptive
  • “Big data” complicates our ability to apply both approaches
  • Marginal Modelling is a good approach for descriptive modelling
  • Functional Random Forests is a good approach for predictive modelling
  • Other approaches can also handle big data, but are beyond the scope of this workshop

slide-86
SLIDE 86

New approaches within statistics and machine learning can also accommodate problems with big data

  • Many of these approaches have been developed in genomics
  • ComBat is a Bayesian approach to handle known site effects in data
  • Surrogate Variable Analysis
  • Such approaches need to be examined in the context of neuroimaging data to evaluate where each is most useful
  • Knowing how to use these tools requires considerable skill in data science, which has been relatively untaught in mental health fields
  • Hopefully, the workshop tomorrow will get you excited about applying these new tools and set you on your path towards doing “big data” science right.

slide-87
SLIDE 87

Acknowledgments

Fair Lab

  • Damien Fair
  • Oscar Miranda-Dominguez
  • Alice Graham

Computing Team

  • Darrick Sturgeon
  • Eric Earl
  • Anders Perrone
  • Emma Schifsky
  • Anthony Galassi
  • Kathy Snider
  • David Ball
  • Lucille Moore

Alpha Testers

  • Bene Ramirez
  • Jennifer Zhu
  • Robert Hermosillo
  • Mollie Marr
  • Olivia Doyle
  • Michaela Cordova
  • AJ Mitchell
slide-88
SLIDE 88

Acknowledgments

  • The mentors
      • Damien Fair
      • Joel Nigg
      • Eric Fombonne
      • Shannon McWeeney
  • The databasors
      • Lourdes Irwin
      • Darrick Sturgeon
      • Rachel Klein
  • The developers
      • Eric Earl
      • Anders Perrone
      • Darrick Sturgeon
  • Other Labs:
      • Nigg Lab
      • McWeeney Lab
  • The assessors:
      • Beth Langhorst
      • Michaela Cordova
      • Bene Ramirez
      • Brian Mills
      • Olivia Doyle
  • The students:
      • Iliana Javier
      • Nadir Balba
  • The docs:
      • Alice Graham
      • Oscar Miranda-Dominguez
      • Binyam Nardos
  • The collaborators:
      • Sarah Karalunas
      • Alison Hill
      • Jan Van Santen
  • Everyone I forgot, which is many ☺
slide-89
SLIDE 89

Questions?

slide-90
SLIDE 90

High dimensionality is bad for predictive modelling

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

slide-91
SLIDE 91

Predictive models must also take into account nested structure

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880143/