Classification of fMRI-based cognitive states
Stephen LaConte Department of Neuroscience
“Organization”
– Background and motivation
– Basic principles
– Evaluation of predictive models
– Model interpretation
– fMRI
Stephen LaConte July 25, 2008
Background and Motivation
(Friston, 1995; McIntosh, 1996; Strother, 2002; Moeller and Habeck 2006)
(Dehaene, 1998)
(Strother, 2002; LaConte, 2003; Shaw, 2003; Mitchell 2004; Mourão-Miranda, 2005; Martinez-Ramon, 2006)
different classes of stimuli
(Haxby, 2001; Cox and Savoy, 2003; Haynes & Rees, 2005; Kamitani & Tong 2005)
cognitive states
(Polyn, 2005)
for real-time fMRI
(LaConte, 2007)
(Kay, 2008)
HBM ’06 and ’07
Supervised learning applied to fMRI
Step 1: Train with labeled data. During stimulus presentation (visual display) and data acquisition, each scan is paired with its data label, giving time-labeled scans: image data $I_t$ with label $y_t$. These are passed to supervised learning to build a model.
Step 2: Use the model to predict/decode. New image data $I_t$ are passed to the model, which returns the estimated label $\hat{y}_t$ for time $t$.
Here $y$ represents the stimulus/behavioral category of each volume. For classification, $y$ takes values in a set of stimulus categories, $y \in \{0, 1, 2, \dots, N\}$.
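Mathematical Representation of fMRI Data
$\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N] = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{M1} & x_{M2} & \cdots & x_{MN} \end{bmatrix}$
Rows: variables/features/space (M voxels). Columns: observations/time/intervals (N scans).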
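Classification with individual volumes
Each scan acquired during the fMRI experiment is a point $\mathbf{x}_t$ in voxel space (shown for two voxels, $x_1$ and $x_2$) with class label $y_t$. A linear classifier has the form
$f(\mathbf{x}) = \boldsymbol{\beta}^T\mathbf{x} + \beta_0$
and training chooses $\boldsymbol{\beta}$ and $\beta_0$ to minimize a loss $L(y, f(\mathbf{x}))$ over the training scans.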
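A minimal Python sketch of the linear model $f(\mathbf{x}) = \boldsymbol{\beta}^T\mathbf{x} + \beta_0$ and two common choices of loss $L(y, f(\mathbf{x}))$. It assumes labels coded as −1/+1 and a hand-picked $\boldsymbol{\beta}$ rather than a trained one; the function names and the two-voxel scans are hypothetical.

```python
def linear_decision(x, beta, beta0):
    """f(x) = beta^T x + beta_0 for one volume x (a list of voxel values)."""
    return sum(b * xi for b, xi in zip(beta, x)) + beta0


def zero_one_loss(y, fx):
    """1 if the sign of f(x) disagrees with the label y in {-1, +1}, else 0."""
    predicted = 1 if fx >= 0 else -1
    return 0 if predicted == y else 1


def squared_loss(y, fx):
    """Squared-error loss (y - f(x))^2."""
    return (y - fx) ** 2


def mean_loss(loss, volumes, labels, beta, beta0):
    """Average L(y, f(x)) over a set of time-labeled scans."""
    total = sum(loss(y, linear_decision(x, beta, beta0))
                for x, y in zip(volumes, labels))
    return total / len(volumes)


# Hypothetical two-voxel scans in which only voxel 2 carries the task signal.
volumes = [[0.1, 1.2], [0.0, 0.9], [0.2, -1.1], [-0.1, -0.8]]
labels = [1, 1, -1, -1]
beta, beta0 = [0.0, 1.0], 0.0

print(mean_loss(zero_one_loss, volumes, labels, beta, beta0))  # 0.0
print(mean_loss(squared_loss, volumes, labels, beta, beta0))
```

The 0-1 loss counts errors, while the squared loss also penalizes correctly classified scans whose $f(\mathbf{x})$ is far from the target label.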
Classification
A classifier maps $\mathbf{x}$, in the high-dimensional input space, to $\hat{g}$, a scalar (or very low-dimensional) decision.
Training Data and Decision Boundary
Training Data and Individual 2-class Models
With four classes (1, 2, 3, 4), one 2-class model is trained for each pair of classes: 1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 3, 2 vs. 4, and 3 vs. 4.
4-Class Model
The six individual 2-class models together form the 4-class model.
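The slide does not say how the six pairwise models are combined. One common scheme is one-vs-one majority voting, sketched below in Python. The nearest-centroid pairwise classifier and the 1-D toy data are hypothetical stand-ins for real 2-class models (for example, SVMs) trained on fMRI volumes.

```python
from collections import Counter
from itertools import combinations


def train_pairwise(classes, train_binary):
    """Train one 2-class model per pair of classes (4 classes -> 6 models)."""
    return {(a, b): train_binary(a, b) for a, b in combinations(sorted(classes), 2)}


def predict_multiclass(models, x):
    """Each pairwise model votes for one of its two classes; most votes wins.
    Ties go to the smallest class label (an arbitrary choice for this sketch)."""
    votes = Counter(model(x) for model in models.values())
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


# Hypothetical 1-D training data: class k has values near k.
train = {1: [0.9, 1.1], 2: [1.9, 2.1], 3: [2.9, 3.1], 4: [3.9, 4.1]}


def centroid_pair(a, b):
    """Toy 2-class model: assign x to whichever class mean is closer."""
    ma = sum(train[a]) / len(train[a])
    mb = sum(train[b]) / len(train[b])
    return lambda x: a if abs(x - ma) <= abs(x - mb) else b


models = train_pairwise(train, centroid_pair)
print(sorted(models))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(predict_multiclass(models, 2.2))  # 2
```

Voting keeps each pairwise model simple; the cost is that the number of models grows quadratically with the number of classes.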
Canonical Variates Analysis
$\mathbf{x}$ (high-dimensional input space) → PCA rotation based on the training data, $\mathbf{z} = \mathbf{U}^T\mathbf{x}$ (feature space vector in a reduced-dimensional feature space) → linear weights from eigenvectors of class covariances, $\hat{g} = \mathbf{L}\mathbf{z}$ (scalar, or very low-dimensional, decision)
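Data matrix: $\mathbf{X}$ (voxels × time)
PCA via SVD: $\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^T$
Truncate to the first Q components (Q sets the model complexity): $\mathbf{X}^* = \mathbf{U}^*\mathbf{S}^*\mathbf{V}^{*T}$
CVA: the columns of $\mathbf{L}$ are the eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$, where $\mathbf{W}$ is the within-class covariance and $\mathbf{B}$ the between-class covariance, both computed from the Q retained components.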
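For two classes, $\mathbf{B}$ has rank one, so CVA yields a single discriminant direction. The Python sketch below builds $\mathbf{W}$ and $\mathbf{B}$ as scatter matrices (unnormalized covariances; the scaling does not change the eigenvectors) from features that are assumed to be already reduced to Q = 2 PCA components, then finds the leading eigenvector of $\mathbf{W}^{-1}\mathbf{B}$ by power iteration. The helper names and toy data are hypothetical, and pure-Python 2×2 algebra stands in for a numerical library.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inv2(A):
    """Inverse of a 2x2 matrix (enough for Q = 2)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def within_between(groups):
    """W: pooled within-class scatter; B: between-class scatter weighted by class size."""
    grand = mean([z for g in groups for z in g])
    dim = len(grand)
    W = [[0.0] * dim for _ in range(dim)]
    B = [[0.0] * dim for _ in range(dim)]
    for g in groups:
        m = mean(g)
        for z in g:
            d = [zi - mi for zi, mi in zip(z, m)]
            W = add(W, outer(d, d))
        dm = [mi - gi for mi, gi in zip(m, grand)]
        B = add(B, [[len(g) * x for x in row] for row in outer(dm, dm)])
    return W, B


def leading_eigvec(M, iters=100):
    """Power iteration: unit-length dominant eigenvector of M."""
    v = [1.0] * len(M)
    for _ in range(iters):
        v = [sum(a * x for a, x in zip(row, v)) for row in M]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


# Hypothetical features already reduced to Q = 2 PCA components, two classes.
class0 = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [1.0, 1.5]]
class1 = [[1.0, 2.0], [2.0, 2.5], [3.0, 3.0], [2.0, 3.5]]

W, B = within_between([class0, class1])
direction = leading_eigvec(matmul(inv2(W), B))  # canonical variate (one column of L)
scores = [[sum(a * b for a, b in zip(direction, z)) for z in g] for g in (class0, class1)]
print(max(scores[0]) < min(scores[1]))  # True
```

With more than two classes, $\mathbf{W}^{-1}\mathbf{B}$ has several nonzero eigenvalues and all corresponding eigenvectors would be kept as columns of $\mathbf{L}$; plain power iteration only finds the dominant one.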
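Support Vector Machine
$\mathbf{x}$ (high-dimensional input space) → non-linear kernel mapping function $k(\mathbf{x})$ → $\mathbf{z}$ (feature space vector in a very high-dimensional feature space) → linear weights, $\hat{g} = D(\mathbf{z}) = \mathbf{w} \cdot \mathbf{z} + w_0$ (scalar decision)
Minimize
$\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{t=1}^{T}\xi_t$ subject to $y_t(\mathbf{w} \cdot \mathbf{z}_t + w_0) \ge 1 - \xi_t$ and $\xi_t \ge 0$.
The first term favors the widest possible margin. The second term, through the slack variables $\xi_t$, allows some training errors. $C = \infty$ gives the hard-margin SVM (as opposed to the soft-margin SVM) because it does not allow any training errors.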
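The trade-off controlled by C can be checked numerically. This Python sketch does not solve the SVM optimization; it only evaluates the soft-margin objective for two hand-picked hyperplanes on hypothetical 1-D data containing one outlier, using the fact that the smallest feasible slack is $\xi_t = \max(0, 1 - y_t(\mathbf{w} \cdot \mathbf{z}_t + w_0))$.

```python
def slacks(w, w0, Z, y):
    """Smallest slack values satisfying y_t (w . z_t + w0) >= 1 - xi_t, xi_t >= 0."""
    return [max(0.0, 1.0 - yt * (sum(wi * zi for wi, zi in zip(w, zt)) + w0))
            for zt, yt in zip(Z, y)]


def soft_margin_objective(w, w0, Z, y, C):
    """(1/2)||w||^2 + C * sum_t xi_t."""
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks(w, w0, Z, y))


# Hypothetical 1-D feature vectors; the +1 sample at -0.5 is an outlier.
Z = [[2.0], [3.0], [-0.5], [-2.0], [-3.0]]
y = [1, 1, 1, -1, -1]

wide = ([0.5], 0.0)    # wide margin, misclassifies the outlier
narrow = ([4.0], 3.0)  # narrow margin, no training errors

for C in (0.1, 100.0):
    j_wide = soft_margin_objective(*wide, Z, y, C)
    j_narrow = soft_margin_objective(*narrow, Z, y, C)
    print(C, "wide" if j_wide < j_narrow else "narrow")
# prints "0.1 wide" then "100.0 narrow"
```

A small C prefers the wide-margin hyperplane and tolerates the outlier; a large C approaches the hard-margin behavior and pays for a narrow margin to avoid the training error.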
– Finite sample sizes
– Noisy samples
– If model is too flexible: overfitting to noise
– If model is not flexible enough: will not adequately capture signal structure
and minimize the confidence interval
– penalization with an analytical model (e.g. Akaike’s final prediction error)
– Resampling (Cross-validation, Bootstrap)
controlled by number of model parameters
margin-based: complexity is controlled by size
generalization – high prediction of future samples
Separable case (panels: C = “small” and C = large; axes $x_1$, $x_2$). Side note, the w map: considering the role each voxel plays in the direction of the discriminant boundary, the absolute value of component 2 of $\mathbf{w}$ should be larger than that of component 1, so the “activation map” would show voxel 2.
Non-separable case (panels: C = small and C = large). The use of C was primarily motivated by this case.
Evaluating the model: Generalization
– Autocorrelation within a run
– Across runs
– Across sessions
– Across individuals
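Because scans within a run are autocorrelated, a test set drawn from the same run as the training data can overstate generalization. A minimal Python sketch of grouped resampling that holds out one run at a time; the same function works with session or subject labels. The threshold classifier and the data are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group at a time."""
    for g in sorted(set(groups)):
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        yield g, train, test


def fit_threshold(X, y):
    """Hypothetical toy classifier for 1-D features: threshold midway between class means."""
    m0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    cut = (m0 + m1) / 2
    return lambda x: 1 if (x - cut) * (m1 - m0) > 0 else 0


def grouped_cv_accuracy(X, y, groups, fit):
    """Accuracy over all scans, each predicted by a model that never saw its group."""
    correct = 0
    for _, train, test in leave_one_group_out(groups):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(y)


X = [0.1, 0.2, 1.0, 0.0, 0.9, 1.1, 0.2, 0.8, 1.2]
y = [0, 0, 1, 0, 1, 1, 0, 1, 1]
runs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(grouped_cv_accuracy(X, y, runs, fit_threshold))  # 1.0
```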
Confusion matrix for binary classification (rows: true class; columns: class decision)
                  Decision: Positive     Decision: Negative
True: Positive    True positive (tp)     False negative (fn)
True: Negative    False positive (fp)    True negative (tn)
Quantifying predictive performance
$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$
$\text{sensitivity} = \frac{tp}{tp + fn}$ (performance on class 1)
$\text{specificity} = \frac{tn}{tn + fp}$ (performance on class 0)
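The accuracy, sensitivity, and specificity formulas translate directly into code. A small Python sketch with hypothetical labels and classifier decisions for ten scans; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem; `positive` marks class 1."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def performance(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # performance on class 1
        "specificity": tn / (tn + fp),  # performance on class 0
    }


# Hypothetical labels and classifier decisions for ten scans.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))  # (3, 4, 2, 1)
print(performance(y_true, y_pred))
```

Here accuracy is 0.7, while sensitivity (0.75) and specificity (about 0.67) show how that accuracy splits between the two classes.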
Figure: test statistics for Data Set 1 and Data Set 2 (Pattern 1, Pattern 2).
NPAIRS:
– Non-parametric
– Prediction
– Activation
– Influence
– Reproducibility
– reSampling
performance metrics
– Propose Prediction vs. Reproducibility curves as an alternative to ROC analysis (LaConte, 2003)
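In NPAIRS-style split-half resampling, each half of the data trains a model that is tested on the other half, and reproducibility is commonly measured as the correlation between the two halves' spatial maps. A minimal Python sketch of one (prediction, reproducibility) point, assuming the two models and their maps already exist; the accuracy values and four-voxel maps are hypothetical.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length voxel maps."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def npairs_point(acc_a_on_b, acc_b_on_a, map_a, map_b):
    """One split-half resample -> (mean prediction accuracy, map reproducibility)."""
    return (acc_a_on_b + acc_b_on_a) / 2, pearson(map_a, map_b)


# Hypothetical results: model A tested on half B, model B tested on half A.
map_a = [0.1, 0.9, 0.2, 0.8]
map_b = [0.0, 1.0, 0.3, 0.7]
prediction, reproducibility = npairs_point(0.80, 0.76, map_a, map_b)
print(round(prediction, 2), round(reproducibility, 3))  # 0.78 0.966
```

Repeating this over many random splits, and for each competing pipeline, gives the cloud of points from which prediction vs. reproducibility curves are drawn.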
(LaConte, 2003)
Figure: mean reproducibility vs. mean prediction accuracy for preprocessing choices: Air alignment or no alignment; DC or de-trend; high, low, or no smoothing.
(LaConte, 2003)
Figure: SVM prediction accuracy vs. LDA prediction accuracy (%, 50 to 100) for subjects S1 to S16 across preprocessing conditions (hh, hl, hn, lh, ll, ln, nh, nl, nn, xx).
(LaConte, 2005)
analysis.
competing methodologies by using the NPAIRS performance metrics, prediction accuracy and reproducibility.
data set
be split into independent test/train sets
than CVA
accurate predictive models closely approximate the true model” – Cherkassky and Mulier, 2007.
– A consequence of modeling finite data
diffusely contributes to the model
– non-linear models are often difficult to interpret
both spatial and temporal properties
applied!
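Interpreting SVM Models
Support Vector Machines
vectors.
– predefined k(x)
– support vectors
– SV class labels.
$\mathbf{x}$ → $k(\mathbf{x})$ → $\mathbf{z}$ → $\hat{g}$
$D(\mathbf{z}) = (\mathbf{w} \cdot \mathbf{z}) + w_0 = \sum_{t=1}^{T} \alpha_t y_t (\mathbf{z}_t \cdot \mathbf{z}) + w_0 = \sum_{t=1}^{T} \alpha_t y_t H(\mathbf{x}_t, \mathbf{x}) + w_0$
where $H(\mathbf{x}_t, \mathbf{x}) = \mathbf{z}_t \cdot \mathbf{z}$ is the kernel and $\alpha_t$ is nonzero only for the support vectors.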
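The dual form $D = \sum_t \alpha_t y_t H(\mathbf{x}_t, \mathbf{x}) + w_0$ can be evaluated directly from the support vectors, their class labels, and the coefficients $\alpha_t$. The Python sketch below assumes the $\alpha_t$ and $w_0$ are already known (here they are hypothetical values, not the output of an SVM solver) and uses a polynomial kernel of the form $(\mathbf{a} \cdot \mathbf{b} + c)^d$, one common choice. For the linear kernel it also recovers $\mathbf{w} = \sum_t \alpha_t y_t \mathbf{x}_t$, a per-voxel weight map.

```python
def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(degree, c=1.0):
    """Polynomial kernel H(a, b) = (a . b + c)^degree."""
    return lambda a, b: (linear_kernel(a, b) + c) ** degree


def decision(x, support_vectors, alphas, labels, w0, kernel):
    """D(x) = sum_t alpha_t y_t H(x_t, x) + w0, summed over the support vectors."""
    return sum(a * y * kernel(xt, x)
               for xt, a, y in zip(support_vectors, alphas, labels)) + w0


def linear_weight_map(support_vectors, alphas, labels):
    """Linear kernel only: w = sum_t alpha_t y_t x_t, one weight per voxel."""
    n_vox = len(support_vectors[0])
    return [sum(a * y * xt[j] for xt, a, y in zip(support_vectors, alphas, labels))
            for j in range(n_vox)]


# Hypothetical two-voxel support vectors, labels, and coefficients.
svs = [[1.0, 0.0], [0.0, 1.0]]
alphas = [0.5, 0.5]
labels = [1, -1]
w0 = 0.0

w = linear_weight_map(svs, alphas, labels)
print(w)  # [0.5, -0.5]
print(decision([2.0, 1.0], svs, alphas, labels, w0, linear_kernel))  # 0.5
print(decision([2.0, 1.0], svs, alphas, labels, w0, poly_kernel(3)))  # 9.5
```

For non-linear kernels there is no explicit $\mathbf{w}$ in voxel space, which is one reason the model is described by the kernel, the support vectors, and their class labels.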
Interpreting SVM Models
most difficult to classify
hyperplane are most easily classified
result in an identical model
Figure panels: Linear Kernel, 3rd Order Polynomial Kernel, 5th Order Polynomial Kernel.
Figure: support vectors plotted against scan number (class labels −1 and 1).
Spatial interpretation
Figure panels: Linear Kernel, 3rd Order Polynomial Kernel, 5th Order Polynomial Kernel.
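Generating Sensitivity Maps
$s_j = \mathrm{E}\left[\left(\frac{\partial p(g \mid \mathbf{x})}{\partial x_j}\right)^2\right] \approx \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\partial p(g_i \mid \mathbf{x}_i)}{\partial x_j(i)}\right)^2$
Kjems, U., et al. NeuroImage 15:772-786, 2002.
Computing the sensitivity $s_j$ of voxel $j$ requires an estimate of $p(g \mid \mathbf{x})$ and its partial derivative with respect to each voxel.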
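When $p(g \mid \mathbf{x})$ is available as a function, the sample-average sensitivity map can be approximated with central finite differences. In the Python sketch below, a logistic function of a linear decision value stands in for the probability estimate; that model, the step size, and the data are assumptions for illustration. Because $p(g=-1 \mid \mathbf{x}) = 1 - p(g=+1 \mid \mathbf{x})$ for two classes, the squared derivative, and hence $s_j$, is the same whichever class probability is used.

```python
import math


def sensitivity_map(prob, volumes, h=1e-5):
    """s_j ~= (1/N) sum_i (d p(g | x_i) / d x_j)^2 using central differences."""
    n_vox = len(volumes[0])
    s = [0.0] * n_vox
    for x in volumes:
        for j in range(n_vox):
            up, down = list(x), list(x)
            up[j] += h
            down[j] -= h
            deriv = (prob(up) - prob(down)) / (2 * h)
            s[j] += deriv ** 2
    return [v / len(volumes) for v in s]


def logistic_prob(w, w0):
    """Hypothetical probability model: p(g = +1 | x) = 1 / (1 + exp(-(w . x + w0)))."""
    return lambda x: 1.0 / (1.0 + math.exp(-(sum(a * b for a, b in zip(w, x)) + w0)))


# Only voxel 2 enters the hypothetical model, so only it should be sensitive.
prob = logistic_prob([0.0, 2.0], 0.0)
print(sensitivity_map(prob, [[0.0, 0.0], [0.5, -0.5]]))
```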
$p(g(\mathbf{x}_i) \mid \mathbf{x}_i) \approx \exp\left(-[1 - y_i D(\mathbf{x}_i)]_{+}\right) = \exp(-\xi_i)$
Kwok, J., IEEE Trans Neur Net 10:1018-1031, 1999.
– Exclude transition scans from model
– Time shift labels to account for delay
– Model the delay based on HRF characteristics
– Block design
– Event-related
– Complex stimuli
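For block designs, the first two options reduce to label bookkeeping. The Python sketch below, under stated assumptions, shifts per-scan labels by a whole number of scans to approximate the hemodynamic delay, then drops the first `n_exclude` scans after each label change, in the spirit of the "exclude 1/2/3" comparisons later in this section. The function names, the use of `None` for unlabeled scans, and the toy block lengths are hypothetical.

```python
def block_labels(block_order, scans_per_block):
    """Expand a block order such as [0, 1, 0] into one label per scan."""
    return [c for c in block_order for _ in range(scans_per_block)]


def shift_labels(labels, delay_scans, fill=None):
    """Delay labels by delay_scans scans; the first scans get the `fill` label."""
    if delay_scans == 0:
        return list(labels)
    return [fill] * delay_scans + labels[:-delay_scans]


def kept_for_training(labels, n_exclude):
    """Indices of labeled scans, skipping the first n_exclude scans after each change.
    The first scan of the run also counts as the start of a new block."""
    keep = []
    since_change = 0
    for i, lab in enumerate(labels):
        since_change = 0 if i == 0 or lab != labels[i - 1] else since_change + 1
        if lab is not None and since_change >= n_exclude:
            keep.append(i)
    return keep


labels = block_labels([0, 1, 0], 4)
print(kept_for_training(labels, 2))                   # [2, 3, 6, 7, 10, 11]
print(kept_for_training(shift_labels(labels, 1), 1))  # [2, 3, 4, 6, 7, 8, 10, 11]
```

Excluded scans are only left out of training; they can still be classified at test time, which is how the effect of excluding transition images can be evaluated.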
Dimensionality and feature selection
Besides other potential benefits, feature selection can also have interpretive value.
Figure: voxel selection maps for the Left vs. Right task (A) and the Index vs. Pinky task (B). These maps were generated by averaging the voxel rank scores for each training set across the four runs.
(LaConte, 2005)
(LaConte, 2003; LaConte 2005; Mitchell, 2004)
Effect of training with task transition images
Figure: total misclassifications vs. time within the 30 s block (seconds), for training and testing data, with models trained on all images or excluding 1, 2, or 3 transition images.
Effect of training with task transition images
Figure: accuracy (%) vs. number of discarded transition images (all images, exclude 1, exclude 2, exclude 3), testing on all data and testing on the last 20 of 30 s (10 of 15 images).
Responsiveness to stimulus changes
Figure: average classifier output, and individual classifier output with behavioral data, for models trained on all images and excluding 2 transition images.
The model trained without transition images is more sluggish, but it is also more stable.
Classification of “transition” images
Matrix representation of fMRI data…
N = number of brain voxels, T = number of repeated measurements
$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NT} \end{bmatrix}$ (rows: voxels; columns: time $t$)
Experimental design (classification labels): $y_t$
This leads to a natural vector representation for block design experiments.
Figure: Task 1 and Task 2 blocks in the data matrix; basic signal characteristics.
Experimental design (class labels): $y_t$
The signal characteristics for ER-fMRI data. Figure: Task 1 and Task 2 events in the data matrix; basic signal characteristics; a “hyper-image” combines the images from the HRF time points.
For ER data, several images must be considered as belonging to the time evolution of the same class of stimuli.
Rapid ER-fMRI Data Matrix
For rapid ER designs, the data matrix has overlapping class labels, and the hyper-image representation shares images that are a mixture of more than one experimental condition.
(Mitchell, et al. 2004. Mach Learn 57, 145-75; LaConte, et al. 2005. ISMRM 1583)
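A hyper-image can be formed by concatenating the volumes that follow each event onset. The Python sketch below assumes onsets given as scan indices and a fixed window of `length` scans; the names and toy data are hypothetical. With closely spaced onsets the windows overlap, which reproduces the shared, mixed-condition images of rapid ER designs.

```python
def hyper_images(volumes, onsets, classes, length):
    """Concatenate `length` consecutive volumes starting at each onset into one
    vector labeled with that event's class; windows past the run end are skipped."""
    samples, labels = [], []
    for onset, cls in zip(onsets, classes):
        if onset + length > len(volumes):
            continue
        samples.append([v for vol in volumes[onset:onset + length] for v in vol])
        labels.append(cls)
    return samples, labels


# Hypothetical run: 8 scans of 2 voxels, volume t = [t, 10 t].
volumes = [[t, 10 * t] for t in range(8)]
X, y = hyper_images(volumes, onsets=[0, 3, 6], classes=["A", "B", "A"], length=3)
print(X)  # [[0, 0, 1, 10, 2, 20], [3, 30, 4, 40, 5, 50]]
print(y)  # ['A', 'B']
```

With onsets 0 and 2 and a window of 3 scans, volume 2 would appear in both hyper-images, one labeled with each event's class.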
Signal model
$y[t] = \sum_{s=1}^{S} (x_s * h_s)[t] + n[t]$
This assumes linear time invariance, where $y[t]$ is the BOLD signal over time, $x_s[t]$ the neuronal responses, $h_s[t]$ the hemodynamic responses, $S$ the number of unique stimulus classes, and $n[t]$ a noise term.
Matrix form of signal model: $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{n}$
Figure: neuronal responses $x_1$ and $x_2$, hemodynamic response $h_2$, and the resulting signal $y$.
HRFs are known to vary with repetitions of identical stimuli (Lu, et al. IEEE TMI 24, 236-45).
Mixed model form: $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{V}\mathbf{g} + \mathbf{n}$, where $\mathbf{V}$, $N \times (TL)$, contains the information of $\mathbf{X}$, and $\mathbf{g}$, $(TL) \times 1$, is the vector of random effects.
For classification, we require multiple observations of each of the S stimuli, not an estimate of $h_s[t]$. The mixed model equation estimates the HRFs for each trial. The idea is to estimate each h by accounting for the additive effects of other event responses.
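The convolution model $y[t] = \sum_s (x_s * h_s)[t] + n[t]$ can be simulated directly. This Python sketch (hypothetical HRF shapes, onsets, and noise level) builds $y[t]$ from two stimulus classes and then applies time-locked averaging. Because a class-2 event overlaps the first class-1 response, the averaged estimate of $h_1$ is biased, which is the limitation that accounting for the additive effects of other event responses is meant to address. The mixed-model estimation itself is not shown.

```python
import random


def convolve(x, h):
    """Causal discrete convolution truncated to len(x): (x*h)[t] = sum_k h[k] x[t-k]."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
            for t in range(len(x))]


def bold_signal(stimuli, hrfs, noise_sd=0.0, seed=0):
    """y[t] = sum_s (x_s * h_s)[t] + n[t], assuming linear time invariance."""
    y = [0.0] * len(stimuli[0])
    for x_s, h_s in zip(stimuli, hrfs):
        for t, value in enumerate(convolve(x_s, h_s)):
            y[t] += value
    if noise_sd > 0:
        rng = random.Random(seed)
        y = [v + rng.gauss(0.0, noise_sd) for v in y]
    return y


def time_locked_average(y, onsets, length):
    """Average the `length` samples that follow each onset."""
    windows = [y[o:o + length] for o in onsets if o + length <= len(y)]
    return [sum(w[k] for w in windows) / len(windows) for k in range(length)]


T = 16
h1 = [0.0, 1.0, 0.5, 0.0]  # hypothetical HRF for class 1
h2 = [0.0, 0.5, 1.0, 0.5]  # hypothetical HRF for class 2
x1 = [1.0 if t in (0, 10) else 0.0 for t in range(T)]  # class-1 events
x2 = [1.0 if t == 1 else 0.0 for t in range(T)]        # class-2 event overlapping the first
y = bold_signal([x1, x2], [h1, h2])
print(time_locked_average(y, [0, 10], len(h1)))  # [0.0, 1.0, 0.75, 0.5], not h1
```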
Direct hyper-image vectors vs. mixed model vectors
– relies on the same principle as time-locked averaging
– has limited power to accurately estimate the hemodynamic response function (HRF)
– hemodynamic responses are known to vary with repetitions of identical stimuli.
variation in ER-data:
– Between-HRF variation from a voxel’s relative sensitivity to different stimulus types
– Within-HRF variation to explain the heterogeneity of a voxel’s response to several repetitions of the same stimulus.
(Lu, et al. IEEE TMI 24, 236-45)
Simulation 1: estimating two HRFs from a time series
realizations for 40 events each): ($n \sim N(0, 0.25)$)
Figure: estimates of $h_1$ and $h_2$ (samples 2 to 20) over 40 realizations, by time-locked averaging and by the mixed model; average and standard deviation from individual estimates.
Subjects can learn to control activation in a number of different brain areas
– Posse 2001, Yoo 2002, deCharms 2004, Yoo 2004
– Weiskopf 2004
– Posse 2003
– Caria 2007
– Weiskopf 2003, Yoo 2004, Birbaumer 2007, deCharms 2005
– Updating statistics at each pixel
– Time window considerations
– Interpretation of brain activation
– designation of that region
– filtering and spatial averaging
Conventional FMRI (training run): stimulus presentation and data acquisition produce image data; the time-labeled scans, together with the class training labels, go to image recon and SVM classification.
Real-Time Classification (testing run): test data go to image recon and SVM classification, which returns the classifier output.
(LaConte, 2007)
Results
Experimental Timing and Classifier Output (left finger = -1, right finger = +1)
Figure: image classification over time (minutes 1 to 8) for four subjects: Subject 1 (78 % accuracy), Subject 2 (78 % accuracy), Subject 3 (79 % accuracy), Subject 4 (77 % accuracy).
Brain state classification: a variety of cognitive domains. With the exact same experimental setup (different instructions), subjects can learn to move the arrow
Figure: average learning curve (± standard deviation) and Subject 1; classification accuracy (0.4 to 0.9) vs. time (minutes 1 to 8).
classification-based real-time fMRI
unattainable through traditional stimulus-response experiments
meditation, learning studies, sports therapy or other virtual-reality-based training, and lie detection
Applications
to internal states
including multimedia/complex stimuli
resources
Reading
Trends in Cognitive Sciences. 10:424-430.
The NPAIRS data analysis framework. NeuroImage. 15:747-771.
Wiley & Sons, Inc., New York.
inference and prediction. Springer-Verlag, New York.
Software:
3dsvm Plugin Snapshot: Support Vector Machine Analysis (training and testing)
Command Line
Training - 3dsvm -trainvol run1+orig \
Testing - 3dsvm -testvol run2+orig \
Example: Left vs. Right visual stimulus
3dsvm features
major software packages
functional maps
Xiaoping Hu Scott Peltier Jihong Chen Will Curtis Jeffrey Prescott Yang Zhi Zhihao Li Stephen Strother Vladimir Cherkassky
Work partially supported by NINDS R21NS050183, the Robert and Janice McNair Foundation, and Baylor College of Medicine.
Andrew Fischer Will Curtis Dorina Papageorgiou Prashant Prasad