Bayesian Models for Combining Data Across Subjects and Studies in - PowerPoint PPT Presentation

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fMRI Data Analysis Thesis Proposal Indrayana Rustandi April 3, 2007

Outline
• Motivation and Thesis
• Preliminary results: Hierarchical Gaussian Naive Bayes
• Proposed work, including schedule

fMRI
• 3D images of hemodynamic activations in the brain
  • assumed to be correlated with local neural activations
• ~10,000 spatial features (voxels, analogous to pixels)
• Temporal component
• ~10-100 trials

fMRI Data Analysis
• Descriptive
  • Locations of activations correlated with a cognitive phenomenon
  • Most common paradigm used
• Predictive
  • Prediction of the cognitive phenomenon underlying brain activations
  • Classification of cognitive tasks, prediction of levels of stimulus presence (EBC competition)

Motivation: Subject-Level
• For predictive analysis, the analysis is done separately for each subject
  • Problem: lack of training examples; performance can potentially be improved by incorporating data from other subjects
• Simple solution: pool the data for all the subjects together
  • Problem: for some subjects, it might not be reasonable to pool data (e.g. subjects with different conditions)
  • Problem: inter-subject variability is ignored

Inter-Subject Variability
• Human brains have similar functional structures, but they differ in shape and volume (different feature spaces for different human subjects)
• Normalization to a common space is possible, but it can distort the data
• Even after normalization, the activations are also governed by personal experience and affected by environment
Thirion et al. (2006)

Motivation: Study-Level
• fMRI studies are expensive; it is desirable to incorporate data from existing similar studies
• Problem: the subject-level problems carry over
• Problem: variability due to different experimental conditions (e.g. the use of different stimuli, different magnetic field strengths)
• Problem: determining which studies are similar

Motivation: Generalization
• How much commonality exists across different individuals with respect to a particular cognitive task?
• This influences how much can be shared across different individuals (or groups)
• Example: sharing for classification of picture vs sentence might be easy, but sharing for classification of orientation of visual stimuli using V1/V2 voxels (Kamitani and Tong, Nature Neuroscience, 2005) might be hard

Thesis
Machine learning and statistical techniques to
• Combine data from multiple subjects and studies
• Improve predictive performance (compared to separate analyses for individual subjects and studies)
• Distinguish common patterns of activations versus subject-specific or study-specific patterns of activations
Framework of choice is Bayesian statistics, in particular hierarchical Bayesian modeling
• Offers a principled way to account for uncertainties and the different levels of data generation involved

Related Work in fMRI
• Classification
  • Pooled data from multiple subjects (Wang et al. (2004), Davatzikos et al. (2005), Mourao-Miranda et al. (2006))
• Group analysis: multiple subjects in a specific study
  • Focus: descriptive, increase in sensitivity for detection of activations
  • Mixed-effects model (Woods (1996), Holmes and Friston (1998), Beckmann et al. (2003))
  • Hierarchical Bayes model (Friston et al. (2002))

Related Work in ML/Statistics
• Multitask learning/inductive transfer
  • Caruana (1997)
  • Generative setting: Rosenstein et al. (2005), Roy and Kaelbling (2007)

Preliminary Work
• Combining data from multiple subjects in a given study
• Extension of the Gaussian Naive Bayes classifier
• The use of hierarchical Bayes modeling
• Designed for data after feature space normalization
  • This simplifies the problem, even though it is not ideal

Gaussian Naive Bayes (GNB)
• Bayesian classifier: pick the class with maximum class posterior probability (proportional to the product of the class prior and the class-conditional probability of the data)
  $c^{*} = \arg\max_{c_k} P(C = c_k \mid y) = \arg\max_{c_k} P(C = c_k)\, p(y \mid C = c_k)$
• Naive Bayes: independence of features conditional on the class
  $p(y \mid C) = \prod_{j=1}^{J} p(y_j \mid C)$
• Gaussian Naive Bayes: for each feature $j$, the class-conditional distribution is Gaussian
  $y_j \mid C = c_k \sim N(\theta_j^{(k)}, (\sigma_j^{(k)})^2)$
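The decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the thesis: the function names `gnb_log_posterior` and `gnb_predict`, the array layout, and the toy parameters are all assumptions made for the example. Working in log space turns the product over features into a sum and avoids underflow when there are many features.

```python
import numpy as np

def gnb_log_posterior(y, log_prior, theta, sigma2):
    """Unnormalized log posterior of each class for one example.

    y:         (J,)   feature vector
    log_prior: (K,)   log P(C = c_k)
    theta:     (K, J) class-conditional means theta_j^(k)
    sigma2:    (K, J) class-conditional variances (sigma_j^(k))^2
    """
    # Log of the Gaussian density for every class/feature pair.
    log_lik = -0.5 * (np.log(2.0 * np.pi * sigma2) + (y - theta) ** 2 / sigma2)
    # Naive Bayes: features are independent given the class, so logs add up.
    return log_prior + log_lik.sum(axis=1)

def gnb_predict(y, log_prior, theta, sigma2):
    """Index k of the class with maximum posterior probability."""
    return int(np.argmax(gnb_log_posterior(y, log_prior, theta, sigma2)))

# Toy parameters (hypothetical): two classes, two features, equal priors.
theta = np.array([[0.0, 0.0], [2.0, 2.0]])
sigma2 = np.ones((2, 2))
log_prior = np.log(np.array([0.5, 0.5]))

pred_a = gnb_predict(np.array([0.1, -0.2]), log_prior, theta, sigma2)
pred_b = gnb_predict(np.array([1.9, 2.3]), log_prior, theta, sigma2)
```

With equal priors and equal variances the rule reduces to picking the class whose mean vector is closest to the example.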

GNB, Learning
Use maximum likelihood (sample mean and sample variance):
  $\hat{\theta}_{sj}^{(k)} = \frac{1}{n_s} \sum_{i=1}^{n_s} y_{sji}^{(k)}$
  $(\hat{\sigma}_{sj}^{(k)})^2 = \frac{1}{n_s - 1} \sum_{i=1}^{n_s} (y_{sji}^{(k)} - \hat{\theta}_{sj}^{(k)})^2$
where s: subject, j: feature, i: instance, k: class
For pooled data, aggregate the data over all the subjects (estimates will be the same for all subjects)
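As a rough sketch of the two learning modes, the per-subject and pooled estimates for a single class could be computed as follows. The helper names `fit_gnb_subject` and `fit_gnb_pooled` and the toy arrays are assumptions for illustration only; `ddof=1` matches the $n_s - 1$ denominator in the variance formula.

```python
import numpy as np

def fit_gnb_subject(y):
    """Per-feature mean and variance for one subject and one class.

    y: (n_s, J) array, one row per instance i, one column per feature j.
    """
    return y.mean(axis=0), y.var(axis=0, ddof=1)

def fit_gnb_pooled(ys):
    """Same estimates after stacking the instances of all subjects."""
    return fit_gnb_subject(np.concatenate(ys, axis=0))

# Toy data (hypothetical): two subjects, two features, one class.
subj_a = np.array([[1.0, 2.0], [3.0, 4.0]])
subj_b = np.array([[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])

theta_a, var_a = fit_gnb_subject(subj_a)
theta_pooled, var_pooled = fit_gnb_pooled([subj_a, subj_b])
```

In the pooled case every subject is assigned the same `theta_pooled` and `var_pooled`, which is what ignoring inter-subject variability means in practice.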

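Hierarchical Normal Model
For each class and each feature:
[Figure: graphical model in which the hyperparameters $\mu, \tau$ generate the subject means $\theta_1, \theta_2, \ldots, \theta_s, \ldots, \theta_S$, and each $\theta_s$ generates the observations $y_{s1}, y_{s2}, \ldots, y_{s n_s}$]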

Hierarchical Normal Model
• The tool to extend the Gaussian Naive Bayes classifier to handle multiple subjects
• Gelman et al. (2005), also used in Friston et al. (2002) for group analysis (aim: hypothesis testing)
• Models Gaussian data for different but related groups; the group means share a common Gaussian distribution
• Generative model (s: group (subject), i: instance):
  $y_{si} \sim N(\theta_s, \sigma^2)$
  $\theta_s \sim N(\mu, \tau^2)$
[Figure: plate diagram with nodes $\mu$, $\tau$, $\theta$, $\sigma$, and $y$]
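To make the two-level generative model concrete, the following sketch draws synthetic data from it for one class and one feature. The function name `sample_hierarchical_normal` and the parameter values are hypothetical; note that NumPy's `normal` takes standard deviations, so `tau` and `sigma` here are the square roots of $\tau^2$ and $\sigma^2$.

```python
import numpy as np

def sample_hierarchical_normal(mu, tau, sigma, n_per_subject, rng):
    """Draw subject means and observations from the hierarchical normal model.

    theta_s ~ N(mu, tau^2)        one mean per subject
    y_si    ~ N(theta_s, sigma^2) n_s instances for subject s
    """
    # Top level: one mean per subject, drawn around the shared mean mu.
    theta = rng.normal(mu, tau, size=len(n_per_subject))
    # Bottom level: instances of each subject scatter around its own mean.
    y = [rng.normal(theta_s, sigma, size=n_s)
         for theta_s, n_s in zip(theta, n_per_subject)]
    return theta, y

rng = np.random.default_rng(0)
theta, y = sample_hierarchical_normal(
    mu=0.0, tau=1.0, sigma=0.5, n_per_subject=[4, 6, 5], rng=rng
)
```

A small `tau` makes the subjects nearly identical (close to pooling), while a large `tau` makes them nearly unrelated (close to separate analyses).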

Hierarchical GNB (HGNB)
• Use the hierarchical normal model as a class-conditional generative model for each feature, as a way to integrate data from multiple subjects
• Assume data has been normalized to a common space
• Same variance for all subjects
  • Estimate variance separately, taking the median of sample variances for all the subjects

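MAP, Empirical Bayes
• Empirical Bayes: point estimates (MP) of the hyperparameters $\mu, \tau$ that (approximately) maximize the marginal likelihood (the probability of the data given the hyperparameters)
  $\mu_{MP} = \frac{1}{S} \sum_{s=1}^{S} \bar{y}_{s\cdot}$
  $\tau^2_{MP} = \frac{1}{S-1} \sum_{s=1}^{S} (\bar{y}_{s\cdot} - \mu_{MP})^2$
• MAP: maximum of the posterior of $\theta_s$ conditional on the data and the hyperparameters
  $\hat{\theta}_s = \frac{\frac{n_s}{\sigma^2} \bar{y}_{s\cdot} + \frac{1}{\tau^2_{MP}} \mu_{MP}}{\frac{n_s}{\sigma^2} + \frac{1}{\tau^2_{MP}}}$
where s: subject, MP: point estimate, and $\bar{y}_{s\cdot}$ is the sample mean of subject s
[Figure: the hierarchical normal model diagram, with $\mu, \tau$ above $\theta_1, \ldots, \theta_S$ above $y_{s1}, \ldots, y_{s n_s}$]
• When the number of examples is small, HGNB behaves like GNB on pooled data
• When the number of examples is large, HGNB behaves like GNB on the individual subject's data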
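The estimates on this slide can be sketched for a single class and feature as follows. This is an illustrative reading of the formulas, not the author's code: the function name `hgnb_estimates` and the toy data are assumptions, and the shared variance is taken as the median of the per-subject sample variances as described on the HGNB slide.

```python
import numpy as np

def hgnb_estimates(ys, sigma2=None):
    """Empirical Bayes hyperparameters and MAP subject means.

    ys: list of 1-D arrays, one per subject (a single class and feature).
    Assumes at least two subjects whose sample means are not all identical,
    otherwise tau2_mp is zero and the prior precision is undefined.
    """
    n = np.array([len(y) for y in ys], dtype=float)
    ybar = np.array([np.mean(y) for y in ys])
    if sigma2 is None:
        # Shared variance: median of the per-subject sample variances.
        sigma2 = float(np.median([np.var(y, ddof=1) for y in ys]))
    # Empirical Bayes point estimates of the hyperparameters.
    mu_mp = ybar.mean()
    tau2_mp = np.sum((ybar - mu_mp) ** 2) / (len(ys) - 1)
    # MAP estimate: precision-weighted average of subject mean and group mean.
    w_data = n / sigma2
    w_prior = 1.0 / tau2_mp
    theta_map = (w_data * ybar + w_prior * mu_mp) / (w_data + w_prior)
    return mu_mp, tau2_mp, sigma2, theta_map

# Toy data (hypothetical): two subjects with two instances each.
ys = [np.array([0.0, 2.0]), np.array([4.0, 6.0])]
mu_mp, tau2_mp, sigma2, theta_map = hgnb_estimates(ys)
```

Each MAP mean is pulled from the subject's own sample mean toward `mu_mp`. The weight on the subject's data, `w_data`, grows with the number of instances, which gives the two limiting behaviors stated on the slide.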

It is not true that the plus is above the star.

Datasets: Starplus
• Classification of the type of the first stimulus (picture or sentence) given a window of fMRI data
• Spatial normalization: use average of voxels in each region of interest (ROI)
• Feature selection: use ROI for visual cortex
• 16 features (each time point is a feature)
• 20 trials per class per subject
• 13 subjects

hammer

palace

Datasets: Twocategories
• Classification of the category of word (tools or dwellings) given a window of fMRI data
• Spatial normalization: use transformation to a common brain template (MNI template)
• Feature selection: 300 voxels ranked using Fisher's LDA
• 300 features (averaged over time)
• 42 trials per class per subject
• 6 subjects

Experiment
• Iterate over the subjects, designating the current one as the test subject
• 2-fold cross-validation, varying the number of training examples used from the test subject for each class; folds randomly chosen (repeated several times)
• GNB indiv: GNB learned using data from the test subject only
• GNB pooled: GNB learned using data from the test subject and the other subjects (assuming no inter-subject variability)
• HGNB: learned using data from the test subject and the other subjects
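One way to read the protocol above is as a generator of train/test index splits for the test subject. The sketch below is an interpretation under stated assumptions: the name `experiment_splits`, the choice to apply the same trial indices to every class, and the specific training sizes are hypothetical, and the fitting of GNB indiv, GNB pooled, and HGNB inside the loop is left out.

```python
import numpy as np

def experiment_splits(n_subjects, n_trials, train_sizes, n_repeats, rng):
    """Yield (test_subject, n_train, train_idx, test_idx) tuples.

    n_trials is the number of trials per class for the test subject; the
    indices are meant to be applied to each class (an assumption).
    """
    half = n_trials // 2
    for test_subject in range(n_subjects):
        for _ in range(n_repeats):
            # Randomly split the test subject's trials into two folds.
            perm = rng.permutation(n_trials)
            folds = (perm[:half], perm[half:])
            for k in range(2):
                train_fold, test_fold = folds[k], folds[1 - k]
                for n_train in train_sizes:
                    if n_train <= len(train_fold):
                        # Use only the first n_train trials of the training fold.
                        yield test_subject, n_train, train_fold[:n_train], test_fold

rng = np.random.default_rng(0)
# Sizes taken from the Starplus slide: 13 subjects, 20 trials per class.
splits = list(experiment_splits(
    n_subjects=13, n_trials=20, train_sizes=[0, 5, 10], n_repeats=1, rng=rng
))
```

A training size of 0 corresponds to using no data from the test subject, which is only meaningful for the methods that also draw on the other subjects.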

Classification Accuracies, Starplus
[Figure: classification accuracy versus number of training examples per class, for GNB indiv, GNB pooled, and HGNB]

Classification Accuracies, Twocategories
[Figure: classification accuracy versus number of training examples per class, for GNB indiv, GNB pooled, and HGNB]

HGNB Recap
• Classifier to combine data across multiple subjects in a study
• Improvement in predictive performance over separate analyses and pooling data
• Assumes that each cognitive task to predict generates similar brain activations in all the subjects
• Shows that hierarchical Bayes modeling can model inter-subject variability

Proposed Work
• Goals that have not been addressed by HGNB:
  1. sharing across studies, or both subjects and studies
  2. determining groups to share
  3. determining cross-subject/study commonality of particular cognitive tasks (related to generalizability)
  4. dealing with the distortion caused by normalization
• Work proposed to address the above goals:
  • Variations on HGNB
  • Latent structure in data
  • Accounting for normalization

Variations on HGNB
• Goals (1st and 2nd)
  • sharing across studies, or both subjects and studies
  • determining groups to share
• Variation/extension of the HGNB classifier
