SLIDE 1

Automated High-dimensional Cytometric Data Analysis

Philip L. De Jager, M.D. Ph.D.

Director, Program in Translational NeuroPsychiatric Genomics, Brigham & Women’s Hospital
Assistant Professor of Neurology, Harvard Medical School

SLIDE 2

Challenges in cytometric analysis

Large amount of high-dimensional data
Manual data processing (subjective, slow)
Not suitable for high-throughput study

Difficult to use in inferential analysis

– “hypothesis limited”

Sub-optimal usage of data dimensions
Increasingly multi-parametric
Restricted visualization

Solution: Automated & Multivariate Analysis

SLIDE 3

FLAME: FLow cytometry analysis with Automated Multivariate Estimation

Clustering – parametric and multivariate mixture modeling of the populations in each flow sample

Meta-clustering – match the corresponding populations from multiple samples to compare features of these matched populations

Feature selection – identify features that distinguish populations between different classes (such as normal vs. disease, wt vs. mutant, longitudinal observations, etc.)

Classification – predict class membership for new samples based on those distinctive features

SLIDE 4

FLAME summary

1. Clustering (flow data: Sample 1, Sample 2, Sample 3)
2. Meta-clustering
3. Feature Selection (class1, class2, class3)

Features:
Frequencies
Locations
Means
Modes
Variances
Scales
Orientations
Shapes

Downstream Analyses:
Visualization
Class Discovery
Class Prediction
Etc.

SLIDE 5

FLAME Methodology

SLIDE 6

Concept: Finite Mixture Model

Finite Mixture Model: weighted sum of g univariate or multivariate densities

[Figure: univariate Gaussian mixture (g=3; the fitted curve is the sum of 3 Gaussians with means µ1, µ2, µ3) and bivariate Gaussian mixture (g=2; w1=0.5, w2=0.5)]
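
As a minimal sketch of the definition above, the following evaluates a univariate Gaussian mixture as the weighted sum w1*f1(x) + ... + wg*fg(x). The function names and the parameter values are illustrative assumptions, not taken from the slides or from FLAME.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of g Gaussian densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Hypothetical g=3 mixture (values chosen for illustration only).
weights = [0.2, 0.5, 0.3]
mus = [-3.0, 0.0, 4.0]
sigmas = [1.0, 0.8, 1.5]

# Riemann-sum check over [-15, 15) that the mixture is itself a density.
step = 0.01
area = sum(mixture_pdf(-15.0 + i * step, weights, mus, sigmas) * step
           for i in range(3000))
```

Because the weights sum to 1 and each component integrates to 1, the mixture also integrates to 1, which is what the numerical check approximates.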

SLIDE 7

Different distributions

[Figure: density panels; recoverable labels: N, Skew N]

SLIDE 8

Model Selection options in FLAME

[Figure: density panels; recoverable labels: N, Skew N]

SLIDE 9

Step 1: Fitting a distribution

Lymphoblastic cell line

SLIDE 10

Fitting skew t deals with asymmetry

[Figure: density plots of asymmetric data fitted with a Gaussian distribution and with a skew distribution]

SLIDE 11

Step 2: Meta-clustering

1. Input: Individual samples clustered by mixture model
2. Take all samples and pool their cluster locations
3. Algorithm: Run Partitioning Around Medoids (PAM) to obtain k meta-clusters
4. Output: Matched features used for classification of samples
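
The steps above can be sketched as a small k-medoids routine in the spirit of PAM (greedy build, then swap improvement) applied to pooled cluster locations. This is an illustrative sketch, not FLAME's implementation: the function names, the Euclidean distance, and the toy coordinates are all assumptions.

```python
import math
from itertools import product

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Greedy build, then swap improvement; returns (medoid indices, labels)."""
    n = len(points)
    medoids = []
    # BUILD: repeatedly add the point that lowers the total cost the most.
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid while that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(points, medoids)
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(points, trial)
            if cost < current - 1e-12:
                medoids, current, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Hypothetical pooled locations: 3 samples, each with 2 fitted populations.
pooled = [(1.0, 1.1), (5.0, 5.2),   # sample 1
          (0.9, 1.0), (5.1, 4.9),   # sample 2
          (1.1, 0.9), (4.9, 5.0)]   # sample 3
medoids, labels = pam(pooled, 2)
```

Populations that receive the same label across samples are treated as the same meta-cluster, which is what makes their features comparable between samples.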

SLIDE 12

Example 2: Identifying discriminating features

Experiment: examine ZAP70 and SLP76 phosphorylation events before and after T cell receptor activation in naïve and memory T cells

Lymphocytes stained with four markers:

CD4
CD45RA
ZAP70Y292
SLP76Y128
60 samples: 30 subjects x two time points: pre- and post- anti-CD3 antibody stimulation

SLIDE 13

Registering populations across samples

[Figure: pre‐stimulation samples and post‐stimulation samples]

SLIDE 14

[Figure: panels a. to e., CD45RA axis; Sample 121106A_0min (pre-stimulation) and Sample 121106A_5min (post-stimulation)]

SLIDE 15

Step 3: Discriminating features

[Figure: zero-minute (pre-stimulation) vs. five-minute (post-stimulation) samples]

SLIDE 16

Discriminating features

[Figure: plot on the CD45RA axis with regions I to V]

feature name      Feature Type   Cluster #   Dimension(s)   ∆mean [five-min]   p-value
vars11.4          Variance       4           1              0.156              1.65E-18
orientation 72    Orientation    5           3              0.649              1.01E-14
orientation 56    Orientation    4           3              0.609              1.13E-12
vars11.5          Variance       5           1              0.082              4.00E-08
orientation 66    Orientation    5           1              0.515              1.37E-05
shape 11          Shape          3           3              0.175              2.62E-08
scale4            Scale          4           NA             0.052              3.32E-06
orientation 19    Orientation    2           1              0.632              1.34E-06
shape 8           Shape          2           2              0.141              4.41E-09
shape 15          Shape          4           4              0.178              5.17E-07
vars41.5          Variance       5           1,4            0.024              2.63E-05
orientation 42    Orientation    3           3              0.422              9.73E-04
shape 20          Shape          5           4              0.060              7.93E-05
scale5            Scale          5           NA             0.038              7.23E-04
vars43.3          Variance       3           3,4            0.020              7.10E-04
vars31.4          Variance       4           1,3            0.015              3.34E-03
vars11.3          Variance       3           1              0.314              6.22E-12
orientation 52    Orientation    4           1              0.552              1.87E-10
vars21.2          Variance       2           1,2            0.251              1.22E-10
vars21.3          Variance       3           1,2            0.259              1.14E-11
vars21.4          Variance       4           1,2            0.060              3.42E-08
orientation 20    Orientation    2           1              0.504              2.17E-11
shape 10          Shape          3           3              0.740              1.31E-09
shape 7           Shape          2           2              0.682              4.49E-16
shape 13          Shape          4           4              1.023              4.37E-09
mus1.4            Mean           4           1              1.761              6.13E-22
orientation 54    Orientation    4           2              0.534              1.26E-08
mus1.5            Mean           5           1              1.657              2.47E-21
vars22.2          Variance       2           2              0.282              5.45E-05
orientation 59    Orientation    4           3              0.548              1.51E-04
vars22.3          Variance       3           2              0.146              1.09E-05
orientation 47    Orientation    3           4              0.561              4.65E-05
orientation 43    Orientation    3           3              0.066              8.01E-05
orientation 70    Orientation    5           2              0.308              4.07E-03
scale3            Scale          3           NA             0.063              2.19E-04
mus1.2            Mean           2           1              1.571              1.52E-18
vars11.2          Variance       2           1              0.131              1.62E-04
vars22.5          Variance       5           2              0.023              2.65E-04
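
The table reports a change in mean and a p-value for each matched feature, but the slides do not say which statistical test produced them. As a hedged illustration only, the sketch below ranks features by a paired t statistic, assuming each subject contributes one pre- and one post-stimulation value per feature. The feature names and numbers are hypothetical and are not data from the study.

```python
import math

def paired_t(pre, post):
    """Return (t statistic, mean difference) for post - pre, paired by subject."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(var / n), mean

def rank_features(pre_table, post_table):
    """Return (name, delta_mean, t) tuples sorted by decreasing |t|."""
    rows = []
    for name in pre_table:
        t, delta = paired_t(pre_table[name], post_table[name])
        rows.append((name, delta, t))
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)

# Hypothetical feature values for four subjects (illustration only).
pre = {"feature_a": [1.0, 1.2, 0.9, 1.1],
       "feature_b": [0.5, 0.7, 0.6, 0.4]}
post = {"feature_a": [2.0, 2.3, 1.8, 2.2],
        "feature_b": [0.6, 0.5, 0.7, 0.4]}
ranked = rank_features(pre, post)
```

Converting the statistic into a p-value additionally requires the t distribution with n - 1 degrees of freedom, which is left out here to keep the sketch dependency-free.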

SLIDE 17

Example 3: Identifying a rare cell population

Regulatory T cells occur as a population of less than 0.5-1.0% in human peripheral blood mononuclear cells

[Figure: Foxp3-PE axis]

Baecher-Allan et al., JI, 2006

SLIDE 18

Stepwise detection of Tregs

[Figure: Step 1 and Step 2 panels]

SLIDE 19

Overview

Operator/QC
FLAME

SLIDE 20

FLAME

Automated analysis method
Deconstructs the components of a mixture of cells
Cross-registers cell clusters across samples
Provides a specific record of analysis parameters, allowing exact replication of an analysis by a third party

Operator Modes:

– Cell population discovery mode
– Clinical trial mode

SLIDE 21

Availability

Free software
Available through the GenePattern toolkit on the Broad Institute website

– http://www.broadinstitute.org/cancer/software/genepattern/index.html

GenePattern – an environment with pipelining capabilities and a repertoire of downstream analysis tools

Pyne et al. Proc Natl Acad Sci USA 2009; 106: 8519-8524.

SLIDE 22

Acknowledgements

De Jager lab

– Cristin Aubin
– Aaron Brandes
– Becky Briskin
– Lori Chibnik
– Portia Chipendo
– Xinli Hu
– Linda Ottoboni
– Nikolaos Patsopoulos
– Joshua Shulman
– Dong Tran
– Irene Wood
– Zongqi Xia

Jill Mesirov

– Saumyadipta Pyne
– Pablo Tamayo

Geoff McLachlan
Kui Wang
David Hafler

– Clare Baecher-Allan
– Lisa Maier

Funding Sources

National MS Society
NIH: NIA, NINDS

SLIDE 23

Illustrative Examples

SLIDE 24

Example 3: Feature selection – Phosphorylation of naïve & memory T cells pre- and post-stimulation

4-dimensional samples (ZAP70Y292 not shown)

Mixture modeling

[Figure: CD45RA vs. CD4]

SLIDE 25

Phosphorylation causes feature alterations in populations

[Figure: 0 min. (pre-stimulation) vs. 5 min. (post-stimulation)]

SLIDE 26

Matching pre- and post-stimulation populations

[Figure: pre-stimulation and post-stimulation panels]

SLIDE 27

Matching pre- and post-stimulation populations across all samples

SLIDE 28

Feature Selection Heatmap

[Figure: heatmap columns: Zero-minutes, Five-minutes]

SLIDE 29

Data QC/standardization

Carefully selected panels
Minimal cross‐sample variation

SLIDE 30

FLow analysis with Automated Multivariate Estimation

10/01/2008

SLIDE 31

Low dimension t-mixture

Outliers?

Low dimension clustering is not good enough

SLIDE 32

Multivariate t-mixture is better

Symmetric density is often not good enough

SLIDE 33

Modeling with skewed distributions

Better fit with skew

Skew-normal distribution (Skew N)

Photo courtesy: Azzalini J.M. et al. Statistical applications of the multivariate skew-normal distribution, 1999.
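
For reference, the univariate skew-normal density has the standard form (2/scale) * phi(z) * Phi(shape * z), with z = (x - location) / scale, where phi and Phi are the standard normal density and cumulative distribution function. The sketch below covers only this univariate case, not the multivariate distributions that the slides discuss, and the function and parameter names are illustrative assumptions.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x, location=0.0, scale=1.0, shape=0.0):
    """Univariate skew-normal density: (2/scale) * phi(z) * Phi(shape * z)."""
    z = (x - location) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(shape * z)

# shape = 0 recovers the symmetric Gaussian; shape > 0 skews to the right.
symmetric = skew_normal_pdf(0.3)
right_tail = skew_normal_pdf(1.0, shape=3.0)
left_tail = skew_normal_pdf(-1.0, shape=3.0)
```

The single extra shape parameter is what lets the fitted curve follow an asymmetric population that a Gaussian cannot capture.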

SLIDE 34

Parametric mixture modeling

A biological population is assumed to follow a mathematical distribution, such as Gaussian

Each population can be abstracted as a cluster, described by parameters such as mean, mode, standard deviation, and skew, etc.

A mixture of populations can be abstracted as a mixture of distributions
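
To make the abstraction concrete, the sketch below fits a two-component univariate Gaussian mixture with the expectation-maximization (EM) algorithm, a standard way of estimating mixture parameters. The slides do not state which fitting algorithm FLAME uses, and FLAME's models are multivariate and may be skewed, so this is a simplified illustration. The function names and the simulated data are assumptions.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(data, iterations=50):
    """EM for a univariate two-component Gaussian mixture."""
    mus = [min(data), max(data)]            # crude initial locations
    mean = sum(data) / len(data)
    spread = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-12))
    return weights, mus, sigmas

# Simulated sample: two well-separated populations of 500 events each.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(500)]
        + [rng.gauss(10.0, 1.0) for _ in range(500)])
weights, mus, sigmas = fit_two_gaussians(data)
```

After fitting, each population is summarized by its weight (frequency), mean (location) and standard deviation, so the raw events are no longer needed for downstream comparison.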

SLIDE 35

Modeling with Gaussian

Gaussian may be too “skinny” to capture the entire population

SLIDE 36

Modeling with skew‐t

Tolerates and describes the skew of the population

SLIDE 37

Exploratory discovery of CD4+CD25highFoxp3+ regulatory T cell population

CD4 not shown

SLIDE 38

Input: 4-variate mixture component parameters

Points labeled with five colors that represent five populations used to cluster a flow sample

Dimensions: CD56, CD8, CD4, CD3

SLIDE 39

ABSTRACTING FLOW CYTOMETRY DATA

Feature location

25th percentile contours (clouds)

No more data points!

SLIDE 40

COMPARING CLUSTERS ACROSS SUBJECTS

Overlay features across all samples

Points represent population locations for all (healthy & disease) samples

[Figure legend: Healthy, Disease]

SLIDE 41

META-CLUSTERING: SOLVING CLUSTER CORRESPONDENCE

Meta-cluster (matched features)

PAM clustering

[Figure legend: Healthy, Disease]

SLIDE 42

INPUT TO CLASSIFIER: MATCHED POPULATION FEATURES

SLIDE 43

Overview

Operator/QC
FLAME

SLIDE 44

Example 2: Identifying a rare cell population

Regulatory T cells occur as a population of less than 0.5-1.0% in human peripheral blood mononuclear cells

[Figure: Foxp3-PE axis]

Baecher-Allan et al., JI, 2006

SLIDE 45

Stepwise detection of Tregs

[Figure: Step 1 and Step 2 panels]

SLIDE 46

Modeling Assumptions

One cell population
⇒ one “cloud” of points ⇒ one multivariate component to fit

Mixture of populations in one sample
⇒ complex mixture of “clouds” ⇒ mixture of multivariate components to fit

Mixtures of populations in multiple samples
⇒ register/align components across samples ⇒ multiple sample analysis of mixture models