Automated High-dimensional Cytometric Data Analysis



slide-1
SLIDE 1

Automated High-dimensional Cytometric Data Analysis

Philip L. De Jager, M.D. Ph.D.

Director, Program in Translational NeuroPsychiatric Genomics, Brigham & Women’s Hospital
Assistant Professor of Neurology, Harvard Medical School

slide-2
SLIDE 2

Challenges in cytometric analysis

  • Large amount of high-dimensional data
  • Manual data processing (subjective, slow)
  • Not suitable for high-throughput study
  • Difficult to use in inferential analysis

– “hypothesis limited”

  • Sub-optimal usage of data dimensions
  • Increasingly multi-parametric
  • Restricted visualization

Solution: Automated & Multivariate Analysis

slide-3
SLIDE 3

FLAME: FLow cytometry analysis with Automated Multivariate Estimation

  • Clustering – parametric and multivariate mixture modeling of the populations in each flow sample
  • Meta-clustering – match the corresponding populations from multiple samples to compare features of these matched populations
  • Feature selection – identify features that distinguish populations between different classes (such as normal vs. disease, wt vs. mutant, longitudinal observations, etc.)
  • Classification – predict class membership for new samples based on those distinctive features

slide-4
SLIDE 4

FLAME summary

  • 1. Clustering (flow data: Sample 1, Sample 2, Sample 3)
  • 2. Meta-clustering
  • 3. Feature Selection (class1, class2, class3)

– Frequencies
– Locations
– Means
– Modes
– Variances
– Scales
– Orientations
– Shapes

Downstream Analyses

  • Visualization
  • Class Discovery
  • Class Prediction
  • Etc.
slide-5
SLIDE 5

FLAME Methodology

slide-6
SLIDE 6

Concept: Finite Mixture Model

Finite Mixture Model: weighted sum of g univariate or multivariate densities

[Figure: univariate Gaussian mixture and bivariate Gaussian mixture. Surviving labels: w1=0.5, w2=0.5 (g=2); µ1, µ2, µ3 (g=3, sum of 3 Gaussians); fitted curve]
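
The definition of a finite mixture as a weighted sum of g densities can be sketched in a few lines. This is only an illustration for the univariate Gaussian case; the function names and the example parameters (equal weights, means at -2 and 2) are assumptions, not values from the slides.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a univariate Gaussian N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Weighted sum of g Gaussian densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

# Hypothetical g=2 mixture with equal weights w1 = w2 = 0.5.
weights, means, sds = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
density_at_zero = mixture_pdf(0.0, weights, means, sds)
```

Because each component integrates to 1 and the weights sum to 1, the mixture is itself a valid density.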

slide-7
SLIDE 7

Different distributions

[Figure: density panels for the different distributions; surviving panel labels: N, Skew N, Skew]

slide-8
SLIDE 8

Model Selection options in FLAME

[Figure: density panels for the model options; surviving panel labels: N, Skew N, Skew]

slide-9
SLIDE 9

Step 1: Fitting a distribution

  • Lymphoblastic cell line

slide-10
SLIDE 10

Fitting skew-t deals with asymmetry

[Figure: density plot of asymmetric data, comparing a Gaussian distribution fit with a skew distribution fit]

slide-11
SLIDE 11

Step 2: Meta-clustering

  • 1. Input: Individual samples clustered by mixture model
  • 2. Take all samples and pool their cluster locations
  • 3. Algorithm: Run Partitioning Around Medoids (PAM) to obtain k meta-clusters
  • 4. Output: Matched features used for classification of samples
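
The pooling and PAM steps above can be sketched as follows. This is a simplified swap-based k-medoids written for illustration, not FLAME's implementation; the function name `pam`, the naive initialization (first k points), and the three hypothetical samples are all assumptions.

```python
import math

def pam(points, k, max_iter=100):
    """Simplified PAM: start from the first k points, then greedily swap medoids."""
    def total_cost(medoids):
        # Sum of distances from every point to its nearest medoid.
        return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

    medoids = list(range(k))
    best = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(trial)
                if cost < best - 1e-12:  # accept only swaps that lower the cost
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Hypothetical cluster locations from three samples, two populations each.
sample_locations = {
    "sample1": [(1.0, 1.0), (5.0, 5.0)],
    "sample2": [(1.2, 0.9), (4.8, 5.1)],
    "sample3": [(0.9, 1.1), (5.2, 4.9)],
}
pooled = [loc for locs in sample_locations.values() for loc in locs]
medoids, labels = pam(pooled, k=2)
```

Each label is a meta-cluster index, so populations from different samples that share a label are treated as matched.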
slide-12
SLIDE 12

Example 2: Identifying discriminating features

  • Experiment: examine ZAP70 and SLP76 phosphorylation events before and after T cell receptor activation in naïve and memory T cells
  • Lymphocytes stained with four markers:
  • CD4
  • CD45RA
  • ZAP70Y292
  • SLP76Y128
  • 60 samples: 30 subjects x two time points: pre- and post- anti-CD3 antibody stimulation

slide-13
SLIDE 13

Registering populations across samples

[Figure panels: Pre-stimulation samples; Post-stimulation samples]

slide-14
SLIDE 14

[Figure: panels a. to e., CD45RA axis. Pre-stimulation: Sample 121106A_0min. Post-stimulation: Sample 121106A_5min]

slide-15
SLIDE 15

Step 3: Discriminating features

[Figure panels: Pre-stimulation (zero-minute); Post-stimulation (five-minute)]

slide-16
SLIDE 16

Discriminating features

[Figure: populations labeled I to V; axis CD45RA]

Feature name      Feature type   Cluster #   Dimension(s)   ∆mean [five-min]   p-value
vars11.4          Variance       4           1              -0.156             1.65E-18
orientation 72    Orientation    5           3              -0.649             1.01E-14
orientation 56    Orientation    4           3              -0.609             1.13E-12
vars11.5          Variance       5           1              -0.082             4.00E-08
orientation 66    Orientation    5           1              -0.515             1.37E-05
shape 11          Shape          3           3              -0.175             2.62E-08
scale4            Scale          4           NA             -0.052             3.32E-06
orientation 19    Orientation    2           1              -0.632             1.34E-06
shape 8           Shape          2           2              -0.141             4.41E-09
shape 15          Shape          4           4              -0.178             5.17E-07
vars41.5          Variance       5           1,4            -0.024             2.63E-05
orientation 42    Orientation    3           3              -0.422             9.73E-04
shape 20          Shape          5           4              -0.060             7.93E-05
scale5            Scale          5           NA             -0.038             7.23E-04
vars43.3          Variance       3           3,4            -0.020             7.10E-04
vars31.4          Variance       4           1,3            -0.015             3.34E-03
vars11.3          Variance       3           1              0.314              6.22E-12
orientation 52    Orientation    4           1              0.552              1.87E-10
vars21.2          Variance       2           1,2            0.251              1.22E-10
vars21.3          Variance       3           1,2            0.259              1.14E-11
vars21.4          Variance       4           1,2            0.060              3.42E-08
orientation 20    Orientation    2           1              0.504              2.17E-11
shape 10          Shape          3           3              0.740              1.31E-09
shape 7           Shape          2           2              0.682              4.49E-16
shape 13          Shape          4           4              1.023              4.37E-09
mus1.4            Mean           4           1              1.761              6.13E-22
orientation 54    Orientation    4           2              0.534              1.26E-08
mus1.5            Mean           5           1              1.657              2.47E-21
vars22.2          Variance       2           2              0.282              5.45E-05
orientation 59    Orientation    4           3              0.548              1.51E-04
vars22.3          Variance       3           2              0.146              1.09E-05
orientation 47    Orientation    3           4              0.561              4.65E-05
orientation 43    Orientation    3           3              0.066              8.01E-05
orientation 70    Orientation    5           2              0.308              4.07E-03
scale3            Scale          3           NA             0.063              2.19E-04
mus1.2            Mean           2           1              1.571              1.52E-18
vars11.2          Variance       2           1              0.131              1.62E-04
vars22.5          Variance       5           2              0.023              2.65E-04
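
Each feature on this slide is reported with a change in mean at five minutes and a p-value. The slides do not say which statistical test produced those p-values. Since the experiment measures the same subjects before and after stimulation, the sketch below assumes a paired t statistic purely for illustration; the function name and the example values are hypothetical.

```python
import math

def paired_t_statistic(pre, post):
    """Return (mean difference, t statistic) for paired pre/post feature values."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the per-subject differences.
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("differences have zero variance")
    return mean_diff, mean_diff / math.sqrt(var / n)

# Hypothetical values of one matched-population feature in four subjects.
pre = [1.0, 1.2, 0.9, 1.1]
post = [2.0, 2.4, 1.7, 2.3]
mean_diff, t_stat = paired_t_statistic(pre, post)
```

Turning the statistic into a p-value requires the t distribution with n-1 degrees of freedom, which is left out here to keep the sketch dependency-free.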

slide-17
SLIDE 17

Example 3: Identifying a rare cell population

Regulatory T cells occur as a less than 0.5-1.0% population in human peripheral blood mononuclear cells

[Figure: flow plot with a Foxp3-PE axis]

Baecher-Allan et al., JI, 2006

slide-18
SLIDE 18

Stepwise detection of Tregs

[Figure panels: Step 1; Step 2]

slide-19
SLIDE 19

Overview

[Diagram: Operator/QC; FLAME]

slide-20
SLIDE 20

FLAME

  • Automated analysis method
  • Deconstructs the components of a mixture of cells
  • Cross-registers cell clusters across samples
  • Provides a specific record of analysis parameters, allowing exact replication of an analysis by a third party
  • Operator Modes:

– Cell population discovery mode
– Clinical trial mode

slide-21
SLIDE 21

Availability

  • Free software
  • Available through the GenePattern toolkit on the Broad Institute website

– http://www.broadinstitute.org/cancer/software/genepattern/index.html

  • GenePattern – an environment with pipelining capabilities and a repertoire of downstream analysis tools

  • Pyne et al. Proc Natl Acad Sci USA 2009; 106: 8519-8524.
slide-22
SLIDE 22

Acknowledgements

  • De Jager lab

– Cristin Aubin
– Aaron Brandes
– Becky Briskin
– Lori Chibnik
– Portia Chipendo
– Xinli Hu
– Linda Ottoboni
– Nikolaos Patsopoulos
– Joshua Shulman
– Dong Tran
– Irene Wood
– Zongqi Xia

  • Jill Mesirov

– Saumyadipta Pyne
– Pablo Tamayo

  • Geoff McLachlan
  • Kui Wang
  • David Hafler

– Clare Baecher-Allan
– Lisa Maier

Funding Sources

  • National MS Society
  • NIH: NIA, NINDS
slide-23
SLIDE 23

Illustrative Examples

slide-24
SLIDE 24

Example 3: Feature selection – Phosphorylation of naïve & memory T cells pre- and post-stimulation

4-dimensional samples (ZAP70Y292 not shown)

Mixture modeling

[Figure: CD4 vs. CD45RA]

slide-25
SLIDE 25

Phosphorylation causes feature alterations in populations

[Figure panels: Pre-stimulation (0 min.); Post-stimulation (5 min.)]

slide-26
SLIDE 26

Matching pre- and post-stimulation populations

[Figure panels: pre-stimulation; post-stimulation]

slide-27
SLIDE 27

Matching pre- and post-stimulation populations across all samples

slide-28
SLIDE 28

Feature Selection Heatmap

[Figure: heatmap comparing Zero-minutes and Five-minutes samples]

slide-29
SLIDE 29

Data QC/standardization

  • Carefully selected panels
  • Minimal cross-sample variation
slide-30
SLIDE 30

FLow analysis with Automated Multivariate Estimation

10/01/2008

slide-31
SLIDE 31

Low dimension t-mixture

Outliers?

Low dimension clustering is not good enough

slide-32
SLIDE 32

Multivariate t-mixture is better

Symmetric density is often not good enough

slide-33
SLIDE 33

Modeling with skewed distributions

Better fit with skew

[Figure: Skew-normal distribution (Skew N)]

Photo courtesy: Azzalini J.M. et al. Statistical applications of the multivariate skew-normal distribution, 1999.
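
To make the skew-normal idea concrete, here is a minimal sketch of the univariate skew-normal density, 2/scale · φ(z) · Φ(αz) with z = (x − location)/scale. The slide refers to the multivariate case; the univariate form and the parameter names (`location`, `scale`, `alpha`) are simplifications chosen for illustration.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x, location=0.0, scale=1.0, alpha=0.0):
    """Univariate skew-normal density; alpha = 0 gives the ordinary Gaussian."""
    z = (x - location) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(alpha * z)

# With alpha > 0 the density puts more mass to the right of the location.
right = skew_normal_pdf(1.0, alpha=3.0)
left = skew_normal_pdf(-1.0, alpha=3.0)
```

Setting `alpha` to zero recovers the symmetric Gaussian, which is why a skewed family can only fit asymmetric populations as well as or better than the Gaussian.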

slide-34
SLIDE 34

Parametric mixture modeling

  • A biological population is assumed to follow a mathematical distribution, such as Gaussian
  • Each population can be abstracted as a cluster, described by parameters such as mean, mode, standard deviation, and skew, etc.
  • A mixture of populations can be abstracted as a mixture of distributions
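
As a small illustration of abstracting a population into parameters, the sketch below reduces a list of measurements to a mean, a standard deviation, and a moment-based skewness. The function name and the example values are hypothetical, and FLAME estimates its parameters through mixture fitting rather than through these plain sample moments.

```python
import math

def describe_population(values):
    """Summarize one population by mean, standard deviation, and skewness."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with divisor n (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {"mean": mean, "sd": math.sqrt(m2), "skew": skew}

# Hypothetical one-dimensional marker intensities.
symmetric = describe_population([1.0, 2.0, 3.0])
right_tailed = describe_population([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 10.0])
```

Once a population is reduced to such a parameter set, the raw data points are no longer needed for comparing it with populations in other samples.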
slide-35
SLIDE 35

Modeling with Gaussian

Gaussian may be too “skinny” to capture the entire population

slide-36
SLIDE 36

Modeling with skew‐t

Tolerates and describes the skew of the population

slide-37
SLIDE 37

Exploratory discovery of CD4+CD25highFoxp3+ regulatory T cell population

CD4 not shown

slide-38
SLIDE 38

Input: 4-variate mixture component parameters

Points labeled with five colors that represent five populations used to cluster a flow sample

Dimensions: CD56, CD8, CD4, CD3

slide-39
SLIDE 39

ABSTRACTING FLOW CYTOMETRY DATA

Feature location

25th percentile contours (clouds)

No more data points!

slide-40
SLIDE 40

COMPARING CLUSTERS ACROSS SUBJECTS

Overlay features across all samples

[Figure legend: Healthy; Disease]

Points represent population locations for all (healthy & disease) samples

slide-41
SLIDE 41

META-CLUSTERING: SOLVING CLUSTER CORRESPONDENCE

Meta-cluster (matched features)

PAM clustering

[Figure legend: Healthy; Disease]

slide-42
SLIDE 42

INPUT TO CLASSIFIER: MATCHED POPULATION FEATURES

slide-43
SLIDE 43

Overview

[Diagram: Operator/QC; FLAME]

slide-44
SLIDE 44

Example 2: Identifying a rare cell population

Regulatory T cells occur as a less than 0.5-1.0% population in human peripheral blood mononuclear cells

[Figure: flow plot with a Foxp3-PE axis]

Baecher-Allan et al., JI, 2006

slide-45
SLIDE 45

Stepwise detection of Tregs

[Figure panels: Step 1; Step 2]

slide-46
SLIDE 46

Modeling Assumptions

  • One cell population

⇒ one “cloud” of points ⇒ one multivariate component to fit

  • Mixture of populations in one sample

⇒ complex mixture of “clouds” ⇒ mixture of multivariate components to fit

  • Mixtures of populations in multiple samples

⇒ register/align components across samples ⇒ multiple sample analysis of mixture models
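
The assumption that each "cloud" is one component to fit can be sketched with a plain expectation-maximization (EM) loop. The slides do not state which fitting procedure FLAME uses, and FLAME works with multivariate, possibly skewed components; the univariate Gaussian version below, its function names, starting values, and data are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_gaussian_mixture(data, means, sds, weights, n_iter=50):
    """Plain EM for a univariate Gaussian mixture with len(means) components."""
    g = len(means)
    means, sds, weights = list(means), list(sds), list(weights)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(g)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and sd of each component.
        for j in range(g):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # guard against a collapsed component
    return means, sds, weights

# Two hypothetical "clouds" of one-dimensional measurements.
cloud_a = [0.0, 0.2, -0.2, 0.1, -0.1, 0.3, -0.3]
cloud_b = [5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 4.7]
fitted_means, fitted_sds, fitted_weights = fit_gaussian_mixture(
    cloud_a + cloud_b, means=[1.0, 4.0], sds=[1.0, 1.0], weights=[0.5, 0.5])
```

Each fitted component then stands in for one population, and its parameters are what get registered across samples.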