flowMerge: Merging Mixture Components to Identify Distinct Cell Populations in Flow Cytometry. Greg Finak, PhD, R. Gottardo Laboratory, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center. flowCAP 2010 Summit, September 21, 2010.


SLIDE 1

flowMerge: Merging Mixture Components to Identify Distinct Cell Populations in Flow Cytometry

Greg Finak, PhD

R. Gottardo Laboratory, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center

flowCAP 2010 Summit. September 21, 2010

Greg Finak (FHCRC) flowMerge flowCAP 2010 1 / 25

SLIDE 2

Outline

1 Introduction
  • Goals of Automated Gating
  • Challenges for Automated Gating

2 The flowClust and flowMerge Algorithms
  • The flowMerge Algorithm
  • Shortfalls of flowClust / flowMerge

3 flowCAP
  • Algorithm Settings and Gating Strategy
  • Our Take–home lessons
  • Future Improvements to flowClust / flowMerge
  • Acknowledgements

SLIDE 3

Introduction Goals of Automated Gating

Goals of Automated Gating

In an Ideal World:

  • Identify the same populations that a human expert can identify, as well as those they can't.
  • Specifically:
      • Identify biologically relevant cell populations.
      • Classify events into one (or more) of the identified cell populations.
      • Do it accurately (relative to some standard).
      • Do it quickly (or at least faster than the human expert).
  • Many approaches exist, both parametric and non–parametric.

SLIDE 4

Introduction Challenges for Automated Gating

Characteristics of FCM Data

Globally, FCM data is well represented by a mixture of distributions. However:

  • Cell populations in FCM data tend to be noisy, asymmetric, overlapping and not always well resolved by existing markers.
  • Not all populations in an experiment are of interest to the question at hand.
  • From a modelling perspective, the distributions of individual cell populations are not "nice": they are noisy, non-Gaussian, asymmetric and have non–constant variance.
  • Gating strategies depend on the data.

SLIDE 5

Introduction Mixture Models

Quick Intro to Mixture Models I

Mixture Model

Model a complicated distribution using a weighted combination of "simpler" distributions:

$$f_0(y) = \sum_{g=1}^{G} \pi_g f_g(y \mid \theta)$$

The f_g(·)'s can be any distributions. In practice:

  • Multivariate Gaussian
  • Multivariate–t (flowClust)
  • Multivariate–t with Box–Cox transformation (flowClust)
  • Skewed Multivariate–t (FLAME)
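As a rough illustration of the weighted-sum idea on this slide, the sketch below evaluates a one-dimensional mixture density with Gaussian components. The function names (`normal_pdf`, `mixture_pdf`) and the example weights and parameters are made up for illustration; flowClust itself uses multivariate t components, which this sketch does not attempt to reproduce.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, components):
    """Evaluate f_0(y) = sum_g pi_g * f_g(y) for Gaussian components.

    weights    -- mixing proportions pi_g (must sum to 1)
    components -- list of (mean, sd) pairs, one per component
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    return sum(w * normal_pdf(y, m, s) for w, (m, s) in zip(weights, components))

# Hypothetical two-component example: 70% of events near 0, 30% near 4.
density_at_zero = mixture_pdf(0.0, [0.7, 0.3], [(0.0, 1.0), (4.0, 1.0)])
```

Swapping `normal_pdf` for another component density changes the model class without changing the mixture structure.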

SLIDE 6

Introduction Mixture Models

Quick Intro to Mixture Models II

[Figure: density of a two-component mixture, with curves labelled Component 1 and Component 2.]

SLIDE 7

Introduction Mixture Models

Quick Intro to Mixture Models III

  • Gaussian Mixtures: spherical or ellipsoidal covariance.
  • t Distribution: robust to outliers.
  • Box–Cox Transformation: allows for asymmetry.

SLIDE 8

Introduction Data Transformations

The Box–Cox Transformation

  • Flow cytometry data is usually transformed prior to gating (arcsinh, log, logicle, Box–Cox).
  • Individual populations can still be skewed.

Generalized Box–Cox Transform

$$x = \begin{cases} \dfrac{\operatorname{sgn}(y)\,|y|^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\[1ex] \log(y) & \text{otherwise} \end{cases}$$

  • The Box–Cox family encompasses the power, square, and log transformations, depending on the value of λ.
  • flowClust implements the Box–Cox transformation as part of the model fitting procedure.
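A minimal sketch of the generalized Box–Cox transform for a single value, written directly from the piecewise definition on this slide. The function name `generalized_box_cox` is an assumption for illustration and is not the flowClust API; in flowClust, λ is estimated during model fitting rather than supplied by hand.

```python
import math

def generalized_box_cox(y, lam):
    """Apply x = (sgn(y) * |y|**lam - 1) / lam if lam != 0, else log(y)."""
    if lam != 0:
        # copysign carries the sign of y, so negative values stay negative.
        return (math.copysign(1.0, y) * abs(y) ** lam - 1.0) / lam
    if y <= 0:
        raise ValueError("log(y) requires y > 0 when lambda is 0")
    return math.log(y)

# lam = 1 is a shift (y - 1), lam = 0.5 is a square-root type transform,
# and lam = 0 is the log transform.
examples = [generalized_box_cox(4.0, lam) for lam in (1.0, 0.5, 0.0)]
```

The sign handling is what lets the transform accept the negative values that appear in compensated flow cytometry data.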

SLIDE 9

The flowClust and flowMerge Algorithms flowClust

Automated Gating With flowClust I

flowClust (Lo K, Brinkman R, Gottardo R. Cytometry A, 2008)

  • A robust, flexible, model–based approach to automated gating of flow cytometry data.
  • Mixture model framework.
  • Multivariate-t: robust.
  • Box–Cox transformation: allows for asymmetric populations.

SLIDE 10

The flowClust and flowMerge Algorithms flowClust

Automated Gating With flowClust II

Normal-Gamma compound parameterization of the multivariate-t.

The flowClust Model (Lo et al.): complete-data log-likelihood

$$L_c(\Psi \mid y, z, u) = \sum_{i=1}^{n} \sum_{g=1}^{G} z_{ig} \log\left\{ \pi_g \, \phi_p\!\left(y_i^{(\lambda)} \mid \mu_g, \Sigma_g / u_i\right) \cdot \left|J_p(y_i; \lambda)\right| \cdot \mathrm{Ga}\!\left(u_i, \tfrac{\nu}{2}, \tfrac{\nu}{2}\right) \right\}$$

  • Ψ = {µ_g, Σ_g, π_g, ν, λ}: population means, covariances and proportions, the degrees of freedom of the t–distribution, and the transformation parameter.
  • Parameter estimates can be computed efficiently via EM.
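To illustrate the Normal-Gamma compound parameterization named on this slide, the sketch below draws a univariate t variate in two steps: a Gamma-distributed weight u, then a Gaussian whose variance is divided by u. It is a one-dimensional illustration under assumed values (ν = 4, zero mean, unit scale); the function name is hypothetical and nothing here is taken from the flowClust source code.

```python
import random

def sample_t_via_normal_gamma(mu, sigma, nu, rng):
    """Draw one univariate t variate as a scale mixture of normals.

    u ~ Gamma(shape = nu / 2, rate = nu / 2), then y | u ~ Normal(mu, sigma**2 / u).
    Returns the pair (y, u).
    """
    # random.gammavariate takes (shape, scale); scale = 1 / rate = 2 / nu.
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    y = rng.gauss(mu, sigma / u ** 0.5)
    return y, u

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_t_via_normal_gamma(0.0, 1.0, 4.0, rng) for _ in range(20000)]
mean_u = sum(u for _, u in draws) / len(draws)  # E[u] = 1 under this parameterization
```

Small values of u inflate the variance for an individual event, which is how the t distribution accommodates outliers; in an EM fit the u_i are treated as missing data alongside the labels z_ig.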

SLIDE 11

The flowClust and flowMerge Algorithms Model Selection

flowClust: Model Selection

Standard approach using BIC (Bayesian Information Criterion)

  • BIC = 2 ln(L) − k ln(n), where L is the maximized likelihood, k the number of free parameters and n the number of events. With this sign convention, larger values are better.
  • Fit flowClust models with G = 1 through G = 20 clusters.
  • Choose the model with the largest BIC value.
  • When G is large, or there are many events or many samples, this becomes time-consuming.
  • Can be parallelized.

[Figure: BIC versus number of clusters.]
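The selection step on this slide can be sketched as a loop over candidate models that keeps the one with the largest BIC. The sketch assumes the larger-is-better convention BIC = 2 ln(L) − k ln(n); the log-likelihoods and parameter counts in `fits` are invented numbers for illustration, not results from any data set.

```python
import math

def bic(log_likelihood, n_params, n_events):
    """BIC under the larger-is-better convention: 2 ln(L) - k ln(n)."""
    return 2.0 * log_likelihood - n_params * math.log(n_events)

def select_by_bic(fits, n_events):
    """Return (best_G, scores) given fits mapping G -> (log_likelihood, n_params)."""
    scores = {g: bic(ll, k, n_events) for g, (ll, k) in fits.items()}
    best_g = max(scores, key=scores.get)
    return best_g, scores

# Hypothetical fits: the likelihood keeps improving with G,
# but the penalty term eventually outweighs the gain.
fits = {1: (-9000.0, 5), 2: (-8500.0, 11), 3: (-8480.0, 17), 4: (-8475.0, 23)}
best_g, scores = select_by_bic(fits, n_events=10000)
```

Because each candidate G is fitted independently, the fits can be computed in parallel and only the scoring step needs to see all of them.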

SLIDE 12

The flowClust and flowMerge Algorithms flowClust: Problems

Problems with flowClust

  • G fixed to the "true" number of populations doesn't necessarily give the best model fit.
  • Multiple mixture components can represent the same cell population.

[Figure: two scatter plots of FL1-H against FL2-H. flowClust, G=4: 19% misclassified. flowClust, G=9: 18% misclassified.]

SLIDE 13

The flowClust and flowMerge Algorithms flowMerge

flowMerge: Modelling Distinct Cell Populations

flowMerge (Finak G, Bashashati A, Brinkman R, Gottardo R. Advances in Bioinformatics, 2009)

  • Extends the flowClust methodology to identify and model distinct cell populations.
  • Merges overlapping mixture components based on entropy.
  • Summarizes merged components using a single multivariate–t distribution based on moment matching conditions.

SLIDE 14

The flowClust and flowMerge Algorithms Merging Cell Populations

Mixture Components and Entropy

Entropy measures the uncertainty of a random variable. For mixture models we define the entropy of clustering of a G–component mixture model.

Definition: Entropy of Clustering

$$H(G) = -2 \sum_{i=1}^{G} \sum_{j=1}^{N} z_{ij} \log(z_{ij})$$

  • z_ij is the probability that cell j is assigned to population i.
  • Overlapping mixture components: large uncertainty, high entropy.

[Figure: "Entropy: Single Cell, Two Clusters". Entropy plotted against P(Z=1).]
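The entropy of clustering can be computed directly from a matrix of posterior membership probabilities. The sketch below assumes natural logarithms and the convention 0·log(0) = 0, neither of which is stated on the slide; the function names are illustrative only.

```python
import math

def clustering_entropy(z):
    """H = -2 * sum_i sum_j z[i][j] * log(z[i][j]).

    z[i][j] is the probability that cell j belongs to population i;
    terms with z[i][j] == 0 are skipped (0 * log 0 is taken as 0).
    """
    return -2.0 * sum(p * math.log(p) for row in z for p in row if p > 0.0)

def single_cell_entropy(p):
    """Entropy for one cell and two clusters, with P(Z=1) = p."""
    return clustering_entropy([[p], [1.0 - p]])

# The entropy is zero for a confident assignment and peaks at p = 0.5.
curve = [(p / 10.0, single_cell_entropy(p / 10.0)) for p in range(11)]
```

The `curve` values trace the same shape as the single-cell, two-cluster plot: zero at both ends and largest where the assignment is most ambiguous.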

SLIDE 15

The flowClust and flowMerge Algorithms The flowMerge Algorithm

The flowMerge Algorithm

1 Start with a max(BIC) flowClust model (k clusters).
2 Compute the entropy for all pairwise model components.
3 Merge the two components that contribute most to the entropy.
4 Recompute the pairwise entropy of the new merged cluster.
5 Repeat from step 2 until one component remains.
6 Choose the "best" fitting merged model from the plot of Entropy vs Number of Clusters.
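The merging loop in steps 1 to 5 can be sketched on a matrix of posterior probabilities. This sketch makes one assumption that the slide does not spell out: "contribute most to the entropy" is interpreted as the pair whose merge leaves the lowest total entropy. It recomputes every pair at each step for simplicity and does not perform step 6 (choosing the final model). The function names and the toy matrix are hypothetical, not the flowMerge package API.

```python
import math
from itertools import combinations

def clustering_entropy(z):
    """H = -2 * sum_i sum_j z[i][j] * log(z[i][j]), with 0 * log(0) taken as 0."""
    return -2.0 * sum(p * math.log(p) for row in z for p in row if p > 0.0)

def merge_rows(z, a, b):
    """Return a new matrix in which components a and b are summed into one row."""
    merged = [pa + pb for pa, pb in zip(z[a], z[b])]
    kept = [row for idx, row in enumerate(z) if idx not in (a, b)]
    return kept + [merged]

def greedy_merge(z):
    """Merge until one component remains.

    Returns a list of (n_components, entropy, merged_pair); merged_pair holds
    row indices into the matrix as it stood just before that merge.
    """
    path = [(len(z), clustering_entropy(z), None)]
    while len(z) > 1:
        best_pair = min(combinations(range(len(z)), 2),
                        key=lambda pair: clustering_entropy(merge_rows(z, *pair)))
        z = merge_rows(z, *best_pair)
        path.append((len(z), clustering_entropy(z), best_pair))
    return path

# Toy posterior matrix: components 0 and 1 split the first two cells evenly,
# component 2 owns the last two cells outright.
z = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
path = greedy_merge(z)
```

Plotting the entropy column of `path` against the number of components gives the kind of curve from which the final model is chosen.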

SLIDE 16

The flowClust and flowMerge Algorithms Summarizing Merged Components

Summarizing Components

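We can summarize merged components using the same multivariate–t framework used in flowClust:

$$p^{*} f^{*} = p_i f_i + p_j f_j$$

Moment Matching Conditions

$$p^{*} = p_i + p_j$$

$$\mu^{*} = \frac{p_i \mu_i + p_j \mu_j}{p^{*}}$$

$$\Sigma^{*} = \frac{(\nu^{*} - 2)\, p_i \left[ \frac{\nu_i}{\nu_i - 2} \Sigma_i + \mu_i \mu_i' \right]}{p^{*} \nu^{*}} + \frac{(\nu^{*} - 2)\, p_j \left[ \frac{\nu_j}{\nu_j - 2} \Sigma_j + \mu_j \mu_j' \right]}{p^{*} \nu^{*}} - \frac{(\nu^{*} - 2)\, p^{*} \mu^{*} \mu^{*\prime}}{p^{*} \nu^{*}}$$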
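A one-dimensional sketch of the moment matching conditions, in which Σ reduces to a scalar scale parameter and µµ′ to µ². The degrees of freedom of the merged component, `nu_star`, is passed in as an input because the slide does not say how it is chosen; the function name and example values are hypothetical.

```python
def merge_t_components(p_i, mu_i, sigma2_i, nu_i,
                       p_j, mu_j, sigma2_j, nu_j, nu_star):
    """Univariate moment matching for two t components.

    Returns (p_star, mu_star, sigma2_star) so that the merged component has
    the same weight, mean and variance as the two-component sub-mixture.
    """
    if min(nu_i, nu_j, nu_star) <= 2:
        raise ValueError("the variance is finite only for degrees of freedom > 2")
    p_star = p_i + p_j
    mu_star = (p_i * mu_i + p_j * mu_j) / p_star
    # Second moment of each component: variance nu / (nu - 2) * sigma2, plus mean squared.
    second_i = nu_i / (nu_i - 2.0) * sigma2_i + mu_i ** 2
    second_j = nu_j / (nu_j - 2.0) * sigma2_j + mu_j ** 2
    scale = (nu_star - 2.0) / (p_star * nu_star)
    sigma2_star = scale * (p_i * second_i + p_j * second_j - p_star * mu_star ** 2)
    return p_star, mu_star, sigma2_star

# Two well-separated components merge into one with a larger scale.
merged = merge_t_components(0.5, -1.0, 1.0, 4.0, 0.5, 1.0, 1.0, 4.0, nu_star=4.0)
```

A quick sanity check on the formula: merging two identical components returns the same mean and scale with the weights added.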

SLIDE 17

The flowClust and flowMerge Algorithms flowClust vs flowMerge Example

flowClust vs flowMerge

[Figure 4: pairwise scatter plots of CD7 FITC, CD4 PE and CD8 PC5 (log-scale axes) comparing the flowClust ICL solution, the flowClust BIC solution and the flowMerge solution, together with a plot of Entropy against Number of Clusters. Panels A to D.]

SLIDE 18

The flowClust and flowMerge Algorithms flowClust vs flowMerge Example

flowMerge on HSCT and WNV Data

[Figure: Entropy of Clustering plotted against Cumulative Number of Merged Observations, next to a scatter plot of FL1-H against FL2-H. flowMerge, G=5: 1.8% misclassified.]

[Figure: WNV Data. Scatter plot of CFSE-A against PE-Cy5-A, next to Entropy of Clustering plotted against Cumulative Number of Merged Observations, annotated with the "True" Number of Populations.]

SLIDE 19

The flowClust and flowMerge Algorithms Shortfalls of flowClust / flowMerge

flowMerge Caveats

Things to watch out for:

  • Automated model selection is based on a heuristic, which is not always a good choice.
  • Starting with the max BIC model is sometimes flawed.
  • Models with K = "true # of clusters" don't always fit as well as K = "true number + some outlier clusters".
  • Multi–stage gating (gate scatter, then gate fluorescence) may rest on an invalid assumption.
  • Some data sets are just difficult to gate (e.g. WNV).
  • Sometimes lacks modelling flexibility (common ν, λ).

SLIDE 20

flowCAP

The Challenges: A reminder

  • Challenge 1: Fully Automated Algorithms: we know nothing.
  • Challenge 2: Tuned Algorithms: take a better guess at the number of populations.
  • Challenge 3: The number of populations is predefined.
  • Challenge 4: The assignment of events to populations is known for some samples.

SLIDE 21

flowCAP Algorithm Settings and Gating Strategy

Gating Strategies Change Depending on the Data I

[Figure: two-stage gating diagram (Scatter; Sc Pop1 to Sc Pop3; Fl Pop 1 to Fl Pop 9), with a scatter plot of FSC-A against SSC-A.]

Challenge 1

  • Automated parameter estimation.
  • Automated model selection (number of clusters).
  • Two–stage gating: big mistake.

Challenge 2

  • Automated or fixed parameter estimation.
  • Automated or fixed model selection.
  • Model class dependent on data set.

SLIDE 22

flowCAP Algorithm Settings and Gating Strategy

Gating Strategies Change Depending on the Data II

[Figure: fluorescence-only gating diagram (Fluorescence; Fl Pop 1 to Fl Pop 3).]

Challenge 3

  • Gating only fluorescence channels.
  • Automated parameter estimation, but with the number of clusters fixed to known values (could do better).

Challenge 4

  • Gating only fluorescence channels.
  • Fully automated parameter estimation and model selection.

SLIDE 23

flowCAP Our Take–home lessons

Overall Impression

Don't make assumptions

  • If you know nothing about the data, don't make too many assumptions (e.g. Challenge 1: two-stage gating was a mistake).

One size does not fit all

  • flowClust is very flexible. More model classes should be explored by default.

Use prior information, but with caution

  • Models with exactly the "true" number of clusters don't necessarily provide the best fit.
  • Knowing some gating assignments and informative dimensions is better.

SLIDE 24

flowCAP Future Improvements to flowClust / flowMerge

Future Improvements

Model Selection

  • There is a clear need to improve our model-selection strategy, both for speed and for fitting diverse data.

flowClust with Bayesian Priors

  • Include prior information about population locations for gating rare populations or repetitive gating of similar samples.

SLIDE 25

flowCAP Acknowledgements

Acknowledgements

flowCAP Organizing Committee

  • Ryan Brinkman, BC Cancer Agency
  • Raphael Gottardo, Fred Hutchinson Cancer Research Center
  • Richard Scheuermann, University of Texas Southwestern Medical Center
  • Jill Schoenfeld, Tree Star Inc.

Data

  • BC Cancer Center
  • Amgen, Inc.
  • McMaster University
  • Tree Star Inc.

Funding

  • NIH
