flowBin: A Complete Pipeline for Feature Extraction and Classification of Multi-tube Flow Cytometry Data

SLIDE 1

flowBin: A Complete Pipeline for Feature Extraction and Classification of Multi-tube Flow Cytometry Data

Kieran O’Neill

Terry Fox Laboratory, BC Cancer Agency

September 22, 2011

Kieran O’Neill (TFL) FlowBin September 22, 2011 1 / 18

SLIDE 2

Background

Background

SLIDE 3

Background

Multi Tube/well Flow Cytometry

◮ Why? Get more colours
◮ Use common parameters in all tubes to identify populations
◮ Get some further information out of other parameters, often compared to a negative control
◮ Two common use cases:
  1. Determine immunophenotype of identified population
  2. Determine immunological response to stimulus

SLIDE 4

Background

Multiplexed flow cytometry

[Figure: a bone marrow aspirate is split into aliquots (isotype controls, cell surface markers, intracellular markers); each aliquot yields flow cytometry data, shown as CD45 against SSC plots.]

SLIDE 5

Background

Typical Manual Expert’s Approach

Tube 1. Gate blasts on CD45/SS, then set autofluorescence thresholds.

Tube 2 (and subsequent). Gate blasts on CD45/SS, then look at expression relative to autofluorescence.

SLIDE 6

Feature Extraction

Feature Extraction

SLIDE 7

Feature Extraction

FlowBin Approach

1. Bin single tube in terms of population ID parameters (K-means; k=100, inspired by FlowMeans)
2. Map bins across tubes using 1-NN (after Pedreira et al.)
3. Extract immunophenotype for each bin in terms of non-ID parameters
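Step 1 can be illustrated with a minimal sketch of K-means binning on the population ID parameters. This is plain Lloyd's algorithm in standard-library Python, not the flowBin implementation; the function names (`sq_dist`, `nearest_centre`, `kmeans_bins`), the random initialisation, and the fixed iteration count are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centre(event, centres):
    """Index of the centre closest to the event."""
    return min(range(len(centres)), key=lambda c: sq_dist(event, centres[c]))


def kmeans_bins(events, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centres, bin label per event).

    events: list of tuples holding only the ID parameters (e.g. CD45, SSC).
    """
    rng = random.Random(seed)
    # Start from k events sampled without replacement (requires k <= len(events)).
    centres = [list(e) for e in rng.sample(events, k)]
    for _ in range(iterations):
        labels = [nearest_centre(e, centres) for e in events]
        for c in range(k):
            members = [e for e, lab in zip(events, labels) if lab == c]
            if members:  # an empty bin keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centres.
    labels = [nearest_centre(e, centres) for e in events]
    return centres, labels


# Toy data: two well-separated groups of events in two ID parameters.
events = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centres, labels = kmeans_bins(events, k=2)
```

On real data the slide uses k=100, and the ID parameters would probably need to be put on comparable scales first, since CD45 and side scatter span very different ranges.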

SLIDE 11

Feature Extraction

Binning and KNN Mapping of Bins

1. K-means cluster tube 1
2. For each population
3. KNN map across tubes

[Figure: scatter plot of CD45 PerCP (0.0 to 3.0) against Side Scatter (200 to 1000).]
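The mapping step can be sketched as a brute-force 1-nearest-neighbour lookup. The sketch assumes that each event in another tube takes the bin of its nearest tube 1 event, measured on the shared ID parameters only; the slides do not spell out this detail, and the name `map_to_bins` is hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_to_bins(reference_events, reference_bins, other_events):
    """Give each event in another tube the bin of its nearest reference event.

    reference_events: ID-parameter vectors from the binned tube (tube 1).
    reference_bins:   bin label for each reference event.
    other_events:     ID-parameter vectors from another tube.
    """
    mapped = []
    for event in other_events:
        nearest = min(
            range(len(reference_events)),
            key=lambda i: sq_dist(event, reference_events[i]),
        )
        mapped.append(reference_bins[nearest])
    return mapped


# Two reference events in bins 0 and 1; two events from another tube.
mapped = map_to_bins([(0.0, 0.0), (10.0, 10.0)], [0, 1], [(1.0, 1.0), (9.0, 8.0)])
```

The brute-force search compares every event with every reference event, so its cost grows with the product of the two event counts; a spatial index such as a k-d tree is the usual way to reduce this.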

SLIDE 12

Feature Extraction

Intra-sample, Inter-tube Variation and Quantile Normalization

[Figure: three panels. "Surface" and "Intracellular": CD45 PerCP (0.0 to 3.0) against Side Scatter (200 to 1000). "ECDF (all tubes)": empirical CDF (0.0 to 1.0) of SSC.H (200 to 1000).]
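As a rough illustration of quantile normalization across tubes, the sketch below maps one channel (for example SSC.H) in every tube onto a shared reference distribution. It is a generic textbook version, not necessarily the variant used in flowBin; the reference (the mean of the tubes' quantiles on a fixed probability grid), the `grid_size` parameter, and the function names are assumptions.

```python
def quantile(sorted_vals, p):
    """Linearly interpolated quantile of a sorted list, for 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def quantile_normalize(tubes, grid_size=101):
    """Map one channel of every tube onto a common reference distribution.

    tubes: list of lists, one list of channel values per tube
           (tubes may contain different numbers of events).
    """
    sorted_tubes = [sorted(t) for t in tubes]
    probs = [i / (grid_size - 1) for i in range(grid_size)]
    # Assumed reference: the mean across tubes of each quantile.
    reference = [
        sum(quantile(s, p) for s in sorted_tubes) / len(sorted_tubes)
        for p in probs
    ]
    normalized = []
    for tube in tubes:
        n = len(tube)
        order = sorted(range(n), key=lambda i: tube[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Within-tube rank, rescaled to a probability in [0, 1].
            p = rank / (n - 1) if n > 1 else 0.5
            out[idx] = quantile(reference, p)
        normalized.append(out)
    return normalized


# Two tubes whose values differ by a factor of ten.
normalized = quantile_normalize([[1.0, 2.0, 3.0], [30.0, 10.0, 20.0]])
```

After normalization both tubes share the same empirical distribution while each event keeps its rank within its own tube. Tied values receive consecutive ranks here, which a production version would probably handle more carefully.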

SLIDE 14

Feature Extraction

Measuring Immunophenotype/Response

For bin k, tube l, channel m:

\[ \mathrm{expr}_{k,l,m} = \log\left[ \operatorname{median}\left(x^{*}_{k,l,m}\right) - \operatorname{median}\left(x^{*}_{k,\mathrm{ctrl},m}\right) \right] \]

◮ I use MFI with correction from negative control
◮ Options for other measures will be in final package
◮ Option to include MFIs of population ID parameters
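The expression measure above can be computed per bin, tube and channel as in the following sketch. The natural logarithm and the handling of non-positive differences (returning `None`) are assumptions, since the slide specifies neither; `bin_expression` is a hypothetical name.

```python
import math
from statistics import median


def bin_expression(stained, control):
    """log(median(stained) - median(control)) for one bin, tube and channel.

    stained: channel values of the bin's events in the stained tube.
    control: channel values of the same bin's events in the negative control.
    """
    diff = median(stained) - median(control)
    if diff <= 0:
        # Assumption: the log is undefined here, so signal "no value".
        return None
    return math.log(diff)  # assumption: natural log; the slide gives no base


# Medians are 120 and 20, so this is log(100).
value = bin_expression([110.0, 120.0, 130.0], [15.0, 20.0, 25.0])
```

Bins whose median does not exceed the control median yield `None` here; a real pipeline would need an explicit policy for them, such as a floor value.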

SLIDE 15

Feature Extraction

Results (for each sample)

[Figure: CD45 PerCP (0.0 to 3.0) against Side Scatter (200 to 1000) plot, alongside a heatmap with a colour key and histogram (values roughly 0.4 to 1.2). Heatmap rows are the bins of one sample, labelled sample_1946__<bin number> with bin numbers up to 100. Heatmap columns are the parameters: CD117, SSC, CD13, cytCD3, CD7, CD2, CD5, CD3, CD8, cytLactoferrin, CD61, CD19, CD20, CD10, cytCD79a, cytTdT, CD34, CD14, CD56, cytCD22, CD45, cytMPO, HLA, CD33, CD64, CD4, FSC.]

SLIDE 16

Classification

Classification

SLIDE 17

Classification

Collating Sample Data

◮ Problem: need some common measure
◮ But bins are sample-specific
◮ First tried metaclustering (cluster clusters)
◮ Unsatisfactory, over-merges populations
◮ Solution: voting SVM classifier
◮ Pass all bins to classifier independently, labelled with sample label
◮ Take vote of each sample’s component bins when predicting

SLIDE 18

Classification

More Formally

Training: for sample j with k bins, set

\[ C_{jk} = C_j \]

Prediction:

\[ C_j = \begin{cases} 0 & \text{if } \sum_k P(C_k = 0) > \sum_k P(C_k = 1) \\ 1 & \text{otherwise} \end{cases} \]
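The training and prediction rules can be sketched as follows. The sketch assumes the vote sums per-bin class probabilities, and that those probabilities come from some probabilistic classifier (an SVM in the talk) which is not implemented here; `expand_training_set` and `vote` are hypothetical names.

```python
def expand_training_set(samples):
    """Training: every bin inherits the class label of its sample.

    samples: list of (bin_feature_rows, class_label) pairs, one per sample.
    Returns flat lists of feature rows and labels to feed a bin-level classifier.
    """
    rows, labels = [], []
    for bins, label in samples:
        for features in bins:
            rows.append(features)
            labels.append(label)
    return rows, labels


def vote(bin_probs_class1):
    """Prediction: combine the per-bin probabilities of one sample.

    bin_probs_class1: P(C_k = 1) for each bin k, so P(C_k = 0) = 1 - P(C_k = 1).
    Returns 0 if the summed support for class 0 is strictly larger, else 1.
    """
    support_1 = sum(bin_probs_class1)
    support_0 = sum(1.0 - p for p in bin_probs_class1)
    return 0 if support_0 > support_1 else 1


rows, labels = expand_training_set([([[0.2, 1.1], [0.4, 0.9]], 1), ([[1.5, 0.1]], 0)])
prediction = vote([0.9, 0.8, 0.2])
```

Ties fall to class 1, matching the "otherwise" branch of the rule.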

SLIDE 19

Classification

Results

◮ Works pretty well when there is signal (see Challenge 2)
◮ And this is without any feature selection
◮ But tends to get class bias when no signal (Challenge 1, FLT3-ITD)
◮ Some performance bottlenecks
  ⊲ KNN mapping can take an hour or two for larger N (Challenge 1)
  ⊲ Grid parameterization of SVM under CV also slow for more samples (Challenge 2)

SLIDE 20

Conclusions

Conclusions

SLIDE 21

Conclusions

FlowBin Features

◮ In preparation for Bioconductor
◮ User writes own per-sample pre-processing and loading code
◮ Everything else through to classification is provided
◮ Very close to biologists’ approach
◮ Treats each measured (non-ID) parameter independently

SLIDE 22

Conclusions

In Progress / Near Future

◮ Intertube quality control
◮ Other QC plots / reports (e.g. bin removal)
◮ FlowFP binning option
◮ Feature (FC parameter) selection
◮ Population selection
◮ Extracting relevant populations (at original FCS level)
◮ Nested cross-validation

SLIDE 23

Conclusions

Testing/Refinement

◮ Good value of k for K-means? (100 is arbitrary)
◮ K-means vs flowFP
◮ FlowFP binning option
◮ Empirical measurement of quantile normalization (flowFP)
◮ Tuning of population and feature selection

FlowCAP2 data provides an excellent test bed.
