FlowCAP2 Results: Challenges 1, 2, and 3
Nima Aghaeepour
CIHR/MSFHR Strategic Training Program in Bioinformatics for Health Research,
University of British Columbia
Sep.22.2011
1 / 27
Binary Classification
Goal: evaluate the ability of computational pipelines to find cell populations that discriminate between two classes:
1: HEU vs UE
2: AML vs normal
3a: ENV vs GAG
3b: Responders vs non-responders
Participants identify the cell populations that are different across the two classes.
2 / 27
[Figure: per-sample classification probabilities; legend: UE, HEU]
How does it generalize to previously unseen samples?
3 / 27
Binary Classification
Two classes:
1. HEU vs UE
2. AML vs normal
3. ENV vs GAG
4. Responders vs non-responders
Participants identify the cell populations that are different across the two classes. Results will be tested on independent samples.
4 / 27
True (T), False (F), Positive (P), Negative (N)
TP: an AML case marked as AML by a participant.
FP: a normal case marked as AML by a participant.
FN: an AML case marked as normal by a participant.
TN: a normal case marked as normal by a participant.
Accuracy
Accuracy: (TP + TN) / (TP + TN + FP + FN)
Sensitivity and Specificity
Sensitivity: TP / (TP + FN)
Specificity: TN / (TN + FP)
F-measure
F-measure: 2 ∗ Sensitivity ∗ Specificity / (Sensitivity + Specificity)
Not to be confused with FlowCAP1's F-measure.
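The four metrics above can be computed directly from a confusion matrix. A minimal sketch (the helper name and the example counts are illustrative, not from the actual challenge results):

```python
def challenge_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity, and the FlowCAP2
    F-measure (harmonic mean of sensitivity and specificity, not the
    usual precision/recall F-measure)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_measure = 2 * sensitivity * specificity / (sensitivity + specificity)
    return accuracy, sensitivity, specificity, f_measure

# Hypothetical example: 20 AML cases all detected, one of 340 normals
# misclassified as AML.
acc, sens, spec, f = challenge_metrics(tp=20, tn=339, fp=1, fn=0)
```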
5 / 27
6 / 27
Table 1: Challenge 1: HEU vs UE

Algorithm         Sensitivity  Specificity  Accuracy  F-measure
2DhistsSVM        0.50         0.50         0.50      0.50
flowBin           0.00         0.48         0.45      0.00
flowType          0.58         0.60         0.59      0.59
flowType-FeaLect  0.33         0.38         0.36      0.36
PBSC              0.55         0.55         0.55      0.55
PramSpheres       0.36         0.36         0.36      0.36
SWIFT             0.67         0.62         0.64      0.64

Note: FLOCK has been renamed to PBSC.
7 / 27
HEU vs UE
Random baseline: 0.5. How can some of the methods do worse than random? Have we been able to find anything meaningful?
Cross-validation
Holdout validation (using other time points).
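A generalization estimate like the one discussed here can be sketched as a k-fold cross-validation loop. This is an illustrative stdlib-only sketch, not the FlowCAP submission pipeline; `train_fn` and `predict_fn` are assumed callables standing in for a participant's classifier:

```python
import random

def k_fold_f_measures(samples, labels, train_fn, predict_fn, k=5, seed=0):
    """Estimate generalization by computing the FlowCAP2 F-measure
    (harmonic mean of sensitivity and specificity) on k held-out folds."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test folds
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        tp = tn = fp = fn = 0
        for i in test_idx:
            pred = predict_fn(model, samples[i])  # 0 or 1
            if labels[i]:
                tp += pred
                fn += 1 - pred
            else:
                tn += 1 - pred
                fp += pred
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        scores.append(2 * sens * spec / (sens + spec) if sens + spec else 0.0)
    return scores
```

Holdout validation on other time points is the same loop with a single, fixed test set instead of random folds.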
[Figure: F-measures by algorithm, highest to lowest: SWIFT, flowType, PBSC, 2DhistsSVM, PramSpheres, flowType-FeaLect, flowBin; range 0.0–0.6]
8 / 27
[Figure: per-sample classification probabilities; legend: UE, HEU]
9 / 27
[Figure: per-sample classification probabilities; legend: UE, HEU]
10 / 27
[Figure: misclassification counts per sample; legend: UE, HEU]
11 / 27
12 / 27
AML
Three algorithms achieved perfect classification of 360 patients.
[Figure: F-measures by algorithm: flowPeakssvm, flowType-FeaLect, SPADE, 2DhistsSVM, EMMIXCYTOM, flowType, RandomSpheres, flowBin, PBSC; range 0.70–1.00]
13 / 27
Table 2: Challenge 2: AML
Algorithm         Sensitivity  Specificity  Accuracy  F-measure
2DhistsSVM        1.00         0.99         0.99      1.00
EMMIXCYTOM        0.95         0.99         0.99      0.97
PBSC              0.75         0.97         0.94      0.85
flowBin           1.00         0.92         0.92      0.96
flowPeakssvm      1.00         1.00         1.00      1.00
flowType          0.95         0.99         0.99      0.97
flowType-FeaLect  1.00         1.00         1.00      1.00
RandomSpheres     0.95         0.99         0.99      0.97
SPADE             1.00         1.00         1.00      1.00
14 / 27
[Figure: per-sample classification probabilities; legend: Normal, AML]
15 / 27
[Figure: per-sample classification probabilities; legend: Normal, AML]
16 / 27
[Figure: misclassification counts per sample; legend: normal, aml]
17 / 27
[Figure: FS Lin vs. SS Log scatter plots; gated population frequencies: Normal 0.9%, AML 21%, Outlier 17%]
This dataset can perhaps be analyzed one marker at a time. A potential challenge for FlowCAP3: a dataset in which multiple markers must be used to find a rare cell population.
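The one-marker-at-a-time observation can be made concrete with a simple univariate screen. An illustrative sketch (not any participant's method): score each marker by the best accuracy a single threshold on it can achieve between the two classes.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy achievable by thresholding a single marker,
    trying both orientations (high = positive, low = positive)."""
    best = 0.0
    for t in sorted(set(values)):
        for sign in (1, -1):
            correct = sum(
                1 for v, y in zip(values, labels)
                if (sign * v > sign * t) == bool(y)
            )
            best = max(best, correct / len(values))
    return best

# A marker whose values separate the classes cleanly scores 1.0.
score = best_threshold_accuracy([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

If no single marker scores well but a combination does, the dataset would instead demand the kind of multivariate analysis proposed for FlowCAP3.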
18 / 27
19 / 27
HVTNa
Six perfect classifiers for 40 patients.
[Figure: F-measures by algorithm: flowCore-flowStats, flowType-FeaLect, Kmeanssvm, PRAMS, SPADE, SWIFT, PBSC, PramSpheres, flowType; range 0.75–1.00]
20 / 27
Table 3: Challenge 3: HVTNa
Algorithm          Sensitivity  Specificity  Accuracy  F-measure
PBSC               0.95         0.95         0.95      0.95
flowType           0.88         0.76         0.81      0.82
flowType-FeaLect   1.00         1.00         1.00      1.00
flowCore-flowStats 1.00         1.00         1.00      1.00
Kmeanssvm          1.00         1.00         1.00      1.00
PRAMS              1.00         1.00         1.00      1.00
PramSpheres        0.90         0.90         0.90      0.90
SPADE              1.00         1.00         1.00      1.00
SWIFT              1.00         1.00         1.00      1.00
21 / 27
[Figure: misclassification counts per sample; legend: GAG, ENV]
22 / 27
23 / 27
HVTNb
Maximum F-measure of 0.8 against cytokine responses measured by a human across 80 samples. Could the manual analysis have been wrong?
[Figure: F-measures by algorithm, highest to lowest: flowCore-flowStats, SPADE, SWIFT, PBSC; range 0.0–0.8]
24 / 27
Table 4: Challenge 3: HVTNb
Algorithm          Sensitivity  Specificity  Accuracy  F-measure
PBSC               0.27         0.89         0.81      0.42
flowCore-flowStats 0.79         1.00         0.96      0.88
SPADE              0.67         0.99         0.93      0.80
SWIFT              0.43         0.98         0.83      0.60
25 / 27
[Figure: misclassification counts per sample (1–4); legend: −, +]
26 / 27
FlowCAP CC: Ryan Brinkman, Raphael Gottardo, Tim Mosmann, Richard Scheuermann
UPenn: Wade Rogers
CFRI: Tobias Kollman
FHCRC: Steve De Rosa
UBC: Holger Hoos
FlowCAP Participants
Funding: FlowCAP is supported by an NIH/NIBIB grant (EB008400). The FlowCAP summits are supported by NIH/NIAID.
27 / 27