Phenotyping and Robust Feature Selection for Flow Cytometry Data
Nima Aghaeepour
CIHR/MSFHR Strategic Training Program in Bioinformatics for Health Research,
University of British Columbia
Sep 22, 2011
Introduction
Problem statement
(a) An event is defined as progression to AIDS or initiation of HAART.
[Figure: three Kaplan-Meier plots of event-free proportion versus years from cell sample, each comparing the patients with the lowest and the highest values.
Plot 1: Lowest (371/86%) vs. Highest (59/14%), p < 8.6e-13
Plot 2: Lowest (387/90%) vs. Highest (43/10%), p < 1.8e-06
Plot 3: Lowest (356/83%) vs. Highest (74/17%), p < 4.6e-10]
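The event-free proportion plotted in a Kaplan-Meier curve can be estimated with the product-limit formula. The following is a minimal sketch, not the author's code: the function name `kaplan_meier` and the toy inputs are assumptions made for illustration, and an "event" follows the definition in the footnote above (progression to AIDS or initiation of HAART).

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the event-free proportion.

    times  : follow-up time of each patient
    events : True if the event was observed at that time,
             False if the patient was censored
    Returns a list of (time, event_free_proportion) pairs,
    one per distinct time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy data: four patients, the second one censored at year 2.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Censored patients reduce the number at risk without lowering the curve, which is why the second patient in the toy data leaves no step at year 2.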
1. 3^10 ≈ 60,000 phenotypes
2. Cox Proportional Hazards Regression
3. Log-rank test
4. Multiple testing
5. Sensitivity analysis
6. 101 phenotypes remain statistically significant

#    Phenotype               p-value   p-value CI            adj p-value   CPHR Coef   Cell Freq
1    KI-67+CD8+CD27-         6.4e-07   (1.1e-12, 3.6e-03)    3e-04         35.2        0.00560
2    KI-67+CD8+CD57-         1.1e-06   (2.7e-13, 3.5e-03)    2e-06         28.3        0.00648
3    KI-67+CD45RO+           8.9e-07   (2.1e-14, 2.0e-03)    4e-05         15.4        0.01343
4    KI-67+CD28-CD8-         8.3e-08   (6.9e-14, 1.6e-03)    2e-04         44.2        0.00523
5    KI-67+CD28-CD27-        7.1e-08   (1.5e-13, 3.0e-03)    2e-05         26.3        0.00874
6    KI-67+CD28-             1.9e-07   (3.9e-13, 3.3e-03)    2e-05         18.3        0.01053
7    KI-67+CD28-CD27-CCR7-   3.3e-09   (6.6e-14, 8.6e-04)    4e-04         43.0        0.00647
8    KI-67+CD28-CCR7-        3.3e-09   (3.2e-13, 7.6e-04)    3e-03         37.7        0.00739
9    KI-67+CD57-CD27-CCR7-   1.2e-08   (1.3e-13, 3.4e-03)    1e-03         36.8        0.00762
10   KI-67+CD57-CCR7-        2.7e-08   (5.3e-15, 1.2e-02)    2e-05         26.6        0.01008
...
101  KI-67+CD8+CD27-         6.4e-07   (2.3e-14, 1.1e-02)    2e-02         35.2        0.00560
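The count in step 1 follows from letting each of the 10 markers be positive, negative, or neutral (ignored), which gives 3^10 = 59,049 combinations. The sketch below enumerates them and builds names in the style used in the table. It is illustrative only: the marker order is taken from the figure labels in these slides, while the function names and the use of `None` for a neutral marker are assumptions.

```python
from itertools import product

# Marker order as it appears on the figure axes in these slides.
MARKERS = ["KI-67", "CD28", "CD45RO", "CD8", "CD4",
           "CD57", "CCR5", "CD27", "CCR7", "CD127"]


def phenotype_name(states, markers=MARKERS):
    """Join markers with their state; neutral markers (None) are omitted."""
    return "".join(m + s for m, s in zip(markers, states) if s is not None)


def enumerate_phenotypes(markers=MARKERS):
    """Yield one tuple of per-marker states for every phenotype.

    Each marker is '+', '-', or None (neutral), so there are
    3 ** len(markers) phenotypes, including the all-neutral one.
    """
    return product(("+", "-", None), repeat=len(markers))


phenotypes = list(enumerate_phenotypes())
print(len(phenotypes))  # 59049

example = phenotype_name(("+", None, None, None, "-", None, "+", None, None, "-"))
print(example)  # KI-67+CD4-CCR5+CD127-
```

The all-neutral tuple has an empty name and corresponds to the full, ungated cell population.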
[Figure: heatmap of phenotypes versus phenotypes, with a color key and density plot (value 0.2 to 1). A side panel shows the state of each marker (KI-67, CD28, CD45RO, CD8, CD4, CD57, CCR5, CD27, CCR7, CD127) in each phenotype as Positive, Neutral, or Negative.]
[Figure: bar plots of marker impact for each marker (KI-67, CD28, CD45RO, CD8, CD4, CD57, CCR5, CD27, CCR7, CD127), with bars labeled Positive, Mixed, or Negative.]
#   Phenotype                                    p-value   p-value CI            adjusted p-value   Cell Frequency
1   KI-67+CD4-CCR5+CD127-                        1.7e-10   (0.0e+00, 1.0e-05)    1.7e-08            0.00704
2   CD45RO-CD8+CD4-CD57+CCR5-CD27+CCR7-CD127-    1.2e-07   (0.0e+00, 7.7e-05)    1.3e-05            0.00068
3   CD28-CD45RO+CD4-CD57-CD27-CD127-             6.5e-08   (2.2e-16, 1.9e-05)    6.5e-06            0.02456
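The slides list multiple testing correction as a step but do not name the method behind the adjusted p-values. Purely as an illustration, the sketch below applies a Bonferroni correction, which multiplies each p-value by the number of tests and caps the result at 1. The choice of Bonferroni, the function name, and the example numbers are all assumptions.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * number_of_tests, capped at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values for three tests.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

Bonferroni controls the family-wise error rate and is conservative; other procedures trade some of that strictness for more statistical power.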
[Figure: the phenotype KI-67+CD4-CCR5+CD127- and its progressively simpler sub-phenotypes KI-67+CD4-CD127-, KI-67+CD127-, and KI-67+.]
[Figure: three bar plots, one each for Group 1, Group 2, and Group 3, showing the bootstrap percentage of the phenotypes in that group.]
[Figure: overview of the analysis pipeline, with panels:
(A) Population Identification
(B) Statistical Modeling: Cox Proportional Hazards Regression, Sensitivity Analysis, Multiple Testing Correction
(C) Grouping: heatmap of phenotypes versus phenotypes with color key and density plot; phenotype groups 1, 2, and 3
(D) Marker Selection: marker impact for each marker (Positive, Mixed, Negative)
(E) Marker Elimination: -log10(p-value) and % cell frequency by phenotype name, for three chains:
    KI-67+CD4-CCR5+CD127-, KI-67+CD4-CD127-, KI-67+CD127-, KI-67+
    CD28-CD45RO+CD4-CD57-CD27-CD127-, CD28-CD45RO+CD57-CD27-CD127-, CD28-CD45RO+CD57-CD127-, CD28-CD45RO+CD57-, CD28-CD57-, CD28-
    CD45RO-CD4-CD57+CCR5-CD27+CCR7-CD127-, CD4-CD57+CCR5-CD27+CCR7-CD127-, CD57+CCR5-CD27+CCR7-CD127-, CD57+CD27+CCR7-CD127-, CD57+CD27+CD127-, CD57+CD27+, CD27+
(F) Kaplan-Meier Curves: event-free proportion versus years from cell sample, lowest vs. highest (p < 8.6e-13, p < 1.8e-06, p < 4.6e-10)]
1. Use a mathematical model and bagging to score the ...
2. Use the selected phenotypes to construct a linear model (L1 ...
3. Perform cross-validation.
4. Perform hold-out validation.
5. Label the test set.
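The cross-validation and hold-out steps above could be organized as in the following sketch. It only produces sample indices and is not the author's pipeline: the function name `holdout_and_folds`, the 20% hold-out fraction, the five folds, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def holdout_and_folds(n_samples, holdout_fraction=0.2, k=5, seed=0):
    """Split sample indices into a hold-out set and k cross-validation folds.

    Returns (holdout, splits), where splits is a list of
    (fit_indices, validation_indices) pairs drawn only from the
    samples that are not in the hold-out set.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    n_holdout = int(round(n_samples * holdout_fraction))
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]

    # Deal the remaining samples into k folds of near-equal size.
    folds = [remaining[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((fit, validation))
    return holdout, splits


holdout, splits = holdout_and_folds(100)
```

Keeping the hold-out samples out of every fold means that model selection during cross-validation never sees the data used for the final check.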
[Figure: F-measures (0.75 to 1.00) by algorithm: flowCore-flowStats, flowType-FeaLect, Kmeanssvm, PRAMS, SPADE, SWIFT, PBSC, PramSpheres, flowType.]
[Figure: F-measures (0.70 to 1.00) by algorithm: flowPeakssvm, flowType-FeaLect, SPADE, 2DhistsSVM, EMMIXCYTOM, flowType, RandomSpheres, flowBin, PBSC.]
[Figure: F-measures (0.0 to 0.6) by algorithm: SWIFT, flowType, PBSC, 2DhistsSVM, PramSpheres, flowType-FeaLect, flowBin.]
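The F-measure used to compare the algorithms is the harmonic mean of precision and recall. The sketch below computes it for binary labels. The slides do not state how the labels were encoded or how the scores were aggregated, so the function name, the binary setting, and the example labels are assumptions.

```python
def f_measure(true_labels, predicted_labels, positive=1):
    """F-measure (F1) for one positive class.

    F = 2 * TP / (2 * TP + FP + FN), which equals the harmonic
    mean of precision and recall. Returns 0.0 when there are no
    true positives, false positives, or false negatives at all.
    """
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels: 1 = positive sample, 0 = negative sample.
score = f_measure([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In the toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F-measure is 2/3 as well.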
[Figure: three scatter plots of FS Lin versus SS Log, one each for a Normal sample, an AML sample, and an Outlier.]
BCCA: Ryan Brinkman, Habil Zare, Kieran O’Neill, and Adrin Jalali.
UBC: Holger Hoos
NIH/USMIL: Mario Roederer, Pratip Chattopadhyay, Anuradha Ganesan
Funding: NIAID Intramural Research Program; NIH/NIBIB grant EB008400; an NSERC discovery grant held by HHH; National Cancer Institute; NIH (contract HSN261200800001E); Military Infectious Disease Research Program, US Army Medical Research and Materiel Command; Infectious Disease Clinical Research Program; Uniformed Services University of the Health Sciences.
Computing Resources: Western Canada Research Grid (WestGrid), Compute/Calcul Canada, and Canada’s Michael Smith Genome Sciences Center.