
Automated High-dimensional Cytometric Data Analysis



  1. Automated High-dimensional Cytometric Data Analysis. Philip L. De Jager, M.D., Ph.D. Director, Program in Translational NeuroPsychiatric Genomics, Brigham & Women’s Hospital; Assistant Professor of Neurology, Harvard Medical School

  2. Challenges in cytometric analysis • Large amount of high-dimensional data • Manual data processing (subjective, slow) • Not suitable for high-throughput study • Difficult to use in inferential analysis – “hypothesis limited” • Sub-optimal usage of data dimensions - Increasingly multi-parametric - Restricted visualization Solution: Automated & Multivariate Analysis

  3. FLAME: FLow cytometry analysis with Automated Multivariate Estimation • Clustering – parametric and multivariate mixture modeling of the populations in each flow sample • Meta-clustering – match the corresponding populations from multiple samples to compare features of these matched populations • Feature selection – identify features that distinguish populations between different classes (such as normal vs. disease, wt vs. mutant, longitudinal observations, etc.) • Classification – predict class membership for new samples based on those distinctive features

  4. FLAME summary: 1. Clustering flow data (Sample 1, Sample 2, Sample 3) 2. Meta-clustering 3. Feature selection (class1, class2, class3) • Frequencies • Locations • Means • Modes • Variances • Scales • Orientations • Shapes Downstream analyses: • Visualization • Class Discovery • Class Prediction • Etc.

  5. FLAME Methodology

  6. Concept: Finite Mixture Model. Finite mixture model: weighted sum of g univariate or multivariate densities. [Figure: univariate Gaussian mixture (g=3; means µ1, µ2, µ3; fitted curve is the sum of 3 Gaussians) and bivariate Gaussian mixture (g=2; w1=0.5, w2=0.5)]
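
As a rough illustration of the weighted-sum definition on this slide, the Python sketch below evaluates a univariate Gaussian mixture density. Only the equal weights (0.5 and 0.5) are taken from the slide; the component means and standard deviations are hypothetical values chosen for the example, and the function names are not from FLAME.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of g Gaussian component densities at x."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))


# Two equally weighted components (w1 = w2 = 0.5).
# The means and standard deviations are hypothetical.
weights = [0.5, 0.5]
mus = [0.0, 4.0]
sigmas = [1.0, 1.0]
density_at_2 = mixture_pdf(2.0, weights, mus, sigmas)
```

Because the weights sum to 1 and each component integrates to 1, the mixture is itself a valid probability density.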

  7. Different distributions [Figure: panels comparing distributions, including normal (N) and skew-normal (Skew N)]

  8. Model selection options in FLAME [Figure: distribution panels, including normal (N) and skew-normal (Skew N)]

  9. Step 1: Fitting a distribution • Lymphoblastic cell line

  10. Fitting skew t deals with asymmetry [Figure: density plot of an asymmetric data distribution with Gaussian and skew fits]

  11. Step 2: Meta-clustering 1. Input: individual samples clustered by mixture model 2. Take all samples and pool their cluster locations 3. Algorithm: run Partitioning Around Medoids (PAM) to obtain k meta-clusters 4. Output: matched features used for classification of samples
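
The pooling and PAM steps on this slide can be sketched as follows. This is a minimal illustration, not FLAME's implementation: it assumes Euclidean distance between cluster locations and a greedy initialization, neither of which the slides specify, and the pooled locations are made-up 2-D values rather than study data.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Partitioning Around Medoids: greedy BUILD, then SWAP until no gain."""
    n = len(points)
    medoids = []
    # BUILD: repeatedly add the point that lowers the total cost the most.
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(points, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    # Assign every pooled cluster location to its nearest medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels


# Hypothetical cluster locations pooled from three samples, two clusters each.
pooled = [(1.0, 1.1), (5.0, 5.2),   # sample 1
          (1.2, 0.9), (5.1, 4.9),   # sample 2
          (0.9, 1.0), (4.8, 5.0)]   # sample 3
medoids, labels = pam(pooled, k=2)
```

Clusters from different samples that receive the same label are treated as the same population, which is what allows their features to be compared across samples.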

  12. Example 2: Identifying discriminating features • Experiment: examine ZAP70 and SLP76 phosphorylation events before and after T cell receptor activation in naïve and memory T cells • Lymphocytes stained with four markers: CD4, CD45RA, ZAP70Y292, SLP76Y128 • 60 samples: 30 subjects x two time points, pre- and post-anti-CD3 antibody stimulation

  13. Registering populations across samples [Figure: pre-stimulation samples; post-stimulation samples]

  14. [Figure: panels a to e with CD45RA axes; pre-stimulation (Sample 121106A_0min) and post-stimulation (Sample 121106A_5min)]

  15. Step 3: Discriminating features [Figure: pre-stimulation (zero-minute) vs. post-stimulation (five-minute)]

  16. Discriminating features

      Feature name    Type         Cluster #  Dimension(s)  ∆ mean [five-min]  p-value
      vars11.4        Variance     4          1             -0.156             1.65E-18
      orientation 72  Orientation  5          3             -0.649             1.01E-14
      orientation 56  Orientation  4          3             -0.609             1.13E-12
      vars11.5        Variance     5          1             -0.082             4.00E-08
      orientation 66  Orientation  5          1             -0.515             1.37E-05
      shape 11        Shape        3          3             -0.175             2.62E-08
      scale4          Scale        4          NA            -0.052             3.32E-06
      orientation 19  Orientation  2          1             -0.632             1.34E-06
      shape 8         Shape        2          2             -0.141             4.41E-09
      shape 15        Shape        4          4             -0.178             5.17E-07
      vars41.5        Variance     5          1,4           -0.024             2.63E-05
      orientation 42  Orientation  3          3             -0.422             9.73E-04
      shape 20        Shape        5          4             -0.060             7.93E-05
      scale5          Scale        5          NA            -0.038             7.23E-04
      vars43.3        Variance     3          3,4           -0.020             7.10E-04
      vars31.4        Variance     4          1,3           -0.015             3.34E-03
      vars11.3        Variance     3          1             0.314              6.22E-12
      orientation 52  Orientation  4          1             0.552              1.87E-10
      vars21.2        Variance     2          1,2           0.251              1.22E-10
      vars21.3        Variance     3          1,2           0.259              1.14E-11
      vars21.4        Variance     4          1,2           0.060              3.42E-08
      orientation 20  Orientation  2          1             0.504              2.17E-11
      shape 10        Shape        3          3             0.740              1.31E-09
      shape 7         Shape        2          2             0.682              4.49E-16
      shape 13        Shape        4          4             1.023              4.37E-09
      mus1.4          Mean         4          1             1.761              6.13E-22
      orientation 54  Orientation  4          2             0.534              1.26E-08
      mus1.5          Mean         5          1             1.657              2.47E-21
      vars22.2        Variance     2          2             0.282              5.45E-05
      orientation 59  Orientation  4          3             0.548              1.51E-04
      vars22.3        Variance     3          2             0.146              1.09E-05
      orientation 47  Orientation  3          4             0.561              4.65E-05
      orientation 43  Orientation  3          3             0.066              8.01E-05
      orientation 70  Orientation  5          2             0.308              4.07E-03
      scale3          Scale        3          NA            0.063              2.19E-04
      mus1.2          Mean         2          1             1.571              1.52E-18
      vars11.2        Variance     2          1             0.131              1.62E-04
      vars22.5        Variance     5          2             0.023              2.65E-04
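
The slides do not state which statistical test produced the p-values in this table. Purely as an illustration of how a per-feature change in mean and a p-value could be obtained from paired pre- and post-stimulation samples, the sketch below uses a sign-flip permutation test. The choice of test, the function names, and the feature values are all assumptions, not the authors' method or data.

```python
import random


def delta_mean(pre, post):
    """Mean of the post-stimulation values minus mean of the pre-stimulation values."""
    return sum(post) / len(post) - sum(pre) / len(pre)


def paired_permutation_pvalue(pre, post, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation p-value for paired differences."""
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(pre, post)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, each subject's difference is equally likely to be + or -.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical values of one feature (e.g. a cluster mean) for six subjects.
pre = [0.2, 0.5, 0.1, 0.4, 0.3, 0.6]
post = [1.9, 2.0, 1.8, 2.2, 1.7, 2.4]
change = delta_mean(pre, post)
p_value = paired_permutation_pvalue(pre, post)
```

With 30 subjects, as in the experiment described earlier, a test of this kind could resolve much smaller p-values than it can with the six hypothetical subjects used here.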

  17. Example 3: Identifying a rare cell population. Regulatory T cells make up less than 0.5-1.0% of human peripheral blood mononuclear cells [Figure: Foxp3-PE staining] Baecher-Allan et al., JI, 2006

  18. Stepwise detection of Tregs [Figure: Step 1; Step 2]

  19. Overview [Figure: Operator/QC; FLAME]

  20. FLAME • Automated analysis method • Deconstructs the components of a mixture of cells • Cross-registers cell clusters across samples • Provides a specific record of analysis parameters, allowing exact replication of an analysis by a third party • Operator modes: – Cell population discovery mode – Clinical trial mode

  21. Availability • Free software • Available through the GenePattern toolkit on the Broad Institute website – http://www.broadinstitute.org/cancer/software/genepattern/index.html • GenePattern – an environment with pipelining capabilities and a repertoire of downstream analysis tools • Pyne et al. Proc Natl Acad Sci USA 2009; 106: 8519-8524.

  22. Acknowledgements • De Jager lab – Cristin Aubin – Aaron Brandes – Becky Briskin – Lori Chibnik – Portia Chipendo – Xinli Hu – Linda Ottoboni – Nikolaos Patsopoulos – Joshua Shulman – Dong Tran – Irene Wood – Zongqi Xia • Jill Mesirov – Saumyadipta Pyne – Pablo Tamayo • Geoff McLachlan • Kui Wang • David Hafler – Clare Baecher-Allan – Lisa Maier Funding Sources • National MS Society • NIH: NIA, NINDS

  23. Illustrative Examples

  24. Example 3: Feature selection – Phosphorylation of naïve & memory T cells pre- and post-stimulation. 4-dimensional samples; mixture modeling (ZAP70Y292 not shown) [Figure axes: CD45RA, CD4]

  25. Phosphorylation causes feature alterations in populations [Figure: 0 min. (pre-stimulation) vs. 5 min. (post-stimulation)]

  26. Matching pre- and post-stimulation populations [Figure: pre-stimulation; post-stimulation]

  27. Matching pre- and post-stimulation populations across all samples

  28. Feature Selection Heatmap [Figure: zero-minute vs. five-minute samples]

  29. Data QC/standardization • Carefully selected panels • Minimal cross-sample variation

  30. FLow analysis with Automated Multivariate Estimation 10/01/2008

  31. Low dimension t-mixture. Outliers? Low dimension clustering is not good enough

  32. Multivariate t-mixture is better. Symmetric density is often not good enough

  33. Modeling with skewed distributions. Better fit with skew [Figure: skew-normal (Skew N) distribution] Photo courtesy: Azzalini J.M. et al. Statistical applications of the multivariate skew-normal distribution, 1999.
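
To make the idea of a skewed component concrete, here is a small sketch of the univariate skew-normal density in its standard form, 2/ω · φ(z) · Φ(αz) with z = (x − ξ)/ω. The slide refers to the multivariate case and FLAME also offers skew t components, so this one-dimensional version is a simplification; the parameter names `xi`, `omega`, and `alpha` are the conventional location, scale, and shape symbols, not names taken from the slides.

```python
import math


def skew_normal_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Univariate skew-normal density with location xi, scale omega, shape alpha."""
    z = (x - xi) / omega
    # Standard normal density at z.
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # Standard normal CDF at alpha * z, written with the error function.
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / omega * phi * big_phi


# alpha = 0 recovers the ordinary (symmetric) normal density.
symmetric_peak = skew_normal_pdf(0.0)
# A positive alpha shifts mass to the right of the location parameter.
right_of_center = skew_normal_pdf(1.0, alpha=3.0)
left_of_center = skew_normal_pdf(-1.0, alpha=3.0)
```

The single extra shape parameter is what lets a component follow an asymmetric population that a symmetric density would fit poorly.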

  34. Parametric mixture modeling • A biological population is assumed to follow a mathematical distribution, such as Gaussian • Each population can be abstracted as a cluster, described by parameters such as mean, mode, standard deviation, and skew • A mixture of populations can be abstracted as a mixture of distributions
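
The slides do not describe how the mixture parameters are estimated. Expectation-maximization (EM) is a standard way to fit mixture models, so the sketch below assumes it and fits a two-component univariate Gaussian mixture to simulated data. The data, starting values, and function names are hypothetical, and the real method is multivariate and supports t and skew components.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(data, init_weights, init_mus, init_sigmas, n_iter=100):
    """Fit a univariate Gaussian mixture by EM from the given starting values."""
    weights, mus, sigmas = list(init_weights), list(init_mus), list(init_sigmas)
    g = len(mus)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(g)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(g):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)  # guard against collapse
    return weights, mus, sigmas


# Simulated "sample": two populations of 300 cells each (hypothetical values).
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(5.0, 1.0) for _ in range(300)])
weights, mus, sigmas = em_gaussian_mixture(data, [0.5, 0.5], [-1.0, 6.0], [1.0, 1.0])
```

Each fitted component then acts as the abstraction of one population: its weight gives the population frequency, and its mean and standard deviation describe location and spread.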

  35. Modeling with Gaussian. Gaussian may be too “skinny” to capture the entire population
