Constrained Mixture Estimation for Analysis and Robust Classification of Clinical Time Series

  1. Constrained Mixture Estimation for Analysis and Robust Classification of Clinical Time Series • Alexander Schönhuth (joint work with Ivan Costa, Christoph Hafemeister and Alexander Schliep) • Lab for Mathematical and Computational Biology, Department of Mathematics, UC Berkeley

  2. Multiple Sclerosis (MS) • Autoimmune disease – leads to neuronal disability – multiple genetic causes – Prevalence: 266,000 (U.S.) • Treatment with IFN β – stops disease progression – works only for half of the patients

  3. Personalized Medicine • Treatment selection according to patient genetics • Machine learning methods to classify response to treatments • Challenges: – dimensionality: more features (genes) than observations (patients) – gene expression: noise and missing data – patient classification: subjective and error-prone

  4. Treatment Response Classification • Clinical Time Series (Baranzini et al., 2005) – 52 MS patients after IFN β treatment – Good and bad responders – Expression of 70 genes over 7 time points • Classification method (IBIS) – uses only first time point – 75% accuracy • Reference: Baranzini, S. E. et al. (2005). Transcription-based prediction of response to IFNβ using supervised computational methods. PLoS Biol, 3, e2.

  5. Caveats • Temporal information is relevant – patients have individual response times (Lin et al., 2008) • MS has multiple genetic causes – response groups may display heterogeneous expression patterns • Expert classification can be wrong • Reference: Lin, T. H. et al. (2008). Alignment and classification of time series gene expression in clinical studies. Bioinformatics, 24(13), i147–i155.

  6. Our Approach • Mixture Model based classification – Mixture Estimation with constraints (semi-supervised) • explore sub-groups within classes • robustness to wrong labels • Models: linear HMMs – align time courses with respect to patient response time – support missing value handling and robust w.r.t. noise

  8. Patient Response Classification [Figure: patients plotted by expression of two genes; axes: Gene 1, Gene 2; legend: good responder, bad responder, unknown]

  9. Patient Response Classification [Figure: two panels, each with axes Gene 1 and Gene 2; legend: good responder, bad responder, unknown]

  11. Mixture Estimation with Constraints

  12. Mixture Estimation with Constraints negative constraints
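
A rough sketch of how negative constraints could enter the E-step of mixture estimation. The slides do not spell out the update rule, so everything here is an assumption made for illustration: the penalized, mean-field style update, the function names `constrained_posteriors` and `_normalize`, and the `penalty` and `sweeps` parameters. The idea shown is only that two samples linked by a negative constraint are discouraged from sharing a component.

```python
import math


def _normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def constrained_posteriors(weighted_lik, neg_pairs, penalty=2.0, sweeps=20):
    """Approximate component posteriors under negative constraints.

    weighted_lik[i][c] is pi_c * p(x_i | component c), i.e. the usual
    unnormalized E-step quantity. neg_pairs lists (i, j) pairs that should
    NOT end up in the same component. penalty and sweeps are hypothetical
    tuning knobs for this sketch.
    """
    n = len(weighted_lik)
    k = len(weighted_lik[0])

    # Start from the ordinary, unconstrained posteriors.
    post = [_normalize(row) for row in weighted_lik]

    # Index the constraint partners of every sample.
    partners = {i: [] for i in range(n)}
    for i, j in neg_pairs:
        partners[i].append(j)
        partners[j].append(i)

    for _ in range(sweeps):
        for i in range(n):
            scores = []
            for c in range(k):
                # Expected number of constraint partners sitting in c.
                clash = sum(post[j][c] for j in partners[i])
                scores.append(weighted_lik[i][c] * math.exp(-penalty * clash))
            post[i] = _normalize(scores)
    return post


# Toy input (made up): sample 0 clearly prefers component 0,
# sample 1 is ambiguous, and the two must not share a component.
toy = constrained_posteriors([[0.9, 0.1], [0.5, 0.5]], neg_pairs=[(0, 1)])
```

With the constraint in place, the ambiguous sample is pushed toward the component that its partner does not occupy. Setting `penalty=0` recovers the unconstrained posteriors.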

  17. Our Approach • Mixture Model based classification – Mixture Estimation with constraints (semi-supervised) • explore sub-groups within classes • robustness to wrong labels • Models: linear HMMs – align time courses with respect to patient response time – mixtures as emissions: support missing value handling and robust w.r.t. noise
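
A minimal sketch of a linear HMM, assuming univariate Gaussian emissions and a fixed self-transition probability. Each state can only stay put or advance to the next one, so patients who respond quickly or slowly pass through the same states at different speeds. Missing observations are handled here by simply skipping the emission term, which is a simplification swapped in for this example and not the mixture-emission scheme named on the slide. The names `linear_transitions`, `gaussian_pdf`, `log_likelihood`, and all parameter values are hypothetical.

```python
import math


def linear_transitions(n_states, stay=0.5):
    """Transition matrix where state s can only stay or move to s + 1."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states - 1):
        A[s][s] = stay
        A[s][s + 1] = 1.0 - stay
    A[-1][-1] = 1.0  # the last state is absorbing
    return A


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(series, A, means, sds):
    """Scaled forward algorithm; None marks a missing time point."""
    n = len(A)
    alpha = [1.0] + [0.0] * (n - 1)  # always start in the first state
    loglik = 0.0
    for t, x in enumerate(series):
        if t > 0:
            # Propagate the state distribution one step along the chain.
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) for s in range(n)]
        if x is not None:
            # Weight by the emission density; skipped for missing values.
            alpha = [alpha[s] * gaussian_pdf(x, means[s], sds[s]) for s in range(n)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Toy model (made-up numbers): 4 states, one expression level per state.
A = linear_transitions(4)
means, sds = [0.0, 1.0, 2.0, 3.0], [0.5, 0.5, 0.5, 0.5]
slow = log_likelihood([0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0], A, means, sds)
gaps = log_likelihood([0.0, None, 1.0, 2.0, None, 3.0, 3.0], A, means, sds)
```

In a classifier built along these lines, one such model per class or sub-class would score each patient's time course, and the scores would feed the mixture posteriors.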

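  19. Robustness to Wrong Labels [Figure: patients by class; legend: good responder, bad responder, unknown]

  20. Robustness to Wrong Labels [Figure: as before, with one patient marked "Potentially mislabelled"; legend: good responder, bad responder, unknown]

  21. Robustness to Wrong Labels [Figure: as before, with the potentially mislabelled patient shown as "misclassified"; legend: good responder, bad responder, unknown]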

  22. Experiments • Comparison with – IBIS (Baranzini et al., 2005) – SVM Kalman (Borgwardt et al., 2006) – HMM Discriminant Learning (Lin et al., 2008) • Experiments – 5 × 4-fold cross-validation – linear HMM with 4 states – feature selection and number of sub-classes chosen based on training error • Reference: Borgwardt, K. M. et al. (2006). Class prediction from time series gene expression profiles using dynamical systems kernel. Pacific Symposium on Biocomputing, 11, 547–558.
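
The 5 × 4-fold protocol can be sketched as a generator of train/test index splits. This is an illustrative assumption about how the splits might be drawn: the function name `repeated_kfold` and the `seed` parameter are hypothetical, and the folds are not stratified by responder class, which the slides do not specify either way.

```python
import random


def repeated_kfold(n_samples, n_folds=4, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        # Reshuffle the patients for every repeat.
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices into n_folds groups of near-equal size.
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = sorted(folds[f])
            train = sorted(i for g in range(n_folds) if g != f for i in folds[g])
            yield rep, f, train, test


# 52 patients, as in the data set described on slide 4.
splits = list(repeated_kfold(52))
```

Each repeat tests every patient exactly once, so 5 repeats give 5 predictions per patient, which is what the consensus analysis later in the deck builds on.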

  23. Results

      Method         Genes   Test Acc.
      IBIS               3   75.00%
      HMM Disc           7   85.00%
      SVM Kal.          70   87.80%
      HMM Const. 2      17   89.62%*
      HMM Const. 3      17   90.39%*

      *Significantly higher than other methods (paired t-test)
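
For readers unfamiliar with the paired t-test mentioned in the footnote, here is a small sketch of the test statistic, computed over matched accuracy values from two methods (for example, one value per cross-validation run). The function name `paired_t_statistic` is hypothetical and the inputs below are placeholder numbers, not results from the study. Turning the statistic into a p-value additionally requires the t distribution with the returned degrees of freedom.

```python
import math


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for matched samples acc_a, acc_b."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(var / n), n - 1


# Placeholder inputs, chosen only to make the arithmetic easy to follow.
t, df = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```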

  24. Results - Consensus Analysis • All 5 × 4-fold classifications – HMM Const. 3 [Figure: consensus matrix; color scale: % co-classification] • Reference: Monti, S. et al. (2003). Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1-2), 91–118.

  26. Results - Consensus Analysis • All 5 × 4-fold classifications – HMM Const. 3 [Figure: consensus matrix with two highlighted blocks labelled sub-group1 and sub-group2; color scale: % co-classification] • Reference: Monti, S. et al. (2003). Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1-2), 91–118.
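
A co-classification matrix of the kind plotted in the consensus analysis can be sketched as follows, assuming each cross-validation repeat yields one predicted label per patient. The function name `co_classification` and the toy labels are hypothetical; how the original analysis ordered or clustered the matrix is not shown here.

```python
def co_classification(runs):
    """Fraction of runs in which each pair of patients shares a label.

    runs is a list of label lists, one list per repeat, with all lists
    indexed by the same patient order.
    """
    n = len(runs[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same = sum(1 for labels in runs if labels[i] == labels[j])
            matrix[i][j] = same / len(runs)
    return matrix


# Toy labels (made up) for 3 patients over 3 repeats.
consensus = co_classification([[0, 0, 1], [0, 1, 1], [0, 0, 1]])
```

Because labels are only compared within a run, the result does not depend on how components are numbered from one run to the next. Blocks of high co-classification along the diagonal, after reordering, are what would suggest stable sub-groups.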

  27. Results – Selected Genes

  34. Conclusion • Increase in classification accuracy – robustness to mislabeled patients – detection of sub-classes • MS Treatment Classification – mislabeled sample was confirmed – sub-classes of good responders can have clinical implications – selected relevant MS genes as features

  35. Acknowledgements • Benjamin Georgi (Max Planck Institute for Molecular Genetics) • Katrin Höfl, Peter van den Elzen (Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver) • Sergio Baranzini (Department of Neurology, UCSF) • Software: – GHMM – www.ghmm.org – PyMix – algorithmics.molgen.mpg.de – GQL – www.ghmm.org/gql (soon) • Funding: – PIMS Fellowship – CAPES (Prodoc Fellowship) – FACEPE
