SLIDE 1

Constrained Mixture Estimation for Analysis and Robust Classification of Clinical Time Series

Alexander Schönhuth

(joint work with Ivan Costa, Christoph Hafemeister and Alexander Schliep)

Lab for Mathematical and Computational Biology, Department of Mathematics, UC Berkeley

SLIDE 2

Multiple Sclerosis (MS)

  • Autoimmune disease

– leads to neuronal disability
– multiple genetic causes
– Prevalence: 266,000 (U.S.)

  • Treatment with IFNβ

– stops disease progression
– works only for half of the patients

SLIDE 3

Personalized Medicine

  • Treatment selection according to patient genetics
  • Machine learning methods to classify response to treatments
  • Challenges:

– dimensionality: more features (genes) than observations (patients)
– gene expression: noise and missing data
– patient classification: subjective and error-prone

SLIDE 4

Treatment Response Classification

  • Clinical Time Series (Baranzini et al., 2005)

– 52 MS patients after IFNβ treatment
– Good and bad responders
– Expression of 70 genes over 7 time points

  • Classification method (IBIS)

– uses only the first time point
– 75% accuracy

Baranzini, S. E. et al. (2005). Transcription-based prediction of response to IFNβ using supervised computational methods. PLoS Biology, 3, e2.

SLIDE 5

Caveats

  • Temporal information relevant

– patients have individual response times (Lin et al., 2008)

  • MS has multiple genetic causes

– response groups may display heterogeneous expression patterns

  • Expert classification can be wrong

Lin, T. H. et al. (2008). Alignment and classification of time series gene expression in clinical studies. Bioinformatics, 24(13), i147–i155.

SLIDE 6

Our Approach

  • Mixture Model based classification

– Mixture Estimation with constraints (semi-supervised)

  • explore sub-groups within classes
  • robustness to wrong labels
  • Models: linear HMMs

– align time courses with respect to patient response time
– support missing-value handling and are robust w.r.t. noise
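
The idea of a linear HMM aligning time courses can be illustrated with a small sketch. It assumes a left-to-right chain in which each state either repeats or advances, one Gaussian emission per state, and missing observations (None) that are simply skipped. The function names, the `stay` probability and the Gaussian emissions are illustrative assumptions, not the models used in the talk.

```python
import math

def linear_transitions(n_states, stay=0.5):
    """Left-to-right chain: each state either repeats or moves to the next one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states - 1):
        A[s][s] = stay
        A[s][s + 1] = 1.0 - stay
    A[-1][-1] = 1.0  # the last state is absorbing
    return A

def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(series, A, means, sds):
    """Scaled forward algorithm; a missing value (None) contributes a factor of 1."""
    n = len(A)
    alpha = [0.0] * n
    alpha[0] = 1.0  # assumption: every time course starts in the first state
    loglik = 0.0
    for t, x in enumerate(series):
        if t > 0:  # propagate state probabilities one step along the chain
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) for s in range(n)]
        if x is not None:  # weight by the emission density of the observed value
            alpha = [alpha[s] * gaussian_pdf(x, means[s], sds[s]) for s in range(n)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik
```

Because a state may repeat for several time points, a patient who responds early and one who responds late can both follow the same state sequence, only with different dwell times.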

SLIDE 8

Patient Response Classification

[Figure: axes Gene 1 and Gene 2; legend: good responder, bad responder, unknown]

SLIDE 9

Patient Response Classification

[Figure, two panels: axes Gene 1 and Gene 2 in each; legend: good responder, bad responder, unknown]

SLIDE 11

Mixture Estimation with Constraints

SLIDE 12

Mixture Estimation with Constraints

[Figure label: negative constraints]
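
The slides do not spell out how negative constraints enter the estimation. The sketch below shows one possible way, assuming a mean-field style penalty: the posterior of a sample for a component is down-weighted when a sample it must not be grouped with already favours that component. The function name, the `penalty` parameter and the update rule are assumptions made for illustration.

```python
import math

def constrained_posteriors(lik, weights, neg_pairs, penalty=2.0, sweeps=20):
    """E-step sketch with negative (cannot-link) constraints.

    lik[i][k]  : likelihood of sample i under component k
    weights[k] : mixture weight of component k
    neg_pairs  : (i, j) index pairs that should not share a component
    """
    n, n_comp = len(lik), len(weights)
    # Start from the ordinary, unconstrained posteriors.
    post = []
    for i in range(n):
        row = [weights[k] * lik[i][k] for k in range(n_comp)]
        total = sum(row)
        post.append([v / total for v in row])
    partners = {i: [] for i in range(n)}
    for i, j in neg_pairs:
        partners[i].append(j)
        partners[j].append(i)
    # Repeatedly re-weight each sample given its partners' current posteriors.
    for _ in range(sweeps):
        for i in range(n):
            row = []
            for k in range(n_comp):
                clash = sum(post[j][k] for j in partners[i])
                row.append(weights[k] * lik[i][k] * math.exp(-penalty * clash))
            total = sum(row)
            post[i] = [v / total for v in row]
    return post
```

With `penalty` set to 0, or with no constraint pairs, this reduces to the usual mixture E-step. A finite penalty keeps the constraints soft, so strong evidence in the data can still override a label.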

SLIDE 17

Our Approach

  • Mixture Model based classification

– Mixture Estimation with constraints (semi-supervised)

  • explore sub-groups within classes
  • robustness to wrong labels
  • Models: linear HMMs

– align time courses with respect to patient response time
– mixtures as emissions: support missing-value handling and robustness w.r.t. noise

SLIDE 19

Robustness to Wrong Labels

[Figure legend: good responder, bad responder, unknown]

SLIDE 20

Robustness to Wrong Labels

[Figure legend: good responder, bad responder, unknown; annotation: potentially mislabelled]

SLIDE 21

Robustness to Wrong Labels

[Figure legend: good responder, bad responder, unknown; annotations: potentially mislabelled, “misclassified”]
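
One way to surface potentially mislabelled patients, sketched under assumptions: each mixture component is mapped to a response class (several components per class allow sub-groups), the posterior mass is summed per class, and a patient is flagged when the best class disagrees with the expert label. The function name and the data layout are hypothetical; the talk does not state its exact criterion.

```python
def flag_label_disagreements(posteriors, component_class, expert_labels):
    """Return (index, expert_label, model_class) for labelled samples that disagree.

    posteriors[i][k]   : posterior of sample i for component k
    component_class[k] : response class that component k belongs to
    expert_labels[i]   : expert label, or None if the response is unknown
    """
    flagged = []
    for i, (post, label) in enumerate(zip(posteriors, expert_labels)):
        if label is None:
            continue  # unlabelled samples cannot disagree with anything
        class_mass = {}
        for k, p in enumerate(post):
            cls = component_class[k]
            class_mass[cls] = class_mass.get(cls, 0.0) + p
        model_class = max(class_mass, key=class_mass.get)
        if model_class != label:
            flagged.append((i, label, model_class))
    return flagged
```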

SLIDE 22

Experiments

  • Comparison with

– IBIS (Baranzini et al., 2005)
– SVM Kalman (Borgwardt et al., 2006)
– HMM Discriminant Learning (Lin et al., 2008)

  • Experiments

– 5 times 4-fold cross-validation
– linear HMM with 4 states
– feature selection and number of sub-classes chosen based on training error
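
The "5 times 4-fold" protocol can be sketched as follows. The sketch assumes the folds are stratified by response class and reshuffled in every repetition; the slides do not say whether stratification was used, and the function name is hypothetical.

```python
import random

def repeated_stratified_folds(labels, k=4, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for r in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0
        for y in sorted(by_class):
            idx = by_class[y][:]
            rng.shuffle(idx)  # new random assignment in every repetition
            for pos, i in enumerate(idx):
                # deal samples of one class round-robin across the folds
                folds[(pos + offset) % k].append(i)
            offset += len(idx)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, f, train, test
```

Model choices such as the selected genes or the number of sub-classes would be made on `train_idx` only, so that `test_idx` stays untouched until the final accuracy is measured.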

Borgwardt, K. M., et al. (2006). Class prediction from time series gene expression profiles using dynamical systems kernel. Pacific Symposium on Biocomputing, 11, 547–558.

SLIDE 23

Results

Method          Genes   Test Acc.
IBIS            3       75.00%
HMM Disc        7       85.00%
SVM Kal.        70      87.80%
HMM Const. 2    17      89.62%*
HMM Const. 3    17      90.39%*

*Significantly higher than other methods (paired t-test)
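
For reference, a paired t-test compares two methods on the same splits by testing whether the mean of the per-split accuracy differences is zero. The sketch below computes the test statistic and its degrees of freedom. The function name is hypothetical, and the per-split accuracies behind the table are not given in the slides, so no values from the talk are used here.

```python
import math

def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for paired accuracies of two methods."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # unbiased sample variance of the differences
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(var / n), n - 1
```

The statistic is then compared against a Student t distribution with the returned degrees of freedom to obtain a p-value.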

SLIDE 24

Results - Consensus Analysis

All 5 x 4-fold classifications – HMM Const. 3

Monti, S. et al. (2003). Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1-2), 91–118.

[Figure: scale in % co-classification]

SLIDE 26

Results - Consensus Analysis

All 5 x 4-fold classifications – HMM Const. 3

Monti, S. et al. (2003). Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1-2), 91–118.

[Figure: labels sub-group 1 and sub-group 2; scale in % co-classification]
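
A co-classification (consensus) matrix in the sense of Monti et al. records, for each pair of patients, the fraction of runs in which both were classified and ended up in the same group. The sketch below assumes each run is given as a dictionary from patient index to assigned group; this layout and the function name are illustrative choices.

```python
def co_classification(runs, n_samples):
    """Fraction of runs in which samples i and j were assigned to the same group.

    runs: list of dicts {sample_index: group}, one per classification run.
    Pairs that never appear together in a run get NaN.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    counted = [[0] * n_samples for _ in range(n_samples)]
    for assignment in runs:
        items = list(assignment.items())
        for i, group_i in items:
            for j, group_j in items:
                counted[i][j] += 1
                if group_i == group_j:
                    together[i][j] += 1
    return [
        [together[i][j] / counted[i][j] if counted[i][j] else float("nan")
         for j in range(n_samples)]
        for i in range(n_samples)
    ]
```

Blocks of high values along the diagonal, after reordering the patients, indicate groups that are assigned together consistently across runs.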

SLIDE 27

Results – Selected Genes

SLIDE 34

Conclusion

  • Increase in classification accuracy

– robustness to mislabeled patients
– detection of sub-classes

  • MS Treatment Classification

– mislabeled sample was confirmed
– sub-classes of good responders can have clinical implications
– selected relevant MS genes as features

SLIDE 35

Acknowledgements

  • Benjamin Georgi

Max Planck Institute for Molecular Genetics

  • Katrin Höfl, Peter van den Elzen

Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver

  • Sergio Baranzini

Department of Neurology, UCSF

Software:

  • GHMM – www.ghmm.org
  • PyMix – algorithmics.molgen.mpg.de
  • GQL – www.ghmm.org/gql (soon)

Funding:

  • PIMS Fellowship
  • CAPES (Prodoc Fellowship)
  • FACEPE