Use of Microarray Data via Model-Based Classification in the Study and Prediction of Survival from Lung Cancer

Liat Jones*, Angus Ng*, Chris Ambroise**, Katrina Monico* and Geoff McLachlan*

* Institute for Molecular Bioscience, University of Queensland
** Laboratoire Heudiasyc

AIM: To link gene-expression data with survival from lung cancer.

A. CLUSTER ANALYSIS: We apply a model-based clustering approach to classify tumor tissues on the basis of microarray gene expression.
B. SURVIVAL ANALYSIS: The association between the clusters so formed and patient survival (recurrence) times is established.
C. DISCRIMINANT ANALYSIS: We demonstrate the potential of the clustering-based prognosis as a predictor of the outcome of disease.

STANFORD and ONTARIO DATASETS: cDNA microarrays were used to obtain gene expression profiles for the tissue (tumor) samples.

STANFORD: 918 genes
ONTARIO: 2880 genes

The Stanford Dataset contains relatively more adenocarcinoma (AC) samples, and the Ontario Dataset contains only non-small cell lung carcinomas (NSCLC).

Tumor Types in Stanford and Ontario Datasets

                    Number of Samples
Tumor Type          Stanford    Ontario
Adenocarcinoma      41          19
Squamous cell       16          14
Large cell          5           4
Adenosquamous       0           1
Carcinoid           0           1
Small Cell          5           0
TOTAL               67          39

Heat Map for 2880 Ontario Genes (39 Tissues). Axes: genes by tissues.

CLUSTERING OF ONTARIO TUMORS Using EMMIX-GENE

Steps used in the application of EMMIX-GENE:
• Select the most relevant genes from this filtered set of 2,880 genes. The set of retained genes is thus reduced to 766.
• Cluster these 766 genes into twenty groups. The majority of gene groups produced were reasonably cohesive and distinct.
• Using these twenty group means, cluster the tissue samples into two groups using a mixture of normal components/factor analyzers.
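The "group means" used in the last step can be illustrated with a minimal sketch. It assumes that a metagene is simply the tissue-by-tissue average of the genes in one group; the function name, gene labels and values below are hypothetical, and this is not the EMMIX-GENE implementation.

```python
from statistics import mean


def metagene_profiles(expression, gene_groups):
    """Average the genes in each group, tissue by tissue.

    expression  : dict mapping gene name -> list of log expression values,
                  one value per tissue (hypothetical layout)
    gene_groups : list of lists of gene names, one inner list per gene group
    Returns one profile (list over tissues) per gene group.
    """
    n_tissues = len(next(iter(expression.values())))
    profiles = []
    for group in gene_groups:
        profile = [mean(expression[g][t] for g in group) for t in range(n_tissues)]
        profiles.append(profile)
    return profiles


# Toy data: three genes measured on three tissues (made-up values).
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [3.0, 2.0, 1.0],
    "g3": [0.0, 0.0, 6.0],
}
groups = [["g1", "g2"], ["g3"]]
metagenes = metagene_profiles(expression, groups)
print(metagenes)
```

Clustering the tissues on a few metagene profiles rather than on hundreds of individual genes keeps the dimension of the mixture model small relative to the 39 tissues.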

Heat Maps for the 20 Ontario Gene-Groups (39 Tissues). Axes: genes by tissues. Tissues are ordered as: Recurrence (1-24) and Censored (25-39).

Expression Profiles for Useful Metagenes (Ontario 39 Tissues). Panels: Gene Group 1, Gene Group 2, Gene Group 19, Gene Group 20. Axes: log expression value by tissue, with tissues ordered as Recurrence (1-24) and Censored (25-39). Legend: Our Tissue Cluster 1, Our Tissue Cluster 2.

Expression Profiles of some Genes Identified in Ontario. Panels: Cluster A (down Rec, up Censored); Clusters B and C (up Rec, down Censored). Genes shown: PNUTL1, ATM, FUS, HIF1A, Wee1, RABIF. Axes: log expression value by tissue, with tissues ordered as Recurrence (1-24) and Censored (25-39).

Only ZNF136 is retained by us and also identified in Ontario. It is found in our Group 19 (up-regulated in recurrence). Plot axes: log expression value by tissue, with tissues ordered as Recurrence (1-24) and Censored (25-39).

Tissue Clusters

Tumors 1-24 belong to RECURRENCE group
Tumors 25-39 are CENSORED

CLUSTER ANALYSIS via EMMIX-GENE of 20 METAGENES yields TWO CLUSTERS:
CLUSTER 1: 1-14, 16-24 (recurrence) plus 25-29, 33, 36, 38 (censored)
CLUSTER 2: 15 (recurrence) plus 30-32, 34, 35, 37, 39 (censored)

SURVIVAL ANALYSIS: LONG-TERM SURVIVOR (LTS) MODEL

$$S(t) = \mathrm{pr}\{T > t\} = \pi_1 S_1(t) + \pi_2,$$

where $T$ is time to recurrence and $\pi_1 = 1 - \pi_2$ is the prior prob. of recurrence. Adopt a Weibull model for the survival function for recurrence $S_1(t)$.
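A minimal sketch of the long-term survivor curve follows. It assumes the Weibull parametrization exp(-alpha * t^beta) for the recurrence component; the parameter values are hypothetical and are not fitted to the Ontario data.

```python
import math


def weibull_survival(t, alpha, beta):
    """Weibull survival function, assumed form exp(-alpha * t**beta)."""
    return math.exp(-alpha * t ** beta)


def lts_survival(t, pi1, alpha, beta):
    """Long-term survivor model: pi1 * S1(t) + pi2, with pi2 = 1 - pi1.

    pi1 is the prior probability of recurrence; the remaining proportion
    pi2 never recurs, so the curve levels off at pi2 instead of at zero.
    """
    pi2 = 1.0 - pi1
    return pi1 * weibull_survival(t, alpha, beta) + pi2


# Hypothetical parameter values, for illustration only.
pi1, alpha, beta = 0.6, 0.001, 1.2
for t in (0, 365, 730, 1460):
    print(t, round(lts_survival(t, pi1, alpha, beta), 3))
```

The plateau at pi2 is what separates this model from a single Weibull curve, which always decays to zero.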

Fitted LTS Model vs. Kaplan-Meier

PCA of Tissues Based on Metagenes. Axes: first PC, second PC.

PCA of Tissues Based on All Genes (via SVD). Axes: first PC, second PC.

Cluster-Specific Kaplan-Meier Plots

Survival Analysis for Ontario Dataset

• Nonparametric analysis:

Cluster    No. of Tissues    No. of Censored    Mean time to Failure (± SE)
1          29                8                  665 ± 85.9
2          8                 7                  1388 ± 155.7

A significant difference between Kaplan-Meier estimates for the two clusters (P = 0.027).

• Cox’s proportional hazards analysis:

Variable                      Hazard ratio (95% CI)    P-value
Cluster 1 vs. Cluster 2       6.78 (0.9 – 51.5)        0.06
Tumor stage (I vs. II&III)    1.07 (0.57 – 2.0)        0.83
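The Kaplan-Meier estimate behind the nonparametric comparison can be sketched in a few lines. This is a generic product-limit estimator applied to made-up times and censoring flags; it does not use the Ontario data, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times  : follow-up time for each tissue
    events : True if recurrence was observed at that time, False if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored cases both leave the risk set after time t.
        at_risk -= n_leaving
    return curve


# Toy data (made up): five tissues, two of them censored.
times = [5, 8, 8, 12, 15]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
print(curve)
```

Running the estimator separately on the tissues of each cluster gives cluster-specific curves of the kind compared above.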

Discriminant Analysis (Supervised Classification)

A prognosis classifier was developed to predict the class of origin of a tumor tissue with a small error rate after correction for the selection bias. A support vector machine (SVM) was adopted to identify the genes that play a key role in predicting the clinical outcome, using all the genes and also using the metagenes. A cross-validation (CV) procedure was used to estimate the prediction error, again after correction for the selection bias.
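One common way to correct for selection bias is to repeat the gene selection inside every cross-validation fold, so that held-out tissues never influence which genes are chosen. The sketch below illustrates that structure only. As stand-ins (assumptions, not the method on the slides) it ranks genes by the difference in class means instead of SVM-RFE and classifies with a nearest-centroid rule instead of an SVM; all names and the simulated data are hypothetical.

```python
import random
from statistics import mean


def top_genes(X, y, k):
    """Rank genes by absolute difference in class means; return the k best indices."""
    def score(j):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def centroid_predict(X_train, y_train, x, genes):
    """Nearest-centroid rule on the selected genes (stand-in for the SVM)."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [mean(r[j] for r in rows) for j in genes]
        dist = sum((x[j] - c) ** 2 for j, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_error(X, y, k_genes, n_folds=10, seed=0):
    """Cross-validated error with gene selection redone in each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Selection uses the training tissues only: this is the bias correction.
        genes = top_genes(X_tr, y_tr, k_genes)
        errors += sum(centroid_predict(X_tr, y_tr, X[i], genes) != y[i] for i in test_idx)
    return errors / len(X)


# Simulated data: 40 tissues, 30 genes, only gene 0 differs between classes.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[rng.gauss(0.0, 1.0) + (4.0 * (2 * lab - 1) if j == 0 else 0.0) for j in range(30)]
     for lab in y]
err = cv_error(X, y, k_genes=1)
print(err)
```

Selecting the genes once on all tissues and then cross-validating only the classifier would let the held-out tissues leak into the selection step, which tends to make the reported error rate too optimistic.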

ONTARIO DATA (39 tissues): Support Vector Machine (SVM) with Recursive Feature Elimination (RFE)

Plot: ten-fold cross-validation error rate (CV10E) of the SVM (vertical axis, 0 to 0.12) against log2 (number of genes) (horizontal axis, 0 to 12), applied to g=2 clusters (G1: 1-14, 16-29, 33, 36, 38; G2: 15, 30-32, 34, 35, 37, 39).

STANFORD DATA: 918 genes measured on 73 tissue samples from 67 patients. The data were row and column normalized, and 451 genes were retained after the select-genes step. 20 metagenes were used to cluster the tissues, which retrieved the histological groups.

Heat Maps for the 20 Stanford Gene-Groups (73 Tissues). Axes: genes by tissues. Tissues are ordered by their histological classification: Adenocarcinoma (1-41), Fetal Lung (42), Large cell (43-47), Normal (48-52), Squamous cell (53-68), Small cell (69-73).

Reduced dataset of 35 Adenocarcinoma (AC) Tissues

The full dataset had 41 AC tissues. According to our cluster analysis:
• AC tissues 5, 16, 26 are put with LCLC
• AC tissues 7, 29 are put with SCLC
• AC tissue 40 is put with SCC

Also, we did not add tissues 43 (LCLC) nor 68 (SCC) (as done in the Stanford study), as they were both assigned to the LCLC cluster. This left 35 AC tissues with 918 genes, reduced to 219 genes, which were clustered into 15 groups (metagenes).

STANFORD CLASSIFICATION:
Cluster 1: 1-19 (good prognosis)
Cluster 2: 20-26 (long-term survivors)
Cluster 3: 27-35 (poor prognosis)

Heat Maps for the 15 Stanford Gene-Groups (35 Tissues). Axes: genes by tissues. Tissues are ordered by the Stanford classification into AC groups: AC group 1 (1-19), AC group 2 (20-26), AC group 3 (27-35).

Expression Profiles for Top Metagenes (Stanford 35 AC Tissues). Panels: Gene Group 1, Gene Group 2, Gene Group 3, Gene Group 4. Axes: log expression value by tissue. Legend: Stanford AC group 1, Stanford AC group 2, Stanford AC group 3, Misallocated.

Which Genes make up the top 4 Metagenes?

Group 1 (22 genes) and Group 2 (12 genes) include: ESTs Hs.11607; ornithine decarboxylase; ataxia-telangiectasia group D-associated protein; carbonyl reductase (metabolic enzyme); solute carrier family 7, member 5 (CD98); vascular endothelial growth factor C.
Annotations for these two groups:
• Marker Genes for Group 3 (Supervised): high in group 3, low in 1 and 2 (4/10 genes)
• Marker Genes for Group 2 (Supervised): high in group 2, low in 3 (1/8 genes)

Group 3 (16 genes) and Group 4 (14 genes) include: cartilage paired-class homeoprotein; aldo-keto reductase family 1; tumor suppressor deleted in oral cancer-related 1; glutathione peroxidase; thioredoxin reductase.
Annotations for these two groups:
• Metabolic Enzymes (Unsupervised): high in group 3, also SCC (3/6 genes)
• Marker Genes for Group 2 (Supervised): high in group 2, low in 3 (2/8 genes)

Some other interesting Metagenes

Panels: Gene Group 7, Gene Group 9. Axes: log expression value by tissue.

Group 7 (19 genes) and Group 9 (22 genes) include: citron; ICAM-1 (CD54); surfactant A1; collagen, type IX; hepsin; thyroid transcription factor.
Annotations for these two groups:
• Marker Genes for Group 1 (Supervised): high in group 1, low in 2 (1/9 genes)
• Marker Genes for Group 1 (Supervised): high in group 1, low in 2 (4/9 genes)
• Surfactant Proteins (Unsupervised): high in groups 1 and 2, low in 3

Cluster-Specific Kaplan-Meier Plots

STANFORD DATA: TWO-COMPONENT WEIBULL MIXTURE MODEL

$$S(t) = \pi_1 S_1(t) + \pi_2 S_2(t),$$

where

$$S_i(t) = \exp(-\alpha_i t^{\beta_i}) \quad (i = 1, 2).$$
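Fitting such a mixture to censored data needs the density as well as the survival function. The short derivation below is a sketch that assumes the component form $S_i(t) = \exp(-\alpha_i t^{\beta_i})$; the censoring indicator $\delta_j$ and the observed times $t_j$ are notation introduced here for illustration.

```latex
% Component density, obtained as f_i(t) = -dS_i(t)/dt
f_i(t) = \alpha_i \beta_i \, t^{\beta_i - 1} \exp(-\alpha_i t^{\beta_i}), \qquad i = 1, 2.

% Mixture density and hazard
f(t) = \pi_1 f_1(t) + \pi_2 f_2(t), \qquad h(t) = \frac{f(t)}{S(t)}.

% Log likelihood for n tissues, with \delta_j = 1 if failure is observed
% at time t_j and \delta_j = 0 if tissue j is censored at t_j
\log L = \sum_{j=1}^{n} \bigl[ \delta_j \log f(t_j) + (1 - \delta_j) \log S(t_j) \bigr].
```

Letting $\alpha_2 \to 0$ gives $S_2(t) \to 1$, so the long-term survivor model used for the Ontario data is a limiting case of this mixture.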

Plot of 1- and 2-component Weibull Mixture vs. Kaplan-Meier

Survival Analysis for Stanford Dataset

• Kaplan-Meier estimation:

Cluster    No. of Tissues    No. of Censored    Mean time to Failure (± SE)
1          17                10                 37.5 ± 5.0
2          5                 0                  5.2 ± 2.3

A significant difference in survival between clusters (P < 0.001).

• Cox’s proportional hazards analysis:

Variable                        Hazard ratio (95% CI)    P-value
Cluster 3 vs. Clusters 1&2      13.2 (2.1 – 81.1)        0.005
Grade 3 vs. grades 1 or 2       1.94 (0.5 – 8.5)         0.38
Tumor size                      0.96 (0.3 – 2.8)         0.93
No. of tumors in lymph nodes    1.65 (0.7 – 3.9)         0.25
Presence of metastases          4.41 (1.0 – 19.8)        0.05

Survival Analysis for Stanford Dataset

• Univariate Cox’s proportional hazards analysis (metagenes):

Metagene    Coefficient (SE)    P-value
1           1.37 (0.44)         0.002
2           -0.24 (0.31)        0.44
3           0.14 (0.34)         0.68
4           -1.01 (0.56)        0.07
5           0.66 (0.65)         0.31
6           -0.63 (0.50)        0.20
7           -0.68 (0.57)        0.24
8           0.75 (0.46)         0.10
9           -1.13 (0.50)        0.02
10          0.73 (0.39)         0.06
11          0.35 (0.50)         0.48
12          -0.55 (0.41)        0.18
13          -0.61 (0.48)        0.20
14          0.22 (0.36)         0.53
15          1.70 (0.92)         0.06
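The link between a coefficient, its standard error and its P-value can be checked with a short calculation. The sketch assumes a two-sided Wald test (coefficient divided by SE, compared with a standard normal); the slides do not state which test was used, so the results should agree with the table only up to rounding. The function name is hypothetical, and the three rows are copied from the table above.

```python
import math


def wald_p_value(coef, se):
    """Two-sided P-value for H0: coefficient = 0, assuming a normal Wald statistic."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# (coefficient, SE) for metagenes 1, 9 and 15.
rows = {1: (1.37, 0.44), 9: (-1.13, 0.50), 15: (1.70, 0.92)}
for metagene, (coef, se) in rows.items():
    p = wald_p_value(coef, se)
    hazard_ratio = math.exp(coef)  # hazard ratio per unit increase in the metagene
    print(metagene, round(hazard_ratio, 2), round(p, 3))
```

A positive coefficient corresponds to a hazard ratio above 1 (higher metagene expression, higher risk), and a negative coefficient to a ratio below 1.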
