Microarray Data Integration and Machine Learning Techniques For Lung Cancer Survival Prediction
Daniel Berrar, Brian Sturgeon, Ian Bradbury, C. Stephen Downes, Werner Dubitzky
November 14, 2003
Outline
– Summary of Results (1 slide)
– Integration of the Harvard and Michigan lung cancer microarray data sets, and data pre-processing;
– (a) Prediction of 5-year survival of patients based on (1) patient data, (2) expression data, and (3) both patient and expression data;
– (b) Comparison of 6 machine learning methods;
– Prediction of survival time using (1) patient data, (2) expression data, and (3) both patient and expression data;
– Biological interpretation of identified genes.
[Figure: three panels, each showing a model built on a learning set (cases 1 to 96) and applied to a test set (cases 1 to 40).]
[Figure: three panels, each showing a CART model built on a learning set (cases 1 to 148) and applied to a test set (cases 1 to 63).]
(1) k-nearest neighbour (k-NN)
(2) Decision Tree C5.0
(3) Boosted Decision Trees
(4) Support Vector Machines (SVMs)
(5) Artificial Neural Networks (Multilayer Perceptrons, MLPs)
(6) Probabilistic Neural Networks (PNNs)
(1) Classification and Regression Tree (CART)
and resampling of the data set.
*Lee Y., Lee C.K.: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 19(9), pp. 1132-1139 (2003).
10.9
Case #   Similarity (= 1 − distance)   Normed    Class
27       0.0921                        0.35795   ...
...      0.0833                        0.32375   ...
34       0.0819                        0.31831   ...

Confidence for the class of cases 27 and 34: 0.35795 + 0.31831 = 0.67626
Confidence for the other class: 0.32375
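The confidence calculation above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function name `knn_confidence` and the class labels "A" and "B" are hypothetical, and the assignment of cases 27 and 34 to the same class is inferred from the arithmetic (their normed similarities add up to 0.67626).

```python
from collections import defaultdict

def knn_confidence(neighbours):
    """Return per-class confidence from (class_label, similarity) pairs.

    Each similarity is divided by the sum over all k neighbours ("normed"),
    and the normed values are then added up per class.
    """
    total = sum(similarity for _, similarity in neighbours)
    confidence = defaultdict(float)
    for label, similarity in neighbours:
        confidence[label] += similarity / total
    return dict(confidence)

# Similarities of the three nearest neighbours from the slide.
# "A" and "B" are placeholder labels (assumption, not from the slides).
neighbours = [("A", 0.0921), ("B", 0.0833), ("A", 0.0819)]
conf = knn_confidence(neighbours)
print({label: round(value, 3) for label, value in conf.items()})
```

Normalising by the summed similarity makes the per-class confidences add up to 1, so they can be compared across test cases.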
– Construct optimal separating hyperplane by maximizing the margin
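For reference, the standard hard-margin formulation of this idea is given here as a sketch. It assumes linearly separable training cases $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$; the symbols $w$, $b$ and $n$ are not from the slides. Maximizing the margin $2 / \lVert w \rVert$ is then equivalent to:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ \ge\ 1, \qquad i = 1, \dots, n.
```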
[Figure: three binary classifiers, SVM1, SVM2 and SVM3, each defining a separating hyperplane.]
Fruits on this side of the hyperplane, given by SVM1, cannot be bananas.
– $p_k$: prior probability that a case belongs to class k
– $c_k$: cost associated with misclassifying a case of class k
– $f_k$: estimated density of class k
Classify case z into class i if, for all j ≠ i:
$p_i c_i f_i(z) > p_j c_j f_j(z)$
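The decision rule above can be illustrated with a minimal Python sketch. Several things here are assumptions rather than content from the slides: the class densities are estimated with a Gaussian (Parzen) kernel, which is the usual choice for PNNs; the smoothing parameter `sigma`, the function names, and the toy data are hypothetical.

```python
import math

def parzen_density(z, cases, sigma):
    """Gaussian-kernel density estimate at z from one class's training cases."""
    d = len(z)
    norm = (2 * math.pi) ** (d / 2) * sigma ** d * len(cases)
    total = 0.0
    for x in cases:
        sq_dist = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        total += math.exp(-sq_dist / (2 * sigma ** 2))
    return total / norm

def pnn_classify(z, classes, priors, costs, sigma=1.0):
    """Pick the class k that maximises p_k * c_k * f_k(z)."""
    scores = {
        k: priors[k] * costs[k] * parzen_density(z, cases, sigma)
        for k, cases in classes.items()
    }
    return max(scores, key=scores.get), scores

# Toy two-class data (hypothetical, for illustration only).
classes = {
    "X": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "Y": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)],
}
priors = {"X": 0.5, "Y": 0.5}
costs = {"X": 1.0, "Y": 1.0}

label, scores = pnn_classify((0.5, 0.5), classes, priors, costs)
print(label)
```

Raising the cost or prior of one class scales its score and so moves the decision boundary towards the other class.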
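[Figure: PNN architecture. Input z; pattern units x1 ... x7 (class X) and y1 ... y7 (class Y); two summation units (∑) giving the density estimates f̂_X and f̂_Y; output unit O.]
[Figure: the same network with x4 removed from the pattern units and presented as the input case; labels I_{X,Y}, p̂(X | ?) and p̂(Y | ?).]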