Neural Network Classifiers and Gene Selection Methods for Microarray Data on Human Lung Adenocarcinoma - PowerPoint PPT Presentation


SLIDE 1

Neural Network Classifiers and Gene Selection Methods for Microarray Data on Human Lung Adenocarcinoma

Gaolin Zheng

School of Computer Science Florida International University, Miami, FL

E.O. George

Department of Mathematical Sciences, University of Memphis

G. Narasimhan

School of Computer Science, Florida International University

SLIDE 2

Zheng et al. 2

Our Work

  • Building classifiers to predict tumor stage from gene expression data.
  • Comparative study of neural network classifiers.
  • Comparative study of gene selection methods.
  • Exploring data integration.

SLIDE 3

Data Set 1

(Michigan)

              Stage 1 Tumor    Stage 3 Tumor    Normal Lung
              Female   Male    Female   Male
Non-smoking      9       1        2       7
Smoking         10      33       24      10

86 patients with adenocarcinoma, divided into:
  • Stage 1 and Stage 3 tumors
  • Male and female
  • Smoking and non-smoking
Data is severely unbalanced.
10 non-neoplastic (normal) lung samples and their gene expression, but with no additional information (e.g. gender, smoking).
7129 probe sets.

SLIDE 4

Data Set 2

(Boston)

              Stage 1 Tumor    Stage 2 Tumor    Stage 3 Tumor    Normal Lung
              Female   Male    Female   Male    Female   Male
Non-smoking      5       2        3       2
Smoking         13      39       30      12        9       4          7

113 patients with adenocarcinoma, divided into:
  • Stage 1, Stage 2, and Stage 3 tumors
  • Male and female
  • Smoking and non-smoking
13 normal lung samples without any additional information.
Over 12600 probe sets.
Looked at 490 overlapping probe sets.

SLIDE 5

Gene Selection

Goal: Find genes that discriminate on the basis of tumor stage information. Methods:

  • ANOVA
  • SAM (http://www-stat.stanford.edu/~tibs/SAM/)
  • GS
  • GS-Robust
  • PCA (select principal components contributing to >80% of the variation)
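The PCA criterion above can be sketched in plain numpy. This is illustrative only; the slide does not specify the implementation, and the matrix shapes and toy data are assumptions:

```python
import numpy as np

def pca_scores_80(X, threshold=0.80):
    """Project samples onto the leading principal components that
    together explain more than `threshold` of the variation.

    X: samples x genes expression matrix (shapes are assumptions).
    """
    Xc = X - X.mean(axis=0)                          # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = s ** 2 / np.sum(s ** 2)              # variance explained per PC
    n = int(np.searchsorted(np.cumsum(var_ratio), threshold)) + 1
    return Xc @ Vt[:n].T, n                          # PC scores, number of PCs kept

# Toy expression matrix: 30 samples x 100 genes of random noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
scores, n = pca_scores_80(X)
```

The retained PC scores, rather than raw probe-set values, would then feed the classifier.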
SLIDE 6

ANOVA Based Gene Selection

  • For individual data sets:
    • Single-factor (stage)
    • Multifactor (stage, gender, smoking)
  • For the integrated data set:
    • Single-factor and multifactor models
    • Mixed-effect model (stage as fixed factor, lab as random factor) for the 490 overlapping probe sets
  • Use the P-value for stage to rank the significance of genes.
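The ranking step can be sketched with plain numpy. Because the design is the same for every gene, ranking by the single-factor F statistic is equivalent to ranking by the stage P-value (larger F, smaller P); the toy labels and gene count below are invented for illustration:

```python
import numpy as np

def anova_f(groups):
    """Single-factor ANOVA F statistic for one gene."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_genes(X, stage):
    """Rank genes most-significant first by the stage factor."""
    fs = np.array([anova_f([X[stage == s, j] for s in np.unique(stage)])
                   for j in range(X.shape[1])])
    return np.argsort(-fs), fs

# Toy data: gene 0 carries a stage effect, gene 1 is pure noise.
rng = np.random.default_rng(1)
stage = np.repeat([1, 3], 20)
X = rng.normal(size=(40, 2))
X[stage == 3, 0] += 2.0                  # stage effect on gene 0 only
order, fs = rank_genes(X, stage)
```

A full multifactor or mixed-effect fit would use a modeling package (the authors worked in R); this sketch covers only the single-factor ranking.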

SLIDE 7

Gene Selection Method: GS

$$\mathrm{GS}_i = \frac{\displaystyle\sum_{j=1}^{k}\left(\bar{g}_{ij\cdot}-\bar{g}_{i\cdot\cdot}\right)^2 \Big/ (k-1)}{\displaystyle\sum_{j=1}^{k}\sum_{l=1}^{n_{ij}}\left(g_{ijl}-\bar{g}_{ij\cdot}\right)^2 \Big/ \sum_{j=1}^{k}\left(n_{ij}-1\right)}$$

where

$$\bar{g}_{ij\cdot} = \mathrm{mean}(g_{ij}), \qquad \bar{g}_{i\cdot\cdot} = \mathrm{mean}\left\{\bar{g}_{ij\cdot},\ j=1,\dots,k\right\}$$

A measure of the ratio of inter-group to intra-group variation.
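The statistic compares between-class to within-class variation gene by gene; a minimal numpy sketch of the formula as reconstructed above, with invented data:

```python
import numpy as np

def gs_score(groups):
    """GS for one gene: between-class over within-class variation.

    groups: list of 1-D arrays, expression of this gene in each of k classes.
    """
    k = len(groups)
    class_means = np.array([g.mean() for g in groups])        # g_ij.
    grand_mean = class_means.mean()                            # g_i..
    between = np.sum((class_means - grand_mean) ** 2) / (k - 1)
    within = (sum(np.sum((g - g.mean()) ** 2) for g in groups)
              / sum(len(g) - 1 for g in groups))
    return between / within

# Toy genes: one with a clear class effect, one without.
rng = np.random.default_rng(2)
separated = [rng.normal(0, 1, 25), rng.normal(3, 1, 25)]
flat = [rng.normal(0, 1, 25), rng.normal(0, 1, 25)]
```

Genes are then ranked by GS, largest first.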

SLIDE 8

GS-Robust

$$\mathrm{GSRobust}_i = \frac{\mathrm{MAD}\left[\mathrm{median}(g_{i1}),\dots,\mathrm{median}(g_{ik})\right]}{\displaystyle\sum_{j=1}^{k}\mathrm{MAD}(g_{ij})}$$

GSRobust_i: the GSRobust value for the ith gene. MAD: median absolute deviation. g_ij: the vector of gene expression values corresponding to the ith gene and jth class.

A robust measure of the ratio of inter-group to intra-group variation.
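A numpy sketch of the robust variant as reconstructed above (toy data invented; medians and MADs replace means and sums of squares so single outliers have little effect):

```python
import numpy as np

def mad(x):
    """Median absolute deviation."""
    return np.median(np.abs(x - np.median(x)))

def gs_robust(groups):
    """GS-Robust for one gene: MAD of the class medians over the
    summed within-class MADs."""
    medians = np.array([np.median(g) for g in groups])
    return mad(medians) / sum(mad(g) for g in groups)

# Toy genes, as before: one separated by class, one flat.
rng = np.random.default_rng(2)
separated = [rng.normal(0, 1, 25), rng.normal(3, 1, 25)]
flat = [rng.normal(0, 1, 25), rng.normal(0, 1, 25)]
```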

SLIDE 9

Classifiers

  • Feed-forward neural network (nnet() from R). Yet another machine learning classifier. Yawn!
  • FNN with Bayesian learning of network weights.
  • Neural Network Ensembles:
    • Bagging (Breiman, 1994)
    • Boosting (Freund and Schapire, 1996)
SLIDE 10

Bayesian Neural Networks: Bayesian Learning of the Weights

Choose initial values of the hyperparameters α and β.

Prior over the weights: W ~ N(0, 1/α)

Total error minimized by the classifier:

$$S(W) = \beta E_D + \alpha E_W$$

where E_D is the error term and E_W is the regularization term.

Hyperparameter updates:

$$\gamma \equiv \sum_{i}\frac{\lambda_i}{\lambda_i + \alpha}, \qquad \alpha_{\mathrm{new}} = \frac{\gamma}{2E_W}, \qquad \beta_{\mathrm{new}} = \frac{N-\gamma}{2E_D}$$

where the λ_i are the eigenvalues of the Hessian matrix.
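One re-estimation step of these updates can be exercised directly; in this small numpy sketch the eigenvalues, error values, and N are invented for illustration:

```python
import numpy as np

def update_hyperparameters(eigvals, alpha, E_W, E_D, N):
    """One evidence-approximation update of alpha and beta.

    eigvals: eigenvalues lambda_i of the Hessian matrix,
    E_W: regularization term, E_D: data error term,
    N: number of training targets.
    """
    gamma = np.sum(eigvals / (eigvals + alpha))  # effective number of parameters
    alpha_new = gamma / (2.0 * E_W)
    beta_new = (N - gamma) / (2.0 * E_D)
    return alpha_new, beta_new, gamma

# Invented values, purely to show the arithmetic.
alpha_new, beta_new, gamma = update_hyperparameters(
    eigvals=np.array([10.0, 5.0, 0.1]), alpha=1.0, E_W=2.0, E_D=4.0, N=50)
```

In training, this update alternates with re-fitting the weights until α and β stabilize.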

SLIDE 11

Bagging Classifiers

[Diagram: multiple individual classifiers combined into an ensembled classifier.]
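The bagging pipeline can be sketched with a simple stand-in base learner. The slides use neural networks; the nearest-mean learner and toy data here are assumptions made so the example stays self-contained:

```python
import numpy as np

class NearestMean:
    """Stand-in base learner (the slides use neural networks)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.means_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def bagging_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Bagging (Breiman, 1994): train each model on a bootstrap
    resample of the training set, then combine by majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
        model = NearestMean().fit(X_train[idx], y_train[idx])
        votes.append(model.predict(X_test))
    votes = np.array(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])  # majority vote

# Toy two-class data with well-separated Gaussian clouds.
rng = np.random.default_rng(4)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_train = np.repeat([0, 1], 20)
X_test = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(4, 1, (10, 2))])
y_test = np.repeat([0, 1], 10)
pred = bagging_predict(X_train, y_train, X_test)
```

Averaging over bootstrap resamples mainly reduces the variance of an unstable base learner.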

SLIDE 12

Boosting Classifiers
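Boosting can be sketched as AdaBoost.M1 by resampling, again with a stand-in base learner (the slides use neural networks; the learner and toy data below are assumptions for illustration):

```python
import numpy as np

class NearestMean:
    """Stand-in base learner (the slides use neural networks)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.means_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def boost_predict(X_train, y_train, X_test, n_rounds=10, seed=0):
    """Boosting (Freund and Schapire, 1996) by resampling: each round
    draws a training sample according to the current example weights,
    up-weights the examples the new model misclassifies, and predicts
    by an alpha-weighted vote. Labels must be coded 0/1 here."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    w = np.full(n, 1.0 / n)                      # example weights
    vote = np.zeros(len(X_test))
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, p=w)         # weighted resample
        model = NearestMean().fit(X_train[idx], y_train[idx])
        train_pred = model.predict(X_train)
        err = float(np.sum(w[train_pred != y_train]))
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard degenerate rounds
        alpha = 0.5 * np.log((1.0 - err) / err)  # model weight
        w = w * np.exp(alpha * (train_pred != y_train))  # up-weight mistakes
        w = w / w.sum()
        vote += alpha * (2 * model.predict(X_test) - 1)  # vote in {-1, +1}
    return (vote > 0).astype(int)

# Toy two-class data with well-separated Gaussian clouds.
rng = np.random.default_rng(4)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_train = np.repeat([0, 1], 20)
X_test = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(4, 1, (10, 2))])
y_test = np.repeat([0, 1], 10)
pred = boost_predict(X_train, y_train, X_test)
```

Unlike bagging's uniform resamples, boosting concentrates later rounds on the hard examples, which is also what can make it erratic on noisy data.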

SLIDE 13

Benchmarking the classifiers

5-fold Cross-Validation Error

[Boxplots: 5-fold cross-validation error (N = 10 runs per classifier) for NNET, NBAG, NBOOST, BNN, BBAG, and BBOOST on the Iris (errors roughly .03–.10) and BreastCancer (errors roughly 0.00–.18) benchmark data sets.]

SLIDE 14

How different are these gene selection methods?

Pairwise overlap of the 200 genes selected by each method:

Michigan:
             GS-Robust   ANOVA    GS    SAM
GS-Robust       200        20     28     23
ANOVA                     200    164    179
GS                               200    167
SAM                                     200

Boston:
             GS-Robust   ANOVA    GS    SAM
GS-Robust       200         6     13      8
ANOVA                     200     35     68
GS                               200     43
SAM                                     200

SLIDE 15

Common Significant Genes

Gene Name   Comment
GAPD        glyceraldehyde-3-phosphate dehydrogenase
MGP         matrix Gla protein
RTVP1       GLI pathogenesis-related 1 (glioma)
DDXBP1      Not found
FGR         Gardner-Rasheed feline sarcoma viral (v-fgr) oncogene homolog
FGFR2       fibroblast growth factor receptor 2 (bacteria-expressed kinase, keratinocyte growth factor receptor, craniofacial dysostosis 1, Crouzon syndrome, Pfeiffer syndrome, Jackson-Weiss syndrome)
TNNC1       troponin C, slow
KIAA0140    KIAA0140 gene product

SLIDE 16

Neural Network Topology

            Input Layer   Hidden Layer   Output Layer
Michigan         20            4              3
Boston           20            4              4

SLIDE 17

Practical Issues

  • Underrepresented classes
  • Contradictions in mapping
  • Unbalanced testing data

SLIDE 18

K-fold Cross-Validation

[Diagram: 3-fold cross-validation. Each of the three folds serves once as the testing set, yielding Error1, Error2, and Error3, which are averaged into the 3-fold cross-validation error.]
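The procedure can be sketched in Python; the nearest-mean classifier and toy data are assumptions standing in for the neural networks on the slides:

```python
import numpy as np

def kfold_error(X, y, fit_predict, k=3, seed=0):
    """k-fold cross-validation: shuffle, split into k folds, hold each
    fold out once as the testing set, average the k test errors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        errors.append(np.mean(pred != y[test]))   # Error_i for fold i
    return float(np.mean(errors))

def nearest_mean(X_tr, y_tr, X_te):
    """Toy stand-in classifier: nearest training-class mean."""
    classes = np.unique(y_tr)
    means = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - means[None]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

# Toy two-class data.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
y = np.repeat([0, 1], 30)
err = kfold_error(X, y, nearest_mean, k=3)
```

Every sample is tested exactly once, so the averaged error uses all the data without testing on anything seen during training.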

SLIDE 19

Validation across Data Sets

[Diagram: Data Set 1 used for training with Data Set 2 for testing, and Data Set 2 used for training with Data Set 1 for testing.]

SLIDE 20

Results – Data Set 1

Classification error by gene selection method and neural network type:

NN Type       GS-PCA          GS-Robust       GS              GS-SAM          GS-ANOVA
bayes.boost   0.277 ± 0.013   0.246 ± 0.015   0.257 ± 0.019   0.280 ± 0.015   0.282 ± 0.012
bayes.bag     0.280 ± 0.009   0.236 ± 0.017   0.264 ± 0.021   0.273 ± 0.014   0.282 ± 0.008
Bayesian      0.299 ± 0.034   0.269 ± 0.030   0.315 ± 0.036   0.311 ± 0.046   0.335 ± 0.048
nnet.boost    0.282 ± 0.013   0.272 ± 0.012   0.262 ± 0.016   0.290 ± 0.017   0.292 ± 0.012
nnet.bag      0.278 ± 0.000   0.273 ± 0.006   0.267 ± 0.018   0.277 ± 0.008   0.279 ± 0.004
nnet          0.288 ± 0.021   0.277 ± 0.024   0.296 ± 0.031   0.290 ± 0.022   0.289 ± 0.025

SLIDE 21

Results – Data Set 2

Classification error by gene selection method and neural network type:

NN Type       GS-PCA          GS-Robust       GS              GS-SAM          GS-ANOVA
bayes.boost   0.148 ± 0.000   0.149 ± 0.002   0.142 ± 0.006   0.149 ± 0.005   0.147 ± 0.003
bayes.bag     0.148 ± 0.000   0.148 ± 0.000   0.147 ± 0.002   0.149 ± 0.003   0.148 ± 0.000
Bayesian      0.148 ± 0.000   0.154 ± 0.014   0.145 ± 0.006   0.152 ± 0.005   0.157 ± 0.016
nnet.boost    0.148 ± 0.000   0.148 ± 0.000   0.148 ± 0.000   0.149 ± 0.002   0.148 ± 0.000
nnet.bag      0.148 ± 0.000   0.148 ± 0.000   0.148 ± 0.000   0.148 ± 0.000   0.148 ± 0.000
nnet          0.150 ± 0.007   0.149 ± 0.002   0.148 ± 0.006   0.150 ± 0.005   0.153 ± 0.010

SLIDE 22

Validation Across Different Data Sets

Michigan/Boston & Boston/Michigan

Training/Testing: Boston/Michigan

NN Type       GS-PCA          GS-Robust       GS              GS-SAM          GS-ANOVA
bayes.boost   0.221 ± 0.000   0.337 ± 0.206   0.399 ± 0.286   0.271 ± 0.101   0.241 ± 0.042
bayes.bag     0.221 ± 0.000   0.280 ± 0.167   0.307 ± 0.149   0.222 ± 0.004   0.226 ± 0.015
Bayesian      0.276 ± 0.131   0.510 ± 0.336   0.380 ± 0.201   0.343 ± 0.249   0.434 ± 0.245
nnet.boost    0.221 ± 0.000   0.221 ± 0.000   0.221 ± 0.000   0.222 ± 0.004   0.219 ± 0.004
nnet.bag      0.221 ± 0.000   0.221 ± 0.000   0.221 ± 0.000   0.221 ± 0.000   0.221 ± 0.000
nnet          0.221 ± 0.000   0.293 ± 0.178   0.299 ± 0.154   0.250 ± 0.077   0.391 ± 0.226

Training/Testing: Michigan/Boston

NN Type       GS-PCA          GS-Robust       GS              GS-SAM          GS-ANOVA
bayes.boost   0.061 ± 0.086   0.033 ± 0.000   0.138 ± 0.188   0.060 ± 0.038   0.037 ± 0.007
bayes.bag     0.105 ± 0.155   0.033 ± 0.003   0.057 ± 0.059   0.035 ± 0.003   0.034 ± 0.003
Bayesian      0.171 ± 0.294   0.099 ± 0.126   0.405 ± 0.466   0.269 ± 0.358   0.172 ± 0.309
nnet.boost    0.054 ± 0.050   0.033 ± 0.000   0.049 ± 0.037   0.055 ± 0.068   0.036 ± 0.008
nnet.bag      0.035 ± 0.005   0.033 ± 0.000   0.034 ± 0.003   0.033 ± 0.000   0.033 ± 0.000
nnet          0.142 ± 0.272   0.033 ± 0.000   0.122 ± 0.257   0.055 ± 0.054   0.090 ± 0.122

SLIDE 23

Questions from Anomalous Results

  • Could it be due to different compositions of the data sets?
  • Could the assignment of tumor stage by the TNM system be non-uniform? Does "Stage 1" mean the same for both data sets?
  • Could there be differences in preprocessing (normalization)?
  • Tumor heterogeneity?
  • Differences in treatment?
  • How can these questions be approached?
SLIDE 24

Conclusions

  • Bagging exhibited consistently better performance.
  • Boosting improved classification, but was erratic.
  • Univariate Bayesian learning did not usually improve performance.
  • Bagging is a faster and simpler ensemble technique than boosting.
  • GS-Robust selected many unique genes and had excellent ability to select features for our classifiers.

SLIDE 25

Acknowledgements

Members of the Bioinformatics Research Group (BioRG), School of Computer Science, FIU:

  • Patricia Buendia
  • Daniel Cazalis
  • Tom Milledge
  • Xintao Wei
  • Chengyong Yang
  • Erliang Zeng

http://www.cs.fiu.edu/~giri/BNN/
