Effect of Classifiers in Consensus Feature Ranking for Biomedical Datasets

Shobeir Fakhraei, Hamid Soltanian-Zadeh, Farshad Fotouhi, Kost Elisevich Effect of Classifiers in Consensus Feature Ranking for Biomedical Datasets

Dimension Reduction
• Prediction accuracy of practical machine learning algorithms degrades when they face many features that are not necessary for predicting the desired output.
• Feature Construction / Extraction
  ◦ Construct new features based on the original data, e.g. PCA and ISOMAP.
• Feature Selection / Ranking
  ◦ Choose features from the original feature set, e.g. filter and wrapper methods.

Feature Selection / Ranking
• Improves the prediction performance.
• Eases understanding of the underlying process that generated the data.
• Reduces measurement and storage requirements.
• Facilitates data visualization.
• Reduces training and utilization times.

Feature Ranking
• The output of the process is a list of features ranked according to a criterion.
• Variable ranking is not necessarily used to build predictors; it can also serve:
  ◦ Understanding of the underlying data.
  ◦ e.g. finding which medical test is more accurate or reliable than the others in a diagnosis.

Consensus Feature Ranking
• Ensemble (consensus) methods have been used to mitigate the problems of traditional methods, such as poor accuracy, bias, and instability.

Motivation
• The score is a single-variable classifier.
• A feature's score is the predictive performance of a classifier built on that single feature alone.
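The idea of scoring a feature by a single-variable classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-class-mean rule is a stand-in classifier chosen only to keep the example dependency-free, and the names `auc` and `single_feature_score` are hypothetical. Missing values are represented as `None` and skipped, and the score is the leave-one-out AUC.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    # Count correctly ordered positive/negative pairs; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def single_feature_score(values, labels):
    """Leave-one-out AUC of a classifier that sees only one feature.

    Assumption: a nearest-class-mean rule stands in for the real
    classifiers (SVM, k-NN, ...), which are not reproduced here.
    """
    # Use only properly recorded values for this feature.
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    kept_labels, scores = [], []
    for i, (v, y) in enumerate(pairs):
        train = pairs[:i] + pairs[i + 1:]
        pos = [tv for tv, ty in train if ty == 1]
        neg = [tv for tv, ty in train if ty == 0]
        if not pos or not neg:
            continue  # cannot score this instance without both classes
        # Higher score means closer to the positive-class mean.
        scores.append(abs(v - mean(neg)) - abs(v - mean(pos)))
        kept_labels.append(y)
    return auc(kept_labels, scores)
```

Calling `single_feature_score` once per feature gives one score per feature for a given classifier; sorting the features by that score gives that classifier's ranking.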

Motivation
• The effect of including each classifier in the combination (ensemble function) has been studied to see which classifiers play a positive or negative role:
  ◦ Logistic Regression
  ◦ Support Vector Machines (SVM)
  ◦ K-Nearest Neighbors
  ◦ Naïve Bayes
  ◦ Bagging
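A consensus ranking with a switchable set of classifiers might look like the sketch below. The slides do not state the ensemble function, so averaging the per-classifier scores is an assumption made here for illustration; `consensus_ranking`, the feature names, and the toy scores are all hypothetical and are not results from the study.

```python
def consensus_ranking(score_table, include=None):
    """Rank features by the mean of their scores over selected classifiers.

    score_table: {classifier_name: {feature_name: score}}
    include: classifier names to fuse (default: all of them).
    Assumption: the ensemble function is a plain mean.
    """
    names = sorted(score_table) if include is None else list(include)
    if not names:
        raise ValueError("at least one classifier is required")
    # Only features scored by every selected classifier can be fused.
    features = sorted(set.intersection(*(set(score_table[c]) for c in names)))
    fused = {f: sum(score_table[c][f] for c in names) / len(names)
             for f in features}
    # Highest fused score first; ties broken by feature name.
    return sorted(features, key=lambda f: (-fused[f], f))


# Toy scores, made up purely to show the mechanics.
toy_scores = {
    "svm": {"a": 0.9, "b": 0.6, "c": 0.5},
    "knn": {"a": 0.5, "b": 0.8, "c": 0.55},
    "naive_bayes": {"a": 0.7, "b": 0.75, "c": 0.5},
}
all_in = consensus_ranking(toy_scores)
without_knn = consensus_ranking(toy_scores, include=["svm", "naive_bayes"])
```

Comparing `all_in` with `without_knn` shows how leaving one classifier out of the fusion can reorder the ranking, which is the kind of effect the study examines.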

Biomedical Datasets
• When applying feature ranking methods to medical datasets, one has to consider the common characteristics of such datasets:
  ◦ Class-imbalanced data
  ◦ Missing values

Missing Value / Class-Imbalance
• Missing value estimation and imputation negatively affect the reliability of the model.
• We based the study only on properly recorded values; missing values were eliminated.
  ◦ This adversely affects the class-imbalance distribution.
• To address the imbalance problem, we used the area under the receiver operating characteristic (ROC) curve (AUC) as the performance evaluator for individual features.
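A small worked example may help show why AUC is a safer evaluator than accuracy on class-imbalanced data. The data below are synthetic and the helper names are hypothetical: a model that gives every instance the same score looks accurate on a 9:1 dataset, but its AUC reveals that it cannot separate the classes.

```python
def accuracy(labels, predictions):
    """Fraction of instances whose predicted class matches the label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve; tied positive/negative pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic imbalanced labels: nine negatives, one positive.
labels = [0] * 9 + [1]

# A useless model: same score for everyone, always predicts the majority class.
constant_scores = [0.0] * 10
majority_predictions = [0] * 10

print(accuracy(labels, majority_predictions))  # 0.9
print(auc(labels, constant_scores))            # 0.5
```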

Framework
• Experimental framework: [diagram not included in this transcript]

Evaluation
• The top α features of the ranked list were selected, and the predictive power of this feature subset was tested with a classifier via cross-validation.
• To use the maximum possible number of instances for each feature subset, we used the samples that have recorded values for all features in the subset being evaluated; values of features outside the subset may be missing.
• The number of instances therefore varies between feature subsets, which makes it difficult to compare ranking methods across different feature subsets.
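The instance selection described above can be sketched as follows. This is an illustration under simple assumptions: each instance is a dictionary, a missing value is `None`, and the function names are hypothetical. It shows how the number of usable instances shrinks as more top-ranked features are added to the subset.

```python
def complete_cases(rows, features):
    """Keep the rows that have a recorded value for every listed feature.

    Values of features outside `features` are allowed to be missing.
    """
    return [r for r in rows if all(r.get(f) is not None for f in features)]


def instances_per_subset(rows, ranking):
    """Usable instance count for the top-1, top-2, ... feature subsets."""
    return [len(complete_cases(rows, ranking[:i]))
            for i in range(1, len(ranking) + 1)]


# Toy records with missing entries (None); not taken from any real dataset.
rows = [
    {"a": 1.0, "b": None, "c": 3.0},
    {"a": 2.0, "b": 5.0, "c": None},
    {"a": None, "b": 1.0, "c": 1.0},
    {"a": 4.0, "b": 2.0, "c": 2.0},
]
counts = instances_per_subset(rows, ["a", "b", "c"])
print(counts)  # [3, 2, 1]
```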

Performance Index
• Introduced to mitigate the mismatch in the number of instances. [formula not included in this transcript]
• n is the number of features considered in the calculation.
• c is the evaluating classifier.
• F_i is the set of the i features with the highest scores.
• F_i_ins is the number of instances that have values for all features in F_i.
• AUC(c(F_i)) is the average area under the ROC curve for the evaluation of F_i on c, using the leave-one-out technique.

Performance Index
• The formula reflects that ranking methods achieving higher accuracy with fewer features and more instances are preferable.
• For this reason, the weight factor includes the number of features as 1/i and the number of instances as F_i_ins.
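The exact formula is not reproduced in this text, so the sketch below is only one form consistent with the description: a weighted average of AUC(c(F_i)) over i = 1..n with weight F_i_ins / i. The normalization by the sum of the weights is an assumption, and `performance_index` is a hypothetical name.

```python
def performance_index(aucs, instance_counts):
    """Weighted average of subset AUCs, with weight F_i_ins / i.

    aucs[i-1]            : AUC(c(F_i)) for the top-i feature subset
    instance_counts[i-1] : F_i_ins, instances with all values for F_i
    Assumption: the index is normalized by the sum of the weights.
    """
    if len(aucs) != len(instance_counts):
        raise ValueError("need one AUC and one instance count per subset")
    # Fewer features (small i) and more instances both raise the weight.
    weights = [count / i for i, count in enumerate(instance_counts, start=1)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no usable instances in any subset")
    return sum(w * a for w, a in zip(weights, aucs)) / total
```

For example, with AUCs of 0.8 and 0.9 on subsets backed by 100 and 50 instances, the weights are 100 and 25, so the index is (80 + 22.5) / 125 = 0.82: the one-feature subset dominates because it uses fewer features and more instances.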

Experiments Environment
• The dataset used in the experiments is from the Human Brain Image Database System (HBIDS), developed in the Radiology Department of Henry Ford Health System (Detroit, Michigan, USA).
• The main task on this dataset is binary classification: predicting the patients' lateralization (side of abnormality).
• The database contains 197 medical features and 145 patients.

Some features in HBIDS
• Semiology
• Pre- and postoperative neuropsychological profiles
• Location of surgery
• Surgery outcome according to the Engel classification
• Interictal waveforms, their location and predominance, as well as ictal onset location
• Both magnetic resonance (MR) and single photon emission computed tomography (SPECT) (ictal and interictal) imaging are included, with provision for quantitative semi-automated assessment of compartmental volume, fluid-attenuated inversion recovery (FLAIR) mean signal and standard deviation, and texture analysis
• Compartmentalized ictal SPECT subtraction image analysis is also available

HBIDS Missing Values
• Missing values were identified for:
  ◦ EEG features in 21% of cases
  ◦ Wada studies in 31% of cases
  ◦ Imaging features in 46% of cases
  ◦ The remaining features in about 20% of cases on average

Experimental Results
• [Result plots not included in this transcript. Panels: Evaluation on Bagging; Evaluation with SVM; Evaluation on K-Nearest-Neighbors]

Experimental Results
• [Result plots not included in this transcript. Panels: Evaluation on Naïve-Bayes; Evaluation on Logistic-Regression]

Observations
• Evaluation with SVM:
  ◦ SVM: Neutral
  ◦ Naïve-Bayes: Positive
  ◦ K-Nearest Neighbors: Negative
  ◦ Bagging: Negative
  ◦ Logistic Regression: Positive
• Evaluation with K-NN:
  ◦ SVM: Neutral
  ◦ Naïve-Bayes: Negative
  ◦ K-Nearest Neighbors: Neutral
  ◦ Bagging: Neutral
  ◦ Logistic Regression: Negative
• Evaluation with Bagging:
  ◦ SVM: Neutral
  ◦ Naïve-Bayes: Negative
  ◦ K-Nearest Neighbors: Neutral
  ◦ Bagging: Neutral
  ◦ Logistic Regression: Negative
• Evaluation with Naïve-Bayes:
  ◦ SVM: Neutral
  ◦ Naïve-Bayes: Negative
  ◦ K-Nearest Neighbors: Negative
  ◦ Bagging: Neutral
  ◦ Logistic Regression: Neutral
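The slides do not state how a classifier's role was labeled Positive, Negative, or Neutral. One plausible rule, shown here purely as an assumption, compares the evaluation score of the consensus ranking with and without that classifier in the fusion and treats small differences as Neutral. The function name and the tolerance value are hypothetical.

```python
def inclusion_effect(score_with, score_without, tolerance=0.01):
    """Label the effect of including one classifier in the fusion.

    score_with / score_without: evaluation score of the consensus ranking
    with and without the classifier. The tolerance band for "Neutral" is
    an illustrative assumption, not a value from the study.
    """
    delta = score_with - score_without
    if delta > tolerance:
        return "Positive"
    if delta < -tolerance:
        return "Negative"
    return "Neutral"
```

Applying such a rule once per fused classifier and per evaluating classifier would produce a grid of labels like the one listed above.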

Observations
• The performance of consensus feature ranking with a classifier is not highly dependent on the inclusion of that classifier itself in the fusion.
• Therefore, features ranked by an ensemble of scores from multiple classifiers are likely to perform well on unseen classifiers.
• Such a ranking plays an important role in data warehousing, where data are gathered with the possibility of being used with newly emerging classifiers in the future.


Thank you
For more details about this research, please contact Shobeir Fakhraei {shobeir@wayne.com}
