Towards robust feature selection for high-dimensional, small sample - PowerPoint PPT Presentation

Towards robust feature selection for high-dimensional, small sample settings Yvan Saeys Bioinformatics and Evolutionary Genomics, Ghent University, Belgium yvan.saeys@psb.ugent.be Marseille, January 14th, 2010

Background: biomarker discovery Common task in computational biology Find the entities that best explain phenotypic differences Challenges: ◮ Many possible biomarkers (high dimensionality) ◮ Only very few biomarkers are important for the specific phenotypic difference ◮ Very few samples Examples: ◮ Microarray data ◮ Mass spectrometry data ◮ SNP data Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 2 / 36

Dimensionality reduction techniques Dimensionality reduction techniques Feature selection techniques Feature selection techniques Feature transformation techniques Feature transformation techniques Subset selection Projection Compression Feature ranking PCA Fourier transform Feature weighting LDA Wavelet transform Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 3 / 36

Dimensionality reduction techniques Dimensionality reduction techniques Feature selection techniques Feature selection techniques Feature transformation techniques Feature transformation techniques Subset selection Projection Compression Feature ranking PCA Fourier transform Feature weighting LDA Wavelet transform Preserve the original semantics ! Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 3 / 36

Casting the problem as a feature selection task Feature selection is a way to avoid the curse of dimensionality Improve model performance ◮ Classification: improve classification performance (maximize accuracy, AUC) ◮ Clustering: improve cluster detection (AIC, BIC, sum of squares, various indices) ◮ Regression: improve fit (sum of squares error) Faster and more cost-effective models Improve generalization performance (avoiding overfitting) Gain deeper insight into the processes that generated the data (esp. important in Bioinformatics) Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 4 / 36

The need for robust marker selection algorithms Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 5 / 36

The need for robust marker selection algorithms Ranked gene list: •gene A •gene B •gene C •gene D •gene E •… Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 5 / 36

The need for robust marker selection algorithms Ranked gene list: Ranked gene list: •gene A •gene X •gene B •gene A •gene C •gene W •gene D •gene Y •gene E •gene C •… •… Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 5 / 36

The need for robust marker selection algorithms Motivation Highly variable marker ranking algorithms decrease the confidence of a domain expert ◮ Need to quantify the stability of a ranking algorithm ◮ Use this as an additional criterion next to the predictive power More robust rankings yield a higher chance of representing biologically relevant markers Focus on quantifying/increasing marker stability within one data source Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 5 / 36

Formalizing feature selection robustness Definition Consider a dataset D = { x 1 , . . . , x M } , x i = ( x 1 i , . . . x N i ) with M instances and N features. A feature selection algorithm can then be defined as a mapping F : D → f from D to an N -dimensional vector f = ( f 1 , . . . , f N ) , weighting: f i = w i denotes the weight of feature i 1 ranking: f i ∈ { 1 , 2 , . . . , N } denotes the rank of feature i 2 subset selection: f i = 0 / 1 denotes the exclusion/inclusion of 3 feature i in the selected subset Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 6 / 36

Formalizing feature selection robustness Research questions: How stable are current feature selection techniques for high dimensional, small 1 sample settings ? ◮ Analyze sensitivity of robustness to signature size, model parameters. 2 Can we increase the robustness of feature selection in this setting ? Definition A feature selection algorithm is stable if small variations in the input [training data] result in small variations in the output [selected features]: F is stable iff for D ≈ D ′ , it follows that S ( f , f ′ ) < ǫ Methodological requirements: Framework to generate small changes in training data 1 Similarity measures for feature weightings/rankings/subsets 2 Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 7 / 36

Generating training set variations A subsampling approach: Draw k subsamples of size ⌈ xM ⌉ (0 < x < 1) randomly without replacement from D , where the parameters k and x can be varied. In our experiments: k =500 x =0.9 Algorithm Generate k subsamples of size xM , {D 1 , . . . , D k } 1 Perform the basic feature selector F on each of these k subsamples 2 ∀ k : F ( D k ) = f k Perform all k ( k − 1 ) pairwise comparisons, and average over them 3 2 2 � k � k j = i + 1 S ( f i , f j ) i = 1 Stab ( F ) = k ( k − 1 ) where S ( . , . ) denotes an appropriate similarity function between weightings/rankings/subsets Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 8 / 36

Similarity measures for feature selection outputs Weighting (Pearson CC): 1 l ( f l i − µ f i )( f l � j − µ f j ) S ( f i , f j ) = �� l ( f l i − µ f i ) 2 � l ( f l j − µ f j ) 2 Ranking (Spearman rank CC): 2 ( f l i − f l j ) 2 � S ( f i , f j ) = 1 − 6 N ( N 2 − 1 ) l Subset selection (Jaccard index): 3 l I ( f l i = f l � j = 1 )) S ( f i , f j ) = | f i ∩ f j | | f i ∪ f j | = l I ( f l i + f l � j > 0 ) Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 9 / 36

Kuncheva’s index for comparing feature subsets Definition Let A and B be subsets of features, both of the same cardinality s . Let r = | A ∩ B | Requirements for a desirable stability index for feature subsets: Monotonicity : for a fixed subset size s , and number of features 1 N , the larger the intersection between the subsets, the higher the value of the consistency index. Limits : index should be bound by constants that do not depend 2 on N or s . Maximum should be attained when the subsets are identical: r = s Correction for chance : index should have a constant value for 3 independently drawn subsets of the same cardinality s . Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 10 / 36

Kuncheva’s index for comparing feature subsets General form of the index: Observed r − Expected r Maximum r − Expected r For randomly drawn A and B ,the number of objects from A selected also in B is a random variable Y with hypergeometric distribution with probability mass function � N − s P ( Y = r ) = ( s � r ) s − r ( N s ) The expected value of Y for given s and N is s 2 N Thus define KI ( A , B ) = r − s 2 = rN − s 2 N s − s 2 s ( N − s ) N KI is bound by − 1 ≤ KI ≤ 1 [Kuncheva (2007)] Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 11 / 36

Improving feature selection robustness Methodology based on ensemble methods for classification. Can we transfer this to feature selection ? Previous work ◮ Use feature selection to construct an ensemble ◮ Works of Cherkauer, Opitz, Tsymbal and Cunningham ◮ Feature selection → ensemble This work ◮ Use ensemble methods to perform feature selection ◮ Feature selection ← ensemble Research questions: Can we improve feature selection robustness/stability using ensembles of feature selectors ? Statistical, computational and representational aspects of ensemble learning transferable to feature selection ? How does it affect classification performance ? Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 12 / 36

Components of ensemble feature selection Training set Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 13 / 36

Components of ensemble feature selection Training set Feature selection algorithm 1 Ranked list 1 Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 13 / 36

Components of ensemble feature selection Training set Feature selection Feature selection algorithm 1 algorithm 2 Ranked list 1 Ranked list 2 Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 13 / 36

Components of ensemble feature selection Training set … Feature selection Feature selection Feature selection algorithm 1 algorithm 2 algorithm t … Ranked list 1 Ranked list 2 Ranked list T Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 13 / 36

Components of ensemble feature selection Training set … Feature selection Feature selection Feature selection algorithm 1 algorithm 2 algorithm t … Ranked list 1 Ranked list 2 Ranked list T Aggregation operator Consensus Ranked list C Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 13 / 36

Towards robust feature selection for high-dimensional, small sample - PowerPoint PPT Presentation

Towards robust feature selection for high-dimensional, small sample settings Yvan Saeys Bioinformatics and Evolutionary Genomics, Ghent University, Belgium yvan.saeys@psb.ugent.be Marseille, January 14th, 2010 Background: biomarker discovery

Outline Reducing Dimensionality Feature Selection 1 Steven J Zeil Feature Extraction 2

Decision Tree Prof. Seungchul Lee Industrial AI Lab. Feature Test Feature 1 Feature 2 Feature

Reducing Dimensionality Steven J Zeil Old Dominion Univ. Fall 2010 1 Feature Selection

Feature Selection: ROC and Subset Selection Theodoridis 5.5-5.7 Using ROC for Feature Selection

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

A Distinctive Feature of A Distinctive Feature of A Distinctive Feature of A Distinctive Feature

Week 3 Video 4 Automated Feature Generation Automated Feature Selection Automated Feature

ERP Selection KIRTANE & PANDIT Suhas Deshpande Why ERP Selection is important ?

Mutual Information an Adequate Tool for Feature Selection ? Benot Frnay November 15, 2013

Earth: The Feature Presentation - feature, landscape, topography Earth: The Feature Presentation

n -dimensional manifold M with T := TM n -dimensional manifold M with T := TM T n -dimensional

PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2018 Soleymani

SECONDHAND SELECTION Sales Price - 275,000.00 EU SECONDHAND SELECTION INTERNAL VIEWS SECONDHAND

Variable selection bias Bias in Ensemble Bias in Ensemble Methods Methods Variable selection

SELECTION Deterministic Stochastic Proportionate selection: Roulette Wheel Selection

Selection 2 Selection Selection given a set of (distinct) elements, finding the element larger

Factor Analysis for Multiple Testing : an R package for large-scale significance testing under

STK-IN4300 The bet on sparsity principle Statistical Learning Methods in Data Science

Hommels Method for False Discovery Proportions Jelle Goeman Joint work with: Aldo Solari,

A Linked Data Representation for Summary Statistics and Grouping Criteria RPI IDEA/Tetherless

Cherry-picking Multiple Testing for Exploratory Research Jelle Goeman Aldo Solari Leiden

Hypertension in Renal Tx Transplants 100% Hypertension most common modifiable CV risk factor

Nonparametric Density Estimation October 1, 2018 Introduction If we cant fit a

Building Community-Based Research Opportunities December 5th, 2018 Hallie Ford Center #115

Towards robust feature selection for high-dimensional, small sample - PowerPoint PPT Presentation

Towards robust feature selection for high-dimensional, small sample settings Yvan Saeys Bioinformatics and Evolutionary Genomics, Ghent University, Belgium yvan.saeys@psb.ugent.be Marseille, January 14th, 2010 Background: biomarker discovery

Outline Reducing Dimensionality Feature Selection 1 Steven J Zeil Feature Extraction 2

Decision Tree Prof. Seungchul Lee Industrial AI Lab. Feature Test Feature 1 Feature 2 Feature

Reducing Dimensionality Steven J Zeil Old Dominion Univ. Fall 2010 1 Feature Selection

Feature Selection: ROC and Subset Selection Theodoridis 5.5-5.7 Using ROC for Feature Selection

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

A Distinctive Feature of A Distinctive Feature of A Distinctive Feature of A Distinctive Feature

Week 3 Video 4 Automated Feature Generation Automated Feature Selection Automated Feature

ERP Selection KIRTANE &amp; PANDIT Suhas Deshpande Why ERP Selection is important ?

Mutual Information an Adequate Tool for Feature Selection ? Benot Frnay November 15, 2013

Earth: The Feature Presentation - feature, landscape, topography Earth: The Feature Presentation

n -dimensional manifold M with T := TM n -dimensional manifold M with T := TM T n -dimensional

PCA &amp; ICA CE-717: Machine Learning Sharif University of Technology Spring 2018 Soleymani

SECONDHAND SELECTION Sales Price - 275,000.00 EU SECONDHAND SELECTION INTERNAL VIEWS SECONDHAND

Variable selection bias Bias in Ensemble Bias in Ensemble Methods Methods Variable selection

SELECTION Deterministic Stochastic Proportionate selection: Roulette Wheel Selection

Selection 2 Selection Selection given a set of (distinct) elements, finding the element larger

Factor Analysis for Multiple Testing : an R package for large-scale significance testing under

STK-IN4300 The bet on sparsity principle Statistical Learning Methods in Data Science

Hommels Method for False Discovery Proportions Jelle Goeman Joint work with: Aldo Solari,

A Linked Data Representation for Summary Statistics and Grouping Criteria RPI IDEA/Tetherless

Cherry-picking Multiple Testing for Exploratory Research Jelle Goeman Aldo Solari Leiden

Hypertension in Renal Tx Transplants 100% Hypertension most common modifiable CV risk factor

Nonparametric Density Estimation October 1, 2018 Introduction If we cant fit a

Building Community-Based Research Opportunities December 5th, 2018 Hallie Ford Center #115

ERP Selection KIRTANE & PANDIT Suhas Deshpande Why ERP Selection is important ?

PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2018 Soleymani