Towards robust feature selection for high-dimensional, small sample settings

Yvan Saeys
Bioinformatics and Evolutionary Genomics, Ghent University, Belgium
yvan.saeys@psb.ugent.be
Marseille, January 14th, 2010

Background: biomarker discovery
Common task in computational biology: find the entities that best explain phenotypic differences.

Challenges:
◮ Many possible biomarkers (high dimensionality)
◮ Only very few biomarkers are important for the specific phenotypic difference
◮ Very few samples

Examples:
◮ Microarray data
◮ Mass spectrometry data
◮ SNP data

Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 2 / 36
Dimensionality reduction techniques

◮ Feature selection techniques (preserve the original semantics of the features!)
  ⋆ Subset selection
  ⋆ Feature ranking
  ⋆ Feature weighting
◮ Feature transformation techniques
  ⋆ Projection (PCA, LDA)
  ⋆ Compression (Fourier transform, wavelet transform)
Casting the problem as a feature selection task

Feature selection is a way to avoid the curse of dimensionality:
◮ Improve model performance
  ⋆ Classification: improve classification performance (maximize accuracy, AUC)
  ⋆ Clustering: improve cluster detection (AIC, BIC, sum of squares, various indices)
  ⋆ Regression: improve fit (sum of squared errors)
◮ Faster and more cost-effective models
◮ Improve generalization performance (avoid overfitting)
◮ Gain deeper insight into the processes that generated the data (especially important in bioinformatics)
The need for robust marker selection algorithms

The same marker selection algorithm, run on slightly perturbed data, can yield very different results:

Ranked gene list (original data):
- gene A
- gene B
- gene C
- gene D
- gene E
- …

Ranked gene list (perturbed data):
- gene X
- gene A
- gene W
- gene Y
- gene C
- …
The need for robust marker selection algorithms

Motivation:
◮ Highly variable marker rankings decrease the confidence of a domain expert
  ⋆ Need to quantify the stability of a ranking algorithm
  ⋆ Use stability as an additional criterion next to predictive power
◮ More robust rankings have a higher chance of representing biologically relevant markers
◮ Focus here: quantifying and increasing marker stability within one data source
Formalizing feature selection robustness

Definition
Consider a dataset D = {x_1, …, x_M}, x_i = (x_i^1, …, x_i^N), with M instances and N features. A feature selection algorithm can then be defined as a mapping F : D → f from D to an N-dimensional vector f = (f_1, …, f_N), where
1. weighting: f_i = w_i denotes the weight of feature i
2. ranking: f_i ∈ {1, 2, …, N} denotes the rank of feature i
3. subset selection: f_i ∈ {0, 1} denotes the exclusion/inclusion of feature i in the selected subset
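These three output types can be illustrated with a toy example (hypothetical scores, NumPy assumed; not part of the original talk):

```python
import numpy as np

# Toy scores for N = 5 features, illustrating the three output types
# of a feature selector F : D -> f.
weights = np.array([0.9, 0.1, 0.5, 0.3, 0.7])   # weighting: f_i = w_i

# ranking: f_i is the rank of feature i (1 = best, i.e. highest weight)
ranks = np.empty_like(weights, dtype=int)
ranks[np.argsort(-weights)] = np.arange(1, len(weights) + 1)

# subset selection: f_i = 1 iff feature i is among the top-k ranked features
k = 2
subset = (ranks <= k).astype(int)

print(ranks)   # [1 5 3 4 2]
print(subset)  # [1 0 0 0 1]
```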
Formalizing feature selection robustness

Research questions:
1. How stable are current feature selection techniques for high-dimensional, small sample settings?
   ◮ Analyze the sensitivity of robustness to signature size and model parameters.
2. Can we increase the robustness of feature selection in this setting?

Definition
A feature selection algorithm is stable if small variations in the input (training data) result in small variations in the output (selected features): F is stable iff for D ≈ D′, it follows that S(f, f′) < ε.

Methodological requirements:
1. A framework to generate small changes in the training data
2. Similarity measures for feature weightings/rankings/subsets
Generating training set variations

A subsampling approach: draw k subsamples of size ⌈xM⌉ (0 < x < 1) randomly without replacement from D, where the parameters k and x can be varied. In our experiments: k = 500, x = 0.9.

Algorithm
1. Generate k subsamples of size ⌈xM⌉: {D_1, …, D_k}
2. Run the basic feature selector F on each of these k subsamples: ∀i : F(D_i) = f_i
3. Perform all k(k−1)/2 pairwise comparisons and average over them:

   Stab(F) = (2 / (k(k−1))) · Σ_{i=1}^{k} Σ_{j=i+1}^{k} S(f_i, f_j)

where S(·,·) denotes an appropriate similarity function between weightings/rankings/subsets.
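The subsampling procedure above can be sketched as follows; `selector` and `S` are user-supplied stand-ins for the feature selector F and the similarity function (a sketch under those assumptions, not the original implementation):

```python
import numpy as np

def stability(X, y, selector, S, k=500, x=0.9, rng=None):
    """Stab(F): average pairwise similarity S over k subsample outputs.

    selector(X, y) -> feature-selection output f (weighting/ranking/subset);
    S(f_i, f_j)    -> similarity between two outputs.
    Both are hypothetical callables supplied by the caller.
    """
    rng = np.random.default_rng(rng)
    M = len(y)
    m = int(np.ceil(x * M))          # subsample size ceil(x * M)
    outputs = []
    for _ in range(k):
        idx = rng.choice(M, size=m, replace=False)  # without replacement
        outputs.append(selector(X[idx], y[idx]))
    sims = [S(outputs[i], outputs[j])
            for i in range(k) for j in range(i + 1, k)]
    return 2.0 * sum(sims) / (k * (k - 1))
```

A perfectly deterministic selector yields Stab(F) = 1 under any sensible similarity.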
Similarity measures for feature selection outputs

1. Weighting (Pearson correlation coefficient):

   S(f_i, f_j) = Σ_l (f_i^l − μ_{f_i})(f_j^l − μ_{f_j}) / sqrt( Σ_l (f_i^l − μ_{f_i})² · Σ_l (f_j^l − μ_{f_j})² )

2. Ranking (Spearman rank correlation coefficient):

   S(f_i, f_j) = 1 − 6 Σ_l (f_i^l − f_j^l)² / (N(N² − 1))

3. Subset selection (Jaccard index):

   S(f_i, f_j) = |f_i ∩ f_j| / |f_i ∪ f_j| = Σ_l I(f_i^l = f_j^l = 1) / Σ_l I(f_i^l + f_j^l > 0)
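The three measures might be implemented as follows (a NumPy sketch; it assumes rankings are given as rank vectors and subsets as 0/1 indicator vectors, as in the formalization above):

```python
import numpy as np

def pearson_sim(fi, fj):
    """Similarity of two feature weightings (Pearson correlation)."""
    fi, fj = np.asarray(fi, float), np.asarray(fj, float)
    return float(np.corrcoef(fi, fj)[0, 1])

def spearman_sim(fi, fj):
    """Similarity of two feature rankings (Spearman rank correlation)."""
    fi, fj = np.asarray(fi, float), np.asarray(fj, float)
    N = len(fi)
    return 1.0 - 6.0 * np.sum((fi - fj) ** 2) / (N * (N ** 2 - 1))

def jaccard_sim(fi, fj):
    """Similarity of two 0/1 feature subsets (Jaccard index)."""
    fi, fj = np.asarray(fi, bool), np.asarray(fj, bool)
    return float(np.sum(fi & fj) / np.sum(fi | fj))
```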
Kuncheva’s index for comparing feature subsets

Definition
Let A and B be subsets of features, both of the same cardinality s, and let r = |A ∩ B|.

Requirements for a desirable stability index for feature subsets:
1. Monotonicity: for a fixed subset size s and number of features N, the larger the intersection between the subsets, the higher the value of the consistency index.
2. Limits: the index should be bounded by constants that do not depend on N or s. The maximum should be attained when the subsets are identical: r = s.
3. Correction for chance: the index should have a constant value for independently drawn subsets of the same cardinality s.
Kuncheva’s index for comparing feature subsets

General form of the index: (Observed r − Expected r) / (Maximum r − Expected r)

For randomly drawn A and B, the number of objects from A also selected in B is a random variable Y with a hypergeometric distribution, with probability mass function

   P(Y = r) = C(s, r) · C(N − s, s − r) / C(N, s)

The expected value of Y for given s and N is s²/N. Thus define

   KI(A, B) = (r − s²/N) / (s − s²/N) = (rN − s²) / (s(N − s))

KI is bounded: −1 ≤ KI ≤ 1 [Kuncheva (2007)].
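The index follows directly from the closed form above (a minimal sketch; subsets are given as collections of feature indices):

```python
def kuncheva_index(A, B, N):
    """Kuncheva's consistency index for two equal-size feature subsets
    drawn from N features: KI = (r*N - s**2) / (s * (N - s))."""
    A, B = set(A), set(B)
    assert len(A) == len(B), "subsets must have equal cardinality s"
    s, r = len(A), len(A & B)
    return (r * N - s ** 2) / (s * (N - s))
```

Identical subsets give KI = 1; the chance-level intersection r = s²/N gives KI = 0.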
Improving feature selection robustness

Methodology based on ensemble methods for classification. Can we transfer this to feature selection?

Previous work:
◮ Use feature selection to construct an ensemble
◮ Works of Cherkauer, Opitz, Tsymbal and Cunningham
◮ Feature selection → ensemble

This work:
◮ Use ensemble methods to perform feature selection
◮ Feature selection ← ensemble

Research questions:
◮ Can we improve feature selection robustness/stability using ensembles of feature selectors?
◮ Are the statistical, computational and representational aspects of ensemble learning transferable to feature selection?
◮ How does it affect classification performance?
Components of ensemble feature selection

Training set
  → Feature selection algorithm 1 → Ranked list 1
  → Feature selection algorithm 2 → Ranked list 2
  → …
  → Feature selection algorithm T → Ranked list T
  → Aggregation operator → Consensus ranked list C
Components of ensemble feature selection

Variation in the feature selectors:
◮ Choosing different feature selection techniques
◮ Dataset perturbation
  ⋆ Instance-level perturbation
  ⋆ Feature-level perturbation
◮ Stochasticity in the feature selector
◮ Bayesian model averaging
◮ Combinations of these techniques

Aggregation of the results into a single output:
◮ Rank aggregation
◮ Weighted rank aggregation
◮ Score aggregation
◮ Counting the most frequently selected features
Overview: 2 case studies

1. Bagging-based ensemble feature selection
   ◮ Microarray data sets
   ◮ Feature ranking approach
   ◮ Rank aggregation method
2. Ensemble feature selection using model stochasticity
   ◮ Mass spectrometry data sets
   ◮ Feature selection approach
   ◮ Subset aggregation approach
Case study 1: Bagging-based ensemble feature selection

Generate feature selection diversity by instance perturbation (bootstrapping):
◮ Generate t datasets by sampling the training set with replacement
◮ For each dataset, apply a feature selection algorithm (e.g. a ranker):

   EFS = {F_1, F_2, …, F_t}

◮ Each feature selector F_i results in a ranking f_i = (f_i^1, …, f_i^N), where f_i^j denotes the rank of feature j in bootstrap i.
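The bootstrap step might be sketched as follows; `ranker` is a hypothetical stand-in for the base feature selection algorithm:

```python
import numpy as np

def bootstrap_rankings(X, y, ranker, t=40, rng=None):
    """Generate t rankings f_1..f_t by bootstrapping the training set.

    ranker(X, y) -> 1-based rank vector of length N (hypothetical API).
    Returns a (t, N) array whose row i is the ranking on bootstrap i.
    """
    rng = np.random.default_rng(rng)
    M = len(y)
    ranks = []
    for _ in range(t):
        idx = rng.choice(M, size=M, replace=True)  # sample WITH replacement
        ranks.append(ranker(X[idx], y[idx]))
    return np.vstack(ranks)
```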
Aggregation methods

Rank aggregation:

   f = ( Σ_{i=1}^{t} w_i f_i^1, …, Σ_{i=1}^{t} w_i f_i^N )

◮ Complete linear aggregation (CLA): w_i = 1
◮ Complete weighted aggregation (CWA): w_i = OO-AUC_i
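Both schemes reduce to a weighted rank sum over the bootstrap rankings. A sketch (assuming a t × N array of 1-based ranks; the per-bootstrap OO-AUC weights for CWA would be computed separately and are not shown):

```python
import numpy as np

def aggregate_ranks(rankings, weights=None):
    """Linear rank aggregation over t bootstrap rankings (t x N array).

    CLA: weights=None (all w_i = 1).
    CWA: weights = per-bootstrap OO-AUC values, so better bootstraps
         contribute more to the consensus.
    Returns the consensus ranking (1 = best, i.e. lowest aggregated rank).
    """
    R = np.asarray(rankings, float)
    w = np.ones(R.shape[0]) if weights is None else np.asarray(weights, float)
    score = w @ R                          # aggregated rank sum per feature
    consensus = np.empty(R.shape[1], dtype=int)
    consensus[np.argsort(score)] = np.arange(1, R.shape[1] + 1)
    return consensus
```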
Overview methodology

SUBSAMPLING (outer loop, to measure stability): from the full data set (100% of the samples), draw K subsamples of 90%. Each 90% subsample is fed to a marker selection algorithm, yielding ranked lists 1 … K, whose pairwise similarities are averaged.

BOOTSTRAPPING (inner loop, the ensemble): within each 90% subsample, draw bootstraps 1 … T; on each bootstrap a marker selection algorithm produces a ranked list, and a consensus marker selection step aggregates the T lists into a consensus ranked list.
Experiments

Microarray datasets

Name      # Class 1  # Class 2  Size  # Features  SDR    Reference
Colon     40         22         62    2000        0.031  Alon et al. (1999)
Leukemia  47         25         72    7129        0.010  Golub et al. (1999)
Lymphoma  22         23         45    4026        0.011  Alizadeh et al. (2000)
Prostate  52         55         107   6033        0.017  Singh et al. (2002)

Baseline classifier/feature selection algorithm: linear SVM with SVM Recursive Feature Elimination (RFE, Guyon et al. (2002)):
1. Train a linear SVM on the full feature set
2. Rank the features based on |w|
3. Eliminate the 50% worst features
4. Retrain the SVM on the remaining features
5. Go to step 2
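The RFE loop can be sketched generically; `fit_weights` is a hypothetical stand-in for training the linear SVM and returning its coefficient vector w (any linear model works for illustration):

```python
import numpy as np

def svm_rfe(X, y, fit_weights, n_keep=10, drop_frac=0.5):
    """SVM-RFE sketch: repeatedly train a linear model, rank features by |w|,
    and eliminate the worst drop_frac of the surviving features.

    fit_weights(X, y) -> weight vector w (e.g. a linear SVM's coefficients),
    passed in so the sketch stays library-agnostic.
    Returns indices of the surviving features, best-ranked first.
    """
    surviving = np.arange(X.shape[1])
    while len(surviving) > n_keep:
        w = np.abs(fit_weights(X[:, surviving], y))
        order = np.argsort(-w)                             # best first
        n_next = max(n_keep, int(len(surviving) * (1 - drop_frac)))
        surviving = surviving[order[:n_next]]
    # final ranking of the survivors by |w| on the last model
    w = np.abs(fit_weights(X[:, surviving], y))
    return surviving[np.argsort(-w)]
```

With a least-squares fit as `fit_weights`, a feature that perfectly predicts y always survives to the end.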
Results: stability distributions

[Figure: histograms of pairwise stability values for the ensemble vs. the baseline selector on the colon, leukemia, lymphoma and prostate datasets.]
Results: stability

[Figure: Kuncheva index vs. percentage of selected features (100% down to 0.5%) for CLA, CWA and the baseline on the Colon, Leukemia, Lymphoma and Prostate datasets.]
Results: classification performance

[Figure: AUC vs. percentage of selected features (100% down to 0.5%) for CLA, CWA and the baseline on the Colon, Leukemia, Lymphoma and Prostate datasets.]
Bagging-based EFS: first conclusions

Ensemble feature selection (EFS) increases model performance:
◮ More stable biomarker selection
◮ Increased predictive performance

◮ EFS is easy to parallelize
◮ As signature sizes get smaller, EFS progressively improves upon the baseline
◮ Robust, small signatures are interesting candidates for prognostic tests
◮ The linear aggregation method is preferred
Sensitivity analysis: number of bootstraps

Effect on stability

[Figure: Kuncheva index vs. percentage of selected features for 20, 40 and 60 bootstraps vs. the baseline, on the Colon, Leukemia, Lymphoma and Prostate datasets.]
Sensitivity analysis: number of bootstraps

Effect on classification performance

[Figure: AUC vs. percentage of selected features for 20, 40 and 60 bootstraps vs. the baseline, on the Colon, Leukemia, Lymphoma and Prostate datasets.]
Sensitivity analysis: RFE elimination percentage

Effect on stability

[Figure: Kuncheva index vs. percentage of selected features for CLA and the baseline with elimination percentages E = 20%, 50% and 100%, on the Colon, Leukemia, Lymphoma and Prostate datasets.]
Sensitivity analysis: RFE elimination percentage

Effect on classification performance

[Figure: AUC vs. percentage of selected features for CLA and the baseline with elimination percentages E = 20%, 50% and 100%, on the Colon, Leukemia, Lymphoma and Prostate datasets.]
Bagging-based EFS: final conclusions

Ensemble feature selection (EFS) increases model performance:
◮ More stable biomarker selection
◮ Increased predictive performance

The number of bootstraps only affects stability. The RFE elimination percentage does not affect EFS, but has a strong impact on the baseline:
◮ A single-run SVM performs best in terms of stability
◮ Smaller impact on classification performance
Case study 2: Ensemble FS using model stochasticity

Traditional approach:
◮ Run a stochastic FS method many times (e.g. MCMC, genetic algorithm, stochastic iterative sampling)
◮ Compare all feature subsets found
◮ Make a final selection
  ⋆ Intersection of the results
  ⋆ Most frequently selected features

Computationally more efficient approach:
◮ Don't use only the single best results of the sampling procedure
◮ Average over the whole distribution
Estimation of distribution algorithms (EDA)

Instead of working on one solution, work on a set of solutions (a distribution). Use stochastic iterative sampling, combined with probabilistic graphical models, to model good solutions:

1. Generate initial solution set S_0
2. Select a number of samples
3. Estimate the probability distribution of the selected samples
4. Generate new samples by sampling the estimated distribution
5. Create a new solution set; if the termination criteria are not met, go to step 2
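For UMDA, the simplest EDA in this talk, the loop above might look like this sketch (population sizes, probability clipping, and the onemax-style fitness are illustrative assumptions, not the talk's actual settings):

```python
import numpy as np

def umda(fitness, n_bits, pop=100, keep=50, gens=30, rng=None):
    """UMDA sketch: model the selected solutions with independent per-bit
    marginals p(x_i) and resample.  fitness maps a 0/1 vector to a score
    to maximize; for feature selection, bits encode feature inclusion.
    """
    rng = np.random.default_rng(rng)
    # 1. generate initial solution set
    P = rng.integers(0, 2, size=(pop, n_bits))
    for _ in range(gens):
        # 2. select the best `keep` solutions
        scores = np.array([fitness(ind) for ind in P])
        sel = P[np.argsort(-scores)[:keep]]
        # 3. estimate the distribution: independent marginals p(x_i = 1)
        p = sel.mean(axis=0).clip(0.05, 0.95)   # clip to keep diversity
        # 4./5. sample a new solution set from the estimated distribution
        P = (rng.random((pop, n_bits)) < p).astype(int)
    scores = np.array([fitness(ind) for ind in P])
    return P[np.argmax(scores)]
```

On a simple onemax fitness (count of 1-bits) the marginals quickly converge toward 1.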
Estimating the probability distribution

[Figure: three probabilistic graphical models over features X1 … X8, of increasing complexity.]

◮ UMDA: no dependencies; each feature has an independent marginal p(x1), p(x2), …, p(x8)
◮ BMDA: pairwise (tree-structured) dependencies, e.g. p(x4 | x1), p(x5 | x3), p(x6 | x4), p(x7 | x3), p(x8 | x5)
◮ BOA, EBNA: general Bayesian networks allowing multi-parent dependencies, e.g. p(x5 | x3, x4)
Experiments

Mass spectrometry datasets

Name                                # C1  # C2  Size  # Features  SDR      Reference
Ovarian cancer profiling            121   79    200   45,200      0.0044   Petricoin et al. (2002)
Detection of drug-induced toxicity  28    34    62    45,200      0.00137  Petricoin et al. (2004)
Hepatocellular carcinoma            78    72    150   36,802      0.0041   Ressom et al. (2006)

Estimation algorithms: UMDA, BMDA. Classifiers: Naive Bayes, k-NN, SVM. All EDA results are averaged over 500 multistarts.
Results [preliminary]
Usage for knowledge discovery: peak frequency plots
Future challenges

Better handling of correlated features:
◮ First cluster correlated features, then choose representatives from each cluster, and build a model with the representatives
◮ Adapt similarity measures to deal with correlated features

Increasing stability by transfer learning:
◮ Assume two related datasets D1 and D2
◮ Use feature selection on D1 as a "prior" for feature selection on D2
◮ Preliminary research shows that this transfer of feature selection information increases the stability of feature selection on D2 [Helleputte and Dupont (2009)]

A comparative evaluation of different ensemble FS techniques
Acknowledgements

Thomas Abeel (Ghent University)
Yves Van de Peer (Ghent University)
Thibault Helleputte (UC Louvain)
Pierre Dupont (UC Louvain)
Ruben Armañanzas (Universidad Politécnica de Madrid)
Iñaki Inza (University of the Basque Country)
Pedro Larrañaga (Universidad Politécnica de Madrid)
References

Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.

Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., & Levine, A. (1999). Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96, 6745–6750.

Golub, T., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.

Helleputte, T., & Dupont, P. (2009). Feature selection by transfer learning with linear regularized models. Lecture Notes in Artificial Intelligence, 5781, 533–547.

Kuncheva, L. (2007). A stability index for feature selection. Proceedings of the 25th International Multi-Conference on Artificial Intelligence and Applications (pp. 309–395).

Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., Mills, G. B., Simone, C., Fishman, D. A., Kohn, E. C., & Liotta, L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359, 572–577.

Petricoin, E. F., Rajapaske, V., Herman, E. H., Arekani, A. M., Ross, S., Johann, D., Knapton, A., Zhang, J., Hitt, B. A., Conrads, T. P., Veenstra, T. D., Liotta, L. A., & Sistare, F. D. (2004). Toxicoproteomics: Serum proteomic pattern diagnostics for early detection of drug induced cardiac toxicities and cardioprotection. Toxicologic Pathology, 32, 122–130.

Ressom, H. W., Varghese, R. S., Orvisky, E., Drake, S. K., Hortin, G. L., Abdel-Hamid, M., Loffredo, C. A., & Goldman, R. (2006). Ant colony optimization for biomarker identification from MALDI-TOF mass spectra. Proceedings of the 28th International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 4560–4563).

Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D'Amico, A., Richie, J., Lander, E., Loda, M., Kantoff, P., Golub, T. R., & Sellers, W. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.