

SLIDE 1

Towards robust feature selection for high-dimensional, small sample settings

Yvan Saeys

Bioinformatics and Evolutionary Genomics, Ghent University, Belgium yvan.saeys@psb.ugent.be

Marseille, January 14th, 2010

SLIDE 2

Background: biomarker discovery

A common task in computational biology: find the entities that best explain phenotypic differences.

Challenges:

◮ Many possible biomarkers (high dimensionality)
◮ Only very few biomarkers are important for the specific phenotypic difference
◮ Very few samples

Examples:

◮ Microarray data
◮ Mass spectrometry data
◮ SNP data

Yvan Saeys (UGent) Towards robust feature selection Marseille 2010 2 / 36


SLIDE 4

Dimensionality reduction techniques

◮ Feature transformation techniques
  ⋆ Projection: PCA, LDA
  ⋆ Compression: Fourier transform, wavelet transform
◮ Feature selection techniques (preserve the original semantics!)
  ⋆ Subset selection
  ⋆ Feature ranking
  ⋆ Feature weighting

SLIDE 5

Casting the problem as a feature selection task

Feature selection is a way to avoid the curse of dimensionality.

Improve model performance:

◮ Classification: improve classification performance (maximize accuracy, AUC)
◮ Clustering: improve cluster detection (AIC, BIC, sum of squares, various indices)
◮ Regression: improve fit (sum of squared errors)

Further benefits:

◮ Faster and more cost-effective models
◮ Improved generalization performance (avoiding overfitting)
◮ Deeper insight into the processes that generated the data (especially important in bioinformatics)


SLIDE 7

The need for robust marker selection algorithms

Ranked gene list:

  • gene A
  • gene B
  • gene C
  • gene D
  • gene E



SLIDE 9

The need for robust marker selection algorithms

Ranked gene list:

  • gene A
  • gene B
  • gene C
  • gene D
  • gene E

Ranked gene list:

  • gene X
  • gene A
  • gene W
  • gene Y
  • gene C


SLIDE 10

The need for robust marker selection algorithms

Motivation:

Highly variable marker ranking algorithms decrease the confidence of a domain expert.

◮ Need to quantify the stability of a ranking algorithm
◮ Use this as an additional criterion next to the predictive power

More robust rankings yield a higher chance of representing biologically relevant markers.
Focus on quantifying/increasing marker stability within one data source.

SLIDE 11

Formalizing feature selection robustness

Definition

Consider a dataset D = {x_1, ..., x_M}, x_i = (x_i^1, ..., x_i^N), with M instances and N features. A feature selection algorithm can then be defined as a mapping F : D → f from D to an N-dimensional vector f = (f_1, ..., f_N), where:

1. weighting: f_i = w_i denotes the weight of feature i
2. ranking: f_i ∈ {1, 2, ..., N} denotes the rank of feature i
3. subset selection: f_i ∈ {0, 1} denotes the exclusion/inclusion of feature i in the selected subset
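The three output types in the definition can be illustrated with a short sketch (hypothetical data; NumPy assumed, rank 1 taken as the best feature):

```python
import numpy as np

# Illustration of the three outputs F can produce for N = 5 features.
# The weights are hypothetical stand-ins for a real selector's scores.
rng = np.random.default_rng(0)
N = 5
weights = rng.random(N)                            # weighting: f_i = w_i

ranks = np.empty(N, dtype=int)                     # ranking: f_i in {1, ..., N}
ranks[np.argsort(-weights)] = np.arange(1, N + 1)  # rank 1 = largest weight

subset = (ranks <= 2).astype(int)                  # subset selection: keep top 2
```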

SLIDE 12

Formalizing feature selection robustness

Research questions:

1. How stable are current feature selection techniques for high-dimensional, small sample settings?
   ◮ Analyze the sensitivity of robustness to signature size and model parameters.
2. Can we increase the robustness of feature selection in this setting?

Definition

A feature selection algorithm is stable if small variations in the input [training data] result in small variations in the output [selected features]: F is stable iff for D ≈ D′, it follows that S(f, f′) < ε.

Methodological requirements:

1. A framework to generate small changes in the training data
2. Similarity measures for feature weightings/rankings/subsets

SLIDE 13

Generating training set variations

A subsampling approach: draw k subsamples of size ⌈xM⌉ (0 < x < 1) randomly without replacement from D, where the parameters k and x can be varied. In our experiments: k = 500, x = 0.9.

Algorithm:

1. Generate k subsamples of size ⌈xM⌉: {D_1, ..., D_k}
2. Apply the basic feature selector F to each of these k subsamples: ∀i : F(D_i) = f_i
3. Perform all k(k−1)/2 pairwise comparisons and average over them:

Stab(F) = (2 / (k(k−1))) Σ_{i=1}^{k} Σ_{j=i+1}^{k} S(f_i, f_j)

where S(·, ·) denotes an appropriate similarity function between weightings/rankings/subsets.
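The procedure above translates almost line for line into code. A minimal sketch, assuming NumPy and two user-supplied placeholders: `selector` (the basic feature selector F) and `similarity` (one of the S(·, ·) measures):

```python
import itertools
import numpy as np

def stability(selector, X, y, k=50, x=0.9, similarity=None, seed=0):
    """Estimate Stab(F): draw k subsamples of size ceil(x*M) without
    replacement, run the selector on each, and average the k(k-1)/2
    pairwise similarities of its outputs."""
    rng = np.random.default_rng(seed)
    M = len(y)
    n = int(np.ceil(x * M))
    outputs = []
    for _ in range(k):
        idx = rng.choice(M, size=n, replace=False)  # one subsample D_i
        outputs.append(selector(X[idx], y[idx]))    # f_i = F(D_i)
    pairs = list(itertools.combinations(range(k), 2))
    return sum(similarity(outputs[i], outputs[j]) for i, j in pairs) / len(pairs)
```

A perfectly deterministic selector scores 1.0; unstable selectors score lower under any of the similarity measures.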

SLIDE 14

Similarity measures for feature selection outputs

1. Weighting (Pearson correlation coefficient):

S(f_i, f_j) = Σ_l (f_i^l − μ_{f_i})(f_j^l − μ_{f_j}) / sqrt( Σ_l (f_i^l − μ_{f_i})² · Σ_l (f_j^l − μ_{f_j})² )

2. Ranking (Spearman rank correlation coefficient):

S(f_i, f_j) = 1 − 6 Σ_l (f_i^l − f_j^l)² / (N(N² − 1))

3. Subset selection (Jaccard index):

S(f_i, f_j) = |f_i ∩ f_j| / |f_i ∪ f_j| = Σ_l I(f_i^l = f_j^l = 1) / Σ_l I(f_i^l + f_j^l > 0)
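Under the definitions above, the three measures can be written directly in code (a sketch; NumPy assumed, with f_i and f_j given as vectors over the N features):

```python
import numpy as np

def pearson_sim(fi, fj):
    # Weightings: Pearson correlation coefficient of the weight vectors.
    return float(np.corrcoef(fi, fj)[0, 1])

def spearman_sim(fi, fj):
    # Rankings: Spearman rank CC, 1 - 6 * sum(d^2) / (N (N^2 - 1)).
    N = len(fi)
    d = np.asarray(fi) - np.asarray(fj)
    return 1 - 6 * np.sum(d ** 2) / (N * (N ** 2 - 1))

def jaccard_sim(fi, fj):
    # Subsets (0/1 inclusion vectors): |intersection| / |union|.
    fi, fj = np.asarray(fi), np.asarray(fj)
    return np.sum((fi == 1) & (fj == 1)) / np.sum((fi + fj) > 0)
```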

SLIDE 15

Kuncheva’s index for comparing feature subsets

Definition

Let A and B be subsets of features, both of the same cardinality s, and let r = |A ∩ B|.

Requirements for a desirable stability index for feature subsets:

1. Monotonicity: for a fixed subset size s and number of features N, the larger the intersection between the subsets, the higher the value of the consistency index.
2. Limits: the index should be bounded by constants that do not depend on N or s. The maximum should be attained when the subsets are identical: r = s.
3. Correction for chance: the index should have a constant value for independently drawn subsets of the same cardinality s.

SLIDE 16

Kuncheva’s index for comparing feature subsets

General form of the index: (Observed r − Expected r) / (Maximum r − Expected r)

For randomly drawn A and B, the number of objects from A also selected in B is a random variable Y with a hypergeometric distribution, with probability mass function

P(Y = r) = C(s, r) C(N − s, s − r) / C(N, s)

The expected value of Y for given s and N is s²/N. Thus define

KI(A, B) = (r − s²/N) / (s − s²/N) = (rN − s²) / (s(N − s))

KI is bounded: −1 ≤ KI ≤ 1 [Kuncheva (2007)]
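The closed form lends itself to a one-liner (a sketch; the subsets are given as sets of feature indices out of N features):

```python
def kuncheva_index(A, B, N):
    """Kuncheva's consistency index for two feature subsets of equal
    cardinality s drawn from N features: KI = (r*N - s^2) / (s*(N - s))."""
    s = len(A)
    assert len(B) == s and 0 < s < N
    r = len(set(A) & set(B))                 # observed overlap
    return (r * N - s * s) / (s * (N - s))   # chance-corrected consistency
```

Identical subsets give 1, and an overlap of exactly the chance level s²/N gives 0, matching the requirements of the previous slide.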

SLIDE 17

Improving feature selection robustness

The methodology is based on ensemble methods for classification. Can we transfer this to feature selection?

Previous work:

◮ Use feature selection to construct an ensemble
◮ Works of Cherkauer, Opitz, Tsymbal and Cunningham
◮ Feature selection → ensemble

This work:

◮ Use ensemble methods to perform feature selection
◮ Feature selection ← ensemble

Research questions:

◮ Can we improve feature selection robustness/stability using ensembles of feature selectors?
◮ Are the statistical, computational and representational aspects of ensemble learning transferable to feature selection?
◮ How does it affect classification performance?


SLIDE 22

Components of ensemble feature selection

Training set
→ Feature selection algorithm 1 → Ranked list 1
→ Feature selection algorithm 2 → Ranked list 2
→ …
→ Feature selection algorithm T → Ranked list T

Aggregation operator → Consensus ranked list C


SLIDE 24

Components of ensemble feature selection

Variation in the feature selectors:

◮ Choosing different feature selection techniques
◮ Dataset perturbation
  ⋆ Instance-level perturbation
  ⋆ Feature-level perturbation
◮ Stochasticity in the feature selector
◮ Bayesian model averaging
◮ Combinations of these techniques

Aggregation of the results into a single output:

◮ Rank aggregation
◮ Weighted rank aggregation
◮ Score aggregation
◮ Counting the most frequently selected features

SLIDE 25

Overview: 2 case studies

1. Bagging-based ensemble feature selection
   ◮ Microarray data sets
   ◮ Feature ranking approach
   ◮ Rank aggregation method

2. Ensemble feature selection using model stochasticity
   ◮ Mass spectrometry data sets
   ◮ Feature selection approach
   ◮ Subset aggregation approach

SLIDE 26

Case study 1: Bagging-based ensemble feature selection

Generate feature selection diversity by instance perturbation:

◮ Bootstrapping: generate t datasets by sampling the training set with replacement
◮ For each dataset, apply a feature selection algorithm (e.g. a ranker):

EFS = {F_1, F_2, ..., F_t}

◮ Each feature selector F_i results in a ranking f_i = (f_i^1, ..., f_i^N), where f_i^j denotes the rank of feature j in bootstrap i.

SLIDE 27

Aggregation methods

Rank aggregation:

f = ( Σ_{i=1}^{t} w_i f_i^1, ..., Σ_{i=1}^{t} w_i f_i^N )

◮ Complete linear aggregation (CLA): w_i = 1
◮ Complete weighted aggregation (CWA): w_i = OOB-AUC_i (the out-of-bag AUC of bootstrap i)
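Both schemes reduce to a weighted sum of the bootstrap rank vectors. A sketch, assuming NumPy; the OOB-AUC weights for CWA would come from evaluating each bootstrap model on its out-of-bag samples:

```python
import numpy as np

def aggregate_ranks(rank_lists, weights=None):
    """Linear rank aggregation over t bootstrap rankings (rows = bootstraps,
    columns = features, rank 1 = best). weights=None gives CLA (w_i = 1);
    passing per-bootstrap weights (e.g. OOB-AUCs) gives CWA.
    Returns the consensus ranking."""
    R = np.asarray(rank_lists, dtype=float)
    if weights is None:
        scores = R.sum(axis=0)               # CLA: unweighted rank sums
    else:
        scores = np.asarray(weights) @ R     # CWA: weighted rank sums
    consensus = np.empty(R.shape[1], dtype=int)
    consensus[np.argsort(scores)] = np.arange(1, R.shape[1] + 1)
    return consensus
```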

SLIDE 28

Overview methodology

[Pipeline diagram: the full data set (100% of the samples) is subsampled K times into subsamples of 90% of the samples; within each 90% subsample, T bootstraps are drawn; a marker selection algorithm produces a ranked list per bootstrap; a consensus marker selection algorithm aggregates these into a consensus ranked list per subsample.]

SLIDE 32

Experiments

Microarray datasets:

Name     | # Class 1 | # Class 2 | Size | # Features | SDR   | Reference
Colon    | 40        | 22        | 62   | 2000       | 0.031 | Alon et al. (1999)
Leukemia | 47        | 25        | 72   | 7129       | 0.010 | Golub et al. (1999)
Lymphoma | 22        | 23        | 45   | 4026       | 0.011 | Alizadeh et al. (2000)
Prostate | 52        | 55        | 107  | 6033       | 0.017 | Singh et al. (2002)

Baseline classifier/feature selection algorithm: linear SVM with SVM Recursive Feature Elimination (RFE, Guyon et al. (2002)):

1. Train a linear SVM on the full feature set
2. Rank features based on |w|
3. Eliminate the 50% worst features
4. Retrain the SVM on the remaining features
5. Go to step 2
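The RFE loop can be sketched generically (the linear model is abstracted behind `fit_linear`, a placeholder that returns a weight vector; in the talk's setting this would be a linear SVM's weights):

```python
import numpy as np

def rfe(X, y, fit_linear, elim_frac=0.5, min_features=1):
    """Sketch of recursive feature elimination: repeatedly fit a linear
    model, rank features by |w|, and drop the worst elim_frac fraction.
    Returns the indices of the surviving features."""
    remaining = np.arange(X.shape[1])
    while len(remaining) > min_features:
        w = fit_linear(X[:, remaining], y)            # step 1/4: (re)train
        order = np.argsort(np.abs(w))                 # step 2: worst first
        n_drop = max(1, int(elim_frac * len(remaining)))
        n_drop = min(n_drop, len(remaining) - min_features)
        remaining = remaining[np.sort(order[n_drop:])]  # step 3: eliminate
    return remaining
```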

SLIDE 33

Results: stability distributions

[Figure: distributions of pairwise stability values for ensemble vs. baseline feature selection on the colon, leukemia, lymphoma and prostate datasets.]

SLIDE 34

Results: stability

[Four panels (Colon, Leukemia, Lymphoma, Prostate): Kuncheva index (y-axis, 0.3–0.8) vs. percentage of selected features (x-axis, 100% down to 0.5%) for CLA, CWA and the baseline.]

SLIDE 35

Results: classification performance

[Four panels (Colon, Leukemia, Lymphoma, Prostate): AUC (y-axis, 0.75–1) vs. percentage of selected features (x-axis, 100% down to 0.5%) for CLA, CWA and the baseline.]

SLIDE 36

Bagging-based EFS: first conclusions

Ensemble feature selection (EFS) increases model performance:

◮ More stable biomarker selection
◮ Increased predictive performance

EFS is easy to parallelize.
As signature sizes get smaller, EFS progressively improves upon the baseline.
Robust, small signatures are interesting candidates for prognostic tests.
The linear aggregation method is preferred.

SLIDE 37

Sensitivity analysis: number of bootstraps

Effect on stability

[Four panels (Colon, Leukemia, Lymphoma, Prostate): Kuncheva index vs. percentage of selected features for 20, 40 and 60 bootstraps and the baseline.]

SLIDE 38

Sensitivity analysis: number of bootstraps

Effect on classification performance

[Four panels (Colon, Leukemia, Lymphoma, Prostate): AUC vs. percentage of selected features for 20, 40 and 60 bootstraps and the baseline.]

SLIDE 39

Sensitivity analysis: RFE elimination percentage

Effect on stability

[Four panels (Colon, Leukemia, Lymphoma, Prostate): Kuncheva index vs. percentage of selected features for CLA and the baseline with elimination percentages E = 20%, 50% and 100%.]

SLIDE 40

Sensitivity analysis: RFE elimination percentage

Effect on classification performance

[Four panels (Colon, Leukemia, Lymphoma, Prostate): AUC vs. percentage of selected features for CLA and the baseline with elimination percentages E = 20%, 50% and 100%.]

SLIDE 41

Bagging-based EFS: final conclusions

Ensemble feature selection (EFS) increases model performance:

◮ More stable biomarker selection
◮ Increased predictive performance

The number of bootstraps only affects stability.
The RFE elimination percentage does not affect EFS.
The RFE elimination percentage has a strong impact on the baseline:

◮ A single-run SVM performs best in terms of stability
◮ Smaller impact on classification performance


SLIDE 43

Case study 2: Ensemble FS using model stochasticity

Traditional approach:

◮ Run a stochastic FS method many times (e.g. MCMC, genetic algorithms, stochastic iterative sampling)
◮ Compare all feature subsets found
◮ Make a final selection:
  ⋆ Intersection of the results
  ⋆ Most frequently selected features

Computationally more efficient approach:

◮ Don’t use only the single best results of the sampling procedure
◮ Average over the whole distribution

SLIDE 44

Estimation of distribution algorithms (EDA)

Instead of working on one solution, work on a set of solutions (a distribution). Use stochastic iterative sampling, combined with probabilistic graphical models, to model good solutions:

1. Generate an initial solution set S0
2. Select a number of samples
3. Estimate the probability distribution
4. Generate new samples by sampling the estimated distribution
5. Create a new solution set; if the termination criteria are not met, go to step 2, otherwise end
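A univariate EDA (UMDA, the simplest model of the next slide) applied to feature subset selection might look like the following sketch; `fitness` is a placeholder for e.g. the cross-validated accuracy of a classifier restricted to the masked features:

```python
import numpy as np

def umda_feature_selection(fitness, n_features, pop=60, top=30, iters=25, seed=0):
    """UMDA sketch: model good 0/1 feature masks with independent
    per-feature Bernoulli probabilities. Returns the final probability
    vector; averaging it over multistarts gives feature frequencies."""
    rng = np.random.default_rng(seed)
    p = np.full(n_features, 0.5)                # 1. initial distribution
    for _ in range(iters):
        S = (rng.random((pop, n_features)) < p).astype(int)  # 4. sample masks
        scores = np.array([fitness(m) for m in S])
        best = S[np.argsort(scores)[-top:]]     # 2. select the best samples
        p = best.mean(axis=0).clip(0.05, 0.95)  # 3. re-estimate each p(x_i)
    return p
```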

SLIDE 45

Estimating the probability distribution

From graphical model to probability distribution (features X_1, ..., X_8, each x_j ∈ {0, 1}):

◮ UMDA: all features independent; one marginal p(x_i) per feature X_i
◮ BMDA: pairwise (tree-structured) dependencies; e.g. p(x_1), p(x_2), p(x_3), p(x_4 | x_1), p(x_5 | x_3), p(x_6 | x_4), p(x_7 | x_3), p(x_8 | x_5)
◮ BOA, EBNA: Bayesian networks allowing multiple parents; e.g. p(x_5 | x_3, x_4)

SLIDE 46

Experiments

Mass spectrometry datasets:

Name                               | # C1 | # C2 | Size | # Features | SDR     | Reference
Ovarian cancer profiling           | 121  | 79   | 200  | 45,200     | 0.0044  | Petricoin et al. (2002)
Detection of drug-induced toxicity | 28   | 34   | 62   | 45,200     | 0.00137 | Petricoin et al. (2004)
Hepatocellular carcinoma           | 78   | 72   | 150  | 36,802     | 0.0041  | Ressom et al. (2006)

Estimation of distribution algorithms: UMDA, BMDA
Classifiers: naive Bayes, k-NN, SVM
All EDA results are averaged over 500 multistarts.

SLIDE 47

Results [preliminary]

Usage for knowledge discovery: peak frequency plots


SLIDE 50

Future challenges

Better dealing with correlated features:

◮ First cluster correlated features, then choose representatives from each cluster, and build a model with the representatives
◮ Adapt similarity measures to deal with correlated features

Increasing stability by transfer learning:

◮ Assume two related datasets D1 and D2
◮ Use feature selection on D1 as a “prior” for feature selection on D2
◮ Preliminary research shows that this “transferral” of feature selection information increases the stability of feature selection on D2 [Helleputte and Dupont (2009)]

A comparative evaluation of different ensemble FS techniques

SLIDE 51

Acknowledgements

Thomas Abeel (Ghent University)
Yves Van de Peer (Ghent University)
Thibault Helleputte (UC Louvain)
Pierre Dupont (UC Louvain)
Ruben Armañanzas (Universidad Politecnica de Madrid)
Iñaki Inza (University of the Basque Country)
Pedro Larrañaga (Universidad Politecnica de Madrid)

SLIDE 52

References

Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.

Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., & Levine, A. (1999). Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96, 6745–6750.

Golub, T., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.

Helleputte, T., & Dupont, P. (2009). Feature selection by transfer learning with linear regularized models. Lecture Notes in Artificial Intelligence, 5781, 533–547.

Kuncheva, L. (2007). A stability index for feature selection. Proceedings of the 25th International Multi-Conference on Artificial Intelligence and Applications (pp. 309–395).

Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., Mills, G. B., Simone, C., Fishman, D. A., Kohn, E. C., & Liotta, L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359, 572–577.

Petricoin, E. F., Rajapaske, V., Herman, E. H., Arekani, A. M., Ross, S., Johann, D., Knapton, A., Zhang, J., Hitt, B. A., Conrads, T. P., Veenstra, T. D., Liotta, L. A., & Sistare, F. D. (2004). Toxicoproteomics: Serum proteomic pattern diagnostics for early detection of drug induced cardiac toxicities and cardioprotection. Toxicologic Pathology, 32, 122–130.

Ressom, H. W., Varghese, R. S., Orvisky, E., Drake, S. K., Hortin, G. L., Abdel-Hamid, M., Loffredo, C. A., & Goldman, R. (2006). Ant colony optimization for biomarker identification from MALDI-TOF mass spectra. Proceedings of the 28th International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 4560–4563).

Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D’Amico, A., Richie, J., Lander, E., Loda, M., Kantoff, P., Golub, T. R., & Sellers, W. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.