SLIDE 1

Improving Cross-Validation Classifier Selection Accuracy through Meta-learning

Jesse H. Krijthe, Tin Kam Ho, Marco Loog

SLIDE 2

Classifier Selection Problem

Classification problem

[Figure: scatter plot of an example two-class classification problem, Feature 2 vs. Feature 1]

Classifiers

SVM, Fisher, C4.5, ID3, LDA, QDA, Nearest Mean, Nearest Neighbor, GBM, Random Forest

Which classifier, trained on the dataset $D$, gives the lowest true error rate $e$ when evaluated on a large test set?

SLIDE 3

A Practical Solution

  • In practice: there is no large test set available to determine the true error $e$
  • Alternative: estimate $e$ through a cross-validation procedure, yielding the estimate $\hat{e}_{cv}$
  • The procedure is practically unbiased and intuitive
  • Use the estimate for each classifier to select the best one (a code sketch follows below)
  • Cross-validation is used for:

– Classifier selection
– Parameter tuning
– Feature selection
– Performance estimation
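As described on this slide, selection reduces to picking the classifier with the lowest cross-validation estimate. A minimal sketch using scikit-learn; the toy dataset and the three candidate classifiers are illustrative assumptions (the slides list many more):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# A small synthetic two-feature problem standing in for dataset D.
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

candidates = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM": SVC(),
}

# e_cv for each candidate: cross_val_score returns accuracies, so 1 - mean.
cv_error = {name: 1.0 - cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in candidates.items()}
best = min(cv_error, key=cv_error.get)
print(cv_error, "-> selected:", best)
```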

SLIDE 4

Goal

Is it possible to use meta-learning techniques to improve the accuracy (rather than the computational efficiency) of classifier selection using cross-validation?

SLIDE 5

Cross-validation revisited (1/2)

  • $C = \{c_1, \ldots, c_m\}$ a set of classifiers, $D$ a dataset of $n$ objects
  • Calculate the k-fold cross-validation error (a from-scratch sketch follows below):

1. Randomly assign the $n$ objects in the dataset to $k$ parts (folds) of $n/k$ objects each
2. Use folds 2 to $k$ to train a classifier
3. Use fold 1 to test it, yielding the fold error $e_1$
4. Cycle through, using each fold as the test set once, yielding $e_1, \ldots, e_k$
5. Average the fold errors:

$\hat{e}_{cv} = \frac{1}{k} \sum_{i=1}^{k} e_i$

[Figure: dataset $D$ split into Fold 1 through Fold 10 of $n/k$ objects each; each pass contributes one fold error $e_i$]
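The procedure above can be written directly. A minimal from-scratch sketch, assuming a classifier object with scikit-learn-style fit/predict methods (an illustration, not the authors' code):

```python
import numpy as np

def kfold_cv_error(classifier, X, y, k=10, seed=None):
    """k-fold cross-validation error, following steps 1-5 above."""
    rng = np.random.default_rng(seed)
    # 1. Randomly assign the n objects to k folds.
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        # 2./3. Train on all other folds, test on fold i.
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classifier.fit(X[train], y[train])
        errors.append(np.mean(classifier.predict(X[folds[i]]) != y[folds[i]]))
    # 4./5. Each fold served as the test set once; average the fold errors.
    return float(np.mean(errors))
```

Selecting a classifier then amounts to computing kfold_cv_error for each candidate in $C$ and taking the one with the lowest estimate.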

SLIDE 8

Cross-validation revisited (2/2)

  • Select the classifier with the lowest $\hat{e}_{cv}$
  • Bias decreases as $k$ increases:

– Unbiased as an estimator of the expected error of a classifier trained on $n - \frac{n}{k}$ objects
– Small bias for reasonable $k$ and large $n$

  • For a particular dataset, we are interested in the difference between $\hat{e}_{cv}$ and the true error $e$
  • Variance (a simulation sketch follows below):

– High as $k$ goes to $n$
– High as $k$ goes to 2
– Lowest usually around $k$ = 5-10
– Higher than for the bootstrap and resubstitution
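To make the variance bullet concrete, one can re-randomize the fold assignment many times on a single fixed small dataset and look at the spread of $\hat{e}_{cv}$ for different $k$. A simulation sketch; the dataset and classifier are illustrative assumptions, and this captures only the part of the variance due to fold assignment, not variation across datasets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score

# One fixed, small dataset: the regime where CV estimates get noisy.
X, y = make_classification(n_samples=40, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

for k in (2, 5, 10, 20):
    # Re-randomize the fold assignment 100 times for this k.
    estimates = [
        1.0 - cross_val_score(LinearDiscriminantAnalysis(), X, y,
                              cv=KFold(k, shuffle=True, random_state=r)).mean()
        for r in range(100)
    ]
    print(f"k={k:2d}  mean={np.mean(estimates):.3f}  sd={np.std(estimates):.3f}")
```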

SLIDE 9

Why would cross-validation fail?

  • As Braga-Neto et al. (2004) and others note, if $n$ is small, the variance of the cross-validation error estimate becomes large
  • Cross-validation error estimates therefore become unreliable for a given dataset
  • Specifically: classifier selection based on these estimates may suffer

SLIDE 10

Meta-learning (1/2)

  • Learning which classifier to select based on characteristics of the dataset
  • Classifier selection as just another classification problem:

– Classes: the most accurate classifier
– Features: statistics on the dataset (meta-features)

  • Meta-features are preferably (a sketch follows below):

– Computationally efficient
– Predictive
– Interpretable
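For instance, a handful of cheap, interpretable measures can be computed per dataset. The specific measures below are illustrative assumptions, not the paper's exact meta-feature set:

```python
import numpy as np

def meta_features(X, y):
    """A few cheap, interpretable statistics describing a dataset."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    return {
        "n_objects": n,
        "n_features": d,
        "n_classes": len(counts),
        "min_class_prior": counts.min() / n,          # class imbalance
        "mean_feature_std": float(X.std(axis=0).mean()),
    }
```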

SLIDE 11

Meta-learning (2/2)

[Figure: datasets D1, D2, D3 mapped into a space of measures; here the axes are the 2-fold CV errors (0.05-0.3) of the Linear Discriminant and the Quadratic Discriminant, a parameterization of the dataset space]

SLIDE 12

Cross-validation selection as meta-learning

  • Cross-validation errors are measures on the dataset as well
  • Idea: treat them as meta-features
  • The meta-classifier in this case (sketched below):

– Select the classifier with the lowest cross-validation error
– A static 'diagonal' rule

Meta-classes: the best classifier (m classes)
Meta-features: the m cross-validation errors
Meta-classifier: the static 'diagonal' rule
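Written out, this meta-classifier needs no training at all. A minimal sketch:

```python
import numpy as np

def diagonal_rule(cv_errors):
    """Static 'diagonal' rule: cv_errors has shape (n_datasets, m), one
    CV error estimate per classifier; select the argmin per dataset."""
    return np.argmin(np.asarray(cv_errors), axis=1)
```

With m = 2 classifiers, the decision boundary of this rule in the meta-feature plane is the diagonal $\hat{e}_1 = \hat{e}_2$, which is why it is called a diagonal rule.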

SLIDE 13

Cross-validation Meta-problem

[Figure: the same meta-problem plot as before; datasets D1-D3 in the space of 2-fold CV errors of the Linear and Quadratic Discriminant]

Is this simple, static rule justified?

SLIDE 15

A Meta-learning Universe (1/3)

  • Choice between two simple classifiers:

– Nearest Mean
– 1-Nearest Neighbor

  • Two simple problem types:

– Each suited to one of the classifiers
– Small training samples (20-100 objects)
– Enough data generated to estimate the real error (~20,000 objects)
– The problem types have equal priors

  • Slightly contrived, but chosen for:

– Visualization
– Illustrating the concept

SLIDE 16

A Meta-learning Universe (2/3)

  • Problem type G: randomly vary the distance between the classes

– Generate 500 problems: G = {G1, G2, ..., G500}
– High Bayes error

  • Problem type B: randomly vary the width (variance) of the classes

– Generate 500 problems: B = {B1, B2, ..., B500}
– Low Bayes error

(A data-generation sketch follows below.)
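The slides do not give the exact generative models, so the sketch below assumes two 2-D Gaussian classes, varying either the class distance (G problems) or the class width (B problems); all parameter ranges are made-up placeholders:

```python
import numpy as np

def sample_problem(kind, n_train, n_test=20000, seed=None):
    """Draw one problem: a small training set plus a large test set for
    estimating the real error. kind is 'G' (vary distance) or 'B' (vary width)."""
    rng = np.random.default_rng(seed)
    if kind == "G":
        delta, scale = rng.uniform(0.5, 3.0), 1.0     # random separation
    else:
        delta, scale = 3.0, rng.uniform(0.3, 1.5)     # random width
    n = n_train + n_test
    y = rng.integers(0, 2, size=n)
    # Class 0 centered at the origin, class 1 shifted by delta in both features.
    X = rng.normal(loc=delta * y[:, None], scale=scale, size=(n, 2))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])
```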
SLIDE 17

Error: 0.16 -> 0.06 (learning makes a difference)

[Figure: the meta-problem for this universe; each problem plotted by its 10-fold CV error for Nearest Mean vs. 1-NN, marked by which classifier is truly best, separately for G and B problems]

SLIDE 18

Additional meta-features (1/2)

  • Classifiers: Nearest Mean and Least Squares
  • An elongated-boundary problem (100 dimensions)
  • Randomness in:

– Class priors
– Number of objects (20-100)

  • Extra meta-features:

– Number of objects $n$
– Variance of the cross-validation fold errors

Can characteristics of the data improve classifier selection after we know the cross-validation errors? (A sketch of such a meta-example follows below.)
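A sketch of how one such meta-example could be assembled, with the CV errors, $n$, and the per-fold error variance as meta-features and the truly best classifier as the label; the helper names here are assumptions for illustration:

```python
import numpy as np

def fold_errors(classifier, X, y, k=10, seed=0):
    """Per-fold test errors of one classifier under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classifier.fit(X[train], y[train])
        errs.append(np.mean(classifier.predict(X[folds[i]]) != y[folds[i]]))
    return np.asarray(errs)

def meta_example(classifiers, X, y, true_errors):
    """One row of the meta-problem: meta-features plus best-classifier label.
    true_errors are the large-test-set errors of each classifier."""
    per_clf = [fold_errors(c, X, y) for c in classifiers]
    features = ([float(e.mean()) for e in per_clf]    # CV error estimates
                + [len(y)]                            # number of objects n
                + [float(e.var()) for e in per_clf])  # variance of fold errors
    return features, int(np.argmin(true_errors))

# Collected over many datasets, the rows (meta_X, meta_y) train the k-NN
# meta-classifier from the slides, e.g. with
# sklearn.neighbors.KNeighborsClassifier().fit(meta_X, meta_y).
```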

SLIDE 19

Additional meta-features (2/2)

Classifier      CV errors   +n      +Variance   +n & Variance
CV-selection    0.237       -       -           -
k-NN            0.238       0.151   0.221       0.127
LDA             0.241       0.159   0.239       0.110

[Figure: two panels plotting error (0.05-0.15) against the number of objects (20-100)]

SLIDE 20

Pseudo real-world data

[Figure: the real-world-data meta-problem; datasets plotted by 10-fold CV error of the Fisher classifier vs. the Parzen classifier, marked by whether Fisher or Parzen is truly best]

SLIDE 21

Pseudo real-world data

Classifier      CV errors   +Variance
CV-selection    0.695       -
k-NN            0.605       0.587
LDA             0.618       0.599

Classifier                              Best on (number of datasets)
Nearest Mean                            236
k-Nearest Neighbor                      118
Fisher                                  243
Quadratic Discriminant                  32
Parzen Density                          286
Decision Stump (Purity Criterion)       221
Linear Support Vector Machine           164
Radial Basis Support Vector Machine     200

SLIDE 22

Conclusion

  • There are universes where meta-learning can outperform cross-validation based classifier selection
  • Additional statistics of the data can aid in classifier selection
  • There is some indication this works on real-world datasets; more experiments are needed
  • This is evidence supporting meta-learning not just as a time-efficient alternative to cross-validation, but as a potentially more accurate one