
Designing ML Experiments

Steven J Zeil

Old Dominion Univ.

Fall 2010


Introduction

Questions:

Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?

Training/validation/test sets


Training

1. Introduction
2. Training
   Response Surface Design
   Cross-Validation & Resampling
3. Measuring Classifier Performance
4. Comparing Classifiers
   Comparing Two Classifiers
   Comparing Multiple Classifiers
   Comparing Over Multiple Datasets


Algorithm Preference

Criteria (application-dependent):
  Misclassification error, or risk (loss functions)
  Training time/space complexity
  Testing time/space complexity
  Interpretability
  Easy programmability


Factors and Response


Algorithm Preference (cont.)

Given the application-dependent criteria listed above, select a response function that measures the desired criteria.


Response Surface Design

For approximating and maximizing the response function in terms of the controllable factors.
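As an illustration of this idea (not taken from the slides), here is a minimal Python sketch: it evaluates a hypothetical response function (think of validation accuracy as a function of two controllable factors) on a coarse grid, fits a quadratic surface by least squares, and uses the fitted surface to suggest a promising factor setting. The factor ranges and the response function are made-up assumptions.

```python
import numpy as np

# Hypothetical response function: validation accuracy as a function of two
# controllable factors.  In practice this would train and evaluate the learner.
def response(f1, f2):
    return -(f1 - 0.3) ** 2 - 0.5 * (f2 - 1.0) ** 2 + 0.9

# Evaluate the response on a coarse grid of factor settings.
f1_vals = np.linspace(-1, 1, 5)
f2_vals = np.linspace(0, 2, 5)
F1, F2 = np.meshgrid(f1_vals, f2_vals)
R = response(F1, F2)

# Fit a quadratic surface r ~ b0 + b1*f1 + b2*f2 + b3*f1^2 + b4*f2^2 + b5*f1*f2.
A = np.column_stack([np.ones(F1.size), F1.ravel(), F2.ravel(),
                     F1.ravel() ** 2, F2.ravel() ** 2, (F1 * F2).ravel()])
coef, *_ = np.linalg.lstsq(A, R.ravel(), rcond=None)

# Use the fitted surface to pick the most promising setting on a finer grid.
g1, g2 = np.meshgrid(np.linspace(-1, 1, 101), np.linspace(0, 2, 101))
G = np.column_stack([np.ones(g1.size), g1.ravel(), g2.ravel(),
                     g1.ravel() ** 2, g2.ravel() ** 2, (g1 * g2).ravel()])
best = np.argmax(G @ coef)
print("suggested factors:", g1.ravel()[best], g2.ravel()[best])
```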


Resampling and K-Fold Cross-Validation

The need for multiple training/validation sets.
{T_i, V_i}_i: training/validation sets of fold i.

K-fold cross-validation: divide X into k parts X_1, ..., X_k:

V_1 = X_1, T_1 = X_2 ∪ X_3 ∪ ... ∪ X_k = X − X_1
V_2 = X_2, T_2 = X_1 ∪ X_3 ∪ ... ∪ X_k = X − X_2
...
V_k = X_k, T_k = X − X_k

Each pair of training sets T_i, T_j shares k − 2 of the k parts.
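A minimal sketch of generating the (T_i, V_i) pairs described above, assuming scikit-learn is available; the choice k = 10 and the toy data are arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(50, 2)   # toy dataset of N = 50 instances
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Each iteration yields one (T_i, V_i) pair: V_i is one of the k parts,
# T_i is the union of the remaining k - 1 parts.
for i, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    T_i, V_i = X[train_idx], X[val_idx]
    print(f"fold {i}: |T| = {len(T_i)}, |V| = {len(V_i)}")
```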


5x2 Cross-Validation

Perform 2-fold cross-validation 5 times, using 5 different divisions of the data into halves X_1^(i) and X_2^(i):

V_1 = X_1^(1), T_1 = X_2^(1)
V_2 = X_1^(2), T_2 = X_2^(2)
V_3 = X_1^(3), T_3 = X_2^(3)
V_4 = X_1^(4), T_4 = X_2^(4)
V_5 = X_1^(5), T_5 = X_2^(5)
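A sketch of the same scheme using scikit-learn's RepeatedKFold (an assumption; the slides do not name a library). Two-fold cross-validation is repeated 5 times with a fresh division into halves each time; this yields ten training/validation pairs in total, since within each division each half serves once as T and once as V.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(100).reshape(50, 2)   # toy dataset

# 2-fold CV repeated 5 times: 5 independent random divisions into halves.
rkf = RepeatedKFold(n_splits=2, n_repeats=5, random_state=0)
for i, (train_idx, val_idx) in enumerate(rkf.split(X), start=1):
    print(f"pair {i}: |T| = {len(train_idx)}, |V| = {len(val_idx)}")
```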


Measuring Classifier Performance

                Predicted: Yes          Predicted: No
True: Yes       TP (true positive)      FN (false negative)
True: No        FP (false positive)     TN (true negative)

Error rate = # of errors / # of instances = (FN + FP)/N
Recall = # of found positives / # of positives = TP/(TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP/(TP + FP)
Specificity = TN/(TN + FP)
False alarm rate = FP/(FP + TN) = 1 − Specificity
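A small sketch that computes these measures directly from the four confusion-matrix counts; the counts themselves are made-up values used only to exercise the formulas.

```python
# Confusion-matrix counts (made-up example values).
TP, FN, FP, TN = 40, 10, 5, 45
N = TP + FN + FP + TN

error_rate  = (FN + FP) / N      # of errors / # of instances
recall      = TP / (TP + FN)     # sensitivity, hit rate
precision   = TP / (TP + FP)
specificity = TN / (TN + FP)
false_alarm = FP / (FP + TN)     # = 1 - specificity

print(error_rate, recall, precision, specificity, false_alarm)
```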


Receiver Operating Characteristics
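The original slide's ROC figure is not reproduced here. As a sketch of what the curve plots, the code below sweeps a decision threshold over classifier scores and records (false alarm rate, hit rate) points, the quantities defined on the previous slide; the scores and labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic scores: positives tend to score higher than negatives.
y_true = np.array([1] * 50 + [0] * 50)
scores = np.concatenate([rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 50)])

# Sweep the threshold; each threshold gives one point on the ROC curve.
points = []
for thr in np.sort(scores)[::-1]:
    pred = (scores >= thr).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    tn = np.sum((pred == 0) & (y_true == 0))
    points.append((fp / (fp + tn), tp / (tp + fn)))  # (false alarm rate, hit rate)

print(points[:3], "...", points[-1])
```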


Comparing Classifiers

1. Introduction
2. Training
   Response Surface Design
   Cross-Validation & Resampling
3. Measuring Classifier Performance
4. Comparing Classifiers
   Comparing Two Classifiers
   Comparing Multiple Classifiers
   Comparing Over Multiple Datasets


McNemar’s Test

H_0: µ_0 = µ_1 (the two classifiers have equal error rates)
Single training/validation set.

e_00: number of examples misclassified by both
e_01: number of examples misclassified by 1 but not by 2
e_10: number of examples misclassified by 2 but not by 1
e_11: number of examples correctly classified by both

Under H_0 we expect e_01 = e_10, and

(|e_01 − e_10| − 1)² / (e_01 + e_10) ∼ χ²_1

Accept H_0 with confidence 1 − α if the statistic is less than χ²_{α,1}.
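A minimal sketch of the test as stated above, assuming SciPy for the χ² critical value; the counts e_01 and e_10 are placeholder values.

```python
from scipy.stats import chi2

# Counts from a single validation set (placeholder values).
e01 = 12   # misclassified by classifier 1 but not by classifier 2
e10 = 5    # misclassified by classifier 2 but not by classifier 1

# McNemar statistic with continuity correction; ~ chi-square with 1 d.o.f. under H0.
stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)

alpha = 0.05
critical = chi2.ppf(1 - alpha, df=1)
print("statistic =", stat, "accept H0" if stat < critical else "reject H0")
```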


K-fold Cross-Validated Paired t Test

Use K-fold cross-validation to get K training/validation folds.

p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
p_i = p_i^1 − p_i^2: paired difference on fold i

H_0: p_i has mean 0

m = (1/K) Σ_{i=1}^K p_i
s² = (1/(K − 1)) Σ_{i=1}^K (p_i − m)²

(√K · m) / s ∼ t_{K−1}

Accept H_0 if the statistic is in (−t_{α/2,K−1}, t_{α/2,K−1}).
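A sketch of the paired test on K = 10 folds, assuming SciPy; the per-fold error rates are placeholder values.

```python
import numpy as np
from scipy.stats import t

# Placeholder per-fold error rates of two classifiers on the same K folds.
p1 = np.array([0.12, 0.10, 0.15, 0.11, 0.13, 0.09, 0.14, 0.12, 0.10, 0.13])
p2 = np.array([0.14, 0.11, 0.16, 0.12, 0.15, 0.10, 0.15, 0.13, 0.12, 0.14])

p = p1 - p2                 # paired differences p_i
K = len(p)
m = p.mean()
s = p.std(ddof=1)           # sample standard deviation, K - 1 in the denominator

stat = np.sqrt(K) * m / s   # ~ t with K - 1 d.o.f. under H0
alpha = 0.05
crit = t.ppf(1 - alpha / 2, df=K - 1)
print("t =", stat, "accept H0" if abs(stat) < crit else "reject H0")
```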


Comparing L > 2 Classifiers

Analysis of variance (ANOVA)

H_0: µ_1 = µ_2 = ... = µ_L

Errors of L algorithms on K folds: X_ij ∼ N(µ_j, σ²), j = 1, ..., L, i = 1, ..., K


Comparing L > 2 Classifiers (cont.)

ANOVA constructs two estimates of σ².

If H_0 is true,

σ²_b ≈ K Σ_{j=1}^L (m_j − m)² / (L − 1)   and   SS_b / σ² ∼ χ²_{L−1}

where SS_b = K Σ_{j=1}^L (m_j − m)², m_j is the mean error of algorithm j over the K folds, and m is the overall mean.

Regardless of the truth of H_0,

σ²_w ≈ Σ_j Σ_i (X_ij − m_j)² / (L(K − 1))   and   SS_w / σ² ∼ χ²_{L(K−1)}

where SS_w = Σ_j Σ_i (X_ij − m_j)².

σ²_b / σ²_w ∼ F_{L−1, L(K−1)}

Accept H_0 if the statistic is less than F_{α, L−1, L(K−1)}.
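A sketch that builds the F statistic from SS_b and SS_w exactly as above, with SciPy supplying the critical value; the K × L matrix of fold errors is made up.

```python
import numpy as np
from scipy.stats import f

# X[i, j]: error of algorithm j on fold i (placeholder values), K folds x L algorithms.
X = np.array([[0.12, 0.14, 0.11],
              [0.10, 0.13, 0.12],
              [0.15, 0.16, 0.14],
              [0.11, 0.12, 0.10],
              [0.13, 0.15, 0.12]])
K, L = X.shape

m_j = X.mean(axis=0)   # per-algorithm mean error over the K folds
m = m_j.mean()         # overall mean

SSb = K * np.sum((m_j - m) ** 2)
SSw = np.sum((X - m_j) ** 2)

F_stat = (SSb / (L - 1)) / (SSw / (L * (K - 1)))
alpha = 0.05
crit = f.ppf(1 - alpha, dfn=L - 1, dfd=L * (K - 1))
print("F =", F_stat, "accept H0" if F_stat < crit else "reject H0")
```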


Comparing Over Multiple Datasets

Comparing two algorithms:
  Sign test: count how many times A beats B over N datasets, and check whether this could have happened by chance if A and B had the same error rate.

Comparing multiple algorithms:
  Kruskal-Wallis test: compute the average rank of each algorithm over the N datasets, and check whether these ranks could have occurred by chance if all algorithms had equal error.
  If the KW test rejects, do pairwise post-hoc tests to find which pairs have a significant rank difference.
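A sketch of both checks, assuming a recent SciPy (binomtest was added in SciPy 1.7); the win counts and per-dataset error rates are placeholder values.

```python
from scipy.stats import binomtest, kruskal

# Sign test: A beats B on 11 of N = 14 datasets (placeholder counts).
wins_A, N = 11, 14
p_sign = binomtest(wins_A, N, p=0.5).pvalue
print("sign test p-value:", p_sign)

# Kruskal-Wallis: error rates of three algorithms on the same datasets (placeholders).
err_A = [0.12, 0.10, 0.15, 0.11, 0.13]
err_B = [0.14, 0.12, 0.16, 0.13, 0.15]
err_C = [0.11, 0.09, 0.14, 0.10, 0.12]
stat, p_kw = kruskal(err_A, err_B, err_C)
print("Kruskal-Wallis p-value:", p_kw)
# If the KW test rejects, follow up with pairwise post-hoc rank tests.
```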
