
Designing ML Experiments. Steven J Zeil, Old Dominion Univ., Fall 2010. Presentation transcript.



  1. Designing ML Experiments. Steven J Zeil, Old Dominion Univ., Fall 2010.

  2. Introduction. Questions: Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%? Comparing the expected errors of two algorithms: is k-NN more accurate than MLP? These questions are addressed using separate training/validation/test sets.

  3. Outline: 1 Introduction; 2 Training (Response Surface Design; Cross-Validation & Resampling); 3 Measuring Classifier Performance; 4 Comparing Classifiers (Comparing Two Classifiers; Comparing Multiple Classifiers; Comparing Over Multiple Datasets).

  4. Algorithm Preference. Criteria (application-dependent): misclassification error or risk (loss functions); training time/space complexity; testing time/space complexity; interpretability; easy programmability.

  5. Factors and Response. [Figure omitted in transcript.]

  6. Algorithm Preference (cont.). Criteria (application-dependent): misclassification error or risk (loss functions); training time/space complexity; testing time/space complexity; interpretability; easy programmability. Select a response function that measures the desired criteria.

  7. Response Surface Design. For approximating and maximizing the response function in terms of the controllable factors.
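
A minimal sketch of one response-surface step, assuming we have already measured a response (e.g., validation accuracy) at a small grid of settings of two controllable factors; the factor names, values, and responses below are illustrative, not from the slides. A quadratic surface is fit by least squares and evaluated on a finer grid to pick a promising setting.

```python
# Illustrative response-surface fit over two hypothetical factors.
import numpy as np

f1 = np.array([0.01, 0.01, 0.01, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1])   # e.g., learning rate
f2 = np.array([16, 32, 64, 16, 32, 64, 16, 32, 64], dtype=float)     # e.g., hidden units
resp = np.array([0.80, 0.84, 0.85, 0.83, 0.90, 0.88, 0.79, 0.86, 0.87])  # measured response

# Design matrix for a quadratic model: 1, f1, f2, f1^2, f2^2, f1*f2
A = np.column_stack([np.ones_like(f1), f1, f2, f1**2, f2**2, f1 * f2])
coeffs, *_ = np.linalg.lstsq(A, resp, rcond=None)

# Evaluate the fitted surface on a finer grid and take the maximizing setting
g1, g2 = np.meshgrid(np.linspace(0.01, 0.1, 20), np.linspace(16, 64, 20))
G = np.column_stack([np.ones(g1.size), g1.ravel(), g2.ravel(),
                     g1.ravel()**2, g2.ravel()**2, (g1 * g2).ravel()])
best = np.argmax(G @ coeffs)
print("predicted best (f1, f2):", g1.ravel()[best], g2.ravel()[best])
```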

  8. Resampling and K-Fold Cross-Validation. The need for multiple training/validation sets: {T_i, V_i}_i denotes the training/validation sets of fold i. K-fold cross-validation: divide X into k disjoint parts X_1, ..., X_k, then
V_1 = X_1, T_1 = X_2 ∪ X_3 ∪ ... ∪ X_k = X − X_1
V_2 = X_2, T_2 = X_1 ∪ X_3 ∪ ... ∪ X_k = X − X_2
...
V_k = X_k, T_k = X − X_k
Each pair of training sets T_i shares k − 2 of the k parts.
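
A minimal sketch of k-fold splitting, assuming the data are referenced by indices 0..N-1; the function name and the use of NumPy are illustrative choices, not from the slides.

```python
# Fold i uses part X_i as the validation set V_i and the union of the remaining
# k-1 parts as the training set T_i, so any two training sets share k-2 parts.
import numpy as np

def k_fold_splits(n_samples, k, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)      # shuffle once, then partition
    parts = np.array_split(idx, k)        # X_1, ..., X_k
    for i in range(k):
        V_i = parts[i]
        T_i = np.concatenate([parts[j] for j in range(k) if j != i])
        yield T_i, V_i

# Usage: 10-fold splits over 1000 examples
for T_i, V_i in k_fold_splits(1000, k=10):
    pass  # train on T_i, validate on V_i
```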

  9. 5x2 Cross-Validation. Perform 2-fold cross-validation 5 times, using 5 different divisions of X into two halves X_1^(j) and X_2^(j), j = 1, ..., 5. In each division one half serves as the training set and the other as the validation set, then the roles are swapped, giving 10 training/validation pairs in total.
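
A minimal sketch of the 5x2 splitting scheme under the same index-based assumption as above; each of the 5 random halvings yields two training/validation pairs.

```python
# Each replication splits the indices into two halves; each half serves once as
# training set and once as validation set, for 10 (T, V) pairs in total.
import numpy as np

def five_by_two_splits(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(5):
        idx = rng.permutation(n_samples)
        half1, half2 = idx[: n_samples // 2], idx[n_samples // 2 :]
        yield half1, half2   # train on half1, validate on half2
        yield half2, half1   # then swap the roles
```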

  10. Measuring Classifier Performance. Confusion matrix for a two-class problem:

                  Predicted: Yes          Predicted: No
True: Yes         TP: true positive       FN: false negative
True: No          FP: false positive      TN: true negative

Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 − specificity
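
A minimal sketch computing the slide's performance measures from the four confusion-matrix counts; the example counts in the usage line are hypothetical.

```python
# Metrics from the confusion-matrix counts TP, FN, FP, TN.
def classifier_metrics(tp, fn, fp, tn):
    n = tp + fn + fp + tn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),           # sensitivity / hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn)  # = 1 - specificity
    }

# Usage with hypothetical counts
print(classifier_metrics(tp=40, fn=10, fp=5, tn=45))
```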

  11. Receiver Operating Characteristics (ROC). The ROC curve plots the hit rate (TP rate) against the false alarm rate (FP rate) as the decision threshold varies. [ROC figure omitted in transcript.]
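
A minimal sketch that traces ROC points, assuming binary labels in {0, 1} (both classes present) and real-valued scores where larger means more positive; sweeping the threshold over the observed scores yields (false alarm rate, hit rate) pairs.

```python
# ROC points by sweeping a decision threshold over the observed scores.
import numpy as np

def roc_points(y_true, scores):
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    points = []
    for thr in np.unique(scores)[::-1]:           # from high threshold to low
        pred = (scores >= thr).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        tn = np.sum((pred == 0) & (y_true == 0))
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (false alarm, hit rate)
    return points
```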

  12. Outline (repeated): 1 Introduction; 2 Training (Response Surface Design; Cross-Validation & Resampling); 3 Measuring Classifier Performance; 4 Comparing Classifiers (Comparing Two Classifiers; Comparing Multiple Classifiers; Comparing Over Multiple Datasets).

  13. McNemar's Test. H_0: µ_0 = µ_1, tested on a single training/validation set.
e_00: number of examples misclassified by both classifiers
e_01: number of examples misclassified by classifier 1 but not by classifier 2
e_10: number of examples misclassified by classifier 2 but not by classifier 1
e_11: number of examples correctly classified by both
Under H_0 we expect e_01 = e_10, and
(|e_01 − e_10| − 1)² / (e_01 + e_10) ~ χ²_1
Accept H_0 with confidence 1 − α if the statistic is less than χ²_{α,1}.
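
A minimal sketch of McNemar's test from the two disagreement counts, using SciPy for the χ² critical value; the example counts are hypothetical.

```python
# McNemar's test statistic and accept/reject decision at level alpha.
from scipy.stats import chi2

def mcnemar(e01, e10, alpha=0.05):
    stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
    critical = chi2.ppf(1 - alpha, df=1)   # chi^2_{alpha, 1}
    return stat, stat < critical           # True -> accept H0 with confidence 1 - alpha

# Usage with hypothetical counts
stat, accept = mcnemar(e01=12, e10=20)
```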

  14. K-Fold Cross-Validated Paired t Test. Use K-fold cross-validation to get K training/validation folds.
p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
p_i = p_i^1 − p_i^2: paired difference on fold i
H_0: p_i has mean 0
m = (1/K) Σ_{i=1}^{K} p_i,  s² = Σ_{i=1}^{K} (p_i − m)² / (K − 1)
√K · m / s ~ t_{K−1}
Accept H_0 if the statistic lies in (−t_{α/2, K−1}, t_{α/2, K−1}).
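
A minimal sketch of the cross-validated paired t test, assuming p1 and p2 are the per-fold error rates of the two classifiers; the fold errors in the usage lines are hypothetical.

```python
# Paired t statistic over K folds and accept/reject decision at level alpha.
import numpy as np
from scipy.stats import t

def paired_cv_t_test(p1, p2, alpha=0.05):
    d = np.asarray(p1) - np.asarray(p2)          # paired differences p_i
    K = len(d)
    m = d.mean()
    s = d.std(ddof=1)                            # sample std, K - 1 in denominator
    stat = np.sqrt(K) * m / s
    critical = t.ppf(1 - alpha / 2, df=K - 1)    # t_{alpha/2, K-1}
    return stat, abs(stat) < critical            # True -> accept H0

# Usage with hypothetical fold errors for K = 10 folds
stat, accept = paired_cv_t_test(
    [0.11, 0.09, 0.12, 0.10, 0.08, 0.13, 0.10, 0.09, 0.11, 0.12],
    [0.12, 0.10, 0.11, 0.12, 0.09, 0.14, 0.11, 0.10, 0.13, 0.12])
```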

  15. Comparing L > 2 Classifiers. Analysis of variance (ANOVA). H_0: µ_1 = µ_2 = ... = µ_L. Errors of L algorithms on K folds: X_ij ~ N(µ_j, σ²), i = 1, ..., K, j = 1, ..., L.

  16. Comparing L > 2 Classifiers (cont.). ANOVA constructs two estimates of σ². Let m_j be the mean error of algorithm j over the K folds and m the overall mean.
If H_0 is true: σ̂²_b ≈ K Σ_{j=1}^{L} (m_j − m)² / (L − 1), and SS_b / σ² ~ χ²_{L−1}, where SS_b = K Σ_{j=1}^{L} (m_j − m)².
Regardless of the truth of H_0: σ̂²_w ≈ Σ_j Σ_i (X_ij − m_j)² / (L(K − 1)), and SS_w / σ² ~ χ²_{L(K−1)}, where SS_w = Σ_j Σ_i (X_ij − m_j)².
Then σ̂²_b / σ̂²_w ~ F_{L−1, L(K−1)}.
Accept H_0 if the ratio is less than F_{α, L−1, L(K−1)}.
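
A minimal sketch of the ANOVA F test, assuming X is a K x L array with X[i, j] the error of algorithm j on fold i; SciPy supplies the F critical value.

```python
# ANOVA F statistic from a K x L matrix of per-fold errors.
import numpy as np
from scipy.stats import f

def anova_f_test(X, alpha=0.05):
    X = np.asarray(X, dtype=float)
    K, L = X.shape
    m_j = X.mean(axis=0)                 # per-algorithm means
    m = m_j.mean()                       # overall mean
    ss_b = K * np.sum((m_j - m) ** 2)    # between-group sum of squares
    ss_w = np.sum((X - m_j) ** 2)        # within-group sum of squares
    var_b = ss_b / (L - 1)               # between-group estimate of sigma^2
    var_w = ss_w / (L * (K - 1))         # within-group estimate of sigma^2
    stat = var_b / var_w
    critical = f.ppf(1 - alpha, L - 1, L * (K - 1))
    return stat, stat < critical         # True -> accept H0
```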

  17. Comparing Over Multiple Datasets. Comparing two algorithms: Sign test: count how many times A beats B over N datasets, and check whether this could have happened by chance if A and B had the same error rate. Comparing multiple algorithms: Kruskal-Wallis test: calculate the average rank of each algorithm over the N datasets, and check whether these ranks could have arisen by chance if all algorithms had equal error. If KW rejects, do pairwise post-hoc tests to find which pairs have a significant rank difference.
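
A minimal sketch of the sign test for comparing two algorithms over N datasets, assuming per-dataset error rates and a recent SciPy (scipy.stats.binomtest); ties are dropped, and the example errors are hypothetical. The Kruskal-Wallis test is available separately as scipy.stats.kruskal and is not sketched here.

```python
# Sign test: under H0 (equal error), wins of A over B follow Binomial(n, 0.5),
# counted over the non-tied datasets.
import numpy as np
from scipy.stats import binomtest

def sign_test(err_a, err_b):
    err_a, err_b = np.asarray(err_a), np.asarray(err_b)
    wins_a = int(np.sum(err_a < err_b))
    wins_b = int(np.sum(err_b < err_a))
    n = wins_a + wins_b                  # datasets without ties
    return binomtest(wins_a, n, p=0.5).pvalue

# Usage with hypothetical errors on N = 8 datasets
p = sign_test([0.12, 0.09, 0.15, 0.11, 0.20, 0.08, 0.14, 0.10],
              [0.14, 0.10, 0.13, 0.12, 0.22, 0.09, 0.15, 0.12])
```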
