
Designing ML Experiments. Steven J Zeil, Old Dominion Univ., Fall 2010. Presentation transcript.



  1. Designing ML Experiments. Steven J Zeil, Old Dominion Univ., Fall 2010.

  2. Introduction. Questions: Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%? Comparing the expected errors of two algorithms: is k-NN more accurate than MLP? These questions are addressed using separate training/validation/test sets.

  3. Outline: 1 Introduction; 2 Training (Response Surface Design; Cross-Validation & Resampling); 3 Measuring Classifier Performance; 4 Comparing Classifiers (Comparing Two Classifiers; Comparing Multiple Classifiers; Comparing Over Multiple Datasets).

  4. Algorithm Preference. Criteria (application-dependent): misclassification error or risk (loss functions); training time/space complexity; testing time/space complexity; interpretability; easy programmability.

  5. Factors and Response. [Figure omitted in transcript.]

  6. Algorithm Preference (cont.). Criteria (application-dependent): misclassification error or risk (loss functions); training time/space complexity; testing time/space complexity; interpretability; easy programmability. Select a response function that measures the desired criteria.

  7. Response Surface Design. For approximating and maximizing the response function in terms of the controllable factors.
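
A minimal sketch of one response-surface step, assuming we have already measured a response (e.g., validation accuracy) at a small grid of settings of two controllable factors; the factor names, values, and responses below are illustrative, not from the slides. A quadratic surface is fit by least squares and evaluated on a finer grid to pick a promising setting.

```python
# Illustrative response-surface fit over two hypothetical factors.
import numpy as np

f1 = np.array([0.01, 0.01, 0.01, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1])   # e.g., learning rate
f2 = np.array([16, 32, 64, 16, 32, 64, 16, 32, 64], dtype=float)     # e.g., hidden units
resp = np.array([0.80, 0.84, 0.85, 0.83, 0.90, 0.88, 0.79, 0.86, 0.87])  # measured response

# Design matrix for a quadratic model: 1, f1, f2, f1^2, f2^2, f1*f2
A = np.column_stack([np.ones_like(f1), f1, f2, f1**2, f2**2, f1 * f2])
coeffs, *_ = np.linalg.lstsq(A, resp, rcond=None)

# Evaluate the fitted surface on a finer grid and take the maximizing setting
g1, g2 = np.meshgrid(np.linspace(0.01, 0.1, 20), np.linspace(16, 64, 20))
G = np.column_stack([np.ones(g1.size), g1.ravel(), g2.ravel(),
                     g1.ravel()**2, g2.ravel()**2, (g1 * g2).ravel()])
best = np.argmax(G @ coeffs)
print("predicted best (f1, f2):", g1.ravel()[best], g2.ravel()[best])
```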

  8. Resampling and K-Fold Cross-Validation. The need for multiple training/validation sets: {T_i, V_i}_i denotes the training/validation sets of fold i. K-fold cross-validation: divide X into k disjoint parts X_1, ..., X_k, then
V_1 = X_1, T_1 = X_2 ∪ X_3 ∪ ... ∪ X_k = X − X_1
V_2 = X_2, T_2 = X_1 ∪ X_3 ∪ ... ∪ X_k = X − X_2
...
V_k = X_k, T_k = X − X_k
Each pair of training sets T_i shares k − 2 of the k parts.
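
A minimal sketch of k-fold splitting, assuming the data are referenced by indices 0..N-1; the function name and the use of NumPy are illustrative choices, not from the slides.

```python
# Fold i uses part X_i as the validation set V_i and the union of the remaining
# k-1 parts as the training set T_i, so any two training sets share k-2 parts.
import numpy as np

def k_fold_splits(n_samples, k, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)      # shuffle once, then partition
    parts = np.array_split(idx, k)        # X_1, ..., X_k
    for i in range(k):
        V_i = parts[i]
        T_i = np.concatenate([parts[j] for j in range(k) if j != i])
        yield T_i, V_i

# Usage: 10-fold splits over 1000 examples
for T_i, V_i in k_fold_splits(1000, k=10):
    pass  # train on T_i, validate on V_i
```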

  9. 5x2 Cross-Validation. Perform 2-fold cross-validation 5 times, using 5 different divisions of X into two halves X_1^(j) and X_2^(j), j = 1, ..., 5. In each division one half serves as the training set and the other as the validation set, then the roles are swapped, giving 10 training/validation pairs in total.
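
A minimal sketch of the 5x2 splitting scheme under the same index-based assumption as above; each of the 5 random halvings yields two training/validation pairs.

```python
# Each replication splits the indices into two halves; each half serves once as
# training set and once as validation set, for 10 (T, V) pairs in total.
import numpy as np

def five_by_two_splits(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(5):
        idx = rng.permutation(n_samples)
        half1, half2 = idx[: n_samples // 2], idx[n_samples // 2 :]
        yield half1, half2   # train on half1, validate on half2
        yield half2, half1   # then swap the roles
```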

  10. Measuring Classifier Performance. Confusion matrix for a two-class problem:

                  Predicted: Yes          Predicted: No
True: Yes         TP: true positive       FN: false negative
True: No          FP: false positive      TN: true negative

Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 − specificity
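
A minimal sketch computing the slide's performance measures from the four confusion-matrix counts; the example counts in the usage line are hypothetical.

```python
# Metrics from the confusion-matrix counts TP, FN, FP, TN.
def classifier_metrics(tp, fn, fp, tn):
    n = tp + fn + fp + tn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),           # sensitivity / hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn)  # = 1 - specificity
    }

# Usage with hypothetical counts
print(classifier_metrics(tp=40, fn=10, fp=5, tn=45))
```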

  11. Receiver Operating Characteristics (ROC). The ROC curve plots the hit rate (TP rate) against the false alarm rate (FP rate) as the decision threshold varies. [ROC figure omitted in transcript.]
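
A minimal sketch that traces ROC points, assuming binary labels in {0, 1} (both classes present) and real-valued scores where larger means more positive; sweeping the threshold over the observed scores yields (false alarm rate, hit rate) pairs.

```python
# ROC points by sweeping a decision threshold over the observed scores.
import numpy as np

def roc_points(y_true, scores):
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    points = []
    for thr in np.unique(scores)[::-1]:           # from high threshold to low
        pred = (scores >= thr).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        tn = np.sum((pred == 0) & (y_true == 0))
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (false alarm, hit rate)
    return points
```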

  12. Outline (repeated): 1 Introduction; 2 Training (Response Surface Design; Cross-Validation & Resampling); 3 Measuring Classifier Performance; 4 Comparing Classifiers (Comparing Two Classifiers; Comparing Multiple Classifiers; Comparing Over Multiple Datasets).

  13. McNemar's Test. H_0: µ_0 = µ_1, tested on a single training/validation set.
e_00: number of examples misclassified by both classifiers
e_01: number of examples misclassified by classifier 1 but not by classifier 2
e_10: number of examples misclassified by classifier 2 but not by classifier 1
e_11: number of examples correctly classified by both
Under H_0 we expect e_01 = e_10, and
(|e_01 − e_10| − 1)² / (e_01 + e_10) ~ χ²_1
Accept H_0 with confidence 1 − α if the statistic is less than χ²_{α,1}.
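
A minimal sketch of McNemar's test from the two disagreement counts, using SciPy for the χ² critical value; the example counts are hypothetical.

```python
# McNemar's test statistic and accept/reject decision at level alpha.
from scipy.stats import chi2

def mcnemar(e01, e10, alpha=0.05):
    stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
    critical = chi2.ppf(1 - alpha, df=1)   # chi^2_{alpha, 1}
    return stat, stat < critical           # True -> accept H0 with confidence 1 - alpha

# Usage with hypothetical counts
stat, accept = mcnemar(e01=12, e10=20)
```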

  14. K-Fold Cross-Validated Paired t Test. Use K-fold cross-validation to get K training/validation folds.
p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
p_i = p_i^1 − p_i^2: paired difference on fold i
H_0: p_i has mean 0
m = (1/K) Σ_{i=1}^{K} p_i,  s² = Σ_{i=1}^{K} (p_i − m)² / (K − 1)
√K · m / s ~ t_{K−1}
Accept H_0 if the statistic lies in (−t_{α/2, K−1}, t_{α/2, K−1}).
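
A minimal sketch of the cross-validated paired t test, assuming p1 and p2 are the per-fold error rates of the two classifiers; the fold errors in the usage lines are hypothetical.

```python
# Paired t statistic over K folds and accept/reject decision at level alpha.
import numpy as np
from scipy.stats import t

def paired_cv_t_test(p1, p2, alpha=0.05):
    d = np.asarray(p1) - np.asarray(p2)          # paired differences p_i
    K = len(d)
    m = d.mean()
    s = d.std(ddof=1)                            # sample std, K - 1 in denominator
    stat = np.sqrt(K) * m / s
    critical = t.ppf(1 - alpha / 2, df=K - 1)    # t_{alpha/2, K-1}
    return stat, abs(stat) < critical            # True -> accept H0

# Usage with hypothetical fold errors for K = 10 folds
stat, accept = paired_cv_t_test(
    [0.11, 0.09, 0.12, 0.10, 0.08, 0.13, 0.10, 0.09, 0.11, 0.12],
    [0.12, 0.10, 0.11, 0.12, 0.09, 0.14, 0.11, 0.10, 0.13, 0.12])
```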

  15. Comparing L > 2 Classifiers. Analysis of variance (ANOVA). H_0: µ_1 = µ_2 = ... = µ_L. Errors of L algorithms on K folds: X_ij ~ N(µ_j, σ²), i = 1, ..., K, j = 1, ..., L.

  16. Comparing L > 2 Classifiers (cont.). ANOVA constructs two estimates of σ². Let m_j be the mean error of algorithm j over the K folds and m the overall mean.
If H_0 is true: σ̂²_b ≈ K Σ_{j=1}^{L} (m_j − m)² / (L − 1), and SS_b / σ² ~ χ²_{L−1}, where SS_b = K Σ_{j=1}^{L} (m_j − m)².
Regardless of the truth of H_0: σ̂²_w ≈ Σ_j Σ_i (X_ij − m_j)² / (L(K − 1)), and SS_w / σ² ~ χ²_{L(K−1)}, where SS_w = Σ_j Σ_i (X_ij − m_j)².
Then σ̂²_b / σ̂²_w ~ F_{L−1, L(K−1)}.
Accept H_0 if the ratio is less than F_{α, L−1, L(K−1)}.
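
A minimal sketch of the ANOVA F test, assuming X is a K x L array with X[i, j] the error of algorithm j on fold i; SciPy supplies the F critical value.

```python
# ANOVA F statistic from a K x L matrix of per-fold errors.
import numpy as np
from scipy.stats import f

def anova_f_test(X, alpha=0.05):
    X = np.asarray(X, dtype=float)
    K, L = X.shape
    m_j = X.mean(axis=0)                 # per-algorithm means
    m = m_j.mean()                       # overall mean
    ss_b = K * np.sum((m_j - m) ** 2)    # between-group sum of squares
    ss_w = np.sum((X - m_j) ** 2)        # within-group sum of squares
    var_b = ss_b / (L - 1)               # between-group estimate of sigma^2
    var_w = ss_w / (L * (K - 1))         # within-group estimate of sigma^2
    stat = var_b / var_w
    critical = f.ppf(1 - alpha, L - 1, L * (K - 1))
    return stat, stat < critical         # True -> accept H0
```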

  17. Comparing Over Multiple Datasets. Comparing two algorithms: Sign test: count how many times A beats B over N datasets, and check whether this could have happened by chance if A and B had the same error rate. Comparing multiple algorithms: Kruskal-Wallis test: calculate the average rank of each algorithm over the N datasets, and check whether these ranks could have arisen by chance if all algorithms had equal error. If KW rejects, do pairwise post-hoc tests to find which pairs have a significant rank difference.
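
A minimal sketch of the sign test for comparing two algorithms over N datasets, assuming per-dataset error rates and a recent SciPy (scipy.stats.binomtest); ties are dropped, and the example errors are hypothetical. The Kruskal-Wallis test is available separately as scipy.stats.kruskal and is not sketched here.

```python
# Sign test: under H0 (equal error), wins of A over B follow Binomial(n, 0.5),
# counted over the non-tied datasets.
import numpy as np
from scipy.stats import binomtest

def sign_test(err_a, err_b):
    err_a, err_b = np.asarray(err_a), np.asarray(err_b)
    wins_a = int(np.sum(err_a < err_b))
    wins_b = int(np.sum(err_b < err_a))
    n = wins_a + wins_b                  # datasets without ties
    return binomtest(wins_a, n, p=0.5).pvalue

# Usage with hypothetical errors on N = 8 datasets
p = sign_test([0.12, 0.09, 0.15, 0.11, 0.20, 0.08, 0.14, 0.10],
              [0.14, 0.10, 0.13, 0.12, 0.22, 0.09, 0.15, 0.12])
```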
