

SLIDE 1

M1 − Apprentissage

Michèle Sebag − Benoit Barbot, LRI − LSV, 14 October 2013

1

SLIDE 2

Validation issues

  • 1. What is the result?
  • 2. My results look good. Are they?
  • 3. Does my system outperform yours?
  • 4. How to set up my system?

2

SLIDE 3

Validation: Three questions

Define a good indicator of quality

◮ Misclassification cost
◮ Area under the ROC curve

Computing an estimate thereof

◮ Validation set
◮ Cross-Validation
◮ Leave one out
◮ Bootstrap

Compare estimates: Tests and confidence levels

3

SLIDE 4

Overview

Performance indicators
Measuring a performance indicator
Scalable validation: Bags of little bootstrap

4

SLIDE 5

Which indicator, which estimate: depends.

Settings

◮ Abundant / scarce data

Data distribution

◮ Dependent / independent examples
◮ Balanced / imbalanced classes

5

SLIDE 6

Performance indicators

Binary class

◮ h∗: the truth
◮ ĥ: the learned hypothesis

Confusion matrix:

  ĥ \ h∗     1       0
  1          a       b       a+b
  0          c       d       c+d
             a+c     b+d     a+b+c+d

6

SLIDE 7

Performance indicators, 2

  ĥ \ h∗     1       0
  1          a       b       a+b
  0          c       d       c+d
             a+c     b+d     a+b+c+d

◮ Misclassification rate: (b+c) / (a+b+c+d)
◮ Sensitivity (recall), true positive rate (TPR): a / (a+c)
◮ False positive rate (FPR), i.e. 1 − specificity: b / (b+d)
◮ Precision: a / (a+b)

Note: always compare to random guessing / baseline alg.
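
For concreteness, a small Python sketch (mine, not from the slides, with made-up counts) computing these indicators from the four cells a, b, c, d of the confusion matrix above:

```python
def indicators(a, b, c, d):
    """a: predicted 1 / true 1, b: predicted 1 / true 0,
    c: predicted 0 / true 1, d: predicted 0 / true 0."""
    n = a + b + c + d
    return {
        "misclassification rate": (b + c) / n,
        "sensitivity (TPR)": a / (a + c),
        "false positive rate": b / (b + d),
        "precision": a / (a + b),
    }

print(indicators(a=40, b=10, c=5, d=45))   # illustrative counts only
```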

7

SLIDE 8

Performance indicators, 3

The Area under the ROC curve

◮ ROC: Receiver Operating Characteristics
◮ Origin: Signal Processing, Medicine

Principle
h : X → ℝ; h(x) measures the risk of patient x.
h induces an ordering of the examples:

+ + + − + − + + + + − − − + − − − + − − − − − − − − − − − −

8

SLIDE 9

Performance indicators, 3

The Area under the ROC curve

◮ ROC: Receiver Operating Characteristics
◮ Origin: Signal Processing, Medicine

Principle
h : X → ℝ; h(x) measures the risk of patient x.
h induces an ordering of the examples:

+ + + − + − + + + + − − − + − − − + − − − − − − − − − − − −

Given a threshold θ, h yields a classifier: Yes iff h(x) > θ.

+ + + − + − + + + + | − − − + − − − + − − − − − − − − − − − −

Here, TPR(θ) = .8; FPR(θ) = .1
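
A small sketch (mine) reproducing this computation from the ranked labels above, with the threshold placed after the tenth example:

```python
ranked = list("+++-+-++++" "---+---+------------")   # best-scored examples first
k = 10                                               # examples with h(x) > theta
P, N = ranked.count("+"), ranked.count("-")          # 10 positives, 20 negatives
tpr = ranked[:k].count("+") / P                      # 8/10 = 0.8
fpr = ranked[:k].count("-") / N                      # 2/20 = 0.1
print(tpr, fpr)
```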

8

SLIDE 10

ROC

9

SLIDE 11

The ROC curve

θ ↦ M(θ) = (FPR(θ), TPR(θ)) = (1 − TNR(θ), TPR(θ)) ∈ ℝ²
Ideal classifier: the point (0, 1), i.e. no false positives and all true positives.
Diagonal (TPR = FPR) ≡ nothing learned.

10

SLIDE 12

ROC Curve, Properties

Properties
ROC depicts the trade-off between false positives and false negatives.
Standard: misclassification cost (Domingos, KDD 99)
  Error = #false positives + c × #false negatives
In a multi-objective perspective, the ROC curve is a Pareto front.
Best solution: intersection of the Pareto front with the line ∆ of direction (−c, −1).
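
A hedged sketch of this cost-based choice (mine, not from the slides): scan candidate thresholds and keep the one minimising #false positives + c × #false negatives. The names, scores and value of c are illustrative; labels are 1 for positive, 0 for negative.

```python
def best_threshold(scores, labels, c=2.0):
    """Return the threshold theta minimising  #FP + c * #FN  for 'positive iff h(x) > theta'."""
    best_cost, best_theta = None, None
    for theta in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s <= theta and y == 1)
        cost = fp + c * fn
        if best_cost is None or cost < best_cost:
            best_cost, best_theta = cost, theta
    return best_theta
```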

11

SLIDE 13

ROC Curve, Properties, foll’d

Used to compare learners

Bradley 97

◮ multi-objective-like
◮ insensitive to imbalanced distributions
◮ shows sensitivity to error cost

12

SLIDE 14

Area Under the ROC Curve

Often used to select a learner. Don't ever do this!   (Hand, 09)

Sometimes used as learning criterion (the Mann-Whitney-Wilcoxon statistic):

AUC = Pr(h(x) > h(x′) | y > y′)

WHY   (Rosset, 04)

◮ More stable: O(n²) pairwise terms instead of O(n) example-wise terms
◮ With a probabilistic interpretation   (Clémençon et al., 08)

HOW

◮ SVM-Ranking   (Joachims, 05; Usunier et al., 08, 09)
◮ Stochastic optimization
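
A minimal sketch (mine) of the Mann-Whitney-Wilcoxon estimate of the AUC: the fraction of (positive, negative) pairs that h ranks correctly, which is the O(n²) pairwise view mentioned above; ties count as 1/2.

```python
def auc_mww(pos_scores, neg_scores):
    """AUC = Pr(h(x) > h(x')) for x positive and x' negative, estimated over all pairs."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)        # O(n^2) comparisons
    return wins / (len(pos_scores) * len(neg_scores))

print(auc_mww([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))   # toy scores -> 8/9, about 0.889
```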

13

SLIDE 15

Overview

Performance indicators
Measuring a performance indicator
Scalable validation: Bags of little bootstrap

14

SLIDE 16

Validation, principle

Desired: performance on further instances

[Diagram: in the WORLD, the quality of h is its performance on further examples; in the DATASET, h is learned from a Training set and its quality is measured on Test examples]

Assumption: the Dataset is to the World what the Training set is to the Dataset.

15

SLIDE 17

Validation, 2

[Diagram: within the DATASET, the Training set yields h, the Test examples yield perf(h), and the Learning parameters are tuned from perf(h)]

Unbiased Assessment of Learning Algorithms

  • T. Scheffer and R. Herbrich, 97

16

SLIDE 18

Validation, 2

[Diagram: as above; the tuning loop ends by selecting parameter∗, h∗, with test performance perf(h∗)]

Unbiased Assessment of Learning Algorithms

  • T. Scheffer and R. Herbrich, 97

16

SLIDE 19

Validation, 2

[Diagram: as above, plus a held-out Validation set measuring the true performance of h∗]

Unbiased Assessment of Learning Algorithms

  • T. Scheffer and R. Herbrich, 97
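
A sketch of the protocol built up on the last three slides (mine; the names three_way_split, learn, perf and grid are purely illustrative): parameters are tuned on the test split only, and the selected hypothesis is assessed once on an untouched validation split.

```python
import random

def three_way_split(data, frac_train=0.6, frac_test=0.2, seed=0):
    """Shuffle once, then cut into training / test / validation parts."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(frac_train * len(data))
    n_test = int(frac_test * len(data))
    return (data[:n_train],
            data[n_train:n_train + n_test],
            data[n_train + n_test:])

# Hypothetical usage, with learn/perf/grid standing for any learner, indicator and parameter grid:
# train, test, validation = three_way_split(dataset)
# param_star, h_star = max(((p, learn(train, p)) for p in grid),
#                          key=lambda ph: perf(ph[1], test))
# true_performance = perf(h_star, validation)   # reported once, on untouched data
```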

16

SLIDE 20

Confidence intervals

Definition
Given a random variable X on ℝ, a p%-confidence interval is I ⊂ ℝ such that Pr(X ∈ I) > p.

Binary variable with probability ε
Probability of r events out of n trials:

  P_n(r) = [n! / (r!(n−r)!)] ε^r (1−ε)^(n−r)

◮ Mean: nε
◮ Variance: σ² = nε(1−ε)

Gaussian approximation:

  P(x) = (1/√(2πσ²)) exp(−(x−µ)² / (2σ²))

17

SLIDE 21

Confidence intervals

Bounds on (true value, empirical value) for n trials, n > 30:

  Pr( |x̂_n − x∗| > 1.96 √( x̂_n (1 − x̂_n) / n ) ) < .05

z / ε table:

  z      .67   1.    1.28   1.64   1.96   2.33   2.58
  ε (%)   50    32     20     10      5      2      1
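
A small sketch of this bound (mine, illustrative numbers): the 95% interval around an empirical rate x̂_n has half-width 1.96 · √(x̂_n(1−x̂_n)/n); other z values from the table give other confidence levels.

```python
from math import sqrt

def confidence_interval(x_hat, n, z=1.96):
    """z = 1.96 corresponds to epsilon = 5% in the z / epsilon table above."""
    half_width = z * sqrt(x_hat * (1 - x_hat) / n)
    return x_hat - half_width, x_hat + half_width

print(confidence_interval(0.15, n=400))   # error rate 15% on 400 examples -> roughly (0.115, 0.185)
```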

18

SLIDE 22

Empirical estimates

When data abound (e.g. MNIST): split the data into Training / Test / Validation sets.

N-fold cross-validation: the data are split into N folds; at run i, h is learned from all folds except fold i, and its error is measured on fold i.

  Error = average of the N per-fold errors
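
A minimal N-fold cross-validation sketch (plain Python, mine; learn and error stand for any learner and any error indicator, both assumed):

```python
def cross_validation_error(data, learn, error, n_folds=10):
    """Average test error over N folds; each fold serves once as the test set."""
    folds = [data[i::n_folds] for i in range(n_folds)]            # simple interleaved folds
    errors = []
    for i, test_fold in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        h = learn(train)                                          # h learned without fold i
        errors.append(error(h, test_fold))                        # error measured on fold i
    return sum(errors) / n_folds
```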

19

SLIDE 23

Empirical estimates, foll’d

Cross-validation → Leave-one-out

Leave-one-out: same as N-fold CV, with N = number of examples.

Properties
Low bias; high variance; underestimates the error if the examples are not independent.
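
With the cross_validation_error sketch from the previous slide (assumed names), leave-one-out is simply the extreme setting where every fold holds a single example:

```python
loo_error = cross_validation_error(data, learn, error, n_folds=len(data))   # N = number of examples
```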

20

SLIDE 24

Empirical estimates, foll’d

Bootstrap

Training set: built by uniform sampling with replacement from the Dataset.
Test set: the rest of the examples (those not sampled).

Average the indicator over all (Training set, Test set) samplings.
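
A sketch of the bootstrap estimate under the same assumed learn/error names (mine): each round samples the training set uniformly with replacement and tests on the left-out examples.

```python
import random

def bootstrap_error(data, learn, error, n_rounds=200, seed=0):
    """Average the error indicator over (training set, test set) resamplings."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rounds):
        train = [rng.choice(data) for _ in data]       # uniform sampling with replacement
        test = [x for x in data if x not in train]     # rest of the examples
        if test:                                       # skip the unlikely empty-test round
            estimates.append(error(learn(train), test))
    return sum(estimates) / len(estimates)
```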

21

SLIDE 25

Beware

Multiple hypothesis testing

◮ If you test many hypotheses on the same dataset,
◮ one of them will appear confidently true...
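
A quick numeric illustration of this warning (mine, not from the slides): if m independent hypotheses are each tested at the 5% level, the probability that at least one of them looks significant by pure chance is 1 − 0.95^m.

```python
for m in (1, 10, 50, 100):
    print(m, round(1 - 0.95 ** m, 3))   # 0.05, 0.401, 0.923, 0.994
```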

More

◮ Tutorial slides:

http://www.lri.fr/~sebag/Slides/Validation Tutorial 11.pdf

◮ Video and slides (soon): ICML 2012 Videolectures tutorial, Japkowicz & Shah:

http://www.mohakshah.com/tutorials/icml2012/

22

SLIDE 26

Validation, summary

What is the performance criterion?

◮ Cost function
◮ Account for class imbalance
◮ Account for data correlations

Assessing a result

◮ Compute confidence intervals
◮ Consider baselines
◮ Use a validation set

If the result looks too good, don’t believe it

23

SLIDE 27

Overview

Performance indicators
Measuring a performance indicator
Scalable validation: Bags of little bootstrap

24