

  1. Introduction to Machine Learning Evaluation: Simple Measures for Classification

Learning goals:
- Know the definitions of misclassification error rate (MCE) and accuracy (ACC)
- Understand the entries of a confusion matrix
- Understand the idea of costs
- Know the definitions of Brier score and log loss

  2. LABELS VS PROBABILITIES

In classification we predict:
1. Class labels: \hat{h}(x) = \hat{y}
2. Class probabilities: \hat{\pi}_k(x)

→ We evaluate based on those.

© Introduction to Machine Learning – 1 / 9

  3. LABELS: MCE

The misclassification error rate (MCE) counts the number of incorrect predictions and presents them as a rate:

MCE = \frac{1}{n} \sum_{i=1}^{n} [y^{(i)} \neq \hat{y}^{(i)}] \in [0, 1]

Accuracy is defined in a similar fashion for correct classifications:

ACC = \frac{1}{n} \sum_{i=1}^{n} [y^{(i)} = \hat{y}^{(i)}] \in [0, 1]

- If the data set is small, these estimates can be brittle.
- The MCE says nothing about how good or skewed the predicted probabilities are.
- Errors on all classes are weighed equally (often inappropriate).
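The two definitions above can be sketched directly in NumPy; the helper names `mce` and `acc` and the example labels are illustrative, not from the slides:

```python
import numpy as np

def mce(y_true, y_pred):
    """Misclassification error rate: fraction of wrong label predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true != y_pred)

def acc(y_true, y_pred):
    """Accuracy: fraction of correct label predictions, i.e. 1 - MCE."""
    return 1.0 - mce(y_true, y_pred)

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]   # one of five predictions is wrong
print(mce(y_true, y_pred))  # 0.2
print(acc(y_true, y_pred))  # 0.8
```

Note that MCE and ACC always sum to 1, since every prediction is either right or wrong.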

  4. LABELS: CONFUSION MATRIX

True classes in columns, predicted classes in rows:

             setosa  versicolor  virginica  -err.-  -n-
  setosa         50           0          0       0   50
  versicolor      0          46          4       4   50
  virginica       0           4         46       4   50
  -err.-          0           4          4       8   NA
  -n-            50          50         50      NA  150

We can see class sizes (predicted and true) and where errors occur.

  5. LABELS: CONFUSION MATRIX

In binary classification:

                          True class y
                     +                    −
  Pred.   +   True Positive (TP)   False Positive (FP)
  \hat{y} −   False Negative (FN)  True Negative (TN)
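A minimal sketch of how such a matrix is tallied, following the slides' convention of predicted classes in rows and true classes in columns (the function name and example data are illustrative):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Rows = predicted class, columns = true class (slide convention)."""
    idx = {lab: j for j, lab in enumerate(labels)}
    M = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[idx[p], idx[t]] += 1
    return M

# Binary example: label 1 plays the role of "+", label 0 the role of "-".
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
M = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fp, fn, tn = M[0, 0], M[0, 1], M[1, 0], M[1, 1]
print(M)                     # [[2 1]
                             #  [1 2]]
print(tp, fp, fn, tn)        # 2 1 1 2
```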

  6. LABELS: COSTS

We can also assign different costs to different errors via a cost matrix C:

Costs = \frac{1}{n} \sum_{i=1}^{n} C[y^{(i)}, \hat{y}^{(i)}]

Example: Predict whether a person has a ticket (yes/no). Should the train conductor check a person's ticket?

Costs:
- Ticket checking: 3 EUR
- Fee for fare-dodging: 40 EUR

(Image: http://www.oslobilder.no/OMU/OB.%C3%9864/2902)

  7. LABELS: COSTS

Predict whether a person has a ticket (yes/no).

Costs:
- Ticket checking: 3 EUR
- Fee for fare-dodging: 40 EUR

Cost matrix C (rows: true class, columns: predicted class):

           predicted
  true     no    yes
  no      -37      0
  yes       3      0

Our model says that we should not trust anyone and check the tickets of all passengers. Confusion matrix:

           predicted
  true     no    yes
  no        7      0
  yes      93      0

Element-wise product, confusion matrix * C:

           predicted
  true     no    yes
  no     -259      0
  yes     279      0

Costs = \frac{1}{n} \sum_{i=1}^{n} C[y^{(i)}, \hat{y}^{(i)}]
      = \frac{1}{100} (-37 \cdot 7 + 0 \cdot 0 + 3 \cdot 93 + 0 \cdot 0)
      = \frac{20}{100} = 0.2
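The worked example can be reproduced with a few lines of NumPy; the matrix entries are taken from the slide, while the variable names are illustrative:

```python
import numpy as np

# Cost matrix from the slide: rows = true class, columns = predicted class,
# classes ordered (no, yes). A check costs 3 EUR; catching a fare-dodger
# nets the 40 EUR fee minus the 3 EUR check, i.e. a "cost" of -37 EUR.
C = np.array([[-37, 0],    # true = no ticket
              [  3, 0]])   # true = has ticket

# Confusion matrix of the "check everyone" model, same ordering.
conf = np.array([[ 7, 0],
                 [93, 0]])

n = conf.sum()
costs = (C * conf).sum() / n   # element-wise product, then average
print(costs)  # 0.2
```

The element-wise product followed by summing is exactly the slide's \sum_i C[y^{(i)}, \hat{y}^{(i)}], grouped by cell of the confusion matrix.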

  8. PROBABILITIES: BRIER SCORE

Measures squared distances of probabilities from the true class labels:

BS_1 = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\pi}(x^{(i)}) - y^{(i)} \right)^2

- Fancy name for MSE on probabilities.
- Usual definition for the binary case; y^{(i)} must be coded as 0 and 1.

[Figure: BS as a function of \hat{\pi}(x) for true labels 0 and 1 — the score is near 0 when the prediction is "right" and rises quadratically toward 1 when "wrong".]
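Since BS_1 is just the MSE on predicted probabilities, a sketch is short; the function name and example values are illustrative:

```python
import numpy as np

def brier_score(y_true, p_hat):
    """BS_1: mean squared distance between predicted P(y=1|x) and 0/1 labels."""
    y_true = np.asarray(y_true, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    return np.mean((p_hat - y_true) ** 2)

# Three observations: squared errors 0.1^2, 0.2^2, 0.4^2, averaged.
print(brier_score([1, 0, 1], [0.9, 0.2, 0.6]))  # ≈ 0.07
```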

  9. PROBABILITIES: BRIER SCORE

BS_2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{g} \left( \hat{\pi}_k(x^{(i)}) - o_k^{(i)} \right)^2

- Original definition by Brier; also works for multiple classes.
- o_k^{(i)} = [y^{(i)} = k] is a 0-1 one-hot coding for labels.
- For the binary case, BS_2 is twice as large as BS_1, because in BS_2 we sum the squared difference for each observation over both class 0 and class 1, not only the true class.
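A sketch of the multiclass version, assuming integer labels 0..g-1 and a matrix of predicted probabilities with one column per class (names and example data are illustrative). For binary data it reproduces the factor-of-two relationship to BS_1 stated above:

```python
import numpy as np

def brier_score_multiclass(y_true, P_hat, n_classes):
    """BS_2: squared differences summed over all g classes, one-hot targets."""
    P_hat = np.asarray(P_hat, dtype=float)
    O = np.eye(n_classes)[np.asarray(y_true)]        # one-hot coding o_k
    return np.mean(((P_hat - O) ** 2).sum(axis=1))   # sum over classes, mean over i

# Binary case: columns are P(class 0), P(class 1); rows must sum to 1.
y = [1, 0, 1]
P = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.4, 0.6]])
print(brier_score_multiclass(y, P, 2))  # ≈ 0.14, exactly twice BS_1 ≈ 0.07
```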

  10. PROBABILITIES: LOG-LOSS

Logistic regression loss function, a.k.a. Bernoulli or binomial loss, with y^{(i)} coded as 0 and 1:

LL = \frac{1}{n} \sum_{i=1}^{n} \left( -y^{(i)} \log(\hat{\pi}(x^{(i)})) - (1 - y^{(i)}) \log(1 - \hat{\pi}(x^{(i)})) \right)

[Figure: LL as a function of \hat{\pi}(x) for true labels 0 and 1 — near zero when "right", growing without bound when "wrong".]

- The optimal value is 0; being "confidently wrong" is penalized heavily.
- Multiclass version: LL = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{g} o_k^{(i)} \log(\hat{\pi}_k(x^{(i)}))
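A sketch of the binary formula; the clipping constant `eps` is a practical addition not on the slide, guarding against log(0) when a probability is exactly 0 or 1:

```python
import numpy as np

def log_loss(y_true, p_hat, eps=1e-15):
    """Binary log loss (Bernoulli loss), y coded as 0/1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1 - eps)  # avoid log(0)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))   # moderate loss
print(log_loss([1, 0, 1], [0.01, 0.2, 0.6]))  # much larger: "confidently wrong"
```

The second call shows the heavy penalty: a single near-zero probability on a positive example dominates the average, whereas the Brier score for the same prediction would be bounded by 1.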
