Week 2, Video 3: Diagnostic Metrics
Different Methods, Different Measures
- Today we'll continue our focus on classifiers
- Later this week we'll discuss regressors
- And other methods will get worked in later in the course
Last class
- We discussed accuracy and Kappa
- Today, we'll discuss additional metrics for assessing classifier goodness
ROC
- Receiver Operating Characteristic curve
ROC
- You are predicting something which has two values
  - Correct / Incorrect
  - Gaming the System / Not Gaming the System
  - Dropout / Not Dropout
ROC
- Your prediction model outputs a probability or other real value
- How good is your prediction model?
Example
PREDICTION   TRUTH
0.1          0
0.7          1
0.44         0
0.4          0
0.8          1
0.55         0
0.2          0
0.1          0
0.09         0
0.19         0
0.51         1
0.14         0
0.95         1
0.3          0
ROC
- Take any number and use it as a cut-off
- Some number of predictions (maybe 0) will then be classified as 1's
- The rest (maybe 0) will be classified as 0's
Threshold = 0.5
PREDICTION   TRUTH   CLASSIFIED AS
0.1          0       0
0.7          1       1
0.44         0       0
0.4          0       0
0.8          1       1
0.55         0       1
0.2          0       0
0.1          0       0
0.09         0       0
0.19         0       0
0.51         1       1
0.14         0       0
0.95         1       1
0.3          0       0
Threshold = 0.6
PREDICTION   TRUTH   CLASSIFIED AS
0.1          0       0
0.7          1       1
0.44         0       0
0.4          0       0
0.8          1       1
0.55         0       0
0.2          0       0
0.1          0       0
0.09         0       0
0.19         0       0
0.51         1       0
0.14         0       0
0.95         1       1
0.3          0       0
Four possibilities
- True positive
- False positive
- True negative
- False negative
Threshold = 0.6
PREDICTION   TRUTH   OUTCOME
0.1          0       TRUE NEGATIVE
0.7          1       TRUE POSITIVE
0.44         0       TRUE NEGATIVE
0.4          0       TRUE NEGATIVE
0.8          1       TRUE POSITIVE
0.55         0       TRUE NEGATIVE
0.2          0       TRUE NEGATIVE
0.1          0       TRUE NEGATIVE
0.09         0       TRUE NEGATIVE
0.19         0       TRUE NEGATIVE
0.51         1       FALSE NEGATIVE
0.14         0       TRUE NEGATIVE
0.95         1       TRUE POSITIVE
0.3          0       TRUE NEGATIVE
Threshold = 0.5
PREDICTION   TRUTH   OUTCOME
0.1          0       TRUE NEGATIVE
0.7          1       TRUE POSITIVE
0.44         0       TRUE NEGATIVE
0.4          0       TRUE NEGATIVE
0.8          1       TRUE POSITIVE
0.55         0       FALSE POSITIVE
0.2          0       TRUE NEGATIVE
0.1          0       TRUE NEGATIVE
0.09         0       TRUE NEGATIVE
0.19         0       TRUE NEGATIVE
0.51         1       TRUE POSITIVE
0.14         0       TRUE NEGATIVE
0.95         1       TRUE POSITIVE
0.3          0       TRUE NEGATIVE
Threshold = 0.99
PREDICTION   TRUTH   OUTCOME
0.1          0       TRUE NEGATIVE
0.7          1       FALSE NEGATIVE
0.44         0       TRUE NEGATIVE
0.4          0       TRUE NEGATIVE
0.8          1       FALSE NEGATIVE
0.55         0       TRUE NEGATIVE
0.2          0       TRUE NEGATIVE
0.1          0       TRUE NEGATIVE
0.09         0       TRUE NEGATIVE
0.19         0       TRUE NEGATIVE
0.51         1       FALSE NEGATIVE
0.14         0       TRUE NEGATIVE
0.95         1       FALSE NEGATIVE
0.3          0       TRUE NEGATIVE
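A minimal sketch in Python of how these outcome labels come about; the lists are the example data above (with blank truth cells treated as 0), and the choice that a prediction at or above the cut-off counts as a 1 is a convention, not something stated on the slides.

```python
# Label each example as TP/FP/TN/FN at a given cut-off.
predictions = [0.1, 0.7, 0.44, 0.4, 0.8, 0.55, 0.2, 0.1, 0.09, 0.19, 0.51, 0.14, 0.95, 0.3]
truth       = [0,   1,   0,    0,   1,   0,    0,   0,   0,    0,    1,    0,    1,    0]

def label_outcomes(predictions, truth, threshold):
    outcomes = []
    for p, t in zip(predictions, truth):
        predicted = 1 if p >= threshold else 0   # at or above the cut-off counts as a 1
        if predicted == 1 and t == 1:
            outcomes.append("TRUE POSITIVE")
        elif predicted == 1 and t == 0:
            outcomes.append("FALSE POSITIVE")
        elif predicted == 0 and t == 0:
            outcomes.append("TRUE NEGATIVE")
        else:
            outcomes.append("FALSE NEGATIVE")
    return outcomes

print(label_outcomes(predictions, truth, threshold=0.6))
```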
ROC curve
- X axis = Percent false positives (versus true negatives)
  - False positives to the right
- Y axis = Percent true positives (versus false negatives)
  - True positives going up
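As a sketch, scikit-learn's roc_curve returns exactly these two axes (false positive rate and true positive rate) at every distinct threshold, shown here on the example data from earlier:

```python
# Compute the points on the ROC curve for the example data.
from sklearn.metrics import roc_curve

truth       = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
predictions = [0.1, 0.7, 0.44, 0.4, 0.8, 0.55, 0.2, 0.1, 0.09, 0.19, 0.51, 0.14, 0.95, 0.3]

fpr, tpr, thresholds = roc_curve(truth, predictions)
for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.2f}  false positive rate={f:.2f}  true positive rate={t:.2f}")
```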
Example
Is this a good model or a bad model? (ROC curve figures, not reproduced here)
- Chance model
- Good model (but note the stair steps)
- Poor model
- So bad it's good
AUC ROC
- Also called AUC, or A'
- The area under the ROC curve
AUC
- Is mathematically equivalent to the Wilcoxon statistic (Hanley & McNeil, 1982)
  - The probability that, if the model is given one example from each category, it will accurately identify which is which
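A small sketch of that interpretation: enumerate every (positive, negative) pair and count how often the positive example gets the higher predicted value, with ties counting as half. That pairwise fraction is the AUC.

```python
# AUC via the Wilcoxon interpretation: the fraction of (positive, negative)
# pairs where the positive example receives the higher predicted value.
def auc_by_pairs(predictions, truth):
    positives = [p for p, t in zip(predictions, truth) if t == 1]
    negatives = [p for p, t in zip(predictions, truth) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5          # ties count as half
    return wins / (len(positives) * len(negatives))

truth       = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
predictions = [0.1, 0.7, 0.44, 0.4, 0.8, 0.55, 0.2, 0.1, 0.09, 0.19, 0.51, 0.14, 0.95, 0.3]
print(auc_by_pairs(predictions, truth))   # 0.975 for the example data
```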
AUC
- Equivalence to Wilcoxon is useful
- It means that you can compute statistical tests for
  - Whether two AUC values are significantly different
    - Same data set or different data sets!
  - Whether an AUC value is significantly different than chance
Notes
- Not really a good way to compute AUC for 3 or more categories
  - There are methods, but the semantics change somewhat
Comparing Two Models (ANY two models)
$$Z = \frac{AUC_1 - AUC_2}{\sqrt{SE(AUC_1)^2 + SE(AUC_2)^2}}$$
Comparing Model to Chance
$$Z = \frac{AUC_1 - 0.5}{SE(AUC_1)}$$
Equations
$$f_1 = (n_1 - 1)\left(\frac{AUC}{2 - AUC} - AUC^2\right)$$
$$f_2 = (n_2 - 1)\left(\frac{2\,AUC^2}{1 + AUC} - AUC^2\right)$$
$$SE(AUC) = \sqrt{\frac{AUC(1 - AUC) + f_1 + f_2}{n_1 \, n_2}}$$

where $n_1$ is the number of positive examples and $n_2$ is the number of negative examples (Hanley & McNeil, 1982).
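A minimal sketch of these formulas in Python, assuming n_pos and n_neg are the counts of positive and negative examples; the function names are illustrative, not from any particular package.

```python
import math

def se_auc(auc, n_pos, n_neg):
    # Hanley & McNeil (1982) standard error of an AUC value.
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    f1 = (n_pos - 1) * (q1 - auc ** 2)
    f2 = (n_neg - 1) * (q2 - auc ** 2)
    return math.sqrt((auc * (1 - auc) + f1 + f2) / (n_pos * n_neg))

def z_two_models(auc1, auc2, se1, se2):
    # Comparing two models (any two models).
    return (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)

def z_vs_chance(auc, se):
    # Comparing a model to chance (AUC = 0.5).
    return (auc - 0.5) / se
```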
Complication
- This test assumes independence
- If you have data for multiple students, you usually should compute AUC and significance for each student and then integrate across students (Baker et al., 2008); see the sketch below
  - There are reasons why you might not want to compute AUC within-student, for example if there is no intra-student variance (see discussion in Pelanek, 2017)
  - If you don't do this, don't do a statistical test
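A sketch of the per-student approach, under the assumption that a simple average is an acceptable way to integrate across students (other integration schemes are possible); students with no variance in the truth labels are skipped, since AUC is undefined for them.

```python
from collections import defaultdict
from sklearn.metrics import roc_auc_score

def mean_per_student_auc(student_ids, predictions, truth):
    # Group the rows by student.
    by_student = defaultdict(list)
    for s, p, t in zip(student_ids, predictions, truth):
        by_student[s].append((p, t))

    aucs = []
    for rows in by_student.values():
        labels = [t for _, t in rows]
        if len(set(labels)) < 2:
            continue                 # no intra-student variance: AUC undefined
        scores = [p for p, _ in rows]
        aucs.append(roc_auc_score(labels, scores))
    return sum(aucs) / len(aucs)     # integrate with a simple average
```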
More Caution
- The implementations of AUC remain buggy in many data mining and statistical packages in 2018
- But it works in scikit-learn
- And there is a correct package for R called auctestr
- If you use other tools, see my webpage for a command-line and GUI implementation of AUC
http://www.upenn.edu/learninganalytics/ryanbaker/edmtools.html
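For example, the scikit-learn implementation applied to the example data from earlier in this video:

```python
from sklearn.metrics import roc_auc_score

truth       = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
predictions = [0.1, 0.7, 0.44, 0.4, 0.8, 0.55, 0.2, 0.1, 0.09, 0.19, 0.51, 0.14, 0.95, 0.3]
print(roc_auc_score(truth, predictions))   # 0.975, matching the pairwise computation
```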
AUC and Kappa
- AUC
  - more difficult to compute
  - only works for two categories (without complicated extensions)
  - meaning is invariant across data sets (AUC = 0.6 is always better than AUC = 0.55)
  - very easy to interpret statistically
AUC
- AUC values are almost always higher than Kappa values
- AUC takes confidence into account
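A sketch contrasting the two metrics on the same example data: Kappa needs hard 0/1 labels (thresholded here at 0.5, an arbitrary choice), while AUC uses the confidences directly, which is part of why the two numbers usually differ.

```python
from sklearn.metrics import cohen_kappa_score, roc_auc_score

truth       = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
predictions = [0.1, 0.7, 0.44, 0.4, 0.8, 0.55, 0.2, 0.1, 0.09, 0.19, 0.51, 0.14, 0.95, 0.3]

# Kappa requires discrete labels, so threshold the confidences first.
hard_labels = [1 if p >= 0.5 else 0 for p in predictions]
print("Kappa:", cohen_kappa_score(truth, hard_labels))
print("AUC:  ", roc_auc_score(truth, predictions))   # AUC comes out higher here
```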
Precision and Recall
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
What do these mean?
- Precision = The probability that a data point classified as true is actually true
- Recall = The probability that a data point that is actually true is classified as true
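A sketch computing both directly from the counts at a chosen cut-off (0.5 here, an arbitrary choice for illustration):

```python
def precision_recall(predictions, truth, threshold=0.5):
    # Count true positives, false positives, and false negatives at the cut-off.
    tp = sum(1 for p, t in zip(predictions, truth) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(predictions, truth) if p >= threshold and t == 0)
    fn = sum(1 for p, t in zip(predictions, truth) if p < threshold and t == 1)
    precision = tp / (tp + fp)   # of the points classified as true, how many are true
    recall    = tp / (tp + fn)   # of the points that are true, how many were caught
    return precision, recall
```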
Terminology
- FP = False Positive = Type 1 error
- FN = False Negative = Type 2 error
Still active debate about these metrics
- Jeni et al. (2013) find evidence that AUC is more robust to skewed distributions than Kappa and several other metrics
- Dhanani et al. (2014) find evidence that models selected with RMSE (which we'll talk about next time) come closer to true parameter values than models selected with AUC
- Pelanek (2017) argues that AUC only pays attention to relative differences between models, and that absolute differences matter too
Next lecture
¨ Metrics for regressors