

SLIDE 1

Diagnostic Metrics, Part 1

Week 2 Video 2

SLIDE 2

Different Methods, Different Measures

• Today we’ll focus on metrics for classifiers
• Later this week we’ll discuss metrics for regressors
• Metrics for other methods will be discussed later in the course

SLIDE 3

Metrics for Classifiers

SLIDE 4

Accuracy

SLIDE 5

Accuracy

• One of the easiest measures of model goodness is accuracy
• Also called agreement, when measuring inter-rater reliability

Accuracy (agreement) = (# of agreements) / (total number of codes/assessments)
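A minimal sketch of this calculation in Python (not from the original slides; the function name and label values are illustrative):

```python
def agreement(rater_codes, detector_codes):
    """Accuracy/agreement: # of agreements divided by total # of codes."""
    assert len(rater_codes) == len(detector_codes)
    matches = sum(a == b for a, b in zip(rater_codes, detector_codes))
    return matches / len(rater_codes)

# Example: a human coder's labels vs. a detector's labels for 5 observations
print(agreement(["ON", "OFF", "ON", "ON", "OFF"],
                ["ON", "ON",  "ON", "ON", "OFF"]))  # 0.8
```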

SLIDE 6

Accuracy

• There is general agreement across fields that accuracy is not a good metric

SLIDE 7

Accuracy

• Let’s say that my new Kindergarten Failure Detector achieves 92% accuracy
• Good, right?

SLIDE 8

Non-even assignment to categories

• Accuracy does poorly when there is non-even assignment to categories
  ◦ Which is almost always the case
• Imagine an extreme case
  ◦ 92% of students pass Kindergarten
  ◦ My detector always says PASS
• Accuracy of 92%
• But essentially no information
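A quick sketch of that extreme case in Python (hypothetical numbers matching the slide):

```python
# Hypothetical cohort: 92 of 100 students pass Kindergarten
actual = ["PASS"] * 92 + ["FAIL"] * 8

# A "detector" that always predicts PASS, regardless of the student
predicted = ["PASS"] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.92 -- high accuracy, but the detector conveys no information
```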

SLIDE 9

Kappa

SLIDE 10

Kappa

Kappa = (Agreement – Expected Agreement) / (1 – Expected Agreement)
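The same formula in symbols (a restatement, not from the original slides), writing P_o for observed agreement and P_e for expected chance agreement; the P_e term is what the next several slides work out step by step:

```latex
\kappa = \frac{P_o - P_e}{1 - P_e},
\qquad
P_e = \sum_{c} P(\mathrm{data}=c)\; P(\mathrm{detector}=c)
```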

SLIDE 11

Computing Kappa (Simple 2x2 example)

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 12

Computing Kappa (Simple 2x2 example)

  • What is the percent agreement?

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 13

Computing Kappa (Simple 2x2 example)

  • What is the percent agreement?
  • 80%

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 14

Computing Kappa (Simple 2x2 example)

  • What is Data’s expected frequency for on-task?

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 15

Computing Kappa (Simple 2x2 example)

  • What is Data’s expected frequency for on-task?
  • 75%

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 16

Computing Kappa (Simple 2x2 example)

  • What is Detector’s expected frequency for on-task?

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 17

Computing Kappa (Simple 2x2 example)

  • What is Detector’s expected frequency for on-task?
  • 65%

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 18

Computing Kappa (Simple 2x2 example)

  • What is the expected on-task agreement?

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 19

Computing Kappa (Simple 2x2 example)

  • What is the expected on-task agreement?
  • 0.65*0.75= 0.4875

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60

SLIDE 20

Computing Kappa (Simple 2x2 example)

  • What is the expected on-task agreement?
  • 0.65*0.75= 0.4875

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60 (48.75)

SLIDE 21

Computing Kappa (Simple 2x2 example)

  • What are Data and Detector’s expected frequencies for off-task behavior?

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60 (48.75)

SLIDE 22

Computing Kappa (Simple 2x2 example)

  • What are Data and Detector’s expected frequencies for off-task behavior?

  • 25% and 35%

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60 (48.75)

SLIDE 23

Computing Kappa (Simple 2x2 example)

  • What is the expected off-task agreement?

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60 (48.75)

SLIDE 24

Computing Kappa (Simple 2x2 example)

  • What is the expected off-task agreement?
  • 0.25*0.35= 0.0875

               Detector Off-Task   Detector On-Task
Data Off-Task         20                  5
Data On-Task          15                 60 (48.75)

SLIDE 25

Computing Kappa (Simple 2x2 example)

  • What is the expected off-task agreement?
  • 0.25*0.35= 0.0875

               Detector Off-Task   Detector On-Task
Data Off-Task         20 (8.75)           5
Data On-Task          15                 60 (48.75)

SLIDE 26

Computing Kappa (Simple 2x2 example)

  • What is the total expected agreement?

               Detector Off-Task   Detector On-Task
Data Off-Task         20 (8.75)           5
Data On-Task          15                 60 (48.75)

SLIDE 27

Computing Kappa (Simple 2x2 example)

  • What is the total expected agreement?
  • 0.4875+0.0875 = 0.575

               Detector Off-Task   Detector On-Task
Data Off-Task         20 (8.75)           5
Data On-Task          15                 60 (48.75)

SLIDE 28

Computing Kappa (Simple 2x2 example)

  • What is kappa?

               Detector Off-Task   Detector On-Task
Data Off-Task         20 (8.75)           5
Data On-Task          15                 60 (48.75)

SLIDE 29

Computing Kappa (Simple 2x2 example)

  • What is kappa?
  • (0.8 – 0.575) / (1-0.575)
  • 0.225/0.425
  • 0.529

               Detector Off-Task   Detector On-Task
Data Off-Task         20 (8.75)           5
Data On-Task          15                 60 (48.75)
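To check that arithmetic, here is a short Python sketch (not part of the original slides) that computes kappa directly from the 2x2 table, with rows as the data (ground truth) and columns as the detector:

```python
def kappa_2x2(table):
    """Cohen's kappa for a 2x2 table: rows = data labels, columns = detector labels."""
    total = sum(sum(row) for row in table)
    agreement = (table[0][0] + table[1][1]) / total    # 0.80 in this example

    # Marginal proportions for each category
    data_off = (table[0][0] + table[0][1]) / total     # 0.25
    data_on  = (table[1][0] + table[1][1]) / total     # 0.75
    det_off  = (table[0][0] + table[1][0]) / total     # 0.35
    det_on   = (table[0][1] + table[1][1]) / total     # 0.65

    expected = data_off * det_off + data_on * det_on   # 0.0875 + 0.4875 = 0.575
    return (agreement - expected) / (1 - expected)

# The worked example from these slides
print(kappa_2x2([[20, 5], [15, 60]]))  # ~0.529
```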

SLIDE 30

So is that any good?

  • What is kappa?
  • (0.8 – 0.575) / (1-0.575)
  • 0.225/0.425
  • 0.529

               Detector Off-Task   Detector On-Task
Data Off-Task         20 (8.75)           5
Data On-Task          15                 60 (48.75)

SLIDE 31

Interpreting Kappa

• Kappa = 0
  ◦ Agreement is at chance
• Kappa = 1
  ◦ Agreement is perfect
• Kappa = -1
  ◦ Agreement is perfectly inverse
• Kappa > 1
  ◦ You messed up somewhere

SLIDE 32

Kappa < 0

• This means your model is worse than chance
• Very rare to see, but seen more commonly if you’re using cross-validation
  ◦ It means your model is junk

SLIDE 33

0 < Kappa < 1

• What’s a good Kappa?
• There is no absolute standard

SLIDE 34

0 < Kappa < 1

• For data mined models,
  ◦ Typically 0.3-0.5 is considered good enough to call the model better than chance and publishable
  ◦ In affective computing, lower is still often OK

SLIDE 35

Why is there no standard?

• Because Kappa is scaled by the proportion of each category
• When one class is much more prevalent
  ◦ Expected agreement is higher than if classes are evenly balanced

SLIDE 36

Because of this…

• Comparing Kappa values between two data sets, in a principled fashion, is highly difficult
  ◦ It is OK to compare two Kappas, in the same data set, that have at least one variable in common
• A lot of work went into statistical methods for comparing Kappa values in the 1990s
• No real consensus
• Informally, you can compare two data sets if the proportions of each category are “similar”

SLIDE 37

Quiz

  • What is kappa?

A: 0.645 B: 0.502 C: 0.700 D: 0.398

                 Detector Insult        Detector No Insult
                 during Collaboration   during Collaboration
Data Insult              16                      7
Data No Insult            8                     19

SLIDE 38

Quiz

  • What is kappa?

A: 0.240 B: 0.947 C: 0.959 D: 0.007

                     Detector Academic   Detector No Academic
                     Suspension          Suspension
Data Suspension              1                    2
Data No Suspension           4                  141

SLIDE 39

Next lecture

• ROC curves
• A’
• Precision
• Recall