slide-1
SLIDE 1

Quantitative Evaluation

Adapted in part from:

http://www.cs.cornell.edu/Courses/cs578/2003fa/performance_measures.pdf

slide-2
SLIDE 2

Accuracy

  • Target: 0/1, -1/+1, True/False, …
  • Prediction = f(inputs) = f(x): 0/1 or Real
  • Threshold: f(x) > thresh => 1, else => 0
  • threshold(f(x)): 0/1
  • #right / #total
  • p("correct"): p(threshold(f(x)) = target)

accuracy = 1 - (1/N) Σ_{i=1}^{N} (target_i - threshold(f(x_i)))^2
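As a sketch, the accuracy definition above can be written out directly; the 0.5 threshold and the toy targets/scores are illustrative assumptions, not from the slides:

```python
def accuracy(targets, scores, thresh=0.5):
    """Fraction of examples where the thresholded score matches the 0/1 target."""
    preds = [1 if s > thresh else 0 for s in scores]
    right = sum(1 for p, t in zip(preds, targets) if p == t)
    return right / len(targets)

# Toy example: 3 of the 4 thresholded predictions match the targets.
print(accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1]))  # 0.75
```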

slide-3
SLIDE 3

Confusion Matrix

               Predicted 1   Predicted 0
    True 1          a             b
    True 0          c             d

    a, d: correct predictions; b, c: incorrect predictions

accuracy = (a+d) / (a+b+c+d)
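A minimal way to fill in the cells a, b, c, d from paired targets and predictions; the toy labels below are made up for illustration:

```python
def confusion(targets, preds):
    """Return (a, b, c, d) matching the slide's cells: a=TP, b=FN, c=FP, d=TN."""
    a = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 1)
    b = sum(1 for t, p in zip(targets, preds) if t == 1 and p == 0)
    c = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 1)
    d = sum(1 for t, p in zip(targets, preds) if t == 0 and p == 0)
    return a, b, c, d

targets = [1, 1, 1, 0, 0, 0]
preds   = [1, 1, 0, 1, 0, 0]
a, b, c, d = confusion(targets, preds)
print((a, b, c, d))                  # (2, 1, 1, 2)
print((a + d) / (a + b + c + d))     # accuracy = 4/6
```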

slide-4
SLIDE 4

               Predicted 1        Predicted 0
    True 1    true positive       false negative
    True 0    false positive      true negative

               Predicted 1        Predicted 0
    True 1    hits                misses
    True 0    false alarms        correct rejections

               Predicted 1        Predicted 0
    True 1    P(pr1|tr1)          P(pr0|tr1)
    True 0    P(pr1|tr0)          P(pr0|tr0)

               Predicted 1        Predicted 0
    True 1    TP                  FN
    True 0    FP                  TN

slide-5
SLIDE 5


Problems with Accuracy

  • Assumes equal cost for both kinds of errors
    – cost(b-type error) = cost(c-type error)
  • is 99% accuracy good?
    – can be excellent, good, mediocre, poor, terrible
    – depends on problem
  • is 10% accuracy bad?
    – information retrieval
  • BaseRate = accuracy of predicting predominant class
    – (on most problems obtaining BaseRate accuracy is easy)
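The BaseRate point can be checked in a few lines; the 99%-negative toy dataset below is an assumed example:

```python
from collections import Counter

def base_rate(targets):
    """Accuracy obtained by always predicting the predominant class."""
    counts = Counter(targets)
    return max(counts.values()) / len(targets)

# With 99% negatives, always predicting 0 already scores 0.99, so a
# classifier reporting "99% accuracy" may be no better than the base rate.
targets = [0] * 99 + [1]
print(base_rate(targets))  # 0.99
```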

slide-6
SLIDE 6

Precision and Recall

  • typically used in document retrieval
  • Precision:
    – how many of the returned documents are correct
    – precision(threshold)
  • Recall:
    – how many of the positives does the model return
    – recall(threshold)
  • Precision/Recall Curve: sweep thresholds
slide-7
SLIDE 7

Precision/Recall

               Predicted 1   Predicted 0
    True 1          a             b
    True 0          c             d

PRECISION = a / (a + c)
RECALL = a / (a + b)
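These two ratios follow directly from the confusion-matrix cells; the cell counts below are hypothetical:

```python
def precision_recall(a, b, c):
    """Precision = a/(a+c), Recall = a/(a+b), from confusion-matrix cells."""
    return a / (a + c), a / (a + b)

# Hypothetical cells: 8 hits, 2 misses, 4 false alarms.
p, r = precision_recall(a=8, b=2, c=4)
print(p, r)  # precision 2/3, recall 0.8
```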

slide-8
SLIDE 8


slide-9
SLIDE 9


Summary Stats: F & BreakEvenPt

PRECISION = a / (a + c)
RECALL = a / (a + b)
F = (2 × PRECISION × RECALL) / (PRECISION + RECALL)
BreakEvenPoint: the point where PRECISION = RECALL

harmonic average of precision and recall
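A sketch of the harmonic average; the precision/recall pairs are invented to show the break-even case and how the harmonic mean punishes imbalance:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f_score(0.5, 0.5))  # 0.5: F equals both when precision == recall (break-even)
print(f_score(0.9, 0.1))  # ≈ 0.18: far below the arithmetic mean of 0.5
```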

slide-10
SLIDE 10

[figure: precision/recall curves, annotated from better to worse performance]

slide-11
SLIDE 11

ROC Plot and ROC Area

  • Receiver Operating Characteristic
  • Developed in WWII to statistically model false positive and false negative detections of radar operators
  • Better statistical foundations than most other measures
  • Standard measure in medicine and biology
  • Becoming more popular in ML
slide-12
SLIDE 12

ROC Plot

  • Sweep threshold and plot:
    – TPR vs. FPR
    – Sensitivity vs. 1-Specificity
    – P(true|true) vs. P(true|false)
  • Sensitivity = a/(a+b) = Recall = LIFT numerator
  • 1 - Specificity = 1 - d/(c+d) = c/(c+d)
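Sweeping the threshold over a handful of assumed scores yields the (FPR, TPR) points of the ROC curve:

```python
def roc_points(targets, scores):
    """Sweep the threshold over the observed scores; return (FPR, TPR) points."""
    pos = sum(targets)
    neg = len(targets) - pos
    points = []
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(targets, scores) if t == 1 and s >= thresh)
        fp = sum(1 for t, s in zip(targets, scores) if t == 0 and s >= thresh)
        points.append((fp / neg, tp / pos))
    return points

# Assumed scores: the two positives outrank the two negatives.
print(roc_points([1, 1, 0, 0], [0.9, 0.7, 0.6, 0.2]))
# [(0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```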
slide-13
SLIDE 13

[figure: ROC plot; the diagonal line is random prediction]

slide-14
SLIDE 14

Properties of ROC

  • ROC Area:
    – 1.0: perfect prediction
    – 0.9: excellent prediction
    – 0.8: good prediction
    – 0.7: mediocre prediction
    – 0.6: poor prediction
    – 0.5: random prediction
    – <0.5: something wrong!

slide-15
SLIDE 15

Properties of ROC

  • Slope is non-increasing
  • Each point on ROC represents a different tradeoff (cost ratio) between false positives and false negatives
  • Slope of line tangent to curve defines the cost ratio
  • ROC Area represents performance averaged over all possible cost ratios
  • If two ROC curves do not intersect, one method dominates the other
  • If two ROC curves intersect, one method is better for some cost ratios, and the other method is better for other cost ratios
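One standard way to compute ROC Area without plotting is the rank-statistic view: the area equals the probability that a randomly drawn positive is scored above a randomly drawn negative, with ties counted as half. A sketch, with assumed scores:

```python
def roc_area(targets, scores):
    """P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_area([1, 1, 0, 0], [0.9, 0.7, 0.6, 0.2]))  # 1.0: perfect separation
print(roc_area([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.2]))  # 0.75
```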

slide-16
SLIDE 16

Lift

  • not interested in accuracy on entire dataset
  • want accurate predictions for 5%, 10%, or 20% of dataset
  • don’t care about remaining 95%, 90%, 80%, resp.
  • typical application: marketing
  • how much better than random prediction on the fraction of the dataset predicted true (f(x) > threshold)

lift(threshold) = (%positives above threshold) / (%dataset above threshold)

slide-17
SLIDE 17

Lift

               Predicted 1   Predicted 0
    True 1          a             b
    True 0          c             d

lift = [a / (a + b)] / [(a + c) / (a + b + c + d)]
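Plugging hypothetical cell counts into the lift formula above:

```python
def lift(a, b, c, d):
    """Recall of the predicted-true slice divided by the fraction predicted true."""
    pct_positives = a / (a + b)               # fraction of positives above threshold
    pct_dataset = (a + c) / (a + b + c + d)   # fraction of dataset above threshold
    return pct_positives / pct_dataset

# Hypothetical cells: 30 of 100 positives caught in a slice of 40/400 examples.
print(lift(a=30, b=70, c=10, d=290))  # ≈ 3.0: three times better than random
```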

slide-18
SLIDE 18

Visualizing Lift

[figure: Cumulative Response curve; x-axis: % prospects, y-axis: % respondents]

Lift(c) = CR(c) / c

Example: Lift(25%) = CR(25%) / 25% = 62% / 25% ≈ 2.5. If we send to 25% of our prospects using the model, they are 2.5 times as likely to respond as if we had selected them randomly.
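The slide's arithmetic can be reproduced directly; note that 62% / 25% is 2.48, which the slide rounds to 2.5:

```python
def lift_at(c, cr):
    """Lift(c) = CR(c) / c, with both given as fractions of the dataset."""
    return cr / c

# The slide's example: sending to 25% of prospects reaches 62% of respondents.
print(lift_at(0.25, 0.62))  # 2.48
```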

slide-19
SLIDE 19

Computing Profit

  • Assume cut-off at some value c
  • Let:

– T = total number of prospects – H = total number of respondents – n = cost per mailing – p = profit per response

  • Then:

– Profit(c) = CR(c).H.p

revenue generated by respondents

  • c.T.n

cost of sending the mailings

+ (1-c).T.n

saving from not sending mailings

  • (1-CR(c)).H.p

cost of missed revenue
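The four profit terms above combine into one function; the T, H, n, p values below are hypothetical:

```python
def profit(c, cr_c, T, H, n, p):
    """Profit at cutoff c: revenue from reached respondents, minus mailing
    costs, plus savings on unsent mail, minus revenue missed below the cutoff."""
    return cr_c * H * p - c * T * n + (1 - c) * T * n - (1 - cr_c) * H * p

# Hypothetical numbers: 10,000 prospects, 500 respondents, $1 per mailing,
# $20 per response; the model reaches 62% of respondents at a 25% cutoff.
print(profit(c=0.25, cr_c=0.62, T=10_000, H=500, n=1.0, p=20.0))  # ≈ 7400.0
```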

slide-20
SLIDE 20

Understanding Profit (I)

  • Profit(c) = 2·CR(c)·H·p − 2·c·T·n + T·n − H·p
              = 2·[CR(c)·H·p − c·T·n] − [H·p − T·n]
  • Since:
    – 2 is a constant (scaling)
    – H·p − T·n is a constant (translation)
  • Then:
    – Profit(c) ~ CR(c)·H·p − c·T·n
  • Let:
    – E = H / T   (response rate)
    – Profit(c) ~ CR(c)·E·p − c·n

slide-21
SLIDE 21

Understanding Profit (II)

  • Note that:
    – Lift(c) = CR(c)/c
    – Lift would be maximum if we could send to exactly all of the respondents; we would then have c = E (= H/T) and CR(E) = 100%
    – The maximum value for lift is thus: 1/E
  • Returning to profit:
    – Case 1: p < n
      • Profit(c) < 0  => not viable
    – Case 2: p = n
      • Profit(c) ≥ 0 only if Lift(c) ≥ 1/E  => impossible
    – Case 3: p > n
      • Profit(c) ≥ 0 achievable  => OK

slide-22
SLIDE 22


Summary

  • the measure you optimize to makes a difference
  • the measure you report makes a difference
  • use measure appropriate for problem/community
  • accuracy often is not sufficient/appropriate
  • ROC is gaining popularity in the ML community
  • only accuracy generalizes to >2 classes!