SLIDE 1

Binary Classification

Prof. Mike Hughes

Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/

Many slides attributable to:
Erik Sudderth (UCI), Finale Doshi-Velez (Harvard),
James, Witten, Hastie, Tibshirani (ISL/ESL books)
SLIDE 2

Logistics

  • Waitlist: We have some room; contact me
  • HW2 due TONIGHT (Wed 2/6 at 11:59pm)
    • What you submit: PDF and zip
    • Please annotate pages in Gradescope!
  • HW3 out later tonight, due a week from today
    • What you submit: PDF and zip
    • Please annotate pages in Gradescope!
  • Next recitation is Mon 2/11
    • Practical binary classifiers in Python with sklearn
    • Numerical issues and how to address them


Mike Hughes - Tufts COMP 135 - Spring 2019

SLIDE 3

Objectives: Classifier Overview

  • 3 steps of a classification task
    • Prediction
      • Making hard binary decisions
      • Predicting class probabilities
    • Training
    • Evaluation
  • Performance Metrics
  • A “taste” of 3 Methods
    • Logistic Regression
    • K-Nearest Neighbors
    • Decision Tree Classifier

SLIDE 4

What will we learn?

Supervised Learning (vs. Unsupervised Learning, Reinforcement Learning)

The task consumes data, label pairs {x_n, y_n}_{n=1}^N and a performance measure.

Three steps: Training, Prediction, Evaluation

SLIDE 5

Before: Regression

A supervised learning task: y is a numeric variable, e.g. sales in $.
(Figure: scatter of x vs. y with a fitted regression curve.)

SLIDE 6

Task: Binary Classification

A supervised learning task: y is a binary variable (red or blue).
(Figure: scatter of features x1 vs. x2, points colored by binary label y.)

SLIDE 7

Example: Hotdog or Not

SLIDE 8

Task: Multi-class Classification

A supervised learning task: y is a discrete variable (red or blue or green or purple).
(Figure: scatter of features x1 vs. x2, points colored by class label y.)

SLIDE 9

Classification Example: Swype

Many possible letters: multi-class classification.

SLIDE 10

Binary Prediction Step

Goal: Predict label (0 or 1) given features x

  • Input: feature vector x_i ≜ [x_i1, x_i2, . . . , x_if, . . . , x_iF]
    (aka “features”, “covariates”, “predictors”, “attributes”)
    Entries can be real-valued, or other numeric types (e.g. integer, binary).
  • Output: binary label y_i ∈ {0, 1}
    (aka “responses”, “labels”)

SLIDE 11

Binary Prediction Step

>>> # Given: pretrained classifier object `model`
>>> # Given: 2D array of features x_NF
>>> x_NF.shape
(N, F)
>>> yhat_N = model.predict(x_NF)
>>> yhat_N[:5]  # peek at predictions
[0, 0, 1, 0, 1]
>>> yhat_N.shape
(N,)

SLIDE 12

Types of binary predictions

TP : true positive
FP : false positive
TN : true negative
FN : false negative

SLIDE 13

Example: Which outcome is this? (Figure: a predicted label shown against the true label.)

SLIDE 14

Answer: True Positive (TP)

SLIDE 15

Example: Which outcome is this?

SLIDE 16

Answer: True Negative (TN)

SLIDE 17

Example: Which outcome is this?

SLIDE 18

Answer: False Negative (FN)

SLIDE 19

Example: Which outcome is this?

SLIDE 20

Answer: False Positive (FP)

SLIDE 21

Probability Prediction Step

Goal: Predict probability p(Y = 1) given features x

  • Input: feature vector x_i ≜ [x_i1, x_i2, . . . , x_if, . . . , x_iF]
    (aka “features”, “covariates”, “predictors”, “attributes”)
    Entries can be real-valued, or other numeric types (e.g. integer, binary).
  • Output: probability p̂_i between 0 and 1, e.g. 0.001, 0.513, 0.987
    (aka “probabilities”)

SLIDE 22

Probability Prediction Step

>>> # Given: pretrained classifier object `model`
>>> # Given: 2D array of features x_NF
>>> x_NF.shape
(N, F)
>>> yproba_N2 = model.predict_proba(x_NF)
>>> yproba_N2.shape
(N, 2)
>>> yproba_N2[:5, 1]  # peek at first 5
[0.003, 0.358, 0.987, 0.111, 0.656]

Column index 1 gives the probability of the positive label, p(Y = 1).

SLIDE 23

Thresholding to get Binary Decisions
Credit: Wattenberg, Viégas, Hardt
(Figure: classifier scores with a decision threshold.)

SLIDE 24

Thresholding to get Binary Decisions (continued)

SLIDE 25

Thresholding to get Binary Decisions (continued)

SLIDE 26

Pair Exercise

Interactive Demo: https://research.google.com/bigpicture/attacking-discrimination-in-ml/

Loan and pay back: +$300
Loan and not pay back: -$700

Goals:
  • What threshold maximizes accuracy?
  • What threshold maximizes profit?
  • What needs to be true of costs so the threshold is the same for profit and accuracy?
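The exercise above can be sketched numerically. The probabilities and labels below are invented toy data (the real ones live in the interactive demo); only the +$300 / -$700 payoffs come from the slide.

```python
import numpy as np

# Hypothetical predicted probabilities of repayment and true outcomes
# (1 = paid back). Payoffs from the slide: +$300 if repaid, -$700 if not.
proba = np.array([0.10, 0.30, 0.45, 0.60, 0.70, 0.85, 0.90, 0.95])
y     = np.array([0,    0,    1,    0,    1,    1,    1,    1])

def accuracy_at(thresh):
    yhat = (proba >= thresh).astype(int)  # grant a loan when proba >= thresh
    return np.mean(yhat == y)

def profit_at(thresh):
    granted = proba >= thresh
    return 300 * np.sum(granted & (y == 1)) - 700 * np.sum(granted & (y == 0))

thresholds = np.linspace(0.0, 1.0, 101)
best_acc_thresh = max(thresholds, key=accuracy_at)
best_profit_thresh = max(thresholds, key=profit_at)
```

In this toy example the profit-maximizing threshold sits above the accuracy-maximizing one, because a default costs more than a repayment earns.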

SLIDE 27

Classifier: Training Step

Goal: Given a labeled dataset, learn a function that can perform prediction well

  • Input: pairs of features and labels/responses {x_n, y_n}_{n=1}^N
  • Output: a prediction function ŷ(·) : R^F → {0, 1}

Useful to break into two steps:
1) Produce probabilities in [0, 1] OR real-valued scores
2) Threshold to make binary decisions

SLIDE 28

Classifier: Training Step

>>> # Given: 2D array of features x_NF
>>> # Given: 1D array of binary labels y_N
>>> y_N.shape
(N,)
>>> x_NF.shape
(N, F)
>>> model = BinaryClassifier()
>>> model.fit(x_NF, y_N)
>>> # Now can call predict or predict_proba
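The pseudocode above becomes runnable by substituting a concrete sklearn estimator for the generic BinaryClassifier; the toy dataset below is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2-feature dataset: label is 1 when the features sum to a positive value
rng = np.random.RandomState(0)
x_NF = rng.randn(100, 2)
y_N = (x_NF.sum(axis=1) > 0).astype(int)

model = LogisticRegression()  # stands in for BinaryClassifier()
model.fit(x_NF, y_N)

yhat_N = model.predict(x_NF)           # hard 0/1 decisions, shape (N,)
yproba_N2 = model.predict_proba(x_NF)  # shape (N, 2); column 1 is p(Y = 1)
```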

SLIDE 29

Classifier: Evaluation Step

Goal: Assess quality of predictions

Many ways in practice:
1) Evaluate probabilities / scores directly: logistic loss, hinge loss, …
2) Evaluate binary decisions at a specific threshold: accuracy, TPR, TNR, PPV, NPV, etc.
3) Evaluate across a range of thresholds: ROC curve, Precision-Recall curve

SLIDE 30

Metric: Confusion Matrix

Counting mistakes in binary predictions:

                   predicted 0    predicted 1
  true y = 0          #TN            #FP
  true y = 1          #FN            #TP

#TP : num. true positive     #FP : num. false positive
#TN : num. true negative     #FN : num. false negative
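A small check of the four counts with sklearn (labels below are invented); note that sklearn orders the cells [[#TN, #FP], [#FN, #TP]].

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1])

# Rows are true labels, columns are predicted labels:
# [[#TN, #FP],
#  [#FN, #TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
```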

SLIDE 31

Metric: Accuracy

accuracy = fraction of correct predictions
         = (#TP + #TN) / (#TP + #TN + #FN + #FP)

Potential problem: Suppose your dataset has 1 positive example and 99 negative examples. What is the accuracy of the classifier that always predicts “negative”?

SLIDE 32

Metric: Accuracy

Answer: 99%!
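The slide's point checks out in a couple of lines:

```python
import numpy as np

# 1 positive example, 99 negatives, as in the slide
y_true = np.array([1] + [0] * 99)
yhat_always_negative = np.zeros(100, dtype=int)  # degenerate classifier

# 99% accuracy, despite never finding the one positive example
accuracy = np.mean(yhat_always_negative == y_true)
```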

SLIDE 33

Metrics for Binary Decisions

Emphasize the metrics appropriate for your application.

  • TPR (“sensitivity”, “recall”) = #TP / (#TP + #FN)
  • TNR (“specificity”, = 1 − FPR) = #TN / (#TN + #FP)
  • PPV (“precision”) = #TP / (#TP + #FP)
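These rates follow directly from the confusion-matrix counts; a minimal sketch (the helper name and counts are invented, and nonzero denominators are assumed):

```python
def binary_metrics(tp, tn, fp, fn):
    """Rates from confusion-matrix counts (assumes nonzero denominators)."""
    tpr = tp / (tp + fn)  # sensitivity / recall
    tnr = tn / (tn + fp)  # specificity = 1 - FPR
    ppv = tp / (tp + fp)  # precision
    return tpr, tnr, ppv

tpr, tnr, ppv = binary_metrics(tp=3, tn=2, fp=1, fn=1)
```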

SLIDE 34

Goal: App to classify cats vs. dogs from images
Which metric might be most important? Could we just use accuracy?

SLIDE 35

Goal: Classifier to find relevant tweets to list on website
Which metric might be most important? Could we just use accuracy?

SLIDE 36

Goal: Detector for tumors based on medical image
Which metric might be most important? Could we just use accuracy?

SLIDE 37

ROC Curve (across thresholds)

Axes: FPR (1 − specificity) on x, TPR (sensitivity) on y.
Each point on the curve corresponds to a specific threshold; the diagonal corresponds to a random guess, and the top-left corner to a perfect classifier.

SLIDE 38

Area under ROC curve (aka AUROC or AUC or “C statistic”)

Graphical: the area under the ROC curve (TPR vs. FPR).

Probabilistic:
  AUROC ≜ Pr( ŷ(x_i) > ŷ(x_j) | y_i = 1, y_j = 0 )

For a random pair of examples, one positive and one negative, what is the probability the classifier will rank the positive one higher?

Area varies from 0.0 to 1.0: 0.5 is a random guess, 1.0 is perfect.
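The two readings agree, which can be checked directly (labels and scores below are invented):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90])

auroc = roc_auc_score(y_true, scores)

# Probabilistic reading: fraction of (positive, negative) pairs
# where the positive example receives the higher score
pos, neg = scores[y_true == 1], scores[y_true == 0]
pairwise_frac = np.mean([p > n for p in pos for n in neg])
```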

SLIDE 39

Precision-Recall Curve

Axes: recall (= TPR) on x, precision on y.
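sklearn computes the curve's points from scores directly (same invented labels and scores as in the AUROC example):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90])

# One (precision, recall) point per distinct score threshold,
# plus a final (precision=1, recall=0) endpoint appended by sklearn
precision, recall, thresholds = precision_recall_curve(y_true, scores)
```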

SLIDE 40

AUROC not always best choice

(Figure: two ROC curves, red and blue.) By AUROC, the red curve is better, but the blue one is much better for avoiding alarm fatigue.

SLIDE 41

Classifier: Evaluation Metrics

https://scikit-learn.org/stable/modules/model_evaluation.html

1) To evaluate predicted scores / probabilities
2) To evaluate specific binary decisions
3) To make ROC or PR curves and compute areas

SLIDE 42

Objectives: Classifier Overview

  • 3 steps of a classification task
    • Prediction
      • Making hard binary decisions
      • Predicting class probabilities
    • Training
    • Evaluation
  • Performance Metrics
  • A “taste” of 3 Methods
    • Logistic Regression
    • K-Nearest Neighbors
    • Decision Tree Classifier

SLIDE 43

Logistic Sigmoid Function

Goal: Transform real values into probabilities.

sigmoid(z) = 1 / (1 + e^(−z))
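A direct implementation of the formula, written to avoid overflow for large negative inputs (the split-by-sign trick is an implementation detail added here, not something from the slide):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), computed without overflow."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])  # safe: z < 0 here, so exp cannot overflow
    out[~pos] = ez / (1.0 + ez)
    return out

p = sigmoid(np.array([-800.0, -2.0, 0.0, 2.0, 800.0]))  # values in [0, 1]
```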

SLIDE 44

Logistic Regression

Parameters:
  weight vector  w = [w_1, w_2, . . . , w_f, . . . , w_F]
  bias scalar    b

Prediction:
  p̂(x_i, w, b) ≜ p(y_i = 1 | x_i) = sigmoid( Σ_{f=1}^{F} w_f x_if + b )

Training: find weights and bias that minimize error.

SLIDE 45

Measuring prediction quality for a probabilistic classifier

Use the log loss (aka “binary cross entropy”, related to “logistic loss”):

log_loss(y, p̂) = −y log p̂ − (1 − y) log(1 − p̂)

from sklearn.metrics import log_loss

Advantages: smooth, and easy to take derivatives!
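The slide's formula, averaged over examples, matches what sklearn's log_loss computes (labels and probabilities below are invented):

```python
import numpy as np
from sklearn.metrics import log_loss

# Invented labels and predicted probabilities of the positive class
y = np.array([0, 1, 1, 0])
p_hat = np.array([0.1, 0.8, 0.6, 0.3])

# Slide formula, averaged over examples
manual = np.mean(-y * np.log(p_hat) - (1 - y) * np.log(1 - p_hat))
from_sklearn = log_loss(y, p_hat)
```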

SLIDE 46

Logistic Regression: Training

Optimization: Minimize total log loss on the training set

  min_{w,b}  Σ_{n=1}^{N} log_loss( y_n, p̂(x_n, w, b) )

Algorithm: Gradient descent. More in next class!
Avoid overfitting: Use L2 or L1 penalty on weights.
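Gradient descent is next class's topic; still, a minimal sketch of minimizing the mean log loss helps fix ideas. The toy one-feature data, learning rate, and step count are all invented, and no regularization penalty is included:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: a single feature, positive label exactly when x > 0
rng = np.random.RandomState(0)
x_N1 = rng.randn(200, 1)
y_N = (x_N1[:, 0] > 0).astype(float)

w, b, lr = np.zeros(1), 0.0, 0.5
for _ in range(500):
    p_N = sigmoid(x_N1 @ w + b)                  # predicted p(y=1 | x)
    w -= lr * (x_N1.T @ (p_N - y_N)) / len(y_N)  # gradient of mean log loss
    b -= lr * np.mean(p_N - y_N)

train_acc = np.mean((sigmoid(x_N1 @ w + b) >= 0.5) == (y_N == 1))
```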

SLIDE 47

Nearest Neighbor Classifier

Parameters: none

Prediction:
  • find the “nearest” training vector to the given input x
  • predict the y value of this neighbor

Training: none needed (use training data as lookup table)

SLIDE 48

K Nearest Neighbors Classifier

Parameters: K, the number of neighbors

Prediction:
  • find the K “nearest” training vectors to input x
  • predict: vote for the most common y in the neighborhood
  • predict_proba: report the fraction of each label among the K neighbors

Training: none needed (use training data as lookup table)
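A runnable version of this recipe with sklearn (the 1-D toy data is invented for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

x_NF = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_N = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_NF, y_N)  # "training" just stores the data

yhat = knn.predict([[1.5], [10.5]])          # majority vote among 3 neighbors
proba = knn.predict_proba([[1.5], [10.5]])   # fraction of neighbor labels
```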

SLIDE 49

Decision Tree Classifier

Goal: Does patient have heart disease?
Leaves make binary predictions! (but can be made probabilistic)

SLIDE 50

Decision Tree Classifier

Parameters:
  • at each internal node: x variable id and threshold
  • at each leaf: probability of positive y label

Prediction:
  • identify the rectangular region containing the input x
  • predict: most common y value in the region
  • predict_proba: report the fraction of each label in the region

Training:
  • minimize error on the training set
  • often, use greedy heuristics
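sklearn's implementation follows this recipe, with greedy top-down threshold selection (the 1-D toy data and the max_depth choice below are invented):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

x_NF = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_N = np.array([0, 0, 0, 1, 1, 1])

# Greedy training: each internal node picks a feature id and threshold
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(x_NF, y_N)

yhat = tree.predict([[1.5], [10.5]])          # most common label in the region
proba = tree.predict_proba([[1.5], [10.5]])   # label fractions in the leaf
```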
slide-51
SLIDE 51

Summary of Methods

51

Mike Hughes - Tufts COMP 135 - Spring 2019

Function class flexibility Knobs to tune Interpret? Logistic Regression Linear L2/L1 penalty on weights Inspect weights Decision Tree Classifier Axis-aligned Piecewise constant

  • Max. depth
  • Min. leaf size

Goal criteria Inspect tree K Nearest Neighbors Classifier Piecewise constant Number of Neighbors Distance metric How neighbors vote Inspect neighbors