

SLIDE 1

Binary Classification

Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2020f/

Many slides attributable to: Prof. Mike Hughes, Erik Sudderth (UCI), Finale Doshi-Velez (Harvard), and James, Witten, Hastie, Tibshirani (ISL/ESL books)
SLIDE 2

Today’s objectives (day 07): Binary Classification Basics

  • 3 steps of a classification task
    • Prediction
      • Predicting probabilities of each binary class
      • Making hard binary decisions
    • Training
    • Evaluation (much more in next class)
  • A “taste” of 2 methods
    • Logistic Regression
    • K-Nearest Neighbors

SLIDE 3

What will we learn?

[Diagram: overview of the three ML paradigms (Supervised Learning, Unsupervised Learning, Reinforcement Learning), with Supervised Learning highlighted. A supervised task takes data/label pairs $\{x_n, y_n\}_{n=1}^N$ (data $x$, label $y$) and involves three steps, Training, Prediction, and Evaluation, each judged by a performance measure.]

SLIDE 4

Before: Regression

[Figure: supervised-learning paradigm diagram with the regression task highlighted. In regression, y is a numeric variable, e.g. sales in $$, predicted from input x.]

SLIDE 5

Task: Binary Classification

[Figure: supervised-learning paradigm diagram with the binary classification task highlighted, plus a scatterplot over features x1 and x2 in which y is a binary variable (red or blue).]

SLIDE 6

Example: Hotdog or Not

https://www.theverge.com/tldr/2017/5/14/15639784/hbo-silicon-valley-not-hotdog-app-download

SLIDE 7

Task: Multi-class Classification

[Figure: supervised-learning paradigm diagram with the multi-class classification task highlighted, plus a scatterplot over features x1 and x2 in which y is a discrete variable (red, blue, green, or purple).]

SLIDE 8

Binary Prediction Step

Goal: Predict label (0 or 1) given features x

  • Input: feature vector (“features”, “covariates”, “predictors”, “attributes”)
    $x_i \triangleq [x_{i1}, x_{i2}, \ldots, x_{if}, \ldots, x_{iF}]$
    Entries can be real-valued, or other numeric types (e.g. integer, binary)
  • Output: binary label (“responses”, “labels”)
    $y_i \in \{0, 1\}$

SLIDE 9

Binary Prediction Step

>>> # Given: pretrained binary classifier model
>>> # Given: 2D array of features x_NF
>>> x_NF.shape
(N, F)
>>> yhat_N = model.predict(x_NF)
>>> yhat_N[:5]  # peek at predictions
[0, 0, 1, 0, 1]
>>> yhat_N.shape
(N,)
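For concreteness, here is a runnable version of this prediction step, using scikit-learn's LogisticRegression as a stand-in for the generic pretrained classifier (the model choice and toy data are illustrative assumptions, not from the slides):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative toy data: N=6 examples, F=2 features (assumed)
x_NF = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5],
                 [3.0, 3.0], [4.0, 2.5], [5.0, 4.0]])
y_N = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(x_NF, y_N)  # training step, covered on a later slide
yhat_N = model.predict(x_NF)                 # hard 0/1 decisions, shape (N,)
print(yhat_N)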

SLIDE 10

Types of binary predictions

FP : false positive
TP : true positive
TN : true negative
FN : false negative

SLIDE 11

Probability Prediction Step

Goal: Predict probability of event y=1 given features x

  • Input: feature vector (“features”, “covariates”, “predictors”, “attributes”)
    $x_i \triangleq [x_{i1}, x_{i2}, \ldots, x_{if}, \ldots, x_{iF}]$
    Entries can be real-valued, or other numeric types (e.g. integer, binary)
  • Output: probability $\hat{p}_i$ between 0 and 1 (“probabilities”), e.g. 0.001, 0.513, 0.987

SLIDE 12

Probability Prediction Step

>>> # Given: pretrained binary classifier model
>>> # Given: 2D array of features x_NF
>>> x_NF.shape
(N, F)
>>> yproba_N2 = model.predict_proba(x_NF)
>>> yproba_N2.shape
(N, 2)
>>> yproba_N2[:5, 1]  # peek at positive-class probabilities
[0.003, 0.358, 0.987, 0.111, 0.656]

Column index 1 gives the probability of the positive label given the input features: $p(Y = 1 \mid X)$

SLIDE 13–15

Thresholding to get Binary Decisions

[Figures: the same distribution of predicted scores shown with the decision threshold swept to different values, illustrating how the choice of threshold changes the resulting binary decisions. Credit: Wattenberg, Viégas, Hardt]
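In code, the thresholding step shown in these figures is a single comparison. A minimal sketch, reusing the predicted probabilities from SLIDE 12 (the alternative 0.3 threshold is an illustrative assumption):

import numpy as np

proba_N = np.array([0.003, 0.358, 0.987, 0.111, 0.656])  # p(y=1 | x)
print((proba_N >= 0.5).astype(int))  # default 0.5 threshold: [0 0 1 0 1]
print((proba_N >= 0.3).astype(int))  # lower threshold flips borderline cases: [0 1 1 0 1]

Lowering the threshold trades false negatives for false positives, which is exactly the trade-off the figures above illustrate.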

SLIDE 16

Classifier: Training Step

Goal: Given a labeled dataset, learn a function that can perform (probabilistic) prediction well

  • Input: pairs of features and labels/responses $\{x_n, y_n\}_{n=1}^N$
  • Output: a prediction function $\hat{y}(\cdot) : \mathbb{R}^F \to \{0, 1\}$

Useful to break into two steps:
1) Produce real-valued scores OR probabilities in [0, 1]
2) Threshold to make binary decisions

SLIDE 17

Classifier: Training Step

>>> # Given: 2D array of features x_NF
>>> # Given: 1D array of binary labels y_N
>>> x_NF.shape
(N, F)
>>> y_N.shape
(N,)
>>> model = BinaryClassifier()
>>> model.fit(x_NF, y_N)
>>> # Now can call predict or predict_proba
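A runnable version of the training step, with scikit-learn's LogisticRegression standing in for the generic BinaryClassifier (the toy data is an assumption); it also demonstrates the two-step breakdown from SLIDE 16, since thresholding the predicted probabilities at 0.5 reproduces the built-in hard decisions:

import numpy as np
from sklearn.linear_model import LogisticRegression

x_NF = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5],
                 [3.0, 3.0], [4.0, 2.5], [5.0, 4.0]])
y_N = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(x_NF, y_N)                          # training step

proba_N = model.predict_proba(x_NF)[:, 1]     # step 1: probabilities in [0, 1]
yhat_N = (proba_N >= 0.5).astype(int)         # step 2: threshold
assert (yhat_N == model.predict(x_NF)).all()  # matches the built-in decisions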

SLIDE 18

Classifier: Evaluation Step

Goal: Assess quality of predictions

Many ways in practice:
1) Evaluate probabilities / scores directly: cross entropy loss (aka log loss), hinge loss, …
2) Evaluate binary decisions at a specific threshold: accuracy, TPR, TNR, PPV, NPV, etc.
3) Evaluate across a range of thresholds: ROC curve, Precision-Recall curve
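A minimal sketch of all three evaluation styles using scikit-learn metrics (the label and probability arrays are illustrative assumptions):

import numpy as np
from sklearn.metrics import log_loss, accuracy_score, roc_auc_score

y_N = np.array([0, 0, 1, 1, 1])                          # true labels (assumed)
proba_N = np.array([0.003, 0.358, 0.987, 0.111, 0.656])  # predicted p(y=1 | x)

print(log_loss(y_N, proba_N))           # 1) evaluate probabilities directly
yhat_N = (proba_N >= 0.5).astype(int)   # binary decisions at threshold 0.5
print(accuracy_score(y_N, yhat_N))      # 2) evaluate decisions at one threshold
print(roc_auc_score(y_N, proba_N))      # 3) area under ROC curve, across thresholds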

SLIDE 19

Metric: Confusion Matrix

Counting mistakes in binary predictions

#TP : num. true positive
#FP : num. false positive
#TN : num. true negative
#FN : num. false negative

                Predicted 0    Predicted 1
    True 0          #TN            #FP
    True 1          #FN            #TP
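These four counts can be computed with scikit-learn (the toy labels and predictions are assumed):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1])  # illustrative true labels
y_pred = np.array([0, 1, 1, 0, 1])  # illustrative hard predictions

# scikit-learn convention: rows = true class, columns = predicted class,
# so the result is [[#TN, #FP], [#FN, #TP]]
print(confusion_matrix(y_true, y_pred))  # [[1 1]
                                         #  [1 2]]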

SLIDE 20

Metric: Accuracy

accuracy = fraction of correct predictions

$\text{accuracy} = \frac{\#TP + \#TN}{\#TP + \#TN + \#FN + \#FP}$

Potential problem: suppose your dataset has 1 positive example and 99 negative examples. What is the accuracy of the classifier that always predicts “negative”?

SLIDE 21

Metric: Accuracy

accuracy = fraction of correct predictions

$\text{accuracy} = \frac{\#TP + \#TN}{\#TP + \#TN + \#FN + \#FP}$

Potential problem: suppose your dataset has 1 positive example and 99 negative examples. What is the accuracy of the classifier that always predicts “negative”? 99%! Here #TN = 99, #FN = 1, and #TP = #FP = 0, so accuracy = (0 + 99) / 100 = 0.99.

SLIDE 22

Objectives: Classifier Overview

  • 3 steps of a classification task
    • Prediction
      • Making hard binary decisions
      • Predicting class probabilities
    • Training
    • Evaluation
  • A “taste” of 2 methods
    • Logistic Regression
    • K-Nearest Neighbors

SLIDE 23

Logistic Sigmoid Function

Goal: Transform real values into probabilities

$\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}$

[Figure: plot of the sigmoid function; the vertical axis is a probability between 0 and 1.]
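A one-line NumPy version of this transform (a sketch; the function name is mine, not from the slides):

import numpy as np

def sigmoid(z):
    """Map real values to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~[0.018, 0.5, 0.982]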

SLIDE 24

Logistic Regression

Parameters:
  weight vector $w = [w_1, w_2, \ldots, w_f, \ldots, w_F]$
  bias scalar $b$

Prediction:
$\hat{p}(x_i, w, b) \triangleq p(y_i = 1 \mid x_i) = \text{sigmoid}\left( \sum_{f=1}^{F} w_f x_{if} + b \right)$

Training: find weights and bias that minimize “loss”
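This prediction rule is just a dot product followed by the sigmoid. A NumPy sketch (the toy weights, bias, and features are illustrative assumptions):

import numpy as np

def predict_proba(x_NF, w_F, b):
    """Compute p(y=1 | x) = sigmoid(w . x + b) for each row of x_NF."""
    return 1.0 / (1.0 + np.exp(-(x_NF @ w_F + b)))

w_F = np.array([1.5, -0.5])                # illustrative weight vector
b = 0.1                                    # illustrative bias scalar
x_NF = np.array([[0.0, 1.0], [2.0, 0.5]])
print(predict_proba(x_NF, w_F, b))         # ~[0.401, 0.945]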

SLIDE 25

Measuring prediction quality for a probabilistic classifier

Use the log loss (aka “binary cross entropy”):

$\text{log loss}(y, \hat{p}) = -y \log \hat{p} - (1 - y) \log(1 - \hat{p})$

from sklearn.metrics import log_loss

Advantages:
  • smooth
  • easy to take derivatives!
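A quick check of the formula against scikit-learn's implementation, which reports the mean log loss over examples (the toy values are assumed):

import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0])
p = np.array([0.9, 0.2])
# By hand: mean of [-log(0.9), -log(1 - 0.2)] = (0.105 + 0.223) / 2 ≈ 0.164
print(log_loss(y, p))  # ≈ 0.164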

SLIDE 26

Logistic Regression: Training

Optimization: minimize total log loss on the train set

$\min_{w, b} \; \sum_{n=1}^{N} \text{log loss}\big(y_n, \hat{p}(x_n, w, b)\big)$

Algorithm: gradient descent
Avoid overfitting: use an L2 or L1 penalty on the weights
Much more in depth in the next class
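A minimal gradient-descent sketch for this objective (the step size, iteration count, and use of the mean gradient are my assumptions; a real implementation would add the regularization penalty and a convergence check):

import numpy as np

def fit_logistic_regression(x_NF, y_N, step_size=0.1, n_iters=1000):
    """Minimize log loss over (w, b) with batch gradient descent."""
    N, F = x_NF.shape
    w_F, b = np.zeros(F), 0.0
    for _ in range(n_iters):
        p_N = 1.0 / (1.0 + np.exp(-(x_NF @ w_F + b)))  # current p(y=1 | x)
        err_N = p_N - y_N                  # gradient of log loss wrt the score
        w_F -= step_size * (x_NF.T @ err_N) / N   # mean gradient wrt weights
        b -= step_size * err_N.mean()             # mean gradient wrt bias
    return w_F, b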

SLIDE 27–28

Visualizing predicted probas for Logistic Regression

[Figures: predicted probability $p(y = 1 \mid x)$ over a 2D feature plane for trained logistic regression models; the probability varies smoothly and the 0.5 decision boundary is linear.]
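Plots like these can be reproduced by evaluating predict_proba on a dense 2D grid. A self-contained sketch (the synthetic data, grid range, and colormap are assumptions):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Synthetic 2-feature training data (assumed)
rng = np.random.default_rng(0)
x_NF = rng.normal(size=(200, 2))
y_N = (x_NF[:, 0] + x_NF[:, 1] > 0).astype(int)
model = LogisticRegression().fit(x_NF, y_N)

# Evaluate p(y=1 | x) over a dense grid covering the feature plane
G = 200
g1, g2 = np.meshgrid(np.linspace(-3, 3, G), np.linspace(-3, 3, G))
grid_MF = np.column_stack([g1.ravel(), g2.ravel()])
proba_GG = model.predict_proba(grid_MF)[:, 1].reshape(G, G)

plt.contourf(g1, g2, proba_GG, levels=20, cmap="RdBu_r")
plt.colorbar(label="p(y=1 | x)")
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()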

SLIDE 29

Nearest Neighbor Classifier

Parameters: none

Prediction:
  • find the “nearest” training vector to the given input x
  • predict the y value of this neighbor

Training: none needed (use training data as a lookup table)

SLIDE 30

K Nearest Neighbor Classifier

Parameters:
  K : number of neighbors

Prediction:
  • find the K “nearest” training vectors to input x
  • predict: vote for the most common y in the neighborhood
  • predict_proba: report the fraction of each label among the K neighbors

Training: none needed (use training data as a lookup table)
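A sketch with scikit-learn's implementation (the toy data and K=3 are illustrative assumptions):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

x_NF = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                 [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y_N = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn.fit(x_NF, y_N)                         # just stores the training data
print(knn.predict([[0.5, 0.5]]))           # 3 nearest all have y=0 -> [0]
print(knn.predict_proba([[1.8, 1.8]]))     # 2 of 3 neighbors have y=0 -> [[0.667 0.333]]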

SLIDE 31–32

Visualizing predicted probas for K-Nearest Neighbors

[Figures: predicted probability $p(y = 1 \mid x)$ over a 2D feature plane for K-nearest-neighbor classifiers; the predicted probabilities are piecewise constant across the plane.]

SLIDE 33

Summary of Methods

Method                | Function class flexibility | Hyperparameters to select (control complexity) | Interpret?
Logistic Regression   | Linear                     | L2/L1 penalty on weights                       | Inspect weights
K Nearest Neighbors   | Piecewise constant         | Number of neighbors; distance metric           | Inspect neighbors