

SLIDE 1

INTRODUCTION TO MACHINE LEARNING

Measuring model performance or error

SLIDE 2

Introduction to Machine Learning

Is our model any good?

  • Context of task
  • Accuracy
  • Computation time
  • Interpretability
  • 3 types of tasks:
      • Classification
      • Regression
      • Clustering
SLIDE 3

Classification

  • Accuracy and Error
  • System is right or wrong
  • Accuracy goes up when Error goes down

Accuracy = correctly classified instances / total number of classified instances

Error = 1 - Accuracy
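A minimal sketch of both formulas in plain Python (the labels are made up for illustration):

```python
# Accuracy = correct / total, Error = 1 - Accuracy (made-up labels)
truth     = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]

correct  = sum(t == p for t, p in zip(truth, predicted))
accuracy = correct / len(truth)
error    = 1 - accuracy

print(accuracy)  # 0.6
print(error)     # 0.4
```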

SLIDE 4

Example

  • Squares with 2 features: small/big and solid/dotted
  • Label: colored/not colored
  • Binary classification problem
SLIDE 5

Example

3 of the 5 squares are classified correctly: accuracy = 3/5 = 60%

[Figure: the five squares with their true and predicted labels]


SLIDE 7

Limits of accuracy

  • Classifying very rare heart disease
  • Classify all as negative (not sick)
  • Predict 99 correct (not sick) and miss 1
  • Accuracy: 99%
  • Bogus… you miss every positive case!
SLIDE 8

Confusion matrix

  • Rows and columns contain all available labels
  • Each cell contains the frequency of instances classified in a certain way

SLIDE 9

Confusion matrix

  • Binary classifier: positive or negative (1 or 0)

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

SLIDE 10

Confusion matrix

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

True Positives (TP): predicted P, truth p

  • Binary classifier: positive or negative (1 or 0)
SLIDE 11

Confusion matrix

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

True Negatives (TN): predicted N, truth n

  • Binary classifier: positive or negative (1 or 0)
SLIDE 12

Confusion matrix

  • Binary classifier: positive or negative (1 or 0)

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

False Negatives (FN): predicted N, truth p

SLIDE 13

  • Binary classifier: positive or negative (1 or 0)

Confusion matrix

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

False Positives (FP): predicted P, truth n

SLIDE 14

  • Accuracy
  • Precision
  • Recall

Ratios in the confusion matrix

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

SLIDE 15

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

Ratios in the confusion matrix

Precision = TP / (TP + FP)

  • Accuracy
  • Precision
  • Recall
SLIDE 17

            Prediction
              P    N
  Truth  p   TP   FN
         n   FP   TN

Ratios in the confusion matrix

Recall = TP / (TP + FN)

  • Accuracy
  • Precision
  • Recall
SLIDE 19

Back to the squares

            Prediction
              P    N
  Truth  p    1    1
         n    1    2

[Figure: the five squares with their true and predicted labels]

SLIDE 24

Back to the squares

  • Accuracy: (TP+TN)/(TP+FP+FN+TN) = (1+2)/(1+2+1+1) = 60%
  • Precision: TP/(TP+FP) = 1/(1+1) = 50%
  • Recall: TP/(TP+FN) = 1/(1+1) = 50%

            Prediction
              P    N
  Truth  p    1    1
         n    1    2

[Figure: the five squares with their true and predicted labels]
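Plugging the counts from the matrix into the three ratios (plain Python):

```python
# Squares example: TP=1, FN=1, FP=1, TN=2
TP, FN, FP, TN = 1, 1, 1, 2

accuracy  = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)

print(accuracy, precision, recall)  # 0.6 0.5 0.5
```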


SLIDE 28

Rare heart disease

  • Accuracy: 99/(99+1) = 99%
  • Recall: 0/1 = 0%
  • Precision: undefined — no positive predictions

            Prediction
              P    N
  Truth  p    0    1
         n    0   99
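The same numbers, checked in plain Python (an all-negative classifier on 100 patients, 1 of whom is sick):

```python
# All-negative classifier: TP=0, FN=1, FP=0, TN=99
TP, FN, FP, TN = 0, 1, 0, 99

accuracy = (TP + TN) / (TP + FP + FN + TN)   # looks great...
recall   = TP / (TP + FN)                    # ...but misses every positive case
# precision = TP / (TP + FP) would divide by zero: undefined with no positive predictions

print(accuracy, recall)  # 0.99 0.0
```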

SLIDE 29

Regression: RMSE

  • Root Mean Squared Error (RMSE)
  • Square root of the mean squared difference between estimates and true values
[Figure: scatter plot of data with fitted regression line, axes X1 and X2]
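With made-up true values and estimates, RMSE is a one-liner:

```python
import math

# Hypothetical true values and model estimates
y_true = [7.0, 8.5, 9.0, 10.5, 12.0]
y_pred = [7.5, 8.0, 9.5, 10.0, 11.0]

# Root of the mean of the squared errors
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
print(rmse)
```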

SLIDE 30

Clustering

  • No label information
  • Need distance metric between points
SLIDE 31

Clustering

  • Performance measure consists of 2 elements
  • Similarity within each cluster
  • Similarity between clusters

SLIDE 32

[Figure: clustered points in the X1-X2 plane]

  • Within cluster similarity
  • Within sum of squares (WSS)
  • Diameter
  • Minimize
SLIDE 33

[Figure: clustered points in the X1-X2 plane]

  • Between cluster similarity
  • Between cluster sum of squares (BSS)
  • Intercluster distance
  • Maximize
SLIDE 34

Dunn’s index

[Figure: clustered points in the X1-X2 plane]

Dunn’s index = minimal intercluster distance / maximal diameter
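With made-up 2-D points, Dunn’s index can be computed directly (a sketch for exactly two clusters; the points and cluster assignment are invented for illustration):

```python
import math
from itertools import combinations

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Two hypothetical clusters of 2-D points
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.0)],
]

# Diameter: largest distance within a cluster (we want this small)
diameters = [max(dist(a, b) for a, b in combinations(c, 2)) for c in clusters]

# Intercluster distance: smallest distance between the two clusters (we want this large)
inter = min(dist(a, b) for a in clusters[0] for b in clusters[1])

dunn = inter / max(diameters)   # higher is better
print(dunn)
```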

SLIDE 35

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

SLIDE 36

INTRODUCTION TO MACHINE LEARNING

Training set and test set

SLIDE 37

Machine learning vs. statistics

  • Predictive power vs. descriptive power
  • Supervised learning: model must predict unseen observations
  • Classical statistics: model must fit the data, explain or describe it
SLIDE 38

Predictive model

  • Training: not on the complete dataset, but on a training set
  • Test set to evaluate the performance of the model
  • Sets are disjoint: NO OVERLAP
  • Model tested on unseen observations → Generalization!
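A minimal sketch of such a split in plain Python (the data is made up; a real workflow would typically use a library helper such as scikit-learn's `train_test_split`):

```python
import random

# 12 hypothetical labeled instances: (feature, label) stand-ins
data = [(i, i % 2) for i in range(12)]

random.seed(42)                 # reproducible shuffle
shuffled = data[:]
random.shuffle(shuffled)        # shuffle before splitting

cut = int(len(shuffled) * 0.75)          # roughly 3/1 train/test
train, test = shuffled[:cut], shuffled[cut:]

# Disjoint sets: no overlap
assert not set(train) & set(test)
print(len(train), len(test))  # 9 3
```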
SLIDE 39

Split the dataset

  • N instances (=observations): X
  • K features: F
  • Class labels: y

         f1      f2      …   fK      y
  x1     x1,1    x1,2    …   x1,K    y1
  x2     x2,1    x2,2    …   x2,K    y2
  …      …       …       …   …       …
  xr     xr,1    xr,2    …   xr,K    yr
  xr+1   xr+1,1  xr+1,2  …   xr+1,K  yr+1
  xr+2   xr+2,1  xr+2,2  …   xr+2,K  yr+2
  …      …       …       …   …       …
  xN     xN,1    xN,2    …   xN,K    yN

Training set: rows x1 … xr.  Test set: rows xr+1 … xN.


SLIDE 45

Split the dataset

         f1      f2      …   fK      y
  x1     x1,1    x1,2    …   x1,K    y1
  x2     x2,1    x2,2    …   x2,K    y2
  …      …       …       …   …       …
  xr     xr,1    xr,2    …   xr,K    yr
  xr+1   xr+1,1  xr+1,2  …   xr+1,K  yr+1
  xr+2   xr+2,1  xr+2,2  …   xr+2,K  yr+2
  …      …       …       …   …       …
  xN     xN,1    xN,2    …   xN,K    yN

Use the model built on the training set (x1 … xr) to predict ŷ for the test set (xr+1 … xN), then compare ŷ with the real y.

SLIDE 46

When to use training/test set?

  • Supervised learning
  • Not for unsupervised (clustering)
  • Data not labeled
SLIDE 47

Predictive power of model

Training set → train model → test model on test set → performance measure → predictive power → use model

SLIDE 48

How to split the sets?

  • Which observations go where?
  • Training set larger than the test set
  • Typically about 3/1
  • Quite arbitrary
  • Generally: more data = better model
  • Test set not too small
SLIDE 49

Distribution of the sets

  • Classification
  • classes must have similar distributions
  • avoid a class not being available in a set
  • Classification & regression
  • shuffle dataset before splitting
SLIDE 50

Effect of sampling

  • Sampling can affect performance measures
  • Add robustness to these measures: cross-validation
  • Idea: sample multiple times, with different separations
SLIDE 51

Cross-validation

[Diagram: dataset split into 4 folds; each fold serves once as the test set, the rest as training set]

  • E.g.: 4-fold cross-validation

SLIDE 54

Cross-validation

  • E.g.: 4-fold cross-validation

  • Aggregate the results for a robust performance measure

SLIDE 55

n-fold cross-validation

  • Fold test set over dataset n times
  • Each test set is 1/n size of total dataset
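The folding can be sketched in plain Python (index lists stand in for real instances; model training is elided):

```python
# 4-fold cross-validation on 20 hypothetical instances
data = list(range(20))
n = 4
fold = len(data) // n          # each test set is 1/n of the data

splits = []
for k in range(n):
    test  = data[k * fold:(k + 1) * fold]
    train = data[:k * fold] + data[(k + 1) * fold:]
    splits.append((train, test))
    # ... train a model on `train`, measure performance on `test` ...

# Every instance lands in exactly one test set; aggregate the n measures afterwards
assert sorted(x for _, t in splits for x in t) == data
```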
SLIDE 56

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

SLIDE 57

INTRODUCTION TO MACHINE LEARNING

Bias and Variance

SLIDE 58

What you’ve learned

  • Accuracy and other performance measures
  • Training and test set
SLIDE 59

Knitting it all together

  • Effect of splitting the dataset (train/test) on accuracy
  • Over- and underfitting
SLIDE 60

Introducing

BIAS VARIANCE

SLIDE 61

Bias and Variance

  • Main goal of supervised learning: prediction
  • Prediction error ~ reducible + irreducible error
SLIDE 62

Irreducible vs. reducible error

  • Irreducible: noise — don’t minimize
  • Reducible: error due to unfit model — minimize
  • Reducible error is split into bias and variance
SLIDE 63

Bias

  • Error due to bias: wrong assumptions
  • Difference between predictions and truth, using models trained by a specific learning algorithm

SLIDE 67

Example

  • Quadratic data
  • Assumption: data is linear → use linear regression
  • Error due to bias is high: more restrictions on the model

SLIDE 68

Bias

  • Complexity of model
  • More restrictions lead to high bias
SLIDE 69

Variance

  • Error due to variance: error due to the sampling of the training set
  • Model with high variance fits training set closely
SLIDE 70

Example

  • Quadratic data
  • Few restrictions: fit a polynomial perfectly through the training set
  • If you change the training set, the model will change completely
  • High variance: generalizes badly to the test set
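A toy contrast in plain Python (made-up points; a 1-nearest-neighbour model stands in for a low-restriction, high-variance learner, and a constant mean prediction for a high-bias one):

```python
# Quadratic data y = x^2 (made-up points)
train = [(x, x * x) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
test  = [(x, x * x) for x in [-1.5, -0.5, 0.5, 1.5]]

# High bias: always predict the training mean (a very restricted model)
mean_y = sum(y for _, y in train) / len(train)
def bias_model(x):
    return mean_y

# High variance: 1-nearest neighbour memorizes the training set
def nn_model(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(nn_model, train))   # 0.0: fits the training set perfectly
print(mse(nn_model, test))    # > 0: generalizes worse than it trains
print(mse(bias_model, train), mse(bias_model, test))
```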

SLIDE 71

Bias-variance tradeoff

Low variance, high bias ↔ low bias, high variance

SLIDE 72

Overfitting

  • Accuracy will depend on the dataset split (train/test)
  • A high-variance model depends heavily on the split
  • Overfitting = model fits the training set a lot better than the test set
  • Too specific
SLIDE 73

Underfitting

  • Restricting your model too much
  • High bias
  • Too general
SLIDE 74

Example - spam or not?

Truth (the real rule):

  A lot of capital letters?
    no  → no spam
    yes → A lot of exclamation marks?
            no  → no spam
            yes → spam

[Figure: training emails plotted by capital letters vs. exclamation marks]

Exception in the training set: one email with 50 capital letters and 30 exclamation marks is no spam.

SLIDE 75

Example - spam or not?

Overfit model (too specific!):

  A lot of capital letters?
    no  → no spam
    yes → A lot of exclamation marks?
            no  → no spam
            yes → 50 capital letters?
                    no  → spam
                    yes → 30 exclamation marks?
                            no  → spam
                            yes → no spam

[Figure: training emails plotted by capital letters vs. exclamation marks]

The extra branches exist only to memorize the single training-set exception.

SLIDE 76

Example - spam or not?

Underfit model (too general!):

  More than 10 capital letters?
    no  → no spam
    yes → spam

[Figure: training emails plotted by capital letters vs. exclamation marks]

SLIDE 77

INTRODUCTION TO MACHINE LEARNING

Let’s practice!