SLIDE 1

Experimental Setup, Multi-class vs. Multi-label classification, and Evaluation

CMSC 678 UMBC

SLIDE 2

Central Question: How Well Are We Doing?

The task: what kind of problem are you solving?

Classification:
  • Precision, Recall, F1
  • Accuracy
  • Log-loss
  • ROC-AUC

Regression:
  • (Root) Mean Square Error
  • Mean Absolute Error

Clustering:
  • Mutual Information
  • V-score
SLIDE 3

Central Question: How Well Are We Doing?

The task: what kind of problem are you solving?

Classification:
  • Precision, Recall, F1
  • Accuracy
  • Log-loss
  • ROC-AUC

Regression:
  • (Root) Mean Square Error
  • Mean Absolute Error

Clustering:
  • Mutual Information
  • V-score

This does not have to be the same thing as the loss function you optimize.
SLIDE 4

Outline

  • Experimental Design: Rule 1
  • Multi-class vs. Multi-label classification
  • Evaluation
    • Regression Metrics
    • Classification Metrics

SLIDE 5

Experimenting with Machine Learning Models

All your data, split into: Training Data | Dev Data | Test Data

SLIDE 6

Rule #1

SLIDE 7

Experimenting with Machine Learning Models

What is “correct?” What is working “well?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters
Test Data

SLIDE 8

Experimenting with Machine Learning Models

What is “correct?” What is working “well?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting
Test Data

SLIDE 9

Experimenting with Machine Learning Models

What is “correct?” What is working “well?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting
Test Data: perform final evaluation on test, using the hyperparameters that optimized dev performance, and retraining the model

SLIDE 10

Experimenting with Machine Learning Models

What is “correct?” What is working “well?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting
Test Data: perform final evaluation on test, using the hyperparameters that optimized dev performance, and retraining the model

Rule 1: DO NOT ITERATE ON THE TEST DATA
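
A minimal sketch of this protocol (the dataset, the logistic-regression model, and the hyperparameter grid are all made-up stand-ins used only to illustrate the train/dev/test discipline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data: 1000 examples, 5 features, binary labels
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, 1000)

# One split into train / dev / test (here 70% / 15% / 15%)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_C, best_dev_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:                           # illustrative hyperparameter grid
    model = LogisticRegression(C=C).fit(X_train, y_train)  # learn parameters on train
    dev_acc = accuracy_score(y_dev, model.predict(X_dev))  # evaluate on dev
    if dev_acc > best_dev_acc:
        best_C, best_dev_acc = C, dev_acc

# Rule 1: the test set is touched exactly once, after all tuning is done
final_model = LogisticRegression(C=best_C).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, final_model.predict(X_test)))
```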

SLIDE 11

On-board Exercise

Produce dev and test tables for a linear regression model with learned weights and set/fixed (non-learned) bias

SLIDE 12

Outline

  • Experimental Design: Rule 1
  • Multi-class vs. Multi-label classification
  • Evaluation
    • Regression Metrics
    • Classification Metrics

SLIDE 13

Multi-class Classification

Given input x, predict discrete label y

Multi-label Classification

SLIDE 14

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0,1} (or y ∈ {True, False}), then a binary classification task

Multi-label Classification

SLIDE 15

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0,1} (or y ∈ {True, False}), then a binary classification task
If y ∈ {0,1, …, K−1} (for finite K), then a multi-class classification task

Q: What are some examples of multi-class classification?

Multi-label Classification

SLIDE 16

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0,1} (or y ∈ {True, False}), then a binary classification task
If y ∈ {0,1, …, K−1} (for finite K), then a multi-class classification task

Q: What are some examples of multi-class classification?
A: Many possibilities. See A2, Q{1,2,4-7}

Multi-label Classification

SLIDE 17

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0,1} (or y ∈ {True, False}), then a binary classification task
If y ∈ {0,1, …, K−1} (for finite K), then a multi-class classification task

Multi-label Classification

Single output vs. multi-output

If multiple y_m are predicted, then a multi-label classification task

SLIDE 18

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0,1} (or y ∈ {True, False}), then a binary classification task
If y ∈ {0,1, …, K−1} (for finite K), then a multi-class classification task

Multi-label Classification

Single output vs. multi-output

Given input x, predict multiple discrete labels y = (y_1, …, y_M)

If multiple y_m are predicted, then a multi-label classification task

SLIDE 19

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0,1} (or y ∈ {True, False}), then a binary classification task
If y ∈ {0,1, …, K−1} (for finite K), then a multi-class classification task

Multi-label Classification

Single output vs. multi-output

Given input x, predict multiple discrete labels y = (y_1, …, y_M)

If multiple y_m are predicted, then a multi-label classification task
Each y_m could be binary or multi-class

SLIDE 20

Multi-Label Classification…

  • Will not be a primary focus of this course
  • Many of the single-output classification methods apply to multi-label classification
  • Predicting “in the wild” can be trickier
  • Evaluation can be trickier

SLIDE 21

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

SLIDE 22

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

Loss function may (or may not) need to be extended & the model structure may need to change (big or small)

SLIDE 23

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

Loss function may (or may not) need to be extended & the model structure may need to change (big or small).

Common change: instead of a single weight vector w, keep a weight vector w^(c) for each class c. Compute class-specific scores, e.g., ŷ^(c) = w^(c)ᵀ x + b^(c)
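
A quick sketch of this common change (the shapes and values below are illustrative stand-ins, not from the slides):

```python
import numpy as np

num_classes, num_features = 3, 5
W = np.random.randn(num_classes, num_features)  # one weight vector w_c per class
b = np.random.randn(num_classes)                # one bias b_c per class

x = np.random.randn(num_features)               # a single input
scores = W @ x + b                              # class-specific scores w_c^T x + b_c
predicted_class = int(np.argmax(scores))        # predict the highest-scoring class
```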

SLIDE 24

Multi-class Option 1: Linear Regression/Perceptron

y = wᵀx + b

Output: if y > 0: class 1; else: class 2

SLIDE 25

Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View

Binary: y = wᵀx + b; output: if y > 0: class 1; else: class 2

Per-class view:
y_1 = w_1ᵀx + b_1
y_2 = w_2ᵀx + b_2

Output: i = argmax {y_1, y_2}; predict class i

The binary version is a special case.

SLIDE 26

Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View (alternative)

Binary: y = wᵀx + b; output: if y > 0: class 1; else: class 2

Per-class view (alternative), via concatenation:
y_1 = [w_1; w_2]ᵀ[x; 0] + b_1
y_2 = [w_1; w_2]ᵀ[0; x] + b_2

Output: i = argmax {y_1, y_2}; predict class i

Q: (For discussion) Why does this work?

SLIDE 27

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

With C classes: train C different binary classifiers δ_c(x)

δ_c(x) predicts 1 if x is likely class c, 0 otherwise

SLIDE 28

We’ve only developed binary classifiers so far…

Option 1: Develop a multi- class version Option 2: Build a one-vs- all (OvA) classifier Option 3: Build an all-vs- all (AvA) classifier

(there can be others)

With C classes: train C different binary classifiers δ_c(x)

δ_c(x) predicts 1 if x is likely class c, 0 otherwise

To test/predict a new instance z:
Get scores s_c = δ_c(z)
Output the max of these scores: ŷ = argmax_c s_c
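
A minimal OvA sketch (made-up data and a logistic-regression base classifier as an illustration; sklearn packages this pattern as sklearn.multiclass.OneVsRestClassifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(300, 5)          # stand-in features
y = np.random.randint(0, 3, 300)     # stand-in labels, C = 3 classes
classes = np.unique(y)

# Train C binary classifiers: class c vs. everything else
classifiers = [LogisticRegression().fit(X, (y == c).astype(int)) for c in classes]

def predict(z):
    # s_c = delta_c(z); output the class with the highest score
    scores = [clf.decision_function(z.reshape(1, -1))[0] for clf in classifiers]
    return classes[int(np.argmax(scores))]
```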

SLIDE 29

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

With C classes: train C(C−1)/2 (“C choose 2”) different binary classifiers δ_{c1,c2}(x)

SLIDE 30

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

With C classes: train C(C−1)/2 (“C choose 2”) different binary classifiers δ_{c1,c2}(x)

δ_{c1,c2}(x) predicts 1 if x is likely class c1, 0 otherwise (likely class c2)

SLIDE 31

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

With C classes: train C(C−1)/2 (“C choose 2”) different binary classifiers δ_{c1,c2}(x)

δ_{c1,c2}(x) predicts 1 if x is likely class c1, 0 otherwise (likely class c2)

To test/predict a new instance z:
Get scores or predictions s_{c1,c2} = δ_{c1,c2}(z)

SLIDE 32

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

With C classes: train C(C−1)/2 (“C choose 2”) different binary classifiers δ_{c1,c2}(x)

δ_{c1,c2}(x) predicts 1 if x is likely class c1, 0 otherwise (likely class c2)

To test/predict a new instance z:
Get scores or predictions s_{c1,c2} = δ_{c1,c2}(z)

Multiple options for final prediction:
(1) count # times a class c was predicted
(2) margin-based approach
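
A minimal AvA sketch using option 1, vote counting (made-up data and a logistic-regression base classifier as an illustration; sklearn packages this pattern as sklearn.multiclass.OneVsOneClassifier):

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(300, 5)
y = np.random.randint(0, 3, 300)
classes = np.unique(y)

# Train C(C-1)/2 binary classifiers, one per pair of classes
pair_classifiers = {}
for c1, c2 in combinations(classes, 2):
    mask = (y == c1) | (y == c2)
    clf = LogisticRegression().fit(X[mask], (y[mask] == c1).astype(int))
    pair_classifiers[(c1, c2)] = clf

def predict(z):
    votes = {c: 0 for c in classes}
    for (c1, c2), clf in pair_classifiers.items():
        winner = c1 if clf.predict(z.reshape(1, -1))[0] == 1 else c2
        votes[winner] += 1               # option 1: count per-class predictions
    return max(votes, key=votes.get)
```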

SLIDE 33

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

Q: (to discuss) Why might you want to use option 1 or options OvA/AvA? What are the benefits of OvA vs. AvA?

SLIDE 34

We’ve only developed binary classifiers so far…

  • Option 1: Develop a multi-class version
  • Option 2: Build a one-vs-all (OvA) classifier
  • Option 3: Build an all-vs-all (AvA) classifier

(there can be others)

Q: (to discuss) Why might you want to use option 1 or options OvA/AvA? What are the benefits of OvA vs. AvA?

What if you start with a balanced dataset, e.g., 100 instances per class?

SLIDE 35

Outline

  • Experimental Design: Rule 1
  • Multi-class vs. Multi-label classification
  • Evaluation
    • Regression Metrics
    • Classification Metrics

SLIDE 36

Regression Metrics

(Root) Mean Square Error

RMSE = √( (1/N) Σ_i (y_i − ŷ_i)² )

SLIDE 37

Regression Metrics

(Root) Mean Square Error; Mean Absolute Error

RMSE = √( (1/N) Σ_i (y_i − ŷ_i)² )

MAE = (1/N) Σ_i |y_i − ŷ_i|

SLIDE 38

Regression Metrics

(Root) Mean Square Error; Mean Absolute Error

RMSE = √( (1/N) Σ_i (y_i − ŷ_i)² )

MAE = (1/N) Σ_i |y_i − ŷ_i|

Q: How can these reward/punish predictions differently?

SLIDE 39

Regression Metrics

(Root) Mean Square Error; Mean Absolute Error

RMSE = √( (1/N) Σ_i (y_i − ŷ_i)² )

MAE = (1/N) Σ_i |y_i − ŷ_i|

Q: How can these reward/punish predictions differently?
A: RMSE punishes outlier predictions more harshly
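
A quick numeric check of this answer (made-up values, with one outlier prediction):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 3.1, 14.0])   # one large outlier error

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mae = np.mean(np.abs(y_true - y_pred))
print(rmse, mae)   # RMSE ≈ 5.00, MAE ≈ 2.58: the outlier dominates RMSE
```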

SLIDE 40

Outline

  • Experimental Design: Rule 1
  • Multi-class vs. Multi-label classification
  • Evaluation
    • Regression Metrics
    • Classification Metrics

SLIDE 41

Training Loss vs. Evaluation Score

In training, compute loss to update parameters. Sometimes the loss is a computational compromise: a surrogate loss.

The loss you use might not be as informative as you’d like. Binary classification: 90 of 100 training examples are +1, 10 of 100 are −1, so a classifier that always predicts +1 already gets 90% accuracy.
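
The 90/10 point as a two-line check (the majority-class baseline below is an illustration, not something from the slides):

```python
import numpy as np

y = np.array([+1] * 90 + [-1] * 10)   # the slide's label distribution
y_pred = np.ones_like(y)              # always predict +1, learn nothing
print((y == y_pred).mean())           # 0.9
```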

SLIDE 42

Some Classification Metrics

  • Accuracy
  • Precision
  • Recall
  • AUC (Area Under Curve)
  • F1
  • Confusion Matrix

SLIDE 43

Classification Evaluation: the 2-by-2 contingency table

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          |                     |
Not selected/not guessed  |                     |

SLIDE 44

Classification Evaluation: the 2-by-2 contingency table

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  |
Not selected/not guessed  |                     |

SLIDE 45

Classification Evaluation: the 2-by-2 contingency table

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  |                     |

SLIDE 46

Classification Evaluation: the 2-by-2 contingency table

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  | False Negative (FN) |

SLIDE 47

Classification Evaluation: the 2-by-2 contingency table

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  | False Negative (FN) | True Negative (TN)

SLIDE 48

Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  | False Negative (FN) | True Negative (TN)

Accuracy = (TP + TN) / (TP + FP + FN + TN)

SLIDE 49

Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct
Precision: % of selected items that are correct

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  | False Negative (FN) | True Negative (TN)

Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)

SLIDE 50

Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct
Precision: % of selected items that are correct
Recall: % of correct items that are selected

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  | False Negative (FN) | True Negative (TN)

Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

SLIDE 51

Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct
Precision: % of selected items that are correct
Recall: % of correct items that are selected

                          | Actually Correct    | Actually Incorrect
Selected/Guessed          | True Positive (TP)  | False Positive (FP)
Not selected/not guessed  | False Negative (FN) | True Negative (TN)

Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

Min: 0 ☹  Max: 1 😁
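
A minimal sketch computing all three metrics from the table (the four counts are made up):

```python
tp, fp, fn, tn = 40, 10, 5, 45   # illustrative counts

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)   # 0.85, 0.8, 0.888...
```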

SLIDE 52

Precision and Recall Present a Tradeoff

[Plot: recall (x-axis) vs. precision (y-axis), both from 0 to 1, with a point marking a model.]

Q: Where do you want your ideal model?

SLIDE 53

Precision and Recall Present a Tradeoff

[Plot: recall (x-axis) vs. precision (y-axis), both from 0 to 1, with points marking models.]

Q: Where do you want your ideal model?

Q: You have a model that always identifies correct instances. Where on this graph is it?

SLIDE 54

Precision and Recall Present a Tradeoff

[Plot: recall (x-axis) vs. precision (y-axis), both from 0 to 1, with points marking models.]

Q: You have a model that always identifies correct instances. Where on this graph is it?

Q: You have a model that only makes correct predictions. Where on this graph is it?

Q: Where do you want your ideal model?


SLIDE 56

Precision and Recall Present a Tradeoff

[Plot: recall (x-axis) vs. precision (y-axis), both from 0 to 1, with points marking models.]

Q: You have a model that always identifies correct instances. Where on this graph is it?

Q: You have a model that only makes correct predictions. Where on this graph is it?

Q: Where do you want your ideal model?

Idea: measure the tradeoff between precision and recall. Remember those hyperparameters: each point is a differently trained/tuned model.

SLIDE 57

Precision and Recall Present a Tradeoff

[Plot: recall (x-axis) vs. precision (y-axis), both from 0 to 1, with points marking models.]

Q: You have a model that always identifies correct instances. Where on this graph is it?

Q: You have a model that only makes correct predictions. Where on this graph is it?

Q: Where do you want your ideal model?

Idea: measure the tradeoff between precision and recall. Improve the overall model: push the curve toward the top right.

SLIDE 58

Measure this Tradeoff: Area Under the Curve (AUC)

AUC measures the area under this tradeoff curve.

[Plot: recall vs. precision curve, both axes from 0 to 1; improving the overall model pushes the curve toward the top right.]

Min AUC: 0 ☹  Max AUC: 1 😁

SLIDE 59

Measure this Tradeoff: Area Under the Curve (AUC)

AUC measures the area under this tradeoff curve.

1. Computing the curve: you need true labels & predicted labels with some score/confidence estimate. Threshold the scores, and for each threshold compute precision and recall.

[Plot: recall vs. precision curve, both axes from 0 to 1; improving the overall model pushes the curve toward the top right.]

Min AUC: 0 ☹  Max AUC: 1 😁

SLIDE 60

Measure this Tradeoff: Area Under the Curve (AUC)

AUC measures the area under this tradeoff curve.

1. Computing the curve: you need true labels & predicted labels with some score/confidence estimate. Threshold the scores, and for each threshold compute precision and recall.

2. Finding the area: implement via the trapezoidal rule (& others). In practice: use an external library like the sklearn.metrics module.

[Plot: recall vs. precision curve, both axes from 0 to 1; improving the overall model pushes the curve toward the top right.]

Min AUC: 0 ☹  Max AUC: 1 😁
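
A minimal sketch of both steps with the sklearn.metrics module (the labels and scores are made up):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])

# Step 1: threshold the scores to get (precision, recall) pairs
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
# Step 2: integrate the curve with the trapezoidal rule
print(auc(recall, precision))
```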

SLIDE 61

Measure A Slightly Different Tradeoff: ROC-AUC

Main variant: ROC-AUC. Same idea as before, but with some flipped metrics.

1. Computing the curve: you need true labels & predicted labels with some score/confidence estimate. Threshold the scores, and for each threshold compute the metrics (true positive rate and false positive rate).

2. Finding the area: implement via the trapezoidal rule (& others). In practice: use an external library like the sklearn.metrics module.

[Plot: false positive rate (x-axis) vs. true positive rate (y-axis), both from 0 to 1; improving the overall model pushes the curve toward the top left.]

Min ROC-AUC: 0.5 ☹  Max ROC-AUC: 1 😁
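
And the ROC-AUC variant on the same made-up data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])
print(roc_auc_score(y_true, y_scores))   # 0.5 would be chance-level ranking
```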

SLIDE 62

A combined measure: F

Weighted (harmonic) average of Precision & Recall:

F = 1 / ( α·(1/P) + (1 − α)·(1/R) )

SLIDE 63

A combined measure: F

Weighted (harmonic) average of Precision & Recall:

F = 1 / ( α·(1/P) + (1 − α)·(1/R) ) = ((β² + 1)·P·R) / (β²·P + R)

(algebra, not important)

SLIDE 64

A combined measure: F

Weighted (harmonic) average of Precision & Recall:

F = ((β² + 1)·P·R) / (β²·P + R)

Balanced F1 measure: β = 1

F1 = (2·P·R) / (P + R)
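
A two-line check of the balanced F1 formula (the precision and recall values are made up):

```python
precision, recall = 0.8, 0.6
f1 = 2 * precision * recall / (precision + recall)
print(f1)   # ≈ 0.686, between recall and precision but closer to the smaller one
```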

SLIDE 65

P/R/F in a Multi-class Setting: Micro- vs. Macro-Averaging

If we have more than one class, how do we combine multiple performance measures into one quantity?

Macroaveraging: compute performance for each class, then average.
Microaveraging: collect decisions for all classes, compute the contingency table, evaluate.

(Sec. 15.2.4)
SLIDE 66

P/R/F in a Multi-class Setting: Micro- vs. Macro-Averaging

Macroaveraging: compute performance for each class, then average.
Microaveraging: collect decisions for all classes, compute the contingency table, evaluate.

(Sec. 15.2.4)

microprecision = (Σ_c TP_c) / (Σ_c TP_c + Σ_c FP_c)

macroprecision = (1/C) Σ_c TP_c / (TP_c + FP_c) = (1/C) Σ_c precision_c

SLIDE 67

P/R/F in a Multi-class Setting: Micro- vs. Macro-Averaging

Macroaveraging: compute performance for each class, then average.
Microaveraging: collect decisions for all classes, compute the contingency table, evaluate.

(Sec. 15.2.4)

microprecision = (Σ_c TP_c) / (Σ_c TP_c + Σ_c FP_c)

macroprecision = (1/C) Σ_c TP_c / (TP_c + FP_c) = (1/C) Σ_c precision_c

when to prefer the macroaverage? when to prefer the microaverage?

SLIDE 68

Micro- vs. Macro-Averaging: Example

Class 1:
                  Truth: yes   Truth: no
Classifier: yes       10           10
Classifier: no        10          970

Class 2:
                  Truth: yes   Truth: no
Classifier: yes       90           10
Classifier: no        10          890

Micro Ave. Table (sums of the two):
                  Truth: yes   Truth: no
Classifier: yes      100           20
Classifier: no        20         1860

(Sec. 15.2.4)

Macroaveraged precision: (0.5 + 0.9)/2 = 0.7
Microaveraged precision: 100/120 ≈ 0.83
The microaveraged score is dominated by the score on frequent classes.
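
A quick check of this arithmetic from the two per-class tables (pure Python, no library needed):

```python
tp = [10, 90]   # true positives for class 1, class 2
fp = [10, 10]   # false positives for class 1, class 2

macro = sum(t / (t + f) for t, f in zip(tp, fp)) / len(tp)
micro = sum(tp) / (sum(tp) + sum(fp))
print(macro, micro)   # 0.7, 0.8333...
```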

SLIDE 69

Confusion Matrix: Generalizing the 2-by-2 contingency table

                 Correct Value
Guessed Value     #    #    #
                  #    #    #
                  #    #    #

SLIDE 70

Confusion Matrix: Generalizing the 2-by-2 contingency table

                 Correct Value
Guessed Value    80    9   11
                  7   86    7
                  2    8    9

Q: Is this a good result?

SLIDE 71

Confusion Matrix: Generalizing the 2-by-2 contingency table

                 Correct Value
Guessed Value    30   40   30
                 25   30   50
                 30   35   35

Q: Is this a good result?

SLIDE 72

Confusion Matrix: Generalizing the 2-by-2 contingency table

                 Correct Value
Guessed Value     7    3   90
                  4    8   88
                  3    7   90

Q: Is this a good result?
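
A minimal sketch of building such a matrix with sklearn.metrics (the labels are made up; note that sklearn's convention puts the correct/true class on the rows, the transpose of the layout above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2])
print(confusion_matrix(y_true, y_pred))   # rows: true class; columns: guessed class
```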

SLIDE 73

Some Classification Metrics

  • Accuracy
  • Precision
  • Recall
  • AUC (Area Under Curve)
  • F1
  • Confusion Matrix

SLIDE 74

Outline

  • Experimental Design: Rule 1
  • Multi-class vs. Multi-label classification
  • Evaluation
    • Regression Metrics
    • Classification Metrics