Introduction to Machine Learning: Classification and The Noisy Channel Model (PowerPoint PPT Presentation)


SLIDE 1

Introduction to Machine Learning: Classification and The Noisy Channel Model

CMSC 473/673 UMBC

Some slides adapted from 3SLP

SLIDE 2

Outline

Classification Why incorporate uncertainty Classification with Bayes Rule Example: Email Classifier Evaluation

SLIDE 3

Probabilistic Classification

π‘ž 𝑍 π‘Œ) ∝ π‘ž π‘Œ 𝑍) βˆ— π‘ž(𝑍)

π‘ž 𝑍 π‘Œ) = β„Ž(π‘Œ; 𝑍)

Discriminatively trained classifier Generatively trained classifier

Directly model the posterior Model the posterior with Bayes rule

SLIDE 4

Outline

Classification Why incorporate uncertainty Classification with Bayes Rule Example: Email Classifier Evaluation

SLIDE 5

Classification

POLITICS TERRORISM SPORTS TECH HEALTH FINANCE …

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.


SLIDE 7

Classification

POLITICS TERRORISM SPORTS TECH HEALTH FINANCE …

Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport.

Source: http://www.nytimes.com/2016/09/20/nyregion/cellphone-alerts-used-in-search-of-manhattan-bombing-suspect.html


SLIDE 9

Classify with Uncertainty

Use probabilities

SLIDE 10

Classify with Uncertainty

Use probabilities*

*There are non-probabilistic ways to handle uncertainty… but probabilities sure are handy!

SLIDE 11

Classification

POLITICS .05 TERRORISM .48 SPORTS .0001 TECH .39 HEALTH .0001 FINANCE .0002 …

Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport.

Source: http://www.nytimes.com/2016/09/20/nyregion/cellphone-alerts-used-in-search-of-manhattan-bombing-suspect.html

SLIDE 12

Text Classification

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

SLIDE 13

Text Classification

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

Input:

a document
a fixed set of classes C = {c1, c2, …, cJ}

Output: a predicted class c from C

SLIDE 14

Text Classification

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

Input:

a document (a β€œlinguistic blob”)
a fixed set of classes C = {c1, c2, …, cJ}

Output: a predicted class c from C

SLIDE 15

Text Classification: Hand-coded Rules?

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

Rules based on combinations of words or other features

spam: black-list-address OR (β€œdollars” AND β€œhave been selected”)

Accuracy can be high

If rules carefully refined by expert

Building and maintaining these rules is expensive Can humans faithfully assign uncertainty?

SLIDE 16

Text Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

Input:

a document d
a fixed set of classes C = {c1, c2, …, cJ}
a training set of m hand-labeled documents (d1, c1), …, (dm, cm)

Output:

a learned classifier Ξ³ that maps documents to classes

SLIDE 17

Text Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

Input:

a document d
a fixed set of classes C = {c1, c2, …, cJ}
a training set of m hand-labeled documents (d1, c1), …, (dm, cm)

Output:

a learned classifier Ξ³ that maps documents to classes

NaΓ―ve Bayes, logistic regression, support-vector machines, k-nearest neighbors, …


SLIDE 19

Multi-class Classification

Given input x, predict discrete label y

Multi-label Classification

SLIDE 20

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task

Multi-label Classification

SLIDE 21

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task. If y ∈ {0, 1, …, K βˆ’ 1} (for finite K), then a multi-class classification task.

Q: What are some examples of multi-class classification?

Multi-label Classification

SLIDE 22

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task. If y ∈ {0, 1, …, K βˆ’ 1} (for finite K), then a multi-class classification task.

Multi-label Classification

Single-output vs. multi-output

If multiple y_l are predicted, then a multi-label classification task

SLIDE 23

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task. If y ∈ {0, 1, …, K βˆ’ 1} (for finite K), then a multi-class classification task.

Multi-label Classification

Single-output vs. multi-output

Given input x, predict multiple discrete labels y = (y_1, …, y_L)

If multiple y_l are predicted, then a multi-label classification task

SLIDE 24

Multi-class Classification

Given input x, predict discrete label y

If y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task. If y ∈ {0, 1, …, K βˆ’ 1} (for finite K), then a multi-class classification task.

Multi-label Classification

Single-output vs. multi-output

Given input x, predict multiple discrete labels y = (y_1, …, y_L)

If multiple y_l are predicted, then a multi-label classification task. Each y_l could be binary or multi-class.

SLIDE 25

Outline

Classification Why incorporate uncertainty Classification with Bayes Rule Example: Email Classifier Evaluation

SLIDE 26

Probabilistic Text Classification

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

π‘ž 𝑍 π‘Œ) = π‘ž π‘Œ 𝑍) βˆ— π‘ž(𝑍) π‘ž(π‘Œ)

class

  • bserved

data

SLIDE 27

Probabilistic Text Classification

Assigning subject categories, topics, or genres Spam detection Authorship identification Age/gender identification Language Identification Sentiment analysis …

Y: class; X: observed data

p(X | Y): class-based likelihood (language model)
p(Y): prior probability of class
p(X): observation likelihood (averaged over all classes)

p(Y | X) = p(X | Y) βˆ— p(Y) / p(X)


SLIDE 29

Classification with Bayes Rule: argmax_Y p(Y | X)

SLIDE 30

Classification with Bayes Rule: argmax_Y p(X | Y) βˆ— p(Y) / p(X)

SLIDE 31

Classification with Bayes Rule: argmax_Y p(X | Y) βˆ— p(Y) / p(X)

p(X) is constant with respect to Y

SLIDE 32

Classification with Bayes Rule: argmax_Y p(X | Y) βˆ— p(Y)


Classification with Bayes Rule: argmax_Y [log p(X | Y) + log p(Y)]

SLIDE 36

Classification (labels) with Bayes Rule

argmax_Y [log p(X | Y) + log p(Y)]

log p(X | Y): how well does text X represent label Y?  log p(Y): how likely is label Y overall?

For β€œsimple” or β€œflat” labels:
* iterate through labels
* evaluate the score for each label, keeping only the best (n best)
* return the best (or n best) label and score
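For flat labels, that loop is a few lines of code. A minimal sketch, assuming `log_likelihood` and `log_prior` stand in for whatever trained models supply log p(X | Y) and log p(Y); the names and toy scores below are made up for illustration:

```python
import math

def classify(x, labels, log_likelihood, log_prior, n_best=1):
    # Score each label Y by log p(X=x | Y) + log p(Y), keep the n best.
    scored = [(log_likelihood(x, y) + log_prior[y], y) for y in labels]
    scored.sort(reverse=True)
    return scored[:n_best]

# Toy example with made-up scores:
log_prior = {"Primary": math.log(0.7), "Spam": math.log(0.3)}

def log_likelihood(x, y):
    # Pretend the Spam language model likes the word "donate".
    return math.log(0.9) if y == "Spam" and "donate" in x else math.log(0.1)

best = classify("Won't you please donate?", ["Primary", "Spam"],
                log_likelihood, log_prior)
# best[0] is the (score, label) pair with the highest posterior score
```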

SLIDE 37

Classification/Decoding with Bayes Rule

argmax_Y [log p(X | Y) + log p(Y)]

How well does the text (complex input) X represent the text (complex output) Y? How likely is the text (complex output) Y overall?

* iterate through labels
* evaluate the score for each label, keeping only the best (n best)
* return the best (or n best) label and score

If Y is a string (or some complex structure), this can be complicated

SLIDE 38

Terminology: Noisy Channel Model

For us, it means using Bayes rule to build a probabilistic classifier/system


SLIDE 39

Noisy Channel Model

SLIDE 40

Noisy Channel Model

what I want to tell you: β€œsports”

SLIDE 41

Noisy Channel Model

what I want to tell you: β€œsports”
what you actually see: β€œThe Os lost again…”

SLIDE 42

Noisy Channel Model

what I want to tell you: β€œsports”
what you actually see: β€œThe Os lost again…”
Decode hypothesized intent: β€œsad stories”, β€œsports”

SLIDE 43

Noisy Channel Model

what I want to tell you: β€œsports”
what you actually see: β€œThe Os lost again…”
Decode hypothesized intent: β€œsad stories”, β€œsports”
Rerank: reweight according to what’s likely: β€œsports”

SLIDE 44

Noisy Channel

Machine translation Speech-to-text Spelling correction Text normalization Part-of-speech tagging Morphological analysis Image captioning …

π‘ž 𝑍 π‘Œ) = π‘ž π‘Œ 𝑍) βˆ— π‘ž(𝑍) π‘ž(π‘Œ)

possible (clean)

  • utput
  • bserved

(noisy) text translation/ decode model (clean) language model

  • bservation (noisy) likelihood

SLIDE 46

Language Model

Use any of the language modeling algorithms we’ve learned Unigram, bigram, trigram Add-Ξ», interpolation, backoff (Later: Maxent, RNNs, hierarchical Bayesian LMs, …)

SLIDE 47

Probabilistic Classification

π‘ž 𝑍 π‘Œ) ∝ π‘ž π‘Œ 𝑍) βˆ— π‘ž(𝑍)

π‘ž 𝑍 π‘Œ) = β„Ž(π‘Œ; 𝑍)

Discriminatively trained classifier Generatively trained classifier

Directly model the posterior Model the posterior with Bayes rule

Noisy Channel Model Decoding Discriminative training (e.g., maxent models: we’ll cover these soon)

SLIDE 48

Outline

Classification Why incorporate uncertainty Classification with Bayes Rule Example: Email Classifier Evaluation

SLIDE 49

Problem: Develop a Probabilistic Email Classifier

Input: an email (all text)
Output (Google categories): Primary, Social, Forums, Spam
Approach #1: using Bayes rule
Approach #2 (later classes): discriminatively trained

SLIDE 50

Problem: Develop a Probabilistic Email Classifier

Input: an email (all text)
Output (Google categories): Primary, Social, Forums, Spam
Approach #1: using Bayes rule
Approach #2 (later classes): discriminatively trained

Q: What type of classification problem is this?

SLIDE 51

Problem: Develop a Probabilistic Email Classifier

Input: an email (all text)
Output (Google categories): Primary, Social, Forums, Spam
Approach #1: using Bayes rule
Approach #2 (later classes): discriminatively trained

Q: What type of classification problem is this? A: multi-class (single label) classifier

SLIDE 52

Classify Using Bayes Rule: p(Y | X) ∝ p(X | Y) βˆ— p(Y)

SLIDE 53

Classify Using Bayes Rule: p(Y | X) ∝ p(X | Y) βˆ— p(Y)

Q: Why is p(Y | X) what we want to model?

SLIDE 54

Classify Using Bayes Rule: p(Y | X) ∝ p(X | Y) βˆ— p(Y)

Q: Why is p(Y | X) what we want to model? Q: To classify a document, do we need to find the normalizing constant?

SLIDE 55

Classify Using Bayes Rule: p(Y | X) ∝ p(X | Y) βˆ— p(Y)

Q: Why is p(Y | X) what we want to model? Q: To classify a document, do we need to find the normalizing constant? Q: If we can compute p(Y | X), up to a constant, how do we find the predicted label?

SLIDE 56

Classify Using Bayes Rule: p(Y | X) ∝ p(X | Y) βˆ— p(Y)

p(Primary | β€œWon’t you please donate?”) ∝ p(β€œWon’t you please donate?” | Primary) βˆ— p(Primary)

SLIDE 57

A Closer Look at p(X | Y)

This is a class-specific language model, e.g., p(β€œWon’t you please donate?” | Primary)

SLIDE 58

A Closer Look at p(X | Y)

This is a class-specific language model: p(β€œWon’t you please donate?” | Primary) is different from p(β€œWon’t you please donate?” | Social), which is different from p(β€œWon’t you please donate?” | Forums), …

SLIDE 59

A Closer Look at p(X | Y)

This is a class-specific language model. To learn p(Β· | Class), for each class Class:

Get a bunch of Class documents D_Class
Learn a new language model p_Class on just D_Class
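That recipe can be sketched with add-Ξ» unigram models (a minimal sketch, assuming whitespace tokenization; the function name and toy documents are illustrative, not from the slides):

```python
import math
from collections import Counter

def train_class_lms(labeled_docs, lam=1.0):
    # For each class, collect that class's documents D_Class and fit a
    # separate add-lambda unigram language model p_Class on just D_Class.
    counts = {}            # class -> Counter over words
    vocab = set()
    for doc, cls in labeled_docs:
        words = doc.lower().split()
        counts.setdefault(cls, Counter()).update(words)
        vocab.update(words)
    V = len(vocab)

    def log_p(doc, cls):
        c = counts[cls]
        total = sum(c.values())
        return sum(math.log((c[w] + lam) / (total + lam * V))
                   for w in doc.lower().split())
    return log_p

docs = [("won't you please donate", "Spam"),
        ("meeting notes attached", "Primary")]
log_p = train_class_lms(docs)
# log_p("please donate", "Spam") is higher than log_p("please donate", "Primary")
```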

SLIDE 60

Two Ways to Learn Class-specific Count-based Language Models

1. Create different count table(s) c_Class(…) for each Class

e.g., record separate trigram counts for Primary vs. Social vs. Forums vs. Spam
SLIDE 61

Two Ways to Learn Class-specific Count-based Language Models

1. Create different count table(s) c_Class(…) for each Class: e.g., record separate trigram counts for Primary vs. Social vs. Forums vs. Spam

OR

2. Add a dimension to your existing tables c(Class, …): e.g., record how often each trigram occurs within Primary vs. Social vs. Forums vs. Spam documents

SLIDE 62

Two Ways to Learn Class-specific Count-based Language Models

1. Create different count table(s) c_Class(…) for each Class: e.g., record separate trigram counts for Primary vs. Social vs. Forums vs. Spam

OR

2. Add a dimension to your existing tables c(Class, …): e.g., record how often each trigram occurs within Primary vs. Social vs. Forums vs. Spam documents

Q: Are these two conceptually the same?

SLIDE 63

Two Ways to Learn Class-specific Count-based Language Models

1. Create different count table(s) c_Class(…) for each Class: e.g., record separate trigram counts for Primary vs. Social vs. Forums vs. Spam

OR

2. Add a dimension to your existing tables c(Class, …): e.g., record how often each trigram occurs within Primary vs. Social vs. Forums vs. Spam documents

Q: Are these two conceptually the same?
Q: How might the option you choose influence implementation (or vice versa)?

SLIDE 64

Two Ways to Learn Class-specific Count-based Language Models

1. Create different count table(s) c_Class(…) for each Class: e.g., record separate trigram counts for Primary vs. Social vs. Forums vs. Spam

OR

2. Add a dimension to your existing tables c(Class, …): e.g., record how often each trigram occurs within Primary vs. Social vs. Forums vs. Spam documents

Q: Are these two conceptually the same?
Q: How might the option you choose influence implementation (or vice versa)?
Q: Will one approach always be better than the other?
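The two bookkeeping options can be sketched with toy bigram counts (an illustration only; the variable names are made up):

```python
from collections import Counter

# Option 1: a separate count table per class, c_Class(...)
per_class = {"Primary": Counter(), "Social": Counter(),
             "Forums": Counter(), "Spam": Counter()}
per_class["Spam"][("please", "donate")] += 1

# Option 2: one table with Class as an extra key dimension, c(Class, ...)
joint = Counter()
joint[("Spam", "please", "donate")] += 1

# Conceptually the same count, stored two different ways:
assert per_class["Spam"][("please", "donate")] == joint[("Spam", "please", "donate")]
```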
SLIDE 65

A Closer Look at p(Y)

This is the prior probability of each class, e.g., p(Primary). It answers the question: without knowing anything specific about a document, how likely is each class?

SLIDE 66

A Closer Look at p(Y)

This is the prior probability of each class, e.g., p(Primary). It answers the question: without knowing anything specific about a document, how likely is each class?

Q: What’s an easy way to estimate it?

SLIDE 67

A Closer Look at p(Y)

This is the prior probability of each class, e.g., p(Primary). It answers the question: without knowing anything specific about a document, how likely is each class?

Q: What’s an easy way to estimate it?
Q: Could we use our smoothing techniques?
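One easy estimate is the relative frequency of each class among the training labels, and the same add-Ξ» smoothing used for language models applies directly. A sketch (the function name and toy labels are illustrative):

```python
from collections import Counter

def estimate_prior(train_labels, classes, lam=0.0):
    # p(Y = c) is proportional to count(c) + lambda; lam = 0 gives
    # plain relative frequency, lam > 0 gives add-lambda smoothing.
    counts = Counter(train_labels)
    total = len(train_labels) + lam * len(classes)
    return {c: (counts[c] + lam) / total for c in classes}

classes = ["Primary", "Social", "Forums", "Spam"]
prior = estimate_prior(["Primary", "Primary", "Social", "Spam"], classes)
# Unsmoothed, a class never seen in training gets probability 0;
# with lam > 0 every class gets some mass.
smoothed = estimate_prior(["Primary", "Primary", "Social", "Spam"], classes, lam=1.0)
```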

SLIDE 68

Outline

Classification Why incorporate uncertainty Classification with Bayes Rule Example: Email Classifier Evaluation

SLIDE 69

Experimenting with Machine Learning Models

All your data is split into: Training Data, Dev Data, Test Data

SLIDE 70

Rule #1

SLIDE 71

Experimenting with Machine Learning Models

What is β€œcorrect?” What is working β€œwell?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters
Test Data

SLIDE 72

Experimenting with Machine Learning Models

What is β€œcorrect?” What is working β€œwell?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting
Test Data

SLIDE 73

Experimenting with Machine Learning Models

What is β€œcorrect?” What is working β€œwell?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting
Test Data: perform the final evaluation on test, using the hyperparameters that optimized dev performance and retraining the model

SLIDE 74

Experimenting with Machine Learning Models

What is β€œcorrect?” What is working β€œwell?”

Training Data: learn model parameters from the training set
Dev Data: set hyper-parameters; evaluate the learned model on dev with that hyperparameter setting
Test Data: perform the final evaluation on test, using the hyperparameters that optimized dev performance and retraining the model

Rule 1: DO NOT ITERATE ON THE TEST DATA
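That protocol can be sketched as a loop (a sketch only; `fit` and `evaluate` are placeholders for real training and scoring code):

```python
def run_experiment(train, dev, test, hyper_grid, fit, evaluate):
    # Pick the hyperparameter setting that scores best on dev...
    best_h, best_score = None, float("-inf")
    for h in hyper_grid:
        score = evaluate(fit(train, h), dev)
        if score > best_score:
            best_h, best_score = h, score
    # ...then retrain with that setting and touch the test data exactly once.
    final_model = fit(train, best_h)
    return best_h, evaluate(final_model, test)
```

The point of the structure is that `test` appears in exactly one call, after all hyperparameter decisions are frozen.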

SLIDE 75

Evaluation: the 2-by-2 contingency table

                          Actually Correct      Actually Incorrect
Selected/Guessed
Not selected/not guessed

SLIDE 76

Evaluation: the 2-by-2 contingency table

                          Actually Correct      Actually Incorrect
Selected/Guessed
Not selected/not guessed

Classes/Choices

SLIDE 77

Evaluation: the 2-by-2 contingency table

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)
Not selected/not guessed

SLIDE 78

Evaluation: the 2-by-2 contingency table

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)    False Positive (FP)
Not selected/not guessed

SLIDE 79

Evaluation: the 2-by-2 contingency table

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)    False Positive (FP)
Not selected/not guessed  False Negative (FN)

SLIDE 80

Evaluation: the 2-by-2 contingency table

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)    False Positive (FP)
Not selected/not guessed  False Negative (FN)   True Negative (TN)

SLIDE 81

Accuracy, Precision, and Recall

Accuracy: % of items correct

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)    False Positive (FP)
Not selected/not guessed  False Negative (FN)   True Negative (TN)

Accuracy = (TP + TN) / (TP + FP + FN + TN)

SLIDE 82

Accuracy, Precision, and Recall

Accuracy: % of items correct
Precision: % of selected items that are correct

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)    False Positive (FP)
Not selected/not guessed  False Negative (FN)   True Negative (TN)

Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + FP + FN + TN)

SLIDE 83

Accuracy, Precision, and Recall

Accuracy: % of items correct
Precision: % of selected items that are correct
Recall: % of correct items that are selected

                          Actually Correct      Actually Incorrect
Selected/Guessed          True Positive (TP)    False Positive (FP)
Not selected/not guessed  False Negative (FN)   True Negative (TN)

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + FP + FN + TN)
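The three definitions translate directly into code (a sketch, using the contingency-table counts as inputs):

```python
def accuracy(tp, fp, fn, tn):
    # % of items correct
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # % of selected items that are correct
    return tp / (tp + fp)

def recall(tp, fn):
    # % of correct items that are selected
    return tp / (tp + fn)
```

For example, with TP = 10, FP = 10, FN = 10, TN = 970, precision and recall are both 0.5 while accuracy is 0.98, which is why accuracy alone can be misleading when one class dominates.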

SLIDE 84

A combined measure: F

Weighted (harmonic) average of Precision & Recall:

F = 1 / (Ξ± Β· (1/P) + (1 βˆ’ Ξ±) Β· (1/R))

SLIDE 85

A combined measure: F

Weighted (harmonic) average of Precision & Recall:

F = 1 / (Ξ± Β· (1/P) + (1 βˆ’ Ξ±) Β· (1/R)) = (Ξ²Β² + 1) Β· P Β· R / (Ξ²Β² Β· P + R)

algebra (not important)

SLIDE 86

A combined measure: F

Weighted (harmonic) average of Precision & Recall. Balanced F1 measure: Ξ² = 1

F = (Ξ²Β² + 1) Β· P Β· R / (Ξ²Β² Β· P + R)

F1 = 2 Β· P Β· R / (P + R)
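In code, with Ξ² = 1 as the balanced default (a sketch; the zero-guard is an added convention for the degenerate P = R = 0 case):

```python
def f_beta(p, r, beta=1.0):
    # Weighted harmonic mean of precision p and recall r;
    # beta = 1 gives the balanced F1 = 2PR / (P + R).
    if p == 0.0 and r == 0.0:
        return 0.0
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)
```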

SLIDE 87

Micro- vs. Macro-Averaging

If we have more than one class, how do we combine multiple performance measures into one quantity?

Macroaveraging: compute performance for each class, then average.
Microaveraging: collect decisions for all classes, compute the contingency table, evaluate.

(Sec. 15.2.4)
SLIDE 88

Micro- vs. Macro-Averaging: Example

Class 1:
                 Truth: yes   Truth: no
Classifier: yes      10           10
Classifier: no       10          970

Class 2:
                 Truth: yes   Truth: no
Classifier: yes      90           10
Classifier: no       10          890

Micro Ave. Table:
                 Truth: yes   Truth: no
Classifier: yes     100           20
Classifier: no       20         1860

(Sec. 15.2.4)

Macroaveraged precision: (0.5 + 0.9) / 2 = 0.7
Microaveraged precision: 100/120 β‰ˆ 0.83
The microaveraged score is dominated by the score on common classes
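A sketch that reproduces those numbers from per-class (TP, FP) counts (the function names are illustrative):

```python
def macro_precision(class_tables):
    # Average the per-class precisions.
    return sum(tp / (tp + fp) for tp, fp in class_tables) / len(class_tables)

def micro_precision(class_tables):
    # Pool the counts across all classes, then compute one precision.
    tp = sum(t for t, _ in class_tables)
    fp = sum(f for _, f in class_tables)
    return tp / (tp + fp)

tables = [(10, 10), (90, 10)]  # (TP, FP) for Class 1 and Class 2 above
```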

SLIDE 89

Outline

Classification Why incorporate uncertainty Classification with Bayes Rule Example: Email Classifier Evaluation