SLIDE 1

Statistical Natural Language Processing

Classification

Çağrı Çöltekin

University of Tübingen
Seminar für Sprachwissenschaft

Summer Semester 2019

SLIDE 2


When/why do we do classification

  • Is a given email spam or not?
  • What is the gender of the author of a document?
  • Is a product review positive or negative?
  • Who is the author of a document?
  • What is the subject of an article?

As opposed to regression, the outcome is a ‘category’.

SLIDE 4


The task


  • Given a set of training data with (categorical) labels
  • Train a model to predict future data points from the same distribution

[Figure: training points labeled + and − in the x1–x2 plane, with a new point marked ‘?’ to be classified]

SLIDE 6


Outline

  • Perceptron
  • Logistic regression
  • Naive Bayes
  • Multi-class strategies for binary classifiers
  • Evaluation metrics for classification
  • Brief notes on what we skipped

SLIDE 7


The perceptron


[Figure: a single unit with inputs x1, x2, …, xn (plus x0 = 1), weights w0, w1, …, wn, and output y]

y = f(∑i wixi), where f(x) = +1 if ∑i wixi > 0, and −1 otherwise

Similar to the intercept in linear models, an additional input x0, which is always set to one, is often used (called bias in the ANN literature)

SLIDE 9


The perceptron: in plain words


  • Sum all inputs xi, weighted by the corresponding weights wi
  • Classify the input using a threshold function:
    positive if the sum is larger than 0, negative otherwise

SLIDE 10


Learning with the perceptron

  • We do not update the parameters if the classification is correct
  • For misclassified examples, we try to minimize

    E(w) = −∑i w·xiyi

    where i ranges over all misclassified examples
  • The perceptron algorithm updates the weights such that

    w ← w − η∇E(w)

    which, for a single misclassified example, amounts to w ← w + ηxiyi. η is the learning rate.

SLIDE 11


The perceptron algorithm

  • The perceptron algorithm can be run
    online: update weights for a single misclassified example at a time
    batch: update weights for all misclassified examples at once
  • The perceptron algorithm converges to the global minimum if the classes are linearly separable
  • If the classes are not linearly separable, the perceptron algorithm will not stop
  • We do not know whether the classes are linearly separable or not before the algorithm converges
  • In practice, one can set a stopping condition (as in the sketch below), such as
    – maximum number of iterations/updates
    – number of misclassified examples
    – number of iterations without improvement

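A minimal NumPy sketch of the online perceptron described above, with a maximum-epoch stopping condition; the function and variable names are our own, not from the slides:

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Online perceptron; X is (n_samples, n_features), labels y are +1/-1."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 (bias input)
    w = np.random.randn(X.shape[1]) * 0.01     # random initialization
    for epoch in range(max_epochs):            # stopping condition: max epochs
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:             # misclassified (or on the boundary)
                w += eta * yi * xi             # w <- w + eta * yi * xi
                errors += 1
        if errors == 0:                        # converged: classes were separable
            break
    return w

def predict(w, X):
    X = np.hstack([np.ones((len(X), 1)), X])
    return np.where(X @ w > 0, 1, -1)          # threshold at zero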

SLIDE 12

[Figure: six snapshots, w(0) through w(5), showing the weight vector w and its decision boundary over the + and − training points after successive updates]

  • 1. Randomly initialize w; the decision boundary is orthogonal to w
  • 2. Pick a misclassified example xi
  • 3. Set w ← w + yixi; go to step 2 until convergence

SLIDE 18


Perceptron: a bit of history

  • The perceptron was developed in the late 1950s and early 1960s (Rosenblatt 1958)
  • It caused excitement in many fields, including computer science, artificial intelligence, and cognitive science
  • The excitement (and funding) died away in the early 1970s (after the criticism by Minsky and Papert 1969)
  • The main issue was that the perceptron algorithm cannot handle problems that are not linearly separable

SLIDE 19


Logistic regression

  • Logistic regression is a classification method
  • In logistic regression, we fit a model that predicts P(y | x)
  • Logistic regression is an extension of linear regression
    – it is a member of the family of models called generalized linear models
  • It is typically formulated for binary classification, but it has a natural extension to multiple classes
  • Multi-class logistic regression is often called a maximum-entropy model (or max-ent) in the NLP literature

SLIDE 20

Data for logistic regression

an example with a single predictor

[Figure: binary outcomes y (0 or 1) plotted against a single predictor x]

  • Why not just use linear regression?
  • What is P(y | x = 2)?
  • Is RMS error appropriate?

SLIDE 22


Fixing the outcome: transforming the output variable

  • The prediction we are interested in is ŷ = P(y = 1 | x)
  • We transform it with the logit function:

    logit(ŷ) = log(ŷ / (1 − ŷ)) = w0 + w1x

  • ŷ / (1 − ŷ) (the odds) is bounded between 0 and ∞
  • log(ŷ / (1 − ŷ)) (the log odds) is bounded between −∞ and ∞
  • We can estimate logit(ŷ) with regression, and transform back with the inverse of logit():

    ŷ = e^(w0 + w1x) / (1 + e^(w0 + w1x)) = 1 / (1 + e^(−w0 − w1x))

    which is called the logistic (sigmoid) function (a small round-trip sketch follows)

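A tiny sketch of the transform pair, assuming NumPy; it just checks that logistic() undoes logit():

import numpy as np

def logit(p):                      # log odds: maps (0, 1) to (-inf, +inf)
    return np.log(p / (1 - p))

def logistic(z):                   # the inverse of logit (sigmoid)
    return 1 / (1 + np.exp(-z))

p = 0.8
assert np.isclose(logistic(logit(p)), p)   # the round trip recovers p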

SLIDE 23


Logistic function

[Figure: the logistic curve, rising from 0 to 1 as x goes from −6 to 6]

logistic(x) = 1 / (1 + e^(−x))

SLIDE 24


How to fit a logistic regression model

with maximum-likelihood estimation

P(y = 1 | x) = p = 1 / (1 + e^(−wx))        P(y = 0 | x) = 1 − p = e^(−wx) / (1 + e^(−wx))

The likelihood of the training set is

L(w) = ∏i p^yi (1 − p)^(1−yi)

In practice, we maximize the log likelihood, or minimize the negative log likelihood:

−log L(w) = −∑i [ yi log p + (1 − yi) log(1 − p) ]

SLIDE 25


How to fit a logistic regression model (2)

  • Bad news: there is no analytic solution
  • Good news: the (negative) log likelihood is a convex function
  • We can use iterative methods, such as gradient descent, to find parameters that maximize the (log) likelihood
  • Using gradient descent, we repeat

    w ← w − η∇J(w)

    until convergence; η is the learning rate (a sketch follows below)

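A minimal gradient-descent sketch for the negative log likelihood above, assuming NumPy; it uses the standard gradient X^T(p − y), and the names are ours:

import numpy as np

def fit_logistic_regression(X, y, eta=0.1, n_iter=1000):
    """Gradient descent on -log L(w); labels y are 0/1."""
    X = np.hstack([np.ones((len(X), 1)), X])    # intercept term w0
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(X @ w)))          # P(y = 1 | x) for every row
        grad = X.T @ (p - y) / len(y)           # gradient of the (mean) NLL
        w -= eta * grad                         # w <- w - eta * grad
    return w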

SLIDE 26


Example: logistic regression

back to the example with a single predictor

[Figure: the data with the fitted logistic curve]

p = 1 / (1 + e^(0.33 + 2.40x))

The fitted decision boundary: 2.40x + 0.33 = 0

SLIDE 27


Another example

two predictors

[Figure: two classes in the x1–x2 plane with the fitted decision boundary]

p = 1 / (1 + e^(−0.1 − 2.53x1 + 2.58x2))

The fitted decision boundary: 0.1 + 2.53x1 − 2.58x2 = 0

SLIDE 28


Multi-class logistic regression

  • Generalizing logistic regression to more than two classes is straightforward
  • We estimate

    P(Ck | x) = e^(wk·x) / ∑j e^(wj·x)

    where Ck is the kth class and j iterates over all classes
  • This function is called the softmax function, and it is used frequently in neural network models as well (see the sketch below)
  • This model is also known as a log-linear model, maximum entropy model, or Boltzmann machine

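A small sketch of the softmax function, assuming NumPy; subtracting the maximum score is a common trick to avoid overflow, not something the slides mention:

import numpy as np

def softmax(scores):
    """P(C_k | x) from the per-class scores w_k . x."""
    z = scores - scores.max()       # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical w_k . x for three classes
print(softmax(scores))              # probabilities; they sum to 1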

SLIDE 29

Naive Bayes classifier

  • The Naive Bayes classifier is a well-known, simple classifier
  • It has been found to be effective on a number of tasks, primarily document classification
  • It was popularized by practical spam detection applications
  • The naive part comes from a strong independence assumption
  • The Bayes part comes from the use of Bayes’ formula for inverting conditional probabilities
  • However, learning is (typically) ‘not really’ Bayesian

SLIDE 31


Naive Bayes: estimation

  • Given a set of features x, we want to know the class y of the object we want to classify
  • At prediction time, we pick the class ŷ:

    ŷ = argmax_y P(y | x)

  • Instead of directly estimating the conditional probability, we invert it using Bayes’ formula:

    ŷ = argmax_y P(x | y)P(y) / P(x) = argmax_y P(x | y)P(y)

  • Now the task becomes estimating P(x | y) and P(y)

SLIDE 32


Naive Bayes: estimation (cont.)

  • The class distribution, P(y), is estimated using MLE on the training set
  • With many features, x = (x1, x2, …, xn), P(x | y) is difficult to estimate
  • The Naive Bayes estimator makes a conditional independence assumption: given the class, we assume that the features are independent of each other:

    P(x | y) = P(x1, x2, …, xn | y) = ∏i P(xi | y)

SLIDE 33


Naive Bayes: estimation (cont.)

  • The probability distributions P(xi | y) and P(y) are typically estimated using MLE (count and divide; see the sketch below)
  • A smoothing technique may be used for unknown features (e.g., words)
  • Note that P(xi | y) can be
    binomial: e.g., whether a word occurs in the document or not
    categorical: e.g., estimated using the relative frequencies of words
    continuous: if the data is distributed according to a known distribution

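A count-and-divide sketch for the categorical case, with optional add-alpha smoothing; a plain-Python sketch under our own naming, not the slides’ notation:

from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """docs: lists of word tokens; returns P(y) and smoothed P(w | y)."""
    n = len(labels)
    prior = {y: c / n for y, c in Counter(labels).items()}   # P(y) by MLE
    counts = defaultdict(Counter)                            # word counts per class
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    vocab = {w for doc in docs for w in doc}
    cond = {}                                                # P(w | y), add-alpha smoothed
    for y, cnt in counts.items():
        total = sum(cnt.values())
        cond[y] = {w: (cnt[w] + alpha) / (total + alpha * len(vocab))
                   for w in vocab}
    return prior, cond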

SLIDE 34

Naive Bayes

a simple example: spam detection

Training data (features present → label):

  good book                  NS
  now book free              S
  medication lose weight     S
  technology advanced book   NS
  now advanced technology    S

  • A test instance: {book, technology}
  • Another one: {good, medication}
    (both are worked through in the sketch below)

P(S) = 3/5, P(NS) = 2/5

  w            P(w | S)   P(w | NS)
  medication   1/5
  free         1/5
  technology   1/5        1/5
  advanced     1/5        1/5
  book         1/5        2/5
  now          1/5
  lose         1/5
  weight       1/5
  good                    1/5

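A sketch that scores the two test instances using the table’s values (reading the blank cells as 0); the zero counts in the second instance are exactly why smoothing is needed:

P_class = {'S': 3/5, 'NS': 2/5}
P_word = {
    'S':  {'medication': 1/5, 'free': 1/5, 'technology': 1/5, 'advanced': 1/5,
           'book': 1/5, 'now': 1/5, 'lose': 1/5, 'weight': 1/5, 'good': 0},
    'NS': {'medication': 0, 'free': 0, 'technology': 1/5, 'advanced': 1/5,
           'book': 2/5, 'now': 0, 'lose': 0, 'weight': 0, 'good': 1/5},
}

def score(doc, y):                              # P(y) * prod_i P(x_i | y)
    s = P_class[y]
    for w in doc:
        s *= P_word[y][w]
    return s

print(score({'book', 'technology'}, 'S'))       # 3/5 * 1/5 * 1/5 = 0.024
print(score({'book', 'technology'}, 'NS'))      # 2/5 * 2/5 * 1/5 = 0.032 -> NS wins
print(score({'good', 'medication'}, 'S'))       # 0.0: a zero-count word
print(score({'good', 'medication'}, 'NS'))      # 0.0: zeroes both scores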

SLIDE 38


Classifying classification methods

another short digression

  • Some classification algorithms are non-probabilistic and discriminative: they return a label for a given input. Examples: perceptron, SVMs, decision trees
  • Some classification algorithms are discriminative and probabilistic: they estimate the conditional probability distribution P(c | x) directly. Examples: logistic regression, (most) neural networks
  • Some classification algorithms are generative: they estimate the joint distribution P(c, x). Examples: naive Bayes, hidden Markov models, (some) neural models

SLIDE 39


More than two classes

  • Some algorithms can naturally be extended to handle multiple class labels
  • Any binary classifier can be turned into a k-way classifier by one of two strategies (a sketch of the first follows below)

    OvR (one-vs-rest, or one-vs-all)
      • train k classifiers: each learns to discriminate one of the classes from the others
      • at prediction time, the classifier with the highest confidence wins
      • needs a confidence score from the base classifiers

    OvO (one-vs-one)
      • train k(k−1)/2 classifiers: each learns to discriminate a pair of classes
      • the decision is made by (weighted) majority vote
      • works without confidence scores, but needs more classifiers

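A one-vs-rest sketch; it assumes the base classifier exposes scikit-learn-style fit() and decision_function() methods, which is our assumption rather than anything the slides specify:

import numpy as np

class OneVsRest:
    """Turn a binary classifier with a confidence score into a k-way one."""
    def __init__(self, make_clf):
        self.make_clf = make_clf                 # factory for fresh binary classifiers

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.clfs = []
        for c in self.classes:                   # one classifier per class: c vs. rest
            clf = self.make_clf()
            clf.fit(X, np.where(y == c, 1, -1))
            self.clfs.append(clf)
        return self

    def predict(self, X):
        scores = np.stack([clf.decision_function(X) for clf in self.clfs])
        return self.classes[scores.argmax(axis=0)]   # highest confidence wins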

SLIDE 40

One vs. Rest

[Figure: three classes (+, −, ×) in the x1–x2 plane, with the three one-vs-rest decision boundaries drawn in turn]

  • For 3 classes, we fit 3 classifiers, each separating one class from the rest
  • Some regions of the feature space will be ambiguous
  • We can assign labels based on a probability or weight value, if the classifier returns one
  • One-vs.-one with majority voting is another option

SLIDE 46

Maximum-margin methods (e.g., SVMs)

[Figure: + and − points separated by a maximum-margin boundary]

  • With the perceptron, we stopped whenever we found a linear discriminator
  • Maximum-margin classifiers seek a discriminator that maximizes the margin
  • SVMs have other interesting properties, and they have been one of the best ‘out-of-the-box’ classifiers for many problems

SLIDE 48

A quick survey of some solutions

Decision trees

[Figure: + and − points split by thresholds a1 on x1 and a2 on x2, with the corresponding tree: test x2 < a2, then x1 < a1, leading to + and − leaves]

  • Note that the decision boundary is non-linear

SLIDE 50


A quick survey of some solutions

Instance/memory based methods

[Figure: labeled + and − points and a query point ‘?’ in the x1–x2 plane]

  • No training: just memorize the instances
  • During test time, decide based on the k nearest neighbors
  • Like decision trees, kNN is non-linear
  • It can also be used for regression

SLIDE 51


A quick survey of some solutions

Artificial neural networks

[Figure: the same + and − data classified by a small neural network with inputs x1, x2 and output y]

SLIDE 52


Measuring success in classification

Accuracy

  • In classification, we do not care (much) about the average of the error function
  • We are interested in how many of our predictions are correct
  • Accuracy measures this directly:

    accuracy = number of correct predictions / total number of predictions

SLIDE 53

Accuracy may go wrong

  • Think about a ‘dummy’ search engine that always returns an empty document set (no results found)
  • If we have
    – 1 000 000 documents
    – 1000 relevant documents (including the terms in the query)
    the accuracy is: 999 000 / 1 000 000 = 99.90 %
  • In general, if our class distribution is skewed, or imbalanced, accuracy will be a bad indicator of success

SLIDE 55


Measuring success in classification

Precision, recall, F-score

precision = TP / (TP + FP)

recall = TP / (TP + FN)

F1-score = 2 × precision × recall / (precision + recall)

                     true value
                  positive   negative
predicted  pos.   TP         FP
           neg.   FN         TN

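A small sketch computing these three metrics from the counts; the guards against zero denominators are our addition:

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from the counts for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0    # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0       # guard: no positive instances
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(precision_recall_f1(7, 9, 3))   # classifier 1 from the later example slide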

SLIDE 56


Example: back to the search engine

  • We had a ‘dummy’ search engine that returned ‘false’ for all queries
  • For a query over
    – 1 000 000 documents
    – 1000 relevant documents

    accuracy = 999 000 / 1 000 000 = 99.90 %
    precision = 0 %
    recall = 0 / 1000 = 0 %

  • Precision and recall are asymmetric: the choice of the ‘positive’ class is important.

SLIDE 57

Classifier evaluation: another example

Consider the following two classifiers (rows: predicted, columns: true value):

  Classifier 1              Classifier 2
        pos.   neg.               pos.   neg.
  pos.  7      9            pos.  1      3
  neg.  3      1            neg.  9      7

  Accuracy: both 8/20 = 0.4
  Precision: 7/16 = 0.44 and 1/4 = 0.25
  Recall: 7/10 = 0.7 and 1/10 = 0.1
  F-score: 0.54 and 0.14

SLIDE 59


Multi-class evaluation

  • For multi-class problems, it is common to report average precision/recall/F-score
  • For C classes, averaging can be done in two ways (a sketch follows below):

    precision_M = (1/C) ∑i TPi / (TPi + FPi)        recall_M = (1/C) ∑i TPi / (TPi + FNi)

    precision_µ = ∑i TPi / ∑i (TPi + FPi)           recall_µ = ∑i TPi / ∑i (TPi + FNi)

    (M = macro, µ = micro; the sums run over i = 1, …, C)

  • The averaging can also be useful for binary classification, if there is no natural positive class

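A sketch of the two averaging schemes from per-class counts, assuming every denominator is nonzero:

def macro_micro(per_class):
    """per_class: a list of (TP, FP, FN) triples, one per class."""
    C = len(per_class)
    # Macro: compute precision/recall per class, then average the values
    prec_M = sum(tp / (tp + fp) for tp, fp, fn in per_class) / C
    rec_M = sum(tp / (tp + fn) for tp, fp, fn in per_class) / C
    # Micro: pool the counts over classes, then compute a single value
    tp = sum(t for t, f, n in per_class)
    fp = sum(f for t, f, n in per_class)
    fn = sum(n for t, f, n in per_class)
    return (prec_M, rec_M), (tp / (tp + fp), tp / (tp + fn))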

SLIDE 60


Confusion matrix

  • A confusion matrix is often useful for multi-class classification tasks (rows: predicted, columns: true class):

                 true class
               negative   neutral   positive
    negative   10         3         4
    neutral    2          12        8
    positive   7          …         7

  • Are the classes balanced?
  • What is the accuracy?
  • What is the per-class, and averaged, precision/recall? (see the sketch below)

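A sketch that answers the precision/recall question for any complete confusion matrix laid out as above (rows = predicted, columns = true), assuming NumPy:

import numpy as np

def per_class_metrics(cm):
    """cm[i, j]: count of items predicted as class i whose true class is j."""
    cm = cm.astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp      # predicted as class i, but truly another class
    fn = cm.sum(axis=0) - tp      # truly class i, but predicted as another class
    accuracy = tp.sum() / cm.sum()
    return accuracy, tp / (tp + fp), tp / (tp + fn)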

SLIDE 61


Precision–recall trade-off

  • Increasing precision (e.g., by changing a hyperparameter) results in decreasing recall
  • Precision–recall graphs are useful for picking the correct models
  • Area under the curve (AUC) is another indication of the success of a classifier

[Figure: a precision–recall curve, with recall on the x-axis and precision on the y-axis]

SLIDE 62


Performance metrics: a summary

  • Accuracy does not reflect the classifier performance when the class distribution is skewed
  • Precision and recall are binary and asymmetric
  • For multi-class problems, calculating accuracy is straightforward, but other measures need averaging
  • These are just the most common measures: there are more
  • You should understand what these metrics measure, and use/report the metric that is useful for the purpose

SLIDE 63


Summary

  • We discussed three basic classification techniques: perceptron, logistic regression, naive Bayes
  • We left out many others: SVMs, decision trees, …
  • We also did not discuss a few other interesting cases, including multi-label classification
  • We will discuss some (non-linear) classification methods next

Next: ML evaluation, quick summary so far
Fri: Introduction to neural networks

SLIDE 64

Additional reading, references, credits

  • Hastie, Tibshirani, and Friedman (2009) covers logistic regression in section 4.4 and the perceptron in section 4.5
  • Jurafsky and Martin (2009) explains it in section 6.6, and it is moved to its own chapter (7) in the draft third edition

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer Series in Statistics. Springer-Verlag New York. ISBN: 9780387848587. URL: http://web.stanford.edu/~hastie/ElemStatLearn/.

Jurafsky, Daniel and James H. Martin (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Second edition. Pearson Prentice Hall. ISBN: 978-0-13-504196-3.

Minsky, Marvin and Seymour Papert (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.

Rosenblatt, Frank (1958). “The perceptron: a probabilistic model for information storage and organization in the brain.” In: Psychological Review 65.6, pp. 386–408.
