

SLIDE 1

Introduction to Machine Learning Classification: Logistic Regression

compstat-lmu.github.io/lecture_i2ml

SLIDE 2

MOTIVATION

A discriminant approach for directly modeling the posterior probabilities π(x | θ) of the labels is logistic regression. For now, let’s focus on the binary case y ∈ {0, 1} and use empirical risk minimization:

\[
\arg\min_{\theta \in \Theta} \mathcal{R}_{\mathrm{emp}}(\theta) = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} L\left(y^{(i)}, \pi\left(x^{(i)} \mid \theta\right)\right).
\]

A naive approach would be to model

\[
\pi(x \mid \theta) = \theta^T x.
\]

NB: We will often suppress the intercept in notation. Obviously, this could result in predicted probabilities π(x | θ) ∉ [0, 1].
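As a quick numerical illustration (a minimal sketch with made-up data and parameters, assuming numpy; not part of the slides), the naive linear model readily produces “probabilities” outside [0, 1]:

```python
import numpy as np

# Hypothetical 1D data and parameters, purely for illustration.
theta = np.array([0.5, -1.2])   # (intercept, slope)
x = np.array([1.0, 3.0, 8.0])

# Naive linear "probabilities": theta^T x with an explicit intercept.
scores = theta[0] + theta[1] * x
print(scores)                   # [-0.7 -3.1 -9.1] -- not valid probabilities
```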

SLIDE 3

LOGISTIC FUNCTION

To avoid this, logistic regression “squashes” the estimated linear scores θᵀx to [0, 1] through the logistic function s:

\[
\pi(x \mid \theta) = \frac{\exp\left(\theta^T x\right)}{1 + \exp\left(\theta^T x\right)} = \frac{1}{1 + \exp\left(-\theta^T x\right)} = s\left(\theta^T x\right)
\]
[Figure: the logistic function s(f) plotted over f ∈ [−10, 10].]
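A minimal Python sketch of the logistic function (assuming numpy; not from the slides). The two algebraic forms above agree; computing each branch on the sign of f keeps exp from overflowing:

```python
import numpy as np

def sigmoid(f):
    """Logistic function s(f) = 1 / (1 + exp(-f)), computed stably.

    For f >= 0, 1 / (1 + exp(-f)) never overflows; for f < 0,
    the equivalent form exp(f) / (1 + exp(f)) is used instead.
    """
    f = np.asarray(f, dtype=float)
    out = np.empty_like(f)
    pos = f >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-f[pos]))
    ef = np.exp(f[~pos])
    out[~pos] = ef / (1.0 + ef)
    return out

f = np.linspace(-10, 10, 5)
print(sigmoid(f))   # scores squashed into (0, 1)
```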

SLIDE 4

LOGISTIC FUNCTION

The intercept shifts s(f) horizontally:

\[
s(\theta_0 + f) = \frac{\exp(\theta_0 + f)}{1 + \exp(\theta_0 + f)}
\]

[Figure: s(θ₀ + f) over f ∈ [−10, 10] for θ₀ ∈ {−3, 3}.]

Scaling f as in

\[
s(\alpha f) = \frac{\exp(\alpha f)}{1 + \exp(\alpha f)}
\]

controls the slope and direction.

[Figure: s(αf) over f ∈ [−10, 10] for α ∈ {−2, −0.3, 1, 6}.]
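A tiny numerical check of both effects (a sketch, assuming numpy; the θ₀ and α values mirror the slide’s legends):

```python
import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-np.asarray(f, dtype=float)))

# The intercept shifts the curve: s(theta_0 + f) crosses 0.5 at f = -theta_0.
for theta0 in (-3, 3):
    print(theta0, sigmoid(theta0 + np.linspace(-5, 5, 3)))

# alpha scales the slope; a negative alpha flips the direction.
for alpha in (-2, -0.3, 1, 6):
    print(alpha, sigmoid(alpha * np.linspace(-1, 1, 3)))
```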

SLIDE 5

BERNOULLI / LOG LOSS

We need to define a loss function for the ERM approach:

\[
L(y, \pi(x)) = -y \ln(\pi(x)) - (1 - y) \ln(1 - \pi(x))
\]

  • Penalizes confidently wrong predictions heavily
  • Called Bernoulli, log, or cross-entropy loss
  • Can be derived from the negative log-likelihood of the Bernoulli / logistic regression model in statistics
  • Used for many other classifiers, e.g., in NNs or boosting

[Figure: L(y, π(x)) as a function of π(x), for y = 1.]
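A short sketch of this loss (assuming numpy; the clipping away from 0 and 1 is a common practical guard to keep the logarithm finite, and is not on the slide):

```python
import numpy as np

def log_loss(y, pi, eps=1e-12):
    """Bernoulli / log / cross-entropy loss L(y, pi(x))."""
    pi = np.clip(pi, eps, 1.0 - eps)   # avoid log(0)
    return -y * np.log(pi) - (1 - y) * np.log(1 - pi)

# Confidently wrong predictions are penalized heavily:
print(log_loss(1, 0.99))   # ~0.01  (confident and right)
print(log_loss(1, 0.01))   # ~4.61  (confident and wrong)
```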

SLIDE 6

LOGISTIC REGRESSION IN 1D

With one feature x ∈ ℝ, the figure below shows the data and the fitted function x ↦ π(x).

[Figure: data points and the fitted curve x ↦ π(x).]
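An end-to-end sketch of fitting such a 1D model by plain gradient descent (the toy data, step size, and iteration count are all made up for illustration; the gradient is that of the mean Bernoulli loss above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D data: class 1 becomes more likely as x grows.
n = 100
x = rng.uniform(0, 6, n)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(x - 3)))).astype(float)

X = np.column_stack([np.ones(n), x])     # add explicit intercept column
theta = np.zeros(2)

for _ in range(10_000):
    pi = 1 / (1 + np.exp(-X @ theta))    # pi(x | theta) = s(theta^T x)
    grad = X.T @ (pi - y) / n            # gradient of the mean log loss
    theta -= 0.1 * grad                  # fixed step size (illustrative)

print(theta)   # should roughly recover intercept -3, slope 1
```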

SLIDE 7

LOGISTIC REGRESSION IN 2D

Obviously, logistic regression is a linear classifier, as

\[
\pi(x \mid \theta) = s\left(\theta^T x\right)
\]

and s is isotonic.
[Figure: decision boundary of logistic regression on 2D data with features x1, x2 and labels y ∈ {FALSE, TRUE}. Plot title: “logreg: model=FALSE Train: mmce=0.075; CV: mmce.test.mean=0.125”.]
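Concretely: since s is isotonic with s(0) = 0.5, classifying by π(x | θ) ≥ 0.5 is equivalent to classifying by θᵀx ≥ 0, a linear rule. A minimal sketch with a made-up θ and points:

```python
import numpy as np

theta = np.array([-6.0, 1.0, 1.0])               # hypothetical (intercept, w1, w2)
X = np.array([[1, 2, 2], [1, 4, 4], [1, 2, 5]])  # points with intercept column

score = X @ theta
pi = 1 / (1 + np.exp(-score))

# The two decision rules agree: pi >= 0.5  <=>  theta^T x >= 0.
print(pi >= 0.5)    # [False  True  True]
print(score >= 0)   # [False  True  True]
```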

SLIDE 8

LOGISTIC REGRESSION IN 2D

[Figure: predicted probability (prob) as a function of the linear score, for scores in [−10, 10].]

SLIDE 9

SUMMARY

Hypothesis Space:

\[
\mathcal{H} = \left\{ \pi : \mathcal{X} \to [0, 1] \;\middle|\; \pi(x) = s\left(\theta^T x\right) \right\}
\]

Risk: Logistic/Bernoulli loss function.

\[
L(y, \pi(x)) = -y \ln(\pi(x)) - (1 - y) \ln(1 - \pi(x))
\]

Optimization: Numerical optimization, typically gradient-based methods.
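For reference, a standard step not spelled out on the slide: since s′(f) = s(f)(1 − s(f)), the gradient of the empirical risk under the Bernoulli loss simplifies to

\[
\nabla_\theta \mathcal{R}_{\mathrm{emp}}(\theta) = \sum_{i=1}^{n} \left( s\left(\theta^T x^{(i)}\right) - y^{(i)} \right) x^{(i)},
\]

which is what gradient-based optimizers descend; there is no closed-form solution.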
