Introduction to Machine Learning Classification: Discriminant - - PowerPoint PPT Presentation



SLIDE 1

Introduction to Machine Learning Classification: Discriminant Analysis

compstat-lmu.github.io/lecture_i2ml

SLIDE 2

LINEAR DISCRIMINANT ANALYSIS (LDA)

LDA follows a generative approach

$$\pi_k(x) = P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{P(x)} = \frac{p(x \mid y = k)\, \pi_k}{\sum_{j=1}^{g} p(x \mid y = j)\, \pi_j}$$

where we now have to pick a distributional form for $p(x \mid y = k)$.
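The Bayes-rule posterior above can be sketched numerically. This is a minimal illustration (not from the lecture), using made-up univariate Gaussian class densities and priors:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class setup: priors pi_k and class means (shared sigma)
priors = {1: 0.6, 2: 0.4}
means = {1: 0.0, 2: 2.0}
sigma = 1.0

x = 1.0
# Numerator pi_k * p(x | y = k) per class; normalizing by the sum gives the posterior
scores = {k: priors[k] * gaussian_pdf(x, means[k], sigma) for k in priors}
posteriors = {k: s / sum(scores.values()) for k, s in scores.items()}
```

At `x = 1.0` both class densities are equal, so the posterior reduces to the priors, illustrating that the class-conditional densities and priors jointly determine the prediction.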

Introduction to Machine Learning – 1 / 10

SLIDE 3

LINEAR DISCRIMINANT ANALYSIS (LDA)

LDA assumes that each class density is modeled as a multivariate Gaussian:

$$p(x \mid y = k) = \frac{1}{(2\pi)^{p/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right)$$

with equal covariance, i.e. $\Sigma_k = \Sigma \;\; \forall k$.
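The density above can be evaluated directly. A small NumPy sketch (illustrative only; the covariance matrix below is made up), which computes the quadratic form via a linear solve rather than an explicit inverse:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    # Multivariate Gaussian density with mean mu and covariance Sigma
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Shared covariance for all classes, as LDA assumes (hypothetical values)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
mu_1 = np.array([0.0, 0.0])
val = mvn_density(np.array([0.0, 0.0]), mu_1, Sigma)
```

At the class mean the exponent vanishes, so the density equals the normalizing constant $1 / \left((2\pi)^{p/2} |\Sigma|^{1/2}\right)$.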

[Figure: plot over features X1 and X2]

SLIDE 4

LINEAR DISCRIMINANT ANALYSIS (LDA)

Parameters $\theta$ are estimated in a straightforward manner by estimating

$$\hat\pi_k = \frac{n_k}{n}, \quad \text{where } n_k \text{ is the number of class-}k \text{ observations}$$

$$\hat\mu_k = \frac{1}{n_k} \sum_{i:\, y^{(i)} = k} x^{(i)}$$

$$\hat\Sigma = \frac{1}{n - g} \sum_{k=1}^{g} \sum_{i:\, y^{(i)} = k} \left(x^{(i)} - \hat\mu_k\right)\left(x^{(i)} - \hat\mu_k\right)^T$$
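These estimators translate almost line-by-line into code. A minimal NumPy sketch (the toy data is made up; note the pooled covariance divides by $n - g$):

```python
import numpy as np

def lda_estimates(X, y):
    # Estimate class priors, class means, and the pooled covariance (denominator n - g)
    classes = np.unique(y)
    n, g = len(y), len(classes)
    pi_hat = {k: np.mean(y == k) for k in classes}
    mu_hat = {k: X[y == k].mean(axis=0) for k in classes}
    S = np.zeros((X.shape[1], X.shape[1]))
    for k in classes:
        D = X[y == k] - mu_hat[k]        # centered class-k observations
        S += D.T @ D                     # sum of outer products within class k
    Sigma_hat = S / (n - g)
    return pi_hat, mu_hat, Sigma_hat

# Tiny synthetic example (made-up data)
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([0, 0, 1, 1])
pi_hat, mu_hat, Sigma_hat = lda_estimates(X, y)
```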

[Figure: plot over features X1 and X2]

SLIDE 5

LDA AS LINEAR CLASSIFIER

Because all class-specific Gaussians share the same covariance structure, the decision boundaries of LDA are linear.

[Figure: iris data, Petal.Length vs Petal.Width, colored by Species: setosa, versicolor, virginica]

SLIDE 6

LDA AS LINEAR CLASSIFIER

We can formally show that LDA is a linear classifier by showing that the posterior probabilities can be written as linear scoring functions, up to any isotonic / rank-preserving transformation.

$$\pi_k(x) = \frac{\pi_k \cdot p(x \mid y = k)}{p(x)} = \frac{\pi_k \cdot p(x \mid y = k)}{\sum_{j=1}^{g} \pi_j \cdot p(x \mid y = j)}$$

As the denominator is the same for all classes, we only need to consider $\pi_k \cdot p(x \mid y = k)$ and show that this can be written as a linear function of $x$.

SLIDE 7

LDA AS LINEAR CLASSIFIER

$$\pi_k \cdot p(x \mid y = k) \propto \pi_k \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k\right)$$

$$= \exp\left(\log \pi_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k\right) \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right)$$

$$= \exp\left(\theta_{0k} + x^T \theta_k\right) \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right) \propto \exp\left(\theta_{0k} + x^T \theta_k\right)$$

by defining $\theta_{0k} := \log \pi_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k$ and $\theta_k := \Sigma^{-1} \mu_k$.

We have again left out all constants that are the same for all classes $k$: the normalizing constant of our Gaussians and $\exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right)$.

By finally taking the log, we can write our transformed scores as linear: $f_k(x) = \theta_{0k} + x^T \theta_k$.
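The linear scores $f_k(x) = \theta_{0k} + x^T \theta_k$ can be sketched directly from given parameters. An illustrative NumPy snippet (the priors, means, and covariance below are made up):

```python
import numpy as np

def lda_scores(x, pi_hat, mu_hat, Sigma_hat):
    # f_k(x) = theta_0k + x^T theta_k, with
    # theta_0k = log pi_k - 1/2 mu_k^T Sigma^{-1} mu_k and theta_k = Sigma^{-1} mu_k
    Sigma_inv = np.linalg.inv(Sigma_hat)
    scores = {}
    for k in pi_hat:
        theta_k = Sigma_inv @ mu_hat[k]
        theta_0k = np.log(pi_hat[k]) - 0.5 * mu_hat[k] @ Sigma_inv @ mu_hat[k]
        scores[k] = theta_0k + x @ theta_k
    return scores

# Made-up parameters: equal priors, well-separated means, identity covariance
pi_hat = {0: 0.5, 1: 0.5}
mu_hat = {0: np.array([0.0, 0.0]), 1: np.array([4.0, 4.0])}
Sigma_hat = np.eye(2)
scores = lda_scores(np.array([0.5, 0.5]), pi_hat, mu_hat, Sigma_hat)
pred = max(scores, key=scores.get)  # predict the class with the highest linear score
```

Since the scores are linear in $x$, the boundary between any two classes is the set where their scores are equal, i.e. a hyperplane.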

SLIDE 8

QUADRATIC DISCRIMINANT ANALYSIS (QDA)

QDA is a direct generalization of LDA, where the class densities are now Gaussians with unequal covariances $\Sigma_k$:

$$p(x \mid y = k) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$$

Parameters are estimated in a straightforward manner by:

$$\hat\pi_k = \frac{n_k}{n}, \quad \text{where } n_k \text{ is the number of class-}k \text{ observations}$$

$$\hat\mu_k = \frac{1}{n_k} \sum_{i:\, y^{(i)} = k} x^{(i)}$$

$$\hat\Sigma_k = \frac{1}{n_k - 1} \sum_{i:\, y^{(i)} = k} \left(x^{(i)} - \hat\mu_k\right)\left(x^{(i)} - \hat\mu_k\right)^T$$
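The only change from LDA estimation is the per-class covariance with denominator $n_k - 1$. A minimal NumPy sketch on made-up data:

```python
import numpy as np

def qda_estimates(X, y):
    # Same priors and means as LDA, but one covariance per class (denominator n_k - 1)
    classes = np.unique(y)
    pi_hat = {k: np.mean(y == k) for k in classes}
    mu_hat = {k: X[y == k].mean(axis=0) for k in classes}
    Sigma_hat = {}
    for k in classes:
        D = X[y == k] - mu_hat[k]           # centered class-k observations
        Sigma_hat[k] = D.T @ D / (len(D) - 1)
    return pi_hat, mu_hat, Sigma_hat

# Tiny synthetic example: the two classes vary along different axes
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [4.0, 6.0]])
y = np.array([0, 0, 1, 1])
pi_hat, mu_hat, Sigma_hat = qda_estimates(X, y)
```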

SLIDE 9

QUADRATIC DISCRIMINANT ANALYSIS (QDA)

Covariance matrices can differ across classes. This yields a better fit to the data, but also requires estimating more parameters.

[Figure: plot over features X1 and X2]

SLIDE 10

QUADRATIC DISCRIMINANT ANALYSIS (QDA)

$$\pi_k(x) \propto \pi_k \cdot p(x \mid y = k) \propto \pi_k |\Sigma_k|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} x^T \Sigma_k^{-1} x - \frac{1}{2} \mu_k^T \Sigma_k^{-1} \mu_k + x^T \Sigma_k^{-1} \mu_k\right)$$

Taking the log of the above, we can define a discriminant function that is quadratic in $x$:

$$\log \pi_k - \frac{1}{2} \log |\Sigma_k| - \frac{1}{2} \mu_k^T \Sigma_k^{-1} \mu_k + x^T \Sigma_k^{-1} \mu_k - \frac{1}{2} x^T \Sigma_k^{-1} x$$
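The quadratic discriminant above can be evaluated term by term. An illustrative NumPy sketch with made-up two-class parameters and unequal covariances:

```python
import numpy as np

def qda_score(x, pi_k, mu_k, Sigma_k):
    # log pi_k - 1/2 log|Sigma_k| - 1/2 mu_k^T Sigma_k^{-1} mu_k
    #   + x^T Sigma_k^{-1} mu_k - 1/2 x^T Sigma_k^{-1} x
    Sigma_inv = np.linalg.inv(Sigma_k)
    return (np.log(pi_k)
            - 0.5 * np.log(np.linalg.det(Sigma_k))
            - 0.5 * mu_k @ Sigma_inv @ mu_k
            + x @ Sigma_inv @ mu_k
            - 0.5 * x @ Sigma_inv @ x)

# Hypothetical parameters: equal priors, different means AND covariances
x = np.array([1.0, 1.0])
s0 = qda_score(x, 0.5, np.array([0.0, 0.0]), np.eye(2))
s1 = qda_score(x, 0.5, np.array([4.0, 4.0]), 2.0 * np.eye(2))
pred = 0 if s0 > s1 else 1
```

Because the $-\frac{1}{2} x^T \Sigma_k^{-1} x$ term now depends on $k$, it no longer cancels between classes, which is what makes the QDA decision boundaries quadratic rather than linear.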

SLIDE 11

QUADRATIC DISCRIMINANT ANALYSIS (QDA)

[Figure: iris data, Petal.Length vs Petal.Width, colored by Species: setosa, versicolor, virginica]