Linear Models for Classification, Henrik I Christensen (PowerPoint PPT Presentation)



SLIDE 1

Linear Models for Classification

Henrik I Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Henrik I Christensen (RIM@GT) Linear Classification 1 / 33

SLIDE 2

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 3

Introduction

  • Last time: prediction of new functional values
  • Today: linear classification of data

  • Basic pattern recognition
  • Separation of data: buy/sell
  • Segmentation of line data, ...

SLIDE 4

Simple Example - Bolts or Needles

SLIDE 5

Classification

Given:

  • An input vector: x
  • A set of classes: c_i ∈ C, i = 1, ..., k

  • Mapping m : X → C
  • Separation of the space into decision regions
  • The boundaries are termed decision boundaries/surfaces

SLIDE 6

Basis Formulation

  • It is a 1-of-K coding problem
  • Target vector: t = (0, ..., 1, ..., 0)
  • Consideration of 3 different approaches:

1. Optimization of a discriminant function
2. Bayesian formulation: p(c_i | x)
3. Learning & decision fusion
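The 1-of-K target coding above can be sketched in a few lines of Python (the label names are hypothetical illustrations, not from the lecture):

```python
def one_of_k(index, k):
    """Return the 1-of-K target vector t = (0, ..., 1, ..., 0)."""
    t = [0] * k
    t[index] = 1
    return t

classes = ["needle", "bolt", "screw"]   # hypothetical label set
targets = {c: one_of_k(i, len(classes)) for i, c in enumerate(classes)}
print(targets["bolt"])                  # [0, 1, 0]
```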

SLIDE 7

Code for experimentation

There are data sets and sample code available:

  • NETLAB: http://www.ncrg.aston.ac.uk/netlab/index.php
  • NAVTOOLBOX: http://www.cas.kth.se/toolbox/
  • SLAM Dataset: http://kaspar.informatik.uni-freiburg.de/~slamEvaluation/datasets.php

SLIDE 8

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 9

Discriminant Functions

  • Objective: assign the input vector x to a class c_i
  • Simple formulation: y(x) = w^T x + w_0
  • w is termed the weight vector, w_0 the bias
  • Two-class example: decide c1 if y(x) ≥ 0, otherwise c2
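The two-class rule above can be sketched as follows (the weight values are made up for illustration, not taken from the slides):

```python
def y(w, w0, x):
    """Linear discriminant y(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(w, w0, x):
    """Decide c1 if y(x) >= 0, otherwise c2."""
    return "c1" if y(w, w0, x) >= 0 else "c2"

w, w0 = [1.0, -2.0], 0.5                 # hypothetical weight vector and bias
print(classify(w, w0, [3.0, 1.0]))       # y = 3 - 2 + 0.5 = 1.5 -> prints 'c1'
```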

SLIDE 10

Basic Design

  • Two points on the decision surface, x_a and x_b:
        y(x_a) = y(x_b) = 0  ⇒  w^T (x_a − x_b) = 0
  • Hence w is perpendicular to the decision surface
  • Signed distance from the origin to the surface:
        w^T x / ||w|| = −w_0 / ||w||
  • Define w̃ = (w_0, w) and x̃ = (1, x), so that y(x) = w̃^T x̃

SLIDE 11

Linear discriminant function

[Figure: geometry of the linear discriminant in 2D. The weight vector w is normal to the decision surface y = 0, which separates regions R1 (y > 0) and R2 (y < 0); x⊥ is the projection of x onto the surface, and −w0/||w|| is the signed distance of the surface from the origin.]

SLIDE 12

Multi Class Discrimination

  • Generate multiple decision functions: y_k(x) = w_k^T x + w_k0
  • Decision strategy: j = arg max_{i ∈ 1..k} y_i(x)
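A minimal sketch of the arg-max decision strategy above (the per-class weights and biases are invented for illustration):

```python
def linear(wk, wk0, x):
    """One per-class decision function y_k(x) = w_k^T x + w_k0."""
    return sum(wi * xi for wi, xi in zip(wk, x)) + wk0

def decide(W, b, x):
    """Evaluate all y_i(x) and return j = arg max_i y_i(x)."""
    scores = [linear(wk, wk0, x) for wk, wk0 in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # one weight vector per class (made up)
b = [0.0, 0.0, 0.5]
print(decide(W, b, [2.0, 1.0]))              # scores 2.0, 1.0, -2.5 -> class 0
```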

SLIDE 13

Multi-Class Decision Regions

[Figure: decision regions Ri, Rj, Rk of a multi-class linear discriminant; for two points xA and xB inside Rk, every point x on the line between them also lies in Rk, i.e. the regions are convex.]

SLIDE 14

Example - Bolts or Needles

SLIDE 15

Minimum distance classification

  • Suppose we have computed the mean value for each of the classes:
        m_needle = [0.86, 2.34]^T and m_bolt = [5.74, 5.85]^T
  • We can then compute the distance to each mean: d_j(x) = ||x − m_j||
  • arg min_i d_i(x) gives the best fit
  • Linear decision functions can be derived from the means
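The minimum-distance rule with the class means above can be sketched as:

```python
import math

# Assign x to the class whose mean is nearest: d_j(x) = ||x - m_j||.
# The means are the values from the slide.
means = {"needle": (0.86, 2.34), "bolt": (5.74, 5.85)}

def dist(x, m):
    """Euclidean distance ||x - m||."""
    return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, m)))

def classify(x):
    """arg min over classes of the distance to the class mean."""
    return min(means, key=lambda c: dist(x, means[c]))

print(classify((1.0, 2.0)))   # near the needle mean
print(classify((6.0, 5.0)))   # near the bolt mean
```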

SLIDE 16

Bolts / Needle Decision Functions

  • Needle: d_needle(x) = 0.86 x1 + 2.34 x2 − 3.10
  • Bolt: d_bolt(x) = 5.74 x1 + 5.85 x2 − 33.59
  • Decision boundary: d_i(x) − d_j(x) = 0, i.e.
        d_needle/bolt(x) = −4.88 x1 − 3.51 x2 + 30.49
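The linear forms above follow from expanding ||x − m_j||² = ||x||² − 2 m_j^T x + ||m_j||² and dropping the common ||x||² term, giving d_j(x) = m_j^T x − ½ m_j^T m_j. A short sketch reproducing the slide's coefficients (they agree up to rounding of the bias terms):

```python
m_needle = (0.86, 2.34)
m_bolt = (5.74, 5.85)

def decision(m):
    """Coefficients (w1, w2, w0) of d_j(x) = m_j^T x - 0.5 * ||m_j||^2."""
    return (*m, -0.5 * sum(mi * mi for mi in m))

d_needle = decision(m_needle)   # bias -3.1076, slide rounds to -3.10
d_bolt = decision(m_bolt)       # bias -33.585, slide rounds to -33.59

# Boundary coefficients d_needle - d_bolt: (-4.88, -3.51, 30.477...)
boundary = tuple(a - b for a, b in zip(d_needle, d_bolt))
print(boundary)
```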

SLIDE 17

Example decision surface

SLIDE 18

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 19

Least Squares for Classification

  • Just as with LSQ for regression, we can compute an approximation to the classification vector C
  • Consider again y_k(x) = w_k^T x + w_k0
  • Rewrite to y(x) = W̃^T x̃
  • Assume we have a target vector T

SLIDE 20

Least Squares for Classification

  • The error is then:
        E_D(W̃) = ½ Tr{ (X̃W̃ − T)^T (X̃W̃ − T) }
  • The solution is then:
        W̃ = (X̃^T X̃)^{−1} X̃^T T
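A small sketch of this closed-form solution on synthetic data (the data set, seed, and NumPy usage are illustrative assumptions, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(20, 2))   # class 1 samples (made up)
X2 = rng.normal([3.0, 3.0], 0.5, size=(20, 2))   # class 2 samples (made up)

Xt = np.hstack([np.ones((40, 1)), np.vstack([X1, X2])])   # augmented x~ = (1, x)
T = np.vstack([np.tile([1.0, 0.0], (20, 1)),              # 1-of-K targets
               np.tile([0.0, 1.0], (20, 1))])

# W~ = (X~^T X~)^{-1} X~^T T, solved without forming the inverse explicitly
W = np.linalg.solve(Xt.T @ Xt, Xt.T @ T)

pred = np.argmax(Xt @ W, axis=1)                 # classify by the largest output
labels = np.r_[np.zeros(20, int), np.ones(20, int)]
print("training errors:", int((pred != labels).sum()))
```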

SLIDE 21

LSQ and Outliers

[Figure: the same two-class data set shown without and with extra outlier points; the outliers pull the least-squares decision boundary away from the good separation, illustrating LSQ’s lack of robustness.]

SLIDE 22

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 23

Fisher’s linear discriminant

  • Select a decision function that maximizes the distance between classes
  • Assume for a start: y = w^T x
  • Compute the class means m1 and m2:
        m1 = (1/N1) Σ_{i∈C1} x_i        m2 = (1/N2) Σ_{j∈C2} x_j
  • Projected distance between means:
        m2 − m1 = w^T (m2 − m1), where m_i = w^T m_i

SLIDE 24

The suboptimal solution

[Figure: two-class data projected onto the direction joining the class means; the projected classes overlap considerably.]

SLIDE 25

The Fisher criterion

  • Consider the expression
        J(w) = (w^T S_B w) / (w^T S_W w)
    where S_B is the between-class covariance and S_W is the within-class covariance, i.e.
        S_B = (m1 − m2)(m1 − m2)^T
        S_W = Σ_{i∈C1} (x_i − m1)(x_i − m1)^T + Σ_{i∈C2} (x_i − m2)(x_i − m2)^T
  • J(w) is optimized when (w^T S_B w) S_W w = (w^T S_W w) S_B w, or
        w ∝ S_W^{−1} (m2 − m1)
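The closed-form result w ∝ S_W^{−1} (m2 − m1) can be sketched as follows (the synthetic data, seed, and covariance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.8], [0.8, 1.0]])            # shared within-class covariance (made up)
X1 = rng.multivariate_normal([0.0, 0.0], C, 100)  # class 1
X2 = rng.multivariate_normal([2.0, 1.0], C, 100)  # class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter

w = np.linalg.solve(SW, m2 - m1)    # direction of S_W^{-1} (m2 - m1)
w /= np.linalg.norm(w)

def J(d):
    """Fisher criterion for a direction d: ((m2-m1)^T d)^2 / (d^T S_W d)."""
    return float(((m2 - m1) @ d) ** 2 / (d @ SW @ d))

# The Fisher direction scores at least as well as the naive mean-difference direction
naive = (m2 - m1) / np.linalg.norm(m2 - m1)
print(J(w) >= J(naive))
```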

SLIDE 26

The Fisher result

[Figure: the same data projected onto the Fisher direction; the projected classes are now well separated.]

SLIDE 27

Generalization to N > 2

  • Define a stacked weight matrix: y = W^T x
  • The within-class covariance generalizes to:
        S_W = Σ_{k=1}^{K} S_k
  • The between-class covariance is:
        S_B = Σ_{k=1}^{K} N_k (m_k − m)(m_k − m)^T
  • It can be shown that J(W) is optimized by the eigenvectors of S = S_W^{−1} S_B

SLIDE 28

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 29

Perceptron Algorithm

  • Developed by Rosenblatt (1962)
  • Formed an important basis for neural networks
  • Uses a non-linear transformation φ(x)
  • Construct a decision function:
        y(x) = f( w^T φ(x) ), where f(a) = +1 if a ≥ 0, −1 if a < 0

SLIDE 30

The perceptron criterion

  • Given the target definition t_n ∈ {−1, +1}, we normally want w^T φ(x_n) t_n > 0 for all samples
  • Perceptron criterion:
        E_P(w) = − Σ_{n∈M} w^T φ_n t_n
    where M represents all the mis-classified samples
  • We can minimize this with gradient descent, as seen in the last lecture:
        w^(τ+1) = w^(τ) − η ∇E_P(w) = w^(τ) + η φ_n t_n
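The update rule above can be sketched as a simple loop (the toy separable data and the identity features φ = (1, x1, x2) are illustrative assumptions, not from the lecture):

```python
# phi_n = (1, x1, x2): identity features plus a bias term; targets t in {-1, +1}
data = [((1.0, 2.0, 1.0), +1), ((1.0, 1.5, 2.0), +1),
        ((1.0, -1.0, -0.5), -1), ((1.0, -2.0, -1.0), -1)]

w = [0.0, 0.0, 0.0]
eta = 1.0                                    # learning rate
for _ in range(100):                         # epochs; separable data converges fast
    mistakes = 0
    for phi, t in data:
        # misclassified when w^T phi_n t_n <= 0 -> update w <- w + eta * phi_n * t_n
        if sum(wi * pi for wi, pi in zip(w, phi)) * t <= 0:
            w = [wi + eta * pi * t for wi, pi in zip(w, phi)]
            mistakes += 1
    if mistakes == 0:                        # converged: all samples on the correct side
        break

print("final w:", w)
```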

SLIDE 31

Perceptron learning example

[Figure: successive snapshots of perceptron learning on a 2D data set; after each update on a misclassified point the decision boundary moves, until all points are correctly classified.]

SLIDE 32

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 33

Summary

  • Basics of discrimination / classification
  • Obviously not all problems are linear
  • Optimization of the distance/overlap between classes
  • Minimizing the probability of misclassification
  • Basic formulation as an optimization problem
  • How to optimize the between-cluster distance? Covariance weighted
  • Basic recursive formulation
  • Could we make it more robust?
