Sparse Kernel Machines - SVM
SLIDE 1


Sparse Kernel Machines - SVM

Henrik I. Christensen

Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu

SLIDE 2

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 3

Introduction

- Last time we talked about kernels and memory-based models
- Estimating the full Gram matrix can pose a major challenge
- It is desirable to store only the "relevant" data
- Two possible solutions discussed:
  1. Support Vector Machines (Vapnik, et al.)
  2. Relevance Vector Machines
- The main difference lies in how posterior probabilities are handled
- A small robotics example at the end shows SVM performance

SLIDE 4

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 5

Maximum Margin Classifiers - Preliminaries

- Let's initially consider a linear two-class problem: y(x) = w^T φ(x) + b, where φ(·) is a feature-space transformation and b is the bias
- Given a training dataset x_n, n ∈ {1, ..., N}
- Target values t_n, n ∈ {1, ..., N}, with t_n ∈ {−1, 1}
- Assume for now that there is a linear solution to the problem

SLIDE 6

The objective

- The objective here is to maximize the margin
- Let's keep just the points at the margin

[Figure: two-class data with the decision boundary y = 0 and margin boundaries y = 1 and y = −1; the margin is the distance between them.]

SLIDE 7

Recap distances and metrics

[Figure: geometry of a linear discriminant. The boundary y = 0 separates regions R1 (y > 0) and R2 (y < 0); a point x decomposes into its projection x⊥ onto the boundary plus a signed offset y(x)/||w|| along w, and the boundary lies at distance −w0/||w|| from the origin.]

SLIDE 8

The objective function

We know that y(x) and t are supposed to have the same sign, so that t_n y(x_n) > 0 for correctly classified points. The distance of a point from the decision boundary is then

$$\frac{t_n y(x_n)}{\|w\|} = \frac{t_n \left( w^T \phi(x_n) + b \right)}{\|w\|}$$

The maximum-margin solution is

$$\arg\max_{w,b} \left\{ \frac{1}{\|w\|} \min_n \left[ t_n \left( w^T \phi(x_n) + b \right) \right] \right\}$$

We can scale w and b without loss of generality. Scale the parameters so that for the point(s) closest to the boundary

$$t_n \left( w^T \phi(x_n) + b \right) = 1$$

Then for all data points it is true that

$$t_n \left( w^T \phi(x_n) + b \right) \geq 1$$
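To see this scaling freedom concretely, here is a small numpy sketch (my illustration, not from the lecture; it takes φ(x) = x, and the data and parameter values are hypothetical). It rescales an arbitrary separating (w, b) so that the closest point satisfies t_n(w^T x_n + b) = 1 while the decisions are unchanged.

```python
import numpy as np

# Hypothetical separating parameters and data; names are illustrative.
rng = np.random.default_rng(5)
w = np.array([0.7, -1.3])
b = 0.4
X = rng.normal(0.0, 1.0, (6, 2))
t = np.sign(X @ w + b)          # labels defined to be consistent with (w, b)

# Rescale so the point closest to the boundary sits exactly on the
# margin, i.e. min_n t_n (w^T x_n + b) = 1.
kappa = np.min(t * (X @ w + b))
w_s, b_s = w / kappa, b / kappa

print(np.min(t * (X @ w_s + b_s)))             # -> 1.0
print(np.allclose(np.sign(X @ w_s + b_s), t))  # True: decisions unchanged
```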

SLIDE 9

Parameter estimation

We need to maximize ||w||⁻¹, which is equivalent to minimizing ||w||², subject to the margin requirements. In Lagrangian terms this is

$$L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} a_n \left\{ t_n \left( w^T \phi(x_n) + b \right) - 1 \right\}$$

Setting the partial derivatives with respect to w and b to zero gives

$$w = \sum_{n=1}^{N} a_n t_n \phi(x_n), \qquad 0 = \sum_{n=1}^{N} a_n t_n$$
SLIDE 10

Parameter estimation

Eliminating w and b from the objective function gives the dual representation

$$\tilde{L}(a) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)$$

This is a quadratic optimization problem - more on solving it in a minute. We can evaluate new points using the form

$$y(x) = \sum_{n=1}^{N} a_n t_n k(x, x_n) + b$$
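As an illustration of the dual problem (not part of the lecture), the sketch below solves the hard-margin dual for a tiny hand-made dataset with a linear kernel, using scipy's general-purpose constrained optimizer rather than a dedicated QP or SMO routine; the data and names are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny hand-made separable dataset (hypothetical), with phi(x) = x.
X = np.array([[2.0, 2.0], [3.0, 2.0], [-2.0, -2.0], [-3.0, -2.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                       # linear kernel Gram matrix

def neg_dual(a):
    # Negative of L~(a); scipy minimizes, while the dual is maximized.
    return -(a.sum() - 0.5 * (a * t) @ K @ (a * t))

res = minimize(
    neg_dual,
    x0=np.zeros(len(t)),
    bounds=[(0.0, None)] * len(t),                         # a_n >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ t}],  # sum_n a_n t_n = 0
)
a = res.x

# Evaluate a new point: y(x) = sum_n a_n t_n k(x, x_n) + b.
sv = a > 1e-6                               # support vectors
b = np.mean(t[sv] - (K @ (a * t))[sv])      # bias, see the next slide
x_new = np.array([1.0, 0.0])
print((a * t) @ (X @ x_new) + b)
```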

SLIDE 11

Estimation of the bias

Once w has been estimated, we can use it to estimate the bias. Averaging over the support set S (with N_S members) gives

$$b = \frac{1}{N_S} \sum_{n \in S} \left( t_n - \sum_{m \in S} a_m t_m k(x_n, x_m) \right)$$
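As a sanity check (my addition, not from the lecture), the same averaging formula can be applied to the support vectors of a fitted scikit-learn model and compared against the library's own intercept. The data and C value are illustrative; only margin support vectors (0 < a_n < C) are averaged, which matches the soft-margin version of the formula.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
t = np.hstack([-np.ones(20), np.ones(20)])

C = 10.0
clf = SVC(kernel="linear", C=C).fit(X, t)

at = clf.dual_coef_.ravel()        # a_n * t_n for the support vectors
S = clf.support_vectors_
t_S = t[clf.support_]
K = S @ S.T                        # linear kernel among support vectors

inside = np.abs(at) < C - 1e-6     # margin SVs only (0 < a_n < C)
b = np.mean(t_S[inside] - (K @ at)[inside])
print(b, clf.intercept_[0])        # the two values should agree closely
```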

SLIDE 12

Illustrative Synthetic Example

SLIDE 13

Status

- We have formulated the objective function
- It is still not clear how we will solve it!
- We have assumed the classes are separable
- What about messier data?

SLIDE 14

Overlapping class distributions

- Assume some data cannot be correctly classified
- Define a margin (slack) distance ξ_n = |t_n − y(x_n)| for points that violate the margin, with ξ_n = 0 otherwise
- Consider the regimes (they can be counted directly, as in the sketch below):
  1. ξ = 0: on or outside the correct margin boundary - correct classification
  2. 0 < ξ < 1: between the margin and the decision boundary, still correctly classified
  3. ξ = 1: exactly on the decision boundary
  4. 1 < ξ ≤ 2: on the wrong side of the decision boundary - misclassified
  5. ξ > 2: beyond the opposite margin - the point is definitely misclassified
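For a fitted model the slack values can be read off directly as ξ_n = max(0, 1 − t_n y(x_n)). The sketch below (illustrative data and parameters, not from the lecture) counts points in each of the regimes above.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes (hypothetical data and parameters).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(1, 1, (30, 2))])
t = np.hstack([-np.ones(30), np.ones(30)])

clf = SVC(kernel="linear", C=1.0).fit(X, t)

# Slack of each point: xi_n = max(0, 1 - t_n y(x_n)).
xi = np.maximum(0.0, 1.0 - t * clf.decision_function(X))

print((xi == 0).sum(), "on/outside the correct margin (xi = 0)")
print(((xi > 0) & (xi < 1)).sum(), "inside the margin, correct side")
print((xi >= 1).sum(), "on the boundary or misclassified (xi >= 1)")
```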

SLIDE 15

Overlap in margin

[Figure: overlapping-class data with decision boundary y = 0 and margins y = ±1; points on or outside the correct margin have ξ = 0, points inside the margin have ξ < 1, and misclassified points have ξ > 1.]

SLIDE 16

Recasting the problem

We now optimize not just for w but also for the misclassification penalty, minimizing

$$C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\|w\|^2$$

where C is a regularization coefficient. We have a new objective function

$$L(w, b, a) = \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \left\{ t_n y(x_n) - 1 + \xi_n \right\} - \sum_{n=1}^{N} \mu_n \xi_n$$

where the a_n and μ_n are Lagrange multipliers.

SLIDE 17

Optimization

As before, we take partial derivatives and find the extrema. The resulting objective function is

$$\tilde{L}(a) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)$$

which is just like before, but the constraints are a little different:

$$0 \leq a_n \leq C, \qquad \sum_{n=1}^{N} a_n t_n = 0$$

across all training samples. Many training samples will have a_n = 0, which is the same as saying they are not at the margin.
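A library solver makes the role of the box constraint visible: as C grows, slack is penalized harder and the support set typically shrinks. (In the earlier hand-rolled dual sketch, the only change would be the bounds, i.e. `bounds=[(0.0, C)] * len(t)`.) The sketch below uses hypothetical data and a placeholder parameter grid; it is not part of the lecture.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping classes (hypothetical data).
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(1, 1.2, (50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, t)
    # Larger C penalizes slack harder; fewer points typically remain
    # inside or on the margin, so the support set tends to shrink.
    print(f"C={C:>6}: {len(clf.support_)} support vectors")
```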

SLIDE 18

Generating a solution

- Solutions are generated through analysis of all the training data
- Reorganization enables some optimization (Vapnik, 1982)
- Sequential minimal optimization (SMO) is a common approach (Platt, 2000)
  - Considers pairwise interactions between Lagrange multipliers
  - Complexity is somewhere between linear and quadratic

SLIDE 19

Mixed example

[Figure: soft-margin SVM decision boundary and margins on overlapping two-class data; both axes run from −2 to 2.]

SLIDE 20

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 21

Multi-Class SVMs

Thus far the discussion has covered the two-class problem. How do we extend to K classes?

1. One versus the rest
2. Hierarchical trees - one vs. one
3. Coding the classes to generate a new problem

SLIDE 22

One versus the rest

- Train one classifier per class, with all the other classes serving as the negative training samples (see the sketch below)
- Training is typically skewed: too few positives compared to negatives
- The result is a better fit for the negatives
- One vs. rest implies extra complexity in training, ≈ K²
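To make the scheme concrete (my illustration, not from the lecture), scikit-learn's one-vs-rest wrapper trains one binary SVM per class; `class_weight="balanced"` is one common way to counteract the positive/negative skew just described. The dataset and parameters are placeholders.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Four well-separated clusters standing in for K = 4 classes.
X, y = make_blobs(n_samples=400, centers=4, random_state=0)

# One binary SVM per class; "balanced" reweights each subproblem to
# compensate for the 1-vs-(K-1) skew in positives vs. negatives.
ovr = OneVsRestClassifier(SVC(kernel="rbf", class_weight="balanced"))
ovr.fit(X, y)
print(ovr.predict(X[:5]), y[:5])
```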

SLIDE 23

Tree classifier

- Organize the problem as a tree of selections
- Best-first elimination: handle the easy cases first
- Based on pairwise comparison of classes
- Still requires extra comparisons, on the order of K² class pairs

SLIDE 24

Coding new classes

- Consider optimizing an error-correcting code over the classes
- The criterion function is designed to minimize coding errors
- Can be considered a generalization of voting-based strategies
- Poses a larger training challenge

SLIDE 25

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 26

The regression case

In regression the target is not separation of classes but minimization of the regression error, i.e.

$$\sum_{n=1}^{N} \{y_n - t_n\}^2 + \frac{\lambda}{2}\|w\|^2$$

The problem is similar to the case of overlapping classes. We can define an error function similar to the ξ term used for classification; an example could be the ε-insensitive error

$$E_\epsilon(y(x) - t) = \begin{cases} 0, & |y(x) - t| \leq \epsilon \\ |y(x) - t| - \epsilon, & \text{otherwise} \end{cases}$$
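A short scikit-learn sketch (illustrative, not from the slides): points that fall strictly inside the ε-tube incur no error and end up with a_n = 0, so widening the tube makes the model sparser. The data and parameters are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-3, 3, (80, 1)), axis=0)
t = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# Wider tubes (larger epsilon) ignore more points -> sparser models.
for eps in (0.05, 0.2, 0.5):
    reg = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, t)
    print(f"epsilon={eps}: {len(reg.support_)} of {len(X)} points are support vectors")
```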

SLIDE 27

Example ε error function

[Figure: the ε-insensitive error function E(z); it is zero on the interval from −ε to ε and grows linearly outside it.]

SLIDE 28

The regularized error function

The optimization is then with respect to the error function

$$C \sum_{n=1}^{N} E_\epsilon(y(x_n) - t_n) + \frac{1}{2}\|w\|^2$$

Just as before, we can define a Lagrangian to be optimized and set criteria for the optimization. The criteria are largely the same (see the book for details, Eqns. 7.56-7.60).

SLIDE 29

Regression illustration

[Figure: SVM regression on synthetic data; horizontal axis x, vertical axis t from −1 to 1, showing the fitted curve and its ε-tube.]

SLIDE 30

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 31

Categorization of Rooms

- Example of using an SVM for room categorization
- Recognition of different types of rooms across extended periods
- Training data recorded over a period of 6 months
- Training and evaluation across 3 different settings
- Extensive evaluation

SLIDE 32

Room Categories

SLIDE 33

Training Organization

SLIDE 34

Training Organization

SLIDE 35

Preprocessing of data

SLIDE 36

SVM details

The system uses a χ² kernel, which is widely used for histogram comparison. The kernel is defined as

$$K(x, y) = e^{-\gamma \chi^2(x, y)}, \qquad \chi^2(x, y) = \sum_i \frac{(x_i - y_i)^2}{x_i + y_i}$$

Initially introduced by Marszalek, et al., IJCV 2007. Trained using "one vs the rest".
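The same kernel is available off the shelf. The sketch below (my illustration, with hypothetical histogram data and a placeholder γ that would be tuned by cross-validation in practice) feeds a precomputed χ² kernel to scikit-learn's SVC and wraps it one-vs-rest to match the training scheme above.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Stand-ins for normalized visual-word histograms (hypothetical data).
X_train = rng.dirichlet(np.ones(16), size=60)
y_train = np.repeat([0, 1, 2], 20)
X_test = rng.dirichlet(np.ones(16), size=9)

gamma = 0.5   # placeholder value
K_train = chi2_kernel(X_train, gamma=gamma)           # exp(-gamma * chi2)
K_test = chi2_kernel(X_test, X_train, gamma=gamma)    # (n_test, n_train)

# One binary SVM per class on the precomputed Gram matrix.
clf = OneVsRestClassifier(SVC(kernel="precomputed")).fit(K_train, y_train)
print(clf.predict(K_test))
```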

SLIDE 37

SVM results - Video

SLIDE 38

The recognition results

SLIDE 39

Another small example

How to remove dependency on background? (Roobaert, 1999)

SLIDE 40

Smart use of SVMs - a "hack" with applications

SLIDE 41

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 42

Summary

- An approach that stores only the "key" data for recognition/regression
- Defines an optimization that identifies those key data points
- The learning is fairly involved (complex): basically a quadratic optimization problem, evaluated across all training data, keeping only the essential data
  1. Training can be costly
  2. Execution can be fast - optimized
- Multi-class cases can pose a bit of a challenge
