
Learning with kernels and SVM

Šámalova chata, May 23, 2006

Petra Kudová


Outline

Introduction
Binary classification
Learning with Kernels
Support Vector Machines
Demo
Conclusion


Learning from data

The goal is to find a general rule that explains data given only as a sample of limited size; the data may contain measurement errors or noise.

Supervised learning: the data are a sample of input-output pairs, and the task is to find the input-output mapping. Examples: prediction, classification, function approximation, etc.

Unsupervised learning: the data are a sample of objects, and the task is to find some structure in them. Example: clustering.


Learning methods

A wide range of methods is available:

Statistical approaches

Neural networks: originally biologically motivated; multi-layer perceptrons, RBF networks, Kohonen maps

Kernel methods: modern and popular; SVM


Trends in machine learning

Articles on machine learning found by Google



Articles on neural networks found by Google


Articles on support vector machines found by Google

Source: http://yaroslavvb.blogspot.com/


Binary classification

Training set $\{(x_i, y_i)\}_{i=1}^{m}$, with inputs $x_i \in X$ and labels $y_i \in \{-1, +1\}$. The task: find a classifier that generalizes well beyond the training sample.


Simple Classifier

Suppose $X \subset \mathbb{R}^n$ and the classes are linearly separable. Assign $x$ to the class whose mean is closer:

$$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} x_i, \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} x_i, \qquad c = \frac{1}{2}(c_+ + c_-)$$

$$y = \mathrm{sgn}\,\langle x - c,\, w \rangle = \mathrm{sgn}\,\big\langle x - \tfrac{1}{2}(c_+ + c_-),\; c_+ - c_- \big\rangle = \mathrm{sgn}\big(\langle x, c_+ \rangle - \langle x, c_- \rangle + b\big)$$

$$b = \frac{1}{2}\big(\|c_-\|^2 - \|c_+\|^2\big)$$
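A minimal sketch of this mean-of-classes classifier in Python with NumPy (the toy data and function names are mine, chosen for illustration):

```python
import numpy as np

def fit_simple_classifier(X, y):
    """Class means c+ and c-, and offset b = (||c-||^2 - ||c+||^2) / 2."""
    c_plus = X[y == +1].mean(axis=0)
    c_minus = X[y == -1].mean(axis=0)
    b = 0.5 * (c_minus @ c_minus - c_plus @ c_plus)
    return c_plus, c_minus, b

def predict(x, c_plus, c_minus, b):
    # y = sgn(<x, c+> - <x, c-> + b)
    return np.sign(x @ c_plus - x @ c_minus + b)

# toy linearly separable data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (10, 2)), rng.normal(2.0, 0.5, (10, 2))])
y = np.array([-1] * 10 + [+1] * 10)

c_plus, c_minus, b = fit_simple_classifier(X, y)
print(predict(np.array([1.5, 1.5]), c_plus, c_minus, b))  # expect +1.0
```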


Mapping to the feature space

Life is not so easy: not all problems are linearly separable, and what if $X$ is not a dot-product space at all? Choose a mapping into some (possibly high-dimensional) dot-product space $H$, the feature space:

$$\Phi : X \to H$$


Mercer’s condition and Kernels

If a symmetric function $K(x, y)$ satisfies

$$\sum_{i,j=1}^{M} a_i a_j K(x_i, x_j) \ge 0$$

for all $M \in \mathbb{N}$, all $x_i$, and all $a_i \in \mathbb{R}$, then there exists a mapping $\Phi$ that maps $x$ into a dot-product feature space such that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle$, and vice versa. Such a function $K$ is called a kernel.
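In matrix form, the condition says that every Gram matrix $(K(x_i, x_j))_{i,j}$ must be positive semidefinite. A quick numerical check of that property, sketched here for the plain dot-product kernel (the data and names are illustrative):

```python
import numpy as np

def gram(X, kernel):
    # Gram matrix K_ij = kernel(x_i, x_j)
    return np.array([[kernel(a, c) for c in X] for a in X])

dot_kernel = lambda x, z: x @ z

X = np.random.default_rng(1).normal(size=(30, 2))
K = gram(X, dot_kernel)
eigenvalues = np.linalg.eigvalsh(K)  # symmetric matrix -> real spectrum
print(eigenvalues.min() >= -1e-10)   # PSD up to numerical round-off
```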


Examples of kernels

Linear kernel: $K(x, y) = \langle x, y \rangle$

Polynomial kernel: $K(x, y) = (\langle x, y \rangle + 1)^d$. For $d = 2$ and 2-dimensional inputs:

$$K(x, y) = 1 + 2x_1 y_1 + 2x_2 y_2 + 2x_1 y_1 x_2 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2 = \langle \Phi(x), \Phi(y) \rangle$$

$$\Phi(x) = \big(1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ \sqrt{2}\,x_1 x_2,\ x_1^2,\ x_2^2\big)^T$$
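The identity can be verified numerically; a small sketch (phi follows the feature map written above):

```python
import numpy as np

def phi(x):
    # explicit feature map for (<x, y> + 1)^2 with 2-dimensional inputs
    return np.array([1.0,
                     np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     np.sqrt(2) * x[0] * x[1],
                     x[0] ** 2, x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((x @ z + 1.0) ** 2, phi(x) @ phi(z))  # both print 4.0
```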


Examples of kernels

RBF kernel: $K(x, y) = \exp\left(-\dfrac{\|x - y\|^2}{d^2}\right)$

Other kernels: kernels exist for various objects such as graphs, strings, texts, etc. They enable us to apply dot-product-based algorithms to such data, and they act as a measure of similarity.
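A sketch of the RBF kernel acting as a similarity measure (the width d is a free parameter; the value used here is arbitrary):

```python
import numpy as np

def rbf_kernel(x, z, d=1.0):
    # K(x, z) = exp(-||x - z||^2 / d^2)
    return np.exp(-np.sum((x - z) ** 2) / d ** 2)

x = np.array([0.0, 0.0])
print(rbf_kernel(x, x))                      # 1.0: maximal similarity to itself
print(rbf_kernel(x, np.array([3.0, 4.0])))   # ~1.4e-11: distant points, near zero
```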


Simple Classifier - kernel version

Suppose now that $X$ is any set and $\Phi : X \to H$ is the mapping corresponding to a kernel $K$. The same classifier is built in the feature space, with the class means taken over the mapped points:

$$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} \Phi(x_i), \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} \Phi(x_i), \qquad c = \frac{1}{2}(c_+ + c_-)$$

$$y = \mathrm{sgn}\big(\langle \Phi(x), c_+ \rangle - \langle \Phi(x), c_- \rangle + b\big), \qquad b = \frac{1}{2}\big(\|c_-\|^2 - \|c_+\|^2\big)$$


Expanding the dot products, the classifier can be written purely in terms of kernel evaluations:

$$y = \mathrm{sgn}\left( \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} K(x, x_i) \;-\; \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} K(x, x_i) \;+\; b \right)$$

$$b = \frac{1}{2}\left( \frac{1}{m_-^2} \sum_{\{i,j \mid y_i = y_j = -1\}} K(x_i, x_j) \;-\; \frac{1}{m_+^2} \sum_{\{i,j \mid y_i = y_j = +1\}} K(x_i, x_j) \right)$$

Statistical approach: this is a special case of the Bayes classifier when $\int_X K(x, y)\,dx = 1$ for all $y \in X$ and $b = 0$.
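A kernelized sketch of the simple classifier above; note that it only ever evaluates K, never Φ (the function names are mine):

```python
import numpy as np

def kernel_simple_classifier(x, X, y, kernel):
    pos, neg = X[y == +1], X[y == -1]
    k_plus = np.mean([kernel(x, xi) for xi in pos])
    k_minus = np.mean([kernel(x, xi) for xi in neg])
    # b = ((1/m-^2) sum of K over (-,-) pairs - (1/m+^2) sum over (+,+) pairs) / 2
    b = 0.5 * (np.mean([[kernel(a, c) for c in neg] for a in neg])
               - np.mean([[kernel(a, c) for c in pos] for a in pos]))
    return np.sign(k_plus - k_minus + b)
```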


With $b = 0$ the classifier becomes

$$y = \mathrm{sgn}\left( \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} K(x, x_i) \;-\; \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} K(x, x_i) \right),$$

i.e. the sign of $p_+(x) - p_-(x)$, where $p_+$ and $p_-$ are Parzen window estimates of the two class densities. Under the normalization $\int_X K(x, y)\,dx = 1$ for all $y \in X$, and with $b = 0$, this is the Bayes classifier.
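A sketch of that reading, comparing two Parzen window estimates (the RBF-style window and its width are my illustrative choices):

```python
import numpy as np

def parzen_estimate(x, Xc, d=1.0):
    # class-conditional estimate p(x) = (1/m) sum_i K(x, x_i)
    return np.mean([np.exp(-np.sum((x - xi) ** 2) / d ** 2) for xi in Xc])

def classify(x, X_plus, X_minus):
    # sign of p+(x) - p-(x), i.e. the b = 0 kernel classifier above
    return np.sign(parzen_estimate(x, X_plus) - parzen_estimate(x, X_minus))
```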


Separating hyperplane

Classifier of the form $y(x) = \mathrm{sgn}(\langle w, x \rangle + b)$ with

$$\langle w, x_i \rangle + b \;\begin{cases} > 0 & \text{for } y_i = +1 \\ < 0 & \text{for } y_i = -1 \end{cases}$$

Every hyperplane $D(x) = \langle w, x \rangle + b = c$ with $-1 < c < 1$ is separating. The optimal separating hyperplane is the one with the maximal margin.


Separating hyperplane

classifier in a form y(x) = sgn(w, x + b) w, xi+b

  • ≥ 1

for yi = 1 ≤ −1 for yi = −1 each hyperplane D(x) = w, x+b = c, −1 < c < 1 is separating

  • ptimal separating hyperplane - the one with the maximal

margin


Classifier with maximal margin


$y(x) = \mathrm{sgn}(\langle w, x \rangle + b)$, where $w$ and $b$ solve

$$\min_{w, b} Q(w) = \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \quad i = 1, \dots, M$$

Under the canonical constraints the margin equals $2 / \|w\|$, so minimizing $\frac{1}{2}\|w\|^2$ maximizes the margin. This is a quadratic programming problem: linear separability guarantees that a solution exists, and there are no local minima.


The constrained optimization problem

$$\min_{w} \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1$$

can be handled by introducing Lagrange multipliers $\alpha_i \ge 0$:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i(\langle w, x_i \rangle + b) - 1 \big)$$

minimized with respect to $w$ and $b$, maximized with respect to the $\alpha_i$.


Setting the derivatives of $L$ to zero, per the Karush-Kuhn-Tucker (KKT) conditions,

$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0, \qquad \frac{\partial L(w, b, \alpha)}{\partial b} = 0,$$

we get

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$

Complementarity then sorts the training points: $y_i(\langle w, x_i \rangle + b) > 1 \Rightarrow \alpha_i = 0$ ($x_i$ is irrelevant), while $y_i(\langle w, x_i \rangle + b) = 1 \Rightarrow \alpha_i \ge 0$ ($x_i$ is a support vector).


Dual problem

Substituting these back into $L$, we get the dual problem:

$$\max_{\alpha \in \mathbb{R}^m} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$\text{subject to} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$

The resulting classifier is the (hard margin) support vector machine (SVM):

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i \langle x, x_i \rangle + b \right), \qquad b = y_i - \langle w, x_i \rangle \ \text{ for any support vector } x_i$$
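The dual is a standard QP, so any off-the-shelf solver applies; the slides do not prescribe one, so here is a minimal sketch with cvxopt, rewriting the problem as minimization of $\frac{1}{2}\alpha^T P \alpha - \mathbf{1}^T \alpha$ with $P_{ij} = y_i y_j \langle x_i, x_j \rangle$:

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    y = np.asarray(y, dtype=float)
    m = len(y)
    P = matrix(np.outer(y, y) * (X @ X.T))   # P_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones(m))                  # minus sign: max sum(alpha) -> min
    G = matrix(-np.eye(m))                   # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(m))
    A = matrix(y.reshape(1, -1))             # sum_i alpha_i y_i = 0
    b0 = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b0)['x'])

    sv = alpha > 1e-6                        # support vectors: alpha_i > 0
    w = (alpha[sv] * y[sv]) @ X[sv]          # w = sum_i alpha_i y_i x_i
    b = np.mean(y[sv] - X[sv] @ w)           # b = y_i - <w, x_i>, averaged over SVs
    return w, b, sv
```

Points strictly outside the margin come out with $\alpha_i \approx 0$, so the sv mask recovers exactly the support vectors predicted by the KKT conditions above.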


Support Vector Machine

Work in the feature space and compute dot products via the kernel. Classifier:

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i K(x, x_i) + b \right)$$

$$\max_{\alpha \in \mathbb{R}^m} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
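In practice one rarely solves this QP by hand; for instance, scikit-learn wraps LIBSVM (listed under Software below). A sketch with assumed hyperparameter values (note that sklearn's gamma plays the role of $1/d^2$ in the RBF kernel above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel='rbf', C=10.0, gamma=0.5)  # C and gamma chosen arbitrarily here
clf.fit(X, y)
print(clf.support_.size)          # number of support vectors found
print(clf.predict([[2.0, 2.0]]))  # expect [+1]
```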


Soft margin SVM

A separating hyperplane may not exist (high level of noise, overlap of classes, etc.).

Introduce slack variables $\xi_i$:

$$y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i$$

and minimize

$$Q(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i^2$$

where $C > 0$ controls the trade-off between maximizing the margin and minimizing the training error (its appropriate value depends on the noise level).


The solution has the form

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i K(x, x_i) + b \right)$$

$$\max_{\alpha \in \mathbb{R}^m} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \frac{\delta_{ij}}{C} \right)$$

$$\text{subject to} \quad \alpha_i \ge 0, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0$$
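Note that this is just the hard-margin dual with the Gram matrix shifted on the diagonal, so a hard-margin solver (such as the earlier QP sketch) can be reused unchanged; a sketch of that observation:

```python
import numpy as np

def soft_margin_gram(K, C):
    # 2-norm soft margin: K_ij -> K_ij + delta_ij / C, then solve
    # the same dual QP as in the hard-margin case
    return K + np.eye(K.shape[0]) / C
```

Large C shrinks the diagonal shift and approaches the hard margin; small C regularizes more heavily.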


SVM - summary

Input points are mapped to the feature space, and the dot product there is computed by means of the kernel function. Classification is done via the separating hyperplane with maximal margin. That hyperplane is determined by the support vectors; the other training samples are irrelevant. If the data are not separable in the feature space (noise, etc.), use the soft margin, and control the trade-off between maximal margin and minimal training error via $C$.


SVM vs. Neural Networks

+ maximization of generalization ability
+ no local minima
− extension to multiclass problems is not straightforward
− long training time: the number of variables equals the number of data points, though this is not necessarily prohibitive, as many techniques to reduce training time exist
− selection of parameters: the kernel function and C (as sketched below)
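For the parameter-selection point, cross-validated grid search is the usual remedy; a sketch with an assumed parameter grid (the values are arbitrary):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (30, 2)), rng.normal(1.0, 1.0, (30, 2))])
y = np.array([-1] * 30 + [+1] * 30)

param_grid = {'C': [0.1, 1.0, 10.0, 100.0], 'gamma': [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)  # 5-fold CV per setting
search.fit(X, y)
print(search.best_params_, search.best_score_)
```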


References

Shigeo Abe: Support Vector Machines for Pattern Classification. Springer, 2005.

Bernhard Schölkopf and Alex Smola: Learning with Kernels. MIT Press, Cambridge, MA, 2002. (Source of the "sheep vectors" illustrations.)

Ralf Herbrich: Learning Kernel Classifiers. MIT Press, Cambridge, MA, 2002.

http://www.kernel-machines.org/


Software

LIBSVM (by Chih-Chung Chang and Chih-Jen Lin): http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Matlab SVM toolbox (by S. Gunn): http://www.isis.ecs.soton.ac.uk/resources/svminfo/


Questions?