SLIDE 1

Machine Learning

Support Vector Machines

Rui Xia
Text Mining Group
Nanjing University of Science & Technology
rxia@njust.edu.cn

SLIDE 2

Outline

  • Maximum Margin Linear Classifier
  • Duality Optimization
  • Soft-margin SVM
  • Kernel Functions
  • *Sequential Minimal Optimization
  • The Usage of SVM Toolkits

SLIDE 3

Maximum Margin Linear Classifier

SLIDE 4

Recall Previous Linear Classifier

Which linear hyper-plane is better? Which learning criterion should we choose?

  • Perceptron criterion
  • Cross-entropy criterion (logistic regression)
  • Least mean square (LMS) criterion

SLIDE 5

Maximum Margin Criterion

SLIDE 6

Distance from Point to Hyper-plane

  • Distance (positive side)
  • Hyper-plane
  • Linear Model
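
In standard notation, the hyper-plane is $w^\top x + b = 0$, the linear model is $f(x) = w^\top x + b$, and the distance from a point $x_0$ on the positive side to the hyper-plane is

$$r = \frac{w^\top x_0 + b}{\|w\|}$$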

SLIDE 7

Geometric Distance & Functional Distance

  • Distance (negative side)
  • Geometric distance (uniform expression)
  • Functional distance
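
In standard notation, with labels $y_i \in \{+1, -1\}$, the geometric distance admits the uniform expression

$$\gamma_i = \frac{y_i\,(w^\top x_i + b)}{\|w\|},$$

which covers both the positive and the negative side, while the functional distance is $\hat{\gamma}_i = y_i\,(w^\top x_i + b)$, so that $\gamma_i = \hat{\gamma}_i / \|w\|$.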

SLIDE 8

Parameter Scaling

  • Geometric margin: independent of the scale factor
  • Scaling the parameter by a scale factor
  • Functional margin: proportional to the scale factor
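
Concretely, scaling the parameters by a factor $\kappa > 0$,

$$(w, b) \;\longrightarrow\; (\kappa w, \kappa b),$$

multiplies the functional margin by $\kappa$ but leaves the geometric margin $\hat{\gamma} / \|w\|$ unchanged.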

SLIDE 9

Maximum Margin Criterion

  • Formulation 1
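
In standard form, Formulation 1 maximizes the geometric margin directly:

$$\max_{w,b}\ \gamma \quad \text{s.t.}\quad \frac{y_i\,(w^\top x_i + b)}{\|w\|} \ge \gamma,\quad i = 1,\dots,N$$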

SLIDE 10

Maximum Margin Criterion

  • Formulation 2
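
In standard form, Formulation 2 rewrites the same problem in terms of the functional margin:

$$\max_{w,b}\ \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge \hat{\gamma},\quad i = 1,\dots,N$$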

SLIDE 11

Maximum Margin Criterion

  • In this constraint, the value of the functional margin can be fixed by rescaling the parameters
  • Scaling constraint: choose the scale of $(w, b)$ so that $\hat{\gamma} = 1$

SLIDE 12

Maximum Margin Criterion

  • Formulation 3
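
In standard form, with the scaling constraint $\hat{\gamma} = 1$, and since maximizing $1/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$, Formulation 3 is the familiar quadratic program:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N$$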

SLIDE 13

Duality Optimization

SLIDE 14

Lagrange Multiplier

  • In the case of an equality constraint

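In standard notation, for $\min_x f(x)$ subject to $h(x) = 0$, the Lagrangian and its optimality conditions are

$$L(x, \beta) = f(x) + \beta\, h(x), \qquad \nabla_x f(x) + \beta\, \nabla_x h(x) = 0, \qquad h(x) = 0$$
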
SLIDE 15

An Example

SLIDE 16

Lagrange Multiplier

  • In the case of an inequality constraint $g(x) \le 0$, the constraint at the optimum is either active ($g(x^*) = 0$, multiplier $\lambda > 0$) or inactive ($g(x^*) < 0$, $\lambda = 0$)

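In standard notation, for $\min_x f(x)$ subject to $g(x) \le 0$ with multiplier $\lambda$,

$$L(x, \lambda) = f(x) + \lambda\, g(x), \qquad \nabla_x L = 0, \qquad g(x) \le 0, \qquad \lambda \ge 0, \qquad \lambda\, g(x) = 0,$$

where the last condition encodes the active/inactive dichotomy.
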
SLIDE 17

An Illustration

SLIDE 18

Lagrange Multiplier

  • In the case of multiple equality and inequality constraints, the Karush–Kuhn–Tucker (KKT) conditions comprise:

– Stationarity
– Primal feasibility
– Dual feasibility
– Complementary slackness

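In standard notation, for $\min_x f(x)$ subject to $g_i(x) \le 0$ and $h_j(x) = 0$, with Lagrangian $L = f + \sum_i \lambda_i g_i + \sum_j \beta_j h_j$, the KKT conditions read

$$\nabla_x L = 0 \ \ (\text{stationarity}), \qquad g_i(x) \le 0,\ h_j(x) = 0 \ \ (\text{primal feasibility}),$$
$$\lambda_i \ge 0 \ \ (\text{dual feasibility}), \qquad \lambda_i\, g_i(x) = 0 \ \ (\text{complementary slackness})$$
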
SLIDE 19

Generalized Lagrangian and Duality

  • Primal Optimization Problem
  • Generalized Lagrangian
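
In standard form:

$$\min_w\ f(w) \quad \text{s.t.}\quad g_i(w) \le 0,\ i = 1,\dots,k; \qquad h_j(w) = 0,\ j = 1,\dots,l$$
$$L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i\, g_i(w) + \sum_{j=1}^{l} \beta_j\, h_j(w), \qquad \alpha_i \ge 0$$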

SLIDE 20

Min-max of the Lagrangian

$$\theta_P(w) = \max_{\alpha,\beta:\,\alpha_i \ge 0} L(w, \alpha, \beta) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ +\infty & \text{otherwise} \end{cases}$$

$$\min_w\ \theta_P(w) = \min_w\ f(w) = p^*$$

SLIDE 21

Primal Problem & Dual Problem

  • The primal problem (min-max of Lagrangian)
  • Max-min vs. Min-max
  • The dual problem (max-min of Lagrangian)

When does the equality hold?
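
Max-min never exceeds min-max, which is weak duality:

$$d^* = \max_{\alpha,\beta:\,\alpha_i \ge 0}\ \min_w\ L(w, \alpha, \beta) \;\le\; \min_w\ \max_{\alpha,\beta:\,\alpha_i \ge 0}\ L(w, \alpha, \beta) = p^*$$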

SLIDE 22

Equivalency of Two Problems

  • The equality holds when

– $f$ and the $g_i$'s are convex, and the $h_i$'s are affine;
– the $g_i$'s are (strictly) feasible: there exists some $w$ such that $g_i(w) < 0$.

  • Under these conditions, strong duality holds:

$$d^* = \max_{\alpha,\beta:\,\alpha_i \ge 0}\ \min_w\ L(w, \alpha, \beta) = \min_w\ \max_{\alpha,\beta:\,\alpha_i \ge 0}\ L(w, \alpha, \beta) = p^*$$

  • Equivalency of the primal and dual problems: solving the dual problem yields the solution of the primal problem

SLIDE 23

Karush–Kuhn–Tucker (KKT) Conditions

  • Furthermore, the solutions of the primal and dual problems satisfy the KKT conditions:

– Stationarity
– Primal feasibility
– Dual feasibility
– Complementary slackness

  • Under the convexity assumptions above, the KKT conditions are a sufficient and necessary condition for optimality

SLIDE 24

Lagrangian for SVM

  • The optimization problem of SVM
  • The Lagrangian
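
In standard form, the SVM primal and its Lagrangian are

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N$$
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i\,(w^\top x_i + b) - 1 \right], \qquad \alpha_i \ge 0$$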

SLIDE 25

Minimization of the Lagrangian

  • Take the derivative of the Lagrangian
  • Plug back into the Lagrangian
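
Setting the derivatives to zero gives

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i} \alpha_i y_i = 0,$$

and plugging these back into the Lagrangian leaves a function of $\alpha$ alone:

$$L(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j$$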

SLIDE 26

Dual Problem of SVM

  • Dual Problem
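
In standard form:

$$\max_\alpha\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j \quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{N} \alpha_i y_i = 0$$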

Guarantee that the KKT conditions are satisfied.

SLIDE 27

Why “Support Vector”?

  • Decision function
  • KKT conditions
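
In standard form, the decision function and the relevant KKT complementarity condition are

$$f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i\, x_i^\top x + b \Big), \qquad \alpha_i \left[ y_i\,(w^\top x_i + b) - 1 \right] = 0.$$

Hence $\alpha_i > 0$ only for points with $y_i\,(w^\top x_i + b) = 1$, i.e., the points lying exactly on the margin: the support vectors. All other points have $\alpha_i = 0$ and do not affect the decision function.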

SLIDE 28

The Value of the Bias

For any support vector $x_s$: $w^\top x_s + b = +1$ if $x_s$ is a positive support vector, and $w^\top x_s + b = -1$ if it is a negative support vector.

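In standard form, for any support vector $x_s$ with label $y_s$,

$$b = y_s - \sum_{i} \alpha_i y_i\, x_i^\top x_s;$$

in practice the value is often averaged over all support vectors for numerical stability.
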
SLIDE 29

One Remaining Problem

  • Dual Problem of SVM
  • Decision function

How do we compute $\alpha$? How do we solve the dual optimization problem?

SLIDE 30

Soft-margin SVM

SLIDE 31

Linearly Non-separable Case

(Left: linearly separable data; right: linearly non-separable data)

SLIDE 32

Soft Margin Criterion

Maximum margin

Soft margin
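
In standard form, the maximum-margin objective is relaxed with slack variables $\xi_i$ and a penalty weight $C$:

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$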

SLIDE 33

Three Types of Slacks
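
In standard terms, the three cases are

$$\xi_i = 0 \ \text{(on or outside the margin, correctly classified)}, \qquad 0 < \xi_i \le 1 \ \text{(inside the margin, still correctly classified)}, \qquad \xi_i > 1 \ \text{(misclassified)}$$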

SLIDE 34

Lagrangian for Soft-margin SVM

  • Lagrangian form

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i\,(w^\top x_i + b) - 1 + \xi_i \right] - \sum_i \mu_i \xi_i$$

  • Recall the equivalency of the primal and dual problems: under the convexity conditions, min-max equals max-min, so we solve the dual problem instead

SLIDE 35

Dual Problem for Soft-margin SVM

  • Gradient: set the derivatives of the Lagrangian with respect to $w$, $b$, and $\xi_i$ to zero
  • Plug back into the Lagrangian

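Setting the gradient to zero gives

$$w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad C - \alpha_i - \mu_i = 0,$$

and plugging back yields the soft-margin dual:

$$\max_\alpha\ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, x_i^\top x_j \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0$$
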
SLIDE 36

Maximum-margin SVM vs. Soft-margin SVM

  • Maximum-margin SVM: dual constraint $\alpha_i \ge 0$
  • Soft-margin SVM: dual constraint $0 \le \alpha_i \le C$; the two duals are otherwise identical

SLIDE 37

KKT Complementarity Condition

  • Two KKT complementarity conditions
  • Some useful conclusions
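
In standard form, the two complementarity conditions are

$$\alpha_i \left[ y_i\,(w^\top x_i + b) - 1 + \xi_i \right] = 0, \qquad \mu_i\, \xi_i = (C - \alpha_i)\, \xi_i = 0,$$

from which: $\alpha_i = 0$ implies $y_i f(x_i) \ge 1$ (on or outside the margin); $0 < \alpha_i < C$ implies $\xi_i = 0$ and $y_i f(x_i) = 1$ (exactly on the margin); $\alpha_i = C$ allows $\xi_i > 0$ (inside the margin or misclassified).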

SLIDE 38

Slacks and Support Vectors

SLIDE 39

Kernel Functions

SLIDE 40

From Low-dimensional Non-separable to Higher-dimensional Separable

SLIDE 41

From Low Dimension to Higher Dimension

  • Feature space mapping $\phi$: map the data from a low-dimensional space, where they are non-separable, to a higher-dimensional space, where they become separable

SLIDE 42

Kernel Functions

  • Definition: inner product in the higher-dimensional feature space
  • An example
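
In standard notation, a kernel computes the inner product in the mapped feature space,

$$K(x, z) = \phi(x)^\top \phi(z).$$

A textbook example: in $\mathbb{R}^2$, the kernel $K(x, z) = (x^\top z)^2$ corresponds to the mapping $\phi(x) = (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)$, since expanding $(x^\top z)^2$ gives exactly $\phi(x)^\top \phi(z)$ without ever forming $\phi$ explicitly.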

SLIDE 43

SVM in Higher-dimensional Feature Space

  • Decision function
  • Training process

SLIDE 44

Kernel Trick in SVM

  • Kernel Trick in SVM

– Sometimes it’s hard to know the exact projection function, but relatively easy to know the Kernel function – In SVM, all of the calculations of feature vectors are in the form of product – Therefore, we only need to know the Kernel function used in SVM, but without the need to know the exact projection function.

SLIDE 45
Mercer Condition

  • Kernel matrix

– For any finite set of points $\{x_1, \dots, x_m\}$
– Element of the kernel matrix: $K_{ij} = K(x_i, x_j)$

  • Mercer theorem
  • A valid kernel satisfies:

– Symmetric: $K_{ij} = K_{ji}$
– Positive semi-definite: $z^\top K z \ge 0$ for all $z$
SLIDE 46

Common Kernel Functions

  • Linear kernel
  • Polynomial kernel
  • Gaussian kernel
  • Sigmoid kernel, pyramid kernel, string kernel, tree kernel…
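
Typical parameterizations (conventions vary across toolkits):

$$K(x, z) = x^\top z, \qquad K(x, z) = (\gamma\, x^\top z + r)^d, \qquad K(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)$$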

SLIDE 47

Kernel SVM

  • Decision
  • Training
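
In standard form, the inner products of the linear SVM are replaced by kernel evaluations:

$$f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i\, K(x_i, x) + b \Big); \qquad \max_\alpha\ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, K(x_i, x_j) \ \ \text{s.t.}\ \ \alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0$$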

SLIDE 48

Soft-margin Kernel SVM

  • Decision
  • Training
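
The soft-margin kernel SVM combines the box constraint $0 \le \alpha_i \le C$ with the kernelized dual above. The outline also lists "The Usage of SVM Toolkits"; as a minimal, hedged illustration (assuming scikit-learn is available; the toy dataset and the values of C and gamma are arbitrary examples, not the course's), training such a model looks like this:

```python
# A minimal sketch of using an SVM toolkit (here scikit-learn) to train a
# soft-margin kernel SVM. Assumptions: scikit-learn is installed; the toy
# dataset and the hyper-parameter values are arbitrary examples.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # toy 2-D data

# C is the soft-margin penalty; kernel='rbf' is the Gaussian kernel.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("bias b:", clf.intercept_)
print("training accuracy:", clf.score(X, y))
```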

SLIDE 49

Sequential Minimal Optimization

SLIDE 50

Coordinate Ascent

  • Consider an unconstrained optimization problem
  • Coordinate Ascent Algorithm
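
The algorithm box on this slide did not survive extraction. A minimal sketch of coordinate ascent in Python (the concave quadratic objective and all names below are made-up for illustration):

```python
import numpy as np

def coordinate_ascent(argmax_coord, alpha, n_iters=100):
    """Coordinate ascent: repeatedly maximize the objective over one
    coordinate while holding all the other coordinates fixed."""
    for _ in range(n_iters):
        for i in range(len(alpha)):
            # argmax_coord(alpha, i) returns the value of alpha[i] that
            # maximizes the objective with the other coordinates fixed.
            alpha[i] = argmax_coord(alpha, i)
    return alpha

# Made-up example: maximize f(a) = -0.5 * a^T Q a + p^T a (concave quadratic).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, 1.0])

def argmax_coord(a, i):
    # Solve df/da_i = 0 for a_i:  Q_ii * a_i = p_i - sum_{j != i} Q_ij * a_j
    rest = Q[i] @ a - Q[i, i] * a[i]
    return (p[i] - rest) / Q[i, i]

print(coordinate_ascent(argmax_coord, np.zeros(2)))  # converges to Q^{-1} p
```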

SLIDE 51

Coordinate Ascent

  • An Example

SLIDE 52

Recall the Dual Problem in SVM

  • The Dual Optimization Problem
  • KKT Conditions

SLIDE 53

Coordinate Ascent in SVM

  • Choose two coordinates for optimization each time
  • Which two coordinates should be chosen?

SLIDE 54

The SMO Algorithm

SLIDE 55

The SMO Algorithm

  • Optimize by setting the gradient to zero
  • Variable Elimination using Equality Constraint
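
From the equality constraint of the dual, with all other multipliers fixed,

$$\alpha_1 y_1 + \alpha_2 y_2 = \zeta \ (\text{a constant}) \;\Rightarrow\; \alpha_1 = y_1\,(\zeta - \alpha_2 y_2)$$

(using $y_1^2 = 1$), so the two-variable subproblem reduces to a single-variable quadratic in $\alpha_2$.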

SLIDE 56

The SMO Updating

  • Make use of the equality constraint and the prediction errors $E_i$
  • We finally have the update shown below

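In standard SMO notation, with prediction errors $E_i = f(x_i) - y_i$ and kernel entries $K_{ij} = K(x_i, x_j)$, the unconstrained update is

$$\alpha_2^{\text{new}} = \alpha_2^{\text{old}} + \frac{y_2\,(E_1 - E_2)}{\eta}, \qquad \eta = K_{11} + K_{22} - 2 K_{12}$$
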
SLIDE 57

Adding Inequality Constraints

  • Equality Constraints
  • Inequality Constraints
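
Together the constraints are

$$\alpha_1 y_1 + \alpha_2 y_2 = \zeta, \qquad 0 \le \alpha_1, \alpha_2 \le C,$$

which confine $(\alpha_1, \alpha_2)$ to a line segment inside the box $[0, C]^2$ and hence bound $\alpha_2$ by some $L \le \alpha_2 \le H$.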

SLIDE 58

Final Updating of Two Multipliers

  • In the case $y_1 \ne y_2$
  • In the case $y_1 = y_2$
  • Final updating: clip $\alpha_2^{\text{new}}$ to $[L, H]$, then recover $\alpha_1$ (see below)

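In standard SMO notation, the bounds and the final clipped update are

$$y_1 \ne y_2:\quad L = \max(0,\ \alpha_2 - \alpha_1), \qquad H = \min(C,\ C + \alpha_2 - \alpha_1)$$
$$y_1 = y_2:\quad L = \max(0,\ \alpha_1 + \alpha_2 - C), \qquad H = \min(C,\ \alpha_1 + \alpha_2)$$
$$\alpha_2^{\text{new,clipped}} = \min\big(H,\ \max(L,\ \alpha_2^{\text{new}})\big), \qquad \alpha_1^{\text{new}} = \alpha_1 + y_1 y_2\,(\alpha_2 - \alpha_2^{\text{new,clipped}})$$
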
SLIDE 59

Heuristics to Choose Two Multipliers

  • First, choose a Lagrange multiplier that violates the KKT conditions (Osuna's theorem)
  • Second, choose the multiplier that maximizes $|E_1 - E_2|$

SLIDE 60

Updating of the Bias

  • Choose $b$ so that the KKT conditions hold (when $\alpha$ is not at the bounds)
  • Updating of $b$

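In standard SMO notation, the two candidate thresholds are

$$b_1 = b - E_1 - y_1\,(\alpha_1^{\text{new}} - \alpha_1)\,K_{11} - y_2\,(\alpha_2^{\text{new}} - \alpha_2)\,K_{12}$$
$$b_2 = b - E_2 - y_1\,(\alpha_1^{\text{new}} - \alpha_1)\,K_{12} - y_2\,(\alpha_2^{\text{new}} - \alpha_2)\,K_{22},$$

taking $b = b_1$ if $0 < \alpha_1^{\text{new}} < C$, $b = b_2$ if $0 < \alpha_2^{\text{new}} < C$, and $b = (b_1 + b_2)/2$ otherwise.
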
SLIDE 61

Convergence Condition

  • The problem has been solved when all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance)
  • Updating of the weights in the case of a linear kernel: $w \leftarrow w + y_1\,(\alpha_1^{\text{new}} - \alpha_1)\,x_1 + y_2\,(\alpha_2^{\text{new}} - \alpha_2)\,x_2$

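Tying the SMO slides together, a minimal sketch in Python of the widely used simplified SMO variant (the second multiplier is chosen at random rather than by the $|E_1 - E_2|$ heuristic, so this is a teaching sketch, not Platt's full algorithm; all names are illustrative):

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Simplified SMO for a linear-kernel soft-margin SVM.
    Expects labels y in {-1, +1}. Returns weights w, bias b, multipliers alpha."""
    n = len(y)
    K = X @ X.T                       # linear kernel matrix
    alpha, b, passes = np.zeros(n), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]   # prediction error E_i
            # Only update multipliers that violate the KKT conditions.
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = np.random.choice([k for k in range(n) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds L, H keep (alpha_i, alpha_j) inside the box [0, C]^2.
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = K[i, i] + K[j, j] - 2 * K[i, j]
                if L == H or eta <= 0:
                    continue
                # Unconstrained update of alpha_j, then clip to [L, H].
                alpha[j] = np.clip(aj_old + y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Update the bias from the KKT conditions.
                b1 = b - Ei - y[i]*(alpha[i]-ai_old)*K[i,i] - y[j]*(alpha[j]-aj_old)*K[i,j]
                b2 = b - Ej - y[i]*(alpha[i]-ai_old)*K[i,j] - y[j]*(alpha[j]-aj_old)*K[j,j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X               # recover weights (linear kernel only)
    return w, b, alpha
```
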
SLIDE 62

Questions?
