

Data Mining Support Vector Machines Introduction to Data Mining, 2nd Edition by Tan, Steinbach, Karpatne, Kumar


Support Vector Machines

  • Find a linear hyperplane (decision boundary) that will separate the data



Support Vector Machines

  • One possible solution


Support Vector Machines

  • Another possible solution



Support Vector Machines

  • Other possible solutions


Support Vector Machines

  • Which one is better? B1 or B2?
  • How do you define better?



Support Vector Machines

  • Find the hyperplane that maximizes the margin => B1 is better than B2


Support Vector Machines

$\mathbf{w} \cdot \mathbf{x} + b = 0$  (decision boundary)
$\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$  (margin hyperplanes)

$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}$

$\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$
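As a quick numeric illustration of the margin formula (numpy assumed; the weight vector below is the one recovered in the worked example later in this deck):

    import numpy as np

    # Weight vector and bias of a linear boundary w . x + b = 0
    # (values taken from the linear SVM example on a later slide).
    w = np.array([-6.64, -9.32])
    b = 7.93

    # Distance between the hyperplanes w . x + b = +1 and w . x + b = -1.
    margin = 2 / np.linalg.norm(w)
    print(round(margin, 4))   # about 0.1748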


Linear SVM

  • Linear model:

    $f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}$

  • Learning the model is equivalent to determining the values of $\mathbf{w}$ and $b$

    – How to find $\mathbf{w}$ and $b$ from training data?


Learning Linear SVM

  • Objective is to maximize:

    $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

    – Which is equivalent to minimizing:

      $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

    – Subject to the following constraints:

      $f(\mathbf{x}_i) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$

      or equivalently, $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, 2, \ldots, N$

   This is a constrained optimization problem

    – Solve it using the Lagrange multiplier method
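The deck points to the Lagrange multiplier method; as a rough sketch of the same constrained problem, one can also hand the primal directly to a generic solver. Everything below (scipy, the toy data) is an assumption for illustration, not the textbook's derivation:

    import numpy as np
    from scipy.optimize import minimize

    # Tiny linearly separable toy set: two points per class.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
    y = np.array([-1, -1, 1, 1])

    def objective(theta):
        w = theta[:2]                     # theta = (w1, w2, b)
        return 0.5 * np.dot(w, w)         # L(w) = ||w||^2 / 2

    # One inequality y_i (w . x_i + b) - 1 >= 0 per training point.
    constraints = [
        {"type": "ineq",
         "fun": lambda t, xi=xi, yi=yi: yi * (np.dot(t[:2], xi) + t[2]) - 1}
        for xi, yi in zip(X, y)
    ]

    res = minimize(objective, x0=np.zeros(3), constraints=constraints)
    w, b = res.x[:2], res.x[2]
    print(w, b)                           # expect roughly w = (1, 0), b = -1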


Example of Linear SVM

    x1       x2       y     Lagrange multiplier λ
    0.3858   0.4687   +1    65.5261
    0.4871   0.6110   -1    65.5261
    0.9218   0.4103   -1    0
    0.7382   0.8936   -1    0
    0.1763   0.0579   +1    0
    0.4057   0.3529   +1    0
    0.9355   0.8132   -1    0
    0.2146   0.0099   +1    0

Support vectors: the first two points (λ > 0)
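Given the multipliers λ above, the boundary parameters follow from the standard relations $\mathbf{w} = \sum_i \lambda_i y_i \mathbf{x}_i$ and $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$ for any support vector. A sketch of that arithmetic (numpy assumed):

    import numpy as np

    # The two support vectors from the table (all other points have lambda = 0).
    X_sv = np.array([[0.3858, 0.4687],
                     [0.4871, 0.6110]])
    y_sv = np.array([1.0, -1.0])
    lam = np.array([65.5261, 65.5261])

    # w = sum_i lambda_i * y_i * x_i
    w = (lam * y_sv) @ X_sv
    # Since y_1 = +1, the margin condition y_1 (w . x_1 + b) = 1 gives b = 1 - w . x_1.
    b = 1 - w @ X_sv[0]
    print(w, b)   # roughly w = (-6.64, -9.32), b = 7.93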


Learning Linear SVM

  • Decision boundary depends only on support vectors

    – If you have a data set with the same support vectors, the decision boundary will not change
    – How to classify using SVM once $\mathbf{w}$ and $b$ are found? Given a test record $\mathbf{x}_i$:

      $f(\mathbf{x}_i) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$
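Plugging in the w and b recovered from the example gives a small classifier; note that in practice a test record is usually assigned by the sign of w . x + b (the test points below are made up):

    import numpy as np

    w = np.array([-6.64, -9.32])   # from the worked example above
    b = 7.93

    def classify(x):
        # Class +1 on one side of the boundary w . x + b = 0, class -1 on the other.
        return 1 if np.dot(w, x) + b >= 0 else -1

    print(classify(np.array([0.2, 0.1])))   # +1 side (like the low-valued rows)
    print(classify(np.array([0.9, 0.8])))   # -1 side (like the high-valued rows)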


Support Vector Machines

  • What if the problem is not linearly separable?


Support Vector Machines

  • What if the problem is not linearly separable?

– Introduce slack variables

   Need to minimize:

    $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2} + C \left( \sum_{i=1}^{N} \xi_i^k \right)$

   Subject to:

    $y_i = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i \end{cases}$

   If k is 1 or 2, this leads to a similar objective function as linear SVM but with different constraints (see textbook)
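To see what the slack variables measure, the sketch below evaluates ξ_i = max(0, 1 − y_i(w · x_i + b)) and the penalized objective for a made-up boundary and data set (w, b, C, and the points are all illustrative assumptions):

    import numpy as np

    w, b, C, k = np.array([1.0, 0.0]), -1.0, 10.0, 1
    X = np.array([[0.0, 0.5], [2.0, 1.0], [0.8, 0.2]])
    y = np.array([-1, 1, 1])   # the third point violates its margin

    # Slack: how far each point falls short of its margin hyperplane.
    xi = np.maximum(0, 1 - y * (X @ w + b))
    L = 0.5 * w @ w + C * np.sum(xi ** k)
    print(xi, L)               # slacks (0, 0, 1.2), objective 12.5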


Support Vector Machines

  • Find the hyperplane that optimizes both factors: margin width and training error (the two terms in the objective above)


Nonlinear Support Vector Machines

  • What if decision boundary is not linear?



Nonlinear Support Vector Machines

  • Transform data into higher dimensional space

Decision boundary:  $\mathbf{w} \cdot \Phi(\mathbf{x}) + b = 0$
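As one concrete instance of such a transform (a standard degree-2 mapping, not one shown on the slide), points that a line cannot separate in 2-D become linearly separable after mapping to 3-D:

    import numpy as np

    def phi(x):
        # Degree-2 mapping: (x1^2, sqrt(2) x1 x2, x2^2).
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    # One point inside and one outside the unit circle.
    inside, outside = np.array([0.3, 0.2]), np.array([1.5, 1.0])

    # In phi-space the plane z1 + z3 = 1 (i.e., x1^2 + x2^2 = 1) separates them.
    w, b = np.array([1.0, 0.0, 1.0]), -1.0
    print(w @ phi(inside) + b)    # negative: inside the circle
    print(w @ phi(outside) + b)   # positive: outside the circle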


Learning Nonlinear SVM

  • Optimization problem:

    $\min_{\mathbf{w}} \dfrac{\|\mathbf{w}\|^2}{2}$  subject to  $y_i(\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b) \ge 1, \quad i = 1, \ldots, N$

  • Which leads to the same set of equations as before (but involving $\Phi(\mathbf{x})$ instead of $\mathbf{x}$)



Learning Nonlinear SVM

  • Issues:

    – What type of mapping function $\Phi$ should be used?
    – How to do the computation in high-dimensional space?

       Most computations involve the dot product $\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$
       Curse of dimensionality?


Learning Nonlinear SVM

  • Kernel Trick:

    – $\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j)$
    – $K(\mathbf{x}_i, \mathbf{x}_j)$ is a kernel function (expressed in terms of the coordinates in the original space)

       Examples:
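The slide's list of example kernels did not survive extraction; one standard example is the degree-2 polynomial kernel K(x, z) = (x · z)², which the sketch below checks against the explicit mapping Φ used earlier:

    import numpy as np

    def phi(x):
        # Same degree-2 mapping as before: (x1^2, sqrt(2) x1 x2, x2^2).
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    def K(x, z):
        # Polynomial kernel of degree 2, computed entirely in the original space.
        return np.dot(x, z) ** 2

    x, z = np.array([0.3, 0.7]), np.array([0.5, 0.2])
    print(K(x, z))             # 0.0841
    print(phi(x) @ phi(z))     # 0.0841 -- same value, without forming phi explicitly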


Example of Nonlinear SVM

SVM with polynomial degree 2 kernel
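The original slide shows the fitted boundary as a figure; a rough way to train such a model (scikit-learn assumed, ring-shaped synthetic data made up):

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic data: class +1 inside the unit circle, -1 outside.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(200, 2))
    y = np.where((X ** 2).sum(axis=1) < 1, 1, -1)

    # Polynomial kernel of degree 2, as in the slide's example.
    clf = SVC(kernel="poly", degree=2).fit(X, y)
    print(clf.predict([[0.1, 0.2], [1.8, 1.5]]))   # expect [ 1 -1 ]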


Learning Nonlinear SVM

  • Advantages of using kernel:

  – Don’t have to know the mapping function $\Phi$
  – Computing the dot product $\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$ in the original space avoids the curse of dimensionality

  • Not all functions can be kernels

    – Must make sure there is a corresponding $\Phi$ in some high-dimensional space
    – Mercer’s theorem (see textbook)
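In practice, Mercer's condition means the kernel matrix on any sample must be symmetric positive semidefinite; a quick numerical spot-check of that property for the degree-2 polynomial kernel (numpy assumed; passing on one sample is evidence, not a proof):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 2))

    # Gram matrix G_ij = K(x_i, x_j) = (x_i . x_j)^2.
    G = (X @ X.T) ** 2

    # Eigenvalues of a valid kernel's Gram matrix are never (significantly) negative.
    print(np.linalg.eigvalsh(G).min() >= -1e-9)   # True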



Characteristics of SVM

  • The learning problem is formulated as a convex optimization problem

  – Efficient algorithms are available to find the global minimum
  – Many of the other methods use greedy approaches and find locally optimal solutions
  – High computational complexity for building the model

  • Robust to noise
  • Overfitting is handled by maximizing the margin of the decision boundary
  • SVM can handle irrelevant and redundant attributes better than many other techniques
  • The user needs to provide the type of kernel function and the cost function
  • Difficult to handle missing values
  • What about categorical variables?
