SLIDE 1
Lecture #19: Support Vector Machines #2
CS 109A, STAT 121A, AC 209A: Data Science
Pavlos Protopapas, Kevin Rader, Margo Levine, Rahul Dave

SLIDE 2
Lecture Outline
▶ Review
▶ Extension to Non-linear Boundaries
SLIDE 3
Review
SLIDE 4
Classifiers and Decision Boundaries
Last time, we derived a linear classifier based on the intuition that a good classifier should
▶ maximize the distance between the points and the decision boundary (i.e., maximize the margin)
▶ misclassify as few points as possible
SLIDE 5
SVC as Optimization
With the help of geometry, we translated our wish list into an optimization problem:
\[
\min_{\xi_n \in \mathbb{R}^+,\, w,\, b} \; \|w\|^2 + \lambda \sum_{n=1}^{N} \xi_n
\quad \text{such that} \quad y_n(w^\top x_n + b) \ge 1 - \xi_n, \quad n = 1, \ldots, N,
\]
where ξn quantifies the error at xn.

The SVC optimization problem is often solved in an alternate form (the dual form):
\[
\max_{\alpha_n \ge 0,\ \sum_n \alpha_n y_n = 0} \; \sum_n \alpha_n - \frac{1}{2} \sum_{n,m=1}^{N} y_n y_m \alpha_n \alpha_m \, x_n^\top x_m
\]
Later we’ll see that this alternate form allows us to use SVC with non-linear boundaries.
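As a concrete sketch of fitting this soft-margin SVC in code, the snippet below uses scikit-learn's SVC with a linear kernel; the toy data and the choice C=1.0 (which plays the role of λ in the objective above, up to a constant factor) are assumptions for illustration, not part of the lecture.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, roughly linearly separable data -- assumed for illustration only
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(20, 2)), rng.normal(+1, 1, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# Linear SVC; C controls the penalty on the slack variables xi_n
# (it plays the role of lambda above: large C -> few margin violations)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("w_hat =", clf.coef_[0])       # estimated normal vector of the boundary
print("b_hat =", clf.intercept_[0])  # estimated offset
```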
SLIDE 6
Decision Boundaries and Support Vectors
Recall how the error terms ξn were defined: the support vectors are precisely the points that lie on the margin or violate it, i.e., the points with yn(w⊤xn + b) ≤ 1; these are the points with non-zero multipliers αn in the dual problem.
SLIDE 7
Decision Boundaries and Support Vectors
Thus, to reconstruct the decision boundary, only the support vectors are needed!
SLIDE 8
Decision Boundaries and Support Vectors
▶ The decision boundary of an SVC is given by
\[
\hat{w}^\top x + \hat{b} = \sum_{x_n \text{ is a support vector}} \hat{\alpha}_n y_n \, (x_n^\top x) + \hat{b},
\]
where the coefficients \hat{\alpha}_n and the set of support vectors are found by solving the optimization problem.
▶ To classify a test point x_test, we predict
\[
\hat{y}_\text{test} = \text{sign}\left( \hat{w}^\top x_\text{test} + \hat{b} \right)
\]
(see the code sketch below).
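To make the support-vector representation concrete, here is a small sketch continuing the hypothetical clf fitted in the earlier snippet; it rebuilds the decision function from the support vectors alone. In scikit-learn, dual_coef_ stores the products α̂n·yn for the support vectors.

```python
import numpy as np

# clf is the linear SVC fitted in the earlier sketch (an assumption of this example)
x_test = np.array([0.5, -0.2])   # hypothetical test point

# Rebuild w_hat^T x_test + b_hat from the support vectors only:
# dual_coef_[0, i] = alpha_hat_i * y_i for the i-th support vector
decision = np.dot(clf.dual_coef_[0], clf.support_vectors_ @ x_test) + clf.intercept_[0]

y_hat = np.sign(decision)
print(decision, clf.decision_function(x_test.reshape(1, -1)))  # the two values should agree
```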
SLIDE 9
Extension to Non-linear Boundaries
SLIDE 10
Polynomial Regression: Two Perspectives
Given a training set {(x_1, y_1), \ldots, (x_N, y_N)} with a single real-valued predictor, we can view fitting a 2nd degree polynomial model w_0 + w_1 x + w_2 x^2 on the data as the process of finding the best quadratic curve that fits the data.

But in practice, we first expand the feature dimension of the training set,
\[
x_n \to (x_n^0, x_n^1, x_n^2),
\]
and train a linear model on the expanded data
\[
\{\,((x_1^0, x_1^1, x_1^2), y_1), \ldots, ((x_N^0, x_N^1, x_N^2), y_N)\,\}.
\]
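A minimal sketch of this "expand the features, then fit a linear model" view, assuming NumPy/scikit-learn and a made-up 1-D dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical 1-D training data generated from a quadratic plus noise
x = np.linspace(-2, 2, 30)
y = 1.0 - 2.0 * x + 0.5 * x**2 + np.random.default_rng(0).normal(0, 0.1, size=x.shape)

# Expand each scalar x_n into the feature vector (x_n^0, x_n^1, x_n^2)
Phi = np.column_stack([x**0, x**1, x**2])

# A *linear* model on the expanded features is a quadratic model in x
model = LinearRegression(fit_intercept=False).fit(Phi, y)
print(model.coef_)  # approximately (w0, w1, w2)
```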
SLIDE 11
Transforming the Data
The key observation is that training a polynomial model is just training a linear model on data with transformed predictors. In our previous example, transforming the data to fit a 2nd degree polynomial model requires a map φ : R → R^3,
\[
φ(x) = (x^0, x^1, x^2),
\]
where R is called the input space and R^3 is called the feature space. While the response may not be linearly related to the predictor in the input space R, it may be in the feature space R^3.
SLIDE 12
SVC with Non-Linear Decision Boundaries
The same insight applies to classification: while the classes may not be linearly separable in the input space, they may be in a feature space after a fancy transformation:
SLIDE 13
SVC with Non-Linear Decision Boundaries
The motto: instead of tweaking the definition of SVC to accommodate non-linear decision boundaries, we map the data into a feature space in which the classes are linearly separable (or nearly separable):
▶ Apply a transform φ : R^J → R^{J′} to the training data, x_n → φ(x_n), where typically J′ is much larger than J.
▶ Train an SVC on the transformed data {(φ(x_1), y_1), \ldots, (φ(x_N), y_N)}, as sketched below.
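A minimal sketch of this "transform, then train a linear SVC" recipe; the toy 2-D dataset (classes separated by a circle) and the particular feature map φ used here are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: class +1 inside a circle, class -1 outside (not linearly separable in R^2)
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 1.5, 1, -1)

# Feature map phi: R^2 -> R^5 (illustrative choice, includes the squared terms)
def phi(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# A *linear* SVC in the feature space gives a non-linear boundary in the input space
clf = SVC(kernel="linear", C=1.0).fit(phi(X), y)
print("training accuracy:", clf.score(phi(X), y))
```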
SLIDE 14
The Kernel Trick
Since the feature space R^{J′} is extremely high dimensional, computing φ explicitly can be costly. Instead, we note that computing φ explicitly is unnecessary. Recall that training an SVC involves solving the optimization problem
\[
\max_{\alpha_n \ge 0,\ \sum_n \alpha_n y_n = 0} \; \sum_n \alpha_n - \frac{1}{2} \sum_{n,m=1}^{N} y_n y_m \alpha_n \alpha_m \, φ(x_n)^\top φ(x_m)
\]
In the above, we are only interested in computing the inner products φ(x_n)^\top φ(x_m) in the feature space, not the quantities φ(x_n) themselves.
SLIDE 15
The Kernel Trick
The inner product between two vectors is a measure of the similarity of the two vectors.
Definition
Given a transformation φ : R^J → R^{J′} from input space R^J to feature space R^{J′}, the function K : R^J × R^J → R defined by
\[
K(x_n, x_m) = φ(x_n)^\top φ(x_m), \quad x_n, x_m \in R^J,
\]
is called the kernel function of φ. More generally, a kernel function may refer to any function K : R^J × R^J → R that measures the similarity of vectors in R^J, without explicitly defining a transform φ.
SLIDE 16
The Kernel Trick
For a choice of kernel K with
\[
K(x_n, x_m) = φ(x_n)^\top φ(x_m),
\]
we train an SVC by solving
\[
\max_{\alpha_n \ge 0,\ \sum_n \alpha_n y_n = 0} \; \sum_n \alpha_n - \frac{1}{2} \sum_{n,m=1}^{N} y_n y_m \alpha_n \alpha_m \, K(x_n, x_m)
\]
Computing K(x_n, x_m) can be done without computing the mappings φ(x_n), φ(x_m). This way of training an SVC in the feature space without explicitly working with the mapping φ is called the kernel trick.
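As an illustration of the kernel trick in code, scikit-learn's SVC accepts a precomputed Gram matrix, so only kernel values are ever needed; the quadratic kernel and toy data below are assumed examples, not prescribed choices:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data (classes separated by a circle)
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(100, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 1.5, 1, -1)

def quadratic_kernel(A, B):
    """K(a, b) = (1 + a^T b)^2, computed for all pairs of rows of A and B."""
    return (1.0 + A @ B.T) ** 2

# Train on the N x N Gram matrix K(x_n, x_m); phi is never computed explicitly
K_train = quadratic_kernel(X, X)
clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y)

# To predict, we only need kernel values between test points and *training* points
X_test = rng.uniform(-2, 2, size=(5, 2))
print(clf.predict(quadratic_kernel(X_test, X)))
```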
SLIDE 17
Transforming Data: An Example
Example
Let's define φ : R^2 → R^6 by
\[
φ([x_1, x_2]) = (1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^2, x_2^2, \sqrt{2}\,x_1 x_2).
\]
The inner product in the feature space is
\[
φ([x_{11}, x_{12}])^\top φ([x_{21}, x_{22}]) = (1 + x_{11} x_{21} + x_{12} x_{22})^2.
\]
Thus, we can directly define a kernel function K : R^2 × R^2 → R by
\[
K(x_1, x_2) = (1 + x_{11} x_{21} + x_{12} x_{22})^2.
\]
Notice that we need not compute φ([x_{11}, x_{12}]) or φ([x_{21}, x_{22}]) to compute K(x_1, x_2).
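A quick numerical check of this identity, as a sketch; the two example points are arbitrary:

```python
import numpy as np

def phi(x):
    # phi: R^2 -> R^6 as defined above
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

def K(a, b):
    # Kernel defined directly on the input space; no phi needed
    return (1.0 + a @ b) ** 2

a = np.array([0.3, -1.2])   # arbitrary example points
b = np.array([2.0, 0.7])

print(phi(a) @ phi(b))  # inner product computed in the feature space
print(K(a, b))          # same value, computed in the input space
```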
SLIDE 18
Kernel Functions
Common kernel functions include:
▶ Polynomial Kernel (kernel='poly'):
\[
K(x_1, x_2) = (x_1^\top x_2 + 1)^d,
\]
where the degree d is a hyperparameter.
▶ Radial Basis Function Kernel (kernel='rbf'):
\[
K(x_1, x_2) = \exp\left\{ -\frac{\|x_1 - x_2\|^2}{2\sigma^2} \right\},
\]
where σ is a hyperparameter.
▶ Sigmoid Kernel (kernel='sigmoid'):
\[
K(x_1, x_2) = \tanh(\kappa\, x_1^\top x_2 + \theta),
\]
where κ and θ are hyperparameters.
A short code sketch comparing these kernels follows this list.
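A minimal sketch of trying these kernels in scikit-learn; the toy data, hyperparameter values, and train/test split are assumptions for illustration. Note that scikit-learn parameterizes the RBF kernel with gamma = 1/(2σ²), and the polynomial/sigmoid kernels with gamma and coef0 rather than the symbols above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy non-linearly-separable data (classes separated by a circle) -- illustrative only
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 1.5, 1, -1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

kernels = {
    "poly":    SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0),  # (x1^T x2 + 1)^2
    "rbf":     SVC(kernel="rbf", gamma=0.5, C=1.0),                        # gamma = 1/(2 sigma^2)
    "sigmoid": SVC(kernel="sigmoid", gamma=0.5, coef0=0.0, C=1.0),         # tanh(kappa x1^T x2 + theta)
}

for name, clf in kernels.items():
    clf.fit(X_train, y_train)
    print(f"{name:8s} test accuracy: {clf.score(X_test, y_test):.2f}")
```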
SLIDE 19