

SLIDE 1

IAML: Support Vector Machines I

Nigel Goddard, School of Informatics, Semester 1

SLIDE 2

Outline

◮ Separating hyperplane with maximum margin
◮ Non-separable training data
◮ Expanding the input into a high-dimensional space
◮ Support vector regression
◮ Reading: W & F sec 6.3 (maximum margin hyperplane, nonlinear class boundaries), SVM handout. SV regression not examinable.

SLIDE 3

Overview

◮ Support vector machines are one of the most effective and widely used classification algorithms.
◮ SVMs are the combination of two ideas:
  ◮ Maximum margin classification
  ◮ The “kernel trick”
◮ SVMs are linear classifiers, like logistic regression and the perceptron.

SLIDE 4

Stuff You Need to Remember

w⊤x is the length b of the projection of x onto w (if w is a unit vector), i.e., b = w⊤x.

[Figure: a vector x projected onto w, with projection length b.]

(If you do not remember this, see the supplementary maths notes on the course web site.)
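A quick numeric check of this fact, as a minimal sketch (the vectors are made-up numbers, NumPy assumed):

import numpy as np

# If w is a unit vector, the length of the projection of x onto w is b = w^T x.
w = np.array([0.6, 0.8])   # a unit vector: ||w|| = 1
x = np.array([2.0, 1.0])

b = w @ x                  # projection length of x onto w
print(b)                   # 0.6*2.0 + 0.8*1.0 = 2.0
print(np.linalg.norm(w))   # 1.0, confirming w is a unit vector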

SLIDE 5

Separating Hyperplane

For any linear classifier

◮ Training instances (xi, yi), i = 1, . . . , n, with yi ∈ {−1, +1}
◮ Hyperplane w⊤x + w0 = 0
◮ Notice that for this lecture we use −1 rather than 0 for the negative class. This will be convenient for the maths.

[Figure: positive and negative training points in the (x1, x2) plane, separated by a hyperplane with normal vector w.]

SLIDE 6

A Crap Decision Boundary

[Figure: two candidate decision boundaries for the same training data; the left panel is labelled “Seems okay”, the right panel “This is crap”.]

SLIDE 7

Idea: Maximize the Margin

The margin is the distance between the decision boundary (the hyperplane) and the closest training point.

[Figure: a separating hyperplane with normal vector w; the margin is the distance from the hyperplane to the closest training point.]

SLIDE 8

Computing the Margin

◮ The tricky part will be to get an equation for the margin
◮ We’ll start by getting the distance from the origin to the hyperplane
◮ i.e., we want to compute the scalar b below

[Figure: the hyperplane w⊤x + w0 = 0, its normal vector w, and the distance b from the origin to the hyperplane.]

SLIDE 9

Computing the Distance to Origin

[Figure: the hyperplane w⊤x + w0 = 0, with z the point on it closest to the origin, at distance b.]

◮ Define z as the point on the hyperplane closest to the origin.
◮ z must be proportional to w, because w is normal to the hyperplane.
◮ By definition of b, the norm of z is ||z|| = b, so

  z = b w / ||w||

SLIDE 10

Computing the Distance to Origin

◮ We know that (a) z is on the hyperplane and (b) z = b w / ||w||.
◮ First, (a) means w⊤z + w0 = 0
◮ Substituting (b) into this we get

  w⊤(b w / ||w||) + w0 = 0
  b w⊤w / ||w|| + w0 = 0
  b ||w|| + w0 = 0          (since w⊤w = ||w||²)
  b = −w0 / ||w||

◮ Remember ||w|| = √(w⊤w).
◮ Now we have the distance from the origin to the hyperplane!
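A quick numeric check of this result (the hyperplane coefficients are made up, NumPy assumed):

import numpy as np

# Hyperplane w^T x + w0 = 0, with made-up coefficients.
w, w0 = np.array([3.0, 4.0]), -10.0

b = -w0 / np.linalg.norm(w)      # distance from the origin to the hyperplane
z = b * w / np.linalg.norm(w)    # closest point on the hyperplane to the origin

print(b)            # 2.0
print(w @ z + w0)   # ~0.0, so z really does lie on the hyperplane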

SLIDE 11

Computing the Distance to Hyperplane

[Figure: a point x, the length a of its projection onto w, the distance b from the origin to the hyperplane, and the distance c from x to the hyperplane.]

◮ Now we want c, the distance from x to the hyperplane.
◮ It’s clear that c = |b − a|, where a is the length of the projection of x onto w. Quiz: What is a?

SLIDE 12

Computing the Distance to Hyperplane

[Figure: as on the previous slide, with x, its projection length a onto w, the origin distance b, and the distance c from x to the hyperplane.]

◮ Now we want c, the distance from x to the hyperplane.
◮ It’s clear that c = |b − a|, where a is the length of the projection of x onto w. Quiz: What is a?

  a = w⊤x / ||w||
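A quick numeric check that c = |b − a| agrees with the single formula |w⊤x + w0| / ||w|| on the next slide (made-up numbers):

import numpy as np

w, w0 = np.array([3.0, 4.0]), -10.0    # made-up hyperplane, as in the earlier check
x = np.array([2.0, 3.0])               # made-up point

b = -w0 / np.linalg.norm(w)            # distance from the origin to the hyperplane
a = (w @ x) / np.linalg.norm(w)        # length of the projection of x onto w
c = abs(b - a)                         # distance from x to the hyperplane

print(c)                                       # 1.6
print(abs(w @ x + w0) / np.linalg.norm(w))     # same answer: |w^T x + w0| / ||w||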

SLIDE 13

Equation for the Margin

◮ The perpendicular distance from a point x to the hyperplane w⊤x + w0 = 0 is

  (1/||w||) |w⊤x + w0|

◮ The margin is the distance from the closest training point to the hyperplane:

  min_i (1/||w||) |w⊤xi + w0|
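The margin formula translates directly into code; a minimal sketch with made-up training points and a made-up (w, w0):

import numpy as np

def margin(w, w0, X):
    # Distance from the closest row of X to the hyperplane w^T x + w0 = 0:
    # min_i |w^T x_i + w0| / ||w||
    return np.min(np.abs(X @ w + w0)) / np.linalg.norm(w)

X = np.array([[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [1.0, -1.0]])  # made-up points
w, w0 = np.array([1.0, 1.0]), -2.5                               # made-up hyperplane
print(margin(w, w0, X))   # 1.5 / sqrt(2) ≈ 1.06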

SLIDE 14

The Scaling

◮ Note that (w, w0) and (cw, cw0) define the same hyperplane. The scale is arbitrary.
◮ This is because we predict class y = 1 if w⊤x + w0 ≥ 0. That’s the same thing as saying cw⊤x + cw0 ≥ 0 (for any c > 0).
◮ To remove this freedom, we will put a constraint on (w, w0):

  min_i |w⊤xi + w0| = 1

◮ With this constraint, the margin is always 1/||w||.
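To see the rescaling concretely, here is a minimal sketch continuing the made-up numbers from the margin example above: scale (w, w0) so that min_i |w⊤xi + w0| = 1, and check that the margin is then 1/||w||.

import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [1.0, -1.0]])  # made-up points
w, w0 = np.array([1.0, 1.0]), -2.5                               # made-up hyperplane

c = 1.0 / np.min(np.abs(X @ w + w0))    # scale so the closest point gives exactly 1
w_s, w0_s = c * w, c * w0               # same hyperplane, new scale

print(np.min(np.abs(X @ w_s + w0_s)))   # 1.0 by construction
print(1.0 / np.linalg.norm(w_s))        # equals the margin computed before (≈ 1.06)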

SLIDE 15

First version of Max Margin Optimization Problem

◮ Here is a first version of an optimization problem to maximize the margin (we will simplify):

  max_w  1/||w||
  subject to  w⊤xi + w0 ≥ 0   for all i with yi = 1
              w⊤xi + w0 ≤ 0   for all i with yi = −1
              min_i |w⊤xi + w0| = 1

◮ The first two constraints are too loose. It’s the same thing to say

  max_w  1/||w||
  subject to  w⊤xi + w0 ≥ 1   for all i with yi = 1
              w⊤xi + w0 ≤ −1  for all i with yi = −1
              min_i |w⊤xi + w0| = 1

◮ Now the third constraint is redundant.

SLIDE 16

First version of Max Margin Optimization Problem

◮ That means we can simplify to

  max_w  1/||w||
  subject to  w⊤xi + w0 ≥ 1   for all i with yi = 1
              w⊤xi + w0 ≤ −1  for all i with yi = −1

◮ Here’s a compact way to write those two constraints:

  max_w  1/||w||
  subject to  yi(w⊤xi + w0) ≥ 1  for all i

◮ Finally, note that maximizing 1/||w|| is the same thing as minimizing ||w||².

SLIDE 17

The SVM optimization problem

◮ So the SVM weights are determined by solving the optimization problem:

  min_w  ||w||²
  s.t.   yi(w⊤xi + w0) ≥ +1  for all i

◮ Solving this will require maths that we don’t have in this course. But I’ll show the form of the solution next time.
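In the meantime, a generic constrained optimizer can solve this problem directly. A minimal sketch using SciPy’s SLSQP solver on a made-up, linearly separable toy dataset (purely illustrative, not the method shown in the course):

import numpy as np
from scipy.optimize import minimize

# Made-up, linearly separable toy data.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],    # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

def objective(theta):
    w = theta[:-1]                     # theta packs (w, w0)
    return w @ w                       # minimize ||w||^2

# One constraint per training point: y_i (w^T x_i + w0) - 1 >= 0
constraints = [{"type": "ineq",
                "fun": lambda t, xi=xi, yi=yi: yi * (t[:-1] @ xi + t[-1]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.array([1.0, 1.0, -3.0]),   # a feasible starting point
               constraints=constraints, method="SLSQP")
w, w0 = res.x[:-1], res.x[-1]
print("w =", w, "w0 =", w0, "margin =", 1.0 / np.linalg.norm(w))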

SLIDE 18

Fin (Part I)
