Support vector machines
Lecture 4
David Sontag, New York University
Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin
Allowing for slack: “Soft margin” SVM
[Figure: soft-margin SVM; hyperplanes w·x + b = +1, w·x + b = 0, w·x + b = −1, with slack variables ξ_j measuring margin violations]

min_{w,b,ξ}  ½ ‖w‖² + C Σ_j ξ_j
s.t.  y_j (w·x_j + b) ≥ 1 − ξ_j,  ξ_j ≥ 0 for all j
“slack variables”

What is the optimal value of ξ_j as a function of w and b?

If y_j (w·x_j + b) ≥ 1, then ξ_j = 0.
If y_j (w·x_j + b) < 1, then ξ_j = 1 − y_j (w·x_j + b).

Sometimes written as: ξ_j = max(0, 1 − y_j (w·x_j + b)) = [1 − y_j (w·x_j + b)]₊
Equivalent hinge loss formulation
min_{w,b,ξ}  ½ ‖w‖² + C Σ_j ξ_j
s.t.  y_j (w·x_j + b) ≥ 1 − ξ_j,  ξ_j ≥ 0

Substituting the optimal slack ξ_j = max(0, 1 − y_j (w·x_j + b)) into the objective, we get:

min_{w,b}  ½ ‖w‖² + C Σ_j max(0, 1 − y_j (w·x_j + b))

The hinge loss is defined as ℓ_hinge(y, f(x)) = max(0, 1 − y f(x)). The second term of the objective is empirical risk minimization, using the hinge loss. The first term, ½ ‖w‖², is called regularization; it is used to prevent overfitting!
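As a quick sketch, this unconstrained objective is easy to write down in Python (numpy only; the function and variable names are illustrative, not from the slides):

    import numpy as np

    def svm_objective(w, b, X, y, C):
        # Soft-margin SVM objective: 0.5 * ||w||^2 + C * sum_j xi_j,
        # with the optimal slack xi_j = max(0, 1 - y_j * (w . x_j + b)).
        # X: (n, d) data matrix; y: (n,) labels in {-1, +1}.
        margins = y * (X @ w + b)
        slack = np.maximum(0.0, 1.0 - margins)
        return 0.5 * np.dot(w, w) + C * slack.sum()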
Hinge loss vs. 0/1 loss
Hinge loss upper bounds 0/1 loss!
Hinge loss: max(0, 1 − y f(x)).  0/1 loss: 1 if y f(x) ≤ 0, else 0.

[Figure: both losses plotted against the margin y f(x); the 0/1 loss steps from 1 to 0 at margin 0, while the hinge loss decreases linearly and hits 0 at margin 1]
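A small numeric check of the upper-bound claim, evaluating both losses at a few arbitrarily chosen margins:

    import numpy as np

    margins = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])  # values of y * f(x)
    hinge = np.maximum(0.0, 1.0 - margins)    # [3.0, 1.5, 1.0, 0.5, 0.0, 0.0]
    zero_one = (margins <= 0).astype(float)   # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    assert np.all(hinge >= zero_one)          # hinge loss upper bounds 0/1 loss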
How to deal with imbalanced data?

- In many practical applications we may have imbalanced data sets.
- We may want errors to be equally distributed between the positive and negative classes.
- A slight modification to the SVM objective does the trick!

Class-specific weighting of the slack variables:

min_{w,b,ξ}  ½ ‖w‖² + C₊ Σ_{j : y_j = +1} ξ_j + C₋ Σ_{j : y_j = −1} ξ_j
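In scikit-learn, this weighting is exposed through the class_weight argument of SVC, which scales C per class. A sketch on a toy imbalanced dataset (the data and weights here are illustrative only):

    import numpy as np
    from sklearn.svm import SVC

    # Toy imbalanced dataset: 90 negatives, 10 positives
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(90, 2) - 1, rng.randn(10, 2) + 1])
    y = np.array([-1] * 90 + [+1] * 10)

    # class_weight scales C per class: here C_+ = 9 * C_-,
    # so errors on the rare positive class cost more
    clf = SVC(kernel="linear", C=1.0, class_weight={+1: 9.0, -1: 1.0})
    # class_weight="balanced" would infer these weights from class frequencies
    clf.fit(X, y)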
How do we do multi-class classification?
One versus all classification
Learn 3 classifiers:
- “−” vs. {o, +}: weights w−
- “+” vs. {o, −}: weights w+
- “o” vs. {+, −}: weights wo
Predict label using:  ŷ = argmax_{c ∈ {+, −, o}} (w_c · x + b_c)

[Figure: example dataset with the three one-vs-all weight vectors w+, w−, wo]

Any problems? Could we learn this dataset?
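A sketch of the one-vs-all decision rule in numpy (the stacked weights and biases are assumed to come from the three trained classifiers; names are illustrative):

    import numpy as np

    def ova_predict(X, W, b, labels):
        # W: (3, d) rows are w-, w+, wo; b: (3,) biases; labels: class names
        scores = X @ W.T + b                   # (n, 3): score of each classifier
        return [labels[i] for i in scores.argmax(axis=1)]

    # e.g. labels = ["-", "+", "o"]; predict with the highest-scoring classifier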
Multi-class SVM
- Simultaneously learn 3 sets of weights: w+, w−, wo
- How do we guarantee the correct labels?
- Need new constraints!

[Figure: multi-class decision boundaries for weights w+, w−, wo]
The “score” of the correct class must be better than the “score” of wrong classes:

w_{y_j} · x_j + b_{y_j} > w_{y′} · x_j + b_{y′}  for all y′ ≠ y_j
As for the SVM, we introduce slack variables and maximize margin:

min  ½ Σ_c ‖w_c‖² + C Σ_j ξ_j
s.t.  w_{y_j} · x_j + b_{y_j} ≥ w_{y′} · x_j + b_{y′} + 1 − ξ_j  for all y′ ≠ y_j,  ξ_j ≥ 0
Now can we learn it?
Multi-class SVM
To predict, we use:  ŷ = argmax_c (w_c · x + b_c)
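A sketch of the resulting multi-class hinge loss and decision rule in numpy (Crammer-Singer style; all names are illustrative):

    import numpy as np

    def multiclass_slack(W, b, X, y):
        # Optimal slack per example: max over wrong classes c of
        # max(0, 1 + (w_c . x + b_c) - (w_y . x + b_y))
        scores = X @ W.T + b                          # (n, k)
        correct = scores[np.arange(len(y)), y]        # score of the true class
        margins = scores - correct[:, None] + 1.0
        margins[np.arange(len(y)), y] = 0.0           # no penalty for the true class
        return np.maximum(0.0, margins).max(axis=1)   # xi_j for each example

    def predict(W, b, X):
        return (X @ W.T + b).argmax(axis=1)           # argmax_c (w_c . x + b_c)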
Software

- SVMlight: one of the most widely used SVM packages. Fast optimization, can handle very large datasets, C++ code.
- LIBSVM (used within Python’s scikit-learn).
- Both of these handle multi-class classification, weighted SVM for imbalanced data, etc.
- There are several newer approaches to solving the SVM objective that can be much faster:
  – Stochastic subgradient method (up next!)
  – Distributed computation
- See http://mloss.org, “machine learning open source software”.
PEGASOS
Primal Estimated sub-GrAdient SOlver for SVM
Shai Shalev-Shwartz Yoram Singer Nati Srebro
The Hebrew University Jerusalem, Israel
[ICML 2007]
Support Vector Machines

QP form:
min_{w,b}  ½ ‖w‖² + C Σ_j ξ_j  s.t.  y_j (w·x_j + b) ≥ 1 − ξ_j,  ξ_j ≥ 0

More “natural” form:
min_w  (λ/2) ‖w‖² + (1/m) Σ_j max(0, 1 − y_j (w·x_j))

The first term is the regularization term; the second is the empirical loss.
PEGASOS

Initialize w₁ = 0
For t = 1, 2, …, T:
  – Choose A_t ⊆ S at random
  – Set the step size η_t = 1/(λt)
  – Subgradient step:
    w_{t+½} = (1 − η_t λ) w_t + (η_t / |A_t|) Σ_{(x,y) ∈ A_t : y (w_t · x) < 1} y x
  – Projection onto the ball of radius 1/√λ:
    w_{t+1} = min{1, (1/√λ) / ‖w_{t+½}‖} · w_{t+½}

Choosing A_t = S gives the subgradient method; |A_t| = 1 gives stochastic gradient descent.
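A minimal numpy sketch of Pegasos with |A_t| = 1 (the stochastic-gradient case), following the update and projection above; the function name and defaults are illustrative:

    import numpy as np

    def pegasos(X, y, lam, T, seed=0):
        # Minimizes lam/2 * ||w||^2 + (1/m) * sum_j max(0, 1 - y_j * (w . x_j))
        rng = np.random.RandomState(seed)
        m, d = X.shape
        w = np.zeros(d)
        for t in range(1, T + 1):
            i = rng.randint(m)                 # |A_t| = 1: one random example
            eta = 1.0 / (lam * t)              # step size eta_t = 1 / (lam * t)
            violated = y[i] * (X[i] @ w) < 1   # margin check at w_t
            w = (1.0 - eta * lam) * w          # step on the regularizer
            if violated:
                w += eta * y[i] * X[i]         # plus subgradient of the hinge term
            # projection onto the ball of radius 1/sqrt(lam)
            norm = np.linalg.norm(w)
            if norm > 0:
                w = min(1.0, (1.0 / np.sqrt(lam)) / norm) * w
        return w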
Run-Time of Pegasos
- Choosing |A_t| = 1, the run-time required for Pegasos to find an ε-accurate solution with probability ≥ 1 − δ is Õ(d / (λε)).
- Run-time does not depend on the number of examples.
- It depends on the “difficulty” of the problem (λ and ε).
Experiments
- 3 datasets (provided by Joachims)
  – Reuters CCAT (800k examples, 47k features)
  – Physics ArXiv (62k examples, 100k features)
  – Covertype (581k examples, 54 features)

Training time (in seconds):

Dataset          Pegasos   SVM-Perf   SVM-Light
Reuters                2         77      20,075
Covertype              6         85      25,514
Astro-Physics          2          5          80
What’s Next!
- Learn one of the most interesting and exciting recent advancements in machine learning:
  – The “kernel trick”
  – High-dimensional feature spaces at no extra cost!
- But first, a detour