BBM406 Fundamentals of Machine Learning, Lecture 17: Kernel Trick for SVMs, Risk and Loss, Support Vector Regression



slide-1
SLIDE 1

BBM406 Fundamentals of Machine Learning

Lecture 17: Kernel Trick for SVMs, Risk and Loss, Support Vector Regression

Aykut Erdem // Hacettepe University // Fall 2019

Photo by Arthur Gretton, CMU Machine Learning Protestors at G20

slide-2
SLIDE 2

Administrative

  • Project progress reports are due soon!

Due: December 22, 2019 (11:59pm). Each group should submit a project progress report by December 22, 2019. The report should be 3-4 pages and should describe the following points as clearly as possible:

  • Problem to be addressed. Give a short description of the problem that you will explore. Explain why you find it interesting.
  • Related work. Briefly review the major works related to your research topic.
  • Methodology to be employed. Describe the neural architecture that is expected to form the basis of the project. State whether you will extend an existing method or you are going to devise your own approach.
  • Experimental evaluation. Briefly explain how you will evaluate your results. State which dataset(s) you will employ in your evaluation. Provide your preliminary results (if any).

2

Deadlines are much closer than they appear

  • On the syllabus
slide-3
SLIDE 3

⟨w, x⟩ + b ≥ 1    ⟨w, x⟩ + b ≤ −1

Theorem (Minsky & Papert)
 Finding the minimum error separating hyperplane is NP hard

minimum error separator is impossible

Last time… Soft-margin Classifier

slide by Alex Smola
slide-4
SLIDE 4

⟨w, x⟩ + b ≤ −1 + ξ

Convex optimization problem

minimize amount of slack

⟨w, x⟩ + b ≥ 1 − ξ

Last time… Adding Slack Variables

slide by Alex Smola

ξi ≥ 0

slide-5
SLIDE 5

⟨w, x⟩ + b ≤ −1 + ξ

Convex optimization problem

minimize amount of slack

⟨w, x⟩ + b ≥ 1 − ξ

Last time… Adding Slack Variables

  • for 0 < ξ ≤ 1 the point is between the margin and correctly classified
  • for ξ > 1 the point is misclassified

ξᵢ ≥ 0

adopted from Andrew Zisserman
slide-6
SLIDE 6
  • Hard margin problem

    minimize_{w,b}  ½‖w‖²  subject to  yᵢ(⟨w, xᵢ⟩ + b) ≥ 1

  • With slack variables

    minimize_{w,b}  ½‖w‖² + C Σᵢ ξᵢ  subject to  yᵢ(⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ  and  ξᵢ ≥ 0

Problem is always feasible. Proof: take w = 0, b = 0 and ξᵢ = 1 (this also yields an upper bound on the objective).

Last time… Adding Slack Variables

slide by Alex Smola
slide-7
SLIDE 7
  • Optimization problem:



 


    minimize_{w,b}  ½‖w‖² + C Σᵢ ξᵢ  subject to  yᵢ(⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ  and  ξᵢ ≥ 0

Soft-margin classifier

adopted from Andrew Zisserman

C is a regularization parameter:

  • small C allows constraints to be easily ignored 


→ large margin

  • large C makes constraints hard to ignore


→ narrow margin

  • C = ∞ enforces all constraints: hard margin
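To make the effect of C concrete, here is a minimal sketch of training soft-margin SVMs for several values of C (this assumes scikit-learn is available; the synthetic data and the particular C values are illustrative, not from the slides):

```python
# Illustrative sketch: effect of the regularization parameter C on a soft-margin SVM.
# Assumes scikit-learn and NumPy; the data is synthetic, not from the lecture.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])  # two overlapping blobs
y = np.array([-1] * 50 + [1] * 50)

for C in [0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)        # geometric margin width
    print(f"C={C:>5}: {clf.n_support_.sum()} support vectors, margin width {margin:.2f}")
```

Small C gives a wide margin with many violated constraints; large C approaches the hard-margin behaviour, matching the bullets above.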
slide-8
SLIDE 8

Last time… Multi-class SVM

8

  • Simultaneously learn 3 sets of weights: w₊, w₋, w₀
  • How do we guarantee the correct labels?
  • Need new constraints!

The “score” of the correct class must be better than the “score” of the wrong classes.

slide by Eric Xing
slide-9
SLIDE 9

Last time… Multi-class SVM

9

  • As for the SVM, we introduce slack variables and maximize the margin.

To predict, we use:

Now can we learn it?

slide by Eric Xing
slide-10
SLIDE 10

Last time… Kernels

  • Original data
  • Data in feature space (implicit)
  • Solve in feature space using kernels

10

slide by Alex Smola
slide-11
SLIDE 11

Last time… Quadratic Features

11

Quadratic Features in ℝ²

    Φ(x) := (x₁², √2·x₁x₂, x₂²)

Dot Product

    ⟨Φ(x), Φ(x′)⟩ = ⟨(x₁², √2·x₁x₂, x₂²), (x′₁², √2·x′₁x′₂, x′₂²)⟩ = ⟨x, x′⟩²

Insight

    The trick works for any polynomial of order d via ⟨x, x′⟩ᵈ.

slide by Alex Smola
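A quick numerical check of this identity (a minimal NumPy sketch; the two vectors are arbitrary examples, not from the slides):

```python
# Verify that the explicit quadratic feature map reproduces the squared dot product.
import numpy as np

def phi(x):
    # Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) for x in R^2
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = phi(x) @ phi(xp)   # <Phi(x), Phi(x')>
rhs = (x @ xp) ** 2      # <x, x'>^2
print(lhs, rhs)          # both equal 1.0 for these vectors
```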
slide-12
SLIDE 12

Computational Efficiency

12

Problem
    Extracting features can sometimes be very costly. Example: second order features in 1000 dimensions. This leads to about 5 · 10⁵ numbers. For higher order polynomial features it is much worse.

Solution
    Don’t compute the features; try to compute dot products implicitly. For some features this works . . .

Definition
    A kernel function k : X × X → ℝ is a symmetric function in its arguments for which the following property holds: k(x, x′) = ⟨Φ(x), Φ(x′)⟩ for some feature map Φ. If k(x, x′) is much cheaper to compute than Φ(x) . . .

slide by Alex Smola
slide-13
SLIDE 13

Last time.. Example kernels

13

Examples of kernels k(x, x′):

    Linear              ⟨x, x′⟩
    Laplacian RBF       exp(−λ‖x − x′‖)
    Gaussian RBF        exp(−λ‖x − x′‖²)
    Polynomial          (⟨x, x′⟩ + c)ᵈ,  c ≥ 0, d ∈ ℕ
    B-Spline            B₂ₙ₊₁(x − x′)
    Cond. Expectation   E_c[p(x|c) p(x′|c)]

Simple trick for checking Mercer’s condition: compute the Fourier transform of the kernel and check that it is nonnegative.

slide by Alex Smola
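A few of these kernels written out as plain functions (a minimal NumPy sketch; the hyperparameter values λ, c, d and the test vectors are illustrative):

```python
# Some kernels from the table above, as functions of two vectors.
import numpy as np

def linear(x, xp):
    return x @ xp

def laplacian_rbf(x, xp, lam=1.0):
    return np.exp(-lam * np.linalg.norm(x - xp))

def gaussian_rbf(x, xp, lam=1.0):
    return np.exp(-lam * np.linalg.norm(x - xp) ** 2)

def polynomial(x, xp, c=1.0, d=3):
    return (x @ xp + c) ** d

x, xp = np.array([1.0, 0.5]), np.array([0.2, -1.0])
print(linear(x, xp), laplacian_rbf(x, xp), gaussian_rbf(x, xp), polynomial(x, xp))
```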
slide-14
SLIDE 14

Today

  • The Kernel Trick for SVMs
  • Risk and Loss
  • Support Vector Regression

14

slide-15
SLIDE 15

The Kernel Trick for SVMs

slide by Alex Smola
slide-16
SLIDE 16
  • Linear soft margin problem

    minimize_{w,b}  ½‖w‖² + C Σᵢ ξᵢ  subject to  yᵢ(⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ  and  ξᵢ ≥ 0

  • Dual problem

    maximize_α  −½ Σᵢ,ⱼ αᵢαⱼ yᵢyⱼ ⟨xᵢ, xⱼ⟩ + Σᵢ αᵢ  subject to  Σᵢ αᵢyᵢ = 0  and  αᵢ ∈ [0, C]

  • Support vector expansion

    f(x) = Σᵢ αᵢyᵢ ⟨xᵢ, x⟩ + b

The Kernel Trick for SVMs

slide by Alex Smola
slide-17
SLIDE 17

  • Linear soft margin problem (in feature space)

    minimize_{w,b}  ½‖w‖² + C Σᵢ ξᵢ  subject to  yᵢ(⟨w, φ(xᵢ)⟩ + b) ≥ 1 − ξᵢ  and  ξᵢ ≥ 0

  • Dual problem

    maximize_α  −½ Σᵢ,ⱼ αᵢαⱼ yᵢyⱼ k(xᵢ, xⱼ) + Σᵢ αᵢ  subject to  Σᵢ αᵢyᵢ = 0  and  αᵢ ∈ [0, C]

  • Support vector expansion

    f(x) = Σᵢ αᵢyᵢ k(xᵢ, x) + b

The Kernel Trick for SVMs

slide by Alex Smola
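The support vector expansion above is exactly what a trained kernel SVM stores. A minimal sketch that reconstructs f(x) from a fitted model (assuming scikit-learn, whose dual_coef_ holds the products αᵢyᵢ; the data and parameters are illustrative):

```python
# Reconstruct f(x) = sum_i alpha_i y_i k(x_i, x) + b from a fitted RBF-kernel SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # nonlinearly separable labels

gamma = 0.5
clf = SVC(kernel="rbf", C=10.0, gamma=gamma).fit(X, y)

def decision(x):
    # k(x_i, x) for every support vector x_i, then the support vector expansion
    k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return clf.dual_coef_[0] @ k + clf.intercept_[0]

x_test = np.array([0.3, 0.8])
print(decision(x_test), clf.decision_function([x_test])[0])   # the two should match
```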
slide-18
SLIDE 18

C=1

slide by Alex Smola
slide-19
SLIDE 19

C=1

slide by Alex Smola

[Figure: decision boundary y = 0 and margins y = 1, y = −1, with support vectors marked]

slide-20
SLIDE 20

C=2

slide by Alex Smola
slide-21
SLIDE 21

C=5

slide by Alex Smola
slide-22
SLIDE 22

C=10

slide by Alex Smola
slide-23
SLIDE 23

C=20

slide by Alex Smola
slide-24
SLIDE 24

C=50

slide by Alex Smola
slide-25
SLIDE 25

C=100

slide by Alex Smola
slide-26
SLIDE 26

C=1

slide by Alex Smola
slide-27
SLIDE 27

C=2

slide by Alex Smola
slide-28
SLIDE 28

C=5

slide by Alex Smola
slide-29
SLIDE 29

C=10

slide by Alex Smola
slide-30
SLIDE 30

C=20

slide by Alex Smola
slide-31
SLIDE 31

C=50

slide by Alex Smola
slide-32
SLIDE 32

C=100

slide by Alex Smola
slide-33
SLIDE 33

C=1

slide by Alex Smola
slide-34
SLIDE 34

C=2

slide by Alex Smola
slide-35
SLIDE 35

C=5

slide by Alex Smola
slide-36
SLIDE 36

C=10

slide by Alex Smola
slide-37
SLIDE 37

C=20

slide by Alex Smola
slide-38
SLIDE 38

C=50

slide by Alex Smola
slide-39
SLIDE 39

C=100

slide by Alex Smola
slide-40
SLIDE 40

C=1

slide by Alex Smola
slide-41
SLIDE 41

C=2

slide by Alex Smola
slide-42
SLIDE 42

C=5

slide by Alex Smola
slide-43
SLIDE 43

C=10

slide by Alex Smola
slide-44
SLIDE 44

C=20

slide by Alex Smola
slide-45
SLIDE 45

C=50

slide by Alex Smola
slide-46
SLIDE 46

C=100

slide by Alex Smola
slide-47
SLIDE 47

And now with a narrower kernel

slide by Alex Smola
slide-48
SLIDE 48 slide by Alex Smola
slide-49
SLIDE 49 slide by Alex Smola
slide-50
SLIDE 50 slide by Alex Smola
slide-51
SLIDE 51 slide by Alex Smola
slide-52
SLIDE 52

And now with a very wide kernel

slide by Alex Smola
slide-53
SLIDE 53 slide by Alex Smola
slide-54
SLIDE 54
  • Increasing C allows for more nonlinearities
  • Decreases number of errors
  • SV boundary need not be contiguous
  • Kernel width adjusts function class

Nonlinear Separation

slide by Alex Smola
slide-55
SLIDE 55

Overfitting?

  • Huge feature space with kernels: should we worry about overfitting?
  • SVM objective seeks a solution with large margin
  • Theory says that large margin leads to good generalization (we will see this in a couple of lectures)
  • But everything overfits sometimes!!!
  • Can control by (a cross-validation sketch follows below):
  • Setting C
  • Choosing a better kernel
  • Varying parameters of the kernel (width of Gaussian, etc.)

55

slide by Alex Smola
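In practice, C and the kernel width are usually picked by cross-validation. A minimal sketch of that tuning loop (assuming scikit-learn; the grid values and synthetic data are illustrative):

```python
# Cross-validated search over C and the RBF kernel width (gamma).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # XOR-like synthetic labels

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```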
slide-56
SLIDE 56

Risk and Loss

56

slide by Alex Smola
slide-57
SLIDE 57
  • Constrained quadratic program

    minimize_{w,b}  ½‖w‖² + C Σᵢ ξᵢ  subject to  yᵢ(⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ  and  ξᵢ ≥ 0

  • Risk minimization setting

    minimize_{w,b}  ½‖w‖² + C Σᵢ max(0, 1 − yᵢ(⟨w, xᵢ⟩ + b))

    The sum is the empirical risk. The equivalence follows from finding the minimal slack variable for a given (w, b) pair.

Loss function point of view

slide by Alex Smola
slide-58
SLIDE 58
  • Soft margin loss: max(0, 1 − y f(x))  (a convex upper bound on the binary loss)
  • Binary loss: 1{y f(x) < 0}

[Figure: both loss functions plotted as a function of the margin y f(x)]

Soft margin as proxy for binary

slide by Alex Smola
slide-59
SLIDE 59
  • Logistic loss:  log(1 + e^(−f(x)))   (asymptotically linear as f(x) → −∞, asymptotically 0 as f(x) → +∞)

  • Huberized loss:
        0                  if f(x) > 1
        ½ (1 − f(x))²      if f(x) ∈ [0, 1]
        ½ − f(x)           if f(x) < 0

  • Soft margin loss:  max(0, 1 − f(x))

More loss functions

slide by Alex Smola
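For concreteness, the three losses on this slide as plain functions of the score f(x) (a minimal NumPy sketch, assuming the label is y = +1; the evaluation grid is illustrative):

```python
# Classification losses as functions of the score f(x), for label y = +1.
import numpy as np

def logistic_loss(f):
    return np.log1p(np.exp(-f))

def soft_margin_loss(f):
    return np.maximum(0.0, 1.0 - f)

def huberized_loss(f):
    # piecewise definition from the slide: 0 above 1, quadratic on [0, 1], linear below 0
    return np.where(f > 1, 0.0,
           np.where(f >= 0, 0.5 * (1.0 - f) ** 2, 0.5 - f))

f = np.linspace(-2, 2, 9)
print(np.c_[f, logistic_loss(f), huberized_loss(f), soft_margin_loss(f)])
```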
slide-60
SLIDE 60

Risk minimization view

  • Find function f minimizing the classification error

    R[f] := E_{x,y∼p(x,y)} [1{y f(x) < 0}]

  • Compute the empirical average

    R_emp[f] := (1/m) Σᵢ₌₁ᵐ 1{yᵢ f(xᵢ) < 0}

    − Minimization is nonconvex
    − Overfitting as we minimize the empirical error

  • Compute a convex upper bound on the loss
  • Add regularization for capacity control

    R_reg[f] := (1/m) Σᵢ₌₁ᵐ max(0, 1 − yᵢ f(xᵢ)) + λΩ[f]

    (λΩ[f] is the regularization term; λ controls how strongly it is enforced)



 


slide by Alex Smola
slide-61
SLIDE 61

Support Vector 
 Regression

61

slide-62
SLIDE 62

Regression Estimation

  • Find function f minimizing the regression error

    R[f] := E_{x,y∼p(x,y)} [l(y, f(x))]

  • Compute the empirical average

    R_emp[f] := (1/m) Σᵢ₌₁ᵐ l(yᵢ, f(xᵢ))

    Overfitting as we minimize the empirical error

  • Add regularization for capacity control

    R_reg[f] := (1/m) Σᵢ₌₁ᵐ l(yᵢ, f(xᵢ)) + λΩ[f]

62

slide by Alex Smola
slide-63
SLIDE 63

Squared loss

63

l(y, f(x)) = 1 2(y − f(x))2

slide by Alex Smola
slide-64
SLIDE 64

l1 loss

64

l(y, f(x)) = |y − f(x)|

slide by Alex Smola
slide-65
SLIDE 65

ε-insensitive Loss

65

l(y, f(x)) = max(0, |y − f(x)| − ε)

slide by Alex Smola

allow some deviation without a penalty
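As a small sketch of this loss (NumPy; the targets, predictions, and ε value are illustrative):

```python
# epsilon-insensitive loss: zero inside the tube, linear outside.
import numpy as np

def eps_insensitive(y, f, eps=0.1):
    return np.maximum(0.0, np.abs(y - f) - eps)

y, f = np.array([1.0, 1.0, 1.0]), np.array([1.05, 1.3, 0.6])
print(eps_insensitive(y, f))   # [0.  0.2  0.3]
```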

slide-66
SLIDE 66

Penalized least mean squares

  • Optimization problem

    minimize_w  (1/2m) Σᵢ₌₁ᵐ (yᵢ − ⟨xᵢ, w⟩)² + (λ/2)‖w‖²

  • Solution

    ∂_w [. . .] = (1/m) Σᵢ₌₁ᵐ [xᵢxᵢᵀ w − xᵢyᵢ] + λw = ((1/m) XXᵀ + λ·1) w − (1/m) Xy = 0

    hence  w = (XXᵀ + λm·1)⁻¹ Xy

    (XXᵀ is the outer product matrix in X; solve e.g. with Conjugate Gradient or Sherman-Morrison-Woodbury)

66

slide by Alex Smola
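The closed-form solution translates directly into a few lines of NumPy (a sketch; following the slide's convention, X stores one example per column, and the data and λ are illustrative):

```python
# Penalized least mean squares: w = (X X^T + lambda*m*I)^{-1} X y.
# Convention as on the slide: X is d x m, one example per column.
import numpy as np

rng = np.random.RandomState(0)
d, m, lam = 3, 50, 0.1
X = rng.randn(d, m)
w_true = np.array([1.0, -2.0, 0.5])
y = X.T @ w_true + 0.01 * rng.randn(m)

w = np.linalg.solve(X @ X.T + lam * m * np.eye(d), X @ y)
print(w)   # close to w_true for small noise and small lambda
```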
slide-67
SLIDE 67

Penalized least mean squares ... now with kernels

  • Optimization problem

    minimize_w  (1/2m) Σᵢ₌₁ᵐ (yᵢ − ⟨φ(xᵢ), w⟩)² + (λ/2)‖w‖²

  • Representer Theorem (Kimeldorf & Wahba, 1971)

    Decompose w = w∥ + w⊥ into a part in the span of the data and an orthogonal part; then ‖w‖² = ‖w∥‖² + ‖w⊥‖², and the empirical risk depends only on w∥.

67

slide by Alex Smola
slide-68
SLIDE 68
  • Optimization problem

    minimize_w  (1/2m) Σᵢ₌₁ᵐ (yᵢ − ⟨φ(xᵢ), w⟩)² + (λ/2)‖w‖²

  • Representer Theorem (Kimeldorf & Wahba, 1971): the optimal solution is in the span of the data

    w = Σᵢ αᵢ φ(xᵢ)

  • Proof: the risk term depends on the data only via φ(xᵢ); regularization ensures that the orthogonal part is 0
  • Optimization problem in terms of α

    minimize_α  (1/2m) Σᵢ₌₁ᵐ (yᵢ − Σⱼ Kᵢⱼαⱼ)² + (λ/2) Σᵢ,ⱼ αᵢαⱼKᵢⱼ

    Solve for α as a linear system:  α = (K + mλ·1)⁻¹ y

Penalized least mean squares ... now with kernels
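The kernelized solution α = (K + mλ·1)⁻¹ y is equally short in NumPy (a sketch with a Gaussian RBF kernel; the kernel width, regularization value, and sinc data are illustrative):

```python
# Kernel ridge regression: alpha = (K + m*lambda*I)^{-1} y, f(x) = sum_i alpha_i k(x_i, x).
import numpy as np

def rbf_kernel(A, B, width=10.0):
    # pairwise squared distances, then Gaussian RBF
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-width * d2)

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(40, 1)), axis=0)
y = np.sinc(X[:, 0]) + 0.05 * rng.randn(40)

m, reg = len(X), 1e-3
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + m * reg * np.eye(m), y)

X_test = np.linspace(-3, 3, 5)[:, None]
print(rbf_kernel(X_test, X) @ alpha)   # predictions at new points
```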

slide-69
SLIDE 69
(Same content as Slide 68, repeated.)

slide-70
SLIDE 70

70

[Figure: regression fit with an ε-insensitive tube (+ε / −ε); points outside the tube incur slack ξ; right panel: the loss as a function of y − f(x)]

don’t care about deviations within the tube

slide by Alex Smola

SVM Regression (ϵ-insensitive loss)

slide-71
SLIDE 71
  • Optimization problem (as a constrained QP)

    minimize_{w,b}  ½‖w‖² + C Σᵢ₌₁ᵐ [ξᵢ + ξᵢ*]
    subject to  ⟨w, xᵢ⟩ + b ≤ yᵢ + ε + ξᵢ   and  ξᵢ ≥ 0
                ⟨w, xᵢ⟩ + b ≥ yᵢ − ε − ξᵢ*  and  ξᵢ* ≥ 0

  • Lagrange function

    L = ½‖w‖² + C Σᵢ₌₁ᵐ [ξᵢ + ξᵢ*] − Σᵢ₌₁ᵐ [ηᵢξᵢ + ηᵢ*ξᵢ*]
        + Σᵢ₌₁ᵐ αᵢ [⟨w, xᵢ⟩ + b − yᵢ − ε − ξᵢ] + Σᵢ₌₁ᵐ αᵢ* [yᵢ − ε − ξᵢ* − ⟨w, xᵢ⟩ − b]

71

slide by Alex Smola

SVM Regression (ϵ-insensitive loss)

slide-72
SLIDE 72
  • First order conditions

    ∂_w L = 0 = w + Σᵢ [αᵢ − αᵢ*] xᵢ
    ∂_b L = 0 = Σᵢ [αᵢ − αᵢ*]
    ∂_ξᵢ L = 0 = C − ηᵢ − αᵢ
    ∂_ξᵢ* L = 0 = C − ηᵢ* − αᵢ*

  • Dual problem

    minimize_{α,α*}  ½ (α − α*)ᵀ K (α − α*) + ε · 1ᵀ(α + α*) + yᵀ(α − α*)
    subject to  1ᵀ(α − α*) = 0  and  αᵢ, αᵢ* ∈ [0, C]

72

slide by Alex Smola

SVM Regression (ϵ-insensitive loss)
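To tie the pieces together, a minimal sketch of ε-insensitive support vector regression on noisy sinc data, echoing the examples on the following slides (assuming scikit-learn; the parameter values are illustrative):

```python
# epsilon-insensitive SVR on noisy sinc data; a wider tube leaves fewer support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = np.sinc(X[:, 0]) + 0.1 * rng.randn(100)

for eps in [0.1, 0.2, 0.5]:
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```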

slide-73
SLIDE 73

Properties

  • Ignores ‘typical’ instances with small error
  • Only the upper or the lower bound is active at any time
  • QP in 2n variables, as cheap as the SVM problem
  • Robustness with respect to outliers
  • l1 loss yields the same problem without epsilon
  • Huber’s robust loss yields a similar problem, but with an added quadratic penalty on the coefficients

73

slide by Alex Smola
slide-74
SLIDE 74

Regression example

74

sinc x + 0.1 sinc x - 0.1 approximation

slide by Alex Smola
slide-75
SLIDE 75

Regression example

75

sinc x + 0.2 sinc x - 0.2 approximation

slide by Alex Smola
slide-76
SLIDE 76

Regression example

76

sinc x + 0.5 sinc x - 0.5 approximation

slide by Alex Smola
slide-77
SLIDE 77

Regression example

77

Support Vectors

slide by Alex Smola
slide-78
SLIDE 78

Huber’s robust loss

78

l(y, f(x)) =  ½ (y − f(x))²     if |y − f(x)| < 1
              |y − f(x)| − ½    otherwise

quadratic near zero, linear in the tails (trimmed mean estimator)

slide by Alex Smola
slide-79
SLIDE 79

Summary

  • Advantages:
  • Kernels allow very flexible hypotheses
  • Poly-time exact optimization methods rather than approximate methods
  • Soft-margin extension permits mis-classified examples
  • Variable-sized hypothesis space
  • Excellent results (1.1% error rate on handwritten digits vs. LeNet’s 0.9%)

  • Disadvantages:
  • Must choose kernel parameters
  • Very large problems computationally intractable
  • Batch algorithm

79

slide by Sanja Fidler
slide-80
SLIDE 80

Software

  • SVMlight: one of the most widely used SVM packages. Fast optimization, can handle very large datasets, C++ code.
  • LIBSVM
  • Both of these handle multi-class, weighted SVM for unbalanced data, etc.
  • There are several new approaches to solving the SVM objective that can be much faster:
  • Stochastic subgradient method (discussed in a few lectures)
  • Distributed computation (also to be discussed)
  • See http://mloss.org, “machine learning open source software”

80

slide by Alex Smola
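For quick experiments, scikit-learn's SVC and SVR estimators are built on LIBSVM, so the models in this lecture can be reproduced in a few lines (a sketch; the synthetic dataset and parameters are illustrative):

```python
# Quick start with the LIBSVM-backed estimators in scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = np.where(np.linalg.norm(X, axis=1) > 1.2, 1, -1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```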
slide-81
SLIDE 81

Next Lecture:

Decision Trees

81