SLIDE 1

Machine Learning

Support Vector Machines

Rui Xia
Text Mining Group
Nanjing University of Science & Technology
rxia@njust.edu.cn

SLIDE 2

Outline

  • Maximum Margin Linear Classifier
  • Duality Optimization
  • Soft-margin SVM
  • Kernel Functions
  • *Sequential Minimal Optimization
  • The Usage of SVM Toolkits

SLIDE 3

Maximum Margin Linear Classifier

SLIDE 4

Recall Previous Linear Classifier

Which linear hyper-plane is better? Which learning criterion should we choose?

  • Perceptron criterion
  • Cross-entropy criterion (logistic regression)
  • Least mean square (LMS) criterion

SLIDE 5

Maximum Margin Criterion

SLIDE 6

Distance from Point to Hyper-plane

  • Distance (positive side)
  • Hyper-plane
  • Linear Model
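
In standard notation, the hyper-plane is $w^\top x + b = 0$, the linear model is $f(x) = w^\top x + b$, and the distance from a point $x_0$ on the positive side to the hyper-plane is

$$r = \frac{w^\top x_0 + b}{\|w\|}$$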

SLIDE 7

Geometric Distance & Functional Distance

  • Distance (negative side)
  • Geometric distance (uniform expression)
  • Functional distance
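
In standard notation, with labels $y_i \in \{+1, -1\}$, the geometric distance admits the uniform expression

$$\gamma_i = \frac{y_i\,(w^\top x_i + b)}{\|w\|},$$

which covers both the positive and the negative side, while the functional distance is $\hat{\gamma}_i = y_i\,(w^\top x_i + b)$, so that $\gamma_i = \hat{\gamma}_i / \|w\|$.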

SLIDE 8

Parameter Scaling

  • Geometric margin: independent of the scale factor
  • Scaling the parameter by a scale factor
  • Functional margin: proportional to the scale factor
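
Concretely, scaling the parameters by a factor $\kappa > 0$,

$$(w, b) \;\longrightarrow\; (\kappa w, \kappa b),$$

multiplies the functional margin by $\kappa$ but leaves the geometric margin $\hat{\gamma} / \|w\|$ unchanged.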

SLIDE 9

Maximum Margin Criterion

  • Formulation 1
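
In standard form, Formulation 1 maximizes the geometric margin directly:

$$\max_{w,b}\ \gamma \quad \text{s.t.}\quad \frac{y_i\,(w^\top x_i + b)}{\|w\|} \ge \gamma,\quad i = 1,\dots,N$$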

SLIDE 10

Maximum Margin Criterion

  • Formulation 2
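
In standard form, Formulation 2 rewrites the same problem in terms of the functional margin:

$$\max_{w,b}\ \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge \hat{\gamma},\quad i = 1,\dots,N$$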

SLIDE 11

Maximum Margin Criterion

  • In this constraint, the value of the functional margin can be fixed by rescaling the parameters
  • Scaling constraint: choose the scale of $(w, b)$ so that $\hat{\gamma} = 1$

SLIDE 12

Maximum Margin Criterion

  • Formulation 3
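
In standard form, with the scaling constraint $\hat{\gamma} = 1$, and since maximizing $1/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$, Formulation 3 is the familiar quadratic program:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N$$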

SLIDE 13

Duality Optimization

SLIDE 14

Lagrange Multiplier

  • In the case of an equality constraint

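In standard notation, for $\min_x f(x)$ subject to $h(x) = 0$, the Lagrangian and its optimality conditions are

$$L(x, \beta) = f(x) + \beta\, h(x), \qquad \nabla_x f(x) + \beta\, \nabla_x h(x) = 0, \qquad h(x) = 0$$
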
SLIDE 15

An Example

SLIDE 16

Lagrange Multiplier

  • In the case of an inequality constraint $g(x) \le 0$, the constraint at the optimum is either active ($g(x^*) = 0$, multiplier $\lambda > 0$) or inactive ($g(x^*) < 0$, $\lambda = 0$)

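In standard notation, for $\min_x f(x)$ subject to $g(x) \le 0$ with multiplier $\lambda$,

$$L(x, \lambda) = f(x) + \lambda\, g(x), \qquad \nabla_x L = 0, \qquad g(x) \le 0, \qquad \lambda \ge 0, \qquad \lambda\, g(x) = 0,$$

where the last condition encodes the active/inactive dichotomy.
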
SLIDE 17

An Illustration

SLIDE 18

Lagrange Multiplier

  • In the case of multiple equality and inequality constraints, the Karush–Kuhn–Tucker (KKT) conditions comprise:

– Stationarity
– Primal feasibility
– Dual feasibility
– Complementary slackness

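In standard notation, for $\min_x f(x)$ subject to $g_i(x) \le 0$ and $h_j(x) = 0$, with Lagrangian $L = f + \sum_i \lambda_i g_i + \sum_j \beta_j h_j$, the KKT conditions read

$$\nabla_x L = 0 \ \ (\text{stationarity}), \qquad g_i(x) \le 0,\ h_j(x) = 0 \ \ (\text{primal feasibility}),$$
$$\lambda_i \ge 0 \ \ (\text{dual feasibility}), \qquad \lambda_i\, g_i(x) = 0 \ \ (\text{complementary slackness})$$
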
SLIDE 19

Generalized Lagrangian and Duality

  • Primal Optimization Problem
  • Generalized Lagrangian
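
In standard form:

$$\min_w\ f(w) \quad \text{s.t.}\quad g_i(w) \le 0,\ i = 1,\dots,k; \qquad h_j(w) = 0,\ j = 1,\dots,l$$
$$L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i\, g_i(w) + \sum_{j=1}^{l} \beta_j\, h_j(w), \qquad \alpha_i \ge 0$$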

SLIDE 20

Min-max of the Lagrangian

$$\theta_P(w) = \max_{\alpha,\beta:\,\alpha_i \ge 0} L(w, \alpha, \beta) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ +\infty & \text{otherwise} \end{cases}$$

$$\min_w\ \theta_P(w) = \min_w\ f(w) = p^*$$

SLIDE 21

Primal Problem & Dual Problem

  • The primal problem (min-max of Lagrangian)
  • Max-min vs. Min-max
  • The dual problem (max-min of Lagrangian)

When does the equality hold?
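
Max-min never exceeds min-max, which is weak duality:

$$d^* = \max_{\alpha,\beta:\,\alpha_i \ge 0}\ \min_w\ L(w, \alpha, \beta) \;\le\; \min_w\ \max_{\alpha,\beta:\,\alpha_i \ge 0}\ L(w, \alpha, \beta) = p^*$$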

SLIDE 22

Equivalency of Two Problems

  • The equality holds when

– $f$ and the $g_i$'s are convex, and the $h_i$'s are affine;
– the $g_i$'s are (strictly) feasible: there exists some $w$ such that $g_i(w) < 0$.

  • Under these conditions, strong duality holds:

$$d^* = \max_{\alpha,\beta:\,\alpha_i \ge 0}\ \min_w\ L(w, \alpha, \beta) = \min_w\ \max_{\alpha,\beta:\,\alpha_i \ge 0}\ L(w, \alpha, \beta) = p^*$$

  • Equivalency of the primal and dual problems: solving the dual problem yields the solution of the primal problem

SLIDE 23

Karush–Kuhn–Tucker (KKT) Conditions

  • Furthermore, the solutions of the primal and dual problems satisfy the KKT conditions:

– Stationarity
– Primal feasibility
– Dual feasibility
– Complementary slackness

  • Under the convexity assumptions above, the KKT conditions are a sufficient and necessary condition for optimality

SLIDE 24

Lagrangian for SVM

  • The optimization problem of SVM
  • The Lagrangian
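
In standard form, the SVM primal and its Lagrangian are

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N$$
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i\,(w^\top x_i + b) - 1 \right], \qquad \alpha_i \ge 0$$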

SLIDE 25

Minimization of the Lagrangian

  • Take the derivative of the Lagrangian
  • Plug back into the Lagrangian
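
Setting the derivatives to zero gives

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i} \alpha_i y_i = 0,$$

and plugging these back into the Lagrangian leaves a function of $\alpha$ alone:

$$L(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j$$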

SLIDE 26

Dual Problem of SVM

  • Dual Problem
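
In standard form:

$$\max_\alpha\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j \quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{N} \alpha_i y_i = 0$$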

Guarantee that the KKT conditions are satisfied.

SLIDE 27

Why “Support Vector”?

  • Decision function
  • KKT conditions
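
In standard form, the decision function and the relevant KKT complementarity condition are

$$f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i\, x_i^\top x + b \Big), \qquad \alpha_i \left[ y_i\,(w^\top x_i + b) - 1 \right] = 0.$$

Hence $\alpha_i > 0$ only for points with $y_i\,(w^\top x_i + b) = 1$, i.e., the points lying exactly on the margin: the support vectors. All other points have $\alpha_i = 0$ and do not affect the decision function.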

SLIDE 28

The Value of the Bias

For any support vector $x_s$: $w^\top x_s + b = +1$ if $x_s$ is a positive support vector, and $w^\top x_s + b = -1$ if it is a negative support vector.

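In standard form, for any support vector $x_s$ with label $y_s$,

$$b = y_s - \sum_{i} \alpha_i y_i\, x_i^\top x_s;$$

in practice the value is often averaged over all support vectors for numerical stability.
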
SLIDE 29

One Remaining Problem

  • Dual Problem of SVM
  • Decision function

How do we compute $\alpha$? How do we solve the dual optimization problem?

SLIDE 30

Soft-margin SVM

SLIDE 31

Linearly Non-separable Case

(Left: linearly separable data; right: linearly non-separable data)

SLIDE 32

Soft Margin Criterion

Maximum margin

Soft margin
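
In standard form, the maximum-margin objective is relaxed with slack variables $\xi_i$ and a penalty weight $C$:

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$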

SLIDE 33

Three Types of Slacks
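
In standard terms, the three cases are

$$\xi_i = 0 \ \text{(on or outside the margin, correctly classified)}, \qquad 0 < \xi_i \le 1 \ \text{(inside the margin, still correctly classified)}, \qquad \xi_i > 1 \ \text{(misclassified)}$$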

SLIDE 34

Lagrangian for Soft-margin SVM

  • Lagrangian form

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i\,(w^\top x_i + b) - 1 + \xi_i \right] - \sum_i \mu_i \xi_i$$

  • Recall the equivalency of the primal and dual problems: under the convexity conditions, min-max equals max-min, so we solve the dual problem instead

SLIDE 35

Dual Problem for Soft-margin SVM

  • Gradient: set the derivatives of the Lagrangian with respect to $w$, $b$, and $\xi_i$ to zero
  • Plug back into the Lagrangian

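Setting the gradient to zero gives

$$w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad C - \alpha_i - \mu_i = 0,$$

and plugging back yields the soft-margin dual:

$$\max_\alpha\ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, x_i^\top x_j \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0$$
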
SLIDE 36

Maximum-margin SVM vs. Soft-margin SVM

  • Maximum-margin SVM: dual constraint $\alpha_i \ge 0$
  • Soft-margin SVM: dual constraint $0 \le \alpha_i \le C$; the two duals are otherwise identical

SLIDE 37

KKT Complementarity Condition

  • Two KKT complementarity conditions
  • Some useful conclusions
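
In standard form, the two complementarity conditions are

$$\alpha_i \left[ y_i\,(w^\top x_i + b) - 1 + \xi_i \right] = 0, \qquad \mu_i\, \xi_i = (C - \alpha_i)\, \xi_i = 0,$$

from which: $\alpha_i = 0$ implies $y_i f(x_i) \ge 1$ (on or outside the margin); $0 < \alpha_i < C$ implies $\xi_i = 0$ and $y_i f(x_i) = 1$ (exactly on the margin); $\alpha_i = C$ allows $\xi_i > 0$ (inside the margin or misclassified).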

SLIDE 38

Slacks and Support Vectors

SLIDE 39

Kernel Functions

SLIDE 40

From Low-dimensional Non-separable to Higher-dimensional Separable

SLIDE 41

From Low Dimension to Higher Dimension

  • Feature space mapping $\phi$: map the data from a low-dimensional space, where they are non-separable, to a higher-dimensional space, where they become separable

SLIDE 42

Kernel Functions

  • Definition: inner product in the higher-dimensional feature space
  • An example
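
In standard notation, a kernel computes the inner product in the mapped feature space,

$$K(x, z) = \phi(x)^\top \phi(z).$$

A textbook example: in $\mathbb{R}^2$, the kernel $K(x, z) = (x^\top z)^2$ corresponds to the mapping $\phi(x) = (x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2)$, since expanding $(x^\top z)^2$ gives exactly $\phi(x)^\top \phi(z)$ without ever forming $\phi$ explicitly.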

SLIDE 43

SVM in Higher-dimensional Feature Space

  • Decision function
  • Training process

SLIDE 44

Kernel Trick in SVM

  • Kernel Trick in SVM

– Sometimes it’s hard to know the exact projection function, but relatively easy to know the Kernel function – In SVM, all of the calculations of feature vectors are in the form of product – Therefore, we only need to know the Kernel function used in SVM, but without the need to know the exact projection function.

SLIDE 45
Mercer Condition

  • Kernel matrix

– For any finite set of points $\{x_1, \dots, x_m\}$
– Element of the kernel matrix: $K_{ij} = K(x_i, x_j)$

  • Mercer theorem
  • A valid kernel satisfies:

– Symmetric: $K_{ij} = K_{ji}$
– Positive semi-definite: $z^\top K z \ge 0$ for all $z$
SLIDE 46

Common Kernel Functions

  • Linear kernel
  • Polynomial kernel
  • Gaussian kernel
  • Sigmoid kernel, pyramid kernel, string kernel, tree kernel…
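
Typical parameterizations (conventions vary across toolkits):

$$K(x, z) = x^\top z, \qquad K(x, z) = (\gamma\, x^\top z + r)^d, \qquad K(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)$$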

SLIDE 47

Kernel SVM

  • Decision
  • Training
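
In standard form, the inner products of the linear SVM are replaced by kernel evaluations:

$$f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i\, K(x_i, x) + b \Big); \qquad \max_\alpha\ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, K(x_i, x_j) \ \ \text{s.t.}\ \ \alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0$$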

SLIDE 48

Soft-margin Kernel SVM

  • Decision
  • Training
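
The soft-margin kernel SVM combines the box constraint $0 \le \alpha_i \le C$ with the kernelized dual above. The outline also lists "The Usage of SVM Toolkits"; as a minimal, hedged illustration (assuming scikit-learn is available; the toy dataset and the values of C and gamma are arbitrary examples, not the course's), training such a model looks like this:

```python
# A minimal sketch of using an SVM toolkit (here scikit-learn) to train a
# soft-margin kernel SVM. Assumptions: scikit-learn is installed; the toy
# dataset and the hyper-parameter values are arbitrary examples.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # toy 2-D data

# C is the soft-margin penalty; kernel='rbf' is the Gaussian kernel.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("bias b:", clf.intercept_)
print("training accuracy:", clf.score(X, y))
```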

SLIDE 49

Sequential Minimal Optimization

SLIDE 50

Coordinate Ascent

  • Consider an unconstrained optimization problem
  • Coordinate Ascent Algorithm
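
The algorithm box on this slide did not survive extraction. A minimal sketch of coordinate ascent in Python (the concave quadratic objective and all names below are made-up for illustration):

```python
import numpy as np

def coordinate_ascent(argmax_coord, alpha, n_iters=100):
    """Coordinate ascent: repeatedly maximize the objective over one
    coordinate while holding all the other coordinates fixed."""
    for _ in range(n_iters):
        for i in range(len(alpha)):
            # argmax_coord(alpha, i) returns the value of alpha[i] that
            # maximizes the objective with the other coordinates fixed.
            alpha[i] = argmax_coord(alpha, i)
    return alpha

# Made-up example: maximize f(a) = -0.5 * a^T Q a + p^T a (concave quadratic).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, 1.0])

def argmax_coord(a, i):
    # Solve df/da_i = 0 for a_i:  Q_ii * a_i = p_i - sum_{j != i} Q_ij * a_j
    rest = Q[i] @ a - Q[i, i] * a[i]
    return (p[i] - rest) / Q[i, i]

print(coordinate_ascent(argmax_coord, np.zeros(2)))  # converges to Q^{-1} p
```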

SLIDE 51

Coordinate Ascent

  • An Example

SLIDE 52

Recall the Dual Problem in SVM

  • The Dual Optimization Problem
  • KKT Conditions

SLIDE 53

Coordinate Ascent in SVM

  • Choose two coordinates for optimization each time
  • Which two coordinates should be chosen?

SLIDE 54

The SMO Algorithm

SLIDE 55

The SMO Algorithm

  • Optimize by setting the gradient to zero
  • Variable Elimination using Equality Constraint
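
From the equality constraint of the dual, with all other multipliers fixed,

$$\alpha_1 y_1 + \alpha_2 y_2 = \zeta \ (\text{a constant}) \;\Rightarrow\; \alpha_1 = y_1\,(\zeta - \alpha_2 y_2)$$

(using $y_1^2 = 1$), so the two-variable subproblem reduces to a single-variable quadratic in $\alpha_2$.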

SLIDE 56

The SMO Updating

  • Make use of the equality constraint and the prediction errors $E_i$
  • We finally have the update shown below

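In standard SMO notation, with prediction errors $E_i = f(x_i) - y_i$ and kernel entries $K_{ij} = K(x_i, x_j)$, the unconstrained update is

$$\alpha_2^{\text{new}} = \alpha_2^{\text{old}} + \frac{y_2\,(E_1 - E_2)}{\eta}, \qquad \eta = K_{11} + K_{22} - 2 K_{12}$$
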
SLIDE 57

Adding Inequality Constraints

  • Equality Constraints
  • Inequality Constraints
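
Together the constraints are

$$\alpha_1 y_1 + \alpha_2 y_2 = \zeta, \qquad 0 \le \alpha_1, \alpha_2 \le C,$$

which confine $(\alpha_1, \alpha_2)$ to a line segment inside the box $[0, C]^2$ and hence bound $\alpha_2$ by some $L \le \alpha_2 \le H$.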

SLIDE 58

Final Updating of Two Multipliers

  • In the case $y_1 \ne y_2$
  • In the case $y_1 = y_2$
  • Final updating: clip $\alpha_2^{\text{new}}$ to $[L, H]$, then recover $\alpha_1$ (see below)

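In standard SMO notation, the bounds and the final clipped update are

$$y_1 \ne y_2:\quad L = \max(0,\ \alpha_2 - \alpha_1), \qquad H = \min(C,\ C + \alpha_2 - \alpha_1)$$
$$y_1 = y_2:\quad L = \max(0,\ \alpha_1 + \alpha_2 - C), \qquad H = \min(C,\ \alpha_1 + \alpha_2)$$
$$\alpha_2^{\text{new,clipped}} = \min\big(H,\ \max(L,\ \alpha_2^{\text{new}})\big), \qquad \alpha_1^{\text{new}} = \alpha_1 + y_1 y_2\,(\alpha_2 - \alpha_2^{\text{new,clipped}})$$
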
SLIDE 59

Heuristics to Choose Two Multipliers

  • First, choose a Lagrange multiplier that violates the KKT conditions (Osuna's theorem)
  • Second, choose the multiplier that maximizes $|E_1 - E_2|$

SLIDE 60

Updating of the Bias

  • Choose $b$ so that the KKT conditions hold (when $\alpha$ is not at the bounds)
  • Updating of $b$

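In standard SMO notation, the two candidate thresholds are

$$b_1 = b - E_1 - y_1\,(\alpha_1^{\text{new}} - \alpha_1)\,K_{11} - y_2\,(\alpha_2^{\text{new}} - \alpha_2)\,K_{12}$$
$$b_2 = b - E_2 - y_1\,(\alpha_1^{\text{new}} - \alpha_1)\,K_{12} - y_2\,(\alpha_2^{\text{new}} - \alpha_2)\,K_{22},$$

taking $b = b_1$ if $0 < \alpha_1^{\text{new}} < C$, $b = b_2$ if $0 < \alpha_2^{\text{new}} < C$, and $b = (b_1 + b_2)/2$ otherwise.
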
SLIDE 61

Convergence Condition

  • The problem has been solved when all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance)
  • Updating of the weights in the case of a linear kernel: $w \leftarrow w + y_1\,(\alpha_1^{\text{new}} - \alpha_1)\,x_1 + y_2\,(\alpha_2^{\text{new}} - \alpha_2)\,x_2$

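Tying the SMO slides together, a minimal sketch in Python of the widely used simplified SMO variant (the second multiplier is chosen at random rather than by the $|E_1 - E_2|$ heuristic, so this is a teaching sketch, not Platt's full algorithm; all names are illustrative):

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Simplified SMO for a linear-kernel soft-margin SVM.
    Expects labels y in {-1, +1}. Returns weights w, bias b, multipliers alpha."""
    n = len(y)
    K = X @ X.T                       # linear kernel matrix
    alpha, b, passes = np.zeros(n), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]   # prediction error E_i
            # Only update multipliers that violate the KKT conditions.
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = np.random.choice([k for k in range(n) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds L, H keep (alpha_i, alpha_j) inside the box [0, C]^2.
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = K[i, i] + K[j, j] - 2 * K[i, j]
                if L == H or eta <= 0:
                    continue
                # Unconstrained update of alpha_j, then clip to [L, H].
                alpha[j] = np.clip(aj_old + y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Update the bias from the KKT conditions.
                b1 = b - Ei - y[i]*(alpha[i]-ai_old)*K[i,i] - y[j]*(alpha[j]-aj_old)*K[i,j]
                b2 = b - Ej - y[i]*(alpha[i]-ai_old)*K[i,j] - y[j]*(alpha[j]-aj_old)*K[j,j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X               # recover weights (linear kernel only)
    return w, b, alpha
```
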
SLIDE 62

Questions?
