Sparse Kernel Machines - SVM
SLIDE 1


Sparse Kernel Machines - SVM

Henrik I. Christensen

Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu

SLIDE 2

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 3

Introduction

- Last time we talked about kernels and memory-based models
- Estimating the full Gram matrix can pose a major challenge
- It is desirable to store only the "relevant" data
- Two possible solutions discussed:
  1. Support Vector Machines (Vapnik, et al.)
  2. Relevance Vector Machines
- The main difference lies in how posterior probabilities are handled
- A small robotics example at the end shows SVM performance

SLIDE 4

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 5

Maximum Margin Classifiers - Preliminaries

- Let's initially consider a linear two-class problem: y(x) = w^T φ(x) + b, where φ(·) is a feature-space transformation and b is the bias
- Given a training dataset x_n, n ∈ {1, ..., N}
- Target values t_n, n ∈ {1, ..., N}, with t_n ∈ {−1, 1}
- Assume for now that there is a linear solution to the problem

SLIDE 6

The objective

- The objective here is to maximize the margin
- Let's keep just the points at the margin

[Figure: two-class data with the decision boundary y = 0 and margin boundaries y = 1 and y = −1; the margin is the distance between them.]

SLIDE 7

Recap distances and metrics

[Figure: geometry of a linear discriminant. The boundary y = 0 separates regions R1 (y > 0) and R2 (y < 0); a point x decomposes into its projection x⊥ onto the boundary plus a signed offset y(x)/||w|| along w, and the boundary lies at distance −w0/||w|| from the origin.]

SLIDE 8

The objective function

We know that y(x) and t are supposed to have the same sign, so that t_n y(x_n) > 0 for correctly classified points. The distance of a point from the decision boundary is then

$$\frac{t_n y(x_n)}{\|w\|} = \frac{t_n \left( w^T \phi(x_n) + b \right)}{\|w\|}$$

The maximum-margin solution is

$$\arg\max_{w,b} \left\{ \frac{1}{\|w\|} \min_n \left[ t_n \left( w^T \phi(x_n) + b \right) \right] \right\}$$

We can scale w and b without loss of generality. Scale the parameters so that for the point(s) closest to the boundary

$$t_n \left( w^T \phi(x_n) + b \right) = 1$$

Then for all data points it is true that

$$t_n \left( w^T \phi(x_n) + b \right) \geq 1$$
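To see this scaling freedom concretely, here is a small numpy sketch (my illustration, not from the lecture; it takes φ(x) = x, and the data and parameter values are hypothetical). It rescales an arbitrary separating (w, b) so that the closest point satisfies t_n(w^T x_n + b) = 1 while the decisions are unchanged.

```python
import numpy as np

# Hypothetical separating parameters and data; names are illustrative.
rng = np.random.default_rng(5)
w = np.array([0.7, -1.3])
b = 0.4
X = rng.normal(0.0, 1.0, (6, 2))
t = np.sign(X @ w + b)          # labels defined to be consistent with (w, b)

# Rescale so the point closest to the boundary sits exactly on the
# margin, i.e. min_n t_n (w^T x_n + b) = 1.
kappa = np.min(t * (X @ w + b))
w_s, b_s = w / kappa, b / kappa

print(np.min(t * (X @ w_s + b_s)))             # -> 1.0
print(np.allclose(np.sign(X @ w_s + b_s), t))  # True: decisions unchanged
```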

SLIDE 9

Parameter estimation

We need to maximize ||w||⁻¹, which is equivalent to minimizing ||w||², subject to the margin requirements. In Lagrangian terms this is

$$L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} a_n \left\{ t_n \left( w^T \phi(x_n) + b \right) - 1 \right\}$$

Setting the partial derivatives with respect to w and b to zero gives

$$w = \sum_{n=1}^{N} a_n t_n \phi(x_n), \qquad 0 = \sum_{n=1}^{N} a_n t_n$$
SLIDE 10

Parameter estimation

Eliminating w and b from the objective function gives the dual representation

$$\tilde{L}(a) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)$$

This is a quadratic optimization problem - more on solving it in a minute. We can evaluate new points using the form

$$y(x) = \sum_{n=1}^{N} a_n t_n k(x, x_n) + b$$
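As an illustration of the dual problem (not part of the lecture), the sketch below solves the hard-margin dual for a tiny hand-made dataset with a linear kernel, using scipy's general-purpose constrained optimizer rather than a dedicated QP or SMO routine; the data and names are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny hand-made separable dataset (hypothetical), with phi(x) = x.
X = np.array([[2.0, 2.0], [3.0, 2.0], [-2.0, -2.0], [-3.0, -2.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                       # linear kernel Gram matrix

def neg_dual(a):
    # Negative of L~(a); scipy minimizes, while the dual is maximized.
    return -(a.sum() - 0.5 * (a * t) @ K @ (a * t))

res = minimize(
    neg_dual,
    x0=np.zeros(len(t)),
    bounds=[(0.0, None)] * len(t),                         # a_n >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ t}],  # sum_n a_n t_n = 0
)
a = res.x

# Evaluate a new point: y(x) = sum_n a_n t_n k(x, x_n) + b.
sv = a > 1e-6                               # support vectors
b = np.mean(t[sv] - (K @ (a * t))[sv])      # bias, see the next slide
x_new = np.array([1.0, 0.0])
print((a * t) @ (X @ x_new) + b)
```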

SLIDE 11

Estimation of the bias

Once w has been estimated, we can use it to estimate the bias. Averaging over the support set S (with N_S members) gives

$$b = \frac{1}{N_S} \sum_{n \in S} \left( t_n - \sum_{m \in S} a_m t_m k(x_n, x_m) \right)$$
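As a sanity check (my addition, not from the lecture), the same averaging formula can be applied to the support vectors of a fitted scikit-learn model and compared against the library's own intercept. The data and C value are illustrative; only margin support vectors (0 < a_n < C) are averaged, which matches the soft-margin version of the formula.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
t = np.hstack([-np.ones(20), np.ones(20)])

C = 10.0
clf = SVC(kernel="linear", C=C).fit(X, t)

at = clf.dual_coef_.ravel()        # a_n * t_n for the support vectors
S = clf.support_vectors_
t_S = t[clf.support_]
K = S @ S.T                        # linear kernel among support vectors

inside = np.abs(at) < C - 1e-6     # margin SVs only (0 < a_n < C)
b = np.mean(t_S[inside] - (K @ at)[inside])
print(b, clf.intercept_[0])        # the two values should agree closely
```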

SLIDE 12

Illustrative Synthetic Example

SLIDE 13

Status

- We have formulated the objective function
- It is still not clear how we will solve it!
- We have assumed the classes are separable
- What about messier data?

SLIDE 14

Overlapping class distributions

- Assume some data cannot be correctly classified
- Define a margin (slack) distance ξ_n = |t_n − y(x_n)| for points that violate the margin, with ξ_n = 0 otherwise
- Consider the regimes (they can be counted directly, as in the sketch below):
  1. ξ = 0: on or outside the correct margin boundary - correct classification
  2. 0 < ξ < 1: between the margin and the decision boundary, still correctly classified
  3. ξ = 1: exactly on the decision boundary
  4. 1 < ξ ≤ 2: on the wrong side of the decision boundary - misclassified
  5. ξ > 2: beyond the opposite margin - the point is definitely misclassified
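For a fitted model the slack values can be read off directly as ξ_n = max(0, 1 − t_n y(x_n)). The sketch below (illustrative data and parameters, not from the lecture) counts points in each of the regimes above.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes (hypothetical data and parameters).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(1, 1, (30, 2))])
t = np.hstack([-np.ones(30), np.ones(30)])

clf = SVC(kernel="linear", C=1.0).fit(X, t)

# Slack of each point: xi_n = max(0, 1 - t_n y(x_n)).
xi = np.maximum(0.0, 1.0 - t * clf.decision_function(X))

print((xi == 0).sum(), "on/outside the correct margin (xi = 0)")
print(((xi > 0) & (xi < 1)).sum(), "inside the margin, correct side")
print((xi >= 1).sum(), "on the boundary or misclassified (xi >= 1)")
```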

SLIDE 15

Overlap in margin

[Figure: overlapping-class data with decision boundary y = 0 and margins y = ±1; points on or outside the correct margin have ξ = 0, points inside the margin have ξ < 1, and misclassified points have ξ > 1.]

SLIDE 16

Recasting the problem

We now optimize not just for w but also for the misclassification penalty, minimizing

$$C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\|w\|^2$$

where C is a regularization coefficient. We have a new objective function

$$L(w, b, a) = \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \left\{ t_n y(x_n) - 1 + \xi_n \right\} - \sum_{n=1}^{N} \mu_n \xi_n$$

where the a_n and μ_n are Lagrange multipliers.

SLIDE 17

Optimization

As before, we take partial derivatives and find the extrema. The resulting objective function is

$$\tilde{L}(a) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)$$

which is just like before, but the constraints are a little different:

$$0 \leq a_n \leq C, \qquad \sum_{n=1}^{N} a_n t_n = 0$$

across all training samples. Many training samples will have a_n = 0, which is the same as saying they are not at the margin.
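A library solver makes the role of the box constraint visible: as C grows, slack is penalized harder and the support set typically shrinks. (In the earlier hand-rolled dual sketch, the only change would be the bounds, i.e. `bounds=[(0.0, C)] * len(t)`.) The sketch below uses hypothetical data and a placeholder parameter grid; it is not part of the lecture.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping classes (hypothetical data).
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(1, 1.2, (50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, t)
    # Larger C penalizes slack harder; fewer points typically remain
    # inside or on the margin, so the support set tends to shrink.
    print(f"C={C:>6}: {len(clf.support_)} support vectors")
```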

SLIDE 18

Generating a solution

- Solutions are generated through analysis of all the training data
- Reorganization enables some optimization (Vapnik, 1982)
- Sequential minimal optimization (SMO) is a common approach (Platt, 2000)
  - Considers pairwise interactions between Lagrange multipliers
  - Complexity is somewhere between linear and quadratic

SLIDE 19

Mixed example

[Figure: soft-margin SVM decision boundary and margins on overlapping two-class data; both axes run from −2 to 2.]

SLIDE 20

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 21

Multi-Class SVMs

Thus far the discussion has covered the two-class problem. How do we extend to K classes?

1. One versus the rest
2. Hierarchical trees - one vs. one
3. Coding the classes to generate a new problem

SLIDE 22

One versus the rest

- Train one classifier per class, with all the other classes serving as the negative training samples (see the sketch below)
- Training is typically skewed: too few positives compared to negatives
- The result is a better fit for the negatives
- One vs. rest implies extra complexity in training, ≈ K²
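To make the scheme concrete (my illustration, not from the lecture), scikit-learn's one-vs-rest wrapper trains one binary SVM per class; `class_weight="balanced"` is one common way to counteract the positive/negative skew just described. The dataset and parameters are placeholders.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Four well-separated clusters standing in for K = 4 classes.
X, y = make_blobs(n_samples=400, centers=4, random_state=0)

# One binary SVM per class; "balanced" reweights each subproblem to
# compensate for the 1-vs-(K-1) skew in positives vs. negatives.
ovr = OneVsRestClassifier(SVC(kernel="rbf", class_weight="balanced"))
ovr.fit(X, y)
print(ovr.predict(X[:5]), y[:5])
```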

SLIDE 23

Tree classifier

- Organize the problem as a tree of selections
- Best-first elimination: handle the easy cases first
- Based on pairwise comparison of classes
- Still requires extra comparisons, on the order of K² class pairs

SLIDE 24

Coding new classes

- Consider optimizing an error-correcting code over the classes
- The criterion function is designed to minimize coding errors
- Can be considered a generalization of voting-based strategies
- Poses a larger training challenge

SLIDE 25

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 26

The regression case

In regression the target is not separation of classes but minimization of the regression error, i.e.

$$\sum_{n=1}^{N} \{y_n - t_n\}^2 + \frac{\lambda}{2}\|w\|^2$$

The problem is similar to the case of overlapping classes. We can define an error function similar to the ξ term used for classification; an example could be the ε-insensitive error

$$E_\epsilon(y(x) - t) = \begin{cases} 0, & |y(x) - t| \leq \epsilon \\ |y(x) - t| - \epsilon, & \text{otherwise} \end{cases}$$
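A short scikit-learn sketch (illustrative, not from the slides): points that fall strictly inside the ε-tube incur no error and end up with a_n = 0, so widening the tube makes the model sparser. The data and parameters are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-3, 3, (80, 1)), axis=0)
t = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# Wider tubes (larger epsilon) ignore more points -> sparser models.
for eps in (0.05, 0.2, 0.5):
    reg = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, t)
    print(f"epsilon={eps}: {len(reg.support_)} of {len(X)} points are support vectors")
```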

SLIDE 27

Example ε error function

[Figure: the ε-insensitive error function E(z); it is zero on the interval from −ε to ε and grows linearly outside it.]

SLIDE 28

The regularized error function

The optimization is then with respect to the error function

$$C \sum_{n=1}^{N} E_\epsilon(y(x_n) - t_n) + \frac{1}{2}\|w\|^2$$

Just as before, we can define a Lagrangian to be optimized and set criteria for the optimization. The criteria are largely the same (see the book for details, Eqns. 7.56-7.60).

SLIDE 29

Regression illustration

[Figure: SVM regression on synthetic data; horizontal axis x, vertical axis t from −1 to 1, showing the fitted curve and its ε-tube.]

SLIDE 30

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 31

Categorization of Rooms

- Example of using an SVM for room categorization
- Recognition of different types of rooms across extended periods
- Training data recorded over a period of 6 months
- Training and evaluation across 3 different settings
- Extensive evaluation

SLIDE 32

Room Categories

SLIDE 33

Training Organization

SLIDE 34

Training Organization

SLIDE 35

Preprocessing of data

SLIDE 36

SVM details

The system uses a χ² kernel, which is widely used for histogram comparison. The kernel is defined as

$$K(x, y) = e^{-\gamma \chi^2(x, y)}, \qquad \chi^2(x, y) = \sum_i \frac{(x_i - y_i)^2}{x_i + y_i}$$

Initially introduced by Marszalek, et al., IJCV 2007. Trained using "one vs the rest".
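The same kernel is available off the shelf. The sketch below (my illustration, with hypothetical histogram data and a placeholder γ that would be tuned by cross-validation in practice) feeds a precomputed χ² kernel to scikit-learn's SVC and wraps it one-vs-rest to match the training scheme above.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Stand-ins for normalized visual-word histograms (hypothetical data).
X_train = rng.dirichlet(np.ones(16), size=60)
y_train = np.repeat([0, 1, 2], 20)
X_test = rng.dirichlet(np.ones(16), size=9)

gamma = 0.5   # placeholder value
K_train = chi2_kernel(X_train, gamma=gamma)           # exp(-gamma * chi2)
K_test = chi2_kernel(X_test, X_train, gamma=gamma)    # (n_test, n_train)

# One binary SVM per class on the precomputed Gram matrix.
clf = OneVsRestClassifier(SVC(kernel="precomputed")).fit(K_train, y_train)
print(clf.predict(K_test))
```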

SLIDE 37

SVM results - Video

SLIDE 38

The recognition results

SLIDE 39

Another small example

How to remove dependency on background? (Roobaert, 1999)

SLIDE 40

Smart use of SVMs - a "hack" with applications

SLIDE 41

Outline

1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary

SLIDE 42

Summary

- An approach that stores only the "key" data for recognition/regression
- Defines an optimization that identifies those key data points
- The learning is fairly involved (complex): basically a quadratic optimization problem, evaluated across all training data, keeping only the essential data
  1. Training can be costly
  2. Execution can be fast - optimized
- Multi-class cases can pose a bit of a challenge
