E9 205 Machine Learning for Signal Processing: Support Vector Machines


  1. E9 205 Machine Learning for Signal Processing. Support Vector Machines. 9-10-2019

  2. Linear Classifiers [Diagram: input x passed through f(x, w, b) to produce the estimated label y_est. Figure: two classes of points, "denotes +1" and "denotes -1", with a candidate decision line w x + b = 0; the regions w x + b > 0 and w x + b < 0 lie on either side.] How would you classify this data? ("SVM and applications", Mingyue Tan, Univ. of British Columbia)

  3. Linear Classifiers [Figure: the same two-class data with a different candidate line w x + b = 0, again splitting the plane into w x + b > 0 and w x + b < 0.] How would you classify this data? ("SVM and applications", Mingyue Tan, Univ. of British Columbia)

  4. Linear Classifiers [Figure: another candidate separating line for the same data.] How would you classify this data? ("SVM and applications", Mingyue Tan, Univ. of British Columbia)

  5. Linear Classifiers [Figure: yet another candidate separating line for the same data.] How would you classify this data? ("SVM and applications", Mingyue Tan, Univ. of British Columbia)

  6. Linear Classifiers [Figure: several candidate separating lines drawn through the same data.] Any of these would be fine... but which is best? ("SVM and applications", Mingyue Tan, Univ. of British Columbia)

  7. Linear Classifiers [Figure: a candidate line that misclassifies a point into the +1 class.] How would you classify this data? ("SVM and applications", Mingyue Tan, Univ. of British Columbia)

  8. Linear Classifiers: Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.

  9. Maximum Margin: f(x, w, b) = sign(w x + b). The maximum margin linear classifier is the linear classifier with the maximum margin; this is the simplest kind of SVM (called a linear SVM, or LSVM). 1. Maximizing the margin is good according to intuition. 2. It implies that only the support vectors are important; the other training examples are ignorable. 3. Empirically it works very well. The support vectors are those data points that the margin pushes up against. [Figure: the maximum-margin separating line with the support vectors highlighted.]
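As a small illustration (the weight vector, bias, and test points below are arbitrary placeholders, not taken from the lecture), the linear decision rule f(x, w, b) = sign(w x + b) is a one-liner in NumPy:

```python
# Minimal sketch of the linear decision rule f(x, w, b) = sign(w.x + b).
# The weights, bias, and points below are made-up examples.
import numpy as np

w = np.array([2.0, -1.0])          # hypothetical weight vector
b = -0.5                           # hypothetical bias

def linear_classify(X, w, b):
    """Return +1 / -1 labels for the rows of X using sign(w.x + b)."""
    return np.sign(X @ w + b)

X = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.1]])
print(linear_classify(X, w, b))    # [ 1. -1.  1.]
```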

  10. Non-linear SVMs ■ Datasets that are linearly separable with some noise work out great. ■ But what are we going to do if the dataset is just too hard? ■ How about mapping the data to a higher-dimensional space, e.g. x → (x, x²)? [Figures: 1-D data on the x axis that is separable, 1-D data that is not, and the same data plotted against x and x², where it becomes separable.]

  11. Non-linear SVMs: Feature spaces ■ General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ : x → φ(x). [Figure: a non-linearly-separable input space mapped by Φ into a feature space where a linear separator exists.]
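To make the mapping idea on slides 10-11 concrete, here is a minimal sketch with made-up 1-D data that is not separable by any threshold on x but becomes linearly separable after the map φ(x) = (x, x²):

```python
# 1-D data that is not linearly separable becomes separable after x -> (x, x^2).
# The data below is synthetic, chosen only for illustration.
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1])   # outer points vs inner points

# No single threshold on x separates the classes, but in the feature space
# phi(x) = (x, x^2) the horizontal line x^2 = 2 does:
phi = np.column_stack([x, x**2])
separable = np.all((phi[:, 1] > 2) == (y == 1))
print(separable)   # True: x^2 > 2 exactly for the +1 points
```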

  12. The “Kernel Trick” ■ The linear classifier relies on the dot product between vectors: k(x_i, x_j) = x_i^T x_j. ■ If every data point is mapped into a high-dimensional space via some transformation Φ : x → φ(x), the dot product becomes k(x_i, x_j) = φ(x_i)^T φ(x_j). ■ A kernel function is a function that corresponds to an inner product in some expanded feature space. ■ Example: for 2-dimensional vectors x = [x_1, x_2], let k(x_i, x_j) = (1 + x_i^T x_j)². We need to show that k(x_i, x_j) = φ(x_i)^T φ(x_j):
      k(x_i, x_j) = (1 + x_i^T x_j)²
                  = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
                  = [1, x_i1², √2 x_i1 x_i2, x_i2², √2 x_i1, √2 x_i2]^T [1, x_j1², √2 x_j1 x_j2, x_j2², √2 x_j1, √2 x_j2]
                  = φ(x_i)^T φ(x_j), where φ(x) = [1, x_1², √2 x_1 x_2, x_2², √2 x_1, √2 x_2].
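The polynomial-kernel identity derived on slide 12 is easy to verify numerically; the two test vectors below are arbitrary:

```python
# Numerical check that k(xi, xj) = (1 + xi.xj)^2 equals phi(xi).phi(xj)
# with the explicit degree-2 feature map from slide 12.
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2)*x1*x2, x2**2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2])

def k_poly(xi, xj):
    return (1.0 + xi @ xj) ** 2

xi = np.array([0.7, -1.2])   # arbitrary test vectors
xj = np.array([2.0,  0.3])

print(np.isclose(k_poly(xi, xj), phi(xi) @ phi(xj)))   # True
```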

  13. What Functions are Kernels? ■ For many functions k(x_i, x_j), checking that k(x_i, x_j) = φ(x_i)^T φ(x_j) can be cumbersome. ■ Mercer's theorem: every positive semi-definite symmetric function is a kernel. ■ Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:
      K = | k(x_1, x_1)  k(x_1, x_2)  k(x_1, x_3)  …  k(x_1, x_N) |
          | k(x_2, x_1)  k(x_2, x_2)  k(x_2, x_3)  …  k(x_2, x_N) |
          | …            …            …            …  …           |
          | k(x_N, x_1)  k(x_N, x_2)  k(x_N, x_3)  …  k(x_N, x_N) |
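A quick numerical illustration of the Mercer condition (synthetic data, Gaussian kernel assumed): a valid kernel produces a Gram matrix whose eigenvalues are all non-negative.

```python
# Sketch: build the Gram matrix of an RBF kernel on random points and check
# that it is positive semi-definite (all eigenvalues >= 0, up to rounding).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                    # 20 synthetic 3-D points

def rbf(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj)**2) / (2 * sigma**2))

K = np.array([[rbf(xi, xj) for xj in X] for xi in X])   # Gram matrix

eigvals = np.linalg.eigvalsh(K)                 # symmetric -> real eigenvalues
print(eigvals.min() >= -1e-10)                  # True
```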

  14. Examples of Kernel Functions ■ Linear: k(x_i, x_j) = x_i^T x_j ■ Polynomial of power p: k(x_i, x_j) = (1 + x_i^T x_j)^p ■ Gaussian (radial-basis function network): k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)) ■ Sigmoid: k(x_i, x_j) = tanh(β_0 x_i^T x_j + β_1)
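For reference, plain NumPy implementations of the four kernels listed above; the parameter defaults (p, σ, β_0, β_1) are only illustrative choices:

```python
# Direct implementations of the kernels from slide 14.
import numpy as np

def k_linear(xi, xj):
    return xi @ xj

def k_poly(xi, xj, p=3):
    return (1.0 + xi @ xj) ** p

def k_rbf(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj)**2 / (2 * sigma**2))

def k_sigmoid(xi, xj, beta0=0.1, beta1=-1.0):
    return np.tanh(beta0 * (xi @ xj) + beta1)
```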

  15. SVM Formulation ❖ Goal: 1) correctly classify all training data: y_n (w^T x_n + b) ≥ 1 for all n; 2) the margin is 2/‖w‖; 3) maximize the margin, i.e. minimize ½ w^T w. ❖ Equivalently: minimize ½ w^T w such that y_n (w^T x_n + b) ≥ 1 for all n.
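A toy check of this formulation (the data and the candidate (w, b) below are hand-picked, not from the slides): verify the constraints y_n (w^T x_n + b) ≥ 1 and compute the margin 2/‖w‖.

```python
# Verify the hard-margin constraints and the margin 2/||w|| on toy data.
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0],      # class +1
              [0.0, 0.0], [1.0, 0.0]])     # class -1
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])                   # candidate separating hyperplane
b = -3.0

print(np.all(y * (X @ w + b) >= 1))        # True: every constraint satisfied
print(2 / np.linalg.norm(w))               # margin = 2 / ||w|| ~ 1.414
```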

  16. Solving the Optimization Problem ■ Need to optimize a quadratic function subject to linear constraints. ■ Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them. ■ The solution involves constructing a dual problem in which a Lagrange multiplier a_n is associated with every constraint in the primal problem. ■ The dual problem: find a_1, …, a_N that maximize
      L(a) = Σ_n a_n − ½ Σ_n Σ_m a_n a_m y_n y_m x_n^T x_m
such that a_n ≥ 0 for all n and Σ_n a_n y_n = 0.

  17. Solving the Optimization Problem ■ The solution has the form w = Σ_n a_n y_n x_n. ■ Each non-zero a_n indicates that the corresponding x_n is a support vector. Let S denote the set of support vectors. ■ The classifying function then has the form f(x) = sign( Σ_{n∈S} a_n y_n x_n^T x + b ).

  18. Solving the Optimization Problem
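As a sketch of slides 16-18 in code (toy data; scikit-learn's SVC, whose libsvm backend uses an SMO-type solver, stands in for whichever solver the lecture assumes), we can solve the dual, read off the support vectors and multipliers, and confirm that w = Σ_n a_n y_n x_n reproduces the fitted classifier. A very large C approximates the hard margin.

```python
# Solve the dual on a toy linearly separable set and recover w = sum a_n y_n x_n.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],   # class +1
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)          # huge C ~ hard margin

# dual_coef_ holds a_n * y_n for the support vectors only
a_y = svm.dual_coef_.ravel()
w = a_y @ svm.support_vectors_            # w = sum_n a_n y_n x_n
print(svm.support_)                       # indices of the support vectors
print(np.allclose(w, svm.coef_.ravel()))  # True: matches the fitted weights
print(np.sign(X @ w + svm.intercept_))    # sign(w.x + b) recovers the labels
```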

  19. Visualizing Gaussian Kernel SVM
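One possible way to produce such a visualization (synthetic ring data; the γ and C values are arbitrary) with scikit-learn and matplotlib: evaluate the decision function on a grid, draw its zero contour, and mark the support vectors.

```python
# Visualize a Gaussian-kernel SVM decision boundary on synthetic data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)
svm = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)

# Evaluate the decision function on a grid and draw the zero contour
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200), np.linspace(-1.5, 1.5, 200))
zz = svm.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, levels=20, cmap="coolwarm", alpha=0.5)
plt.contour(xx, yy, zz, levels=[0], colors="k")          # decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k", s=20)
plt.scatter(*svm.support_vectors_.T, facecolors="none",
            edgecolors="k", s=80, label="support vectors")
plt.legend(); plt.show()
```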

  20. Overlapping class boundaries ■ When the classes are not linearly separable, introduce slack variables ξ_n, one per training point. ■ Slack variables are non-negative: ξ_n ≥ 0. ■ They are defined through the relaxed constraints y_n (w^T x_n + b) ≥ 1 − ξ_n. ■ Σ_n ξ_n gives an upper bound on the number of misclassified points. ■ The cost function to be optimized in this case: ½ ‖w‖² + C Σ_n ξ_n.

  21. SVM Formulation - overlapping classes ■ The formulation is very similar to the previous case except for the additional constraints 0 ≤ a_n ≤ C. ■ Solved using the dual formulation, e.g. with the sequential minimal optimization (SMO) algorithm. ■ The final classifier is based on the sign of y(x) = Σ_n a_n y_n k(x, x_n) + b.
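A short sketch of the soft-margin trade-off (synthetic overlapping blobs; the C values are arbitrary): the box constraint 0 ≤ a_n ≤ C caps each dual coefficient, and larger C penalizes slack more heavily.

```python
# Soft-margin SVMs on overlapping classes: vary C and inspect the solution.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two overlapping Gaussian blobs, so some slack variables must be non-zero
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    svm = SVC(kernel="linear", C=C).fit(X, y)
    # All dual coefficients respect the box constraint |a_n * y_n| <= C
    assert np.all(np.abs(svm.dual_coef_) <= C + 1e-8)
    print(f"C={C:>6}: {svm.n_support_.sum()} support vectors, "
          f"train accuracy {svm.score(X, y):.2f}")
```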

  22. Overlapping class boundaries
