Linear Models for Classification, Henrik I Christensen (PowerPoint PPT Presentation)



SLIDE 1

Linear Models for Classification

Henrik I Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Henrik I Christensen (RIM@GT) Linear Classification 1 / 33

SLIDE 2

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 3

Introduction

  • Last time: prediction of new functional values
  • Today: linear classification of data

  • Basic pattern recognition
  • Separation of data: buy/sell
  • Segmentation of line data, ...

SLIDE 4

Simple Example - Bolts or Needles

SLIDE 5

Classification

Given:

  • An input vector: x
  • A set of classes: c_i ∈ C, i = 1, ..., k

  • Mapping m : X → C
  • Separation of the space into decision regions
  • The boundaries are termed decision boundaries/surfaces

SLIDE 6

Basis Formulation

  • It is a 1-of-K coding problem
  • Target vector: t = (0, ..., 1, ..., 0)
  • Consideration of 3 different approaches:

1. Optimization of a discriminant function
2. Bayesian formulation: p(c_i | x)
3. Learning & decision fusion
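The 1-of-K target coding above can be sketched in a few lines of Python (the label names are hypothetical illustrations, not from the lecture):

```python
def one_of_k(index, k):
    """Return the 1-of-K target vector t = (0, ..., 1, ..., 0)."""
    t = [0] * k
    t[index] = 1
    return t

classes = ["needle", "bolt", "screw"]   # hypothetical label set
targets = {c: one_of_k(i, len(classes)) for i, c in enumerate(classes)}
print(targets["bolt"])                  # [0, 1, 0]
```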

SLIDE 7

Code for experimentation

There are data sets and sample code available:

  • NETLAB: http://www.ncrg.aston.ac.uk/netlab/index.php
  • NAVTOOLBOX: http://www.cas.kth.se/toolbox/
  • SLAM Dataset: http://kaspar.informatik.uni-freiburg.de/~slamEvaluation/datasets.php

SLIDE 8

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 9

Discriminant Functions

  • Objective: assign the input vector x to a class c_i
  • Simple formulation: y(x) = w^T x + w_0
  • w is termed the weight vector, w_0 the bias
  • Two-class example: decide c1 if y(x) ≥ 0, otherwise c2
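The two-class rule above can be sketched as follows (the weight values are made up for illustration, not taken from the slides):

```python
def y(w, w0, x):
    """Linear discriminant y(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def classify(w, w0, x):
    """Decide c1 if y(x) >= 0, otherwise c2."""
    return "c1" if y(w, w0, x) >= 0 else "c2"

w, w0 = [1.0, -2.0], 0.5                 # hypothetical weight vector and bias
print(classify(w, w0, [3.0, 1.0]))       # y = 3 - 2 + 0.5 = 1.5 -> prints 'c1'
```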

SLIDE 10

Basic Design

  • Two points on the decision surface, x_a and x_b:
        y(x_a) = y(x_b) = 0  ⇒  w^T (x_a − x_b) = 0
  • Hence w is perpendicular to the decision surface
  • Signed distance from the origin to the surface:
        w^T x / ||w|| = −w_0 / ||w||
  • Define w̃ = (w_0, w) and x̃ = (1, x), so that y(x) = w̃^T x̃

SLIDE 11

Linear discriminant function

[Figure: geometry of the linear discriminant in 2D. The weight vector w is normal to the decision surface y = 0, which separates regions R1 (y > 0) and R2 (y < 0); x⊥ is the projection of x onto the surface, and −w0/||w|| is the signed distance of the surface from the origin.]

SLIDE 12

Multi Class Discrimination

  • Generate multiple decision functions: y_k(x) = w_k^T x + w_k0
  • Decision strategy: j = arg max_{i ∈ 1..k} y_i(x)
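A minimal sketch of the arg-max decision strategy above (the per-class weights and biases are invented for illustration):

```python
def linear(wk, wk0, x):
    """One per-class decision function y_k(x) = w_k^T x + w_k0."""
    return sum(wi * xi for wi, xi in zip(wk, x)) + wk0

def decide(W, b, x):
    """Evaluate all y_i(x) and return j = arg max_i y_i(x)."""
    scores = [linear(wk, wk0, x) for wk, wk0 in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # one weight vector per class (made up)
b = [0.0, 0.0, 0.5]
print(decide(W, b, [2.0, 1.0]))              # scores 2.0, 1.0, -2.5 -> class 0
```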

SLIDE 13

Multi-Class Decision Regions

[Figure: decision regions Ri, Rj, Rk of a multi-class linear discriminant; for two points xA and xB inside Rk, every point x on the line between them also lies in Rk, i.e. the regions are convex.]

SLIDE 14

Example - Bolts or Needles

SLIDE 15

Minimum distance classification

  • Suppose we have computed the mean value for each of the classes:
        m_needle = [0.86, 2.34]^T and m_bolt = [5.74, 5.85]^T
  • We can then compute the distance to each mean: d_j(x) = ||x − m_j||
  • arg min_i d_i(x) gives the best fit
  • Linear decision functions can be derived from the means
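The minimum-distance rule with the class means above can be sketched as:

```python
import math

# Assign x to the class whose mean is nearest: d_j(x) = ||x - m_j||.
# The means are the values from the slide.
means = {"needle": (0.86, 2.34), "bolt": (5.74, 5.85)}

def dist(x, m):
    """Euclidean distance ||x - m||."""
    return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, m)))

def classify(x):
    """arg min over classes of the distance to the class mean."""
    return min(means, key=lambda c: dist(x, means[c]))

print(classify((1.0, 2.0)))   # near the needle mean
print(classify((6.0, 5.0)))   # near the bolt mean
```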

SLIDE 16

Bolts / Needle Decision Functions

  • Needle: d_needle(x) = 0.86 x1 + 2.34 x2 − 3.10
  • Bolt: d_bolt(x) = 5.74 x1 + 5.85 x2 − 33.59
  • Decision boundary: d_i(x) − d_j(x) = 0, i.e.
        d_needle/bolt(x) = −4.88 x1 − 3.51 x2 + 30.49
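The linear forms above follow from expanding ||x − m_j||² = ||x||² − 2 m_j^T x + ||m_j||² and dropping the common ||x||² term, giving d_j(x) = m_j^T x − ½ m_j^T m_j. A short sketch reproducing the slide's coefficients (they agree up to rounding of the bias terms):

```python
m_needle = (0.86, 2.34)
m_bolt = (5.74, 5.85)

def decision(m):
    """Coefficients (w1, w2, w0) of d_j(x) = m_j^T x - 0.5 * ||m_j||^2."""
    return (*m, -0.5 * sum(mi * mi for mi in m))

d_needle = decision(m_needle)   # bias -3.1076, slide rounds to -3.10
d_bolt = decision(m_bolt)       # bias -33.585, slide rounds to -33.59

# Boundary coefficients d_needle - d_bolt: (-4.88, -3.51, 30.477...)
boundary = tuple(a - b for a, b in zip(d_needle, d_bolt))
print(boundary)
```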

SLIDE 17

Example decision surface

SLIDE 18

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 19

Least Squares for Classification

  • Just as with LSQ for regression, we can compute an approximation to the classification vector C
  • Consider again y_k(x) = w_k^T x + w_k0
  • Rewrite to y(x) = W̃^T x̃
  • Assume we have a target vector T

SLIDE 20

Least Squares for Classification

  • The error is then:
        E_D(W̃) = ½ Tr{ (X̃W̃ − T)^T (X̃W̃ − T) }
  • The solution is then:
        W̃ = (X̃^T X̃)^{−1} X̃^T T
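A small sketch of this closed-form solution on synthetic data (the data set, seed, and NumPy usage are illustrative assumptions, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(20, 2))   # class 1 samples (made up)
X2 = rng.normal([3.0, 3.0], 0.5, size=(20, 2))   # class 2 samples (made up)

Xt = np.hstack([np.ones((40, 1)), np.vstack([X1, X2])])   # augmented x~ = (1, x)
T = np.vstack([np.tile([1.0, 0.0], (20, 1)),              # 1-of-K targets
               np.tile([0.0, 1.0], (20, 1))])

# W~ = (X~^T X~)^{-1} X~^T T, solved without forming the inverse explicitly
W = np.linalg.solve(Xt.T @ Xt, Xt.T @ T)

pred = np.argmax(Xt @ W, axis=1)                 # classify by the largest output
labels = np.r_[np.zeros(20, int), np.ones(20, int)]
print("training errors:", int((pred != labels).sum()))
```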

SLIDE 21

LSQ and Outliers

[Figure: the same two-class data set shown without and with extra outlier points; the outliers pull the least-squares decision boundary away from the good separation, illustrating LSQ’s lack of robustness.]

SLIDE 22

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 23

Fisher’s linear discriminant

  • Select a decision function that maximizes the distance between classes
  • Assume for a start: y = w^T x
  • Compute the class means m1 and m2:
        m1 = (1/N1) Σ_{i∈C1} x_i        m2 = (1/N2) Σ_{j∈C2} x_j
  • Projected distance between means:
        m2 − m1 = w^T (m2 − m1), where m_i = w^T m_i

SLIDE 24

The suboptimal solution

[Figure: two-class data projected onto the direction joining the class means; the projected classes overlap considerably.]

SLIDE 25

The Fisher criterion

  • Consider the expression
        J(w) = (w^T S_B w) / (w^T S_W w)
    where S_B is the between-class covariance and S_W is the within-class covariance, i.e.
        S_B = (m1 − m2)(m1 − m2)^T
        S_W = Σ_{i∈C1} (x_i − m1)(x_i − m1)^T + Σ_{i∈C2} (x_i − m2)(x_i − m2)^T
  • J(w) is optimized when (w^T S_B w) S_W w = (w^T S_W w) S_B w, or
        w ∝ S_W^{−1} (m2 − m1)
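The closed-form result w ∝ S_W^{−1} (m2 − m1) can be sketched as follows (the synthetic data, seed, and covariance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.8], [0.8, 1.0]])            # shared within-class covariance (made up)
X1 = rng.multivariate_normal([0.0, 0.0], C, 100)  # class 1
X2 = rng.multivariate_normal([2.0, 1.0], C, 100)  # class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter

w = np.linalg.solve(SW, m2 - m1)    # direction of S_W^{-1} (m2 - m1)
w /= np.linalg.norm(w)

def J(d):
    """Fisher criterion for a direction d: ((m2-m1)^T d)^2 / (d^T S_W d)."""
    return float(((m2 - m1) @ d) ** 2 / (d @ SW @ d))

# The Fisher direction scores at least as well as the naive mean-difference direction
naive = (m2 - m1) / np.linalg.norm(m2 - m1)
print(J(w) >= J(naive))
```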

SLIDE 26

The Fisher result

[Figure: the same data projected onto the Fisher direction; the projected classes are now well separated.]

SLIDE 27

Generalization to N > 2

  • Define a stacked weight matrix: y = W^T x
  • The within-class covariance generalizes to:
        S_W = Σ_{k=1}^{K} S_k
  • The between-class covariance is:
        S_B = Σ_{k=1}^{K} N_k (m_k − m)(m_k − m)^T
  • It can be shown that J(W) is optimized by the eigenvectors of S = S_W^{−1} S_B

SLIDE 28

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 29

Perceptron Algorithm

  • Developed by Rosenblatt (1962)
  • Formed an important basis for neural networks
  • Uses a non-linear transformation φ(x)
  • Construct a decision function:
        y(x) = f( w^T φ(x) ), where f(a) = +1 if a ≥ 0, −1 if a < 0

SLIDE 30

The perceptron criterion

  • Given the target definition t_n ∈ {−1, +1}, we normally want w^T φ(x_n) t_n > 0 for all samples
  • Perceptron criterion:
        E_P(w) = − Σ_{n∈M} w^T φ_n t_n
    where M represents all the mis-classified samples
  • We can minimize this with gradient descent, as seen in the last lecture:
        w^(τ+1) = w^(τ) − η ∇E_P(w) = w^(τ) + η φ_n t_n
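The update rule above can be sketched as a simple loop (the toy separable data and the identity features φ = (1, x1, x2) are illustrative assumptions, not from the lecture):

```python
# phi_n = (1, x1, x2): identity features plus a bias term; targets t in {-1, +1}
data = [((1.0, 2.0, 1.0), +1), ((1.0, 1.5, 2.0), +1),
        ((1.0, -1.0, -0.5), -1), ((1.0, -2.0, -1.0), -1)]

w = [0.0, 0.0, 0.0]
eta = 1.0                                    # learning rate
for _ in range(100):                         # epochs; separable data converges fast
    mistakes = 0
    for phi, t in data:
        # misclassified when w^T phi_n t_n <= 0 -> update w <- w + eta * phi_n * t_n
        if sum(wi * pi for wi, pi in zip(w, phi)) * t <= 0:
            w = [wi + eta * pi * t for wi, pi in zip(w, phi)]
            mistakes += 1
    if mistakes == 0:                        # converged: all samples on the correct side
        break

print("final w:", w)
```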

SLIDE 31

Perceptron learning example

[Figure: successive snapshots of perceptron learning on a 2D data set; after each update on a misclassified point the decision boundary moves, until all points are correctly classified.]

SLIDE 32

Outline

1. Introduction
2. Linear Discriminant Functions
3. LSQ for Classification
4. Fisher’s Discriminant Method
5. Perceptrons
6. Summary

SLIDE 33

Summary

  • Basics of discrimination / classification
  • Obviously not all problems are linear
  • Optimization of the distance/overlap between classes
  • Minimizing the probability of misclassification
  • Basic formulation as an optimization problem
  • How to optimize the between-cluster distance? Covariance weighted
  • Basic recursive formulation
  • Could we make it more robust?
