Linear Models for Classification II


SLIDE 1

Linear Models for Classification II

Henrik I Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 3

Introduction

Recap: last time we discussed linear classification as an optimization problem.

Today: Bayesian models for classification, a discussion of possible class projects, and a summary.


SLIDE 4

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 5

Probabilistic Generative Models

Objective: the posterior $p(C_k|x)$, modelled using

  • $p(C_k)$ - the class priors
  • $p(x|C_k)$ - the class conditionals

For two classes:
$$p(C_1|x) = \frac{p(x|C_1)\,p(C_1)}{p(x|C_1)\,p(C_1) + p(x|C_2)\,p(C_2)}$$
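As a concrete illustration, here is a minimal Python sketch of this two-class posterior, assuming Gaussian class conditionals; all parameter values below are made up for the example:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative (not from the slides): two Gaussian class conditionals
# with a shared covariance and unequal priors.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.eye(2)
prior1, prior2 = 0.6, 0.4

x = np.array([0.3, -0.2])

# Bayes' rule: p(C1|x) = p(x|C1)p(C1) / (p(x|C1)p(C1) + p(x|C2)p(C2))
lik1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma)
lik2 = multivariate_normal.pdf(x, mean=mu2, cov=Sigma)
post1 = lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)
print(f"p(C1|x) = {post1:.4f}, p(C2|x) = {1 - post1:.4f}")
```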


SLIDE 6

Sigmoid Formulation

Reformulate the two-class posterior as
$$p(C_1|x) = \frac{1}{1 + e^{-a}} = \sigma(a), \quad \text{where } a = \ln \frac{p(x|C_1)\,p(C_1)}{p(x|C_2)\,p(C_2)}$$
The logistic sigmoid $\sigma(a)$ is defined by
$$\sigma(a) = \frac{1}{1 + e^{-a}}$$
Note that $\sigma(-a) = 1 - \sigma(a)$.
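A quick numerical check of these identities (the value of a is arbitrary):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

a = 1.7  # any log-odds value works for the checks below
assert np.isclose(sigmoid(-a), 1.0 - sigmoid(a))  # sigma(-a) = 1 - sigma(a)
assert np.isclose(sigmoid(0.0), 0.5)              # symmetric about a = 0
```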


SLIDE 7

Sigmoid Function

[Figure: plot of the logistic sigmoid σ(a) for a ∈ (−5, 5); σ(0) = 0.5 and the function saturates at 0 and 1.]


SLIDE 8

Generalization to K > 2 classes

Consider
$$p(C_k|x) = \frac{p(x|C_k)\,p(C_k)}{\sum_i p(x|C_i)\,p(C_i)} = \frac{e^{a_k}}{\sum_i e^{a_i}}$$
where $a_k = \ln\left(p(x|C_k)\,p(C_k)\right)$. This is the normalized exponential (softmax).
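A small sketch of the normalized exponential; the activations a_k below are illustrative values:

```python
import numpy as np

def softmax(a):
    """p(Ck|x) = exp(a_k) / sum_i exp(a_i); subtracting max(a)
    avoids overflow without changing the result."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([-1.2, -0.4, -2.0])  # a_k = ln(p(x|Ck) p(Ck)), K = 3
p = softmax(a)
print(p, p.sum())  # posteriors over the 3 classes; they sum to 1
```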


SLIDE 9

The case with Normal distributions

Consider D-dimensional Gaussian class conditionals with means $\mu_k$ and a shared covariance $\Sigma$. The result is
$$p(C_1|x) = \sigma(w^T x + w_0)$$
where
$$w = \Sigma^{-1}(\mu_1 - \mu_2)$$
$$w_0 = -\tfrac{1}{2}\mu_1^T \Sigma^{-1}\mu_1 + \tfrac{1}{2}\mu_2^T \Sigma^{-1}\mu_2 + \ln\frac{p(C_1)}{p(C_2)}$$
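A sketch verifying that this linear form reproduces direct application of Bayes' rule, reusing the illustrative Gaussian parameters from above:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.eye(2)
p1, p2 = 0.6, 0.4

Sinv = np.linalg.inv(Sigma)
w = Sinv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(p1 / p2)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
x = np.array([0.3, -0.2])
post_linear = sigmoid(w @ x + w0)           # sigma(w^T x + w0)

# Cross-check against Bayes' rule applied directly.
l1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma) * p1
l2 = multivariate_normal.pdf(x, mean=mu2, cov=Sigma) * p2
assert np.isclose(post_linear, l1 / (l1 + l2))
```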


SLIDE 10

The multi-class Normal case

For each class,
$$a_k(x) = w_k^T x + w_{k0}$$
with
$$w_k = \Sigma^{-1}\mu_k, \qquad w_{k0} = -\tfrac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \ln p(C_k)$$
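A minimal multi-class sketch combining these activations with the softmax from above (means and priors are again illustrative):

```python
import numpy as np

mus = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 1.5])]
priors = [0.5, 0.3, 0.2]
Sinv = np.linalg.inv(np.eye(2))   # shared covariance, here the identity

def a_k(x, mu, prior):
    """a_k(x) = w_k^T x + w_k0."""
    w = Sinv @ mu                                  # w_k = Sigma^-1 mu_k
    w0 = -0.5 * mu @ Sinv @ mu + np.log(prior)     # w_k0
    return w @ x + w0

x = np.array([0.2, 0.4])
a = np.array([a_k(x, m, p) for m, p in zip(mus, priors)])
post = np.exp(a - a.max()); post /= post.sum()     # softmax over a_k
print("predicted class:", post.argmax(), "posteriors:", post)
```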


SLIDE 11

Small multi-class Normal distribution example

[Figure: a small multi-class example with Normal class-conditional densities and the resulting decision regions; axes span roughly −2 to 2 and −2.5 to 2.5.]

SLIDE 12

The Maximum Likelihood Solution

For a two-class example with priors $(\pi, 1-\pi)$ we have
$$p(x_n, C_1) = p(C_1)\,p(x_n|C_1) = \pi\,\mathcal{N}(x_n|\mu_1, \Sigma)$$
The joint likelihood function is then
$$p(\mathbf{t}|\pi, \mu_1, \mu_2, \Sigma) = \prod_{n=1}^{N} \left[\pi\,\mathcal{N}(x_n|\mu_1, \Sigma)\right]^{t_n} \left[(1-\pi)\,\mathcal{N}(x_n|\mu_2, \Sigma)\right]^{1-t_n}$$
where $t_n$ is the class label of the n-th sample. We can compute the maximum of $p(\cdot)$ analytically.
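A sketch that evaluates this likelihood, working in log space for numerical stability (data and parameters are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, t, pi, mu1, mu2, Sigma):
    """ln prod_n [pi N(x_n|mu1,S)]^t_n [(1-pi) N(x_n|mu2,S)]^(1-t_n)."""
    ll1 = np.log(pi) + multivariate_normal.logpdf(X, mean=mu1, cov=Sigma)
    ll2 = np.log(1 - pi) + multivariate_normal.logpdf(X, mean=mu2, cov=Sigma)
    return np.sum(t * ll1 + (1 - t) * ll2)

X = np.array([[1.2, 0.1], [-0.9, 0.3]])   # two samples
t = np.array([1.0, 0.0])                  # class labels
print(log_likelihood(X, t, 0.5, np.array([1.0, 0.0]),
                     np.array([-1.0, 0.0]), np.eye(2)))
```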


SLIDE 13

The Maximum Likelihood Solution (2)

The class prior is then
$$\pi = \frac{N_1}{N_1 + N_2} = \frac{N_1}{N}$$
the class means are
$$\mu_1 = \frac{1}{N_1}\sum_{n=1}^{N} t_n x_n, \qquad \mu_2 = \frac{1}{N_2}\sum_{n=1}^{N} (1-t_n)\,x_n$$
and the shared covariance is
$$\Sigma = S = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$$
where $S_k$ is the sample covariance of class $k$. The results are not surprising. Could we compute the optimal ML solution directly?
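A sketch of these closed-form ML estimates on synthetic data (the data and its generating parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic labeled data: t_n = 1 for class 1, t_n = 0 for class 2.
X = np.vstack([rng.normal([1.0, 0.0], 1.0, size=(60, 2)),
               rng.normal([-1.0, 0.0], 1.0, size=(40, 2))])
t = np.concatenate([np.ones(60), np.zeros(40)])

N1, N2, N = t.sum(), (1 - t).sum(), len(t)
pi  = N1 / N                            # class prior
mu1 = (t @ X) / N1                      # (1/N1) sum_n t_n x_n
mu2 = ((1 - t) @ X) / N2                # (1/N2) sum_n (1 - t_n) x_n
S1 = (X[t == 1] - mu1).T @ (X[t == 1] - mu1) / N1
S2 = (X[t == 0] - mu2).T @ (X[t == 0] - mu2) / N2
Sigma = (N1 / N) * S1 + (N2 / N) * S2   # pooled covariance
print(pi, mu1, mu2)
```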


SLIDE 14

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 15

Probabilistic Discriminative Models

Could we analyze the problem directly, rather than through a generative model? That is, could we perform ML directly on $p(C_k|x)$? This could involve fewer parameters!


SLIDE 16

Logistic Regression

Consider the two-class problem, formulated with a sigmoid:
$$p(C_1|\phi) = y(\phi) = \sigma(w^T\phi)$$
then $p(C_2|\phi) = 1 - p(C_1|\phi)$. Note the derivative
$$\frac{d\sigma}{da} = \sigma(1 - \sigma)$$
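A quick finite-difference check of the derivative identity:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

a, h = 0.8, 1e-6
numeric  = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
analytic = sigmoid(a) * (1 - sigmoid(a))   # d(sigma)/da = sigma(1 - sigma)
assert np.isclose(numeric, analytic)
```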


SLIDE 17

Logistic Regression - II

For a dataset $\{\phi_n, t_n\}$ we have
$$p(\mathbf{t}|w) = \prod_{n=1}^{N} y_n^{t_n}\,(1 - y_n)^{1-t_n}$$
The associated error function (the cross-entropy) is
$$E(w) = -\ln p(\mathbf{t}|w) = -\sum_{n=1}^{N}\left\{t_n \ln y_n + (1-t_n)\ln(1-y_n)\right\}$$
and the gradient is then
$$\nabla E(w) = \sum_{n=1}^{N} (y_n - t_n)\,\phi_n$$
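A compact sketch of the error and its gradient; the eps guard on the logarithms is an implementation detail, not part of the slide's formula:

```python
import numpy as np

def cross_entropy_and_grad(w, Phi, t):
    """E(w) = -sum_n {t_n ln y_n + (1-t_n) ln(1-y_n)},
    grad E(w) = sum_n (y_n - t_n) phi_n, with y_n = sigma(w^T phi_n)."""
    y = 1.0 / (1.0 + np.exp(-Phi @ w))
    eps = 1e-12                       # keeps the logs finite as y -> 0 or 1
    E = -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))
    grad = Phi.T @ (y - t)            # matrix form of sum_n (y_n - t_n) phi_n
    return E, grad
```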


SLIDE 18

Newton-Raphson Optimization

We want to find an extremum of a function $f(\cdot)$. Expand to second order:
$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \tfrac{1}{2} f''(x)\,\Delta x^2$$
The expansion has its extremum when $\Delta x$ solves
$$f'(x) + f''(x)\,\Delta x = 0$$
In vector form:
$$x_{n+1} = x_n - \left[H_f(x_n)\right]^{-1} \nabla f(x_n)$$
where $H_f$ is the Hessian.
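A scalar sketch of the iteration on an arbitrary quartic (the function is illustrative):

```python
import numpy as np

# f(x) = x^4 - 3x^2 + 2; an extremum satisfies f'(x) = 0.
fp  = lambda x: 4 * x**3 - 6 * x     # f'(x)
fpp = lambda x: 12 * x**2 - 6        # f''(x)

x = 2.0
for _ in range(20):
    x = x - fp(x) / fpp(x)           # x_{n+1} = x_n - f'(x_n)/f''(x_n)
print(x, np.sqrt(1.5))               # converges to the minimum at sqrt(3/2)
```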


SLIDE 19

Iteratively reweighted least squares

Formulate the optimization problem as
$$w^{(\tau+1)} = w^{(\tau)} - H^{-1}\nabla E(w)$$
For the sum-of-squares error, the gradient and Hessian are given by
$$\nabla E(w) = \Phi^T\Phi\,w - \Phi^T\mathbf{t}, \qquad H = \Phi^T\Phi$$
The solution is "obvious":
$$w^{(\tau+1)} = w^{(\tau)} - (\Phi^T\Phi)^{-1}\left(\Phi^T\Phi\,w^{(\tau)} - \Phi^T\mathbf{t}\right) = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$$
which is the LSQ solution, reached in a single step!
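A sketch confirming on synthetic data that a single Newton step, from any starting point, lands exactly on the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(50, 3))                     # design matrix
t = Phi @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=50)

w = np.zeros(3)                                    # arbitrary start
grad = Phi.T @ Phi @ w - Phi.T @ t
H = Phi.T @ Phi
w_new = w - np.linalg.solve(H, grad)               # one Newton step

w_lsq = np.linalg.lstsq(Phi, t, rcond=None)[0]
assert np.allclose(w_new, w_lsq)                   # single step = LSQ
```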


SLIDE 20

Optimization for the cross-entropy

For the cross-entropy error $E(w)$,
$$\nabla E(w) = \Phi^T(\mathbf{y} - \mathbf{t}), \qquad H = \Phi^T R\,\Phi$$
where $R$ is a diagonal matrix with $R_{nn} = y_n(1 - y_n)$. The regression/discrimination update is then
$$w^{(\tau+1)} = (\Phi^T R\,\Phi)^{-1}\Phi^T R\,\mathbf{z}, \qquad \mathbf{z} = \Phi w^{(\tau)} - R^{-1}(\mathbf{y} - \mathbf{t})$$
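A sketch of the full IRLS loop for logistic regression on toy data; the clipping of the R_nn entries is a numerical safeguard, not part of the slide's formula:

```python
import numpy as np

def irls_logistic(Phi, t, iters=10):
    """w <- (Phi^T R Phi)^-1 Phi^T R z, z = Phi w - R^-1 (y - t),
    with R diagonal and R_nn = y_n (1 - y_n)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = 1.0 / (1.0 + np.exp(-Phi @ w))
        r = np.clip(y * (1 - y), 1e-9, None)   # diagonal of R, kept positive
        z = Phi @ w - (y - t) / r
        # Weighted least-squares solve; R is never formed explicitly.
        w = np.linalg.solve(Phi.T @ (r[:, None] * Phi), Phi.T @ (r * z))
    return w

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(1.0, 1.0, (30, 2)), rng.normal(-1.0, 1.0, (30, 2))])
Phi = np.hstack([np.ones((60, 1)), X])         # bias feature phi_0 = 1
t = np.concatenate([np.ones(30), np.zeros(30)])
print(irls_logistic(Phi, t))
```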

SLIDE 21

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 22

Class Projects - Examples

  • Feature integration for robust detection
  • Multi-recognition strategies
  • Comparison of recognition methods
  • Space categorization
  • Learning of obstacle-avoidance strategies


SLIDE 23

Class Projects - II

Problems:

  • Novel "research" - robotics / mobile platforms / manipulation
  • Comparative evaluation
  • Integration of methods

Aspects:

  • Modelling - what is a good/adequate model?
  • What is a good benchmark/evaluation?
  • Evaluation of a method - alone or in comparison

Teaming:

  • 2-3 students per group


SLIDE 24

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 25

Summary

  • Considered a Bayesian formulation for class discrimination
  • For linear systems, LSQ provides the solution
  • Iterative solutions (IRLS) for the cross-entropy case
  • Discussed possible class projects
