Linear Models for Classification II


SLIDE 1

Linear Models for Classification II

Henrik I Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 3

Introduction

Recap: last time we discussed linear classification as an optimization problem.

Today: Bayesian models for classification, a discussion of possible class projects, and a summary.


SLIDE 4

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 5

Probabilistic Generative Models

Objective: the posterior $p(C_k|x)$, modelled using

  • $p(C_k)$ - the class priors
  • $p(x|C_k)$ - the class conditionals

For two classes:
$$p(C_1|x) = \frac{p(x|C_1)\,p(C_1)}{p(x|C_1)\,p(C_1) + p(x|C_2)\,p(C_2)}$$
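As a concrete illustration, here is a minimal Python sketch of this two-class posterior, assuming Gaussian class conditionals; all parameter values below are made up for the example:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative (not from the slides): two Gaussian class conditionals
# with a shared covariance and unequal priors.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.eye(2)
prior1, prior2 = 0.6, 0.4

x = np.array([0.3, -0.2])

# Bayes' rule: p(C1|x) = p(x|C1)p(C1) / (p(x|C1)p(C1) + p(x|C2)p(C2))
lik1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma)
lik2 = multivariate_normal.pdf(x, mean=mu2, cov=Sigma)
post1 = lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)
print(f"p(C1|x) = {post1:.4f}, p(C2|x) = {1 - post1:.4f}")
```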


SLIDE 6

Sigmoid Formulation

Reformulate the two-class posterior as
$$p(C_1|x) = \frac{1}{1 + e^{-a}} = \sigma(a), \quad \text{where } a = \ln \frac{p(x|C_1)\,p(C_1)}{p(x|C_2)\,p(C_2)}$$
The logistic sigmoid $\sigma(a)$ is defined by
$$\sigma(a) = \frac{1}{1 + e^{-a}}$$
Note that $\sigma(-a) = 1 - \sigma(a)$.
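A quick numerical check of these identities (the value of a is arbitrary):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

a = 1.7  # any log-odds value works for the checks below
assert np.isclose(sigmoid(-a), 1.0 - sigmoid(a))  # sigma(-a) = 1 - sigma(a)
assert np.isclose(sigmoid(0.0), 0.5)              # symmetric about a = 0
```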


SLIDE 7

Sigmoid Function

[Figure: plot of the logistic sigmoid σ(a) for a ∈ (−5, 5); σ(0) = 0.5 and the function saturates at 0 and 1.]


SLIDE 8

Generalization to K > 2 classes

Consider
$$p(C_k|x) = \frac{p(x|C_k)\,p(C_k)}{\sum_i p(x|C_i)\,p(C_i)} = \frac{e^{a_k}}{\sum_i e^{a_i}}$$
where $a_k = \ln\left(p(x|C_k)\,p(C_k)\right)$. This is the normalized exponential (softmax).
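A small sketch of the normalized exponential; the activations a_k below are illustrative values:

```python
import numpy as np

def softmax(a):
    """p(Ck|x) = exp(a_k) / sum_i exp(a_i); subtracting max(a)
    avoids overflow without changing the result."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([-1.2, -0.4, -2.0])  # a_k = ln(p(x|Ck) p(Ck)), K = 3
p = softmax(a)
print(p, p.sum())  # posteriors over the 3 classes; they sum to 1
```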


SLIDE 9

The case with Normal distributions

Consider D-dimensional Gaussian class conditionals with means $\mu_k$ and a shared covariance $\Sigma$. The result is
$$p(C_1|x) = \sigma(w^T x + w_0)$$
where
$$w = \Sigma^{-1}(\mu_1 - \mu_2)$$
$$w_0 = -\tfrac{1}{2}\mu_1^T \Sigma^{-1}\mu_1 + \tfrac{1}{2}\mu_2^T \Sigma^{-1}\mu_2 + \ln\frac{p(C_1)}{p(C_2)}$$
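A sketch verifying that this linear form reproduces direct application of Bayes' rule, reusing the illustrative Gaussian parameters from above:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.eye(2)
p1, p2 = 0.6, 0.4

Sinv = np.linalg.inv(Sigma)
w = Sinv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(p1 / p2)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
x = np.array([0.3, -0.2])
post_linear = sigmoid(w @ x + w0)           # sigma(w^T x + w0)

# Cross-check against Bayes' rule applied directly.
l1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma) * p1
l2 = multivariate_normal.pdf(x, mean=mu2, cov=Sigma) * p2
assert np.isclose(post_linear, l1 / (l1 + l2))
```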


SLIDE 10

The multi-class Normal case

For each class,
$$a_k(x) = w_k^T x + w_{k0}$$
with
$$w_k = \Sigma^{-1}\mu_k, \qquad w_{k0} = -\tfrac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \ln p(C_k)$$
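A minimal multi-class sketch combining these activations with the softmax from above (means and priors are again illustrative):

```python
import numpy as np

mus = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 1.5])]
priors = [0.5, 0.3, 0.2]
Sinv = np.linalg.inv(np.eye(2))   # shared covariance, here the identity

def a_k(x, mu, prior):
    """a_k(x) = w_k^T x + w_k0."""
    w = Sinv @ mu                                  # w_k = Sigma^-1 mu_k
    w0 = -0.5 * mu @ Sinv @ mu + np.log(prior)     # w_k0
    return w @ x + w0

x = np.array([0.2, 0.4])
a = np.array([a_k(x, m, p) for m, p in zip(mus, priors)])
post = np.exp(a - a.max()); post /= post.sum()     # softmax over a_k
print("predicted class:", post.argmax(), "posteriors:", post)
```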


SLIDE 11

Small multi-class Normal distribution example

[Figure: a small multi-class example with Normal class-conditional densities and the resulting decision regions; axes span roughly −2 to 2 and −2.5 to 2.5.]

SLIDE 12

The Maximum Likelihood Solution

For a two-class example with priors $(\pi, 1-\pi)$ we have
$$p(x_n, C_1) = p(C_1)\,p(x_n|C_1) = \pi\,\mathcal{N}(x_n|\mu_1, \Sigma)$$
The joint likelihood function is then
$$p(\mathbf{t}|\pi, \mu_1, \mu_2, \Sigma) = \prod_{n=1}^{N} \left[\pi\,\mathcal{N}(x_n|\mu_1, \Sigma)\right]^{t_n} \left[(1-\pi)\,\mathcal{N}(x_n|\mu_2, \Sigma)\right]^{1-t_n}$$
where $t_n$ is the class label of the n-th sample. We can compute the maximum of $p(\cdot)$ analytically.
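A sketch that evaluates this likelihood, working in log space for numerical stability (data and parameters are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, t, pi, mu1, mu2, Sigma):
    """ln prod_n [pi N(x_n|mu1,S)]^t_n [(1-pi) N(x_n|mu2,S)]^(1-t_n)."""
    ll1 = np.log(pi) + multivariate_normal.logpdf(X, mean=mu1, cov=Sigma)
    ll2 = np.log(1 - pi) + multivariate_normal.logpdf(X, mean=mu2, cov=Sigma)
    return np.sum(t * ll1 + (1 - t) * ll2)

X = np.array([[1.2, 0.1], [-0.9, 0.3]])   # two samples
t = np.array([1.0, 0.0])                  # class labels
print(log_likelihood(X, t, 0.5, np.array([1.0, 0.0]),
                     np.array([-1.0, 0.0]), np.eye(2)))
```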


SLIDE 13

The Maximum Likelihood Solution (2)

The class prior is then
$$\pi = \frac{N_1}{N_1 + N_2} = \frac{N_1}{N}$$
the class means are
$$\mu_1 = \frac{1}{N_1}\sum_{n=1}^{N} t_n x_n, \qquad \mu_2 = \frac{1}{N_2}\sum_{n=1}^{N} (1-t_n)\,x_n$$
and the shared covariance is
$$\Sigma = S = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$$
where $S_k$ is the sample covariance of class $k$. The results are not surprising. Could we compute the optimal ML solution directly?
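A sketch of these closed-form ML estimates on synthetic data (the data and its generating parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic labeled data: t_n = 1 for class 1, t_n = 0 for class 2.
X = np.vstack([rng.normal([1.0, 0.0], 1.0, size=(60, 2)),
               rng.normal([-1.0, 0.0], 1.0, size=(40, 2))])
t = np.concatenate([np.ones(60), np.zeros(40)])

N1, N2, N = t.sum(), (1 - t).sum(), len(t)
pi  = N1 / N                            # class prior
mu1 = (t @ X) / N1                      # (1/N1) sum_n t_n x_n
mu2 = ((1 - t) @ X) / N2                # (1/N2) sum_n (1 - t_n) x_n
S1 = (X[t == 1] - mu1).T @ (X[t == 1] - mu1) / N1
S2 = (X[t == 0] - mu2).T @ (X[t == 0] - mu2) / N2
Sigma = (N1 / N) * S1 + (N2 / N) * S2   # pooled covariance
print(pi, mu1, mu2)
```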


SLIDE 14

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 15

Probabilistic Discriminative Models

Could we analyze the problem directly, rather than through a generative model? That is, could we perform ML directly on $p(C_k|x)$? This could involve fewer parameters!


SLIDE 16

Logistic Regression

Consider the two-class problem, formulated with a sigmoid:
$$p(C_1|\phi) = y(\phi) = \sigma(w^T\phi)$$
then $p(C_2|\phi) = 1 - p(C_1|\phi)$. Note the derivative
$$\frac{d\sigma}{da} = \sigma(1 - \sigma)$$
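A quick finite-difference check of the derivative identity:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

a, h = 0.8, 1e-6
numeric  = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
analytic = sigmoid(a) * (1 - sigmoid(a))   # d(sigma)/da = sigma(1 - sigma)
assert np.isclose(numeric, analytic)
```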


SLIDE 17

Logistic Regression - II

For a dataset $\{\phi_n, t_n\}$ we have
$$p(\mathbf{t}|w) = \prod_{n=1}^{N} y_n^{t_n}\,(1 - y_n)^{1-t_n}$$
The associated error function (the cross-entropy) is
$$E(w) = -\ln p(\mathbf{t}|w) = -\sum_{n=1}^{N}\left\{t_n \ln y_n + (1-t_n)\ln(1-y_n)\right\}$$
and the gradient is then
$$\nabla E(w) = \sum_{n=1}^{N} (y_n - t_n)\,\phi_n$$
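A compact sketch of the error and its gradient; the eps guard on the logarithms is an implementation detail, not part of the slide's formula:

```python
import numpy as np

def cross_entropy_and_grad(w, Phi, t):
    """E(w) = -sum_n {t_n ln y_n + (1-t_n) ln(1-y_n)},
    grad E(w) = sum_n (y_n - t_n) phi_n, with y_n = sigma(w^T phi_n)."""
    y = 1.0 / (1.0 + np.exp(-Phi @ w))
    eps = 1e-12                       # keeps the logs finite as y -> 0 or 1
    E = -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))
    grad = Phi.T @ (y - t)            # matrix form of sum_n (y_n - t_n) phi_n
    return E, grad
```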


SLIDE 18

Newton-Raphson Optimization

We want to find an extremum of a function $f(\cdot)$. Expand to second order:
$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \tfrac{1}{2} f''(x)\,\Delta x^2$$
The expansion has its extremum when $\Delta x$ solves
$$f'(x) + f''(x)\,\Delta x = 0$$
In vector form:
$$x_{n+1} = x_n - \left[H_f(x_n)\right]^{-1} \nabla f(x_n)$$
where $H_f$ is the Hessian.
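A scalar sketch of the iteration on an arbitrary quartic (the function is illustrative):

```python
import numpy as np

# f(x) = x^4 - 3x^2 + 2; an extremum satisfies f'(x) = 0.
fp  = lambda x: 4 * x**3 - 6 * x     # f'(x)
fpp = lambda x: 12 * x**2 - 6        # f''(x)

x = 2.0
for _ in range(20):
    x = x - fp(x) / fpp(x)           # x_{n+1} = x_n - f'(x_n)/f''(x_n)
print(x, np.sqrt(1.5))               # converges to the minimum at sqrt(3/2)
```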


SLIDE 19

Iteratively reweighted least squares

Formulate the optimization problem as
$$w^{(\tau+1)} = w^{(\tau)} - H^{-1}\nabla E(w)$$
For the sum-of-squares error, the gradient and Hessian are given by
$$\nabla E(w) = \Phi^T\Phi\,w - \Phi^T\mathbf{t}, \qquad H = \Phi^T\Phi$$
The solution is "obvious":
$$w^{(\tau+1)} = w^{(\tau)} - (\Phi^T\Phi)^{-1}\left(\Phi^T\Phi\,w^{(\tau)} - \Phi^T\mathbf{t}\right) = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$$
which is the LSQ solution, reached in a single step!
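A sketch confirming on synthetic data that a single Newton step, from any starting point, lands exactly on the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(50, 3))                     # design matrix
t = Phi @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=50)

w = np.zeros(3)                                    # arbitrary start
grad = Phi.T @ Phi @ w - Phi.T @ t
H = Phi.T @ Phi
w_new = w - np.linalg.solve(H, grad)               # one Newton step

w_lsq = np.linalg.lstsq(Phi, t, rcond=None)[0]
assert np.allclose(w_new, w_lsq)                   # single step = LSQ
```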


SLIDE 20

Optimization for the cross-entropy

For the cross-entropy error $E(w)$,
$$\nabla E(w) = \Phi^T(\mathbf{y} - \mathbf{t}), \qquad H = \Phi^T R\,\Phi$$
where $R$ is a diagonal matrix with $R_{nn} = y_n(1 - y_n)$. The regression/discrimination update is then
$$w^{(\tau+1)} = (\Phi^T R\,\Phi)^{-1}\Phi^T R\,\mathbf{z}, \qquad \mathbf{z} = \Phi w^{(\tau)} - R^{-1}(\mathbf{y} - \mathbf{t})$$
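A sketch of the full IRLS loop for logistic regression on toy data; the clipping of the R_nn entries is a numerical safeguard, not part of the slide's formula:

```python
import numpy as np

def irls_logistic(Phi, t, iters=10):
    """w <- (Phi^T R Phi)^-1 Phi^T R z, z = Phi w - R^-1 (y - t),
    with R diagonal and R_nn = y_n (1 - y_n)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = 1.0 / (1.0 + np.exp(-Phi @ w))
        r = np.clip(y * (1 - y), 1e-9, None)   # diagonal of R, kept positive
        z = Phi @ w - (y - t) / r
        # Weighted least-squares solve; R is never formed explicitly.
        w = np.linalg.solve(Phi.T @ (r[:, None] * Phi), Phi.T @ (r * z))
    return w

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(1.0, 1.0, (30, 2)), rng.normal(-1.0, 1.0, (30, 2))])
Phi = np.hstack([np.ones((60, 1)), X])         # bias feature phi_0 = 1
t = np.concatenate([np.ones(30), np.zeros(30)])
print(irls_logistic(Phi, t))
```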

SLIDE 21

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 22

Class Projects - Examples

  • Feature integration for robust detection
  • Multi-recognition strategies
  • Comparison of recognition methods
  • Space categorization
  • Learning of obstacle-avoidance strategies


SLIDE 23

Class Projects - II

Problems:

  • Novel "research" - robotics / mobile platforms / manipulation
  • Comparative evaluation
  • Integration of methods

Aspects:

  • Modelling - what is a good/adequate model?
  • What is a good benchmark/evaluation?
  • Evaluation of a method - alone or in comparison

Teaming:

  • 2-3 students per group


SLIDE 24

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary


SLIDE 25

Summary

  • Considered a Bayesian formulation for class discrimination
  • For linear systems, LSQ provides the solution
  • Iterative solutions (IRLS) for the cross-entropy case
  • Discussed possible class projects
